How does Shazam recognize a song in 3 seconds?

You've just walked into a bar to get your favorite croissant. So far, everything's fine. But then a song you've never heard starts playing, and after 15 seconds you're already imagining it as the soundtrack to a moment of your life or your workout.

You want to know its name. But nobody around you knows, and you have neither the time nor the dignity to start shouting "WHAT SONG IS THIS?" between one cappuccino and the next.

So how do you do that?

Well, Shazam.

Open the app, tap a button, wait two or three seconds and poof: title, artist, album... sometimes even the lyrics and links to listen to it everywhere. And every time I wonder: how does it do that? And once again, the answer is... physics!

It's spectacular how computer science makes clever use of physics and mathematics to solve everyday problems!

When Shazam was born (and why it wasn't an app at first)

Shazam was born in London in 1999, with an idea as simple as it was crazy for the time: recognizing a song by listening to a few seconds of audio.
The catch... in 1999 smartphones didn't exist. So there was no "app", no "tap here", no "decent microphone", no "fast connection".

There were some practical problems:

background noise (bar, radio, people talking)
low-quality, often distorted audio
limited hardware compared to today
slow networks

Plus there was a huge constraint: it had to work in seconds.

After years of development, in 2002 the first "real" service arrived. But it wasn't what we picture today.

In the United Kingdom you called a number (2580), let the phone listen to the music, and after a few seconds you received an SMS with the title and artist.

And it was a stroke of genius!

Because at that time phones were only for making calls and sending messages, period. Hardware components such as CPUs and microphones were underwhelming. Mobile data was very slow (we're talking about 9.6–14.4 kbps), so you couldn't move large amounts of data and still expect immediate responses for thousands of users.

In other words, running Shazam "on the phone" was impractical given the constraints of the time.
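To get a feel for the numbers, here is a back-of-the-envelope calculation. It assumes (our assumption, not a figure from Shazam) that you wanted to send 10 seconds of uncompressed telephone-quality audio, i.e. G.711 PCM at 8,000 samples per second and 8 bits per sample, over a 9.6 kbps data link:

```latex
\[
\text{bit rate} = 8000\,\tfrac{\text{samples}}{\text{s}} \times 8\,\tfrac{\text{bits}}{\text{sample}} = 64\,\text{kbit/s}
\]
\[
\text{data} = 64\,\text{kbit/s} \times 10\,\text{s} = 640\,\text{kbit},
\qquad
t_{\text{upload}} = \frac{640\,\text{kbit}}{9.6\,\text{kbit/s}} \approx 67\,\text{s}
\]
```

Even at 14.4 kbps that is about 44 seconds just to upload the clip, before any processing has happened.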

So what did they come up with? A paradigm shift: instead of putting the software on the phone, use the phone only as a microphone and the phone line to communicate. That's it.

Everything else happened on servers connected to the telephone network, which received the audio stream, analyzed it and recognized the song. Then they sent an SMS with the title and artist.

Simple, clean and linear, but above all it had to be fast, scalable and robust to noise.

Obviously, there were drawbacks here too. The servers had their own cost, processing capacity was limited, and the database could not be very large because of the costs involved. For the same reasons, the server could not save the audio and process it at leisure after the call ended: it had to work live.

Its initial business model was pay-per-use.

The brilliant idea behind Shazam

As mentioned, the database had a limited amount of memory. Since they couldn't afford to store all the music in the world, they had to choose what to include and what to leave out.

So which songs are users most likely to want to identify? Probably the hits of the moment and popular music. These were the highest-priority songs to include in the database.

Okay, so the server receives the audio stream of the call. But now comes the real problem: how do you recognize a song without slowly and expensively comparing it against "all the songs in the world"? And above all, how do you maximize the number of tracks stored so that you always find a result?

The answer was: "What if we used a spectrogram?"

If you think about it, it's a genius idea!

A spectrogram is a graph that shows how the frequencies in a signal change over time: instead of a simple audio waveform, you get a complete map of what's going on inside the sound.

Don't store the song as audio. Store its signature: a fingerprint.

The secret sound map

How Shazam sees a song: describing it with a spectrogram

The image shows exactly that: a spectrogram. The horizontal axis is time, the vertical axis is the frequency of the song, and the color (or intensity) represents how strong each frequency is at that moment.

This alone is very powerful, because it transforms sound (a wave that is rather hard to read) into a kind of visual map. Into a feature.

How do you get it?

But how do you get this chart? You take the piece of audio and split it into short chunks over time. To each chunk you apply a transformation called the Fourier transform, which answers the question: "Which frequencies are present in this piece of sound?" Repeating the process over time gives you the spectrogram.
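The chunk-and-transform procedure can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not Shazam's code: the frame size (256 samples), hop (128 samples), Hann window and 8 kHz test tone are arbitrary choices, and a naive DFT stands in for a fast FFT so that the example needs only the standard library. Bin k of each frame corresponds to k * sample_rate / frame_size Hz.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Hann window: tapers each chunk's edges to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame: list[float]) -> list[float]:
    """Naive DFT: magnitude of each frequency bin from 0 Hz up to Nyquist."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples: list[float], frame_size: int = 256, hop: int = 128) -> list[list[float]]:
    """Split the signal into overlapping chunks and take the DFT of each one.

    Returns one list per chunk (the time axis); each list holds the
    magnitude of every frequency bin (the frequency axis).
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [s * w for s, w in zip(samples[start:start + frame_size], window)]
        frames.append(dft_magnitudes(chunk))
    return frames


if __name__ == "__main__":
    sample_rate = 8000
    # A pure 1000 Hz tone lasting a quarter of a second.
    tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(sample_rate // 4)]
    spec = spectrogram(tone)
    loudest = max(range(len(spec[0])), key=lambda k: spec[0][k])
    print(loudest * sample_rate / 256)  # 1000.0
```

In practice you would use an FFT (for example `numpy.fft.rfft`), which computes the same bins in O(N log N) time instead of O(N²).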

Okay, but the spectrogram is still too heavy.

At this point you might say: "Perfect, so Shazam compares the spectrogram of the song I'm listening to with the spectrograms in the database." Yes, but... no.

Because a full spectrogram is still far too much data to store and compare at such a huge scale.

Actually, we aren't interested in all the frequencies in the spectrogram. We're interested in the most prominent ones: the most robust, most "recognizable" ones, even when there is noise, the microphone is poor or the song is playing far away.

So we extract the strongest peaks: the distinctive points in time and frequency. This already solves two problems at once:

it drastically reduces the data
it increases resistance to noise
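A minimal sketch of peak picking on a spectrogram stored as a list of frames, each frame a list of bin magnitudes. The rule used here (keep a point if it passes a magnitude threshold and nothing in a small square neighborhood around it is larger) is our own simple choice; `find_peaks`, `radius` and `min_magnitude` are hypothetical names with arbitrary default values, not Shazam's parameters.

```python
def find_peaks(
    spec: list[list[float]], radius: int = 2, min_magnitude: float = 1.0
) -> list[tuple[int, int]]:
    """Return (frame, bin) coordinates of local maxima in a spectrogram.

    A point is kept if its magnitude is at least `min_magnitude` and no
    point within `radius` frames and `radius` bins of it is strictly larger.
    """
    peaks = []
    for t, frame in enumerate(spec):
        for f, value in enumerate(frame):
            if value < min_magnitude:
                continue  # too weak: likely noise or silence
            neighbors = (
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(len(spec), t + radius + 1))
                for ff in range(max(0, f - radius), min(len(spec[tt]), f + radius + 1))
            )
            if all(n <= value for n in neighbors):
                peaks.append((t, f))
    return peaks


if __name__ == "__main__":
    grid = [
        [0, 0, 0, 0, 0],
        [0, 9, 1, 0, 0],
        [0, 1, 0, 0, 5],
        [0, 0, 0, 0, 0],
    ]
    print(find_peaks(grid, radius=1, min_magnitude=2))  # [(1, 1), (2, 4)]
```

A larger `radius` keeps fewer, more spread-out peaks (with `radius=3` the grid above yields only `(1, 1)`), trading detail for compactness.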

But there is still a lot of data to store. We need another idea to shrink it further and to compare two pieces of audio quickly.

So what do we do?

This is where the hash comes in, the last mathematical ingredient we needed: a function that, given an input (the peaks extracted from the spectrogram), returns a compact value that can be compared quickly and easily with other values. And with this we finally have a digital fingerprint of the track (or rather, of that piece of the song).
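Which hash, exactly? Shazam's published paper (Avery Wang, "An Industrial-Strength Audio Search Algorithm", 2003) describes hashing pairs of peaks: each peak is an "anchor" combined with a few nearby later peaks, and the hash encodes the two frequencies and the time gap between them. The sketch below follows that idea in simplified form; `fingerprint`, `pack_hash`, the fan-out, the maximum time gap and the 10-bit field layout are our assumptions, not Shazam's actual parameters.

```python
def pack_hash(f1: int, f2: int, dt: int) -> int:
    """Pack (anchor freq bin, target freq bin, time gap) into one integer.

    Hypothetical layout: 10 bits per field, so values are assumed to be < 1024.
    """
    return (f1 & 0x3FF) << 20 | (f2 & 0x3FF) << 10 | (dt & 0x3FF)


def fingerprint(
    peaks: list[tuple[int, int]], fan_out: int = 5, max_dt: int = 64
) -> list[tuple[int, int]]:
    """Turn (frame, bin) peaks into (hash, anchor_frame) pairs.

    Each peak is an anchor paired with up to `fan_out` later peaks that
    occur at most `max_dt` frames after it.
    """
    ordered = sorted(peaks)  # by time, then by frequency
    result = []
    for i, (t1, f1) in enumerate(ordered):
        paired = 0
        for t2, f2 in ordered[i + 1:]:
            dt = t2 - t1
            if dt > max_dt or paired == fan_out:
                break  # outside the time window (peaks are sorted) or enough pairs
            if dt == 0:
                continue  # same time slice: no time gap to encode
            result.append((pack_hash(f1, f2, dt), t1))
            paired += 1
    return result


if __name__ == "__main__":
    peaks = [(0, 10), (2, 20), (3, 30), (100, 40)]
    shifted = [(t + 7, f) for t, f in peaks]  # same music, recording started later
    print([h for h, _ in fingerprint(peaks)] == [h for h, _ in fingerprint(shifted)])  # True
```

Because each hash stores only the time difference between two peaks, the same pair of notes produces the same hash no matter where in the song you started recording; the absolute anchor time is kept alongside it for the matching step.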

The whole engineering of the project was based on this way of thinking.

So what is the Shazam algorithm?

Let's put together everything we've seen so far.

Shazam listens to a few seconds of audio
generates the spectrogram
keeps only the strongest, most "characteristic" peaks
turns them into hashes (the fingerprint)
looks up those hashes in the database (matching)
when it finds enough consistent matches, it says: "That's the one!"

And this is why Shazam is so fast: it isn't "understanding" the music the way a person would. It's comparing two signatures: the one generated from the audio you recorded and the one stored in the database for that song. Simple, clean and linear.

The arrival of smartphones

With the arrival of smartphones, Shazam took off thanks to the app we all know today. Everything changed, partly thanks to the new hardware: better microphones, fast internet and apps that are always at hand. Not to mention that the app has always been super intuitive: open it, tap, and the result appears immediately in full screen.

But the crazy thing is that the basic concept was already in place in the early 2000s. Smartphones did not change the underlying principle: everything that was conceived when it was designed... still exists today. Obviously, an app lets you add more features, but the concept has stayed the same.

With Apple's acquisition in 2018, for about $400 million, it made another big leap forward: integrations with Spotify, Apple Music and YouTube, and no more advertising.

What about today?

Today Shazam uses artificial intelligence to improve matching, has a much larger database and can even work offline. In that case the mechanism is: listen to the track long enough to capture it, save the recording and, as soon as the device is back online, check it with the servers!

In the end, it's what a person would do, translated into a sequence of steps... an algorithm, but one that runs in seconds.

Tags: algorithm, how it works, fingerprinting, physics in everyday life, hashing, Shazam, signal processing, spectrogram, Fourier transform


Elio Magliari

Hi, I'm Elio. I work as a software engineer.

I share what I discover about the digital world, the questions I ask myself along the way, and the ideas that help me understand it and explain it more clearly.
