Shazam: the maths behind the magic

Education, Featured — By on March 14, 2013 20:10

We’ve all been there: that great tune you heard on the radio is now being used to advertise a phone company or training shoes, and you simply cannot remember who sings it or what it’s called. All you know is that you blimmin’ love it and ABSOLUTELY NEED to know the artist and title, or your evening will be ruined.

For those of us with a certain type of smartphone, this is where Shazam comes in handy. This brilliant service ‘listens’ to the tune, works out algorithmically who recorded it and what it is called, then displays the results on your screen. It’s remarkable and marvellous, has been the making of many a dinner party, and, I am sure, has been used to successfully seduce members of the opposite sex!

But how does it actually work? Well, the original academic paper, written by Avery Li-Chun Wang, contains this introduction:

“We have developed and commercially deployed a flexible audio search engine. The algorithm is noise and distortion resistant, computationally efficient, and massively scalable, capable of quickly identifying a short segment of music captured through a cellphone microphone in the presence of foreground voices and other dominant noise, and through voice codec compression, out of a database of over a million tracks. The algorithm uses a combinatorially hashed time-frequency constellation analysis of the audio, yielding unusual properties such as transparency, in which multiple tracks mixed together may each be identified. Furthermore, for applications such as radio monitoring, search times on the order of a few milliseconds per query are attained, even on a massive music database.”
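
For the curious, here is a rough Python sketch of what “combinatorially hashed time-frequency constellation analysis” might look like in miniature. It is a toy under stated assumptions, not Shazam’s actual code: it assumes the loud spectrogram peaks (the “constellation”) have already been extracted as (time, frequency) pairs, and the names fingerprint, build_index and identify, the FAN_OUT and MAX_DT settings, and the two made-up tracks are all hypothetical. Each peak is paired with a few later peaks to form a hash, and a sample is matched by counting how many of its hashes line up with a track at one consistent time offset.

```python
from collections import Counter, defaultdict

FAN_OUT = 3   # assumed: how many later peaks each anchor peak is paired with
MAX_DT = 50   # assumed: largest time gap (in frames) allowed within a pair


def fingerprint(peaks):
    """Turn (time, freq) peaks into ((f1, f2, dt), anchor_time) entries."""
    peaks = sorted(peaks)
    entries = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT:
                break          # peaks are time-sorted, so later ones are too far
            if dt == 0:
                continue       # skip peaks in the same frame as the anchor
            entries.append(((f1, f2, dt), t1))
            paired += 1
            if paired == FAN_OUT:
                break
    return entries


def build_index(tracks):
    """Map each hash to the (track name, anchor time) places it occurs."""
    index = defaultdict(list)
    for name, peaks in tracks.items():
        for h, t in fingerprint(peaks):
            index[h].append((name, t))
    return index


def identify(index, sample_peaks):
    """Vote on (track, time offset); a true match piles up at one offset."""
    votes = Counter()
    for h, t_sample in fingerprint(sample_peaks):
        for name, t_track in index.get(h, ()):
            votes[(name, t_track - t_sample)] += 1
    if not votes:
        return None
    (name, offset), score = votes.most_common(1)[0]
    return name, offset, score


# Two made-up "tracks", each a list of (frame, frequency bin) peaks.
tracks = {
    "A": [(t, (7 * t) % 31 + 10) for t in range(0, 60, 2)],
    "B": [(t, (11 * t) % 29 + 10) for t in range(0, 60, 2)],
}
index = build_index(tracks)

# A 20-frame snippet of track A, re-timed to start at 0, plus one stray peak
# standing in for background noise.
sample = [(t - 20, f) for t, f in tracks["A"] if 20 <= t < 40] + [(5, 99)]
result = identify(index, sample)
print(result)
```

On this toy data the snippet should be matched to track A at an offset of 20 frames, while the stray peak simply fails to find a partner in the index. Because the votes are counted per track and per offset, a sample containing two overlapping tracks could in principle produce a strong pile of votes for each, which is one way to read the “transparency” property mentioned in the quote.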

As we become used to newer tools such as Wolfram Alpha, which describes itself as a “computational knowledge engine”, and as Siri becomes a better “intelligent personal assistant and knowledge navigator”, we will all become somewhat better informed, with the world’s knowledge literally at our fingertips. But it’s probably Shazam that we’ll turn to if we’re trying to pull.



1 Comment

  1. Charlie King says:

    I need all the help I can get when I’m on the pull. You’re suggesting I go into detail about the distortion resistant algorithms, right?
