Cloud computing & song scrobbling on Last.fm

I'm extremely late to this whole Last.fm thing, but then again, I'm not sure I've missed much.  From what I can tell, Last.fm is a music discovery site that collects and maintains your "musical foot stream" while letting you share that information with the community.  The more you produce, participate, and listen within the social network, the more you are rewarded with new music, friends, and community relationships.  So, in a nutshell, it's a music social network built on a beefed-up, Pandora-like jukebox.  I can't say I use all the social networking tools (especially considering how much time I already spend on Facebook), but I appreciate the effort.  Truthfully, the only thing I care to do is scrobble.  For those of you unfamiliar with the verb, to scrobble means to automatically add the tracks you play to your Last.fm profile using a piece of software called a scrobbler.  In business terms, scrobbling turns a user's listening history into a commodity, and Last.fm's public API lets other companies, users, and developers partner with Last.fm and build on that data.
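For the curious, here is roughly what a scrobbler hands to Last.fm.  This is a minimal sketch that assumes Last.fm's Web Services API 2.0 `track.scrobble` method, which is not necessarily the protocol scrobblers used when this post was written.  The helper names `sign_request` and `build_scrobble` are hypothetical, and the key, secret, and session values are placeholders.  It only builds and signs the request parameters and does not send anything:

```python
import hashlib
import time
from typing import Optional


def sign_request(params: dict[str, str], api_secret: str) -> str:
    """Hypothetical helper: compute an API 2.0-style api_sig.

    Parameters are sorted by name, concatenated as <name><value> pairs,
    the shared secret is appended, and the result is MD5-hashed.
    'format' and 'callback' are left out of the signature.
    """
    payload = "".join(
        f"{name}{params[name]}"
        for name in sorted(params)
        if name not in ("format", "callback")
    )
    return hashlib.md5((payload + api_secret).encode("utf-8")).hexdigest()


def build_scrobble(
    artist: str,
    track: str,
    api_key: str,
    session_key: str,
    api_secret: str,
    timestamp: Optional[int] = None,
) -> dict[str, str]:
    """Hypothetical helper: assemble signed parameters for one scrobble."""
    params = {
        "method": "track.scrobble",
        "artist": artist,
        "track": track,
        # Unix time when the track started playing; defaults to "now".
        "timestamp": str(timestamp if timestamp is not None else int(time.time())),
        "api_key": api_key,
        "sk": session_key,  # session key obtained through Last.fm's auth flow
    }
    params["api_sig"] = sign_request(params, api_secret)
    return params


params = build_scrobble("Radiohead", "Airbag", "MY_API_KEY", "MY_SESSION_KEY",
                        "MY_SECRET", timestamp=1234567890)
print(params["api_sig"])
```

A real scrobbler would then POST these parameters to the API endpoint.  The signature lets Last.fm check that the request came from an application holding the shared secret.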

That said, what I appreciate most about Last.fm is how they overcome the technical challenge of collecting, harvesting, and computing data from scrobbling users.  Just think about it: how in the hell does Last.fm collect 40 million scrobbles per day?  Can you imagine the kind of computational power you need to take in all that listening data without crashing?  It kinda reminds me of the time I worked on a high-profile WordPress blog that went down in flames after 100,000 people tried to view it in a single hour.  LOL.  Those were the days when your company used two servers, max, to host its site.
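To put 40 million scrobbles a day into per-second terms, here is some back-of-the-envelope arithmetic.  It assumes the traffic is spread evenly over 24 hours, which real listening never is:

```latex
\frac{40\,000\,000 \text{ scrobbles/day}}{24 \times 60 \times 60 \text{ s/day}}
  = \frac{40\,000\,000}{86\,400}
  \approx 463 \text{ scrobbles/s}
```

Because traffic is not uniform, peak rates sit above this average.  That fits with the figure of 800 scrobbles per second quoted for Last.fm.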

Well, Last.fm handles 40 million unique users a month and 800 scrobbles per second through cloud computing.  Cloud computing, at this point, shouldn't be a surprise to anyone, but what I find fascinating is how the team mixes and matches a variety of technologies, including Hadoop and Dumbo (a Python module for writing Hadoop jobs), to provide site stats and metrics, charts, reporting, neighbors, recommendations, indexing, evaluation, and data insights.
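Hadoop jobs, including ones written in Python with Dumbo, follow the MapReduce pattern.  A mapper emits key/value pairs, the framework groups them by key, and a reducer combines each group.  The single-process sketch below shows that shape for something like a top-artists chart.  It is an illustration of the pattern only, not Last.fm's code.  The tab-separated log format `user<TAB>artist<TAB>track` and the `run_job` driver are hypothetical stand-ins for what Hadoop does across many machines:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (artist, 1) for each well-formed scrobble line (hypothetical format)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) == 3:
        _user, artist, _track = fields
        yield artist, 1


def reducer(artist: str, counts: Iterable[int]) -> tuple[str, int]:
    """Sum the play counts collected for one artist."""
    return artist, sum(counts)


def run_job(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Run map, shuffle, and reduce in one process, then rank the results."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for line in lines:                      # map phase
        for key, value in mapper(line):
            grouped[key].append(value)      # shuffle: group values by key
    results = [reducer(key, values) for key, values in grouped.items()]  # reduce phase
    # Chart order: most plays first, ties broken alphabetically.
    return sorted(results, key=lambda kv: (-kv[1], kv[0]))


log = [
    "alice\tRadiohead\tAirbag",
    "bob\tBjork\tJoga",
    "alice\tRadiohead\tLucky",
    "carol\tBjork\tHyperballad",
    "bob\tRadiohead\tAirbag",
]
print(run_job(log))  # [('Radiohead', 3), ('Bjork', 2)]
```

The point of the split is that `mapper` and `reducer` never need to see the whole data set.  That is what lets a cluster spread millions of scrobbles across many machines.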

So what does this mean?  It means that for the web to keep supporting the current trend of free, user-generated data, technology has to, once again, change its trajectory and look to new distributed systems to meet the demand.