The Social Network in the Music Blogosphere

Posted: January 16th, 2010 | Author: David | Filed under: Computer Science

About this time last year, I was busy putting the finishing touches to a data harvesting program that would go off to the internet and grab posts from music blogs when notified of updates via a feed. The motivation was my MSc Computer Science project, which at the time had no title and not much of a plan or direction. I knew I wanted to do something related to music, probably to do with recommendations, with a view to creating a fresh take on how content can be discovered from editorially subjective sources, rather than behavioural sources such as playlists.

Skip to September, and with the end-of-month deadline nearing, I’m putting the finishing touches to the project report, now titled: ‘The Social Network in the Music Blogosphere’. It’s an exploration of the relationships between blogs and the artists that they write about, using network theory, with an analysis of some subject discovery and classification methods. There’s a chunk of Python code in the appendices (I used a lot of NetworkX), as well as some subject-based clusterings represented as dendrograms. If that sounds like your bag, you can get your copy of my project report here:

Download PDF: The Social Network in the Music Blogosphere (3.6MB)

The network dataset of blog-artist relationships is also available:

blog-artist_network.tar.gz (301K)
blog-artist_network.zip (299K)

It contains the following files:

artists.txt: id, and normalized and denormalized versions of all artist names

ARTIST_ID  ARTIST_NAME  DENORM_ARTIST_NAME

blogs.txt: id and name of each blog

BLOG_ID  BLOG_NAME

blog-artist.txt: the edges between a blog and an artist, and the weighting given to the relationship

BLOG_ID  ARTIST_ID  WEIGHT
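For anyone wanting to poke at the data, here is a minimal Python sketch of reading the edge file described above. It is not the code from the report's appendices. It assumes the columns are tab-separated and that the weight parses as a number, neither of which is stated above, so adjust the delimiter if your copy differs. The function names (`read_edges`, `total_weight_per_artist`, `to_networkx`) are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def read_edges(handle, delimiter="\t"):
    """Yield (blog_id, artist_id, weight) tuples from rows of blog-artist.txt.

    Assumption: three delimiter-separated columns per row. Rows that do not
    fit (for example a BLOG_ID/ARTIST_ID/WEIGHT header, if present) are skipped.
    """
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 3:
            continue
        blog_id, artist_id, raw_weight = row
        try:
            weight = float(raw_weight)
        except ValueError:
            continue  # non-numeric weight, e.g. a header row
        yield blog_id, artist_id, weight


def total_weight_per_artist(edges):
    """Sum the edge weights pointing at each artist across all blogs."""
    totals = defaultdict(float)
    for _blog_id, artist_id, weight in edges:
        totals[artist_id] += weight
    return dict(totals)


def to_networkx(edges):
    """Build a weighted bipartite graph; requires NetworkX to be installed.

    Nodes are ("blog", id) and ("artist", id) tuples so that a blog and an
    artist sharing the same numeric id do not collide.
    """
    import networkx as nx

    graph = nx.Graph()
    for blog_id, artist_id, weight in edges:
        graph.add_node(("blog", blog_id), bipartite=0)
        graph.add_node(("artist", artist_id), bipartite=1)
        graph.add_edge(("blog", blog_id), ("artist", artist_id), weight=weight)
    return graph


# Tiny made-up sample in the assumed layout, standing in for blog-artist.txt.
sample = "BLOG_ID\tARTIST_ID\tWEIGHT\n1\t10\t2.0\n1\t11\t1.0\n2\t10\t3.5\n"
edges = list(read_edges(io.StringIO(sample)))
totals = total_weight_per_artist(edges)
```

To run it on the real file, swap the sample for something like `open("blog-artist.txt", encoding="utf-8", newline="")`; the encoding is a guess as well. The ids can then be joined against artists.txt and blogs.txt to recover the names.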

