In recent years, several developments have changed how music is acquired and consumed. Prices for mass storage dropped, allowing people to keep ever-growing music collections. The music industry's distribution channels finally arrived in the 21st century with commercial platforms like iTunes or the relaunched Napster. In addition to these commercial outlets, illegal and semi-legal networks provide access to a still unknown number of media files. Broadband internet access made all of this available quickly and nearly on demand.
The problem a user faces today is not so much the acquisition of music as the navigation of a large music collection. Manually assembling playlists is tedious for a large collection and often leaves out unfamiliar tracks. Several online services promise help: swarm logic is supposed to help a user navigate her music collection, pointing out tracks seldom listened to or suggesting similar songs. These services rely on similarity matches within the swarm and often produce surprising results: a smooth, calm jazz song can be followed by a melodic death metal track.
When asked, people describe music in terms of "soft" attributes. In addition to a rough genre categorization they use terms like "smooth", "calm", "hot", "melancholic" or "groovy". It is not so much the instrumentation or general genre of a track that people try to describe, but the mood the song conveys, the ambience it creates and the general feel of the music. Genres give a rough outline of the different kinds of music but lack the detail needed to convey a song's mood. When asked to describe their favourite track, participants in a study used verbs to describe their activity while listening and adjectives to convey the "feel" of the song.
The aim of mood|box is to provide the user with an interface to a database categorizing music in terms of genre, style, moods and instruments.
mood|box uses a tangible interface: users move physical objects on a table to influence what kind of music is played. The user is offered a range of objects, each representing a different aspect of music. If she wants to listen to relaxed jazz with a warm tone, she arranges the objects that represent a warm tone, a relaxed mood and jazz as the genre. The system recognizes these objects and uses them to generate a new playlist.
The system keeps track of the user's music database. Metadata is acquired by a spider that collects information from several sites on the web and stores it in the database backend. When the arrangement on the table changes, the system assembles a new playlist and loads it into the player.
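To illustrate the assembly step, here is a minimal sketch of how a playlist could be built from the active markers. It assumes a simple SQLite backend with hypothetical tracks and track_attributes tables; the real schema of the mood|box backend is not documented here.

    import sqlite3

    # Hypothetical schema (an assumption, not the real backend):
    #   tracks(id, path, artist, title)
    #   track_attributes(track_id, attribute)  -- e.g. 'jazz', 'relaxed', 'warm'

    def assemble_playlist(db_path, active_markers, limit=50):
        """Return paths of tracks matching as many active markers as possible."""
        conn = sqlite3.connect(db_path)
        placeholders = ",".join("?" * len(active_markers))
        rows = conn.execute(
            f"""SELECT t.path, COUNT(*) AS hits
                FROM tracks t
                JOIN track_attributes a ON a.track_id = t.id
                WHERE a.attribute IN ({placeholders})
                GROUP BY t.id
                ORDER BY hits DESC
                LIMIT ?""",
            (*active_markers, limit),
        ).fetchall()
        conn.close()
        return [path for path, _ in rows]

    # e.g. assemble_playlist("moodbox.db", ["jazz", "relaxed", "warm"])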
One of the main problems was (and is) the huge number of attributes used to describe music. Allmusic.com uses a range of 180 different moods to describe its music; last.fm's tag system yielded nearly 12,000 different tags for ~500 artists. To reduce the complexity of the system a semantic approach was used: allmusic.com's moods were clustered based on synonyms provided by the Institut des Sciences Cognitives, Bron. After clustering the moods, a group of 15 general moods was selected. The network of relationships was described using networkX, a Python package for network analysis; networkX exports dot files for graphviz.
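A minimal sketch of this clustering, assuming the synonym lookup returns a mood-to-synonyms mapping; connected components stand in here for the actual clustering procedure, which may have been more involved:

    import networkx as nx
    from networkx.drawing.nx_pydot import write_dot  # requires pydot

    # Hypothetical input, as the ISC synonym lookup might return it.
    synonyms = {
        "calm": ["peaceful", "relaxed", "tranquil"],
        "relaxed": ["laid-back", "calm"],
        "melancholic": ["sad", "wistful"],
    }

    G = nx.Graph()
    for mood, syns in synonyms.items():
        for syn in syns:
            G.add_edge(mood, syn)  # an edge marks a synonym relationship

    # Each connected component is one cluster of related moods;
    # picking one representative per cluster yields the reduced mood set.
    clusters = [sorted(c) for c in nx.connected_components(G)]
    representatives = [c[0] for c in clusters]

    write_dot(G, "moods.dot")  # render with: dot -Tpng moods.dot -o moods.png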
A different approach could use last.fm's tag data as the source, but the free and unconstrained use of tags makes it even harder to derive a sensible set of categories to describe music. A mix of allmusic.com and last.fm would be best: the mood categories from allmusic combined with the last.fm tag data that describes listening activities.
Instead of using the ISC Synonym Search I should have used Princeton's WordNet - a lexical database for the English language that provides semantic and hierarchical information.
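For illustration, a small sketch of what such a lookup could look like, assuming NLTK's WordNet interface (an assumption; the project used neither). Note that WordNet links related adjective senses via "similar to" pointers rather than hypernyms:

    from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

    def related_moods(word):
        """Collect synonyms and 'similar to' adjectives for a mood word."""
        related = set()
        for synset in wn.synsets(word, pos=wn.ADJ):
            related.update(lemma.name() for lemma in synset.lemmas())
            for similar in synset.similar_tos():  # WordNet's adjective links
                related.update(lemma.name() for lemma in similar.lemmas())
        related.discard(word)
        return sorted(related)

    print(related_moods("calm"))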
The evaluation clearly showed that the general concept of the mood|box works for a wide range of people^Wcs-students. Acceptance and usability of the mood|box were rated very highly, even though the participants had different assumptions about how the playlist generation works. The main flaw in the evaluation setup was the small music collection used: the poor selection of tracks limited the possibility for playful discovery of the mood|box.
Users clearly understood the idea of moving objects around the table to influence playlist generation, but the small music collection made it hard to determine the logic used to select the tracks (concentric weights were applied: the center had the highest priority, the periphery the lowest).
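The concentric weighting itself is simple to sketch. reacTIVision reports marker positions as normalised coordinates in [0, 1]; assuming the table center at (0.5, 0.5), a marker's weight could fall off linearly with its distance from the center (the linear falloff is an assumption, not necessarily the function mood|box used):

    import math

    def marker_weight(x, y):
        """Weight a marker by its distance from the table center.

        Coordinates are normalised to [0, 1]; the center (0.5, 0.5)
        gets the highest weight, the periphery the lowest.
        """
        distance = math.hypot(x - 0.5, y - 0.5)
        max_distance = math.hypot(0.5, 0.5)  # distance to a table corner
        return 1.0 - distance / max_distance  # 1.0 at center, 0.0 at corners

    # A marker near the center dominates the playlist...
    print(marker_weight(0.5, 0.55))  # ~0.93
    # ...while a peripheral one barely contributes.
    print(marker_weight(0.05, 0.1))  # ~0.15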
One common mistaken assumption was that a marker exclusively defines what kind of music is played: if a genre marker (say r&b) was on the table, users were irritated when tracks of a different genre were played.
Evaluation participants suggested several uses of the mood|box. Most felt that the system would be ideal for small parties and get-togethers as well as commercial applications in shops or meeting rooms.
One concrete request: bring more "Rock" markers.
There is still a lot of unfinished work to do™:
mood|box is written in Python, a high-level scripting language. Optical recognition is done by reacTIVision, which tracks objects using fiducial markers. Communication between reacTIVision and mood|box is handled via Open Sound Control messages, which are parsed by the Python OSC kit. Playlist management and playback are provided by mpd, the music player daemon, a GPL'ed network music player.
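For illustration, a sketch of the glue between the tracker and the player, using the modern python-osc and python-mpd2 packages as stand-ins for the original Python OSC kit; the marker table and the playlist lookup are hypothetical:

    from pythonosc.dispatcher import Dispatcher
    from pythonosc.osc_server import BlockingOSCUDPServer
    from mpd import MPDClient  # python-mpd2

    # Hypothetical mapping from fiducial IDs to mood|box attributes.
    MARKERS = {4: "jazz", 7: "relaxed", 12: "warm"}
    active = set()

    def lookup_tracks(attributes):
        # Placeholder: query the metadata backend for matching tracks
        # (see the playlist sketch above).
        return []

    def reload_playlist():
        """Replace mpd's playlist with tracks matching the active markers."""
        client = MPDClient()
        client.connect("localhost", 6600)
        client.clear()
        for uri in lookup_tracks(active):
            client.add(uri)
        client.play()
        client.close()
        client.disconnect()

    def on_tuio_object(address, *args):
        """reacTIVision publishes TUIO /tuio/2Dobj messages; 'set' messages
        carry the session id, fiducial id and position of a tracked marker.
        (Handling marker removal via 'alive' messages is omitted here.)"""
        if args and args[0] == "set":
            fiducial_id = args[2]
            if fiducial_id in MARKERS and MARKERS[fiducial_id] not in active:
                active.add(MARKERS[fiducial_id])
                reload_playlist()

    dispatcher = Dispatcher()
    dispatcher.map("/tuio/2Dobj", on_tuio_object)
    BlockingOSCUDPServer(("127.0.0.1", 3333), dispatcher).serve_forever()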
This site and most of the code were handcrafted using vim and kate. Image effects use Lightbox, a script by Lokesh Dhakar.
mood|box was written as a course project for the chair of interactive media/virtual environments at Hamburg University. Idea and concept by Adrian Treusch von Buttlar (moodbox et netzkind.org).
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 2.0 Germany License.