Visualizing Structured Data About Music the k-pie network layout algorithm and applications to music data Kurt Jacobson Centre for Digital Music Queen Mary University of London Mile End, London, UK E1 4NS kurt.jacobson@elec.qmul.ac.uk ABSTRACT In an effort to move towards intuitive visual interfaces for faceted browsing of structured data about music, we develop a visualization technique called k-pie. Derived from a network visualization technique know as k-cores decomposition, k-pie layout accounts for the semantic labels or colors associated with each vertex. Vertices of a graph are arranged in a 2 dimensional circle where slices in the circle correspond to a specific vertex label and the most connected vertices are found in the center of the visualization. We describe the k-pie algorithm and demonstrate how it can be useful in the context of Semantic Web technologies. Keywords k-pie layout, k-cores decomposition, complex networks, Semantic Web, music, visualization 1. INTRODUCTION Just as music is an important part of every culture, music is an important part of on-line culture. With the ubiquity of digital music on-line, we see an opportunity to realize the promise of Semantic Web technologies and bring exciting and innovative music experiences to the end user. Significant headway has already been made in developing ontologies for the music domain [1], re-describing on-line music-related data in a structured way [2], and automatically inter-linking music collections to the Semantic Web [3]. These efforts are helping to create a Web of structured data about music - a Global Music Graph - that can be leveraged for music discovery, intelligent music recommendation, metadata augmentation, digital rights management, and many other applications. We are now interested in making this powerful infrastructure palatable and accessible to the expert and non-expert users alike. In this work we develop an algorithm for visualizing large amounts of structured data. We call this algorithm k-pie. The algorithm is essentially an extension of the graph layout algorithm know as k-cores decomposition developed in [4]. However k-pie differs in that it accounts for vertex labels or color. Vertices of a graph are arranged in a 2 dimensional circle where slices in the circle correspond to a specific vertex label and the most connected vertices are found in the center of the visualization. WebSci 2009 Athens, Greece Mark Sandler Centre for Digital Music Queen Mary University of London Mile End, London, UK E1 4NS mark.sandler@elec.qmul.ac.uk By applying concepts from complex networks research to semantic graphs k-pie creates meaningful visualizations of the relationships between hundreds of thousands of individuals. By assigning an object property or a set of object properties as edges and a set of object classes or individuals as labels, k-pie visualizations can be created by parsing results from the SPARQL query language [5]. The rest of the paper is organized as follows. In Section 2 we briefly review some related work. In Section 3 we describe the k-pie algorithm and show how it can be used in the context of structured data. In Section 4 we show some of the applications of the k-pie method including the visualization of Myspace music artists and visualization of influence between classical composers. Finally in Section 5 we provide a discussion and ideas for future work. 2. BACKGROUND As mentioned above, the k-pie algorithm is a rather intuitive extension of the network visualization algorithm k-core decomposition developed by Alvarez-Hamelin et. al in [4]. The k-core decomposition algorithm begins with a k-core analysis on a given graph structure and places vertices in a 2 dimensional space using a pair of polar coordinates - a radius related to the shellness of a given vertex and an angle related to the cluster of that vertex after k-cores analysis. We will review these concepts again in Section 3.1. We are most interested in visualizing data about music artists. We describe this data using Music Ontology terms and other supporting Web onotolgies following the work of Raimond [1]. We are interested in utilizing the k-pie visualization algorithm as a tool for interacting with structured data on the Semantic Web [6]. There exists a growing number of tools for visualizing and browsing structured data [7]. 3. ALGORITHM To describe the k-pie algorithm first we will provide some definitions and concepts associated with the k-core decomposition algorithm. 3.1 Definitions Let us consider a graph G = (V, E) where V = n vertices and E = e edges. As described in [8], a k-core is defined as follows: Definition 1. A subgraph H = (C, E C) induced by the set C V is a k-core or a core of order k iff v C :
shellness and its semantic label. Each vertex i is positioned using a pair of polar coordinates (ρ i, α i). The radius ρ i depends on the shellness of i and that of its neighbors while the angle α i depends on the label associated with i. In the resulting visualization, k-shells are presented as layers of concentric circles with the innermost circle corresponding to the vertices with the highest shellness. Then vertices sharing the same label are positioned within in a certain angular range. This creates pie-like slices of vertices sharing the same label across the k-shell layers. Hence the name of the algorithm k-pie. Figure 1: The k-core decomposition for a small graph. Each closed line contains the set of vertices belonging to a given k-core, and the color of the vertices distinguish different k-shells. degree H(v) k, and H is the maximum subgraph with this property. Definition 2. A vertex i has a shellness c if it belongs to the c-core but not to the (c+1)-core. We denote the shellness of vertex i by c i. Definition 3. A shell C c is composed of all vertices whose shellness is c. The maximum value c such that C c is not empty is denoted c max. The k-core is then the union of all C c with c k. It is important to note the distinction between a k-core and a k-shell - a k-shell implies a certain range of vertex degrees while a k-core only implies a lower limit to vertex degree. We are more interested in k-shells for our visualization algorithm. In the k-core analysis all vertices of a connected graph belong to the 1-core. In Figure 3.1 this is indicated by the largest line encircling the entire graph. Then, all vertices of degree d < 2 are recursively cut out. In Figure 3.1 these are the blue vertices and they constitute the 1-shell. All other vertices maintain a degree of d 2 after pruning the blue vertices, and are not eliminated in this step. The remaining vertices form the 2-core, enclosed by another dotted line. In the next step vertices with degree d < 3 are pruned revealing the 3- core. Note that c max = 3 in the graph in Figure 3.1 as after pruning no vertex has a degree d > 3. Also note that the coloring of the vertices in the Figure indicate their shellness. In addition to the above definitions, let us also consider that each vertex i has a label associated with it l i and L is the set of all labels found in the graph G and s is the number of distinct labels found in L. 3.2 Layout equations The k-pie visualization algorithm positions vertices in 2 dimensional space. The position of each vertex depends on its The calculation of ρ i is as follows: ɛ ρ i = (1 ɛ)(c max c i) + V cj c i(i) X j V cj c i (i) (c max c i) where V cj c i is the set of neighbors of i having shellness c j greater or equal to c i. The parameter ɛ is a tuning parameter to control the possibility of rings overlapping. Then, the angle α i is calculated as follows: α i = 2π X L m n + N 1 m<l i Lli 2n, 2π L «l i n where L is an ordered list of the labels in the graph and L m is the number of vertices with the label m, N is a normal distribution of mean L l i and width 2π L l i. Assuming L 2n n is an ordered list of labels, referring to m < l i allows us to allocate the appropriate portion of the angular space to a given label. 3.3 Color and size of vertices and edges The size of each vertex in the visualization corresponds to the logarithm of its degree. Generally, the larger (higher degree) vertices be nearest the center of the visualization although it is possible for a higher degree vertex to exist in one of the more outer shells. The coloring of the vertices is related to the label associated with each vertex. Each distinct label found in L is given a unique color. The drawing of edges can be considered optional. In fact drawing all the edges in a larger graph will result in an unintelligible tangle. With no edges drawn at all, the resulting visualization is still useful. A homogeneously randomly sampled fraction of edges can be drawn. This approach does not add to computational cost significantly and is used in [4]. A more computationally expensive approach is to only draw edges with a higher betweenness centrality - those edges which are found more often in the shortest paths between a pair of vertices [9]. 3.4 Complexity The k-pie algorithm has a complexity that is nearly identical to that of k-core decomposition. If we assume no re-ordering of L we can index our list of labels for the angular calculation in O(s n) where s is the number of labels. Generally s will be small compared to n the number of vertices. The k-core (1) (2)
decomposition takes time O(n + e) - O(n) to build a list of vertex s degeree and O(e) to perform the pruning in the recursive decomposition step where e is the number of edges. So our total time complexity for k-pie is O(s n + n + e) or simply O(n + e) if the number of distinct labels s is small. 4. APPLICATION The k-pie algorithm could be applied to most any graph-like data that includes vertex labels of some kind. Here we will discuss the application that motivated our development of k-pie - visualization of the Myspace artist network. First discussed in [2] the Myspace artist network consists of a subset of the Myspace social network that includes only users who specify that they are artists. A network is constructed from the directed top friend relationship between pairs of artists. This relation is chosen over the general undirected friend relation because it is assumed to be more meaningful and salient in terms of musical similarity. That is to say, a top friend relationship is more likely to imply some musical relationship between two artists - collaboration, co-membership, or stylistic influence. The data set from [2] was republished as structured data in RDF and a live wrapper service 1 was created to dereference Myspace-related URIs on demand. We want to create a k-pie visualization where the vertices are Myspace music artists, the vertices labels are the musical genre labels specified by the artists on Myspace, and the edges are top friend relationships to other artists. We can fetch this data using the SPARQL query language and the endpoint containing out data set 2. The query is as follows PREFIX myspo:<http://purl.org/ontology/myspace#> SELECT?from?to?fromLabel?toLabel FROM <http://dbtune.org/myspace-fj-2008> WHERE{ {?from myspo:topfriend?to. } OPTIONAL{?from myspo:genrelabel?fromlabel.?to myspo:genrelabel?tolabel} } After execution the from and to fields will specify our directed edge list for constructing our graph. The fromlabel and tolabel will contain the labels l i for each vertex i. We specify these triple patterns as OPTIONAL to include vertices that have no genre label specified. We can see the results of this query and visualization in Figure 2. Each vertex represents a music artist and the color of each vertex indicates the primary genre label associated with that artist. This data set contains n = 15, 019 vertices (music artists) and e = 114, 606 edges (top friend relations) and s = 106 labels (genre). Notice that a few highly 1 available at http://dbtune.org/myspace 2 available at http://dbtune.org/sparql connected music artists gravitate towards the center belonging to the highest k-shells around c max = 29. However the shells immediately lower than c max are mostly empty. This behavior is indicative of scale-free networks where the cumulative degree distribution follows a power law decay - P c(d) d (α 1) [10]. Also note that the highly connected vertices in the center k-shells constitute a rich club where the music artists with the highest degree values are also connected to each other [11]. We can also get a sense for what musical genres dominate this sample of the Myspace artist network. We can see that Hip Hop (yellow) and Rap (bright green) account for nearly half of all the genre labels in the data set. We also see that the rich club in the center of the visualization includes vertices with genre labels Hip Hop, Rap, Soul, Reggae, and Hardcore - a somewhat surprising addition to the list. Note that the genre label appears as text only for those genre labels that are associated with more than 1% of the vertices in the network. The data set actually includes 106 unique genre labels and therefore the visualization contains 106 distinct colors. However, it can be exceedingly difficult for the viewer to accurately distinguish between so many colors therefore text labels are used in favor of a color legend. The vertices are drawn to be translucent so the viewer can get a better sense vertex concentration where vertices fall on top of one another. In this visualization only 0.2% of the network edges are chosen uniformly at random and included in the visualization. The remainder of the edges are ignored. The k-pie algorithm has also been applied to a data set of classical composers and the network of influence between composers. 3 This data set is essentially a re-publication of the data about classical composer influence compiled by Charles H. Smith in the 1990 s for the Classical Music Navigator (CMN) website [12]. Following the Linked Data guidelines for publishing structured data on the Web [13], this data set contains links to the DBpedia data set. 4 We can follow these links to obtain additional data about the composers. Of the 444 composers in the CMN data set, we have obtained links to DBpedia for 245. The k-pie algorithm can be used to visualize the data using, for example, rdf:type or dbpedia:birthplace, as the vertex label parameter. A small proof-of-concept software package has been developed to visualize this small data set. 5 5. DISCUSSION AND FUTURE WORK The k-pie layout algorithm allows for the creation of useful and intuitive visualizations of large collections of labeled network data. This algorithm can be useful in the context of the Semantic Web and combined with technologies like SPARQL to create visualizations of moderately large sets of structured data. A quick inspection of the visualization gives the viewer a sense of the make up of the network in terms of labels and a sense of what labels are associated with the most connected vertices. However there are some difficulties with the k-pie algorithm in its current state. Perhaps most notably, there is no clear method for ordering the list of vertices labels L found in the 3 available at: http://dbtune.org/cmn 4 http://dbpedia.org 5 http://omras2.org/classicalmusicuniverse
Figure 2: k-pie layout visualization of a sample of the Myspace artist network where the slices and vertex colors correspond to genre labels. Note that genre label text is only printed for genre labels that constitute > 1% of vertex labels. target graph. In the current implementation, L is naively ordered as the labels appear when constructing the graph - the label that is found first appears as the first label in L. The ordering of L dictates the ordering of the slices in the visualization. It would be preferable if this ordering had some meaning or justification. One option would be to build an adjacency matrix for the labels in L and approach this as a circular layout problem and use a method like spectral re-ordering [14] to organize the labels. This would have the effect of placing the most connected labels in the original graph closer together in the re-ordered list. Another short coming of the k-pie algorithm is that it allows for each vertex to be associated with only one label. This is often at odds with the complexity of real world entities and their descriptions on the Semantic Web, which by design, allows for an entity to be associated with many labels. Even in our Myspace music artist genre application, we naively select only the first genre label associated with a given artist assuming it is the most important. According to the interface design of Myspace, an artist may have between 0 and 3 genre labels. We simply ignore the additional labels in the visualization. To the best knowledge of the authors, there is no clear way to address multiple labels in the context of the k-pie layout algorithm. As can be seen in 2 nodes tend to bunch together in large graphs. Although the viewer can clearly see what portion of the vertices are associated with different labels and which vertices are the most connected in the graph, much of the detail is lost. For this reason we purpose k-pie is most appropriate for generating global over views of medium to large data sets and would be most effective as one aspect of a multi-view interactive data exploration experience. Because the complexity of k-pie is relatively low, it could be incorporated into a faceted browsing interface where the k- pie layout is re-caclulated as the browsing context changes. In future development we hope to use k-pie visualizations in a read-write tool for RDF data following the principles outlined in [15]. Currently k-pie requires the construction of ad hoc SPARQL queries to visualize data sets published on the Semantic Web. The original aim was to develop a more general framework for visualizing structured data and Linked Data [13]. Unfortunately the current work falls short of this aim and focuses instead on a basic network visualization algorithm and its application to a specific Semantic Web data set. Creating a more general framework for visualizing and interacting with Linked Data and structured data on the Semantic Web is a primary goal of future work. 6. WEB RESOURCES The following Web resources supplement the material presented in this work: An open-source Java implementation of k-pie available by svn checkout at http://grasstunes.net/subclipse/ KPieLayout
Fields-Jacobson Myspace data available via SPARQL at http://dbtune.org/sparql Myspace RDF wrapper service available at http://dbtune.org/myspace Classical Music Navigator data set available at http://dbtune.org/cmn Classical Music Universe data browser available at http://omras2.org/classicalmusicuniverse 7. ACKNOWLEDGEMENTS The authors would like to acknowledge Ben Fields at Goldsmiths University of London for his assistance in collecting the original Myspace data set. This work is supported as a part of the OMRAS2 project, EPSRC grants EP/E02274X/1 and EP/E017614/1. A survey of measurements, Advances In Physics, vol. 56, p. 167, 2007. [Online]. Available: doi:10.1080/00018730601170527 [12] The classical music navigator. [Online]. Available: http://www.wku.edu/ smithch/music/ [13] C. Bizer, R. Cyganiak, and T. Heath. How to publish linked data on the web. [Online]. Available: http://linkeddata.org/docs/how-to-publish [14] D. J. Higham, G. Kalna, and M. Kibble, Spectral clustering and its use in bioinformatics, J. Comput. Appl. Math., vol. 204, no. 1, pp. 25 37, 2007. [Online]. Available: http://portal.acm.org/citation.cfm?id=1238339 [15] T. Berners-Lee, J. Hollenbach, K. Lu, J. Presbrey, E. P. d ommeaux, and m.c. schraefel, Tabulator redux: Writing into the semantic web, 2007. [Online]. Available: http://eprints.ecs.soton.ac.uk/14773/ 8. REFERENCES [1] Y. Raimond, A distributed music information system, Ph.D. dissertation, Queen Mary University of London, 2009. [2] K. Jacobson and M. Sandler, Musically meaningful or just noise, an analysis of on-line artist networks, in Proc. of CMMR, 2008, pp. 306 314. [3] Y. Raimond, C. Sutton, and M. Sandler, Automatic interlinking of music datasets on the semantic web, 2008. [4] J. I. Alvarez-Hamelin, L. Dall Asta, A. Barrat, and A. Vespignani, k-core decomposition: a tool for the visualization of large scale networks, CANADA, p. 41, 2006. [Online]. Available: cs/0504107 [5] SPARQL query language for RDF, W3C recommendation, 2008. [Online]. Available: http://www.w3.org/tr/rdf-sparql-query/ [6] T. Burners-Lee, J. Hendler, and O. Lassila, The semantic web, Scientific American, vol. 284, no. 5, May 2001. [Online]. Available: http://www-sop.inria.fr/acacia/cours/essi2006/ Scientific%20American %20Feature%20Article %20The%20Semantic%20Web %20May%202001.pdf [7] S. Kushro and A. Tjoa, Fulfilling the needs of a metadata creator and analyst- an investigation of rdf browsing and visualization tools, Canadian Semantic Web, pp. 81 101, 2006. [Online]. Available: http://dx.doi.org/10.1007/978-0-387-34347-1 6 [8] V. Batagelj and M. Zaversnik, Generalized cores, 2002. [Online]. Available: cs/0202039 [9] L. C. Freeman, A set of measures of centrality based on betweenness, Sociometry, vol. 40, no. 1, pp. 35 41, 1977. [10] M. E. J. Newman, The structure and function of complex networks, SIAM Review, vol. 45, p. 167, 2003. [Online]. Available: cond-mat/0303516 [11] L. F. Costa, F. A. Rodrigues, G. Travieso, and P. R. V. Boas, Characterization of complex networks: