Generating Visualizations From RDF Graphs


Zhuo Ma U
Supervisors: Tom Gedeon, Armin Haller
COMP8715 Computing Project
Australian National University
Semester 1, May,

ACKNOWLEDGEMENT

I would like to express my deepest gratitude to my supervisors, Tom Gedeon and Armin Haller, for their enthusiastic and patient guidance. With their help and support on this research topic, I have gained much new knowledge. I would also like to thank Dr Weifa Liang for teaching us technical writing skills. In addition, I greatly appreciate the help of PhD student Anila Sahar Butt and the support of my family and friends.

ABSTRACT

RDF is becoming increasingly significant for research on the development of the Semantic Web. For users to gain a better understanding of this area, advanced methodologies and tools are required to visualize RDF data in a clear and intuitive way. In this project, we have designed a new method called Concept-Matching to visualise RDF graphs, in particular graphs that contain both schema and data. We processed data from the dbpedia database as an example implementation of this approach, and designed an algorithm to retrieve the required data. Moreover, we designed experiments to test the efficiency of the algorithm and worked on its optimization. Based on the results and analysis, we conclude that the new approach can be implemented successfully with the core algorithm.

Keywords: RDF visualization, Semantic Web, Concept-Matching

CONTENTS

Acknowledgement
Abstract
Introduction
Background
Related Work
  Non-graph based RDF Visualization
  Graph based RDF Visualization
    WebVOWL: Web-based Visualization of Ontologies
    RDF Graphs with LodLive
    VisualRDF Visual representation of RDF
  Discussion of Related Work
Methodology
  RDF data structure analysis
  Retrieve concept-match information
  Construct Mapping model
    Build Base Layer
    Build higher layers
    Large graph layout - Five point scale
  Algorithm and Time Complexity
Evaluation
  Experiment Environment and Steps
  Experiment
  Experiment
  Testing Environment
  Experiment Results and Analysis
  Optimization
Conclusion and Future Work
  Conclusion
  Future Work
Reference

CHAPTER 1

1 INTRODUCTION

In recent years, data gathered from daily life has become a general way to represent existing information and knowledge, and it is frequently analysed to assist in making future decisions. The analysis of the web of data has attracted both data researchers and users. RDF, as a language for storing the web of data, can be used for studies on Semantic Web development. However, because of the complex data structure of RDF, experts, let alone casual users, often have difficulty understanding the details of RDF data and employing the information it provides. From a human perspective, the most accessible way to recognize and analyse the information provided by the Semantic Web is through visualization and exploration. The purpose of visualization is to transform large amounts of data into a visual representation that humans can easily understand and interact with. RDF visualization therefore plays a key role in helping users to understand and interact with data.

Many previous works visualize RDF data as a tree graph with linked nodes and edges, which is discussed in more detail in the Related Work chapter. While those approaches are well suited to small datasets, they produce very complex graphs that are hard for users to manage and understand when dealing with large datasets that may contain billions of triples. To avoid this issue, some visualization techniques show only an overall representation of the whole dataset. Although this has merits when users want to explore the high-level structure of a vast amount of information, it causes problems when users want to discover detailed data. For instance, if users want to know the labels and relations of a particular resource, an overall representation of the data is not suitable.

To overcome these problems and provide a more intuitive, effective and user-oriented visualization of RDF data, we have developed a new visualization approach called Concept-Matching. This approach is combined with a Compound-Fisheye view to visualize RDF data as bubbles of different sizes around a main resource. In this paper, we present related work, describe the methodology in detail, and analyse the efficiency of the core algorithm of Concept-Matching. In addition, we designed two experiments to test the time complexity of our algorithm and compared it with the time spent on retrieving real data.

CHAPTER 2

2 BACKGROUND

RDF stands for Resource Description Framework, a metadata-based data model used to store and describe data resources on the web [1]. It stores linked data as triples and uses URIs (Uniform Resource Identifiers) to identify relationships and the two resources they connect, and it has been widely employed for different purposes by users of a variety of skill levels [2]. RDF is widely used to empower linked data in the development of the Semantic Web, the evolution of the World Wide Web proposed in Tim Berners-Lee's article in 2001 [3].

In the humanities, the word "semantic" refers to the distinctions and similarities between the meanings of words. The term Semantic Web, therefore, refers to a web of meanings. The Semantic Web can be considered a web of data, which provides a common framework for data integration and combination across various applications [4]. The reason for developing the Semantic Web is that different data is stored and controlled by different applications with little communication between them, so the World Wide Web is otherwise unable to provide precise information in response to users' semantic requests. For example, suppose a user searches the Internet or an application database with the semantic request "find another person who is called Obama". The information retrieved will be based on keyword matching and will not be precise, because the search engine cannot understand the user's request: the different domains lack integration and consistency. Thus, to deal with more complex search terms, data across different domains needs to be shared and understood, and for this reason the Semantic Web is critical to the Internet revolution.

Figure 1: Evolution on the Web [5]

Figure 1 shows that the Semantic Web, also referred to as Web 3.0, is the evolution and extension of Web 2.0. It aims to link separate data on the Web through URIs (Uniform Resource Identifiers) in order to achieve better search results regardless of language, by sharing and reusing data on the Web [6]. The World Wide Web Consortium (W3C) has developed the Semantic Web Architecture, illustrated in Figure 2, to assist developers in the development of technology.

Figure 2: Semantic Web Architecture in layers [7]

The first layer, Unicode and URI, is used to standardize the languages used on the web and to identify web resources, and the higher layers build on the lower layers [8]. The core layer of the Semantic Web is the third one, RDF and RDFS (RDF Schema), which provides the standard format to represent metadata about web resources [8]. Data from different websites can be merged if it exists in both, or linked together if it is relevant, and it finally forms a huge data structure such as an ontology model that is syntactically based on RDF. Because of such complex data structures, the visualization of RDF still has problems, as mentioned in the Introduction. The next chapter reviews related work by others and discusses how our work differs.

CHAPTER 3

3 RELATED WORK

Research on visualizing linked data has become a popular area over the last few years. Numerous works have sought better and more intuitive visualizations of ontologies for end users, whether experts in the Semantic Web or casual users who are interested in this area [9][10][11]. Meanwhile, many tools have been developed to improve the visualization of RDF data. They are generally classified into two groups: non-graph based and graph based approaches [12].

3.1 NON-GRAPH BASED RDF VISUALIZATION

Non-graph based methods present data in a logical sequence containing facets, categories and subject descriptions, as in Haystack [13] and the mSpace platform [14]. In addition, the article The Pathetic Fallacy of RDF [15] by David and Schraefel argues that these non-graph based approaches perform better than some traditional graph based tools such as RDF-gravity [16] and IsaViz [17], since the graph becomes massive as the size of the RDF data grows. For instance, Figure 3 shows the FOAF (Friend of a Friend) Vocabulary Specification [18] as a table of data.

Figure 3: FOAF Vocabulary specification data table on Protege

Although non-graph based representation may have more merits than graph based representation in some respects, we still believe that the future lies in graph-based visualization, which gives users a more intuitive feel for the data. Moreover, graph-based methods give a clear representation of overall structure, interrelationships, patterns and trends [19]. Thus, the next section discusses the benefits and weaknesses of graph based visualization tools in more detail.

3.2 GRAPH BASED RDF VISUALIZATION

As this area has become a popular research topic in recent years, many tools have been developed for RDF visualization, such as WebVOWL [20], LodLive [21], and the JavaScript based VisualRDF [22]. The next few sections show how the FOAF ontology can be viewed with those three tools and explore whether the resulting graph is the best representation.

3.2.1 WEBVOWL: WEB-BASED VISUALIZATION OF ONTOLOGIES

WebVOWL is a standalone application for the user-oriented visualization of ontologies, which is based on web technologies and the D3 visualization library [20][23]. It uses a force-directed algorithm to lay out the data graph and implements the Visual Notation for OWL Ontologies (VOWL), which defines a visual language for ontology visualization [24]. VOWL contains graphical primitives and a color scheme that form the basic constructions, which are shown in Figure 4 (a) and (b) [24].

Figure 4: Graphical primitives and Color scheme for VOWL [25]

Graph Primitives: VOWL uses a list of symbols to represent ontology concepts. Circles represent classes, and labeled arrows represent property relations between different resources. The ontology has only two types of properties: datatype properties, whose values are usually literals, and object properties, whose values are URIs. Thus, the targets of object properties are still visualized as circles, while datatypes are depicted as rectangles.

Color Scheme: Many studies show that the colors chosen can make it easier for users to interact with different elements. For example, the color red is often used to attract attention and is therefore used to illustrate highlighted elements.

Based on the FOAF vocabulary specification, the visualization graph using the VOWL notation and a force-directed layout is presented in Figure 5.

Figure 5: Friend of a Friend (FOAF) visualization in WebVOWL

Advantage
- It clearly shows the overall structure of the FOAF ontology.
- The basic details, including information about FOAF, metadata and the graph statistics, can be found in the side bar.
- When any element of the graph is clicked, the corresponding information, such as its name, type, domain and range, is shown under Selection Details.

Disadvantage
- The graph becomes very complex, and useful information becomes hard to find, as the data grows larger.

3.2.2 RDF GRAPHS WITH LODLIVE

LodLive can parse RDF resources that are stored in a SPARQL endpoint and generate user-oriented graphs by applying a suitable navigation model to the data [26]. The tool uses a JavaScript application layer, without any application server, to browse a SPARQL endpoint. It retrieves the data of any configured endpoint in JSON format so that it can be parsed in JavaScript and visualized in an HTML5 web page [26]. LodLive comprises five components [26]:
- LodLive-core.js: jQuery plug-in
- LodLive-profile.js: JSON configuration map
- HTML5 page
- A few image sprites
- Some other public jQuery plug-ins

LodLive operations

First, choose a database and an endpoint, such as the FOAF class, to retrieve the URI and access the details of FOAF.

Figure 6: Single endpoint search panel

After the endpoint request, JSONP is called to generate a central circle representing the main class, surrounded by many small circles representing its object properties.

Figure 7: Central class with surrounding object properties

The object properties can be expanded by the user's interaction with the small circles to display more data, and each new resource is connected to the main class through an arrow representing the value of the given property.

Figure 8: Object Properties in FOAF expanded

Advantage
- Uses a dynamic visual graph to traverse RDF data through user interaction.
- Relations in the linked data are discovered step by step.
- Each resource has a corresponding description that shows its type and comments.
- Inverse relations between resources are shown with arrows going back and forth.

Disadvantage
- Does not visualize the whole FOAF ontology.
- Hard for casual users to understand, since each circle shows a URI rather than a label.
- The graph becomes complex and hard to visualize as more object properties are expanded.

3.2.3 VISUALRDF VISUAL REPRESENTATION OF RDF

VisualRDF was developed by Alangrafu in 2014 [27]. It uses the D3 JavaScript library [28] for data visualization and ARC2 [29] for parsing RDF.

Operations

This tool provides a model that is easy to operate and only requires users to type a URI.

Figure 9: Single URI access panel

The overall graph of the FOAF data is then generated automatically.

Figure 10: Overall FOAF visualization graph

A function panel is also provided to help users interact with the graph.

Figure 11: Function panel

The details of each node are displayed when the mouse is moved over it.

Figure 12: Node details graph

Advantage
- Easy for users to operate.
- Displays the basic structure of the linked data model automatically.

Disadvantage
- Many intersecting lines.
- The relations between classes are vague.
- The graph becomes disordered when dealing with large datasets.

DISCUSSION OF RELATED WORK

Our investigation of related work shows that most visualization tools focus on visualizing the whole ontology, and only a few provide a comprehensive and specific visualization model. All these tools try to present classes and properties in a clear way. However, some major deficiencies remain:
- The graph becomes hard to read as the data grows larger.
- Redundant properties are shown by arrows.
- There is no clear visualization of individuals.

Most tools implement the Visual Information-Seeking Mantra: "overview first, zoom and filter, then details on demand" [30]. They provide users with an overview of the whole ontology and then allow them to explore each node accordingly. To address the deficiencies above, we developed the Concept-Matching approach, which visualizes data as bubbles of different sizes centered around a main class, while the instances are shown as the user explores a bubble in depth. Moreover, to keep the visualization user-oriented, we also apply Compound-Fisheye Views [31] on the tree map to visualize large graphs when the RDF graph to be visualized contains a large number of triples. Another important point is that our approach mainly targets users who have little knowledge of ontologies and RDF, whereas most tools are developed for experts.

CHAPTER 4

4 METHODOLOGY

The purpose of our project is to overcome the shortcomings of the tools discussed in the previous chapter while also finding a new approach to visualizing RDF data. To achieve this, we carried out extensive research, especially on RDF data structure analysis and data visualization approaches. We finally arrived at the idea of using bubbles to represent different types of data and using the Concept-Matching method to restrict the size and content of the bubbles. In this chapter, we use the endpoint Canberra from the dbpedia database as an example resource to explore our approach. In the first subsection, we analyse the basic RDF data structure, and in the following sections we describe the process of building the visualization model. Figure 13 presents the high-level structure of the methodology in this project.

Figure 13: Structure of the Concept-Matching methodology

4.1 RDF DATA STRUCTURE ANALYSIS

This part explores the basic RDF statement and the RDF model, including objects that are either literals or resources, and illustrates how a SPARQL query can be used to find the necessary information.

The RDF Statement Triples

RDF stores data as triples: Subject, Property and Object. For example, a simple sentence The author of is Jan Egil Refsnes will have a triple relation as follows:

Subject (Resource):
Property (Predicate): Author
Object (either literal or resource): Jan Egil Refsnes

Table 1: Subject-Property-Object

We adopt another example generated from the resource Canberra in the dbpedia database, which is shown below in terms of some simple RDF statements:

Figure 14: Simple RDF Statement for Canberra
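As a rough illustration of the subject-property-object structure described here, the following Python sketch stores a few statements as records and groups them by subject. The identifier ex:Canberra and all object values are placeholders chosen for illustration (they are not the values stored in dbpedia); only the three property names are taken from this section.

```python
from collections import namedtuple

# One RDF statement: subject, property (predicate), object.
Triple = namedtuple("Triple", ["subject", "prop", "obj"])

# Placeholder statements; the object values are NOT real dbpedia data.
triples = [
    Triple("ex:Canberra", "dbpedia-owl:country", "ex:country-value"),
    Triple("ex:Canberra", "dbpedia-owl:populationtotal", "ex:population-value"),
    Triple("ex:Canberra", "dbpedia-owl:wikipageid", "ex:page-id-value"),
]


def to_graph(statements):
    """Group statements into a directed labeled graph:
    subject -> list of (property, object) edges."""
    graph = {}
    for t in statements:
        graph.setdefault(t.subject, []).append((t.prop, t.obj))
    return graph


graph = to_graph(triples)
for prop, obj in graph["ex:Canberra"]:
    print("ex:Canberra --%s--> %s" % (prop, obj))
```

Each subject becomes a node, and each (property, object) pair becomes a labeled outgoing edge, which is the shape a graph-based visualization would draw.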

Interpreting these RDF statements

Subject: Property: dbpedia-owl:country Refers to dbpedia-owl:populationtotal dbpedia-owl:wikipageid Refers to Refers to Object: dbpedia-owl:date Refers to September 2011

RDF Namespace URIs

Line 4, xmlns: rdf= , declares the standard W3C namespace, which indicates that the enclosing document is an RDF document tagged by rdf:RDF. Moreover, the namespace xmlns:dbpedia-owl identifies the elements with the dbpedia-owl prefix.

RDF Model

Since the data is stored as triples, the set of statements inside an RDF document can be viewed as a directed labeled graph. The resources, including subjects and objects, are represented by nodes, and all properties are represented by edges. Thus, the above RDF can be illustrated as in Figure 15:

Figure 15: Simple Canberra RDF model

The graph becomes very difficult to read when it is drawn with fully qualified URIs, so we adopt namespace prefixes in the labels of each node and edge to keep the visualization simple and clear.

SPARQL query

SPARQL, which stands for SPARQL Protocol and RDF Query Language, is the W3C recommended language for querying RDF [32]. SPARQL is similar to SQL: the SELECT clause chooses which variables should be returned, and the WHERE clause specifies the pattern to be matched against the queried dataset. For example, we can use the following query to return every person's name in the FOAF database.

Figure 16: Name Return Query on FOAF database

This query searches all the triples in the FOAF database and returns each person's name. The SELECT ?name clause requests the values of the variable ?name from the set of matches found by the WHERE clause. The statements inside WHERE are also written as triples; for example, ?person foaf:name ?name matches all persons who have names, and in the statement ?person a foaf:Person, a is shorthand for the type predicate.

4.2 RETRIEVE CONCEPT-MATCH INFORMATION

In our Concept-Matching visualization approach, we only show the important concepts related to the resource as bubbles around the central class, and we use the number of instances that a concept has to decide the size of its bubble. Thus, in order to retrieve the concepts most relevant to the resource, we need to retrieve the instance count for each concept and the relations between each concept and its sub concepts. We chose the endpoint Canberra as the resource from the dbpedia database to retrieve its instance counts and concept relations. To accomplish this, we used the Virtuoso SPARQL Query Editor [33] to query the dbpedia database. To get the properties and their counts attached to a type, exported to the file InstanceCountPerType.csv, we wrote the SPARQL query shown in Figure 17:

Figure 17: SPARQL query for Instance Count Per Type of Canberra
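The kind of counting that feeds InstanceCountPerType.csv can be approximated in memory. The Python sketch below is not the query from Figure 17; it assumes that an instance is any subject linked to the resource by some property, and that its concepts are the values of its rdf:type statements. All identifiers (ex:Canberra, ex:p1, ex:birthPlace and so on) are hypothetical.

```python
from collections import Counter

RESOURCE = "ex:Canberra"  # placeholder identifier for the central resource

# Hypothetical (subject, property, object) triples; not real dbpedia data.
triples = [
    ("ex:p1", "ex:birthPlace", RESOURCE),
    ("ex:p2", "ex:birthPlace", RESOURCE),
    ("ex:p2", "ex:deathPlace", RESOURCE),
    ("ex:p1", "rdf:type", "ex:Person"),
    ("ex:p1", "rdf:type", "ex:Agent"),
    ("ex:p2", "rdf:type", "ex:Person"),
]


def instance_count_per_type(statements, resource):
    """Count, for each (concept, property) pair, the distinct instances
    that are linked to `resource` by that property."""
    # First pass: collect the concepts (types) of every subject.
    types = {}
    for s, p, o in statements:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    # Second pass: count each (concept, property, instance) only once.
    seen = set()
    counts = Counter()
    for s, p, o in statements:
        if o == resource and p != "rdf:type":
            for concept in types.get(s, ()):
                key = (concept, p, s)
                if key not in seen:
                    seen.add(key)
                    counts[(concept, p)] += 1
    return counts


counts = instance_count_per_type(triples, RESOURCE)
for (concept, prop), n in sorted(counts.items()):
    print(concept, prop, n)
```

Each resulting row has the (concept, property, number of instances) shape that the later ranking steps work on.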

To retrieve the type and subtype relations among the concepts/types related to Canberra in dbpedia, we wrote a SPARQL query that exports them to the file Concept-Subconcept.csv, which is shown as follows:

Figure 18: SPARQL query for concept relations of Canberra

4.3 CONSTRUCT MAPPING MODEL

In this section, we describe the program we built to scan both the InstanceCountPerType.csv dataset (a set of triples) and the Concept-Subconcept.csv dataset (a set of concept relations) in order to obtain the concepts that are most relevant to the resource Canberra, from which the different layers of the Canberra visualization are drawn. To clarify what the most relevant concept to the resource Canberra is, we use a simple example:

Figure 19: Canberra-ANU demo

As the above graph shows, Canberra has the school ANU, and ANU has the concept University, but University is a sub concept of Organization. Thus, the most relevant concept to Canberra is University. The next section explains how to retrieve the most relevant concepts from the two Canberra datasets. The data in the two datasets looks like graph 1 and graph 2 in Appendix A. We separate this process into two stages.

4.3.1 BUILD BASE LAYER

Constructing the base layer requires recursive iteration through the dataset. An overview of the process model is shown in the following figure:

Figure 20: Process model of building base layer

Process 1: Filter process

When we analyzed the set of triples, we found that many concepts have URIs that come not only from the dbpedia database but also from other sources. Since we deal with dbpedia resources, we filtered out all concepts whose URI does not start with the dbpedia prefix. After this process, we have a new dataset that contains only concepts whose URI starts with that prefix.

Process 2: Ranking process (deals with the InstanceCountPerType.csv dataset)

Situation 1: ranking by the number of instances

Firstly, we rank the concepts by number of instances from largest to smallest, as shown in Figure 21:

Figure 21: First 10 lines of data in InstanceCountPerType.csv dataset

If two concepts have the same number of instances and the same property, they are held back until their relation has been checked. For instance, the concepts Agent and Person have the same number of instances, 186, and the same property, birthplace. Thus, the relation between these two concepts is checked, and the program returns the concept that is the sub concept of the other.

Situation 2: ranking by property

If several concepts have the same number of instances but different properties, as shown in Figure 22, we rank them by property.

Figure 22: Concepts with instance count 92

Thus, there are two pairs of concepts (Pair 1: PhysicalEntity and CausalAgent; Pair 2: Person and Agent) whose relations need to be checked, since the concepts in each pair have the same number of instances and the same property.

Situation 3: more than two concepts with the same number of instances and property

If more than two concepts have the same number of instances and the same property, the relation of each pair of concepts needs to be checked.

Figure 23: Concepts with instance count 145

Figure 23 shows that many concepts have the same instance count and property, so each concept needs to be compared with the others. Finally, the program returns any one concept that is a sub concept but not a super concept. For instance, given A->B, C->D, E->F, B->C (where -> stands for "is sub concept of"), it returns either A or E.

Process 3: Check concept relations and return the most relevant concept

For each pair of concepts, we scan the concept-relations dataset to check their relation. For instance, in Situation 1 above, the concepts Person and Agent, which have the same number of instances and the same property, are waiting to have their relation checked. Searching the concept-relations dataset identifies which of the two is the sub concept of the other, and the program returns that sub concept.

Base Layer demo

After applying the steps described above, we obtain a list of triples (concept, property, number of instances). We add together the numbers of instances of rows that share the same concept and keep a record of their properties. For example, the property birthplace has 186 instances for the concept Person, and the property deathplace has 124 instances for the concept Person. In this way, we can calculate that the concept Person has the largest number of instances, which is shown by the largest bubble, and we still keep a record of the relations. The demo of the base layer will look like:
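The selection rule of Situation 3 and the aggregation of the base layer can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used in the project: the function names most_specific and aggregate are hypothetical, relations are given as direct (sub concept, super concept) pairs, and the sketch returns all candidates rather than picking one.

```python
from collections import defaultdict


def most_specific(group, sub_of):
    """Return the concepts in `group` that are a sub concept of another
    group member but not a super concept of any group member.
    `sub_of` is a set of direct (sub_concept, super_concept) pairs."""
    members = set(group)
    pairs = {(a, b) for a, b in sub_of if a in members and b in members}
    subs = {a for a, _ in pairs}
    supers = {b for _, b in pairs}
    return subs - supers


def aggregate(rows):
    """Sum instance counts per concept and keep the per-property counts.
    `rows` holds (concept, property, count) triples."""
    totals = defaultdict(int)
    by_property = defaultdict(dict)
    for concept, prop, count in rows:
        totals[concept] += count
        by_property[concept][prop] = count
    return dict(totals), dict(by_property)


# The example from the text: A->B, C->D, E->F, B->C.
relations = {("A", "B"), ("C", "D"), ("E", "F"), ("B", "C")}
print(sorted(most_specific("ABCDEF", relations)))  # ['A', 'E']

# The Person counts quoted in the text.
rows = [("Person", "birthplace", 186), ("Person", "deathplace", 124)]
totals, by_property = aggregate(rows)
print(totals["Person"])  # 310
```

The total per concept would drive the bubble size, while the per-property counts are kept for the later display of properties and instances.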

Figure 24: Base layer demo

The concepts that remain after filtering are the most relevant and important to Canberra and are arranged by how many instances they have. The black dots between the concept labels Organization and Dom mean that many bubbles are omitted in this demo. At the base layer, if a concept can still be expanded, we chose not to show its property arrows or instances; these are shown only when a concept cannot be expanded any further. This is discussed further in the next section.

4.3.2 BUILD HIGHER LAYERS

As mentioned above, if a bubble around Canberra in the base layer can be expanded further, it has to go through the process of building its higher layers. To illustrate this, we expand the concept Person in the following. When a user clicks the Person bubble, its sub concepts are shown as bubbles of various sizes around it. We built a recursive method to expand the higher layers, as described below.

Step 1: find sub concepts

We use the program to search the concept relations for all concepts that are sub concepts of Person, which are shown in Table 2 below.

SubConcept Concept

Table 2: SubConcepts of Concept Person

Step 2: find the number of instances

We now go back to the dataset InstanceCountPerType.csv to look up how many instances those sub concepts have, in order to decide the size of the bubbles around the source Person; at the same time, the properties and relations are recorded for the visualization of instances. Then, we run the program to get the triple relations that are shown in Appendix B. The total number of instances for each concept is:

Concept/Type Total No of Instance

Table 3: Total number of instances for sub concepts of concept Person

Step 3: visualize Person

After retrieving this data, the higher layer for the concept label Person is generated based on the number of instances, which looks like:

Figure 25: Higher layer of Person demo

Recursive step

After reaching the second level of the source Canberra, we check whether the sub concepts of Person can be expanded further. If any concept represented by a bubble can be expanded to the next level, the above process is repeated to determine which concepts are involved, and pointers are used to record the properties and instances. The instances and properties are shown only once a concept has no more sub concepts.

Figure 26: Higher level of concept Artist
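The recursive expansion described in this section can be sketched in Python as below. The function name build_layer, the dictionary layout, the concept Athlete and all counts are hypothetical and only serve the illustration; the sketch also assumes that the concept hierarchy contains no cycles.

```python
def build_layer(concept, sub_concepts, totals):
    """Recursively describe the bubbles shown when `concept` is expanded.
    A concept without sub concepts is a leaf, where properties and
    instances would be drawn instead of further bubbles."""
    children = sub_concepts.get(concept, [])
    # Larger instance counts first, since they get larger bubbles.
    ordered = sorted(children, key=lambda c: totals.get(c, 0), reverse=True)
    return {
        "concept": concept,
        "size": totals.get(concept, 0),
        "leaf": not children,
        "bubbles": [build_layer(c, sub_concepts, totals) for c in ordered],
    }


# Hypothetical hierarchy and counts, for illustration only.
sub_concepts = {"Person": ["Artist", "Athlete"], "Artist": ["Writer"]}
totals = {"Person": 310, "Artist": 40, "Athlete": 90, "Writer": 12}

layer = build_layer("Person", sub_concepts, totals)
print([b["concept"] for b in layer["bubbles"]])  # ['Athlete', 'Artist']
```

In an interactive tool the recursion would more likely run one level at a time, on each click, rather than building the whole tree up front.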

For instance, Figure 26 illustrates the next level of the concept Artist, where we found that the concept Writer does not have further sub concepts. Thus, when a user clicks on Writer, no more sub concepts are shown around the Writer bubble; instead, the properties and instances that have the concept Writer are shown by arrows and rectangles. The instances are retrieved from the endpoint Canberra in the dbpedia database. The graph looks like:

Figure 27: The instances with concept Writer

The Place of Death relation means that Bryce Courtenay, who was a Writer, died in Canberra. We use an asterisk to indicate that more than one instance connects with Canberra, and we show the instance directly if only one instance exists. This method is briefly explained in [12]. Since we use Compound-Fisheye Views [31], the other bubbles become far smaller than the one the user is focusing on.

4.3.3 LARGE GRAPH LAYOUT - FIVE POINT SCALE

When the data becomes large, for example when there is a vast number of concepts, we use a simple ranking method to restrict the graph layout.

1) According to the first letter of each concept's label, we separate the concepts into five different bubbles, as in the following figure:

Figure 28: Base layer by character

2) When a user clicks the bubble with the label A-E, the labels of the concepts that start with the letters A to E are shown.

Figure 29: A-E graph expanded

3) If there are still many concepts in the bubble A (such as 20), we compare the second letter of the concepts and separate them into another five bubbles. Figure 30 shows the result when a user clicks the bubble A:

Figure 30: A graph expanded
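The three steps above can be sketched in Python as follows. Only the range A-E is named in the text; the other four ranges (F-J, K-O, P-T, U-Z), the function name bucket_by_letter and the sample labels are assumptions made for this illustration, and labels are assumed to consist of ASCII letters.

```python
# Assumed split of the alphabet; only "A-E" appears in the text.
RANGES = ["A-E", "F-J", "K-O", "P-T", "U-Z"]


def bucket_by_letter(labels, position=0):
    """Group labels into five alphabetical ranges by the letter found
    at `position` (0 = first letter, 1 = second letter, ...)."""
    buckets = {r: [] for r in RANGES}
    for label in labels:
        # Labels too short for `position` fall into the first range.
        letter = label[position].upper() if len(label) > position else "A"
        for r in RANGES:
            if r[0] <= letter <= r[2]:
                buckets[r].append(label)
                break
    return buckets


# Hypothetical concept labels.
labels = ["Agent", "Artist", "Building", "Organisation", "Person", "Writer"]

first = bucket_by_letter(labels)
print(first["A-E"])  # ['Agent', 'Artist', 'Building']

# Step 3: split the concepts starting with A by their second letter.
starts_with_a = [l for l in first["A-E"] if l.startswith("A")]
second = bucket_by_letter(starts_with_a, position=1)
print(second["F-J"], second["P-T"])  # ['Agent'] ['Artist']
```

Applying the same function again with a larger position value gives the deeper levels, so the layout never shows more than five bubbles per step.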

This graph layout method, combined with the Compound-Fisheye view technique, works properly for visualizing large datasets.

4.4 ALGORITHM AND TIME COMPLEXITY

Since we are dealing with a huge RDF dataset, an effective algorithm has to be designed in order to reduce the time complexity of finding the data relations. To implement the approach illustrated above on the real RDF dataset Canberra, we first considered a list structure inside a hash map: we used the number of instances as the hash key and lists of concepts as the hash values, and then traversed the Concept-Subconcept dataset for each list to find relations. However, when run on the real data, this approach consumed a great deal of time and took very long to produce the result. Therefore, we redesigned a completely different algorithm, which is explained in detail in the next part.

Algorithm

We tried different ways to reduce the time complexity. Finally, by comparing the efficiency of different algorithms, we designed an appropriate algorithm that has the following steps:
- Construct directed graphs from the Concept-Subconcept relations (the concept-relation graph).
- Considering both time and space complexity, use quick sort to sort the instance data.
- Construct a breadth-first search (BFS) algorithm that searches the graph to find the required concept.

The pseudo code is shown below.

Pseudo-code for ranking instance data:

    RANKING(Relation_DATA, Instance_DATA, p, r)
        if p < r
            then q ← PARTITION(Relation_DATA, Instance_DATA, p, r)
                 RANKING(Relation_DATA, Instance_DATA, p, q-1)
                 RANKING(Relation_DATA, Instance_DATA, q+1, r)

RANKING() modifies quick sort to rank the given data in a dataset of triples. We modified the partition-exchange step of quick sort to get:

Pseudo-code for partition-exchange instance data:

    PARTITION(Relation_DATA, Instance_DATA, p, r)
        x ← Instance_DATA[r]
        i ← p - 1
        for j ← p to r - 1
            do remove ← 0
               if ISGREATER(Relation_DATA, Instance_DATA[j], x, remove)
                   then i ← i + 1
                        exchange Instance_DATA[i] <-> Instance_DATA[j]
               if remove = 2
                   then add Instance_DATA[j] to removelist
                   else if remove = 1
                       then add x to removelist
        remove ← 0
        ISGREATER(Relation_DATA, x, Instance_DATA[i+1], remove)
        exchange Instance_DATA[i+1] <-> Instance_DATA[r]
        if remove = 2
            then add Instance_DATA[i+1] to removelist
        return i + 1

The comparison proceeds in three steps:
- Firstly, compare the numbers of instances. Return true if data1.no > data2.no, return false if it is less, and move to the next step if they are equal.
- Secondly, compare the properties when the numbers of instances are the same. Return false if the properties differ, and go to the next step if they are the same.
- Lastly, compare the relation of the two concepts if they have the same property and the same number of instances. BFS (breadth-first search) is called in this step to search the concept-relation graph.

Pseudo-code for instance and property comparison:

    ISGREATER(Relation_DATA, data1, data2, remove)
        if data1.noinstance > data2.noinstance           ## compare number of instances
            then return true
        else if data1.noinstance < data2.noinstance
            then return false
        else if data1.property != data2.property         ## compare property if numbers of instances are the same
            then return false
        else if BFS(Relation_DATA, data1.concp, data2.concp)   ## compare concepts if properties are the same
            then remove ← 2
                 return true
        else return false

We modified the breadth-first search algorithm to search the concept graph in order to get the required concept.

Pseudo-code for searching concept-relation graph:

BFS(Relation_DATA, data1, data2)
    for each vertex u ∈ V[Relation_DATA] - {data1}
        do color[u] ← WHITE
    color[data1] ← GRAY
    Q ← Ø
    ENQUEUE(Q, data1)
    while Q ≠ Ø
        do u ← DEQUEUE(Q)
           for each v ∈ Adj[u]
               do if color[v] = WHITE
                      then color[v] ← GRAY
                           if v = data2
                               then return true
                           ENQUEUE(Q, v)
           color[u] ← BLACK
    return false

Complexity Analysis

Building the directed graph from the Concept-Subconcept dataset costs O(n), where n is the number of lines, since we read this dataset line by line. For each line, an edge is created between two nodes; a node is added before the edge if the vertex does not yet exist.

To rank the items in the InstanceCountPerType dataset by number of instances and property, we chose Quicksort, considering both time and space complexity. Quicksort takes O(n log n) time on average and O(n^2) in the worst case. Merge sort and Heapsort guarantee O(n log n) time even in the worst case. Merge sort, however, needs O(n) auxiliary space for its temporary array in addition to O(log n) stack space, whereas Quicksort sorts in place and needs only O(log n) stack space on average (O(n) in the worst case unless the smaller partition is always processed first). Heapsort also sorts in place, with O(1) auxiliary space, so the space argument applies to Merge sort rather than to Heapsort. Thus, compared with Merge sort, Quicksort saves a considerable amount of space, especially on large datasets.

During Quicksort, each comparison that reaches the concept level also calls Breadth-first search to find the relation between the concepts. BFS requires O(|V| + |E|) time in the worst case, where V is the set of vertices and E is the set of edges. In the Concept-Subconcept dataset, V is the set of concepts and E is the set of concept relations.

Therefore, the total time complexity of the algorithm that retrieves the required data is O((|V| + |E|) * n log n), where n is the number of triples.
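The ranking procedure of this section can be illustrated with a small Python sketch. It is a minimal sketch under stated assumptions, not the program used in the project: the names `Item`, `is_subconcept`, `is_greater`, `partition` and `rank` are hypothetical, the property value "p" is a placeholder, a `seen` set replaces the WHITE/GRAY/BLACK colours, and the removelist bookkeeping is left out.

```python
from collections import deque
from typing import Dict, List, NamedTuple, Optional


class Item(NamedTuple):
    concept: str  # concept/type name (hypothetical field names)
    prop: str     # property name
    count: int    # number of instances


def is_subconcept(graph: Dict[str, List[str]], sub: str, sup: str) -> bool:
    """BFS along subconcept -> concept edges; True if sup is reached from sub."""
    seen = {sub}  # stands in for the vertex colours of the pseudo-code
    queue = deque([sub])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in seen:
                if v == sup:
                    return True
                seen.add(v)
                queue.append(v)
    return False


def is_greater(graph: Dict[str, List[str]], a: Item, b: Item) -> bool:
    """Three-stage comparison: instance count, then property, then concept relation."""
    if a.count != b.count:
        return a.count > b.count
    if a.prop != b.prop:
        return False
    return is_subconcept(graph, a.concept, b.concept)


def partition(graph: Dict[str, List[str]], items: List[Item], p: int, r: int) -> int:
    """Partition-exchange step with items[r] as pivot; 'greater' items move left."""
    x = items[r]
    i = p - 1
    for j in range(p, r):
        if is_greater(graph, items[j], x):
            i += 1
            items[i], items[j] = items[j], items[i]
    items[i + 1], items[r] = items[r], items[i + 1]
    return i + 1


def rank(graph: Dict[str, List[str]], items: List[Item],
         p: int = 0, r: Optional[int] = None) -> None:
    """Rank items in place with the modified Quicksort."""
    if r is None:
        r = len(items) - 1
    if p < r:
        q = partition(graph, items, p, r)
        rank(graph, items, p, q - 1)
        rank(graph, items, q + 1, r)


# Edges of the example Concept-Subconcept table of Figure 31 (subconcept -> concept).
GRAPH = {"A": ["B"], "B": ["C", "D"], "D": ["F"],
         "C": ["F", "E"], "E": ["G"], "F": ["G"]}
items = [Item("G", "p", 5), Item("C", "p", 5), Item("A", "p", 9)]
rank(GRAPH, items)
print([it.concept for it in items])  # ['A', 'C', 'G']
```

In this toy input, A comes first because it has the most instances, and C precedes G because both have the same count and property and C is a subconcept of G.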

CHAPTER 5

5 EVALUATION

When we used real data to test this visualization approach, we found that algorithm efficiency is the most difficult problem to overcome when dealing with a large dataset. To test the usability of the algorithm described above, we designed two controlled experiments. The next few sections explain the details of the experiments, analyse their results and illustrate ways to optimize the algorithm.

5.1 EXPERIMENT ENVIRONMENT AND STEPS

First, we take a brief look at what the graph of Concept-Subconcept relations looks like.

Example:

Subconcept:  A  B  B  D  C  C  E  F
Concept:     B  C  D  F  F  E  G  G

Figure 31: Directed graph

When the number of concepts and subconcepts becomes large, the relations become very complex, and the time needed to traverse the graph with BFS grows accordingly. We designed two controlled experiments and used experimental datasets to measure the time consumption when increasing the number of concept relations in Concept-Subconcept and the number of triples in InstanceCountPerType, respectively.
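As an illustration, the example Concept-Subconcept pairs of Figure 31 can be loaded into a directed adjacency-list graph as follows. This is a sketch only; `PAIRS` and `build_graph` are hypothetical names, and the edge direction (subconcept to concept) is an assumption taken from the way the example is used later in the chapter.

```python
from typing import Dict, Iterable, List, Tuple

# (subconcept, concept) pairs read column by column from the example table.
PAIRS: List[Tuple[str, str]] = [
    ("A", "B"), ("B", "C"), ("B", "D"), ("D", "F"),
    ("C", "F"), ("C", "E"), ("E", "G"), ("F", "G"),
]


def build_graph(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Create one edge per line; vertices are added before the edge if missing."""
    graph: Dict[str, List[str]] = {}
    for sub, sup in pairs:
        graph.setdefault(sub, [])
        graph.setdefault(sup, [])
        if sup not in graph[sub]:
            graph[sub].append(sup)
    return graph


graph = build_graph(PAIRS)
for vertex in sorted(graph):
    print(vertex, "->", graph[vertex])
```

Vertex G ends up with an empty adjacency list, which corresponds to a concept without any superconcept.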

5.1.1 EXPERIMENT 1

We kept the number of triples (concept/type, property and number of instances) in the InstanceCountPerType dataset constant at 5000, while continuously increasing the number of data items (lines) in the Concept-Subconcept dataset, adding 100 lines each time from 200 up to [...]. We measured the time cost as the number of relations increased, using the following steps:

1. Build 5000 triples using a recursive function; a sample is shown in Appendix C. Generate different random relations among the concepts from those triples, as shown in Appendix B.
2. For the first trial, generate 200 relations.
3. Run the program on these two datasets and record the time it takes.
4. Keep the number of triples fixed, increase the number of relations by 200 and record the time consumed.

5.1.2 EXPERIMENT 2

We reused the datasets generated in Experiment 1 as follows:

• Keep the number of data items (relations/lines) in the Concept-Subconcept dataset constant at 1000 distinct lines.
• Increase the number of triples in the InstanceCountPerType dataset in steps of 500, from 500 to [...].
• Record the time cost at each point (500, 1000, 1500, ...).

5.2 TESTING ENVIRONMENT

The experiments were run as a Java program on a Mac system with the following specifications.

Hardware / Software        Information
Eclipse Standard           Version 1.0
Java SE Development Kit    Jdk1.7.0_51
OS X Yosemite              Version [...]
Processor                  2.4 GHz Intel Core i5
Memory                     4GB 1333 MHz DDR3
Graphics                   Intel HD Graphics [...] MB

Table 4: Testing environment
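The measurement loop behind Experiments 1 and 2 can be sketched as below. Everything in it is an assumption made for illustration: the helper names `random_relations` and `sweep`, the concept names, the sizes and the seed are hypothetical, and `sorted` is only a placeholder workload standing in for the ranking program, which was written in Java.

```python
import random
import time
from typing import Callable, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def random_relations(concepts: Sequence[str], m: int, rng: random.Random) -> List[Pair]:
    """Draw m distinct (subconcept, concept) pairs between different concepts."""
    if m > len(concepts) * (len(concepts) - 1):
        raise ValueError("not enough concepts for m distinct pairs")
    pairs = set()
    while len(pairs) < m:
        sub, sup = rng.sample(list(concepts), 2)
        pairs.add((sub, sup))
    return sorted(pairs)


def sweep(sizes: Iterable[int],
          make_input: Callable[[int], list],
          workload: Callable[[list], object]) -> List[Tuple[int, float]]:
    """Time the workload once per input size; returns (size, seconds) pairs."""
    results = []
    for size in sizes:
        data = make_input(size)  # input generation is not timed
        start = time.perf_counter()
        workload(data)
        results.append((size, time.perf_counter() - start))
    return results


rng = random.Random(0)                   # fixed seed so the inputs are repeatable
concepts = [f"c{i}" for i in range(60)]  # hypothetical concept names
timings = sweep(range(200, 1001, 200),
                lambda m: random_relations(concepts, m, rng),
                sorted)                  # placeholder workload
for size, seconds in timings:
    print(size, f"{seconds:.6f}")
```

Keeping input generation outside the timed region means that only the workload itself contributes to each recorded point.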

5.3 EXPERIMENT RESULTS AND ANALYSIS

In Experiment 1, it takes a long time to produce the final results when the number of relations becomes very large. Compared with Experiment 1, producing the final results in Experiment 2 takes less time even as the data increases. The results are shown in the following figures:

Figure 32: Result of experiment 1

Figure 33: Result of experiment 2

The results of the two experiments show that the time cost in Experiment 1 increases faster than in Experiment 2 as the amount of data grows. In Experiment 1, as the number of data items (relations/lines) increases, the time cost grows faster than linearly, with an increasingly steep curve. In Experiment 2, as the number of triples (concept, property and number of instances) increases, the time cost grows roughly linearly.

When we test our algorithm with a real dataset, Canberra (5760 triples and 1674 lines of concept relations), the result is as shown in Figure 34.

Figure 34: Time cost for running Canberra

The time consumed on the real dataset matches the time cost observed in the experiments.

Although a few data points deviate, which may be due to CPU performance, the trend of the results still agrees with the time complexity of our algorithm, O((|V| + |E|) * n log n), where V is the set of vertices, E is the set of edges and n is the number of triples. We can evaluate this expression for each experiment. In the worst case, the graph has |E| = |V|(|V| - 1)/2 edges.

For Experiment 1, n is a constant C_1:

\[
O\big((|V| + |E|)\, n \log n\big)
= O\Big(\Big(|V| + \frac{|V|(|V| - 1)}{2}\Big)\, C_1 \log C_1\Big)
= O\Big(\frac{1}{2}\big(|V|^2 + |V|\big)\, C_1 \log C_1\Big)
= O(M_1 \log C_1)
\]

where \( M_1 = \frac{1}{2}\big(|V|^2 + |V|\big)\, C_1 \). The time complexity is therefore a quadratic function of |V|.

For Experiment 2, the number of relations is constant, so |V| + |E| = C_2:

\[
O\big((|V| + |E|)\, n \log n\big) = O(C_2\, n \log n)
\]

The time complexity is therefore close to linear in n.

Thus, the measured running times agree with the time complexity of our algorithm.

Special situation

However, the time consumption is still high, because colouring vertices in BFS costs the most time when searching directed graphs. Colouring vertices in BFS is necessary when the directed graph contains cycles, such as the cycle B-C-F-D-B in Figure 35:

Figure 35: Cycle in the directed graph

If the network contains no relation in which one object is a subset of another object and at the same time a superset of it, that is, no cycle, the vertex-colouring step can be omitted. The time cost then becomes:

Figure 36: Time cost without using colouring

To design a robust approach, we considered all possible relations, including cycles. We therefore pruned the algorithm to reduce how often BFS is called and so improve its efficiency. The next section describes how pruning is used to optimize our algorithm.

5.4 OPTIMIZATION

We used pruning to reduce the number of search steps, in particular the number of BFS calls. There are three main steps:

1. Create another graph, called the no relation graph, to hold sets of unrelated vertices.
2. Update the relation graph.
3. Skip the search when the concept has no adjacent vertex.

Step 1

Each time BFS traverses the directed graph to compare a pair of concepts, we record in a second graph the concept as a vertex and, as its adjacent vertices, all other concepts that have no relation with it. Before calling BFS to check whether two concepts are related, we check whether the pair is already in the no relation graph. If it appears there as a vertex and an edge, BFS need not be called, and we can return that there is no relation between the two concepts.

For example, when we use BFS to search the graph (Figure 31) to check whether C is a subconcept of B, it has to traverse every vertex reachable from C before finally returning False, and it generates a no relation graph for concept C such as Figure 37.

Figure 37: No relations to concept C

When checking whether C is a subconcept of A, the program does not need to search the graph with BFS; instead, it checks the no relation graph first, finds that A is an adjacent vertex of C there, and returns False. This reduces the number of BFS calls.

Step 2

When comparing two concepts (A and B) by calling BFS, the search checks whether concept A has a superconcept at a distance greater than 1 from A. If it does, the original graph is updated by adding that superconcept as an adjacent vertex of A. For example, in Figure 31, when we check whether C is a subconcept of B, BFS searches the directed graph and updates the original graph to:

Figure 38: Updated graph for vertex C

This saves BFS path searches later, for instance when comparing the relation between C and G.

Step 3

When checking whether concept 1 is a subconcept of concept 2, the program first checks whether concept 1 has any adjacent vertex in the directed concept-relation graph. If concept 1 has no adjacent vertex, it has no superconcept, so the program returns False directly without calling BFS. For example (based on Figure 31), if the program needs to check whether concept G is a subconcept of F, it returns False directly without calling BFS, because vertex G has no adjacent vertex in the directed graph, which means that G has no superconcept.

These three steps are the most important parts of the pruning method, whose core strategy is to reduce the number of BFS calls. Since BFS consumes a large amount of time, calling it less often contributes substantially to reducing the running time. We applied these pruning steps to optimize our algorithm. The details of the updated algorithm and the result of testing the Canberra dataset are given in Appendix D.
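The three pruning steps can be illustrated with the following Python sketch. It is not the updated algorithm of Appendix D: the class `PrunedReachability` and its members are hypothetical names, and as a simplifying assumption the sketch always runs a complete BFS from the queried concept, so that one traversal fills both the no relation record (Step 1) and the shortcut edges (Step 2).

```python
from collections import deque
from typing import Dict, List, Set


class PrunedReachability:
    """Answers 'is sub a subconcept of sup?' while avoiding repeated BFS calls."""

    def __init__(self, graph: Dict[str, List[str]]) -> None:
        # Copy the adjacency lists so shortcut edges do not alter the input.
        self.graph: Dict[str, List[str]] = {u: list(vs) for u, vs in graph.items()}
        for targets in graph.values():
            for v in targets:
                self.graph.setdefault(v, [])
        self.no_relation: Dict[str, Set[str]] = {}  # Step 1: known non-superconcepts
        self.bfs_calls = 0                           # counter for illustration only

    def _superconcepts(self, start: str) -> List[str]:
        """Full BFS from start; returns every reachable vertex in visiting order."""
        self.bfs_calls += 1
        seen = {start}
        order: List[str] = []
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in self.graph[u]:
                if v not in seen:
                    seen.add(v)
                    order.append(v)
                    queue.append(v)
        return order

    def is_subconcept(self, sub: str, sup: str) -> bool:
        adjacent = self.graph.get(sub, [])
        if not adjacent:                              # Step 3: no superconcept at all
            return False
        if sup in adjacent:                           # direct edge or Step 2 shortcut
            return True
        if sup in self.no_relation.get(sub, set()):   # Step 1: cached negative answer
            return False
        reached = self._superconcepts(sub)
        for v in reached:                             # Step 2: add shortcut edges
            if v not in adjacent:
                adjacent.append(v)
        self.no_relation[sub] = set(self.graph) - set(reached) - {sub}
        return sup in reached


# Example graph of Figure 31 (subconcept -> concept).
pruned = PrunedReachability({"A": ["B"], "B": ["C", "D"], "D": ["F"],
                             "C": ["F", "E"], "E": ["G"], "F": ["G"]})
print(pruned.is_subconcept("C", "B"), pruned.bfs_calls)  # False 1
print(pruned.is_subconcept("C", "A"), pruned.bfs_calls)  # False 1
print(pruned.is_subconcept("C", "G"), pruned.bfs_calls)  # True 1
print(pruned.is_subconcept("G", "F"), pruned.bfs_calls)  # False 1
```

Only the first query triggers a BFS; the other three are answered from the no relation record, the shortcut edge from C to G, and the empty adjacency list of G, respectively.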

CHAPTER 6

6 CONCLUSION AND FUTURE WORK

6.1 CONCLUSION

This project, generating visualizations from RDF graphs, explored a method to visualize RDF graphs that contain schema and data in particular. We started from scratch and carried out extensive research on RDF data structures and data visualization. Most previous work on RDF visualization shares the same major defect: the graph becomes disordered and hard to read as the size of the data grows. To overcome this shortcoming, we developed a new approach, Concept-Matching, that uses bubbles to represent RDF data and uses the importance of the data to decide the size and position of the bubbles.

In our approach, one of the most difficult tasks was to implement an efficient algorithm to retrieve the data, since RDF datasets are usually very large. We combined a graph layout algorithm, Quicksort and Breadth-first search to improve the efficiency of data retrieval. From our experiments, we found:

• Experiment 1: When the number of triples stays the same, the time cost grows faster than linearly as the number of concept relations increases. In this situation, the algorithm is suitable only for small datasets and does not work well for large ones.
• Experiment 2: When the number of concept relations stays the same, the time cost grows roughly linearly as the number of triples increases. In this situation, the algorithm works properly for both small and large datasets.

We still found the time cost of retrieving data to be quite high, so we applied pruning to reduce the number of BFS calls, which reduced the time consumption.

In conclusion, although the algorithm is not as fast as we expected, the new Concept-Matching approach can still be a good way to visualize large RDF datasets clearly.

6.2 FUTURE WORK

Although the experiments show that the method works properly, we still need to survey a range of users through HCI experiments. In future work, we would first like to design Human-Computer Interaction experiments to test the usability of the Concept-Matching approach and the effectiveness of the graph layout, including the Five Point Scale approach. We can focus mainly on casual users and gather more data on their experience of using this method to visualize RDF data. Moreover, if we ignore cyclic relations, we do not need to colour vertices while calling BFS. In our tests, this reduced the running time to less than a few seconds. With this in mind, we would like to design specific experiments to determine what kind of data requires colouring and what kind of data can ignore this relation. Finally, we would like to develop a visualization tool that implements this approach.


Handling the Complexity of RDF Data: Combining List and Graph Visualization

Handling the Complexity of RDF Data: Combining List and Graph Visualization Handling the Complexity of RDF Data: Combining List and Graph Visualization Philipp Heim and Jürgen Ziegler (University of Duisburg-Essen, Germany philipp.heim, juergen.ziegler@uni-due.de) Abstract: An

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Visualizing an OrientDB Graph Database with KeyLines

Visualizing an OrientDB Graph Database with KeyLines Visualizing an OrientDB Graph Database with KeyLines Visualizing an OrientDB Graph Database with KeyLines 1! Introduction 2! What is a graph database? 2! What is OrientDB? 2! Why visualize OrientDB? 3!

More information

Visualizing a Neo4j Graph Database with KeyLines

Visualizing a Neo4j Graph Database with KeyLines Visualizing a Neo4j Graph Database with KeyLines Introduction 2! What is a graph database? 2! What is Neo4j? 2! Why visualize Neo4j? 3! Visualization Architecture 4! Benefits of the KeyLines/Neo4j architecture

More information

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS

KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,

More information

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu

ABSTRACT 1. INTRODUCTION. Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Kamil Bajda-Pawlikowski kbajda@cs.yale.edu Querying RDF data stored in DBMS: SPARQL to SQL Conversion Yale University technical report #1409 ABSTRACT This paper discusses the design and implementation

More information

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo

Semantic Modeling with RDF. DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo DBTech ExtWorkshop on Database Modeling and Semantic Modeling Lili Aunimo Expected Outcomes You will learn: Basic concepts related to ontologies Semantic model Semantic web Basic features of RDF and RDF

More information

DISCOVERING RESUME INFORMATION USING LINKED DATA

DISCOVERING RESUME INFORMATION USING LINKED DATA DISCOVERING RESUME INFORMATION USING LINKED DATA Ujjal Marjit 1, Kumar Sharma 2 and Utpal Biswas 3 1 C.I.R.M, University Kalyani, Kalyani (West Bengal) India sic@klyuniv.ac.in 2 Department of Computer

More information

Grids, Logs, and the Resource Description Framework

Grids, Logs, and the Resource Description Framework Grids, Logs, and the Resource Description Framework Mark A. Holliday Department of Mathematics and Computer Science Western Carolina University Cullowhee, NC 28723, USA holliday@cs.wcu.edu Mark A. Baker,

More information

An Ontology-based e-learning System for Network Security

An Ontology-based e-learning System for Network Security An Ontology-based e-learning System for Network Security Yoshihito Takahashi, Tomomi Abiko, Eriko Negishi Sendai National College of Technology a0432@ccedu.sendai-ct.ac.jp Goichi Itabashi Graduate School

More information

12 The Semantic Web and RDF

12 The Semantic Web and RDF MSc in Communication Sciences 2011-12 Program in Technologies for Human Communication Davide Eynard nternet Technology 12 The Semantic Web and RDF 2 n the previous episodes... A (video) summary: Michael

More information

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2)

Sorting revisited. Build the binary search tree: O(n^2) Traverse the binary tree: O(n) Total: O(n^2) + O(n) = O(n^2) Sorting revisited How did we use a binary search tree to sort an array of elements? Tree Sort Algorithm Given: An array of elements to sort 1. Build a binary search tree out of the elements 2. Traverse

More information

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim

Acknowledgements References 5. Conclusion and Future Works Sung Wan Kim Hybrid Storage Scheme for RDF Data Management in Semantic Web Sung Wan Kim Department of Computer Information, Sahmyook College Chungryang P.O. Box118, Seoul 139-742, Korea swkim@syu.ac.kr ABSTRACT: With

More information

LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together

LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together LinksTo A Web2.0 System that Utilises Linked Data Principles to Link Related Resources Together Owen Sacco 1 and Matthew Montebello 1, 1 University of Malta, Msida MSD 2080, Malta. {osac001, matthew.montebello}@um.edu.mt

More information

Dude, where s my graph? RDF Data Cubes for Clinical Trials Data.

Dude, where s my graph? RDF Data Cubes for Clinical Trials Data. Paper TT07 Dude, where s my graph? RDF Data Cubes for Clinical Trials Data. Marc Andersen, StatGroup ApS, Denmark Tim Williams, UCB BioSciences Inc, USA ABSTRACT The concept of Linked Data conjures images

More information

Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint

Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint Linked Data Interface, Semantics and a T-Box Triple Store for Microsoft SharePoint Christian Fillies 1 and Frauke Weichhardt 1 1 Semtation GmbH, Geschw.-Scholl-Str. 38, 14771 Potsdam, Germany {cfillies,

More information

THE SEMANTIC WEB AND IT`S APPLICATIONS

THE SEMANTIC WEB AND IT`S APPLICATIONS 15-16 September 2011, BULGARIA 1 Proceedings of the International Conference on Information Technologies (InfoTech-2011) 15-16 September 2011, Bulgaria THE SEMANTIC WEB AND IT`S APPLICATIONS Dimitar Vuldzhev

More information

JustClust User Manual

JustClust User Manual JustClust User Manual Contents 1. Installing JustClust 2. Running JustClust 3. Basic Usage of JustClust 3.1. Creating a Network 3.2. Clustering a Network 3.3. Applying a Layout 3.4. Saving and Loading

More information

Towards the Integration of a Research Group Website into the Web of Data

Towards the Integration of a Research Group Website into the Web of Data Towards the Integration of a Research Group Website into the Web of Data Mikel Emaldi, David Buján, and Diego López-de-Ipiña Deusto Institute of Technology - DeustoTech, University of Deusto Avda. Universidades

More information

GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination.

GCE Computing. COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination. GCE Computing COMP3 Problem Solving, Programming, Operating Systems, Databases and Networking Report on the Examination 2510 Summer 2014 Version: 1.0 Further copies of this Report are available from aqa.org.uk

More information

An Application Ontology to Support the Access to Data of Medical Doctors and Health Facilities in Brazilian Municipalities

An Application Ontology to Support the Access to Data of Medical Doctors and Health Facilities in Brazilian Municipalities An Application Ontology to Support the Access to Data of Medical Doctors and Health Facilities in Brazilian Municipalities Aline da Cruz R. Souza, Adriana P. de Medeiros, Carlos Bazilio Martins Department

More information

How semantic technology can help you do more with production data. Doing more with production data

How semantic technology can help you do more with production data. Doing more with production data How semantic technology can help you do more with production data Doing more with production data EPIM and Digital Energy Journal 2013-04-18 David Price, TopQuadrant London, UK dprice at topquadrant dot

More information

Evaluation of Open Source Data Cleaning Tools: Open Refine and Data Wrangler

Evaluation of Open Source Data Cleaning Tools: Open Refine and Data Wrangler Evaluation of Open Source Data Cleaning Tools: Open Refine and Data Wrangler Per Larsson plarsson@cs.washington.edu June 7, 2013 Abstract This project aims to compare several tools for cleaning and importing

More information

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC

MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC MyOra 3.0 SQL Tool for Oracle User Guide Jayam Systems, LLC Contents Features... 4 Connecting to the Database... 5 Login... 5 Login History... 6 Connection Indicator... 6 Closing the Connection... 7 SQL

More information

Semantic Stored Procedures Programming Environment and performance analysis

Semantic Stored Procedures Programming Environment and performance analysis Semantic Stored Procedures Programming Environment and performance analysis Marjan Efremov 1, Vladimir Zdraveski 2, Petar Ristoski 2, Dimitar Trajanov 2 1 Open Mind Solutions Skopje, bul. Kliment Ohridski

More information

An Ontology Model for Organizing Information Resources Sharing on Personal Web

An Ontology Model for Organizing Information Resources Sharing on Personal Web An Ontology Model for Organizing Information Resources Sharing on Personal Web Istiadi 1, and Azhari SN 2 1 Department of Electrical Engineering, University of Widyagama Malang, Jalan Borobudur 35, Malang

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Data-Gov Wiki: Towards Linked Government Data

Data-Gov Wiki: Towards Linked Government Data Data-Gov Wiki: Towards Linked Government Data Li Ding 1, Dominic DiFranzo 1, Sarah Magidson 2, Deborah L. McGuinness 1, and Jim Hendler 1 1 Tetherless World Constellation Rensselaer Polytechnic Institute

More information

technische universiteit eindhoven WIS & Engineering Geert-Jan Houben

technische universiteit eindhoven WIS & Engineering Geert-Jan Houben WIS & Engineering Geert-Jan Houben Contents Web Information System (WIS) Evolution in Web data WIS Engineering Languages for Web data XML (context only!) RDF XML Querying: XQuery (context only!) RDFS SPARQL

More information

Graph Database Proof of Concept Report

Graph Database Proof of Concept Report Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment

More information

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs

G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif Sakr National ICT Australia UNSW, Sydney, Australia ssakr@cse.unsw.edu.eu Sameh Elnikety Microsoft Research Redmond, WA, USA samehe@microsoft.com

More information

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015

Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015 E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing

More information

Data Store Interface Design and Implementation

Data Store Interface Design and Implementation WDS'07 Proceedings of Contributed Papers, Part I, 110 115, 2007. ISBN 978-80-7378-023-4 MATFYZPRESS Web Storage Interface J. Tykal Charles University, Faculty of Mathematics and Physics, Prague, Czech

More information

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX

A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I)

! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I) ! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and

More information

RDF Visualization using a Three-Dimensional Adjacency Matrix

RDF Visualization using a Three-Dimensional Adjacency Matrix RDF Visualization using a Three-Dimensional Adjacency Matrix Mario Arias Gallego Computer Science Dept Univ of Valladolid, Spain marioarias@gmailcom Javier D Fernández Computer Science Dept Univ of Valladolid,

More information

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc Big Data, Fast Data, Complex Data Jans Aasman Franz Inc Private, founded 1984 AI, Semantic Technology, professional services Now in Oakland Franz Inc Who We Are (1 (2 3) (4 5) (6 7) (8 9) (10 11) (12

More information

Using RDF Metadata To Enable Access Control on the Social Semantic Web

Using RDF Metadata To Enable Access Control on the Social Semantic Web Using RDF Metadata To Enable Access Control on the Social Semantic Web James Hollenbach, Joe Presbrey, and Tim Berners-Lee Decentralized Information Group, MIT CSAIL, 32 Vassar Street, Cambridge, MA, USA,

More information

Visualizing RDF(S)-based Information

Visualizing RDF(S)-based Information Visualizing RDF(S)-based Information Alexandru Telea, Flavius Frasincar, Geert-Jan Houben Eindhoven University of Technology PO Box 513, NL-5600 MB Eindhoven, the Netherlands alext, flaviusf, houben @win.tue.nl

More information

HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013
