Graph Algorithms and Graph Databases. Dr. Daisy Zhe Wang CISE Department University of Florida August 27th 2014
|
|
|
- Marcus Turner
- 10 years ago
- Views:
Transcription
1 Graph Algorithms and Graph Databases Dr. Daisy Zhe Wang CISE Department University of Florida August 27th
2 Google Knowledge Graph -- Entities and Relationships 2
3 Graph Data! Facebook Social Network graph 3
4 Graph Data! (cont.) Citation Networks and Maps of Science,
5 Web as a Graph Web as a directed graph: Nodes: Webpages Edges: Hyperlinks 5
6 The BowTie of the Web 6
7 LARGE GRAPH DATABASES AND QUERIES
8 Classes of Large Graphs Random graphs Node degree is constrained Less common Scale-free graphs Distribution of node degree follows power law Most large graphs are scale-free Small world phenomena & hubs Harder to partition 8
9 Classes of Large Graphs 9
10 Organic Growth -> Scale Free 10
11 Let s Go Hyper! Hyper-edge A traditional edge is binary A hyper edge relates n nodes Child-of edge versus father, mother, child hyper-edge Hyper-node A traditional node represents one entity Hyper node represents a set of nodes Person node versus family hyper-node 11
12 Social Networks Scale LinkedIn 70 million users Facebook 500 million users 65 billion photos Queries Alice s friends Photos with friends Rich graph Types, attributes Photo7 Photo3 Bob Chris Ed David France Alice Photo2 George Hillary Photo1 Photo8 Photo5 Photo6 Photo4 12
13 Social Networks: Data Model Node Bob Alice ID, type, attributes Photo7 Photo1 Edge Chris David Photo2 Photo8 Connects two nodes Direction, type, attributes Photo3 Ed France George Hillary Photo5 Photo6 Photo4 13
14 Managing Graph Data Here we focus on online access Rather than offline access Queries Network analytics and graph mining Read Updates Data update: change node payload Structural update: modify nodes and edges 14
15 Example: Linked open Data[LoD] Semantic Web (Tim Berners-Lee) Scale Hundreds of data sets 30B+ tuples Queries SPARQL Domains 15
16 LoD Application Example ozone level visualization 2 data sets clean air status [data.gov] Castnet site information [epa.gov] 2 SPARQL queries data.gov epa. gov 16
17 Web of Data: Data Model (1) Resource Description Framework (RDF) Uniform resource identifier (URI) Triples! 1:subject, 2:predicate, 3:object ex.: philippe, made, idmesh_paper: (URIs) 1: 2: 3: 17
18 Web of Data: Data Model (2) Naturally forms (distributed) graphs Nodes URIs [subjects] URIs / literals [objects] Edges URIs [predicates] Directed Philippe made Idmesh paper 18
19 RDF Schemas (RDFS) Classes, inheritance Class, Property, SubClass, SubProperty Constraints on structure (predicate) Constraints on subjects (Domain) Constraints on objects (Range) Collections List, Bag 19
20 RDFS Example 20
21 RDF Storage (1) RDFa Embedding RDF information (e.g., metadata, markup of attributes) in HTML pages Supported by Google, Yahoo, etc Example: <body> <div about=" Massachusetts governor is <span rel="db:governor"> <span about=" Patrick </span>, </span> the nickname is "<span property="db:nickname">bay State</span>", and the capital <span rel="db:capital"> <span about=" has the nickname "<span property="db:nickname">beantown</span>". </span> </span> </div> </body> 21
22 RDF Storage (2) Various internal formats for DBMSs Giant triple table (triple stores) subject predicate object Property tables subject property1 property2 property3 Sub-graphs 22
23 SPARQL (1/2) Declarative query language over RDF data SPJ combinations of triple patterns E.g., Retrieve all students who live in Seattle and take a graduate course Select?s Where {?s is_a Student?s lives_in Seattle?s takes?c?c is_a GraduateCourse } 23
24 SPARQL Query Execution Typically start from bound variables and performs self-joins on giant triple table Select?s Where {?s is_a Student?s lives_in Seattle?s takes?c?c is_a GraduateCourse } π s σ p= is_a o= Student π s σ p= lives_in o= Seattle π s (σ p= takes o s σ p= is_a o= GraduateCourse ) 24
25 SPARQL (2/2) Beyond conjunctions of triple patterns Named graphs Disjunctions UNION OPTIONAL (semi-structured data model) Predicate filters FILTER (?price < 30) Duplicate handling (bag semantics) DISTINCT, REDUCED Wildcards Negation as failure WHERE {?x foaf:givenname?name. OPTIONAL {?x dc:date?date }. FILTER (!bound(?date)) } 25
26 Inference Queries Additional data can be inferred using various sets of logical rules PoliticianOf(Person,City) LivesIn(Person,City) RetriedFrom(Prof,School) WorkedFor(Prof,School) Specify which ones to use by entailment regimes [Glimm11] RDF Schema has 14 entailment rules E.g., (p,rdfs:domain,x) && (u, p, y) => (u,rdf:type,x) 26
27 Graph System Interfaces Relational: SQL Triple store: SPARQL Graph processing system/databases: API 27
28 Graph Database and Distributed Graph Processing Systems RDF-3X, JENA, RDF stores Oracle Triples Store and Semantic Web Technologies Neo4j mostly widely used graph dbms Google Pregel Microsoft Trinity GraphLab, GraphX 28
29 RDF-3X [Neumann08] Max Planck Institut für Informatik Thomas Neumann & Gerhard Weikum Open-Source Triple-table storage Indexing: Clustered B+-trees on all six SPO permutations Compression: Leaf pages store deltas for each value Query Optimizations: order-preserving mergejoins, join orderings 29
30 GraphLab Open-source research project from CMU Support parallel graph operations with consistency guarantees via locking Do not support transactions and ad-hoc query processing (different from graph databases such Neo4j) 30
31 Graph/Network datasets and Example Projects/Applications Stanford Large Network Dataset Collection: Benchmarking Graph Databases: PageRank on Semantic Networks with Application to Word Sense Disambiguation: ng/paperstoread/pagerankwsd_coling04.pdf 31
32 [REVIEW] PAGE RANK: LINK ANALYSIS OVER LARGE GRAPHS Adapted Slides from Jeff Ullman, Anand Rajaraman and Jure Leskovec from Stanford
33 How to organize the Web? First try: Human curated Web directories (e.g., Yahoo) Second try: Web Search Information Retrieval using inverted index Good for finding relevant docs in a small and trusted set (e.g., Newspaper articles, Patents) But: Web is huge, full of untrusted documents, random things, web spam, etc. E.g., Word Spam 33
34 2 Challenges of Web Search 1. Web contains many sources of information. Who to trust? Observation: Trustworthy pages point to each other! (in&out-links) 2. What is the best answer to query newspaper? No single right answer Observation: Pages that actually know about newspapers might all be pointing to many newspapers (outlink) Observation: a good newspaper is pointed to from many sources (inlink) 34
35 Solution: Ranking nodes on the Graph Based on Link Structures! All web pages are not equally importance can be captured by link structures There is large diversity in the web-graph node connectivity. Let s rank the pages by the link structure! Link Spam also possible but harder 35
36 Link Analysis Link Analysis algorithms: for computing importance of nodes in a graph Page Rank Topic-Specific (personalized) Page Rank Mining for Communities Web Spam Detection Algorithms
37 How to Rank Web Pages Based on Link Analysis Web pages are not equally important vs. Page is more important if it has more inlinks Inlinks as votes has 23,400 inlinks has 1 inlink Which inlinks are more important? Recursive question! Links from important pages count more
38 Example PageRank scores 38
39 Simple recursive formulation Each link s vote is proportional to the importance of its source page If page j with importance r j has n outlinks, each link gets r j /n votes Page j s own importance is the sum of the votes on its inlinks r j =? = r i /3 + r k /4
40 Simple flow model y Yahoo y/2 A vote from an important page worth more A page is important if it is pointed to by other important pages a/2 Amazon a y/2 m a/2 M soft m Define a rank r j for page j r j = r /d i j i i d i is out-degree of node i flow equations: y = y /2 + a /2 a = y /2 + m m = a /2
41 Solving the flow equations 3 equations, 3 unknowns, no constants No unique solution All solutions equivalent modulo scale factor flow equations: y = y /2 + a /2 a = y /2 + m m = a /2 Additional constraint forces uniqueness y+a+m = 1 y = 2/5, a = 2/5, m = 1/5 Gaussian elimination method works for small examples, but we need a better method for large graphs
42 *Matrix formulation Stochastic adjacency matrix M Matrix M has one row and one column for each web page Let page i has di outlinks If i j, then M ij =1/di, Else M ij =0 M is a column stochastic matrix, where columns sum to 1 Rank vector r : vector with one entry per web page r i is the importance score of page i i r i = 1 The flow equations can be written as r = Mr
43 Matrix Formulation Example Yahoo y a m y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0 Amazon M soft y = y /2 + a /2 a = y /2 + m m = a /2 r = Mr y 1/2 1/2 0 y a = 1/2 0 1 a m 0 1/2 0 m
44 A More General Example: Flow Equations = Matrix Formulation Remember the flow equation: r j = i j r i /d i Flow equation in the matrix form: r = Mr Suppose page i links to 3 pages, including j
45 Rank Vector r = Eigenvector of M The flow equations can be written r = Mr The rank vector r is an eigenvector of the stochastic web matrix M x is an eigenvector with the corresponding eigenvalue if: A x = x In fact, its first or principal/dominant eigenvector, with corresponding eigenvalue 1 We can now efficiently solve for r! The method is called Power iteration
46 Power Iteration method Given a web graph with N nodes, where the nodes are pages and edges are hyperlinks Power iteration: a simple iterative scheme Suppose there are N web pages Initialize: r 0 = [1/N,.,1/N] T Iterate: r k+1 = Mr k Stop when r k+1 - r k 1 < x 1 = 1 i N x i is the L1 norm Can use any other vector norm e.g., Euclidean norm r (t+1) j = r(t) i /d i j i d i is out-degree of node i
47 Power Iteration Example Yahoo y a m y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0 Amazon M soft y = y /2 + a /2 a = y /2 + m m = a /2 r = Mr y 1/2 1/2 0 y a = 1/2 0 1 a m 0 1/2 0 m
48 Power Iteration Example (cont.) Amazon Yahoo M soft y a m y 1/2 1/2 0 a 1/2 0 1 m 0 1/2 0 Power Iteration: Set r j =1/N 1: r j = i j r i /d i (i.e., r' = Mr) 2: r=r Goto 1 y a = m 1/3 1/3 1/3 1/3 1/2 1/6 5/12 1/3 1/4 3/8 11/24 1/6... 2/5 2/5 1/5
49 Why Power Iteration works? Power iteration: A method for finding dominant eigenvector (the vector corresponding to the largest eigenvalue) r(1)=m r(0) r(2)=m r1=m 2 r0 r(3)=m r2=m 3 r0 Claim: Sequence M r0,m 2 r0, M k r0, approaches the dominant eigenvector of M 49
50 Random Walk Interpretation Imagine a random web surfer At any time t, surfer is on some page P At time t+1, the surfer follows an outlink from P uniformly at random Ends up on some page Q linked from P Process repeats indefinitely Let p(t) be a vector whose i th component is the probability that the surfer is at page i at time t p(t) is a probability distribution on pages p(t) is a link-based importance rank t->
51 The stationary distribution Where is the surfer at time t+1? Follows a link uniformly at random p(t+1) = Mp(t) Suppose the random walk reaches a state such that p(t+1) = Mp(t) = p(t) Then p(t) is called a stationary distribution for the random walk Our rank vector r satisfies r = Mr So r is a stationary distribution p(t) for the random surfer
52 *Existence and Uniqueness A central result from the theory of random walks (aka Markov processes): For graphs that satisfy certain conditions (i.e., strongly connected, and no dead ends), the stationary distribution is unique and eventually will be reached no matter what the initial probability distribution at time t = 0. Is the Web a strongly connected graph?
53 2 problems: PageRank: Problems (1) Spider traps (all out-links are within the group) Eventually spider traps absorb all importance (2) Some pages are dead ends (have no out-links) Such pages cause importance to leak out 53
54 Spider traps A group of pages is a spider trap if there are no links from within the group to outside the group Random surfer gets trapped Spider traps violate the conditions needed for the random walk theorem
55 Random teleports The Google solution for spider traps At each time step, the random surfer has two options: With probability, follow a link at random With probability 1-, jump to some page uniformly at random Common values for are in the range 0.8 to 0.9 Results: Surfer will teleport out of spider trap within a few time steps
56 Dead ends Pages with no outlinks are dead ends for the random surfer Nowhere to go on next step All importance becomes zero
57 Teleport Dealing with dead-ends Follow random teleport links with probability 1.0 from dead-ends Adjust matrix accordingly Prune and propagate Preprocess the graph to eliminate dead-ends Might require multiple passes Compute page rank on reduced graph Approximate values for deadends by propagating values from reduced graph
58 More Graph Analytics and Mining Algorithms Wiki Book on Graph Algorithms: orithms Shortest path finding Topological sorting Minimum spanning tree Strongly connected components Cliques 58
59 Summary Graph Data and Graph Algorithms Page Rank Algorithm Shortest path, Strongly connected components, etc. Graph Database and Queries (e.g., SPARQL) Project Ideas find relevant dataset and papers problem statement and ideas for techniques
Part 1: Link Analysis & Page Rank
Chapter 8: Graph Data Part 1: Link Analysis & Page Rank Based on Leskovec, Rajaraman, Ullman 214: Mining of Massive Datasets 1 Exam on the 5th of February, 216, 14. to 16. If you wish to attend, please
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman
Cloud and Big Data Summer School, Stockholm, Aug., 2015 Jeffrey D. Ullman 2 Intuition: solve the recursive equation: a page is important if important pages link to it. Technically, importance = the principal
Search engines: ranking algorithms
Search engines: ranking algorithms Gianna M. Del Corso Dipartimento di Informatica, Università di Pisa, Italy ESP, 25 Marzo 2015 1 Statistics 2 Search Engines Ranking Algorithms HITS Web Analytics Estimated
Practical Graph Mining with R. 5. Link Analysis
Practical Graph Mining with R 5. Link Analysis Outline Link Analysis Concepts Metrics for Analyzing Networks PageRank HITS Link Prediction 2 Link Analysis Concepts Link A relationship between two entities
Estimating PageRank Values of Wikipedia Articles using MapReduce
Estimating PageRank Values of Wikipedia Articles using MapReduce Due: Sept. 30 Wednesday 5:00PM Submission: via Canvas, individual submission Instructor: Sangmi Pallickara Web page: http://www.cs.colostate.edu/~cs535/assignments.html
The PageRank Citation Ranking: Bring Order to the Web
The PageRank Citation Ranking: Bring Order to the Web presented by: Xiaoxi Pang 25.Nov 2010 1 / 20 Outline Introduction A ranking for every page on the Web Implementation Convergence Properties Personalized
1 o Semestre 2007/2008
Departamento de Engenharia Informática Instituto Superior Técnico 1 o Semestre 2007/2008 Outline 1 2 3 4 5 Outline 1 2 3 4 5 Exploiting Text How is text exploited? Two main directions Extraction Extraction
Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time
Big Data Technology Motivating NoSQL Databases: Computing Page Importance Metrics at Crawl Time Edward Bortnikov & Ronny Lempel Yahoo! Labs, Haifa Class Outline Link-based page importance measures Why
Link Analysis. Chapter 5. 5.1 PageRank
Chapter 5 Link Analysis One of the biggest changes in our lives in the decade following the turn of the century was the availability of efficient and accurate Web search, through search engines such as
Part 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif Sakr National ICT Australia UNSW, Sydney, Australia [email protected] Sameh Elnikety Microsoft Research Redmond, WA, USA [email protected]
Social Network Mining
Social Network Mining Data Mining November 11, 2013 Frank Takes ([email protected]) LIACS, Universiteit Leiden Overview Social Network Analysis Graph Mining Online Social Networks Friendship Graph Semantics
Asking Hard Graph Questions. Paul Burkhardt. February 3, 2014
Beyond Watson: Predictive Analytics and Big Data U.S. National Security Agency Research Directorate - R6 Technical Report February 3, 2014 300 years before Watson there was Euler! The first (Jeopardy!)
Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015
E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing
! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I)
! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Social Media Mining. Network Measures
Klout Measures and Metrics 22 Why Do We Need Measures? Who are the central figures (influential individuals) in the network? What interaction patterns are common in friends? Who are the like-minded users
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data
KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS
ABSTRACT KEYWORD SEARCH OVER PROBABILISTIC RDF GRAPHS In many real applications, RDF (Resource Description Framework) has been widely used as a W3C standard to describe data in the Semantic Web. In practice,
MapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
Lecture Data Warehouse Systems
Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany MapReduce II MapReduce II 1 / 33 Outline 1. Introduction
The world s largest matrix computation. (This chapter is out of date and needs a major overhaul.)
Chapter 7 Google PageRank The world s largest matrix computation. (This chapter is out of date and needs a major overhaul.) One of the reasons why Google TM is such an effective search engine is the PageRank
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.
PSG College of Technology, Coimbatore-641 004 Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS Project Project Title Area of Abstract No Specialization 1. Software
Software tools for Complex Networks Analysis. Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team
Software tools for Complex Networks Analysis Fabrice Huet, University of Nice Sophia- Antipolis SCALE (ex-oasis) Team MOTIVATION Why do we need tools? Source : nature.com Visualization Properties extraction
Graph Processing and Social Networks
Graph Processing and Social Networks Presented by Shu Jiayu, Yang Ji Department of Computer Science and Engineering The Hong Kong University of Science and Technology 2015/4/20 1 Outline Background Graph
SPARQL: Un Lenguaje de Consulta para la Web
SPARQL: Un Lenguaje de Consulta para la Web Semántica Marcelo Arenas Pontificia Universidad Católica de Chile y Centro de Investigación de la Web M. Arenas SPARQL: Un Lenguaje de Consulta para la Web Semántica
Teaching Scheme Credits Assigned Course Code Course Hrs./Week. BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics. Theory Marks
Teaching Scheme Credits Assigned Course Code Course Hrs./Week Name Theory Practical Tutorial Theory Practical/Oral Tutorial Tota l BEITC802 Big Data 04 02 --- 04 01 --- 05 Analytics Examination Scheme
Network Graph Databases, RDF, SPARQL, and SNA
Network Graph Databases, RDF, SPARQL, and SNA NoCOUG Summer Conference August 16 2012 at Chevron in San Ramon, CA David Abercrombie Data Analytics Engineer, Tapjoy [email protected] About me
HIGH PERFORMANCE BIG DATA ANALYTICS
HIGH PERFORMANCE BIG DATA ANALYTICS Kunle Olukotun Electrical Engineering and Computer Science Stanford University June 2, 2014 Explosion of Data Sources Sensors DoD is swimming in sensors and drowning
DATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
Social Network Analysis
Social Network Analysis Challenges in Computer Science April 1, 2014 Frank Takes ([email protected]) LIACS, Leiden University Overview Context Social Network Analysis Online Social Networks Friendship Graph
Social Media Mining. Graph Essentials
Graph Essentials Graph Basics Measures Graph and Essentials Metrics 2 2 Nodes and Edges A network is a graph nodes, actors, or vertices (plural of vertex) Connections, edges or ties Edge Node Measures
Analysis of Web Archives. Vinay Goel Senior Data Engineer
Analysis of Web Archives Vinay Goel Senior Data Engineer Internet Archive Established in 1996 501(c)(3) non profit organization 20+ PB (compressed) of publicly accessible archival material Technology partner
RDF y SPARQL: Dos componentes básicos para la Web de datos
RDF y SPARQL: Dos componentes básicos para la Web de datos Marcelo Arenas PUC Chile & University of Oxford M. Arenas RDF y SPARQL: Dos componentes básicos para la Web de datos Valladolid 2013 1 / 61 Semantic
Databases 2 (VU) (707.030)
Databases 2 (VU) (707.030) Introduction to NoSQL Denis Helic KMI, TU Graz Oct 14, 2013 Denis Helic (KMI, TU Graz) NoSQL Oct 14, 2013 1 / 37 Outline 1 NoSQL Motivation 2 NoSQL Systems 3 NoSQL Examples 4
Big Systems, Big Data
Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,
Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs
1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE
Enhancing the Ranking of a Web Page in the Ocean of Data
Database Systems Journal vol. IV, no. 3/2013 3 Enhancing the Ranking of a Web Page in the Ocean of Data Hitesh KUMAR SHARMA University of Petroleum and Energy Studies, India [email protected] In today
Systems and Algorithms for Big Data Analytics
Systems and Algorithms for Big Data Analytics YAN, Da Email: [email protected] My Research Graph Data Distributed Graph Processing Spatial Data Spatial Query Processing Uncertain Data Querying & Mining
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations
Graphs over Time Densification Laws, Shrinking Diameters and Possible Explanations Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU 1 Introduction What can we do with graphs? What patterns
Graph Database Performance: An Oracle Perspective
Graph Database Performance: An Oracle Perspective Xavier Lopez, Ph.D. Senior Director, Product Management 1 Copyright 2012, Oracle and/or its affiliates. All rights reserved. Program Agenda Broad Perspective
From relational databases to linked data:r for the semantic web. Jose Quesada, Max Planck Institute, Berlin
From relational databases to linked data:r for the semantic web Jose Quesada, Max Planck Institute, Berlin Who this talk targets You have big data; you use a database You have an evolving schema definition.
Introduction to IR Systems: Supporting Boolean Text Search. Information Retrieval. IR vs. DBMS. Chapter 27, Part A
Introduction to IR Systems: Supporting Boolean Text Search Chapter 27, Part A Database Management Systems, R. Ramakrishnan 1 Information Retrieval A research field traditionally separate from Databases
Big Graph Analytics on Neo4j with Apache Spark. Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage
Big Graph Analytics on Neo4j with Apache Spark Michael Hunger Original work by Kenny Bastani Berlin Buzzwords, Open Stage My background I only make it to the Open Stages :) Probably because Apache Neo4j
Extending the Linked Data API with RDFa
Extending the Linked Data API with RDFa Steve Battle 1, James Leigh 2, David Wood 2 1 Gloze Ltd, UK [email protected] 2 3 Round Stones, USA James, [email protected] Linked data is about connecting
Graph Theory and Complex Networks: An Introduction. Chapter 08: Computer networks
Graph Theory and Complex Networks: An Introduction Maarten van Steen VU Amsterdam, Dept. Computer Science Room R4.20, [email protected] Chapter 08: Computer networks Version: March 3, 2011 2 / 53 Contents
Semantic Web Standard in Cloud Computing
ETIC DEC 15-16, 2011 Chennai India International Journal of Soft Computing and Engineering (IJSCE) Semantic Web Standard in Cloud Computing Malini Siva, A. Poobalan Abstract - CLOUD computing is an emerging
Mining Social Network Graphs
Mining Social Network Graphs Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata November 13, 17, 2014 Social Network No introduc+on required Really? We s7ll need to understand
A Survey on: Efficient and Customizable Data Partitioning for Distributed Big RDF Data Processing using hadoop in Cloud.
A Survey on: Efficient and Customizable Data Partitioning for Distributed Big RDF Data Processing using hadoop in Cloud. Tejas Bharat Thorat Prof.RanjanaR.Badre Computer Engineering Department Computer
GRAPH DATABASE SYSTEMS. h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe 2016 1
GRAPH DATABASE SYSTEMS h_da Prof. Dr. Uta Störl Big Data Technologies: Graph Database Systems - SoSe 2016 1 Use Case: Route Finding Source: Neo Technology, Inc. h_da Prof. Dr. Uta Störl Big Data Technologies:
CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science /Data Intensie Computing Uniersity of Florida, CISE Department Prof. Daisy Zhe Wang Map/Reduce: Simplified Data Processing on Large Clusters Parallel/Distributed
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata
Generating SQL Queries Using Natural Language Syntactic Dependencies and Metadata Alessandra Giordani and Alessandro Moschitti Department of Computer Science and Engineering University of Trento Via Sommarive
How To Improve Performance In A Database
Some issues on Conceptual Modeling and NoSQL/Big Data Tok Wang Ling National University of Singapore 1 Database Models File system - field, record, fixed length record Hierarchical Model (IMS) - fixed
Detection and Elimination of Duplicate Data from Semantic Web Queries
Detection and Elimination of Duplicate Data from Semantic Web Queries Zakia S. Faisalabad Institute of Cardiology, Faisalabad-Pakistan Abstract Semantic Web adds semantics to World Wide Web by exploiting
Complex Networks Analysis: Clustering Methods
Complex Networks Analysis: Clustering Methods Nikolai Nefedov Spring 2013 ISI ETH Zurich [email protected] 1 Outline Purpose to give an overview of modern graph-clustering methods and their applications
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
Scaling Up HBase, Hive, Pegasus
CSE 6242 A / CS 4803 DVA Mar 7, 2013 Scaling Up HBase, Hive, Pegasus Duen Horng (Polo) Chau Georgia Tech Some lectures are partly based on materials by Professors Guy Lebanon, Jeffrey Heer, John Stasko,
Mining Social-Network Graphs
342 Chapter 10 Mining Social-Network Graphs There is much information to be gained by analyzing the large-scale data that is derived from social networks. The best-known example of a social network is
A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis
A Comparison Framework of Similarity Metrics Used for Web Access Log Analysis Yusuf Yaslan and Zehra Cataltepe Istanbul Technical University, Computer Engineering Department, Maslak 34469 Istanbul, Turkey
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering Chang Liu 1 Jun Qu 1 Guilin Qi 2 Haofen Wang 1 Yong Yu 1 1 Shanghai Jiaotong University, China {liuchang,qujun51319, whfcarter,yyu}@apex.sjtu.edu.cn
A linear algebraic method for pricing temporary life annuities
A linear algebraic method for pricing temporary life annuities P. Date (joint work with R. Mamon, L. Jalen and I.C. Wang) Department of Mathematical Sciences, Brunel University, London Outline Introduction
LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model
LINKED DATA EXPERIENCE AT MACMILLAN Building discovery services for scientific and scholarly content on top of a semantic data model 22 October 2014 Tony Hammond Michele Pasin Background About Macmillan
Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari [email protected]
Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari [email protected] Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content
Removing Web Spam Links from Search Engine Results
Removing Web Spam Links from Search Engine Results Manuel EGELE [email protected], 1 Overview Search Engine Optimization and definition of web spam Motivation Approach Inferring importance of features
Collaborative Filtering Scalable Data Analysis Algorithms Claudia Lehmann, Andrina Mascher
Collaborative Filtering Scalable Data Analysis Algorithms Claudia Lehmann, Andrina Mascher Outline 2 1. Retrospection 2. Stratosphere Plans 3. Comparison with Hadoop 4. Evaluation 5. Outlook Retrospection
Corso di Biblioteche Digitali
Corso di Biblioteche Digitali Vittore Casarosa [email protected] tel. 050-315 3115 cell. 348-397 2168 Ricevimento dopo la lezione o per appuntamento Valutazione finale 70-75% esame orale 25-30% progetto
Big Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
Perron vector Optimization applied to search engines
Perron vector Optimization applied to search engines Olivier Fercoq INRIA Saclay and CMAP Ecole Polytechnique May 18, 2011 Web page ranking The core of search engines Semantic rankings (keywords) Hyperlink
HadoopRDF : A Scalable RDF Data Analysis System
HadoopRDF : A Scalable RDF Data Analysis System Yuan Tian 1, Jinhang DU 1, Haofen Wang 1, Yuan Ni 2, and Yong Yu 1 1 Shanghai Jiao Tong University, Shanghai, China {tian,dujh,whfcarter}@apex.sjtu.edu.cn
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu 10-30-2014
LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING ----Changsheng Liu 10-30-2014 Agenda Semi Supervised Learning Topics in Semi Supervised Learning Label Propagation Local and global consistency Graph
Big Data Analytics. Rasoul Karimi
Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Introduction
Efficient Identification of Starters and Followers in Social Media
Efficient Identification of Starters and Followers in Social Media Michael Mathioudakis Department of Computer Science University of Toronto [email protected] Nick Koudas Department of Computer Science
Machine Learning over Big Data
Machine Learning over Big Presented by Fuhao Zou [email protected] Jue 16, 2014 Huazhong University of Science and Technology Contents 1 2 3 4 Role of Machine learning Challenge of Big Analysis Distributed
Semantic Stored Procedures Programming Environment and performance analysis
Semantic Stored Procedures Programming Environment and performance analysis Marjan Efremov 1, Vladimir Zdraveski 2, Petar Ristoski 2, Dimitar Trajanov 2 1 Open Mind Solutions Skopje, bul. Kliment Ohridski
SGL: Stata graph library for network analysis
SGL: Stata graph library for network analysis Hirotaka Miura Federal Reserve Bank of San Francisco Stata Conference Chicago 2011 The views presented here are my own and do not necessarily represent the
Bigdata : Enabling the Semantic Web at Web Scale
Bigdata : Enabling the Semantic Web at Web Scale Presentation outline What is big data? Bigdata Architecture Bigdata RDF Database Performance Roadmap What is big data? Big data is a new way of thinking
CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes
CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2009 Lecture 15 - Data Warehousing: Cubes Final Exam Overview Open books and open notes No laptops and no other mobile devices
Big Data Analytics. Lucas Rego Drumond
Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline
A Comparison of Database Query Languages: SQL, SPARQL, CQL, DMX
ISSN: 2393-8528 Contents lists available at www.ijicse.in International Journal of Innovative Computer Science & Engineering Volume 3 Issue 2; March-April-2016; Page No. 09-13 A Comparison of Database
Social Networks and Social Media
Social Networks and Social Media Social Media: Many-to-Many Social Networking Content Sharing Social Media Blogs Microblogging Wiki Forum 2 Characteristics of Social Media Consumers become Producers Rich
LARGE-SCALE GRAPH PROCESSING IN THE BIG DATA WORLD. Dr. Buğra Gedik, Ph.D.
LARGE-SCALE GRAPH PROCESSING IN THE BIG DATA WORLD Dr. Buğra Gedik, Ph.D. MOTIVATION Graph data is everywhere Relationships between people, systems, and the nature Interactions between people, systems,
Microblogging Queries on Graph Databases: An Introspection
Microblogging Queries on Graph Databases: An Introspection ABSTRACT Oshini Goonetilleke RMIT University, Australia [email protected] Timos Sellis RMIT University, Australia [email protected]
RDF Support in Oracle Oracle USA Inc.
RDF Support in Oracle Oracle USA Inc. 1. Introduction Resource Description Framework (RDF) is a standard for representing information that can be identified using a Universal Resource Identifier (URI).
RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS
ISBN: 978-972-8924-93-5 2009 IADIS RANKING WEB PAGES RELEVANT TO SEARCH KEYWORDS Ben Choi & Sumit Tyagi Computer Science, Louisiana Tech University, USA ABSTRACT In this paper we propose new methods for
Cloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
A Comparison of Current Graph Database Models
A Comparison of Current Graph Database Models Renzo Angles Universidad de Talca (Chile) 3rd Int. Workshop on Graph Data Management: Techniques and applications (GDM 2012) 5 April, Washington DC, USA Outline
Alejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer
Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial
