Runtime Query Optimization
- Evangeline Garrison
- 10 years ago
1 Runtime Query Optimization
Robust Optimization for Graphs
All Rights Reserved
2 RDF Join Order Optimization
Typical approach
- Assign an estimated cardinality to each triple pattern. Bigdata uses the fast range counts.
- Start with the most selective triple pattern.
- For each remaining join:
  - Propagate variable bindings.
  - Re-estimate the cardinality of the remaining joins.
Cardinality estimation error
- Grows exponentially with join path length.
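The typical static approach on slide 2 can be sketched as a greedy loop. This is a minimal illustration, not Bigdata's optimizer: the names `static_join_order`, `pattern_vars` and `range_counts` are hypothetical, and the sketch reuses the initial range counts instead of re-estimating cardinality after each binding propagation.

```python
def static_join_order(pattern_vars, range_counts):
    """Greedy join order: cheapest pattern first, then the cheapest
    remaining pattern that shares a variable with those already joined."""
    remaining = set(pattern_vars)
    order, bound = [], set()
    while remaining:
        # Prefer patterns connected to the bound variables; fall back to
        # any remaining pattern (first step, or a disconnected graph).
        connected = [p for p in remaining if pattern_vars[p] & bound]
        candidates = connected or remaining
        nxt = min(candidates, key=lambda p: (range_counts[p], p))
        order.append(nxt)
        bound |= pattern_vars[nxt]
        remaining.remove(nxt)
    return order


# Hypothetical patterns and counts, chosen only for illustration.
pattern_vars = {"a": {"?x"}, "b": {"?x", "?y"}, "c": {"?y"}}
range_counts = {"a": 1000, "b": 50, "c": 10}
order = static_join_order(pattern_vars, range_counts)
```

Because every later choice depends on estimates made before any join has run, an early error carries through the whole path, which is the weakness the runtime optimizer is meant to address.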
3 Runtime Query Optimizer (RTO)
- Inspired by ROX (Runtime Optimization for XQuery).
- Online, incremental and adaptive dynamic programming.
- Breadth-first estimation of the actual cost of join paths.
- Cost function is the cumulative intermediate cardinality.
- Extensions for SPARQL semantics.
Query plans are optimized in the data for each query.
- Never produces a bad query plan.
- Does not rely on statistical summaries.
- Recognizes correlations in the data for the specific query.
- Can improve the running time for some queries by 10x or more.
- Maximum payoff for high-volume analytic queries.
4 Join Graphs
- Model the set of joins as vertices.
  - Connected by join variables.
  - Implicit connections if variables are shared through filters.
- Requires an index for each join.
  - Easily satisfied for RDF.
- Starting vertices
  - Determined by their range counts.
5 Join Graph for LUBM Q9
    SELECT ?x ?y ?z
    WHERE {
      ?x a ub:Student .         # v0
      ?y a ub:Faculty .         # v1
      ?z a ub:Course .          # v2
      ?x ub:advisor ?y .        # v3
      ?y ub:teacherOf ?z .      # v4
      ?x ub:takesCourse ?z .    # v5
    }
- 6 vertices.
- 3 variables, each in 3 vertices.
- No filters.
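The join graph for LUBM Q9 can be derived mechanically from the variables each vertex uses. The sketch below is an illustration only; the `VERTICES` layout and the function name `join_graph_edges` are assumptions, not Bigdata data structures.

```python
from itertools import combinations

# Variables used by each vertex (triple pattern) v0..v5 of LUBM Q9.
VERTICES = {
    0: {"?x"},          # ?x a Student
    1: {"?y"},          # ?y a Faculty
    2: {"?z"},          # ?z a Course
    3: {"?x", "?y"},    # ?x advisor ?y
    4: {"?y", "?z"},    # ?y teacherOf ?z
    5: {"?x", "?z"},    # ?x takesCourse ?z
}


def join_graph_edges(vertices):
    """Connect every pair of vertices that shares at least one variable."""
    edges = {}
    for a, b in combinations(sorted(vertices), 2):
        shared = vertices[a] & vertices[b]
        if shared:
            edges[(a, b)] = shared
    return edges


edges = join_graph_edges(VERTICES)
```

Of the 15 possible vertex pairs, 9 share a variable and so form edges; pairs such as (v0, v1) are not directly joinable and would only meet through a longer path.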
6 Annotate with Range Counts
vertex  triple pattern           range count
v0      ?x a ub:Student          6463
v1      ?y a ub:Faculty          540
v2      ?z a ub:Course           (not preserved)
v3      ?x ub:advisor ?y         3101
v4      ?y ub:teacherOf ?z       1627
v5      ?x ub:takesCourse ?z     (not preserved)
- Range count each vertex.
- Minimum cardinality for {1, 4, 2}.
- Will begin with those vertices.
7 Incremental Join Path Estimation
- Evaluation proceeds in rounds.
  - Increase the sample size in each round.
  - Resample to reduce sampling bias.
  - Estimation error is reduced as the sample size increases.
- Estimate join cardinality using cutoff joins.
  - Push a random sample of N tuples into each vertex.
  - The join is cut off when N tuples are output (handles overflow).
  - The output cardinality estimate can underflow (raise N in the next round).
- Join paths are pruned when dominated by another path for the same vertices.
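A cutoff join as described on slide 7 can be sketched as follows. This is a simplified illustration under stated assumptions: `probe` is a hypothetical callable that returns the matches for one input solution, the outputs of the last input consumed are counted in full, and underflow is signalled by returning `None` so the caller can raise N in the next round.

```python
def cutoff_join(sample, probe, limit):
    """Join sampled inputs until at least `limit` outputs were produced.

    Returns (inputs consumed, outputs produced).
    """
    n_in = n_out = 0
    for solution in sample:
        n_in += 1
        n_out += len(probe(solution))
        if n_out >= limit:  # cutoff: enough output to estimate from
            break
    return n_in, n_out


def estimate_cardinality(src_card, sample, probe, limit):
    """Scale the source cardinality by the observed join hit ratio."""
    n_in, n_out = cutoff_join(sample, probe, limit)
    if n_out == 0:
        return None  # underflow: no evidence, retry with a larger sample
    return src_card * n_out / n_in


# Hypothetical access path: input value -> matching tuples.
index = {1: ["a", "b"], 2: [], 3: ["c", "d", "e"], 4: ["f"]}
estimate = estimate_cardinality(300, [1, 2, 3, 4], lambda k: index[k], limit=4)
```

Here the join is cut off after three inputs have produced five outputs, so the hit ratio is 5/3 and the estimate for a source cardinality of 300 is 500.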
8 Round
Join graph annotated with range counts. Candidate paths (sumestcard values not preserved): [1 4], [4 2]
9 Round
Candidate paths (sumestcard values not preserved): [1 4 2], [1 4 3], [1 4 5], [4 2 1], [4 2 3], [4 2 5]
10 Round
Eight candidate paths (vertices and sumestcard values not preserved).
11 Round
Five candidate paths (vertices and sumestcard values not preserved).
12 Round
One remaining path (vertices and sumestcard value not preserved).
13 Selected join path
Table columns: vertex, srccard * join hit ratio ( in, limit, out ) = sumestcard
Most numeric cells are not preserved. Surviving join hit ratios, in row order: 2.97, 6.62, 0.02, 0.64, 1.
The selected path is faster than the join path chosen by the static join order optimizer (the percentage is not preserved).
- vertex: the vertices in the join path, in the selected evaluation order.
- srccard: the estimated input cardinality to each join (E means Exact).
- ratio: the estimated join hit ratio for each join.
- in: the number of input solutions for each cutoff join.
- limit: the sample size for each cutoff join.
- out: the number of output solutions for each cutoff join.
- sumestcard: the cumulative estimated cardinality of the join path.
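The relation between srccard, ratio and sumestcard on slide 13 can be worked through numerically. The sketch assumes that each join's estimated output becomes the srccard of the next join and that sumestcard is the running sum of those outputs; the starting cardinality of 540 is a hypothetical value picked for illustration, while the ratios are the ones that survive on the slide.

```python
def cumulative_estimated_cardinality(start_card, hit_ratios):
    """Sum the estimated intermediate cardinalities along a join path."""
    card = start_card
    total = 0.0
    steps = []
    for ratio in hit_ratios:
        card = card * ratio       # estimated output of this join
        total += card             # cost = cumulative intermediate cardinality
        steps.append((ratio, card, total))
    return total, steps


# Ratios as they survive on the slide; 540 is an assumed starting cardinality.
total, steps = cumulative_estimated_cardinality(540, [2.97, 6.62, 0.02, 0.64, 1])
for ratio, card, running in steps:
    print(f"ratio={ratio:<5} estcard={card:12.2f} sumestcard={running:12.2f}")
```

Under these assumptions a selective join (ratio 0.02) late in the path does little to lower the total, because the expensive intermediate results have already been paid for. That is why the optimizer compares whole paths rather than single joins.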
14 Sub-Groups, Sub-Select, etc.
- Vectored left-to-right evaluation.
- Solutions flowing into a group depend on the join order in the parent.
    SELECT ?x ?y ?z
    WHERE {
      ?y a ub:Faculty .           # v1
      ?z a ub:Course .            # v2
      ?x ub:advisor ?y .          # v3
      ?y ub:teacherOf ?z .        # v4
      {
        ?x a ub:Student .         # v0
        ?x ub:takesCourse ?z .    # v5
      }
    }
- Flatten into the join graph.
- Path extensions MUST respect SPARQL group boundaries.
- Sample join path with groups: 1 => 4 => 3 { => 5 => 0 =>
15 Bigdata Vectored Query Engine
16 Bigdata Query Engine
- Vectored query engine
  - High concurrency and vectored operator evaluation.
- Physical query plan (BOPs)
  - Supports pipelined, blocked, and at-once operators.
  - Chunks queue up for each operator.
  - Chunks are transparently mapped across a cluster.
- A query engine instance runs on:
  - Each data service; plus
  - Each node providing a SPARQL endpoint.
17 Query Hints
- Virtual triples
  - Scope to query, graph pattern group, or join.
- Allow fine-grained control over
  - Analytic query mode
  - Evaluation order
  - Vector size
  - Etc.
    SELECT ?x ?y ?z
    WHERE {
      hint:query hint:vectorsize ... .
      ?x a ub:GraduateStudent .
      ?y a ub:University .
      ?z a ub:Department .
      ?x ub:memberOf ?z .
      ?z ub:subOrganizationOf ?y .
      ?x ub:undergraduateDegreeFrom ?y .
    }
18 New Bigdata Join Operators
- Pipeline joins (fast, incremental).
  - Map binding sets over the shards, executing joins close to the data.
  - Faster for a single machine and much faster for distributed query.
  - First results appear with very low latency.
- Hash joins against an access path
  - Scan the access path, probing the hash table for joins.
  - Hash joins on a cluster can saturate the disk transfer rate!
- Solution set hash joins
  - Combine sets of solutions on shared join variables.
  - Used for Sub-Select and Group Graph Patterns (aka join groups).
- Merge joins
  - Used when a series of group graph patterns share a common join variable.
  - Think OPTIONALs.
  - Java: must sort the solution sets first, then a single pass to do the join.
  - HTree: linear in the data (it does not need to sort the solutions).
19 Query Evaluation
- Logical Query Model: SPARQL Parser -> AST -> AST Optimizers -> AST
- Physical Query Model: Query Plan Generator -> BOPs -> Vectored Query Engine -> Solutions
- Database services used along the way: Resolve Terms, Fast Range Counts, Access Paths
20 Data Flow Evaluation
- Left-to-right evaluation
  - The physical query plan is an operator sequence (not a tree): op1 -> op2 -> op3
- Parallelism
  - Within operator
  - Across operators
  - Across queries
- Operator annotations are used extensively.
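The idea of a physical plan as a left-to-right operator sequence over chunks of solutions can be sketched with generators. This is a single-threaded illustration only; the helpers `chunked`, `filter_op`, `map_op` and `run_plan` are hypothetical and do not correspond to Bigdata's BOP classes, and no parallelism is modelled.

```python
def chunked(solutions, size):
    """Split a list of solutions into chunks of at most `size`."""
    for i in range(0, len(solutions), size):
        yield solutions[i:i + size]


def filter_op(pred):
    """Operator that keeps the solutions satisfying `pred`."""
    def op(chunks):
        for chunk in chunks:
            out = [s for s in chunk if pred(s)]
            if out:  # do not forward empty chunks
                yield out
    return op


def map_op(fn):
    """Operator that rewrites each solution with `fn`."""
    def op(chunks):
        for chunk in chunks:
            yield [fn(s) for s in chunk]
    return op


def run_plan(operators, chunks):
    """Chain the operators left to right; chunks stream through lazily."""
    for op in operators:
        chunks = op(chunks)
    return chunks


plan = [filter_op(lambda s: s["x"] % 2 == 0),
        map_op(lambda s: {**s, "y": s["x"] * 10})]
source = [{"x": i} for i in range(6)]
result = [s for chunk in run_plan(plan, chunked(source, 2)) for s in chunk]
```

Because each operator consumes and produces whole chunks, per-solution overhead is amortized, and the first chunk can reach the end of the plan before later chunks have been read.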
21 Pipeline Joins
- Fast, vectored, indexed join.
- Very low latency to first solutions.
- Mapped across shards on a cluster.
  - Joins execute close to the data.
- Faster for a single machine.
- Much faster for distributed query.
22 Preparing a query
Original query:
    SELECT ?x WHERE {
      ?x a ub:GraduateStudent ;
         ub:takesCourse < GraduateCourse0> .
    }
Translated query:
    query :- (x 8 256) ^ (x )
Query execution plan (access paths selected, joins reordered):
    query :- pos(x ) ^ spo(x 8 256)
23 Pipeline Join Execution
24 TEE
- Solutions flow to both the default and alt sinks.
- UNION(A,B): the TEE feeds A through the default sink and B through the alt sink.
- Generalizes to n-ary UNION.
25 Conditional Flow
- Data flows to either the default sink or the alt sink, depending on the condition.
- The branching pattern depends on the query logic.
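The two routing patterns, TEE and conditional flow, differ only in whether a solution is copied to both sinks or sent to exactly one. The sketch below models sinks as plain lists; the function names are hypothetical, and routing the "yes" branch to the default sink is an assumption because the diagram does not survive.

```python
def tee(solutions, default_sink, alt_sink):
    """TEE: every solution flows to both sinks (the basis of UNION(A, B))."""
    for s in solutions:
        default_sink.append(s)
        alt_sink.append(s)


def conditional(solutions, cond, default_sink, alt_sink):
    """Conditional flow: each solution goes to exactly one sink."""
    for s in solutions:
        (default_sink if cond(s) else alt_sink).append(s)


solutions = [{"x": 1}, {"x": 2}, {"x": 3}]

branch_a, branch_b = [], []
tee(solutions, branch_a, branch_b)

matched, unmatched = [], []
conditional(solutions, lambda s: s["x"] > 1, matched, unmatched)
```

After the TEE both branches hold all three solutions, while the conditional split partitions them into two disjoint groups.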
26 Sub-Group, Sub-Select, etc.
- Build a hash index from the incoming solutions (at-once).
- Run the sub-select, sub-group, etc.
  - Solutions from the hash index are vectored into the sub-plan.
  - Sub-Select must hide variables which are not projected.
- Join against the hash index (at-once).
- Operators: HashIndexOp (build index) -> group -> SolutionSetHashJoinOp (against the hash index)
- Join modes: Normal, Optional, Exists, Not Exists
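The build-then-join pattern on slide 26, with its four join modes, can be sketched in a few lines. This is an illustration rather than the behavior of HashIndexOp or SolutionSetHashJoinOp: the function name and mode strings are hypothetical, every join variable is assumed to be bound on both sides, and non-join variables are assumed not to conflict.

```python
from collections import defaultdict


def hash_index_join(incoming, sub_solutions, join_vars, mode="normal"):
    """Join sub-plan solutions against a hash index of incoming solutions."""
    def key(solution):
        return tuple(solution[v] for v in join_vars)

    # Build phase (at-once): index the incoming solutions by join key.
    index = defaultdict(list)
    for i, s in enumerate(incoming):
        index[key(s)].append(i)

    # Probe phase: remember which incoming solutions found a partner.
    matched, joined = set(), []
    for r in sub_solutions:
        for i in index.get(key(r), ()):
            matched.add(i)
            joined.append({**incoming[i], **r})

    if mode == "normal":
        return joined
    if mode == "optional":
        return joined + [s for i, s in enumerate(incoming) if i not in matched]
    if mode == "exists":
        return [s for i, s in enumerate(incoming) if i in matched]
    if mode == "not_exists":
        return [s for i, s in enumerate(incoming) if i not in matched]
    raise ValueError(f"unknown join mode: {mode}")


incoming = [{"x": 1}, {"x": 2}]
sub = [{"x": 1, "y": "a"}, {"x": 1, "y": "b"}]
optional_rows = hash_index_join(incoming, sub, ["x"], mode="optional")
```

The four modes share the build and probe phases and differ only in which solutions are emitted at the end, which is why one operator can serve sub-groups, OPTIONAL, EXISTS and NOT EXISTS.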
27 Named Solution Sets
- Support solution set reuse.
- Evaluation
  - Run before the main WHERE clause.
  - Exogenous solutions are fed into the named subquery.
  - Results are written on a hash index.
  - The hash index is INCLUDEd into the query.
- Example is BSBM BI Q5:
    SELECT ?country ?product ?nrOfReviews ?avgPrice
    WITH {
      SELECT ?country ?product (count(?review) AS ?nrOfReviews)
      {
        ?product a < v01/instances/producttype4> .
        ?review bsbm:reviewFor ?product .
        ?review rev:reviewer ?reviewer .
        ?reviewer bsbm:country ?country .
      }
      GROUP BY ?country ?product
    } AS %namedset1
    WHERE {
      { SELECT ?country (max(?nrOfReviews) AS ?maxReviews)
        { INCLUDE %namedset1 }
        GROUP BY ?country
      }
      { SELECT ?product (avg(xsd:float(str(?price))) AS ?avgPrice)
        {
          ?product a < v01/instances/producttype4> .
          ?offer bsbm:product ?product .
          ?offer bsbm:price ?price .
        }
        GROUP BY ?product
      }
      INCLUDE %namedset1 .
      FILTER(?nrOfReviews=?maxReviews)
    }
    ORDER BY desc(?nrOfReviews) ?country ?product
28 Named Subquery Evaluation
- Run the sub-query on the query engine.
- Output is written onto a hash index or a stream.
- A hash index is a good choice if the join variables can be identified in advance, but that is often not possible since the positions of the INCLUDE(s) for that solution set in the rest of the query are not yet decided.
- Flow: Sub-Query -> Sub-Plan -> Query Engine -> SOLUTIONS -> Hash Index or Stream -> INCLUDE
- An INCLUDE can be evaluated by:
  - SolutionSetHashJoinOp
  - HashJoinOp (hash join against an AP scan)
  - Nested loop scan (joining the named solution set against exogenous solutions with low cardinality)
29 Bottom Up Evaluation
- SPARQL semantics are specified as if evaluation were bottom up.
- For many queries, left-to-right evaluation produces the same results.
- Badly designed left joins must be lifted out and run first.
    SELECT * {
      :x1 :p ?v .
      OPTIONAL {
        :x3 :q ?w .
        OPTIONAL { :x2 :p ?v }
      }
    }
  is rewritten as
    SELECT *
    WITH {
      SELECT ?w ?v {
        :x3 :q ?w .
        OPTIONAL { :x2 :p ?v }
      }
    } AS %namedset1
    WHERE {
      :x1 :p ?v .
      OPTIONAL { INCLUDE %namedset1 }
    }
30 Merge Join Pattern
- High performance n-way join
  - Linear in the data (on HTree).
  - The series of group graph patterns must share a common join variable.
- Common pattern in SPARQL
  - Select something and then fill in optional information about it through optional join groups.
    SELECT (SAMPLE(?var9) AS ?var1) ?var2 ?var3
    {
      { SELECT DISTINCT ?var3
        WHERE {
          ?var3 rdf:type pol:politician .
          ?var3 pol:hasrole ?var6 .
          ?var6 pol:party "Democrat" .
        }
      }
      OPTIONAL { ?var3 p1:name ?var9 }
      OPTIONAL {
        ?var10 p2:votedby ?var3 .
        ?var10 rdfs:label ?var2 .
      }
      ...
    }
    GROUP BY ?var2 ?var3
31 Merge Joins (cont.)
The query from the previous slide (a sub-select followed by two OPTIONAL groups that all bind ?var3) is rewritten as:
    SELECT ...
    WITH { ... } AS %set1
    WITH { ... } AS %set2
    WITH { ... } AS %set3
    WHERE {
      INCLUDE %set1 .
      OPTIONAL { INCLUDE %set2 . }
      OPTIONAL { INCLUDE %set3 . }
    }
- Handled as one n-way merge join.
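An n-way merge join with OPTIONAL semantics on one shared variable can be sketched as below. This follows the sort-then-single-pass variant: every input is assumed to be already sorted on the join variable, the other variables are assumed not to conflict, and the function name is hypothetical. It is not the HTree-based implementation.

```python
def merge_join_optional(required, optionals, var):
    """Join `required` with each optional set on `var` in one forward pass."""
    out = []
    pos = [0] * len(optionals)  # one cursor per optional input
    i = 0
    while i < len(required):
        k = required[i][var]
        j = i
        while j < len(required) and required[j][var] == k:
            j += 1
        rows = required[i:j]  # all required solutions with key k
        for n, opt in enumerate(optionals):
            p = pos[n]
            while p < len(opt) and opt[p][var] < k:
                p += 1  # skip optional solutions with smaller keys
            q = p
            while q < len(opt) and opt[q][var] == k:
                q += 1
            pos[n] = q
            group = opt[p:q]
            if group:  # OPTIONAL: keep rows unchanged when nothing matches
                rows = [{**a, **b} for a in rows for b in group]
        out.extend(rows)
        i = j
    return out


set1 = [{"s": 1}, {"s": 2}, {"s": 3}]
set2 = [{"s": 1, "name": "a"}, {"s": 3, "name": "c"}]
set3 = [{"s": 2, "label": "v"}, {"s": 3, "label": "w"}, {"s": 3, "label": "x"}]
merged = merge_join_optional(set1, [set2, set3], "s")
```

Each cursor only moves forward, so every input is read once regardless of how many optional sets take part, which is what makes the n-way form attractive compared with a chain of pairwise joins.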
32 Parallel Graph Query Plans
- Partly ordered scans
  - Parallel AP reads against multiple shards.
- Hash partitioned
  - Distinct
  - Order by
  - Aggregation (when not pipelined)
- Default Graph Queries
  - The global index view provides RDF Merge semantics.
  - DISTINCT TRIPLES across the set of contexts in the default graph.
33 Parallel Hash Joins for Unselective APs
- The AP must read a lot of data with reasonable locality.
  - Shard view: journal + index segment(s).
  - One IO per leaf on segments.
  - 90%+ of the data is in key order on disk.
  - Narrow stride (e.g., OSP or POS).
- Crossover once sequential IO dominates the vectored nested index join.
  - Key range scans turn into sequential IO in the cluster.
  - The goal is to saturate the disk read bandwidth.
- Strategy A
  - Map intermediate solutions against the target shards.
  - Build a hash index on the target nodes.
  - Parallel local AP scans, probing the hash index for joins.
- Strategy B
  - Hash partition the intermediate solutions on the join variables.
  - Parallel local AP scans against the spanned shard(s).
  - Vectored hash partition of the solutions read from the APs.
  - Vectored probe in the distributed hash index partitions for joins.
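The hash partitioning step in Strategy B can be sketched as follows. The example assumes a fixed number of partitions and uses CRC32 of the join key so that every node computes the same partition for the same key; both choices, and the function names, are illustrative assumptions rather than Bigdata's scheme.

```python
import zlib


def partition_of(solution, join_vars, n_partitions):
    """Pick a partition from the values bound to the join variables."""
    key = repr(tuple(solution[v] for v in join_vars)).encode("utf-8")
    return zlib.crc32(key) % n_partitions


def hash_partition(solutions, join_vars, n_partitions):
    """Distribute solutions so that equal join keys share a partition."""
    parts = [[] for _ in range(n_partitions)]
    for s in solutions:
        parts[partition_of(s, join_vars, n_partitions)].append(s)
    return parts


intermediate = [{"y": "prof1", "x": "s1"}, {"y": "prof2", "x": "s2"},
                {"y": "prof1", "x": "s3"}]
from_access_path = [{"y": "prof1", "z": "c1"}, {"y": "prof2", "z": "c2"}]

left_parts = hash_partition(intermediate, ["y"], 4)
right_parts = hash_partition(from_access_path, ["y"], 4)
```

Because both sides are partitioned by the same function on the same variables, solutions that can join always land in the same partition, so each partition can be joined locally and in parallel.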
34 Examining a Query Plan
35 NSS Explain
- SPARQL Query: as given.
- SPARQL Parse Tree: as parsed.
- Original AST: generated abstract operator model.
- Optimized AST: optimized abstract operator model.
- Query Plan: physical query plan.
- Query Evaluation Statistics: as executed.
36 Query Evaluation Statistics
- As-run or live
  - Explain provides the as-run view.
  - A live view is available by monitoring long-running queries.
- Summary statistics
  - Example: solutions=128, chunks=52, subqueries=0, elapsed=2467ms.
  - solutions: number of generated solutions.
  - chunks: number of chunks in the output solutions.
  - subqueries: number of subqueries evaluated (each in their own table).
  - elapsed: total elapsed clock time for this query.
- Followed by a detailed table of per-operator statistics.
  - For the top-level query.
  - For each sub-query.
37 Detailed Operator Statistics
(Many columns are for the cluster.)
- queryid: query or subquery identifier.
- evalorder: total, then one per operator in order.
- bopid, predid: index of the operator or predicate in the plan.
- bopsummary: short summary for each operator.
- predsummary: short summary for the access path (AP).
- fastrangecount: estimated cardinality for the AP.
38 Units & Chunks In & Out
- unitsin: number of solutions flowing into an operator.
- chunksin: number of chunks of solutions flowing into an operator.
- unitsout: number of solutions flowing out of an operator.
- chunksout: number of chunks of solutions flowing out of an operator.
- typeerrors: number of runtime SPARQL type errors (solution dropped).
- joinratio: ratio of solutions out to solutions in. Indicator of selectivity and cardinality explosion.
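The joinratio column is simple arithmetic over unitsin and unitsout, and it can be recomputed when scanning a statistics table for the operator responsible for a blow-up. The sketch below uses made-up per-operator numbers and hypothetical helper names.

```python
def join_ratio(units_in, units_out):
    """Ratio of solutions out to solutions in (0.0 when nothing flowed in)."""
    return units_out / units_in if units_in else 0.0


def describe(ratio):
    """Rough reading of a join ratio."""
    if ratio < 1:
        return "selective"
    if ratio > 1:
        return "expanding"
    return "neutral"


# Hypothetical per-operator rows: (bopid, unitsin, unitsout).
rows = [(1, 200, 50), (2, 50, 4000), (3, 4000, 4000)]
report = [(bop, join_ratio(i, o), describe(join_ratio(i, o))) for bop, i, o in rows]
```

In this made-up table operator 2 turns 50 solutions into 4000, a ratio of 80, which marks it as the place to look for a cardinality explosion.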
39 Query Engine Metrics
NSS / Ganglia
- Work Queues (one per operator): blockedworkqueuecount, blockedworkqueuerunningtotal
- Operators (many per query plan): operatoractivecount, operatorhaltcount, operatorstartcount, operatortasksperquery
- Queries: queriespersecond, querydonecount, queryerrorcount, querystartcount
Bigdata : Enabling the Semantic Web at Web Scale
Bigdata : Enabling the Semantic Web at Web Scale Presentation outline What is big data? Bigdata Architecture Bigdata RDF Database Performance Roadmap What is big data? Big data is a new way of thinking
bigdata SYSTAP, LLC 2006-2012 All Rights Reserved bigdata http://www.bigdata.com/blog Presented at Graph Data Management 2012
1 1 Presentation outline Bigdata and the Semantic Web Bigdata Architecture Indices and Dynamic Sharding Services, Discovery, and Dynamics Bulk Data Loader Bigdata RDF Database Provenance mode Vectored
Using RDBMS, NoSQL or Hadoop?
Using RDBMS, NoSQL or Hadoop? DOAG Conference 2015 Jean- Pierre Dijcks Big Data Product Management Server Technologies Copyright 2014 Oracle and/or its affiliates. All rights reserved. Data Ingest 2 Ingest
SQL Query Evaluation. Winter 2006-2007 Lecture 23
SQL Query Evaluation Winter 2006-2007 Lecture 23 SQL Query Processing Databases go through three steps: Parse SQL into an execution plan Optimize the execution plan Evaluate the optimized plan Execution
Data Stream Algorithms in Storm and R. Radek Maciaszek
Data Stream Algorithms in Storm and R Radek Maciaszek Who Am I? l Radek Maciaszek l l l l l l Consul9ng at DataMine Lab (www.dataminelab.com) - Data mining, business intelligence and data warehouse consultancy.
Oracle Database In- Memory Op4on in Ac4on
Oracle Database In- Memory Op4on in Ac4on Tanel Põder & Kerry Osborne Accenture Enkitec Group h4p:// 1 Tanel Põder Intro: About Consultant, Trainer, Troubleshooter Oracle Database Performance geek Exadata
The Sierra Clustered Database Engine, the technology at the heart of
A New Approach: Clustrix Sierra Database Engine The Sierra Clustered Database Engine, the technology at the heart of the Clustrix solution, is a shared-nothing environment that includes the Sierra Parallel
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, and Bhavani Thuraisingham University of Texas at Dallas, Dallas TX 75080, USA Abstract.
Chapter 13: Query Processing. Basic Steps in Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
Inside the PostgreSQL Query Optimizer
Inside the PostgreSQL Query Optimizer Neil Conway [email protected] Fujitsu Australia Software Technology PostgreSQL Query Optimizer Internals p. 1 Outline Introduction to query optimization Outline of
Comp 5311 Database Management Systems. 16. Review 2 (Physical Level)
Comp 5311 Database Management Systems 16. Review 2 (Physical Level) 1 Main Topics Indexing Join Algorithms Query Processing and Optimization Transactions and Concurrency Control 2 Indexing Used for faster
Data Management in the Cloud: Limitations and Opportunities. Annies Ductan
Data Management in the Cloud: Limitations and Opportunities Annies Ductan Discussion Outline: Introduc)on Overview Vision of Cloud Compu8ng Managing Data in The Cloud Cloud Characteris8cs Data Management
An Oracle White Paper May 2010. Guide for Developing High-Performance Database Applications
An Oracle White Paper May 2010 Guide for Developing High-Performance Database Applications Introduction The Oracle database has been engineered to provide very high performance and scale to thousands
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
Lecture 1: Data Storage & Index
Lecture 1: Data Storage & Index R&G Chapter 8-11 Concurrency control Query Execution and Optimization Relational Operators File & Access Methods Buffer Management Disk Space Management Recovery Manager
SQL Server Query Tuning
SQL Server Query Tuning Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author Pro SQL Server
BBM467 Data Intensive ApplicaAons
Hace7epe Üniversitesi Bilgisayar Mühendisliği Bölümü BBM467 Data Intensive ApplicaAons Dr. Fuat Akal [email protected] FoundaAons of Data[base] Clusters Database Clusters Hardware Architectures Data
Deliverable 2.1.4. 150 Billion Triple dataset hosted on the LOD2 Knowledge Store Cluster. LOD2 Creating Knowledge out of Interlinked Data
Collaborative Project LOD2 Creating Knowledge out of Interlinked Data Project Number: 257943 Start Date of Project: 01/09/2010 Duration: 48 months Deliverable 2.1.4 150 Billion Triple dataset hosted on
Advanced Oracle SQL Tuning
Advanced Oracle SQL Tuning Seminar content technical details 1) Understanding Execution Plans In this part you will learn how exactly Oracle executes SQL execution plans. Instead of describing on PowerPoint
Graph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
OS/Run'me and Execu'on Time Produc'vity
OS/Run'me and Execu'on Time Produc'vity Ron Brightwell, Technical Manager Scalable System SoAware Department Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation,
Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing. October 29th, 2015
E6893 Big Data Analytics Lecture 8: Spark Streams and Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science IBM Chief Scientist, Graph Computing
Performance Management in Big Data Applica6ons. Michael Kopp, Technology Strategist @mikopp
Performance Management in Big Data Applica6ons Michael Kopp, Technology Strategist NoSQL: High Volume/Low Latency DBs Web Java Key Challenges 1) Even Distribu6on 2) Correct Schema and Access paperns 3)
Query Performance Tuning: Start to Finish. Grant Fritchey
Query Performance Tuning: Start to Finish Grant Fritchey Who? Product Evangelist for Red Gate Software Microsoft SQL Server MVP PASS Chapter President Author: SQL Server Execution Plans SQL Server 2008
Mining of Massive Datasets Jure Leskovec, Anand Rajaraman, Jeff Ullman Stanford University http://www.mmds.org
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit
Understanding Query Processing and Query Plans in SQL Server. Craig Freedman Software Design Engineer Microsoft SQL Server
Understanding Query Processing and Query Plans in SQL Server Craig Freedman Software Design Engineer Microsoft SQL Server Outline SQL Server engine architecture Query execution overview Showplan Common
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs Sherif Sakr National ICT Australia UNSW, Sydney, Australia [email protected] Sameh Elnikety Microsoft Research Redmond, WA, USA [email protected]
DB2 for i5/os: Tuning for Performance
DB2 for i5/os: Tuning for Performance Jackie Jansen Senior Consulting IT Specialist [email protected] August 2007 Agenda Query Optimization Index Design Materialized Query Tables Parallel Processing Optimization
In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR
1 2 2 3 In this session, we use the table ZZTELE with approx. 115,000 records for the examples. The primary key is defined on the columns NAME,VORNAME,STR The uniqueness of the primary key ensures that
SQL Performance for a Big Data 22 Billion row data warehouse
SQL Performance for a Big Data Billion row data warehouse Dave Beulke dave @ d a v e b e u l k e.com Dave Beulke & Associates Session: F19 Friday May 8, 15 8: 9: Platform: z/os D a v e @ d a v e b e u
SQL Query Performance Tuning: Tips and Best Practices
SQL Query Performance Tuning: Tips and Best Practices Pravasini Priyanka, Principal Test Engineer, Progress Software INTRODUCTION: In present day world, where dozens of complex queries are run on databases
Introduc)on to urika. Mul)threading. SPARQL Database. urika Appliance. XMT- 2 Programming. Use Cases
1 Introduc)on to urika Mul)threading SPARQL Database urika Appliance XMT- 2 Programming Use Cases 2 MTA- 1 1998 Gallium arsenide: Proof of concept First produc,on implementa,on of latency- tolerant mul,threading
Data Warehousing und Data Mining
Data Warehousing und Data Mining Multidimensionale Indexstrukturen Ulf Leser Wissensmanagement in der Bioinformatik Content of this Lecture Multidimensional Indexing Grid-Files Kd-trees Ulf Leser: Data
Introduction. Bigdata Database Architecture
Introduction Bigdata is a standards-based, high-performance, scalable, open-source graph database. Written entirely in Java, the platform supports the SPARQL 1.1 family of specifications, including Query,
BigData. An Overview of Several Approaches. David Mera 16/12/2013. Masaryk University Brno, Czech Republic
BigData An Overview of Several Approaches David Mera Masaryk University Brno, Czech Republic 16/12/2013 Table of Contents 1 Introduction 2 Terminology 3 Approaches focused on batch data processing MapReduce-Hadoop
Why Query Optimization? Access Path Selection in a Relational Database Management System. How to come up with the right query plan?
Why Query Optimization? Access Path Selection in a Relational Database Management System P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, T. Price Peyman Talebifard Queries must be executed and execution
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering
HadoopSPARQL : A Hadoop-based Engine for Multiple SPARQL Query Answering Chang Liu 1 Jun Qu 1 Guilin Qi 2 Haofen Wang 1 Yong Yu 1 1 Shanghai Jiaotong University, China {liuchang,qujun51319, whfcarter,yyu}@apex.sjtu.edu.cn
D B M G Data Base and Data Mining Group of Politecnico di Torino
Database Management Data Base and Data Mining Group of [email protected] A.A. 2014-2015 Optimizer objective A SQL statement can be executed in many different ways The query optimizer determines
Elena Baralis, Silvia Chiusano Politecnico di Torino. Pag. 1. Physical Design. Phases of database design. Physical design: Inputs.
Phases of database design Application requirements Conceptual design Database Management Systems Conceptual schema Logical design ER or UML Physical Design Relational tables Logical schema Physical design
Content Distribu-on Networks (CDNs)
Content Distribu-on Networks (CDNs) Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:0am in Architecture N101 hjp://www.cs.princeton.edu/courses/archive/spr12/cos461/ Second Half of the Course
Performance Comparison of Intel Enterprise Edition for Lustre* software and HDFS for MapReduce Applications
Performance Comparison of Intel Enterprise Edition for Lustre software and HDFS for MapReduce Applications Rekha Singhal, Gabriele Pacciucci and Mukesh Gangadhar 2 Hadoop Introduc-on Open source MapReduce
Introducing Cost Based Optimizer to Apache Hive
Introducing Cost Based Optimizer to Apache Hive John Pullokkaran Hortonworks 10/08/2013 Abstract Apache Hadoop is a framework for the distributed processing of large data sets using clusters of computers
Big Data Processing with Google s MapReduce. Alexandru Costan
1 Big Data Processing with Google s MapReduce Alexandru Costan Outline Motivation MapReduce programming model Examples MapReduce system architecture Limitations Extensions 2 Motivation Big Data @Google:
! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I)
! E6893 Big Data Analytics Lecture 9:! Linked Big Data Graph Computing (I) Ching-Yung Lin, Ph.D. Adjunct Professor, Dept. of Electrical Engineering and Computer Science Mgr., Dept. of Network Science and
Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc
Big Data, Fast Data, Complex Data Jans Aasman Franz Inc Private, founded 1984 AI, Semantic Technology, professional services Now in Oakland Franz Inc Who We Are (1 (2 3) (4 5) (6 7) (8 9) (10 11) (12
Extending the Linked Data API with RDFa
Extending the Linked Data API with RDFa Steve Battle 1, James Leigh 2, David Wood 2 1 Gloze Ltd, UK [email protected] 2 3 Round Stones, USA James, [email protected] Linked data is about connecting
MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
Architec;ng Splunk for High Availability and Disaster Recovery
Copyright 2014 Splunk Inc. Architec;ng Splunk for High Availability and Disaster Recovery Dritan Bi;ncka BD Solu;on Architecture Disclaimer During the course of this presenta;on, we may make forward- looking
How To Use A Webmail On A Pc Or Macodeo.Com
Big data workloads and real-world data sets Gang Lu Institute of Computing Technology, Chinese Academy of Sciences BigDataBench Tutorial MICRO 2014 Cambridge, UK INSTITUTE OF COMPUTING TECHNOLOGY 1 Five
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC
Big Data, Fast Processing Speeds Kevin McGowan SAS Solutions on Demand, Cary NC ABSTRACT As data sets continue to grow, it is important for programs to be written very efficiently to make sure no time
A Shared-nothing cluster system: Postgres-XC
Welcome A Shared-nothing cluster system: Postgres-XC - Amit Khandekar Agenda Postgres-XC Configuration Shared-nothing architecture applied to Postgres-XC Supported functionalities: Present and Future Configuration
Integrity Checking and Monitoring of Files on the CASTOR Disk Servers
Integrity Checking and Monitoring of Files on the CASTOR Disk Servers Author: Hallgeir Lien CERN openlab 17/8/2011 Contents CONTENTS 1 Introduction 4 1.1 Background...........................................
SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24. Data Federation Administration Tool Guide
SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package 7 2015-11-24 Data Federation Administration Tool Guide Content 1 What's new in the.... 5 2 Introduction to administration
Understanding SQL Server Execution Plans. Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner
Understanding SQL Server Execution Plans Klaus Aschenbrenner Independent SQL Server Consultant SQLpassion.at Twitter: @Aschenbrenner About me Independent SQL Server Consultant International Speaker, Author
Oracle EXAM - 1Z0-117. Oracle Database 11g Release 2: SQL Tuning. Buy Full Product. http://www.examskey.com/1z0-117.html
Oracle EXAM - 1Z0-117 Oracle Database 11g Release 2: SQL Tuning Buy Full Product http://www.examskey.com/1z0-117.html Examskey Oracle 1Z0-117 exam demo product is here for you to test the quality of the
Optimizing Performance. Training Division New Delhi
Optimizing Performance Training Division New Delhi Performance tuning : Goals Minimize the response time for each query Maximize the throughput of the entire database server by minimizing network traffic,
MyOra 3.0. User Guide. SQL Tool for Oracle. Jayam Systems, LLC
MyOra 3.0 SQL Tool for Oracle User Guide Jayam Systems, LLC Contents Features... 4 Connecting to the Database... 5 Login... 5 Login History... 6 Connection Indicator... 6 Closing the Connection... 7 SQL
Efficient Data Access and Data Integration Using Information Objects Mica J. Block
Efficient Data Access and Data Integration Using Information Objects Mica J. Block Director, ACES Actuate Corporation [email protected] Agenda Information Objects Overview Best practices Modeling Security
1Z0-117 Oracle Database 11g Release 2: SQL Tuning. Oracle
1Z0-117 Oracle Database 11g Release 2: SQL Tuning Oracle To purchase Full version of Practice exam click below; http://www.certshome.com/1z0-117-practice-test.html FOR Oracle 1Z0-117 Exam Candidates We
SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK
3/2/2011 SWISSBOX REVISITING THE DATA PROCESSING SOFTWARE STACK Systems Group Dept. of Computer Science ETH Zürich, Switzerland SwissBox Humboldt University Dec. 2010 Systems Group = www.systems.ethz.ch
Introduc8on to Apache Spark
Introduc8on to Apache Spark Jordan Volz, Systems Engineer @ Cloudera 1 Analyzing Data on Large Data Sets Python, R, etc. are popular tools among data scien8sts/analysts, sta8s8cians, etc. Why are these
Spark in Action. Fast Big Data Analytics using Scala. Matei Zaharia. www.spark- project.org. University of California, Berkeley UC BERKELEY
Spark in Action Fast Big Data Analytics using Scala Matei Zaharia University of California, Berkeley www.spark- project.org UC BERKELEY My Background Grad student in the AMP Lab at UC Berkeley» 50- person
Physical Database Design and Tuning
Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence
Best Practices for DB2 on z/os Performance
Best Practices for DB2 on z/os Performance A Guideline to Achieving Best Performance with DB2 Susan Lawson and Dan Luksetich www.db2expert.com and BMC Software September 2008 www.bmc.com Contacting BMC
1. Physical Database Design in Relational Databases (1)
Chapter 20 Physical Database Design and Tuning Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley 1. Physical Database Design in Relational Databases (1) Factors that Influence
Oracle Spatial and Graph. Jayant Sharma Director, Product Management
Oracle Spatial and Graph Jayant Sharma Director, Product Management Agenda Oracle Spatial and Graph Graph Capabilities Q&A 2 Oracle Spatial and Graph Complete Open Integrated Most Widely Used 3 Open and
SQream Technologies Ltd - Confidential
SQream Technologies Ltd - Confidential 1 Getting Big Data Done On a GPU-Based Database Ori Netzer VP Product 26-Mar-14 Analytics Performance - 3 TB, 18 Billion records SQream Database 400x More Cost Efficient!
Datenbanksysteme II: Implementation of Database Systems Implementing Joins
Datenbanksysteme II: Implementation of Database Systems Implementing Joins Material by Prof. Johann Christoph Freytag Prof. Kai-Uwe Sattler Prof. Alfons Kemper, Dr. Eickler Prof. Hector Garcia-Molina
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at
Lecture 3: Scaling by Load Balancing 1. Comments on reviews i. 2. Topic 1: Scalability a. QUESTION: What are problems? i. These papers look at distributing load b. QUESTION: What is the context? i. How
HBase Schema Design. NoSQL Matters, Cologne, April 2013. Lars George Director EMEA Services
HBase Schema Design NoSQL Matters, Cologne, April 2013 Lars George Director EMEA Services About Me Director EMEA Services @ Cloudera Consulting on Hadoop projects (everywhere) Apache Committer HBase and Whirr
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle
Direct NFS - Design considerations for next-gen NAS appliances optimized for database workloads Akshay Shah Gurmeet Goindi Oracle Agenda Introduction Database Architecture Direct NFS Client NFS Server
Binary search tree with SIMD bandwidth optimization using SSE
Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous
Nodes, Ties and Influence
Nodes, Ties and Influence Chapter 2 Chapter 2, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010. 1 IMPORTANCE OF NODES 2 Importance of Nodes Not
Semantic Web Standard in Cloud Computing
ETIC DEC 15-16, 2011 Chennai India International Journal of Soft Computing and Engineering (IJSCE) Semantic Web Standard in Cloud Computing Malini Siva, A. Poobalan Abstract - CLOUD computing is an emerging
low-level storage structures e.g. partitions underpinning the warehouse logical table structures
DATA WAREHOUSE PHYSICAL DESIGN The physical design of a data warehouse specifies the: low-level storage structures e.g. partitions underpinning the warehouse logical table structures low-level structures
Parallel Computing. Benson Muite. [email protected] http://math.ut.ee/ benson. https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage
Parallel Computing Benson Muite [email protected] http://math.ut.ee/ benson https://courses.cs.ut.ee/2014/paralleel/fall/main/homepage 3 November 2014 Hadoop, Review Hadoop Hadoop History Hadoop Framework
Driving MySQL to Big Data Scale. Thomas Hazel Founder, Chief Scientist [email protected]
Driving MySQL to Big Data Scale Thomas Hazel Founder, Chief Scientist [email protected] Millions to Billions to Trillions Agenda Driving MySQL to Big Data Scale Market Trends Hardware Trends Current Computer
BIG DATA AND INVESTIGATIVE ANALYTICS
The New Frontier BIG DATA AND INVESTIGATIVE ANALYTICS A Publication of Infobright Table of Contents Introduction 3 Chapter 1: What Is Investigative Analytics?. 4 Chapter 2: Top Five Requirements for Investigative
White Paper. Optimizing the Performance Of MySQL Cluster
White Paper Optimizing the Performance Of MySQL Cluster Table of Contents Introduction and Background Information... 2 Optimal Applications for MySQL Cluster... 3 Identifying the Performance Issues.....
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
Partitioning under the hood in MySQL 5.5
Partitioning under the hood in MySQL 5.5 Mattias Jonsson, Partitioning developer Mikael Ronström, Partitioning author Who are we? Mikael is a founder of the technology behind NDB
An Ant Colony Optimization Approach to the Software Release Planning Problem with Dependent Requirements
3rd International Symposium on Search Based Software Engineering September 10 12, Szeged, Hungary An Ant Colony Optimization Approach to the Software Release Planning Problem with Dependent Requirements
Project Group High-performance Flexible File System 2010 / 2011
Project Group High-performance Flexible File System 2010 / 2011 Lecture 1 File Systems André Brinkmann Task Use disk drives to store huge amounts of data Files as logical resources A file can contain
Data Management in the Cloud
Data Management in the Cloud Ryan Stern [email protected] : Advanced Topics in Distributed Systems Department of Computer Science Colorado State University Outline Today Microsoft Cloud SQL Server
Performance rule violations usually result in increased CPU or I/O, time to fix the mistake, and ultimately, a cost to the business unit.
Is your database application experiencing poor response time, scalability problems, and too many deadlocks or poor application performance? One or a combination of zparms, database design and application
bigdata Managing Scale in Ontological Systems
Managing Scale in Ontological Systems 1 This presentation offers a brief look at scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural
An Open Dynamic Big Data Driven Application System Toolkit
An Open Dynamic Big Data Driven Application System Toolkit Craig C. Douglas University of Wyoming and KAUST This research is supported in part by the National Science Foundation and King Abdullah University
Architectures for Big Data Analytics A database perspective
Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum
Data Mining. Supervised Methods. Ciro Donalek [email protected]. Ay/Bi 199ab: Methods of Computational Sciences http://esci101.blogspot.
Data Mining Supervised Methods Ciro Donalek [email protected] Supervised Methods Summary Artificial Neural Networks Multilayer Perceptron Support Vector Machines Softwares Supervised Models: Supervised
Big RDF Data Partitioning and Processing using hadoop in Cloud
Big RDF Data Partitioning and Processing using hadoop in Cloud Tejas Bharat Thorat Dept. of Computer Engineering MIT Academy of Engineering, Alandi, Pune, India Prof.Ranjana R.Badre Dept. of Computer Engineering
