OVERVIEW OF DATA EXPLORATION TECHNIQUES. Stratos Idreos, Olga Papaemmanouil, Surajit Chaudhuri SIGMOD 2015, Melbourne
2 USER INTERACTION
3 User Interface Layer: express interests, query/result recommendations, annotate, collaborate, visualize results, assisted query formulation
4 User Interface Layer
5 User Interface Layer: Data Visualization
6 User Interface Layer: Data Visualization, Exploration Interface
7 Data visualization. User Interface Layer: Data Visualization (visualization tools, visual optimizations, automatic visualization), Exploration Interface
9 Back in 1982: TIMBER, a sophisticated browser for relational DBs: a window-based browser for multiple relations/tuples (windows laid out from (1,1) to (M,1); the i-th tuple goes in window (1,i)), a rich query language for icon-oriented DBs, a visual editor of text objects, and a browser for geographical data. TIMBER, VLDB 82
10 Polaris: user-driven visualizations from visual specs; back-end queries over data cubes. Polaris, INFOVIS 2002
11 Polaris: visual specifications (attributes) drive the visualizations; back-end queries over data cubes: data selection, partition into panes. Polaris, INFOVIS 2002
12 Polaris: visual specifications (transformations: group by, sort); back-end queries: data transformations (group, sort, aggregate within each pane). Polaris, INFOVIS 2002
13 Polaris: visual specifications (mappings: shape, size, color); back-end queries: graphical transformations (render and visualize). Polaris, INFOVIS 2002
14 AstroShelf: Sky View for exploring sky objects/patterns; collaborative exploration through live annotations. AstroShelf, SIGMOD 12
15 Live Annotations: collaborative exploration; subscriptions to interesting objects; Sky View for exploring sky objects/patterns. AstroShelf, SIGMOD 12
16 Live Annotations: collaborative exploration; stream-based notifications; Sky View for exploring sky objects/patterns. AstroShelf, SIGMOD 12
18 Automatic visualization: the user requests views and reviews them (interesting? insightful?); finding the best visualization(s) is a manual, repetitive exploration. User Interface Layer: Data Visualization
19 VizDeck: auto-ranked visualizations from a model of good charts; saved decks/replay logs; search, select, promote, discard, save, share; filter across charts; recommend, rank. VizDeck, SIGMOD 12
20 Automatic visualizations: a user query yields candidate queries Q1, Q2, ..., Qn (aggregations/single-attribute group-by); utility = high deviation from the overall dataset; the visualization engine presents the informative queries (e.g., % sales/region, sales over time). SeeDB, PVLDB 13
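A minimal Python sketch of the deviation-based utility on slide 20, under stated assumptions: each candidate view is SUM(measure) with a single-attribute GROUP BY, the target subset (the rows the user query selects) and the overall dataset are each normalized into a distribution, and deviation is measured as L1 distance. SeeDB's own metric choices and optimizations may differ; the function names and the sales rows in the example are hypothetical.

```python
from collections import defaultdict


def normalized_groupby(rows, group_attr, measure):
    """SUM(measure) GROUP BY group_attr, normalized so the values sum to 1."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_attr]] += row[measure]
    grand = sum(totals.values())
    return {g: v / grand for g, v in totals.items()} if grand else {}


def deviation_utility(target_rows, all_rows, group_attr, measure):
    """L1 distance between the target's distribution and the whole dataset's."""
    p = normalized_groupby(target_rows, group_attr, measure)
    q = normalized_groupby(all_rows, group_attr, measure)
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def rank_views(target_rows, all_rows, candidate_views):
    """Score each (group_attr, measure) view and return them best first."""
    scored = [(deviation_utility(target_rows, all_rows, g, m), (g, m))
              for g, m in candidate_views]
    return sorted(scored, reverse=True)
```

Swapping in another distance (for example, earth mover's distance) only changes `deviation_utility`; the ranking loop stays the same.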
22 Resolution reduction: user query → SciDB → query results → visualization; expensive and ineffective on big data sets. Scalar, Big Data Vis 13
23 Resolution reduction: user query → data reduction → reduced results → visualization; SciDB runs modified query plans that filter/aggregate/sample at a given resolution. Scalar, Big Data Vis 13
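To make "filter/aggregate/sample at a given resolution" concrete, here is a hedged Python sketch that reduces an (x, y) result set to at most `resolution` points, either by averaging within equal-width x bins or by uniform sampling. It runs client-side on an in-memory list, whereas the slides describe the reduction being pushed into modified SciDB query plans; the function names are hypothetical.

```python
import random


def reduce_by_aggregation(points, resolution):
    """Average (x, y) points into at most `resolution` equal-width x-bins."""
    if len(points) <= resolution:
        return sorted(points)
    xs = [x for x, _ in points]
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / resolution or 1.0  # guard against all-equal x values
    bins = {}
    for x, y in points:
        idx = min(int((x - lo) / width), resolution - 1)  # clamp x == hi into last bin
        sx, sy, n = bins.get(idx, (0.0, 0.0, 0))
        bins[idx] = (sx + x, sy + y, n + 1)
    return [(sx / n, sy / n) for _, (sx, sy, n) in sorted(bins.items())]


def reduce_by_sampling(points, resolution, seed=0):
    """Keep a uniform random sample of at most `resolution` points."""
    if len(points) <= resolution:
        return list(points)
    return random.Random(seed).sample(points, resolution)
```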
24 Approximate visualizations: user query → sampling → reduced results → approximate chart. For SELECT X, AVG(Y) FROM R(X,Y) GROUP BY X, the approximate chart should keep the same group ordering as the original chart. Blais et al, PVLDB 15
25 Approximate visualizations: when the ordering is clear, fewer samples suffice. Blais et al, PVLDB 15
26 Approximate visualizations: correct order? If unsure, sample more. What is the minimum # of samples for the correct order? Blais et al, PVLDB 15
27 Approximate visualizations: sampling phases with per-group confidence intervals for SELECT X, AVG(Y) FROM R(X,Y) GROUP BY X:
#samples   Group 1       Group 2   Group 3       Group 4
1          [60,90]       [20,50]   [10,40]       [40,70]
20         [64,84]       [30,48]   [15,35]       [45,65]
21         [66,84], I    [30,48]   [17,35]       [46,64]
70         [66,84], I    [40,47]   [17,32], I    [46,53]
Blais et al, PVLDB 15
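The sampling phases on slide 27 can be mimicked with a small Python sketch in the spirit of the approach, not a reproduction of it: every still-active group draws one more sample per round, each group's mean gets a Hoeffding confidence interval (values are assumed to lie within a known `value_range`), and a group stops being sampled once its interval overlaps no other active group's interval. The union bound here covers groups but not repeated rounds, so the stated confidence is optimistic; the function names, parameters, and toy groups are assumptions.

```python
import math
import random


def hoeffding_halfwidth(n, value_range, delta):
    """Half-width of a Hoeffding confidence interval for a mean of n samples."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def order_groups_by_sampling(groups, value_range, delta=0.05, max_rounds=10_000, seed=0):
    """Sample active groups round by round until their intervals stop overlapping.

    groups maps a group name to the list of its values (its population).
    Returns (group names sorted by estimated mean, samples drawn per group).
    """
    rng = random.Random(seed)
    group_delta = delta / len(groups)  # union bound over groups only
    sums = {g: 0.0 for g in groups}
    counts = {g: 0 for g in groups}
    intervals = {}
    active = set(groups)
    for _ in range(max_rounds):
        if not active:
            break
        for g in sorted(active):  # sorted for reproducible random draws
            sums[g] += rng.choice(groups[g])  # sample with replacement
            counts[g] += 1
            mean = sums[g] / counts[g]
            h = hoeffding_halfwidth(counts[g], value_range, group_delta)
            intervals[g] = (mean - h, mean + h)
        # A group becomes inactive once its interval overlaps no other active group.
        done = {
            g for g in active
            if all(intervals[g][1] < intervals[o][0] or intervals[g][0] > intervals[o][1]
                   for o in active if o != g)
        }
        active -= done
    means = {g: sums[g] / counts[g] for g in groups}
    return sorted(groups, key=means.get), counts
```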
28 Visualization management: overlapping user queries, replicated DB operations, and memory operations on big data between the user query, the query results, and the visualization. Ermac, PVLDB 14
29 Visualization management: a DVMS turns visual specifications into logical visual plans → physical query plans, with transformations to pixel space and visual optimizations for reduced visual rendering time. Ermac, PVLDB 14
30 Exploration interfaces. User Interface Layer: Data Visualization, Exploration Interface (automatic exploration, assisted query formulation, novel query interfaces)
32 Manual vs automatic data exploration: manual SQL query formulation → query execution → result review → predicate adjustment is a long, imprecise, labor-intensive process
33 Manual vs automatic data exploration: automatic exploration captures user interests, optimizes query execution, reduces user effort, and recommends data/queries
35 Explore by example: the user labels samples as relevant/irrelevant; a user model (decision tree classifier) drives space exploration (sample extraction). Effectiveness vs efficiency: which sampling areas? what sampling size? AIDE, SIGMOD 14/VLDB 15
36 Explore by example: relevant areas to predict in the Attribute A × Attribute B space. AIDE, SIGMOD 14/VLDB 15
37 Explore by example: uniform sampling across the domain. AIDE, SIGMOD 14/VLDB 15
38 Explore by example: sampling around relevant objects to discover the relevant area; predicted relevant area. AIDE, SIGMOD 14/VLDB 15
39 Explore by example: sampling around boundaries; refined predicted relevant areas. AIDE, SIGMOD 14/VLDB 15
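A stdlib-only Python sketch of the first two exploration phases on slides 37 and 38, assuming a 2D numeric domain and a callback `is_relevant` that stands in for the user's labels. AIDE trains a decision tree classifier as the user model; to keep the example dependency-free, this sketch substitutes the bounding box of the relevant samples, and it omits the boundary-refinement phase of slide 39. All names and parameter values are hypothetical.

```python
import random


def uniform_samples(lo, hi, per_dim):
    """Grid of per_dim x per_dim points spread evenly over the domain [lo, hi)^2."""
    step = (hi - lo) / per_dim
    return [(lo + (i + 0.5) * step, lo + (j + 0.5) * step)
            for i in range(per_dim) for j in range(per_dim)]


def samples_around(point, radius, count, rng):
    """Random points in the square of half-side `radius` centred on `point`."""
    x, y = point
    return [(x + rng.uniform(-radius, radius), y + rng.uniform(-radius, radius))
            for _ in range(count)]


def bounding_box(points):
    xs, ys = [p[0] for p in points], [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


def explore_by_example(is_relevant, lo=0.0, hi=100.0, per_dim=10,
                       radius=5.0, per_hit=20, seed=0):
    """Two-phase exploration; returns (predicted box or None, labels requested)."""
    rng = random.Random(seed)
    # Phase 1: uniform sampling across the domain.
    labels = {p: is_relevant(p) for p in uniform_samples(lo, hi, per_dim)}
    hits = [p for p, rel in labels.items() if rel]
    # Phase 2: sample around every relevant object to grow the relevant area.
    for p in hits:
        for q in samples_around(p, radius, per_hit, rng):
            labels[q] = is_relevant(q)
    relevant = [p for p, rel in labels.items() if rel]
    # "User model": bounding box of relevant samples (stand-in for a decision tree).
    return (bounding_box(relevant) if relevant else None), len(labels)
```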
40 Result recommendations: query results → YMAL → interesting queries → additional results. YMAL, VLDBJ 13
41 Result recommendations: from the query, extract query faSets (selection predicates based on the original query), expand attributes (add attributes from the table schema), rank faSets by freq(result)/freq(DB), and return the top-k queries. YMAL, VLDBJ 13
42 Result recommendations, example: title, year, genre of Scorsese movies + title, year, genre, country of Scorsese movies = many Scorsese movies are related to Italy. YMAL, VLDBJ 13
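The faSet ranking on slide 41 can be sketched in Python as follows, assuming faSets of size one (single attribute-value pairs) and reading freq(·) as relative frequency. Because both frequencies are divided by fixed table sizes, using raw counts instead would rescale every score by the same constant and leave the ranking unchanged. YMAL itself handles larger faSets; the toy table (directors X, Y, Z) is hypothetical.

```python
from collections import Counter


def rank_facets(table, result, attributes, k=3):
    """Rank single attribute-value faSets by freq(result) / freq(DB), best first."""
    db_counts = Counter((a, row[a]) for row in table for a in attributes)
    res_counts = Counter((a, row[a]) for row in result for a in attributes)
    scores = {
        facet: (res_counts[facet] / len(result)) / (db_counts[facet] / len(table))
        for facet in res_counts
    }
    # Highest score first; ties broken by the faSet itself for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
```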
44 Keyword-based query suggestions: an SQL query (tedious) returns relevant data; keywords (intuitive) passed to keyword search return relevant & irrelevant data. How can we discover relevant queries? SQLSUGG, ICDE 11
45 Keyword-based query suggestions: keywords → Template Matcher (over the Template Repository and the database) → ranked templates → SQL Query Generator → suggested queries → sample results/visualization. E.g., keyword "gray": template on title/authors? template on title? SQLSUGG, ICDE 11
46 Keyword-based query suggestions, template generation: Template 1 = Paper (title, year); Template 2 = Paper (title, year) joined with Author on id = p_id. SQLSUGG, ICDE 11
47 Keyword-based query suggestions: relevant template? Template relevance = f(entity relevance & importance); entity relevance → keyword frequency in the entity; entity importance → importance of data nodes. SQLSUGG, ICDE 11
48 Equi-join inference: sample tables A (A1, A2) and B (B1, B2); the inference algorithm shows the user informative tuples from the Cartesian product to reach the goal join predicate. Goal: discover all positives, eliminate all negatives, minimize user effort. Bonifati et al, EDBT 14
49 Equi-join inference: candidate predicates are combinations of attribute pairs such as (A1,B1), (A1,B2), (A2,B1), (A2,B2); prune predicates with uninformative tuples; label the tuple that prunes as many predicates as possible. Bonifati et al, EDBT 14
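The inference loop on slides 48 and 49 can be sketched in Python under simplifying assumptions: a candidate predicate is any non-empty set of (A-attribute, B-attribute) equalities, a positive label keeps only the predicates the tuple satisfies, a negative label keeps only those it violates, and the "most informative" tuple is taken to be the one whose worst-case answer prunes the most candidates. This is an illustrative strategy, not necessarily the one Bonifati et al. use; `oracle` plays the user and all names are hypothetical.

```python
from itertools import combinations, product


def candidate_predicates(attrs_a, attrs_b):
    """All non-empty sets of (A-attribute, B-attribute) equality pairs."""
    pairs = list(product(attrs_a, attrs_b))
    return [frozenset(c) for r in range(1, len(pairs) + 1)
            for c in combinations(pairs, r)]


def satisfies(pred, ta, tb):
    return all(ta[a] == tb[b] for a, b in pred)


def prune(preds, ta, tb, positive):
    """Keep only the predicates consistent with the label of tuple (ta, tb)."""
    return [p for p in preds if satisfies(p, ta, tb) == positive]


def most_informative(preds, tuples):
    """Tuple whose worst-case label prunes the most predicates (None if none prunes)."""
    best, best_gain = None, 0
    for ta, tb in tuples:
        sat = sum(satisfies(p, ta, tb) for p in preds)
        gain = min(sat, len(preds) - sat)  # positive prunes unsatisfied, negative satisfied
        if gain > best_gain:
            best, best_gain = (ta, tb), gain
    return best


def infer_join(table_a, table_b, oracle):
    preds = candidate_predicates(list(table_a[0]), list(table_b[0]))
    tuples = list(product(table_a, table_b))  # the Cartesian product
    questions = 0
    while (t := most_informative(preds, tuples)) is not None:
        questions += 1
        preds = prune(preds, *t, positive=oracle(*t))
    return preds, questions
```

Several predicates can survive when the sample cannot tell them apart; the loop stops once no remaining tuple would prune anything.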
50 Graphical query specification and result visualization (answers and non-answers). DataPlay, PVLDB 13
51 Graphical query specification: pivot relation; add/remove query constraints; query/visualization recommendations; semantic query tuning through local syntactic modifications. DataPlay, PVLDB 13
52 Graphical query specification: add/remove results → query corrections; search limited to local modifications. DataPlay, PVLDB 13
53 Query recommendations: query results → Charles → queries → selected query. Charles, CIDR 13
54 Query recommendations: different data partitions, e.g. on weight (<5, >5) or on weight and height (thresholds 5, 20, 30); quality: simplicity, breadth, balance. Charles, CIDR 13
55 Query refinement: the conditional query select species from birds where color = {red: 80%, blue: 20%} goes to Merlin, which returns results ranked by match probability (rank 1 Bluebird, 2 Blue Jay), the sensitivity of user predicates (attr color: sensitivity 18.6, i.e. impact on ranking), and query refinements with quality improvement (remaining attributes scored by result quality if added to the query: size 83.3, legcolor 57.1). Merlin, ICDE 14
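A hedged Python sketch of ranking by match probability for a conditional query like the one on slide 55. It assumes each tuple stores a distribution over attribute values, scores a predicate as the sum over values of p_query(v) * p_tuple(v), and multiplies predicate scores as if they were independent. Merlin's actual model and its sensitivity and quality scores are not reproduced, and the species names and probabilities are hypothetical.

```python
def match_probability(query, row):
    """Probability that `row` matches every uncertain predicate in `query`.

    query and row both map attribute -> {value: probability}; predicates are
    treated as independent (an assumption of this sketch).
    """
    prob = 1.0
    for attr, wanted in query.items():
        have = row.get(attr, {})
        prob *= sum(p * have.get(value, 0.0) for value, p in wanted.items())
    return prob


def rank_by_match(query, table):
    """Return (probability, name) pairs, most probable first (ties by name)."""
    scored = [(match_probability(query, attrs), name) for name, attrs in table.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```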
57 No-keyboard interfaces: query context, gesture recognition, query intent, query space search, pattern, query template. Gesture, CIDR 13
58 No-keyboard interfaces: a novel database kernel for touch input; quick response; touch recognition, gesture recognition; map touch to operators. dbtouch, CIDR 13
59 Interactive Exploration through Data Prefetching & Query Approximation: MIDDLEWARE TECHNIQUES
60 Interactive data exploration: SQL query formulation → query execution → result review → predicate adjustment is an ad-hoc, non-optimized, labor-intensive process; interactive means small latency and bounds on user wait time
61 Middleware optimizations between the query and its results: query approximation (online processing, sample-based processing); prefetching (speculative query execution, result reuse, structure-aware prefetching)
62 Sample-based processing: query → sampling → samples → approximate results; accuracy vs response times; sample construction & selection; error approximation
63 Off-line data synopses: query → Aqua → transformed query over synopses (samples, histograms) → approximate results + confidence bounds. Join synopses: samples of distinguished joins; congressional samples: biased sampling for group-by queries; incremental maintenance: equi-depth & compressed histograms. Aqua, SIGMOD 99
64 SELECT avg(sessiontime) FROM table WHERE city = 'SF' WITHIN 1 SEC → online sample selection → online sampling → results: 190 +/- (95% confidence). Offline sampling on frequent column sets (disk, in-memory); parallel query execution on multiple samples across multiple machines (samples across 1000s of machines). BlinkDB, EuroSys 13
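The approximate answers with confidence bounds on slides 63 and 64 can be illustrated by a basic uniform-sample estimator, sketched in Python below. It estimates AVG over the rows matching a predicate with a normal-approximation 95% interval (z = 1.96); it does not model the stratified samples, synopses, or time-bounded sample selection that Aqua and BlinkDB add, and `approx_avg` and the generated rows are hypothetical.

```python
import math
import random
import statistics


def approx_avg(rows, predicate, column, sample_size, z=1.96, seed=0):
    """Estimate AVG(column) over rows satisfying `predicate` from a uniform sample.

    Returns (estimate, half_width) of a normal-approximation confidence interval.
    """
    rng = random.Random(seed)
    sample = rng.sample(rows, min(sample_size, len(rows)))
    values = [r[column] for r in sample if predicate(r)]
    if len(values) < 2:
        raise ValueError("too few matching rows in the sample")
    mean = statistics.fmean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width
```

Larger samples shrink the interval roughly with the square root of the number of matching rows, which is the accuracy vs response time trade-off named on slide 62.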
65 Data impressions: query & time/error bounds → approximate results; impressions (Level 1, Level 2, Level 3) built during data loading; adaptive sampling to the exploration focus; multi-layer sampling and processing to meet user bounds. SciBORG, CIDR 11
67 Speculative query execution: query formulation, query execution (user wait time), result review time; 1. predict follow-up queries, 2. execute queries, 3. cache results
69 Speculative query execution: query formulation, query execution (user wait time), result review time; 1. predict follow-up queries, 2. execute queries, 3. cache results; exploration space reduction, query enumeration, query ranking
70 Cube exploration, exploration space reduction: user query SELECT AVG(iops) FROM events WHERE month = m1 AND week = w1 GROUP BY zone over the dimension hierarchies time (month, week, hour) and location (zone, center, rack). DICE, ICDE 14
71 Cube exploration operators for the same user query: parent WHERE month = m1; child WHERE month = m1 AND week = w1 AND hour = h1; sibling WHERE month = m1 AND week = w2. DICE, ICDE 14
72 Cube exploration, query enumeration: speculative queries Q(month = m1) … Q(month = m12), Q(week = w2), Q(week = w3), Q(hour = h1) … Q(hour = h24). DICE, ICDE 14
73 Cube exploration: exploration space reduction, query enumeration, and query ranking of these speculative queries. DICE, ICDE 14
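The parent, child, and sibling operators of slide 71 and the enumeration of slide 72 can be sketched for a single dimension hierarchy in Python. Assumptions: the query constrains a prefix of the hierarchy (with its keys given in hierarchy order), each level has a known finite domain, and the week domain in the example is a hypothetical four weeks; `to_sql` is a hypothetical helper that renders a filter set in the style of the slides' query.

```python
def enumerate_speculative(query, hierarchy, domains):
    """Parent, child and sibling queries of a drill-down query on one hierarchy.

    query maps hierarchy levels to values and must constrain a non-empty prefix
    of `hierarchy`, e.g. {"month": "m1", "week": "w1"} for month > week > hour.
    """
    depth = len(query)
    if depth == 0 or list(query) != hierarchy[:depth]:
        raise ValueError("query must constrain a non-empty prefix of the hierarchy")
    finest = hierarchy[depth - 1]
    out = []
    if depth > 1:  # parent: roll up by dropping the finest constraint
        out.append(("parent", {lvl: query[lvl] for lvl in hierarchy[:depth - 1]}))
    if depth < len(hierarchy):  # children: drill down one level
        nxt = hierarchy[depth]
        out += [("child", {**query, nxt: v}) for v in domains[nxt]]
    # siblings: same levels, different value at the finest level
    out += [("sibling", {**query, finest: v})
            for v in domains[finest] if v != query[finest]]
    return out


def to_sql(filters):
    where = " AND ".join(f"{lvl} = '{v}'" for lvl, v in filters.items())
    return f"SELECT AVG(iops) FROM events WHERE {where} GROUP BY zone"
```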
74 Cube exploration: query formulation, query execution, user wait time t, result review time, speculative execution. DICE, ICDE 14
75 Cube exploration: each speculative query has a probability and an exec time; maximize total query probability subject to total speculation time < t. DICE, ICDE 14
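Slide 75's selection step (maximize total query probability while keeping total speculation time within t) has the shape of a 0/1 knapsack problem. The Python sketch below solves that formulation exactly by dynamic programming, assuming integer execution times (for example, in milliseconds) and an inclusive budget; whether DICE uses this exact formulation is not stated on the slides, and the query names, probabilities, and costs are made up.

```python
def choose_speculative(queries, budget):
    """0/1 knapsack over speculative queries.

    queries: list of (name, probability, exec_time) with integer exec_time.
    Returns (best total probability, chosen names) with total exec_time <= budget.
    """
    # best[c] = (total probability, chosen names) using at most c time units
    best = [(0.0, ())] * (budget + 1)
    for name, prob, cost in queries:
        # Iterate capacities downward so each query is used at most once.
        for c in range(budget, cost - 1, -1):
            cand = best[c - cost][0] + prob
            if cand > best[c][0]:
                best[c] = (cand, best[c - cost][1] + (name,))
    return best[budget][0], list(best[budget][1])
```

For the strict bound on the slide (total time < t) with integer units, pass t - 1 as the budget.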
76 Result reuse: prefetching windows during the execution of Query 1, Query 2, Query 3 over time; identify (likely) overlapping results, cache them, and reduce query execution time (user wait time)
77 Semantic windows: user-defined window properties; overlapping results/windows (SW1 to SW4) in a 2D exploration space; window prefetching: which order? Kalinin et al, SIGMOD 14
78 Semantic windows: utility-based result ranking & result prefetching over overlapping windows (SW1 to SW4) in a 2D exploration space. Kalinin et al, SIGMOD 14
79 Semantic windows: extend & prefetch (SW1, SW2); online performance vs query completion time; adjust the prefetching size to the output progress. Kalinin et al, SIGMOD 14
80 Data diversification: query → diversified results, i.e. k representative tuples with max total pairwise distance
81 Data diversification, Max Diversified Set Search (k = 3) over query output T1 to T5: start from a random tuple (T3) and compute d(T1,T3), d(T2,T3), d(T4,T3), d(T5,T3)
82 Data diversification, next step (T1 and T3 selected): compute d(T2,T1) + d(T2,T3), d(T4,T1) + d(T4,T3), d(T5,T1) + d(T5,T3)
83 Data diversification: the diversified output is the k = 3 selected tuples among T1 to T5
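The walk-through on slides 81 to 83 (start from a random tuple, then repeatedly add the tuple with the largest summed distance to those already chosen) matches a common greedy heuristic for max-sum diversification, sketched here in Python. The `start` index replaces the random choice so the example is reproducible, and the distance function and the one-dimensional toy tuples are assumptions.

```python
def greedy_diversify(items, k, dist, start=0):
    """Greedy max-sum diversification.

    Starts from items[start] (the slides use a random tuple) and repeatedly adds
    the item whose summed distance to the already selected items is largest.
    """
    selected = [start]
    remaining = set(range(len(items))) - {start}
    while len(selected) < k and remaining:
        # sorted() makes tie-breaking deterministic: the lowest index wins.
        best = max(sorted(remaining),
                   key=lambda i: sum(dist(items[i], items[j]) for j in selected))
        selected.append(best)
        remaining.discard(best)
    return [items[i] for i in selected]
```

With T1 to T5 placed at 0, 1, 5, 6, 10 on a line and T3 as the start, the first round ties T1 and T5 at distance 5 and keeps T1, and the second round adds T5, mirroring the order of the sums on slide 82.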
84 Interactive data diversification: queries Q1, Q2, Q3 have overlapping diversified results and a long time-to-insight; cache diversified results and use the most promising ones; a regression model predicts the max diversification of a set. DivIDE, SSDBM 14
85 Interactive data diversification: for a new query, split the search space into reusable cached diversified results and new query results; model-based output selection produces the diversified results; search space pruning through the regression model; best/first-fit search for max total diversification among cached and new results. DivIDE, SSDBM 14
86 Structure-aware prefetching: prefetching for interactive spatial query sequences; model the structures of past spatial queries in a graph; identify the guiding structure in the past two queries through iterative pruning; cache the predicted next location. SCOUT, VLDB 12
Differential privacy in health care analytics and medical research An interactive tutorial Speaker: Moritz Hardt Theory Group, IBM Almaden February 21, 2012 Overview 1. Releasing medical data: What could
More informationBI xpress Product Overview
BI xpress Product Overview Develop and manage SSIS packages with ease! Key Features Create a robust auditing and notification framework for SSIS Speed BI development with SSAS calculations and SSIS package
More informationMonitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center
Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center Presented by: Dennis Liao Sales Engineer Zach Rea Sales Engineer January 27 th, 2015 Session 4 This Session
More informationCS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #5: En-ty/Rela-onal Models- - - Part 1
CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #5: En-ty/Rela-onal Models- - - Part 1 Announcements- - - Project Goal: design a database system applica-on with a web front-
More informationNetApp FAS Hybrid Array Flash Efficiency. Silverton Consulting, Inc. StorInt Briefing
NetApp FAS Hybrid Array Flash Efficiency Silverton Consulting, Inc. StorInt Briefing PAGE 2 OF 7 Introduction Hybrid storage arrays (storage systems with both disk and flash capacity) have become commonplace
More informationBusiness Insight Report Authoring Getting Started Guide
Business Insight Report Authoring Getting Started Guide Version: 6.6 Written by: Product Documentation, R&D Date: February 2011 ImageNow and CaptureNow are registered trademarks of Perceptive Software,
More informationCS1100: Access Reports
CS1100: Access Reports A (Very) Short Tutorial on Microsoft Access Report Construction Created By Martin Schedlbauer With contributions from Matthew Ekstrand-Abueg CS1100 Microsoft Access 1 Reports Reports
More informationInge Os Sales Consulting Manager Oracle Norway
Inge Os Sales Consulting Manager Oracle Norway Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database Machine Oracle & Sun Agenda Oracle Fusion Middelware Oracle Database 11GR2 Oracle Database
More informationOracle Database In- Memory Op4on in Ac4on
Oracle Database In- Memory Op4on in Ac4on Tanel Põder & Kerry Osborne Accenture Enkitec Group h4p:// 1 Tanel Põder Intro: About Consultant, Trainer, Troubleshooter Oracle Database Performance geek Exadata
More informationBusiness Intelligence and Process Modelling
Business Intelligence and Process Modelling F.W. Takes Universiteit Leiden Lecture 2: Business Intelligence & Visual Analytics BIPM Lecture 2: Business Intelligence & Visual Analytics 1 / 72 Business Intelligence
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationIntroduction to Imagery and Raster Data in ArcGIS
Esri International User Conference San Diego, California Technical Workshops July 25, 2012 Introduction to Imagery and Raster Data in ArcGIS Simon Woo slides Cody Benkelman - demos Overview of Presentation
More informationORACLE BUSINESS INTELLIGENCE WORKSHOP
ORACLE BUSINESS INTELLIGENCE WORKSHOP Creating Interactive Dashboards and Using Oracle Business Intelligence Answers Purpose This tutorial shows you how to build, format, and customize Oracle Business
More informationTurning ClearPath MCP Data into Information with Business Information Server. White Paper
Turning ClearPath MCP Data into Information with Business Information Server White Paper 1 Many Unisys ClearPath MCP Series customers have Enterprise Database Server (DMSII) databases to support a variety
More informationOur Raison d'être. Identify major choice decision points. Leverage Analytical Tools and Techniques to solve problems hindering these decision points
Analytic 360 Our Raison d'être Identify major choice decision points Leverage Analytical Tools and Techniques to solve problems hindering these decision points Empowerment through Intelligence Our Suite
More informationBig Data Mining Services and Knowledge Discovery Applications on Clouds
Big Data Mining Services and Knowledge Discovery Applications on Clouds Domenico Talia DIMES, Università della Calabria & DtoK Lab Italy talia@dimes.unical.it Data Availability or Data Deluge? Some decades
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationGraph Database Proof of Concept Report
Objectivity, Inc. Graph Database Proof of Concept Report Managing The Internet of Things Table of Contents Executive Summary 3 Background 3 Proof of Concept 4 Dataset 4 Process 4 Query Catalog 4 Environment
More informationParallel Analysis and Visualization on Cray Compute Node Linux
Parallel Analysis and Visualization on Cray Compute Node Linux David Pugmire, Oak Ridge National Laboratory and Hank Childs, Lawrence Livermore National Laboratory and Sean Ahern, Oak Ridge National Laboratory
More informationRaising the Bar (Chart)
Raising the Bar (Chart) THE NEXT GENERATION OF VISUALIZATION TOOLS Jeffrey Heer @jeffrey_heer Univ. of Washington + Trifacta ? Visualizing Big Data! Stratified Sampling Binned Aggregation immens: Real-Time
More informationBuilding Effective Dashboard Views Using OMEGAMON and the Tivoli Enterprise Portal
1 IBM Software Group Tivoli Software Building Effective Dashboard Views Using OMEGAMON and the Tivoli Enterprise Portal Ed Woods IBM Corporation 2011 IBM Corporation IBM s Integrated Service Management
More informationNews and trends in Data Warehouse Automation, Big Data and BI. Johan Hendrickx & Dirk Vermeiren
News and trends in Data Warehouse Automation, Big Data and BI Johan Hendrickx & Dirk Vermeiren Extreme Agility from Source to Analysis DWH Appliances & DWH Automation Typical Architecture 3 What Business
More informationSAP Data Services 4.X. An Enterprise Information management Solution
SAP Data Services 4.X An Enterprise Information management Solution Table of Contents I. SAP Data Services 4.X... 3 Highlights Training Objectives Audience Pre Requisites Keys to Success Certification
More informationGraph Mining and Social Network Analysis
Graph Mining and Social Network Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann
More informationDBMS / Business Intelligence, Business Intelligence / DBMS
DBMS / Business Intelligence, Business Intelligence / DBMS Orsys, with 30 years of experience, is providing high quality, independant State of the Art seminars and hands-on courses corresponding to the
More informationBIDM Project. Predicting the contract type for IT/ITES outsourcing contracts
BIDM Project Predicting the contract type for IT/ITES outsourcing contracts N a n d i n i G o v i n d a r a j a n ( 6 1 2 1 0 5 5 6 ) The authors believe that data modelling can be used to predict if an
More informationPLANET: Massively Parallel Learning of Tree Ensembles with MapReduce. Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo.
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Authors: B. Panda, J. S. Herbach, S. Basu, R. J. Bayardo. VLDB 2009 CS 422 Decision Trees: Main Components Find Best Split Choose split
More informationP6 Analytics Reference Manual
P6 Analytics Reference Manual Release 3.2 October 2013 Contents Getting Started... 7 About P6 Analytics... 7 Prerequisites to Use Analytics... 8 About Analyses... 9 About... 9 About Dashboards... 10 Logging
More informationVisualization Techniques in Data Mining
Tecniche di Apprendimento Automatico per Applicazioni di Data Mining Visualization Techniques in Data Mining Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo di Milano
More informationProject Overview. Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome
Project Overview Collabora'on Mee'ng with Op'mis, 20-21 Sept. 2011, Rome Cloud-TM at a glance "#$%&'$()!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"#$%&!"'!()*+!!!!!!!!!!!!!!!!!!!,-./01234156!("*+!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!&7"7#7"7!("*+!!!!!!!!!!!!!!!!!!!89:!;62!("$+!
More informationMicrosoft Consulting Services. PerformancePoint Services for Project Server 2010
Microsoft Consulting Services PerformancePoint Services for Project Server 2010 Author: Emmanuel Fadullon, Delivery Architect Microsoft Consulting Services August 2011 Information in the document, including
More informationThe Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
More informationJun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC
Jun Liu, Senior Software Engineer Bianny Bian, Engineering Manager SSG/STO/PAC Agenda Quick Overview of Impala Design Challenges of an Impala Deployment Case Study: Use Simulation-Based Approach to Design
More informationSoftware Development & Education Center. Microsoft Office 2010. (Microsoft Project 2010)
Software Development & Education Center Microsoft Office 2010 (Microsoft Project 2010) Mastering Microsoft Project 2010 About This Course This three-day instructor-led course provides students with the
More informationIntegrating Apache Spark with an Enterprise Data Warehouse
Integrating Apache Spark with an Enterprise Warehouse Dr. Michael Wurst, IBM Corporation Architect Spark/R/Python base Integration, In-base Analytics Dr. Toni Bollinger, IBM Corporation Senior Software
More informationDB2 for i5/os: Tuning for Performance
DB2 for i5/os: Tuning for Performance Jackie Jansen Senior Consulting IT Specialist jjansen@ca.ibm.com August 2007 Agenda Query Optimization Index Design Materialized Query Tables Parallel Processing Optimization
More informationCOURSE SYLLABUS COURSE TITLE:
1 COURSE SYLLABUS COURSE TITLE: FORMAT: CERTIFICATION EXAMS: 55043AC Microsoft End to End Business Intelligence Boot Camp Instructor-led None This course syllabus should be used to determine whether the
More informationORACLE OLAP. Oracle OLAP is embedded in the Oracle Database kernel and runs in the same database process
ORACLE OLAP KEY FEATURES AND BENEFITS FAST ANSWERS TO TOUGH QUESTIONS EASILY KEY FEATURES & BENEFITS World class analytic engine Superior query performance Simple SQL access to advanced analytics Enhanced
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 16 - Data Warehousing Class Projects Class projects are going very well! Project presentations: 15 minutes On Wednesday
More informationIMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH
IMPROVING DATA INTEGRATION FOR DATA WAREHOUSE: A DATA MINING APPROACH Kalinka Mihaylova Kaloyanova St. Kliment Ohridski University of Sofia, Faculty of Mathematics and Informatics Sofia 1164, Bulgaria
More informationAdobe Insight, powered by Omniture
Adobe Insight, powered by Omniture Accelerating government intelligence to the speed of thought 1 Challenges that analysts face 2 Analysis tools and functionality 3 Adobe Insight 4 Summary Never before
More information