Role of Spatial in Benchmarking Big Data
1 Role of Spatial in Benchmarking Big Data
2012 NSF Workshop on Big Data Benchmarking (San Jose, CA)
Shashi Shekhar, McKnight Distinguished University Professor, Department of Computer Science and Engineering, University of Minnesota
For more details:
1. S. Shekhar et al., Identifying patterns in spatial information: A survey of Methods, Wiley Interdisciplinary Reviews in Data Mining and Knowledge Discovery, Volume 1, May/June.
2. S. Shekhar et al., Spatial Databases: Accomplishments and Research Needs, IEEE Transactions on Knowledge and Data Eng., 11(1), Jan./Feb. (Updated version in Wiley Encyclopedia of Computer Science (Ed. Benjamin Wah), 2009.)
2 Emerging Spatial Big Data (SBD): Geo-social Media, Device2Device
3 Why include Spatial Workload in Big Data Benchmark?
- Spatial computing is critical to many societal grand challenges
  - It is critical in the cell-phone era of computing
- Decrease map-reduce bias
  - High cost of the reduce step favors non-iterative workloads
  - MPI and OpenMP provide the lightweight synchronization needed for data analytics
  - Spatial workloads are iterative and counter the map-reduce bias
- Beyond pre-Big-Data computing assumptions
  - Beyond the sorting assumption in relational DBMS
    - Numbers, character strings -> points, line-strings, polygons, routes, graphs
    - Equi-join -> spatial-distance join, nearest neighbor
  - Beyond the i.i.d. assumption in statistics and machine learning
    - Independent samples -> auto-correlation
    - Identical distribution -> heterogeneous, non-stationary
4 How to include Spatial Workload in Big Data Benchmark?
- Table schema
  - Add home-address, work-address, and cell-phone columns (for a customer)
  - Derive addresses by reverse-geocoding spatial locations
  - Generate locations from a mixture of point processes for
    - Hot-spots (auto-correlation)
    - Urban, suburban, rural (geographic heterogeneity)
  - Generate trajectories for cell-phones
    - Generate routes between home and work using shortest-path algorithms
    - Add a temporal schedule using routine
    - Add more points of interest beyond home and work, e.g., city simulators
- Spatial queries
  - Nearest-neighbor queries generated for home, work, and points on commute routes
  - Shortest paths between points of interest (home, work, ...)
  - Hotspots
  - Trends, change-points
- Metrics
  - Footprint scale: local, regional, country, continent, global
  - Mobile device interactions per second
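The "mixture of point processes" step can be sketched as follows. This is only an illustration under stated assumptions, not the benchmark's generator: the function name `generate_locations`, the Gaussian hot-spot model, and the parameters `hotspot_share` and `spread` are hypothetical choices, and the coordinates are made up.

```python
import random

def generate_locations(n, hotspots, bbox, hotspot_share=0.7, spread=0.01, seed=0):
    """Draw n (lon, lat) points from a two-component mixture.

    With probability hotspot_share a point is drawn from a Gaussian around a
    randomly chosen hot-spot centre (auto-correlated component); otherwise it
    is drawn uniformly over the bounding box (background component).
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    min_lon, min_lat, max_lon, max_lat = bbox
    points = []
    for _ in range(n):
        if hotspots and rng.random() < hotspot_share:
            cx, cy = rng.choice(hotspots)
            # Clamp so that clustered points never leave the bounding box.
            lon = min(max(rng.gauss(cx, spread), min_lon), max_lon)
            lat = min(max(rng.gauss(cy, spread), min_lat), max_lat)
        else:
            lon = rng.uniform(min_lon, max_lon)
            lat = rng.uniform(min_lat, max_lat)
        points.append((lon, lat))
    return points

# Hypothetical usage: one hot-spot inside a 1-degree by 1-degree box.
homes = generate_locations(1000, [(-93.25, 44.95)], (-93.75, 44.45, -92.75, 45.45), seed=1)
```

Geographic heterogeneity (urban, suburban, rural) could be approximated by giving each hot-spot its own `spread` and weight; that extension is left out to keep the sketch short.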
5 Spatial Databases: Representative Projects
- Evacuation route planning
- Parallelize range queries
- Shortest paths
- Storing graphs in disk blocks
[Figure legend: only in old plan; only in new plan; in both plans]
6 Spatial Data Mining: Representative Projects
- Location prediction: nesting sites
- Spatial outliers: sensor (#9) on I-35
- Co-location patterns
- Tele-connections
[Figure panels: nest locations; distance to open water; vegetation durability; water depth]
7 Motivation for Spatial Computing
- Societal: Google Earth, Google Maps, navigation, location-based services
- Global challenges facing humanity: many are geo-spatial!
- The future of Computer Science (CS) is to address societal challenges!
8 Spatial Computing: Recent Trends
[Figure labels: Smarter Planet; SIGSPATIAL]
9 Traditional Spatial Data
- Spatial attribute: neighborhood and extent
  - Geo-reference: longitude, latitude, elevation
- Spatial data genre
  - Raster: geo-images, e.g., Google Earth
  - Vector: points, lines, polygons
  - Graph, e.g., roadmap: node, edge, path
[Figures: raster data for UMN campus (courtesy: UMN); vector data for UMN campus (courtesy: MapQuest); graph data for UMN campus (courtesy: Bing)]
10 Traditional SBD: Raster
- Example data sets: Google Earth, Bing, NASA Worldwind
  - Satellite imagery (periodic scan)
  - Climate simulation outputs for the next century
  - Geo-videos from UAVs, security cameras
- Example use cases
  - Change detection
  - Feature extraction
  - Urban terrain
[Figures: visualizing the urban terrain; raster data for UMN campus (courtesy: UMN); automated change detection; automatic feature extraction; average monthly temperature (courtesy: NASA, Prof. V. Kumar)]
11 Traditional SBD: Vector
- Vector data sub-genre
  - Points, e.g., street addresses
  - Line-strings, e.g., road center lines
  - Polygons, e.g., zipcode boundaries
  - Collections of the above types
- Common use cases
  - Distance from a point, line, or polygon
    - Geo-buffers around geo-features
    - Nearest gas station, store, hospital, ...
  - Topological queries
    - Overlapping: whose jurisdiction?
    - Range query: subset inside a polygon
  - Aggregation: hot-spots, emerging hot-spots of crime, disease, ...
    - Spatial auto-correlation measures
    - Spatial auto-regression
[Figure: vector data for UMN campus (courtesy: MapQuest)]
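Two of the vector use cases above, nearest neighbor and range query inside a polygon, can be sketched in a few lines. This is a naive, index-free illustration; the function names are hypothetical, the Earth radius of 6371 km is the usual mean-radius approximation, and a real workload would use a spatial index such as an R-tree rather than a linear scan.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def nearest_neighbor(query, sites):
    """Linear-scan nearest neighbor, e.g. nearest gas station to a home address."""
    return min(sites, key=lambda s: haversine_km(query, s))

def point_in_polygon(pt, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices in order."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def range_query(points, polygon):
    """Return the subset of points that lies inside the polygon."""
    return [p for p in points if point_in_polygon(p, polygon)]
```

Points exactly on a polygon boundary may be classified either way by the ray-casting test, which is acceptable for a sketch but would need a defined policy in a benchmark.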
12 Traditional SBD: Spatial Graphs
- Spatial graph examples
  - Roadmaps, railroad networks, air routes
  - Electric grid, gas pipelines, supply chains, ...
- Graph data sub-genre
  - Nodes, edges, routes
  - Flow networks with capacity constraints
- Use cases
  - Geo-coding, map-matching
  - Connectivity, shortest paths
  - Travel-time-based nearest store, hospital, ...
  - Logistics, supply-chain management, ...
[Figure: graph data for UMN campus (courtesy: Bing)]
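The shortest-path use case can be illustrated with Dijkstra's algorithm on an adjacency-list graph. The slides do not say which algorithm is intended, so this is one standard choice; the graph layout (a dict of node to list of `(neighbor, cost)` pairs) and the node names are assumptions made for the example.

```python
import heapq

def shortest_path(graph, source, target):
    """Dijkstra's algorithm for non-negative edge costs.

    graph: dict mapping node -> list of (neighbor, cost) pairs.
    Returns (total_cost, path); (inf, []) if target is unreachable.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        if u == target:
            break
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]

# Hypothetical road network; costs could be distance, travel time, or fuel.
roads = {
    "home": [("a", 2.0), ("b", 5.0)],
    "a": [("b", 1.0), ("work", 6.0)],
    "b": [("work", 2.0)],
}
cost, route = shortest_path(roads, "home", "work")
```

Changing the edge cost from distance to travel time gives the travel-time-based nearest-facility query; the code is unchanged.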
13 Emerging SBD: Geo-social Media, Device2Device
14 Emerging Use Case: Eco-Routing
- Minimize fuel consumption and GHG emissions
  - rather than proxies, e.g., distance, travel time
  - avoid congestion, idling at red lights, turns, elevation changes, etc.
- U.P.S. Embraces High-Tech Delivery Methods (July 12, 2007): The research at U.P.S. is paying off... saving roughly three million gallons of fuel in good part by mapping routes that minimize left turns.
15 Emerging SBD: Mobile Device2Device
- Mobile device examples
  - Cell-phones, check-ins, location API in HTML5, Twitter, ...
  - Vehicles: cars, trucks, airplanes, ...
  - RFID tags, bar-codes, GPS collars, ...
- Trajectory & measurements sub-genre
  - Receiver: GPS tracks, ...
  - System: cameras, RFID readers, ...
- Use cases
  - Tracking, tracing
  - Improve service, deter theft
  - Geo-fencing, identify nearby friends
  - Eco-routing
16 Emerging SBD: Geo-Sensor Networks
- Geo-sensor network examples
  - Urban roads
  - Cameras in cities (millions)
  - Electricity distribution grids
  - Weather sensor networks
  - Robots with sensors
- Sensor network sub-genre
  - Fixed, reasonable resources: traffic sensors
  - Ad-hoc, resource-poor: wireless sensor networks
- Use cases
  - Monitoring
  - Anomalies, e.g., accidents, ...
  - Real-time event detection: congestion, emerging hotspots, ...
  - Feed-back control
  - Predictive, anticipatory planning
17 SBD Metrics

Data Type | Representation | Operations | Potential Metrics
Raster | Geo-matrix | Geo-registration, feature extraction, change detection, spatial auto-regression | Raster operations per second
Vector | Points, lines, polygons | Nearest neighbor, point query, range query (e.g., buffer), spatial join, hotspot detection, etc. | Vector operations per second
Network | Graphs (nodes, edges) | Shortest path, map matching, geo-coding, max flow, evacuation, etc. | Shortest paths per second
Mobile Device2Device | Check-ins, trajectories, measurements | Check-in, identify close-by friends, eco-routes, track, trace | Mobile device2device interactions per second
18 Relational DBMS to Spatial DBMS
- 1980s: Relational DBMS
  - Relational algebra
  - Query processing, e.g., sort-merge equi-join algorithm, B+ tree index
- Spatial customers (e.g., NASA, USPS) got interested, but faced challenges
  - Semantic gap
    - Spatial concepts: distance, direction, overlap, inside, shortest paths, ...
    - SQL representation was quite verbose
    - Relational algebra cannot represent transitive closure
  - Performance challenge due to the linearity assumption
    - Is a B+ tree appropriate for geographic data?
    - Is sorting natural in geographic space?
- New ideas emerged in the 1990s
  - Spatial data types and operations (e.g., OGIS Simple Features)
  - R-tree, spatial-join index, space partitioning, ...
19 Data Mining to Spatial Data Mining
- 1990s: Data mining
  - Scale up traditional models to large relational databases
    - Linear regression, decision trees, ...
  - New pattern families
    - Association rules: which items are bought together? E.g., (diaper, beer)
- Spatial customers
  - Walmart
    - Which items are bought just before/after events, e.g., hurricanes?
    - Where is the (diaper, beer) pattern prevalent?
  - Global climate change
- But faced challenges
  - Independence assumption
  - Transactions, i.e., disjoint partitioning of data
20 Spatial Prediction
[Figure panels: nest locations; distance to open water; vegetation durability; water depth]
21 Mental Model: Spatial Autocorrelation (SA)
- First Law of Geography: All things are related, but nearby things are more related than distant things. [Tobler, 1970]
- Autocorrelation
  - The traditional i.i.d. assumption is not valid
  - Measures: K-function, Moran's I, variogram, ...
[Figure panels: pixel property with independent identical distribution; vegetation durability with SA]
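Of the measures listed, Moran's I is the easiest to show concretely. The sketch below uses the standard definition I = (n / S0) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2, where S0 is the sum of all weights. The dense list-of-lists weight matrix and the four-cell example are assumptions for illustration; real spatial data would use a sparse neighborhood structure.

```python
def morans_i(values, weights):
    """Moran's I for values x_1..x_n and an n-by-n spatial weight matrix.

    weights[i][j] > 0 means location j is a neighbor of location i.
    Positive results indicate that neighbors tend to have similar values.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four cells in a row; each cell neighbors the adjacent cells.
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = morans_i([1, 1, 5, 5], W)    # similar values sit together: I > 0
alternating = morans_i([1, 5, 1, 5], W)  # neighbors always differ: I < 0
```

Each term of the double sum is independent of the others, which is why this computation splits into partial sums easily.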
22 Ex. 3: Hardest to Parallelize
- Classical linear regression: $y = x\beta + \varepsilon$
- Spatial auto-regression model: $y = \rho W y + x\beta + \varepsilon$
  - $\rho$: the spatial auto-regression (auto-correlation) parameter
  - $W$: n-by-n neighborhood matrix over the spatial framework
- Maximum likelihood estimation: $\ln(L) = \ln\lvert I - \rho W \rvert - \frac{n \ln(2\pi)}{2} - \frac{n \ln(\sigma^2)}{2} - \mathit{SSE}$
- Need cloud computing to scale up to large spatial datasets
  - However, map-reduce is too slow for iterative computations!
  - Computing the determinant of a large matrix is an open problem!
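To see why the determinant dominates the cost, it helps to write the log-determinant term in terms of the eigenvalues of W. The identity below is a standard linear-algebra fact rather than something stated on the slide; it assumes the eigenvalues $\lambda_1, \dots, \lambda_n$ of $W$ are real and that $1 - \rho\lambda_i > 0$ for every $i$, which is an assumption about $W$ and the admissible range of $\rho$.

```latex
\ln\lvert I - \rho W \rvert
  \;=\; \ln \prod_{i=1}^{n} \left(1 - \rho\lambda_i\right)
  \;=\; \sum_{i=1}^{n} \ln\left(1 - \rho\lambda_i\right)
```

Under those assumptions the eigenvalues are computed once and each trial value of $\rho$ costs only a sum of n logarithms, but obtaining all eigenvalues of a very large W is itself the expensive step.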
23 Clustering
- Clustering: find groups of tuples
- Statistical significance
  - Complete spatial randomness, cluster, and de-cluster
[Figure panels: inputs: complete spatial randomness (CSR), cluster, de-cluster; classical clustering (K-means); spatial clustering]
24 Spatial Outliers
- Spatial outliers
  - Traffic data in the Twin Cities
  - Abnormal sensor detections
  - Spatial and temporal outliers
- Spatial-join-based tests
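One way to read "spatial-join-based tests" is a neighborhood-difference test: join each sensor with its spatial neighbors, compare its reading with the neighbors' average, and flag readings whose difference is unusually large. The slide does not spell out the test, so the statistic, the threshold of 2.0, and the names below are assumptions made for this sketch.

```python
import statistics

def spatial_outliers(values, neighbors, threshold=2.0):
    """Flag locations whose value differs strongly from their neighbors' mean.

    values: dict id -> measurement.
    neighbors: dict id -> list of neighboring ids (result of a spatial join).
    A location is flagged when its standardized neighborhood difference
    exceeds threshold in absolute value.
    """
    diffs = {}
    for k, v in values.items():
        nbrs = neighbors.get(k, [])
        if nbrs:
            diffs[k] = v - statistics.mean([values[n] for n in nbrs])
    mu = statistics.mean(list(diffs.values()))
    sigma = statistics.pstdev(list(diffs.values()))
    if sigma == 0:
        return []  # all neighborhoods look alike, nothing to flag
    return [k for k, d in diffs.items() if abs(d - mu) / sigma > threshold]

# Hypothetical chain of ten traffic sensors; sensor 5 reports an abnormal value.
readings = {i: 10.0 for i in range(10)}
readings[5] = 50.0
adjacent = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
flagged = spatial_outliers(readings, adjacent)
```

Note that the test compares a sensor with its spatial neighbors rather than with the global distribution, so a value can be globally ordinary and still be flagged.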
25 Association Patterns
- Association rule, e.g., (Diaper in T => Beer in T)

Transaction | Items Bought
1 | {socks, ..., milk, ..., beef, egg, ...}
2 | {pillow, ..., toothbrush, ice-cream, muffin, ...}
3 | {..., ..., pacifier, formula, blanket, ...}
n | {battery, juice, beef, egg, chicken, ...}

- Support: probability(Diaper and Beer in T) = 2/5
- Confidence: probability(Beer in T | Diaper in T) = 2/2
- Algorithm: Apriori [Agrawal, Srikant, VLDB 94]
  - Support-based pruning using monotonicity
- Note: Transaction is a core concept!
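Support and confidence are simple counting ratios over transactions. The sketch below uses five made-up baskets (the item lists on the slide are incomplete, so these are hypothetical) chosen so that the results match the slide's figures of 2/5 and 2/2.

```python
from fractions import Fraction

def support(transactions, items):
    """Fraction of transactions that contain every item in items."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent in T | antecedent in T) from the transactions.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical baskets, not the slide's data.
baskets = [
    {"socks", "diaper", "milk", "beer"},
    {"pillow", "toothbrush", "muffin"},
    {"diaper", "beer", "pacifier"},
    {"battery", "juice", "egg"},
    {"milk", "egg", "chicken"},
]
s = support(baskets, {"diaper", "beer"})       # Fraction(2, 5)
c = confidence(baskets, {"diaper"}, {"beer"})  # Fraction(1, 1), i.e. 2/2
```

Both functions rely on the data already being partitioned into transactions, which is exactly what spatial data lacks.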
26 Pattern Family 4: Co-locations/Co-occurrence
- Given: a collection of different types of spatial events
- Find: co-located subsets of event types
- Challenge: no transactions
- New approaches
  - Spatial-join based
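Without transactions, a spatial join on distance can stand in for "bought together". The sketch below joins two event types on a distance threshold and scores the pair with a participation index, i.e. the smaller of the two fractions of instances that take part in at least one neighboring pair. The slide names only the spatial-join approach; the choice of this prevalence measure, the Euclidean distance, and all names here are assumptions for illustration.

```python
import math

def neighbor_pairs(a_points, b_points, max_dist):
    """Naive spatial join: index pairs (i, j) with dist(a_i, b_j) <= max_dist."""
    return [(i, j)
            for i, a in enumerate(a_points)
            for j, b in enumerate(b_points)
            if math.dist(a, b) <= max_dist]

def participation_index(a_points, b_points, max_dist):
    """Minimum, over the two event types, of the share of instances that
    have at least one neighbor of the other type."""
    pairs = neighbor_pairs(a_points, b_points, max_dist)
    ratio_a = len({i for i, _ in pairs}) / len(a_points)
    ratio_b = len({j for _, j in pairs}) / len(b_points)
    return min(ratio_a, ratio_b)

# Hypothetical planar coordinates for two event types.
type_a = [(0, 0), (10, 10), (20, 20)]
type_b = [(0, 1), (10, 11)]
pi = participation_index(type_a, type_b, max_dist=1.5)
```

The nested loop is quadratic in the number of events; a grid or R-tree based join would replace it at scale.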
27 Parallelizing Spatial Big Data on Cloud Computing
Parallelizing Spatial Computing
- Case 1: Compute spatial autocorrelation (simpler to parallelize)
  - Map-reduce is okay
  - Should it provide spatial de-clustering services?
  - Can a query compiler generate map-reduce parallel code?
- Case 2 (harder): Parallelize range query on polygon maps
  - Needs dynamic load balancing beyond map-reduce
  - But local processing is cheaper than sending it to another node!
  - MPI or OpenMP is better!
- Case 3: Estimate spatial auto-regression parameters, routing
  - Map-reduce is inefficient for iterative computations!
  - MPI or OpenMP is essential!
  - Golden section search, determinant of a large matrix
  - Eco-routing algorithms, evacuation route planning
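Golden section search, named under Case 3, shows why such workloads are iterative: each step needs the objective value from the previous step before it can choose the next trial point. The sketch below maximizes a generic unimodal function on an interval; in the auto-regression setting the objective would be the log-likelihood as a function of the auto-regression parameter, which is not implemented here. The function name and tolerance are assumptions.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-6):
    """Locate the maximizer of a unimodal function f on [lo, hi].

    Each iteration shrinks the bracket by the inverse golden ratio and
    reuses one of the two interior evaluations, so only one new call to f
    is needed per step.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:
            # Maximum lies in [lo, d]; the old c becomes the new d.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:
            # Maximum lies in [c, hi]; the old d becomes the new c.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2

# Toy objective with its maximum at 0.3.
best = golden_section_max(lambda r: -(r - 0.3) ** 2, 0.0, 1.0)
```

When every call to `f` is itself a large matrix computation, the loop cannot be flattened into a single map-reduce pass, which is the slide's argument for lightweight synchronization.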
3-D Object recognition from point clouds Dr. Bingcai Zhang, Engineering Fellow William Smith, Principal Engineer Dr. Stewart Walker, Director BAE Systems Geospatial exploitation Products 10920 Technology
More informationBig Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
More informationThe Data Mining Process
Sequence for Determining Necessary Data. Wrong: Catalog everything you have, and decide what data is important. Right: Work backward from the solution, define the problem explicitly, and map out the data
More informationA Novel Location-Centric IoT-Cloud Based On-Street Car Parking Violation Management System in Smart Cities
sensors Article A Novel Location-Centric IoT-Cloud Based On-Street Car Parking Violation Management System in Smart Cities Thanh Dinh 1,2 and Younghan Kim 1, * 1 School of Electronic Engineering, Soongsil
More informationEnvironmental Remote Sensing GEOG 2021
Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class
More informationCluster Analysis: Advanced Concepts
Cluster Analysis: Advanced Concepts and dalgorithms Dr. Hui Xiong Rutgers University Introduction to Data Mining 08/06/2006 1 Introduction to Data Mining 08/06/2006 1 Outline Prototype-based Fuzzy c-means
More informationMining Network Relationships in the Internet of Things
Mining Network Relationships in the Internet of Things PAT DOODY, DIRECTOR OF THE CENTRE FOR INNOVATION IN DISTRIBUTED SYSTEMS (CIDS) INSTITUTE OF TECHNOLOGY TRALEE ANDREW SHIELDS IRC FUNDED RESEARCHER
More informationMap Matching and Real World Integrated Sensor Data Warehousing
Map Matching and Real World Integrated Sensor Data Warehousing www.nrel.gov/tsdc www.nrel.gov/fleet_dna Evan Burton Data Engineer (Presenter) Jeff Gonder Vehicle System Analysis Team Lead Adam Duran Engineer/Analyst
More informationANALYTICS IN BIG DATA ERA
ANALYTICS IN BIG DATA ERA ANALYTICS TECHNOLOGY AND ARCHITECTURE TO MANAGE VELOCITY AND VARIETY, DISCOVER RELATIONSHIPS AND CLASSIFY HUGE AMOUNT OF DATA MAURIZIO SALUSTI SAS Copyr i g ht 2012, SAS Ins titut
More informationSESSION 8: GEOGRAPHIC INFORMATION SYSTEMS AND MAP PROJECTIONS
SESSION 8: GEOGRAPHIC INFORMATION SYSTEMS AND MAP PROJECTIONS KEY CONCEPTS: In this session we will look at: Geographic information systems and Map projections. Content that needs to be covered for examination
More informationSanjeev Kumar. contribute
RESEARCH ISSUES IN DATAA MINING Sanjeev Kumar I.A.S.R.I., Library Avenue, Pusa, New Delhi-110012 sanjeevk@iasri.res.in 1. Introduction The field of data mining and knowledgee discovery is emerging as a
More informationProject Participants
Annual Report for Period:10/2006-09/2007 Submitted on: 08/15/2007 Principal Investigator: Yang, Li. Award ID: 0414857 Organization: Western Michigan Univ Title: Projection and Interactive Exploration of
More informationTeam Builder Project
Team Builder Project Software Requirements Specification Draft 2 February 2, 2015 Team:.dat ASCII 1 Table of Contents Introduction Purpose 4 Scope of Project.4 Overview.5 Business Context 5 Glossary 6
More informationRF Coverage Validation and Prediction with GPS Technology
RF Coverage Validation and Prediction with GPS Technology By: Jin Yu Berkeley Varitronics Systems, Inc. 255 Liberty Street Metuchen, NJ 08840 It has taken many years for wireless engineers to tame wireless
More informationWeb and Mobile GIS Applications Development
Web and Mobile GIS Applications Development Presented by : Aamir Ali Manager Section Head (GIS Software Customization) Pakistan Space and Upper Atmosphere Research Commission (SUPARCO) Geographical Information
More informationMining Big Data. Pang-Ning Tan. Associate Professor Dept of Computer Science & Engineering Michigan State University
Mining Big Data Pang-Ning Tan Associate Professor Dept of Computer Science & Engineering Michigan State University Website: http://www.cse.msu.edu/~ptan Google Trends Big Data Smart Cities Big Data and
More informationVisual Data Mining. Motivation. Why Visual Data Mining. Integration of visualization and data mining : Chidroop Madhavarapu CSE 591:Visual Analytics
Motivation Visual Data Mining Visualization for Data Mining Huge amounts of information Limited display capacity of output devices Chidroop Madhavarapu CSE 591:Visual Analytics Visual Data Mining (VDM)
More informationSeaCloudDM: Massive Heterogeneous Sensor Data Management in the Internet of Things
SeaCloudDM: Massive Heterogeneous Sensor Data Management in the Internet of Things Jiajie Xu Institute of Software, Chinese Academy of Sciences (ISCAS) 2012-05-15 Outline 1. Challenges in IoT Data Management
More informationData Mining Cluster Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining
Data Mining Cluster Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 8 by Tan, Steinbach, Kumar 1 What is Cluster Analysis? Finding groups of objects such that the objects in a group will
More informationVector storage and access; algorithms in GIS. This is lecture 6
Vector storage and access; algorithms in GIS This is lecture 6 Vector data storage and access Vectors are built from points, line and areas. (x,y) Surface: (x,y,z) Vector data access Access to vector
More informationDistributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu
Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.
More informationStreaming Analytics and the Internet of Things: Transportation and Logistics
Streaming Analytics and the Internet of Things: Transportation and Logistics FOOD WASTE AND THE IoT According to the Food and Agriculture Organization of the United Nations, every year about a third of
More informationChallenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
More informationOracle8i Spatial: Experiences with Extensible Databases
Oracle8i Spatial: Experiences with Extensible Databases Siva Ravada and Jayant Sharma Spatial Products Division Oracle Corporation One Oracle Drive Nashua NH-03062 {sravada,jsharma}@us.oracle.com 1 Introduction
More information