Yogesh Simmhan. Computer Engineering & Center for Energy Informatics
Transcription
1 Yogesh Simmhan Computer Engineering & Center for Energy Informatics
2 Big Data: the 3 Vs of Big Data (Gartner): Volume, Variety, Velocity. Is Big Data hype or a new realization? Next Frontiers in Big Data, Y. Simmhan 4-Sep-13 2
3 Academia was ahead of the curve: HPC, Grid, SC Center, Computational Science, MPI, TeraFLOPs, Extreme Computing. Compute Intensive, Data Intensive. Data Grid, Clouds, Data Centers, Informatics, 4th Paradigm, Hadoop, Petabyte, Distributed Data Everywhere
4 And they keep growing: Enterprise Data Warehouses, Web Logs; High-Throughput Instruments (NGS); Cyber-Physical Systems (Smart Infrastructure); Large Instruments (LHC, LSST); Social Networks; Quantified Self, Personal Informatics; Internet of Things
5 Vast & Fast Data Complexity & Dynamism have been less examined!
6 Evolving Nature of Cyber-Infra: Traditional HPC and Accelerated computing. Clouds, GPGPU, FPGA, XMT, ARM/SoC, Phi, NTV. Democratized, Massively parallel, Faster, Power Efficient. Some are here, some are coming soon. Both research & usability questions: Which ones work well? For what apps? Benchmarks, user support, tutorials, APIs
7 and Big Data Platforms: Hadoop (MapReduce): Tuple/NoSQL; Giraph (Pregel): Graph Processing using BSP; Storm, InfoSphere Streams: Stream Processing; Esper, Siddhi: Complex Event Processing
8 THE APPLICATION
9 Smart Grids: A Cyber-Physical-Social System [diagram; labels: Generation, Transmission, Distribution; Utility, Control Center; Residential, Commercial; Microgrids, Buildings, Rooms; Solar, Cogeneration; HVAC & Lighting; Electric Vehicles]
10 Energy Technology Shift. US EIA's Annual Energy Outlook 2011
11 Technology Shift Big Data Ed. i-lab.usc.edu
12 Demand Response Optimization: Demand curtailment by consumers through shedding, shifting & shaping load in response to a utility request (LADWP's Renewable Portfolio Standard: Challenges and Implementation, DWP). When to curtail? Forecast when demand will outstrip supply based on real-time information. Whom to target? Predict customers & buildings to request based on current conditions.
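The two questions on this slide ("when to curtail?" and "whom to target?") can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Customer` type, the `curtailable_kw` field, the function names, and the greedy selection rule are hypothetical and are not taken from the talk, which does not describe the actual optimization method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Customer:
    name: str
    curtailable_kw: float  # hypothetical: predicted load this customer can shed


def intervals_to_curtail(forecast_kw: List[float],
                         capacity_kw: float) -> List[Tuple[int, float]]:
    """When to curtail: (interval index, shortfall) where forecast demand exceeds supply."""
    return [(i, demand - capacity_kw)
            for i, demand in enumerate(forecast_kw)
            if demand > capacity_kw]


def select_customers(customers: List[Customer],
                     shortfall_kw: float) -> Tuple[List[str], float]:
    """Whom to target: greedily pick the largest predicted curtailments
    until the shortfall is covered (an assumed, simplistic policy)."""
    chosen, covered = [], 0.0
    for c in sorted(customers, key=lambda c: c.curtailable_kw, reverse=True):
        if covered >= shortfall_kw:
            break
        chosen.append(c.name)
        covered += c.curtailable_kw
    return chosen, covered


# Illustrative numbers only, one value per 15-min interval.
shortfalls = intervals_to_curtail([90.0, 105.0, 120.0, 95.0], capacity_kw=100.0)
worst = max(shortfall for _, shortfall in shortfalls)
targets, covered_kw = select_customers(
    [Customer("A", 12.0), Customer("B", 6.0), Customer("C", 9.0)], worst)
```

A real deployment would replace the fixed forecast list with the output of a demand forecasting model and would likely weigh customer reliability and fairness, which this sketch ignores.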
13 Consumers: Load Shedding in Lindley Hall [figure; axis label: Temperature (°F)]
14 Consumers smartgrid.usc.edu ladwp.com
15 USC Campus Microgrid Testbed: City within a city. Largest private institutional customer of LA DWP; electric load of 28 MW. Diversity: Dorms, Classrooms, Offices, Hospitals, Restaurants; 33k students, 13k staff; 301 acres. A living, learning laboratory for controlled & calibrated validation of systems, operational scenarios, & behavior.
16 Big Data in Campus Microgrid. Dynamic Data: real-time streaming data, ~50,000 1/15min intervals. Big Data: 5 years of historical data, ~170 1/15 min intervals. Complex Data: integration with social & infra. data; customer surveys, building & organization details.
17 From Microgrid to City Grid: LA Smart Grid Demonstration Project (DOE & LA Dept of Water and Power). De-risk novel analytics for smart grids; scalable software platform & algorithms. LA DWP: largest municipal utility in US; 4M residents, 1.4M customers; 7 GW load, ~1% of US electricity use; 50,000 customers getting Smart Meters. Big Data software platform for demand-side energy management.
18 Scalable Software Architecture for DR in the USC Microgrid [architecture diagram; labels: Environment, Events; Customers; Analysts; Customers, Facilities Engineers; Visualization; Researchers; Generation Capacity Data; Equipment, Sensors; Monitor; Ingest Data; Store & Share Data; Forecast Demand; Decide D2R Strategy; Curtailment Notification; Voluntary and Direct Load Control]. Article in CiSE Cloud Special Issue, 2013
19 Consumption Data Variability [figure; two panels: office building, dorm building; x-axis: 15-min time interval of the day; y-axis: Consumption (in kWh)]
20 Consumption Data Variability
21 Customer Predictive Analytics on Big Data [workflow diagram]. Inputs: Historical kWh Time Series Data Set, Historical Feature Data Set, Current Feature Data Set. Steps: ARIMA Model Training, ARIMA Forecasting; Regression Tree ML Model Training, Regression Tree Forecasting; Calculate Error Measures from the Forecasted and Actual kWh Time Series Data Sets. Chart: Campus-scale 15-min Prediction Errors. Applications to outage management, energy markets, sustainable plant construction, reliable renewable integration.
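The "Calculate Error Measures" step compares a forecasted kWh series with the actual one. The slide does not say which measures were used, so the sketch below assumes two common choices for load forecasting, MAPE and CV-RMSE; the function names and the sample values are illustrative only.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping intervals with zero actual load."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def cv_rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error normalized by the mean of the actual series."""
    n = len(actual)
    rmse = math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)
    return rmse / (sum(actual) / n)


# Illustrative 15-min kWh values, not data from the talk.
actual_kwh = [100.0, 200.0]
forecast_kwh = [110.0, 180.0]
mape_pct = mape(actual_kwh, forecast_kwh)
cvrmse = cv_rmse(actual_kwh, forecast_kwh)
```

The same two functions can score the ARIMA and regression tree forecasts side by side, since both produce a series aligned with the actual readings.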
22 Analytics over Dynamic Data THE PLATFORM
23 Floe: Adaptive Stream Processing [diagram; labels: Provision, Cloud Service Provider, Cloud Infrastructure]. IEEE SCALE Challenge winner, 2012. SC & eScience, 2013 papers on Dynamic Dataflows, Online Dataflow Updates.
24 SCEPter: Complex Event Processing with Semantics and operating across offline & online data. Big Data, 2013 paper on High Availability SCEP
25 GoFFish: Temporal Graph Analytics. Scalable platform for graph-oriented event analytics. Highly performant Big Data framework for clusters & clouds. Data analytics over heterogeneous inter-connected event sources. Novel data mapping to time-series graphs, with efficient layout on distributed storage. Compose and efficiently execute dataflow applications over time-series graphs.
26 Analysis over Time-series Graphs: Fixed graph of event sources. Known relationships between sources, e.g. pathway. E.g. red car event from a camera. Event streams form graph time-series. Track path of the red car: mine for interesting trajectory; inferring track from micro-paths. Other analysis: wide-area outage management; event clustering and aggregation; topic propagation in social networks.
27 GoFFish Software Platform: Platform to store, compose & execute analytics on time-series graph datasets, at scale, on distributed systems. GoFS: Distributed Graph-oriented File System. Gopher: compose sub-graph centric complex analytics; executed on Floe streaming dataflow engine. Data & Compute collocated.
28 When Had**p is just not good enough: Gap in frameworks for dynamic graphs. Tuple/row/column-oriented frameworks, e.g. Hadoop, Hive, Impala. Graph Databases & In-Memory, e.g. FlockDB, Neo4j, MSR Trinity, Giraph: focus on large simple graphs, complex queries, distributed in-memory. Parallel Graph Frameworks: MPI, parallel computing, steep curve; specialized HPC/shared-memory hardware; storage & compute are (loosely) coupled. [Axis labels: Vertical Scaling, Horizontal Scaling]
29 Design Insights: Mitigate Weak Links. 1) Disk I/O is the weakest link for big data: do more with fewer disk reads, parallel disk I/O; graph layout on distributed disks is key! 2) Network I/O: limit network communication, transfer in chunks; graph partitioning on distributed hosts is key! 3) Memory capacity: not all data fits in distributed memory; incremental loading & computation. 4) CPU: elastic Cloud execution, leverage many-core. [Callouts: Think Hadoop, Think Hive, Think Pregel, USP!]
30 Data Model: Designed for sub-graph centric distributed computing. Graphs » Partitions (distributed evenly across machines) » Sub-graphs (logical unit of operation) » Vertices. [Diagram labels: Host A, Host B] Sub-graph is the unit of distributed data access & operation. Extends Google Pregel/Apache Giraph's vertex-centric BSP model: no global view.
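The graph, partition, sub-graph, vertex hierarchy can be sketched in a few lines. This sketch assumes that a sub-graph is a connected component formed by the edges that stay inside one partition, and that edges crossing partitions are kept separately as remote edges; the slide does not spell out this definition, and the function name and input shapes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[int, int]


def split_into_subgraphs(edges: List[Edge],
                         partition_of: Dict[int, Hashable]):
    """Return ({partition: [vertex sets]}, remote edges) for an undirected graph."""
    local = defaultdict(set)   # adjacency restricted to same-partition edges
    remote: List[Edge] = []    # edges whose endpoints live in different partitions
    for u, v in edges:
        if partition_of[u] == partition_of[v]:
            local[u].add(v)
            local[v].add(u)
        else:
            remote.append((u, v))

    seen: Set[int] = set()
    subgraphs = defaultdict(list)
    for start in sorted(partition_of):
        if start in seen:
            continue
        # Depth-first traversal over local edges only: one sub-graph.
        component, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            component.add(v)
            stack.extend(local[v] - seen)
        subgraphs[partition_of[start]].append(component)
    return dict(subgraphs), remote


# Illustrative graph: two hosts, one edge crossing between them.
example_edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
example_partition = {1: "A", 2: "A", 3: "B", 4: "B", 5: "A", 6: "A"}
subgraphs, remote_edges = split_into_subgraphs(example_edges, example_partition)
```

Here host A ends up with two sub-graphs and host B with one, and the single cross-host edge is the only place where messaging between hosts would be needed.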
31 Data Model: Designed for sub-graph centric distributed computing over time-series graphs. Graph template: common features. Graph instances: time-variant features. Instance vertex follows the template vertex's partition. [Diagram labels: Host A, Host B; instances at t1, t2, t3, t4, t5]
32 Logical (User) API: Graph instances have topology, timestamp, name-value attributes for V & E. Subgraph { timestamp, V[], E_local[], E_remote[], V_attr[][], E_attr[][] }. PTId GetLocalPartition(GTId); SGTId[] GetSubgraphTemplates(PTId); Iterator<Subgraph> GetSubgraphInstances(SGTId, Start_time, End_time, Vertex_Attrs[], Edge_Attrs[])
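To make the shape of this API concrete, here is a small in-memory sketch. It is not the GoFS implementation: the class `InMemoryGraphStore`, the `add_instance` helper, the snake_case method names, the inclusive time range, and the attribute projection behaviour are all assumptions chosen to mirror the three calls and the `Subgraph` fields listed on the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple

Edge = Tuple[int, int]


@dataclass
class Subgraph:
    """One time-stamped sub-graph instance, mirroring the slide's fields."""
    timestamp: int
    vertices: List[int]        # V[]
    local_edges: List[Edge]    # E_local[]
    remote_edges: List[Edge]   # E_remote[]
    vertex_attrs: Dict[int, Dict[str, float]] = field(default_factory=dict)  # V_attr[][]
    edge_attrs: Dict[Edge, Dict[str, float]] = field(default_factory=dict)   # E_attr[][]


def _project(attrs, names: Optional[List[str]]):
    """Keep only the requested attribute names; None keeps everything."""
    if names is None:
        return attrs
    return {key: {n: v for n, v in vals.items() if n in names}
            for key, vals in attrs.items()}


class InMemoryGraphStore:
    """Hypothetical stand-in for one data node (illustration only)."""

    def __init__(self) -> None:
        self._partition_of: Dict[str, str] = {}          # GTId -> PTId
        self._templates: Dict[str, List[str]] = {}       # PTId -> SGTId[]
        self._instances: Dict[str, List[Subgraph]] = {}  # SGTId -> instances

    def add_instance(self, gt_id: str, pt_id: str, sgt_id: str,
                     instance: Subgraph) -> None:
        self._partition_of[gt_id] = pt_id
        templates = self._templates.setdefault(pt_id, [])
        if sgt_id not in templates:
            templates.append(sgt_id)
        self._instances.setdefault(sgt_id, []).append(instance)

    def get_local_partition(self, gt_id: str) -> str:
        return self._partition_of[gt_id]

    def get_subgraph_templates(self, pt_id: str) -> List[str]:
        return list(self._templates[pt_id])

    def get_subgraph_instances(self, sgt_id: str, start_time: int, end_time: int,
                               vertex_attrs: Optional[List[str]] = None,
                               edge_attrs: Optional[List[str]] = None
                               ) -> Iterator[Subgraph]:
        # Assumption: the time range is inclusive at both ends.
        for sg in sorted(self._instances[sgt_id], key=lambda s: s.timestamp):
            if start_time <= sg.timestamp <= end_time:
                yield Subgraph(sg.timestamp, sg.vertices, sg.local_edges,
                               sg.remote_edges,
                               _project(sg.vertex_attrs, vertex_attrs),
                               _project(sg.edge_attrs, edge_attrs))


# Usage with made-up identifiers and attribute values.
store = InMemoryGraphStore()
for t in (1, 2, 3):
    store.add_instance("g", "p0", "sg0", Subgraph(
        t, [1, 2], [(1, 2)], [(2, 7)],
        {1: {"load": 10.0 * t, "temp": 70.0}, 2: {"load": 5.0 * t, "temp": 71.0}}))
partition = store.get_local_partition("g")
templates = store.get_subgraph_templates(partition)
window = list(store.get_subgraph_instances(templates[0], 2, 3, vertex_attrs=["load"]))
```

Returning an iterator rather than a list matches the slide's `Iterator<Subgraph>` and fits the incremental loading the design calls for, since instances outside the requested window or attribute set never need to be materialized.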
33 Data Layout on local & DFS: We operate on a distributed file system. Commodity disks, hosts; similar to HDFS. Distributed Graph Layout (across hosts). Local Partition Layout (within a host). How do sub-graph templates, instances & attributes, over time, map to files on disk? Slice is a unit of disk access. Split sub-graph model (by time, attributes). Group similar items (topology, attribute types, time-ranges). Compactly packed... Kryo, protobuf, custom
34 GoFS Architecture [architecture diagram; labels: Gopher; Host Z: Head Node, Sub-Graph Task 0; Host A, Host B, Host X: Data Node; Task 1, Task 2, Task N; Sub-Graph Task 1, Sub-Graph Task 2, Sub-Graph Task N; Time-series Graph Data Model API [User]; GoFS Slice (File System) Layout API [Developer]; Graph Template Slices; Graph Template Metadata Slice; per data node: Partition Metadata Slice, SubGraph Template Slices, Attribute Instances; Network]
35 Gopher Design. Vertex-centric graph model: Google Pregel (Apache Giraph); message overhead, dynamic graphs. Sub-graph centric streaming dataflows: sub-graphs reduce messaging, more local ops; streaming allows incremental execution; dataflow composition more flexible than BSP. Optimizations and Analysis of BSP Graph Processing Models on Public Clouds, Redekopp, Simmhan & Prasanna, IPDPS 2013
36 Gopher SubGraph Prog. Model. Sub-graph initiator: [SG Id] compute(iterator t <SubGraph>) {SGId,t,M[]}. Sub-graph processor: [SG Id] compute(t, M[]) {SGId,t,M[]}. Initiator/processor runs on GoFS data node. Streaming/BSP messaging to remote edges. Process subgraph before expanding to neighbors. Allows # of supersteps to be reduced by O(subgraph), relative to Giraph/Pregel!
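The superstep reduction claimed above can be seen on a toy example. The sketch below is not Gopher code: it uses the classic "propagate the maximum value" BSP example, runs it once per vertex and once per sub-graph, and counts supersteps. The function names are hypothetical, and the sub-graph variant folds each sub-graph's local phase into a single in-memory `max`, which is only valid under the assumption that every sub-graph is connected through its local edges.

```python
from typing import Dict, Hashable, List, Set, Tuple


def bsp_max(adj: Dict[Hashable, Set[Hashable]],
            values: Dict[Hashable, int]) -> Tuple[Dict[Hashable, int], int]:
    """Propagate the maximum value over a graph; return (values, supersteps)."""
    values = dict(values)
    active = set(values)  # units that must send their value this superstep
    supersteps = 0
    while active:
        supersteps += 1
        inbox: Dict[Hashable, int] = {}
        for u in active:  # send phase: values as of the start of the superstep
            for v in adj[u]:
                inbox[v] = max(inbox.get(v, values[u]), values[u])
        active = set()
        for v, incoming in inbox.items():  # receive phase
            if incoming > values[v]:
                values[v] = incoming
                active.add(v)
    return values, supersteps


def subgraph_centric_max(subgraphs: Dict[str, Set[int]],
                         remote_edges: List[Tuple[int, int]],
                         values: Dict[int, int]) -> Tuple[Dict[int, int], int]:
    """Same computation with one compute unit per sub-graph."""
    owner = {v: sg for sg, vs in subgraphs.items() for v in vs}
    # Local phase: each connected sub-graph settles on its own maximum in memory.
    sg_values = {sg: max(values[v] for v in vs) for sg, vs in subgraphs.items()}
    sg_adj: Dict[str, Set[str]] = {sg: set() for sg in subgraphs}
    for u, v in remote_edges:  # only remote edges carry messages
        sg_adj[owner[u]].add(owner[v])
        sg_adj[owner[v]].add(owner[u])
    sg_result, supersteps = bsp_max(sg_adj, sg_values)
    return {v: sg_result[owner[v]] for v in owner}, supersteps


# Path graph 0-1-2-3-4-5 with illustrative values, split into two sub-graphs.
path_adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
path_values = {0: 9, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
vertex_result, vertex_steps = bsp_max(path_adj, path_values)
sg_result, sg_steps = subgraph_centric_max(
    {"sgA": {0, 1, 2}, "sgB": {3, 4, 5}}, [(2, 3)], path_values)
```

On this path the vertex-centric run needs 6 supersteps while the sub-graph-centric run needs 2, because work inside a sub-graph no longer costs a global synchronization per hop.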
37 Visual Analytics: Abstract rendering with Continuum/IU [figure; labels: Vantage Point, Hop Distance, IP #]
38 To Conclude: There is more to Big Data than the 3 Vs. Interesting problems at the intersection. Dynamic Data & real-time analytics are essential for emerging Cyber-Physical apps. Industry vs Academia. Gap in scalable analytics platforms for dynamic graphs: GoFFish's focus on temporal graphs. Look towards leveraging accelerated CI.
39 Acknowledgement ceng.usc.edu/~simmhan THANK YOU!
More informationEnabling the SmartGrid through Cloud Computing
Enabling the SmartGrid through Cloud Computing April 2012 Creating Value, Delivering Results 2012 eglobaltech Incorporated. Tech, Inc. All rights reserved. 1 Overall Objective To deliver electricity from
More informationScalable Data Analysis in R. Lee E. Edlefsen Chief Scientist UserR! 2011
Scalable Data Analysis in R Lee E. Edlefsen Chief Scientist UserR! 2011 1 Introduction Our ability to collect and store data has rapidly been outpacing our ability to analyze it We need scalable data analysis
More informationEvaluating partitioning of big graphs
Evaluating partitioning of big graphs Fredrik Hallberg, Joakim Candefors, Micke Soderqvist fhallb@kth.se, candef@kth.se, mickeso@kth.se Royal Institute of Technology, Stockholm, Sweden Abstract. Distributed
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationHortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
More informationBig Data. Value, use cases and architectures. Petar Torre Lead Architect Service Provider Group. Dubrovnik, Croatia, South East Europe 20-22 May, 2013
Dubrovnik, Croatia, South East Europe 20-22 May, 2013 Big Data Value, use cases and architectures Petar Torre Lead Architect Service Provider Group 2011 2013 Cisco and/or its affiliates. All rights reserved.
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationUsing Data Mining and Machine Learning in Retail
Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune
More informationBig Data Systems CS 5965/6965 FALL 2015
Big Data Systems CS 5965/6965 FALL 2015 Today General course overview Expectations from this course Q&A Introduction to Big Data Assignment #1 General Course Information Course Web Page http://www.cs.utah.edu/~hari/teaching/fall2015.html
More informationHow To Use Hadoop For Gis
2013 Esri International User Conference July 8 12, 2013 San Diego, California Technical Workshop Big Data: Using ArcGIS with Apache Hadoop David Kaiser Erik Hoel Offering 1330 Esri UC2013. Technical Workshop.
More informationMr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo
Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationNavigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationBig Fast Data Hadoop acceleration with Flash. June 2013
Big Fast Data Hadoop acceleration with Flash June 2013 Agenda The Big Data Problem What is Hadoop Hadoop and Flash The Nytro Solution Test Results The Big Data Problem Big Data Output Facebook Traditional
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationReal Time Big Data Processing
Real Time Big Data Processing Cloud Expo 2014 Ian Meyers Amazon Web Services Global Infrastructure Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationData Centric Systems (DCS)
Data Centric Systems (DCS) Architecture and Solutions for High Performance Computing, Big Data and High Performance Analytics High Performance Computing with Data Centric Systems 1 Data Centric Systems
More informationTE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
More informationOracle Big Data Handbook
ORACLG Oracle Press Oracle Big Data Handbook Tom Plunkett Brian Macdonald Bruce Nelson Helen Sun Khader Mohiuddin Debra L. Harding David Segleau Gokula Mishra Mark F. Hornick Robert Stackowiak Keith Laker
More informationTutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
More informationPetabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013
Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics
More information