Streamdrill: Analyzing Big Data Streams in Realtime
- Phebe Franklin
- 8 years ago
1 Streamdrill: Analyzing Big Data Streams in Realtime Mikio L. th 6
2 Realtime Big Data: Sources
- Finance
- Gaming
- Monitoring
- Advertisement
- Sensor Networks
- Social Media
Attribution: flickr users kenteegardin, fguillen, torkildr, Docklandsboy, brewbooks, ellbrown, JasonAHowie
3 Tasks by Complexity
- Counting and Averages (over time windows), Count Distinct
- Profiles and Histograms
- Trends
- Outliers and Fraud detection
- Prediction (churn, failure)
4 Tasks by Latency
- Reporting
- Visualization and Monitoring
- Optimizing, Personalization
- Control
Fast responses: really realtime only if you can react in realtime!
5 What makes Data Big?
- Many Events: 100 events / second = 360k per hour = 8.6M per day = 260M per month = 3.2B per year
- Many Objects
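The figures on this slide follow from simple multiplication. A minimal sketch of the arithmetic, assuming a constant rate, a 30-day month and a 365-day year (the slide does not state which month or year length it used):

```python
# Event volume implied by a constant rate of 100 events per second.
RATE_PER_SECOND = 100

per_hour = RATE_PER_SECOND * 60 * 60   # 360,000
per_day = per_hour * 24                # 8,640,000
per_month = per_day * 30               # 259,200,000 (assumes a 30-day month)
per_year = per_day * 365               # 3,153,600,000 (assumes a 365-day year)

for label, value in [("hour", per_hour), ("day", per_day),
                     ("month", per_month), ("year", per_year)]:
    print(f"{value:>13,} events per {label}")
```

Rounded to two significant digits, these match the slide's 360k, 8.6M, 260M and 3.2B.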
6 Current approach: Scaling
- Batch (MapReduce)
- Stream (Storm, Spark)
Expensive to scale to realtime!
7 Scaling? Approximate!
Scaling is nice, but:
- Scaling is expensive
- Data is noisy
- Not every data point is important
- Methods are noisy, too
- Exact numbers often not necessary
8 Scaling vs. Approximation
Scaling | Approximation
need raw processing power to get fast | approximate more to get fast
may compute results you don't need | focusses on data you are interested in
practically requires a cluster setup | already consumes whole stream with one node
9 Heavy Hitters (a.k.a. Top-k)
- Count activities over large item sets (millions or even more, e.g. IP addresses, Twitter users).
- Interested in the most active elements only.
- Fixed tables of counts: frank 15, paul 12, jan 8, felix 5, leo 3, alex 2
- Case 1: element already in data base (paul)
- Case 2: new element (nico): the entry alex 2 becomes nico 3
Metwally, Agrawal, Abbadi, Efficient Computation of Frequent and Top-k Elements in Data Streams, International Conference on Database Theory, 2005
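The two cases on this slide correspond to the Space-Saving scheme from the cited Metwally et al. paper. The following is a minimal sketch of that idea, not streamdrill's actual implementation; the class name `SpaceSaving` and its methods are hypothetical, and the minimum is found by a linear scan for brevity.

```python
class SpaceSaving:
    """Fixed-size top-k counter in the style of Metwally et al. (2005)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}  # item -> (possibly overestimated) count

    def add(self, item):
        if item in self.counts:
            # Case 1: the item is already tracked, so just increment it.
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            # The table still has room: start tracking the item.
            self.counts[item] = 1
        else:
            # Case 2: evict the smallest entry; the newcomer takes over
            # its count plus one.
            victim = min(self.counts, key=self.counts.get)
            self.counts[item] = self.counts.pop(victim) + 1

    def top(self, k):
        """Return the k entries with the largest counts."""
        return sorted(self.counts.items(), key=lambda kv: -kv[1])[:k]


# Reproduce the table from the slide.
table = SpaceSaving(capacity=6)
table.counts = {"frank": 15, "paul": 12, "jan": 8,
                "felix": 5, "leo": 3, "alex": 2}
table.add("paul")  # Case 1: paul goes from 12 to 13
table.add("nico")  # Case 2: alex (2) is evicted, nico enters with 3
print(table.top(6))
```

Because a newcomer inherits the evicted count, counts can be overestimated but memory stays bounded by `capacity`. A production version would replace the linear `min` scan with a structure that finds the smallest entry in constant time.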
10 Wait a minute? Only Counting?
Well, getting the most active items is already useful: web analytics, users, trending topics.
Counting is statistics!
11 Counting is Statistics
- Empirical mean
- Correlations
- Covariance matrix (PCA)
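The formulas on this slide did not survive transcription. As a reminder, the standard textbook definitions can all be written in terms of running sums, which is what makes them computable from counters. The notation below ($n$ observations $x_i$, $y_i$) is an assumption, not taken from the slides.

```latex
% Empirical mean from the count n and the running sum of x_i
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Covariance and correlation from running sums of products
C_{xy} = \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y},
\qquad
\rho_{xy} = \frac{C_{xy}}{\sqrt{C_{xx}\, C_{yy}}}

% Covariance matrix for vector-valued x_i (the input to PCA)
C = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^{\top} - \bar{x}\,\bar{x}^{\top}
```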
12 More: Maximum-Likelihood Estimate probabilistic models based on which is slightly biased, but simpler
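The formula on this slide is missing from the transcript. One common example of a maximum-likelihood estimate that is slightly biased but simple is the variance of a Gaussian with 1/n normalisation; whether that is what the slide showed is an assumption. A minimal sketch using only a count and two running sums (the class name `GaussianFromCounts` is hypothetical):

```python
class GaussianFromCounts:
    """Fit a Gaussian from three accumulators: n, sum(x), sum(x^2)."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_x2 = 0.0

    def update(self, x):
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def mean(self):
        return self.sum_x / self.n

    def variance_mle(self):
        # Maximum-likelihood estimate: 1/n normalisation, slightly biased.
        return self.sum_x2 / self.n - self.mean() ** 2

    def variance_unbiased(self):
        # Bessel's correction: rescale by n / (n - 1).
        return self.variance_mle() * self.n / (self.n - 1)


model = GaussianFromCounts()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    model.update(x)
print(model.mean(), model.variance_mle(), model.variance_unbiased())
```

For large n the two variance estimates are nearly identical, which is why the simpler biased form is usually acceptable on a stream.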
13 Outlier detection Once you have a model, you can compute p-values (based on recent time frames!)
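As an illustration of computing a p-value from recent counts, the sketch below assumes a Poisson model whose rate is the mean of the recent time frames. The slide does not say which model streamdrill uses, so the Poisson choice, the 0.001 cut-off and the function name `poisson_p_value` are all assumptions.

```python
import math


def poisson_p_value(observed, expected_rate):
    """Return P(X >= observed) for X ~ Poisson(expected_rate)."""
    # Sum the probabilities of 0 .. observed-1 and take the complement.
    below = sum(
        math.exp(-expected_rate) * expected_rate ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Counts per time frame for one item over the recent past.
recent_counts = [4, 6, 5, 5, 4, 6]
rate = sum(recent_counts) / len(recent_counts)

for observed in (6, 15):
    p = poisson_p_value(observed, rate)
    print(observed, p, "outlier" if p < 0.001 else "normal")
```

A count of 6 is unremarkable against a rate of 5, while 15 falls far into the tail and would be flagged.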
14 Online TF-IDF
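The transcript gives only the slide title here. One way TF-IDF can be maintained online is to keep document-frequency counters that are updated as each document arrives; the sketch below assumes that approach, uses exact counters rather than a bounded heavy-hitters table, and all names in it are hypothetical.

```python
import math
from collections import Counter


class OnlineTfIdf:
    """TF-IDF scores from document frequencies that are updated per document."""

    def __init__(self):
        self.num_docs = 0
        self.doc_freq = Counter()  # term -> number of documents containing it

    def add(self, tokens):
        """Update document frequencies with one new document and score it."""
        self.num_docs += 1
        self.doc_freq.update(set(tokens))  # count each term once per document
        term_freq = Counter(tokens)
        return {
            term: (count / len(tokens))
            * math.log(self.num_docs / self.doc_freq[term])
            for term, count in term_freq.items()
        }


tfidf = OnlineTfIdf()
tfidf.add("big data streams".split())
tfidf.add("big data batch".split())
scores = tfidf.add("realtime data streams".split())
print(scores)
```

In the last document, "data" has appeared in every document so far and scores zero, while the previously unseen "realtime" scores highest.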
15 So much more to do with trends
- Least Recently Used Caches
- Sparse Vectors
- Sparse Matrices
- Conditional Probabilities (Histograms)
- Accumulators...
16 streamdrill
- Core Engine: Heavy Hitters counting + exponential decay
- Instant counts & top-k results over time windows
- Modules for specific use cases
Features
- In-Memory, with snapshots to disk
- Written in Scala
- Interface: Query by REST, push data by REST or UDP
- Single node performance: up to 20k events/s, about 1M objects per GB
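Exponential decay lets a single number stand in for a count over a sliding time window. The following is a minimal sketch of a lazily decayed counter, assuming a half-life parameterisation; it is not taken from streamdrill's code, and `DecayingCounter` is a hypothetical name.

```python
class DecayingCounter:
    """Counter whose value halves every `half_life` seconds."""

    def __init__(self, half_life):
        self.half_life = half_life
        self.value = 0.0
        self.last_update = None

    def _decay_to(self, now):
        # Apply the decay accumulated since the last touch, then move the clock.
        if self.last_update is not None:
            elapsed = now - self.last_update
            self.value *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def hit(self, now):
        """Record one event at time `now` (seconds)."""
        self._decay_to(now)
        self.value += 1.0

    def read(self, now):
        """Return the decayed count as of time `now`."""
        self._decay_to(now)
        return self.value


counter = DecayingCounter(half_life=60.0)
counter.hit(now=0.0)
counter.hit(now=0.0)
print(counter.read(now=60.0))   # one half-life after two hits
print(counter.read(now=120.0))  # two half-lives after two hits
```

Decay is applied only when a counter is touched, so idle entries cost nothing. The sketch assumes timestamps arrive in non-decreasing order; out-of-order events would need extra handling.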
17 Architecture Overview
18 streamdrill modules
Diagram: streamdrill core with the modules profiling, recommendation, fraud detection
Ready-made modules to cover core business applications.
19 Use Case: Realtime user profiles
- Objective: Track user activity in different categories in realtime
- Event: (user, category)
- Output: Trends for (user, *)
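The event and query shapes of this use case can be sketched with plain counters. This illustration uses exact, unbounded counts rather than a decaying heavy-hitters table, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class UserProfiles:
    """Per-user category counts built from (user, category) events."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def event(self, user, category):
        self.counts[user][category] += 1

    def trend(self, user, k=3):
        """Answer the (user, *) query: the top-k categories for one user."""
        return self.counts.get(user, Counter()).most_common(k)


profiles = UserProfiles()
for user, category in [("ann", "sports"), ("ann", "music"), ("ann", "sports"),
                       ("bob", "news"), ("ann", "sports")]:
    profiles.event(user, category)
print(profiles.trend("ann"))
```

The wildcard in (user, *) fixes the user and ranks over the remaining dimension, here the category.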
20 Use Case: Realtime recommendations
- Objective: Recommend items to users based on user interests, item popularity
- Event: (user, item, categories)
- Output: User profiles to find categories for item trends
21 Use Case: Realtime fraud and rate limiting
- Objective: Identify unusually active users/IPs, unusually high co-occurrence
- Event: (id) or (id, device)
- Output: trend for ids, or look at size of (id, *) or (*, device) above threshold
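The "size of (id, *) above threshold" check can be illustrated as counting distinct devices per id. This sketch keeps exact sets in memory, which a bounded streaming engine would approximate; the function name and the threshold value are hypothetical.

```python
from collections import defaultdict


def flag_suspicious_ids(events, max_devices):
    """Return the ids seen with more than `max_devices` distinct devices."""
    devices_per_id = defaultdict(set)
    for user_id, device in events:
        devices_per_id[user_id].add(device)
    return sorted(
        uid for uid, devices in devices_per_id.items()
        if len(devices) > max_devices
    )


events = [("u1", "phone"), ("u1", "laptop"), ("u2", "phone"),
          ("u1", "tablet"), ("u1", "kiosk"), ("u2", "phone")]
print(flag_suspicious_ids(events, max_devices=3))
```

Swapping the tuple order gives the (*, device) view, that is, devices shared by unusually many ids.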
22 Summary
- Streamdrill: Big Data through approximation
- Counts are the basis for (nearly) everything
- Try our demo: streamdrill.com