HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING. Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara
Transcription
1 DRIVING INNOVATION THROUGH DATA HADOOP IN ENTERPRISE FUTURE-PROOF YOUR BIG DATA INVESTMENTS WITH CASCADING Supreet Oberoi Nov. 4-6, 2014 Big Data Expo Santa Clara
2 ABOUT ME
- I am a Data Engineer, not a Data Scientist
- I help enterprises make decisions on building their Big Data roadmap and technical strategy: use cases, products, technology decisions, employee skills
- I design Hadoop applications with the intent to operationalize them in enterprise settings: applications on which businesses depend, and which last longer than the technologies underneath them
- This talk is about learning how to design a Big Data strategy that leverages best
3 BUILDING AN OPEN PLATFORM IS KEY TO PREVENTING LOCK-IN
- Open Language
- Open Data
- Open Hardware
- Open Compute Platform
- Open Development Platform
4 OPEN LANGUAGES ALLOW YOU TO HARNESS THE TALENT OF YOUR ENTERPRISE
- Don't equate architecture with language; develop architecture to support multiple languages
- Support SQL and SQL-like languages
- Encourage development in proven and scalable languages such as Java
- Develop architecture to support a change of programming language (even for the same app)
- Have common performance-management tools across all programming environments
5 OPEN DATA ENABLES REUSE OF DATA AND APPS
- Develop a common operating picture by promoting reuse with open data
- Prevent exclusive access to data sets through proprietary tools
- Promote a common metadata repository
- Forbid storing data in proprietary formats
- Build seamless integration capabilities
6 OPEN HARDWARE PROMOTES REUSE OF INFRASTRUCTURE
- Get commodity hardware: commodity hardware will always cost less than optimized specialized hardware (note: the definition of "specialized" is up for debate)
- Develop and maintain a cluster that can be reused by different applications and technology stacks: avoid custom software installations on the cluster, or setting up dedicated clusters for given tech stacks
- Harness the collective power of the cluster: avoid fragmenting the cluster if possible
7 OPEN COMPUTE PLATFORM MAKES YOU SELECT THE RIGHT TOOL FOR THE PROBLEM
- Make tradeoffs between reliability and speed based on your business context
- Ensure that moving your application from one Hadoop compute platform (e.g., MapReduce) to another (e.g., Tez) does not impact application code or production-monitoring tools
- Resist compute platforms that require your enterprise to acquire significantly new skills (even if it is easy) to become productive
- Avoid new platforms that partition the cluster
- Avoid platforms that do not support Open Data
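The portability requirement on this slide (switching compute platform without touching application code) can be illustrated with a minimal plain-Java sketch. Everything here is hypothetical: `ComputeFabric`, `SequentialFabric` and `ParallelFabric` are illustrative stand-ins, not Hadoop, Tez or Cascading types. The point is only that the application step is written once and the platform is a swappable argument.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

final class FabricPortabilitySketch {

    /** Hypothetical stand-in for a compute platform; not a Hadoop or Cascading type. */
    interface ComputeFabric {
        <I, O> List<O> run(Function<I, O> step, List<I> input);
    }

    /** Stand-in for one platform: processes records one after another. */
    static final class SequentialFabric implements ComputeFabric {
        @Override
        public <I, O> List<O> run(Function<I, O> step, List<I> input) {
            return input.stream().map(step).collect(Collectors.toList());
        }
    }

    /** Stand-in for another platform: processes records in parallel, keeping input order. */
    static final class ParallelFabric implements ComputeFabric {
        @Override
        public <I, O> List<O> run(Function<I, O> step, List<I> input) {
            return input.parallelStream().map(step).collect(Collectors.toList());
        }
    }

    /** Application logic, written once and unaware of which fabric executes it. */
    static final Function<String, String> NORMALIZE =
        line -> line.trim().toLowerCase(Locale.ROOT);

    static List<String> normalize(ComputeFabric fabric, List<String> rawLines) {
        return fabric.run(NORMALIZE, rawLines);
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("  Hadoop ", "TEZ", " Spark");
        // Only the fabric argument changes; the application step stays the same.
        System.out.println(normalize(new SequentialFabric(), raw));
        System.out.println(normalize(new ParallelFabric(), raw));
    }
}
```

Because the application depends only on the interface, monitoring or logging wrapped around `run` would also survive a change of fabric, which is the second condition the slide lists.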
8 OPEN DEVELOPMENT PLATFORM PROVIDES LONG-TERM SUSTAINABILITY
- Development platforms improve developer productivity and operational excellence: picking the correct platform gives you best practices developed by the community, achieving higher quality
- Invest in picking the correct development platform: open, easy, scalable, popular, tools
- Bet on a sustainable open source platform
- Measure the vitality of the community: number of downloads, extensions (living ecosystem), extensible architecture, consumers of the technology, code stability
- A proven platform provides tools to get your apps to production
9 GET TO KNOW CONCURRENT
Leader in Application Infrastructure for Big Data
- Building enterprise software to simplify Big Data application development and management
- Founded: 2008. HQ: San Francisco, CA. CEO: Gary Nakamura. CTO, Founder: Chris Wensel
Products and Technology
- CASCADING: Open Source. The most widely used application infrastructure for building Big Data apps, with over 175,000 downloads each month
- DRIVEN: Enterprise data application management for Big Data apps
Proven: Simple, Reliable, Robust
- Thousands of enterprises rely on Concurrent to provide their data application infrastructure.
10 CASCADING - DE-FACTO STANDARD FOR DATA APPS
- Standard for enterprise data app development
- Your programming language of choice
- Cascading applications that run on MapReduce will also run on Apache Spark, Storm, and
[Diagram labels: Cascading Apps: SQL, Clojure, Ruby; Supported Fabrics and Data Stores: Mainframe, DB / DW, In-Memory, Data Stores, Hadoop; New Fabrics: Tez, Storm]
11 DEMO: WORD COUNT EXAMPLE WITH CASCADING

// body of Main.main( String[] args )
String docpath = args[ 0 ];
String wcpath = args[ 1 ];

Properties properties = new Properties();
AppProps.setApplicationJarClass( properties, Main.class );
HadoopFlowConnector flowconnector = new HadoopFlowConnector( properties );

// create source and sink taps
Tap doctap = new Hfs( new TextDelimited( true, "\t" ), docpath );
Tap wctap = new Hfs( new TextDelimited( true, "\t" ), wcpath );

// specify a regex to split "document" text lines into token stream
Fields token = new Fields( "token" );
Fields text = new Fields( "text" );
RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\]\\(\\),.]" );
// only returns "token"
Pipe docpipe = new Each( "token", text, splitter, Fields.RESULTS );

// determine the word counts
Pipe wcpipe = new Pipe( "wc", docpipe );
wcpipe = new GroupBy( wcpipe, token );
wcpipe = new Every( wcpipe, Fields.ALL, new Count(), Fields.ALL );

// connect the taps, pipes, etc., into a flow definition
FlowDef flowdef = FlowDef.flowDef()
  .setName( "wc" )
  .addSource( docpipe, doctap )
  .addTailSink( wcpipe, wctap );

// create the Flow
Flow wcflow = flowconnector.connect( flowdef ); // <<-- Unit of Work
wcflow.complete(); // <<-- Runs jobs on Cluster

Slide annotations: configuration, integration, processing, scheduling
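To make the data flow of the demo easier to follow, here is a plain-Java sketch of the same three steps (split each line into tokens, group by token, count each group) run in memory on a couple of strings. It is not Cascading code: the class and method names are hypothetical, and dropping empty tokens is a simplification made here that may differ from how `RegexSplitGenerator` treats them.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class WordCountSketch {

    // Same delimiter regex as the slide passes to RegexSplitGenerator.
    private static final Pattern DELIMITERS = Pattern.compile("[ \\[\\]\\(\\),.]");

    /** Splits every line into tokens, groups equal tokens, and counts each group. */
    static Map<String, Long> countTokens(Iterable<String> lines) {
        Map<String, Long> counts = new TreeMap<>(); // sorted by token, like grouped output
        for (String line : lines) {
            for (String token : DELIMITERS.split(line)) {
                if (token.isEmpty()) {
                    continue; // assumption of this sketch: ignore empty tokens
                }
                counts.merge(token, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts =
            countTokens(Arrays.asList("rain shadow (rain)", "dry air, dry land."));
        // Tab-separated token and count, mirroring the delimited sink in the demo.
        counts.forEach((token, count) -> System.out.println(token + "\t" + count));
    }
}
```

In the Cascading version the same logic is declared as pipes and handed to a flow connector, so the grouping and counting can be planned onto cluster jobs instead of running in a single loop.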
12 THE STANDARD FOR DATA APPLICATION DEVELOPMENT
Application platform that addresses:
- Build data apps that are scale-free: design principles ensure best practices at any scale
- Systems Integration: Hadoop never lives alone. Easily integrate with existing systems
- Application Portability: write once, then run on different computation fabrics
- Staffing Bottleneck: use existing Java, SQL, modeling skill sets
- Test-Driven Development: efficiently test code and process local files before deploying on a cluster
- Operational Complexity: simple. Package up into one jar and hand to operations
Proven application development framework for building data apps
13 STRONG ORGANIC GROWTH
- 175,000+ downloads / month
- Deployments
14 BUSINESSES DEPEND ON US
Cascading Java API
- Data normalization and cleansing of search and click-through logs for use by analytics tools, Hive analysts
- Easy to operationalize heavy lifting of data in one framework
15 BUSINESSES DEPEND ON US
Cascalog (Clojure)
- Weather pattern modeling to protect growers against loss
- ETL against 20+ datasets daily
- Machine learning to create models
- Purchased by Monsanto for $930M US
16 BUSINESSES DEPEND ON US
Scalding (Scala): TWITTER
- Makes complex analysis of very large data sets simple
- Machine learning, linear algebra to improve: user experience; ad quality (matching users and ad effectiveness)
- All revenue applications are running on Cascading/Scalding
17 CASCADING DATA APPLICATIONS
- Enterprise IT: Extract Transform Load, Log File Analysis, Systems Integration, Operations Analysis
- Corporate Apps: HR Analytics, Employee Behavioral Analysis, Customer Support, eCRM, Business Reporting
- Telecom: Data processing of Open Data, Geospatial Indexing, Consumer Mobile Apps, Location based services
- Marketing / Retail: Mobile, Social, Search Analytics; Funnel Analysis; Revenue Attribution; Customer Experiments; Ad Optimization; Retail Recommenders
- Consumer / Entertainment: Music Recommendation, Comparison Shopping, Restaurant Rankings, Real Estate, Rental Listings, Travel Search & Forecast
- Finance: Fraud and Anomaly Detection, Fraud Experiments, Customer Analytics, Insurance Risk Metric
- Health / Biotech: Aggregate Metrics For Govt, Person Biometrics, Veterinary Diagnostics, Next-Gen Genomics, Agronomics, Environmental Maps
18 BROAD SUPPORT
Hadoop ecosystem supports Cascading
19 CASCADING DEPLOYMENTS
20 AND INCLUDES RICH SET OF EXTENSIONS
21 CASCADING CURRENTLY WIP
Write once and deploy on your fabric of choice.
- The Innovation: Cascading 3.0 will allow data apps to execute on existing and emerging fabrics through its new customizable query planner.
- Cascading 3.0 will support Local In-Memory and Apache MapReduce, and soon thereafter (3.1) Apache Tez, Apache Spark and Apache Storm
[Diagram labels: Enterprise Data Applications; Computation Fabrics: Local In-Memory, MapReduce, Apache Tez, Storm]
22 USE LINGUAL TO MIGRATE ITERATIVE ETL TASKS
- Lingual is an extension to Cascading that executes ANSI SQL queries as Cascading apps
- Supports integrating with any data source that can be accessed through JDBC: a Cascading Tap can be created for any source supporting JDBC
- Great for migration of data and for integrating with non-Big Data assets: extends the life of existing IT assets in an organization
[Diagram labels: CLI / Shell, Enterprise Java; Provider API, Catalog, JDBC API, Lingual API; Query Planner; Cascading; Apache Hadoop; Data Stores]
23 SCALDING
- Scalding is a language binding to Cascading for Scala
- The name Scalding comes from combining SCALa and cascaDING
- Scalding is great for Scala developers; it can crisply express constructs for matrix math
- Scalding has very large commercial deployments at:
  - Twitter: use cases such as the revenue quality team, ad targeting and traffic quality
  - eBay: use cases include search analytics and other production data pipelines
24 PATTERN SCORES MODELS AT SCALE
- Pattern is an open source project that takes Predictive Model Markup Language (PMML) models and translates them into Cascading apps
- PMML is a popular XML-based standard that allows applications to describe data mining and machine learning models
- PMML models from popular analytics frameworks can be reused and deployed within Cascading workflows
  - Vendor frameworks: SAS, IBM SPSS, MicroStrategy, Oracle
  - Open source frameworks: R, Weka, KNIME, RapidMiner
- Pattern is great for migrating your model scoring to Hadoop from your decision systems
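As a rough illustration of what "scoring a PMML model" involves, the sketch below reads the coefficients of a logistic regression from a heavily simplified PMML-style fragment and applies them to one record. It is not Pattern's implementation or API: the class name, the field names `visits` and `purchases`, and the coefficient values are invented for the example, the fragment omits the header, data dictionary and mining schema that a real PMML file carries, and treating a missing field as 0 is an assumption of this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class PmmlScoringSketch {

    // Simplified, hypothetical fragment in the shape of a PMML RegressionModel.
    static final String MODEL_XML =
        "<RegressionModel functionName=\"regression\" normalizationMethod=\"logit\">"
      + "<RegressionTable intercept=\"-1.0\">"
      + "<NumericPredictor name=\"visits\" coefficient=\"0.5\"/>"
      + "<NumericPredictor name=\"purchases\" coefficient=\"0.25\"/>"
      + "</RegressionTable>"
      + "</RegressionModel>";

    private final double intercept;
    private final Map<String, Double> coefficients = new HashMap<>();

    /** Reads the intercept and per-field coefficients from the model XML. */
    PmmlScoringSketch(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element table = (Element) doc.getElementsByTagName("RegressionTable").item(0);
            intercept = Double.parseDouble(table.getAttribute("intercept"));
            NodeList predictors = table.getElementsByTagName("NumericPredictor");
            for (int i = 0; i < predictors.getLength(); i++) {
                Element predictor = (Element) predictors.item(i);
                coefficients.put(predictor.getAttribute("name"),
                    Double.parseDouble(predictor.getAttribute("coefficient")));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read model XML", e);
        }
    }

    /** Linear combination of the record's fields, passed through the logistic function. */
    double score(Map<String, Double> record) {
        double linear = intercept;
        for (Map.Entry<String, Double> entry : coefficients.entrySet()) {
            // Assumption of this sketch: a field missing from the record counts as 0.
            linear += entry.getValue() * record.getOrDefault(entry.getKey(), 0.0);
        }
        return 1.0 / (1.0 + Math.exp(-linear));
    }

    public static void main(String[] args) {
        PmmlScoringSketch model = new PmmlScoringSketch(MODEL_XML);
        Map<String, Double> record = new HashMap<>();
        record.put("visits", 2.0);
        record.put("purchases", 4.0);
        System.out.println(model.score(record));
    }
}
```

The model lives entirely in the XML, so a model trained in another tool can be swapped in without changing the scoring code. At scale, the same per-record function would run inside a data flow rather than in a single `main` method.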
25 PATTERN SCORES MODELS AT SCALE
- Step 1: Train your model with industry-leading tools
- Step 2: Score your models at scale with Pattern
26 PATTERN DEMO: FROM TRAINING TO SCORING
27 WHY PATTERN
- Standards compliance provides integration with many tools
- Models are independent of data and integration
- Only debugging Cascading, not an ensemble of applications
28 PATTERN: ALGOS IMPLEMENTED
- Hierarchical Clustering
- K-Means Clustering
- Linear Regression
- Logistic Regression
- Random Forest
Algorithms are extended based on customer use cases
29 OPERATIONAL EXCELLENCE
Visibility Through All Stages of App Lifecycle
- From Development (Building and Testing): Design & Development, Debugging, Tuning
- To Production (Monitoring and Tracking): Maintain Business SLAs, Balance & Controls, Application and Data Quality, Operational Health, Real-time Insights
30 DRIVEN ARCHITECTURE
31 CONTACT INFORMATION Supreet Oberoi
32 DRIVING INNOVATION THROUGH DATA THANK YOU Supreet Oberoi
More informationThis Symposium brought to you by www.ttcus.com
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
More informationMake Better Decisions Through Predictive Intelligence
IBM SPSS Modeler Professional Make Better Decisions Through Predictive Intelligence Highlights Easily access, prepare and model structured data with this intuitive, visual data mining workbench Rapidly
More informationThe basic data mining algorithms introduced may be enhanced in a number of ways.
DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationUsing Data Mining and Machine Learning in Retail
Using Data Mining and Machine Learning in Retail Omeid Seide Senior Manager, Big Data Solutions Sears Holdings Bharat Prasad Big Data Solution Architect Sears Holdings Over a Century of Innovation A Fortune
More informationHow To Manage Marketing With A Cloud Based Software
MARKETING ANALYTICS AS A SERVICE Retail Marketing Analytics APRIL 2012 Powered by: 1 Who we are Company Overview Experienced team with a proven history of solving difficult analytics problems for Fortune
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More information#TalendSandbox for Big Data
Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationAtScale Intelligence Platform
AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE
More informationSAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
More informationA Scalable Data Transformation Framework using the Hadoop Ecosystem
A Scalable Data Transformation Framework using the Hadoop Ecosystem Raj Nair Director Data Platform Kiru Pakkirisamy CTO AGENDA About Penton and Serendio Inc Data Processing at Penton PoC Use Case Functional
More informationLambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015
Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document
More informationData Mining + Business Intelligence. Integration, Design and Implementation
Data Mining + Business Intelligence Integration, Design and Implementation ABOUT ME Vijay Kotu Data, Business, Technology, Statistics BUSINESS INTELLIGENCE - Result Making data accessible Wider distribution
More informationApache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager lukhan@ebay.com @lukehq Yang Li Architect & Tech Leader yangli9@ebay.com Agenda What s Apache Kylin? Tech Highlights Performance
More informationData Challenges in Telecommunications Networks and a Big Data Solution
Data Challenges in Telecommunications Networks and a Big Data Solution Abstract The telecom networks generate multitudes and large sets of data related to networks, applications, users, network operations
More informationOracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.
Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationHarnessing Big Data with KNIME
Harnessing Big Data with KNIME Tobias Kötter KNIME.com Agenda The three V s of Big Data Big Data Extension and Databases Nodes Demo 2 Variety, Volume, Velocity Variety: integrating heterogeneous data (and
More informationBig Data must become a first class citizen in the enterprise
Big Data must become a first class citizen in the enterprise An Ovum white paper for Cloudera Publication Date: 14 January 2014 Author: Tony Baer SUMMARY Catalyst Ovum view Big Data analytics have caught
More informationBig Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
More information