Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
|
|
- Brice Williams
- 8 years ago
- Views:
Transcription
1 Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
2 Self Service at scale
3 ? Relational? MPP? Hadoop?
4 Linkedin data 350M Members 25B 3.5M 4.8B 2M Quarterly page views Active company profiles Endorsements Jobs
5 Translate data into insights Analytics Infrastructure Business Insights Member Insights
6 The Good Old Days
7 Data ft
8 Scale Challenges 1. Human intervention 2. Long latencies to obtain insights from data. 3. Complexity of integration with increasing data sources.
9 What does it take to build a self-service, real-time, democratic analytics platform?
10 Analytics Infra Self Serve Applications [reporting, lineage, perf tuning etc] (WhereHows, Dr. Elephant, ) Core Data Warehouse [Views, Metrics, Dimensions, Datasets, Core flows] Data Management Systems [ingest, export, access, workflows] (Gobblin, ) Storage and Compute (Hadoop, Pinot, Cubert)
11 Storage and Compute Platforms Y A R N Pig Hive Cubert Scalding Map-Reduce Spark Tez HDFS Hadoop Pinot
12 LinkedIn Online Data Serving Deployment x Clusters (~x000 nodes) xx+ PB of data xxx k jobs / week xm compute hrs / month Ingest ETL ETL R & D Export PROD PROD
13 Supporting > 1000 Hadoop users Development process do code, [review], deploy, while (! good); Hadoop is complex: lots of knobs, tuning helps Performance symptoms not easily identifiable: scattered evidence Performance implications of changes
14 Dr. Elephant: diagnosis
15 What about real-time analytics?
16 Slow Queries
17 Solution Avoid joins at query time when possible. Denormalize data in Hadoop and load into a fast engine for slice-n-dice.
18 Real-time analytics A challenge for Hadoop Slice and dice billions of records, hundreds of dimensions End to end freshness of minutes not hours Sub-second query response times e.g. Which are top regions that contribute to my profile views? Which industries in those regions?
19 Pinot for realtime analytics g Distributed, fault-tolerant Compressed Columnar indexes Data ingestion from Kafka and Hadoop No joins, yet.
20 Who viewed my profile
21 Pinot: Data Flow Profile Member-facing Who Viewed My Profile ProfileViewEvent Internal Kafka minutes Pinot Profile Analytics Dashboard Hadoop hours / days segment building
22 Pig and Hive are great but... Operate on individual records Re-compute scheduled batch ETL jobs with full scans. Can do better by reorganization and processing data in blocks
23 Cubert: Accelerating Batch computation Pig/Hive Cubert 40 hours 35 hours 30 hours 25 hours 20 hours 15 hours 10 hours 5 hours 0 hours XLNT (Statistical) SPI (Graph) Plato (OLAP Cube)
24 Cubert Internals Organizes data in blocks Blocks created and transformed with operators Cubert provides a scripting language and a runtime to execute the operators in Map-Reduce operations.
25 Technology Stack Self Serve Applications [reporting, lineage, perf tuning etc] (WhereHows, Dr. Elephant, ) Core Data Warehouse [Views, Metrics, Dimensions, Datasets, Core flows] Data Management Systems [ingest, export, access, workflows] (Gobblin, ) Storage and Compute (Hadoop, Pinot, Cubert)
26 Perception
27 Reality
28 Unifying Ingress into Hadoop
29 Ingest operator chain
30 Gobblin: roadmap Open source in 2014 Current work Continuous and batch ingest Data profiling, summarization Flexible deployment Resource utilization and sharing
31 Workflow Management Workflow Mgmt Apps Scheduling Backend Azkaban EasyDat a Oozie
32 Technology Stack Self Serve Applications [reporting, lineage, perf tuning etc] (WhereHows, Dr. Elephant, ) Core Data Warehouse [Views, Metrics, Dimensions, Datasets, Core flows] Data Management Systems [ingest, export, access, workflows] (Gobblin, ) Storage and Compute (Hadoop, Pinot, Cubert)
33 WhereHows Data Exploration Discover datasets Spread across storage systems (HDFS, TD, Kafka ) Murky semantics for data and columns Lineage to traverse relationships Discover processes Spread across process execution engines (Azkaban, Adhoc, Appworx, EasyData) See code and logic Correlate data and processes
34 WhereHows
35 Lineage in action
36 Reporting and Visualization 1. Dashboards
37 Reporting and Visualization 1. Dashboards 2. Curated Exploration
38 Reporting and Visualization 1. Dashboards 2. Curated Exploration 3. Ad-hoc
39 Summary Reporting: dashboards, curated exploration, ad-hoc WhereHows: explore data, lineage Workflow Mgmt Gobblin*: data ingest Azkaban EasyDat a Oozie Dr. Elephant* for tuning Hadoop Cubert* for batch M/R Hadoop storage & compute Pinot* for real-time querying Y A R N Pinot Pig Hive Cubert Scalding Map-Reduce Spark Tez HDFS Hadoop
40 Thanks! Greg Arnold, Sr. Director Engineering
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationBuilding Scalable Big Data Infrastructure Using Open Source Software. Sam William sampd@stumbleupon.
Building Scalable Big Data Infrastructure Using Open Source Software Sam William sampd@stumbleupon. What is StumbleUpon? Help users find content they did not expect to find The best way to discover new
More informationUpcoming Announcements
Enterprise Hadoop Enterprise Hadoop Jeff Markham Technical Director, APAC jmarkham@hortonworks.com Page 1 Upcoming Announcements April 2 Hortonworks Platform 2.1 A continued focus on innovation within
More informationBeyond Lambda - how to get from logical to physical. Artur Borycki, Director International Technology & Innovations
Beyond Lambda - how to get from logical to physical Artur Borycki, Director International Technology & Innovations Simplification & Efficiency Teradata believe in the principles of self-service, automation
More informationThe Big Data Ecosystem at LinkedIn. Presented by Zhongfang Zhuang
The Big Data Ecosystem at LinkedIn Presented by Zhongfang Zhuang Based on the paper The Big Data Ecosystem at LinkedIn, written by Roshan Sumbaly, Jay Kreps, and Sam Shah. The Ecosystems Hadoop Ecosystem
More informationCapitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes
Capitalize on Big Data for Competitive Advantage with Bedrock TM, an integrated Management Platform for Hadoop Data Lakes Highly competitive enterprises are increasingly finding ways to maximize and accelerate
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationSOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP. Eva Andreasson Cloudera
SOLVING REAL AND BIG (DATA) PROBLEMS USING HADOOP Eva Andreasson Cloudera Most FAQ: Super-Quick Overview! The Apache Hadoop Ecosystem a Zoo! Oozie ZooKeeper Hue Impala Solr Hive Pig Mahout HBase MapReduce
More informationAnalytics on Spark & Shark @Yahoo
Analytics on Spark & Shark @Yahoo PRESENTED BY Tim Tully December 3, 2013 Overview Legacy / Current Hadoop Architecture Reflection / Pain Points Why the movement towards Spark / Shark New Hybrid Environment
More informationHadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook
Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future
More informationAn Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov
An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationEnd to End Solution to Accelerate Data Warehouse Optimization. Franco Flore Alliance Sales Director - APJ
End to End Solution to Accelerate Data Warehouse Optimization Franco Flore Alliance Sales Director - APJ Big Data Is Driving Key Business Initiatives Increase profitability, innovation, customer satisfaction,
More informationDatabricks. A Primer
Databricks A Primer Who is Databricks? Databricks was founded by the team behind Apache Spark, the most active open source project in the big data ecosystem today. Our mission at Databricks is to dramatically
More informationBringing Big Data to People
Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process
More informationUsing distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
More informationHadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics
In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationApache Hadoop: Past, Present, and Future
The 4 th China Cloud Computing Conference May 25 th, 2012. Apache Hadoop: Past, Present, and Future Dr. Amr Awadallah Founder, Chief Technical Officer aaa@cloudera.com, twitter: @awadallah Hadoop Past
More informationQLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM
QLIKVIEW DEPLOYMENT FOR BIG DATA ANALYTICS AT KING.COM QlikView Technical Case Study Series Big Data June 2012 qlikview.com Introduction This QlikView technical case study focuses on the QlikView deployment
More informationBig Data Approaches. Making Sense of Big Data. Ian Crosland. Jan 2016
Big Data Approaches Making Sense of Big Data Ian Crosland Jan 2016 Accelerate Big Data ROI Even firms that are investing in Big Data are still struggling to get the most from it. Make Big Data Accessible
More informationBIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
More informationHow Companies are! Using Spark
How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationDatabricks. A Primer
Databricks A Primer Who is Databricks? Databricks vision is to empower anyone to easily build and deploy advanced analytics solutions. The company was founded by the team who created Apache Spark, a powerful
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationApache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager lukhan@ebay.com @lukehq Yang Li Architect & Tech Leader yangli9@ebay.com Agenda What s Apache Kylin? Tech Highlights Performance
More informationData Governance in the Hadoop Data Lake. Kiran Kamreddy May 2015
Data Governance in the Hadoop Data Lake Kiran Kamreddy May 2015 One Data Lake: Many Definitions A centralized repository of raw data into which many data-producing streams flow and from which downstream
More informationBIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES
BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data
More informationBig Data Infrastructure at Spotify
Big Data Infrastructure at Spotify Wouter de Bie Team Lead Data Infrastructure June 12, 2013 2 Agenda Let s talk about Data Infrastructure, how we did it, what we learned and how we ve failed Some Context
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationIntegrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
More informationDashboard Engine for Hadoop
Matt McDevitt Sr. Project Manager Pavan Challa Sr. Data Engineer June 2015 Dashboard Engine for Hadoop Think Big Start Smart Scale Fast Agenda Think Big Overview Engagement Model Solution Offerings Dashboard
More informationNext-Generation Cloud Analytics with Amazon Redshift
Next-Generation Cloud Analytics with Amazon Redshift What s inside Introduction Why Amazon Redshift is Great for Analytics Cloud Data Warehousing Strategies for Relational Databases Analyzing Fast, Transactional
More informationMoving From Hadoop to Spark
+ Moving From Hadoop to Spark Sujee Maniyam Founder / Principal @ www.elephantscale.com sujee@elephantscale.com Bay Area ACM meetup (2015-02-23) + HI, Featured in Hadoop Weekly #109 + About Me : Sujee
More informationData Lake In Action: Real-time, Closed Looped Analytics On Hadoop
1 Data Lake In Action: Real-time, Closed Looped Analytics On Hadoop 2 Pivotal s Full Approach It s More Than Just Hadoop Pivotal Data Labs 3 Why Pivotal Exists First Movers Solve the Big Data Utility Gap
More informationData Governance in the Hadoop Data Lake. Michael Lang May 2015
Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales
More informationBig Data Analytics - Accelerated. stream-horizon.com
Big Data Analytics - Accelerated stream-horizon.com StreamHorizon & Big Data Integrates into your Data Processing Pipeline Seamlessly integrates at any point of your your data processing pipeline Implements
More informationTE's Analytics on Hadoop and SAP HANA Using SAP Vora
TE's Analytics on Hadoop and SAP HANA Using SAP Vora Naveen Narra Senior Manager TE Connectivity Santha Kumar Rajendran Enterprise Data Architect TE Balaji Krishna - Director, SAP HANA Product Mgmt. -
More informationBig Data Analytics. Copyright 2011 EMC Corporation. All rights reserved.
Big Data Analytics 1 Priority Discussion Topics What are the most compelling business drivers behind big data analytics? Do you have or expect to have data scientists on your staff, and what will be their
More informationA very short talk about Apache Kylin Business Intelligence meets Big Data. Fabian Wilckens EMEA Solutions Architect
A very short talk about Apache Kylin Business Intelligence meets Big Data Fabian Wilckens EMEA Solutions Architect 1 The challenge today 2 Very quickly: OLAP Online Analytical Processing How many beers
More informationInformation Builders Mission & Value Proposition
Value 10/06/2015 2015 MapR Technologies 2015 MapR Technologies 1 Information Builders Mission & Value Proposition Economies of Scale & Increasing Returns (Note: Not to be confused with diminishing returns
More informationBig Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
More informationBig Data Success Step 1: Get the Technology Right
Big Data Success Step 1: Get the Technology Right TOM MATIJEVIC Director, Business Development ANDY MCNALIS Director, Data Management & Integration MetaScale is a subsidiary of Sears Holdings Corporation
More informationBig Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth
MAKING BIG DATA COME ALIVE Big Data Architecture & Analytics A comprehensive approach to harness big data architecture and analytics for growth Steve Gonzales, Principal Manager steve.gonzales@thinkbiganalytics.com
More informationBig Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable
More informationDell In-Memory Appliance for Cloudera Enterprise
Dell In-Memory Appliance for Cloudera Enterprise Hadoop Overview, Customer Evolution and Dell In-Memory Product Details Author: Armando Acosta Hadoop Product Manager/Subject Matter Expert Armando_Acosta@Dell.com/
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationHow To Turn Big Data Into An Insight
mwd a d v i s o r s Turning Big Data into Big Insights Helena Schwenk A special report prepared for Actuate May 2013 This report is the fourth in a series and focuses principally on explaining what s needed
More informationRCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG
1 RCFile: A Fast and Space-efficient Data Placement Structure in MapReduce-based Warehouse Systems CLOUD COMPUTING GROUP - LITAO DENG Background 2 Hive is a data warehouse system for Hadoop that facilitates
More informationSQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
More informationBig Data Patterns. Ron Bodkin Founder and President, Think Big
Big Data Patterns Ron Bodkin Founder and President, Think Big 1 About Me Ron Bodkin Founder and President, Think Big I have 9 years experience working with Big Data and Hadoop. In 2010, I founded Think
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationPulsar Realtime Analytics At Scale. Tony Ng April 14, 2015
Pulsar Realtime Analytics At Scale Tony Ng April 14, 2015 Big Data Trends Bigger data volumes More data sources DBs, logs, behavioral & business event streams, sensors Faster analysis Next day to hours
More informationCloudera Enterprise Data Hub in Telecom:
Cloudera Enterprise Data Hub in Telecom: Three Customer Case Studies Version: 103 Table of Contents Introduction 3 Cloudera Enterprise Data Hub for Telcos 4 Cloudera Enterprise Data Hub in Telecom: Customer
More informationLiktap 2012 Search Engine
Related Searches at LinkedIn Mitul Tiwari Joint work with Azarias Reda, Yubin Park, Christian Posse, and Sam Shah LinkedIn 1 Who am I 2 Outline About LinkedIn Related Searches Design Implementation Evaluation
More informationRoadmap Talend : découvrez les futures fonctionnalités de Talend
Roadmap Talend : découvrez les futures fonctionnalités de Talend Cédric Carbone Talend Connect 9 octobre 2014 Talend 2014 1 Connecting the Data-Driven Enterprise Talend 2014 2 Agenda Agenda Why a Unified
More informationUnified Big Data Analytics Pipeline. 连 城 lian@databricks.com
Unified Big Data Analytics Pipeline 连 城 lian@databricks.com What is A fast and general engine for large-scale data processing An open source implementation of Resilient Distributed Datasets (RDD) Has an
More informationBig Data Open Source Stack vs. Traditional Stack for BI and Analytics
Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.
More informationBusiness Intelligence for Big Data
Business Intelligence for Big Data Will Gorman, Vice President, Engineering May, 2011 2010, Pentaho. All Rights Reserved. www.pentaho.com. What is BI? Business Intelligence = reports, dashboards, analysis,
More informationData Integration Hub
Data Integration Hub Data Integration Hub Provides a Better Way Actual Customer Point-to-Point Data Architecture Modern Data Integration Hub Masked Informatica Data Integration Hub Accelerate data projects
More informationChase Wu New Jersey Ins0tute of Technology
CS 698: Special Topics in Big Data Chapter 4. Big Data Analytics Platforms Chase Wu New Jersey Ins0tute of Technology Some of the slides have been provided through the courtesy of Dr. Ching-Yung Lin at
More informationSAP and Hortonworks Reference Architecture
SAP and Hortonworks Reference Architecture Hortonworks. We Do Hadoop. June Page 1 2014 Hortonworks Inc. 2011 2014. All Rights Reserved A Modern Data Architecture With SAP DATA SYSTEMS APPLICATIO NS Statistical
More informationData Services Advisory
Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains
More informationIntegrating Hadoop. Into Business Intelligence & Data Warehousing. Philip Russom TDWI Research Director for Data Management, April 9 2013
Integrating Hadoop Into Business Intelligence & Data Warehousing Philip Russom TDWI Research Director for Data Management, April 9 2013 TDWI would like to thank the following companies for sponsoring the
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationUnified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
More informationHDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
More informationData Security in Hadoop
Data Security in Hadoop Eric Mizell Director, Solution Engineering Page 1 What is Data Security? Data Security for Hadoop allows you to administer a singular policy for authentication of users, authorize
More informationthe missing log collector Treasure Data, Inc. Muga Nishizawa
the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationNative Connectivity to Big Data Sources in MicroStrategy 10. Presented by: Raja Ganapathy
Native Connectivity to Big Data Sources in MicroStrategy 10 Presented by: Raja Ganapathy Agenda MicroStrategy supports several data sources, including Hadoop Why Hadoop? How does MicroStrategy Analytics
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationwww.ducenit.com Self-Service Business Intelligence: The hunt for real insights in hidden knowledge Whitepaper
Self-Service Business Intelligence: The hunt for real insights in hidden knowledge Whitepaper Shift in BI usage In this fast paced business environment, organizations need to make smarter and faster decisions
More informationBusiness Intelligence and Healthcare
Business Intelligence and Healthcare SUTHAN SIVAPATHAM SENIOR SHAREPOINT ARCHITECT Agenda Who we are What is BI? Microsoft s BI Stack Case Study (Healthcare) Who we are Point Alliance is an award-winning
More informationMaking big data simple with Databricks
Making big data simple with Databricks We are Databricks, the company behind Spark Founded by the creators of Apache Spark in 2013 Data 75% Share of Spark code contributed by Databricks in 2014 Value Created
More informationSPARK USE CASE IN TELCO. Apache Spark Night 9-2-2014! Chance Coble!
SPARK USE CASE IN TELCO Apache Spark Night 9-2-2014! Chance Coble! Use Case Profile Telecommunications company Shared business problems/pain Scalable analytics infrastructure is a problem Pushing infrastructure
More informationAtScale Intelligence Platform
AtScale Intelligence Platform PUT THE POWER OF HADOOP IN THE HANDS OF BUSINESS USERS. Connect your BI tools directly to Hadoop without compromising scale, performance, or control. TURN HADOOP INTO A HIGH-PERFORMANCE
More informationInformation Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
More informationTesting Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
More informationAlejandro Vaisman Esteban Zimanyi. Data. Warehouse. Systems. Design and Implementation. ^ Springer
Alejandro Vaisman Esteban Zimanyi Data Warehouse Systems Design and Implementation ^ Springer Contents Part I Fundamental Concepts 1 Introduction 3 1.1 A Historical Overview of Data Warehousing 4 1.2 Spatial
More informationPARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA
PARC and SAP Co-innovation: High-performance Graph Analytics for Big Data Powered by SAP HANA Harnessing the combined power of SAP HANA and PARC s HiperGraph graph analytics technology for real-time insights
More informationAre You Big Data Ready?
ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain
More informationSterling Business Intelligence
Sterling Business Intelligence Release Note Release 9.0 March 2010 Copyright 2010 Sterling Commerce, Inc. All rights reserved. Additional copyright information is located on the documentation library:
More informationbrief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 PART 2 PART 3 BIG DATA PATTERNS...253 PART 4 BEYOND MAPREDUCE...385
brief contents PART 1 BACKGROUND AND FUNDAMENTALS...1 1 Hadoop in a heartbeat 3 2 Introduction to YARN 22 PART 2 DATA LOGISTICS...59 3 Data serialization working with text and beyond 61 4 Organizing and
More informationBIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationTalend Big Data. Delivering instant value from all your data. Talend 2014 1
Talend Big Data Delivering instant value from all your data Talend 2014 1 I may say that this is the greatest factor: the way in which the expedition is equipped. Roald Amundsen race to the south pole,
More informationBig Data for Investment Research Management
IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment firms turn big data into actionable research
More informationJune 2011. Production Hadoop systems in the enterprise
June 2011 Production Hadoop systems in the enterprise 1 What Hadoop changes about data 2 The system past and present 3 Living with it your present and future 4 Q&A 2 2011 Cloudera, Inc. All Rights Reserved.
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationTap into Hadoop and Other No SQL Sources
Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data
More informationINTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD ARCHITECT @ METAMARKETS
INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD ARCHITECT @ METAMARKETS MICHAEL E. DRISCOLL CEO @ METAMARKETS - @MEDRISCOLL Metamarkets is the bridge from
More informationIBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look
IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based
More informationKNIME & Avira, or how I ve learned to love Big Data
KNIME & Avira, or how I ve learned to love Big Data Facts about Avira (AntiVir) 100 mio. customers Extreme Reliability 500 employees (Tettnang, San Francisco, Kuala Lumpur, Bucharest, Amsterdam) Company
More informationApigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps
White provides GRASP-powered big data predictive analytics that increases marketing effectiveness and customer satisfaction with API-driven adaptive apps that anticipate, learn, and adapt to deliver contextual,
More informationMonitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center
Monitor and Manage Your MicroStrategy BI Environment Using Enterprise Manager and Health Center Presented by: Dennis Liao Sales Engineer Zach Rea Sales Engineer January 27 th, 2015 Session 4 This Session
More informationPlease give me your feedback
Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &
More information