Efficient Big Data Analytics using SQL and Map-Reduce
|
|
|
- Magdalen Wiggins
- 9 years ago
- Views:
Transcription
1 Efficient Big Data Analytics using SQL and Map-Reduce Pekka Kostamaa, VP of Engineering and Big Data Lab ACM Fifteenth International Workshop On Data Warehousing and OLAP DOLAP 2012 Conference, Maui, Hawaii November 2, 2012
2 Agenda Aster Data Introduction Introduction to ncluster SQL-MR Overview Word Count Example npath SQL-MapReduce Use Cases 2
3 3
4 4
5 5
6 6
7 SQL-MapReduce 7
8 Predictive Analytics with SQL-MR SELECT * FROM npath ( ON ( SELECT WHERE u.event_description IN ( SELECT aper.event FROM attrition_paths_event_rank aper ORDER BY aper.count DESC LIMIT 10) ) PATTERN ('(OTHER EVENT){1,20}$') SYMBOLS ( ) RESULT ( ) ) ) n; Interactive Analytics 8
9 Agenda Aster Data Introduction Introduction to ncluster SQL-MR Overview Word Count Example npath SQL-MapReduce Use Cases 9
10 Overview Aster Data was founded in 2005, based in SF/Bay Area, focused on the Big Data space Founders are from Stanford Tasso Argyros Co-President, Teradata Aster Tasso Argyros is Co-President, Teradata Aster, leading the Aster Center of Innovation. Tasso has a background in data management, data mining and large-scale distributed systems. Before founding Aster Data, he was in the Ph.D. program at Stanford University. Tasso was recognized as one of Bloomberg BusinessWeek's Best Young Tech Entrepreneurs for He holds a Master's Degree in Computer Science from Stanford University and a Diploma in Computer Engineering from Technical University of Athens. Mayank Bawa Co-President, Teradata Aster 10 Mayank Bawa, Co-President, Teradata Aster, leads the Aster Center of Innovation's Research and Development and Customer Support organization devoted to building and supporting Aster Data s product portfolio. Mayank co-founded Aster Data in 2005 and led it as its CEO from 2005 to 2010, growing the company from three persons to a strong, well-rounded team that shipped products with market-defining features. Mayank was Aster Data s Chief Customer Officer from 2010 to 2011, working with customers to do fascinating projects on Big Data. Mayank has a Bachelor's degree in Computer Science from IIT-Bombay, and a Master's and Ph.D. degree in Computer Science from Stanford University. Mayank was the recipient of the Stanford Graduate Fellowship (Sequoia Capital Fellow) at Stanford University during his doctorate studies. Mayank has published a dozen academic papers in top database conferences and has been affiliated with IBM Research and Microsoft Research.
11 Overview Teradata completed the acquisition of Aster Data in April Expands Teradata analytical capabilities to span emerging analytics on new data types with Aster Data Customers 11
12 What is Aster Data s Solution? A highly scalable Discovery Platform for Big Data Scale and speed of analytics Accelerated SQL querying Rich analytics on large data sets (MapReduce, SQL-MapReduce) Out-of-the-box analytic functions (Time-series, Sessionization ) Ultra-fast, efficient access to data Analytics applications run inside the DB eliminating data movement Push down any analytics application, packaged or custom Highly scalable, high performance Always On Always Parallel High Concurrency Exceptional price/performance MPP, shared-nothing architecture Teradata Appliance Commodity hardware 12
13 A Look Inside the Aster Data Analytic Platform Analysts Customers Business Users Data Scientists Your Analytics & Advanced Reporting Applications Develop Pattern Matching Graph Statistical ELT Java, C, Python, Perl 45+ pre-built analytic modules Visual IDE; develop apps in hours Many programming languages Process SQL SQL-MapReduce Platform Services (e.g. query planning, dynamic workload management, security ) SQL-MapReduce framework Analyze both non-relational + relational data Linear, incremental scalability Store Relational Row Relational Column Architecture to add Commodity-hardware or Appliance Software only, cloud, or appliance Relational-data architecture can be extended for non-relational types 13
14 Introduction to SQL/MR
15 SQL/MR: Goals Motto: a UDF for the 21 st century Scalable It should be easy to leverage hardware resources of 100 s of servers Fault-tolerance should be handled by the system Analyst-friendly Want flexible, declarative language for analysts Enable developers to create widely-reusable tools for analysts Semantics of queries should not get mixed up with implementation details Developer-friendly Want straightforward programming model Provide useful platform services to developer to maximize freedom 15
16 What is SQL/MR? SQL/MapReduce (SQL/MR) is Aster s framework for enabling the execution of user code within ncluster: User code is installed in the cluster, and then it s invoked on database data from SQL. Execution is automatically parallelized across the cluster. Supports Java and C as primary languages, but other languages are supported through the Streaming interface. 16
17 Basic Primitive: SQL/MR Function Self-describing, parallelized, relation-to-relation transform Operates on an input relation Input relation (table, sub-query, etc.) specified by query By default, allow any input schema; function can choose to reject Input rows accessed row-wise or partition-wise (group-wise) Emits an output relation Function can be used in query like any relation; joined, aggregated, etc. Free-form transformation Output schema chosen by function; based on input schema, argument clauses, external considerations, etc. No enforced relationship between rows in input and output 17
18 Example: Word Count
19 What is Map/Reduce? Map/Reduce is a parallel programming paradigm. A parallel computation is specified in Map/Reduce by defining two functions: map and reduce. Independent of any particular implementation. The first implementation of a Map/Reduce system for dealing with large data was built at Google in
20 Word Count in Map/Reduce Goal: find frequencies of words in a set of documents Canonical teach MapReduce example since Dean and Ghemawat s paper Input data set: Documents (docid int, body text) Visually: word count docid body 1 Jack and Jill went up the hill to fetch a pail of water. Jack fell down and broke his crown, and Jill came tumbling after. 2 Little Miss Muffet sat on a tuffet, eating her curds and whey. Along came a spider, who sat down beside her, and frightened Miss Muffet away! 3 The itsy bitsy spider went up the water spout. Down came the ran and washed the spider out. Out came the sun, and dried up all the rain. And the itsy bitsy spider went up the spout again. went 3 hill 1 up 4 down 3 spider 4 tuffet 1 20
21 Word Count in Map/Reduce - MapReduce performs (logically) in two steps Map: for each document, tokenize the document body into <word, c>, where c is just the count in that one document Reduce: for each word, sum up all of the counts, and emit <word, count> - MapReduce performs (physically) in three steps: 1. Map, on each document independently 2. Shuffle, to bring the partial counts to one place, for each word 3. Reduce, on each word (and its partial counts) independently - Easy to parallelize Map task can run on each document independently Reduce task can run on each word independently 21
22 Input: The Documents Table BEGIN; CREATE FACT TABLE documents (docid int, body text, PARTITION KEY(docid)); INSERT INTO documents VALUES (0, this is a single test document. it is simple to count the words in this single document by hand. do we need a cluster? ); END; SELECT body FROM documents; 22
23 Map Function: tokenize public class tokenize implements RowFunction {... public void operateonsomerows(rowiterator inputiterator, RowEmitter outputemitter) { while ( inputiterator.advancetonextrow() ) { String[] parts = splitpattern_.split(inputiterator.getstringat(1)); } 23 } } for (String part : parts) { outputemitter.addstring(part); outputemitter.addint(1); outputemitter.emitrow(); }
24 Reduce Function: count_tokens public class count_tokens implements PartitionFunction {... public void operateonpartition( PartitionDefinition partitiondefinition, RowIterator inputiterator, RowEmitter outputemitter) { int count = 0; String word = inputiterator.getstringat(0); while ( inputiterator.advancetonextrow() ) count++; } 24 } outputemitter.addstring(word); outputemitter.addint(count); outputemitter.emitrow();
25 Invoking the Functions \install tokenize.jar \install count_tokens.jar SELECT word, count FROM count_tokens ( ON ( SELECT word, count FROM tokenize(on documents)) PARTITION BY word ) ORDER BY word DESC; 25
26 Even Better: Forget the Reduce \install tokenize.jar SELECT word, sum(count) FROM tokenize(on documents) GROUP BY word ORDER BY word; 26
27 Types of SQL/MR Functions RowFunction Corresponds to a map function. Must implement the operateonsomerows method. Must be invoked without a PARTITION BY. Sees all the appropriate rows on a particular worker. PartitionFunction Corresponds to a reduce function. Must implement the operateonpartition method. Must be invoked with a PARTITION BY, which specifies how rows are reshuffled. Sees all the appropriate rows in a partition. 27
28 Introduction to npath
29 What is npath? npath is a SQL/MR function included with ncluster. npath enables analysis of ordered data: Clickstream data Financial transaction data User interaction data Anything of a time series nature Leverages the power of the SQL/MR framework to transcend SQL s limitations with respect to ordered data
30 npath Syntax SELECT FROM npath ( ON { table_name (query) } PARTITION BY expression ORDER BY expression [ ASC DESC ] MODE ( { OVERLAPPING NONOVERLAPPING } ) PATTERN ( pattern_of_symbols ) SYMBOLS ( condition AS symbol [, ] ) RESULT ( aggr_func (expr of symbol) ) ) AS <ALIAS> WHERE GROUP BY HAVING ORDER BY (6) Select output (eg. count) (1) Fetch the data (2) Create partitioned buckets (3) Sort within each bucket (4) Apply symbol pattern match & associated symbol with predicate (5) Conditions to filter npath output Traditional conditions & filters (includes WHERE clause above) 30
31 Review: The Four npath Steps 1. Partition the source data into groups. 2. Order each group to form a sequence. 3. Match subsequences of interest. a. Define a set of symbols via predicates. b. Define the subsequences of interest via a regular expression of symbols. 4. Compute aggregates over each matching subsequence. 31
32 Aster Data Customer Use Cases
33 Retail Banking: Last Mile Marketing
34 Aster in Retail Banking: Last Mile Marketing Challenge Know the last mile of a decision Data Mining tools predict probability but do not ID the last mile Cross-Channel Customer Interactions 17,000 Customers, 1 Month With Aster SQL-MapReduce listens and predicts the last mile - Identifies all interaction patterns prior to acquisition or attrition 34k Branch Visits 25k ATM Sessions Business Impact x less effort to pinpoint a customer in the last mile 5,000 Call Center Sessions 43k s 92k Online Sessions 34
35 Aster MapReduce: Understanding the Last Mile Jan 5: Reverse Fee Request Jan 10: Request Made Again Jan 20: Account Closed Jan 7: Request Made Again Jan 15: Request Made Again What if I knew that this customer was likely to leave? I could Apologize Offer an explanation Reverse the $5 fee It takes 3x more to acquire a customer than to retain one 35
36 Multi-Channel Customer Analysis Iterative Discovery Analytics Business Question(s): Is there any identifiable pattern of behavior prior to account closure? Prior to new product additions? If so, what does this pattern look like? STORE DATA 36
37 Events Preceding Account Closure 37
38 Events Preceding Account Closure SELECT * FROM npath ( ON ( SELECT WHERE u.event_description IN ( SELECT aper.event FROM attrition_paths_event_rank aper ORDER BY aper.count DESC LIMIT 10) ) PATTERN ('(OTHER EVENT){1,20}$') SYMBOLS ( ) RESULT ( ) ) ) n; Interactive Analytics Reducing the Noise to find the Signal 38
39 Events Preceding Account Closure SELECT * FROM npath ( ON ( ) PARTITION BY sba_id ORDER BY datestamp MODE (NONOVERLAPPING) PATTERN ('(OTHER_EVENT FEE_EVENT)+') SYMBOLS ( event LIKE '%REVERSE FEE%' AS FEE_EVENT, event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT) RESULT ( ) ) n; Closed Accounts Fee reversal seems to be a Signal 39
40 How did Big Data Help? Collect & analyze not only customer transactions, but also customer interactions Use SQL-MapReduce pre-built operators to identify behavioral patterns and uncover business insight Utilize standard BI tools to ensure insights are consumed by the right business analysts and acted upon Big Data for Business = use more data and more analytics to achieve a competitive edge 40
41 Multi-Touch Attribution Optimize Marketing Spend to Drive Higher ROI
42 Strategy for Success Where should I increase my Marketing Spend to drive higher ROI? Multi-Touch Attribution Go beyond last click and identify paths customers take to conversion Quantify channels effectiveness (attribute) to drive revenue Identify which channels perform the best Time sensitive - campaigns and channels to drive rev in the immediate short term 42
43 Customer Journey Leading to Purchase on Online Store Purchase on Online Store 43
44 Customer Path Analysis and Referral Channels 44
45 Attributing Revenue to Various Channels Attribution Function Attributed Revenue EVENT_COLUMN_NAME('event') CONVERSION_EVENT_TYPE_VALU E('purchase') EXCLUDING_EVENT_TYPE_VALUE ('website') TIMESTAMP_COLUMN_NAME('seq uence') WINDOW('rows:10') MODEL1('SIMPLE','EXPONENTI AL:.5') Sequence Event Attribution Rev 10 Purchase $ Registration $ Ad Click $ Google Impression $
46 Attribution by Channel 46
47 Strategy for Success Multi-Touch Attribution Business Impact / Go beyond last click and identify paths customers take to conversion Quantify channels effectiveness (attribute) to drive revenue Identify which channels perform the best Time sensitive - campaigns and channels to drive rev in the immediate short term ROI Calculate true ROMI on a per channel basis Run time-sensitive promotions by knowing which channels convert the fastest. 47
48 Summary Teradata Aster Platform Power of SQL and MapReduce in a single, easy to use platform - SQL/MR - npath Support for multiple types of relational and non-relational data Discovery Analytics to drill down 48
ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE
ADVANCED ANALYTICS AND FRAUD DETECTION THE RIGHT TECHNOLOGY FOR NOW AND THE FUTURE Big Data Big Data What tax agencies are or will be seeing! Big Data Large and increased data volumes New and emerging
Teradata s Big Data Technology Strategy & Roadmap
Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any
Teradata Unified Big Data Architecture
Teradata Unified Big Data Architecture Agenda Recap the challenges of Big Analytics The 2 analytical gaps for most enterprises Teradata Unified Data Architecture - How we bridge the gaps - The 3 core elements
How To Analyze Data In A Database In A Microsoft Microsoft Computer System
Big Data Technical Workshop Sept 24 Minneapolis, MN Data Discovery, Modern Architecture & Visualization Big Data Discovery Demo: Financial Services Customer Journey 1 9/25/2014 AGENDA Key Functionality
Welcome. Host: Eric Kavanagh. [email protected]. The Briefing Room. Twitter Tag: #briefr
The Briefing Room Welcome Host: Eric Kavanagh [email protected] Twitter Tag: #briefr The Briefing Room Mission! Reveal the essential characteristics of enterprise software, good and bad! Provide
BIG DATA: FROM HYPE TO REALITY. Leandro Ruiz Presales Partner for C&LA Teradata
BIG DATA: FROM HYPE TO REALITY Leandro Ruiz Presales Partner for C&LA Teradata Evolution in The Use of Information Action s ACTIVATING MAKE it happen! Insights OPERATIONALIZING WHAT IS happening now? PREDICTING
Discovering Business Insights in Big Data Using SQL-MapReduce
Discovering Business Insights in Big Data Using SQL-MapReduce A Technical Whitepaper Rick F. van der Lans Independent Business Intelligence Analyst R20/Consultancy July 2013 Sponsored by Copyright 2013
Ramesh Bhashyam Teradata Fellow Teradata Corporation [email protected]
Challenges of Handling Big Data Ramesh Bhashyam Teradata Fellow Teradata Corporation [email protected] Trend Too much information is a storage issue, certainly, but too much information is also
SQL Server 2012 Business Intelligence Boot Camp
SQL Server 2012 Business Intelligence Boot Camp Length: 5 Days Technology: Microsoft SQL Server 2012 Delivery Method: Instructor-led (classroom) About this Course Data warehousing is a solution organizations
Harnessing the power of advanced analytics with IBM Netezza
IBM Software Information Management White Paper Harnessing the power of advanced analytics with IBM Netezza How an appliance approach simplifies the use of advanced analytics Harnessing the power of advanced
INVESTOR PRESENTATION. First Quarter 2014
INVESTOR PRESENTATION First Quarter 2014 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences
Up Your R Game. James Taylor, Decision Management Solutions Bill Franks, Teradata
Up Your R Game James Taylor, Decision Management Solutions Bill Franks, Teradata Today s Speakers James Taylor Bill Franks CEO Chief Analytics Officer Decision Management Solutions Teradata 7/28/14 3 Polling
Using distributed technologies to analyze Big Data
Using distributed technologies to analyze Big Data Abhijit Sharma Innovation Lab BMC Software 1 Data Explosion in Data Center Performance / Time Series Data Incoming data rates ~Millions of data points/
IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances
IBM Software Business Analytics Cognos Business Intelligence IBM Cognos 10: Enhancing query processing performance for IBM Netezza appliances 2 IBM Cognos 10: Enhancing query processing performance for
Investor Presentation. Second Quarter 2015
Investor Presentation Second Quarter 2015 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences
INVESTOR PRESENTATION. Third Quarter 2014
INVESTOR PRESENTATION Third Quarter 2014 Note to Investors Certain non-gaap financial information regarding operating results may be discussed during this presentation. Reconciliations of the differences
EMC/Greenplum Driving the Future of Data Warehousing and Analytics
EMC/Greenplum Driving the Future of Data Warehousing and Analytics EMC 2010 Forum Series 1 Greenplum Becomes the Foundation of EMC s Data Computing Division E M C A CQ U I R E S G R E E N P L U M Greenplum,
Driving Value From Big Data
Big Data Executive Forum Data Discovery, Modern Architecture & Visualization Driving Value From Big Data Bill Franks Chief Analytics Officer, Teradata It s Not So Much Big Data As it is different data.
Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh
1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets
Datalogix. Using IBM Netezza data warehouse appliances to drive online sales with offline data. Overview. IBM Software Information Management
Datalogix Using IBM Netezza data warehouse appliances to drive online sales with offline data Overview The need Infrastructure could not support the growing online data volumes and analysis required The
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Mike Maxey. Senior Director Product Marketing Greenplum A Division of EMC. Copyright 2011 EMC Corporation. All rights reserved.
Mike Maxey Senior Director Product Marketing Greenplum A Division of EMC 1 Greenplum Becomes the Foundation of EMC s Big Data Analytics (July 2010) E M C A C Q U I R E S G R E E N P L U M For three years,
Microsoft SQL Server Business Intelligence and Teradata Database
Microsoft SQL Server Business Intelligence and Teradata Database Help improve customer response rates by using the most sophisticated marketing automation application available. Integrated Marketing Management
Implement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
Real-time Big Data Analytics with Storm
Ron Bodkin Founder & CEO, Think Big June 2013 Real-time Big Data Analytics with Storm Leading Provider of Data Science and Engineering Services Accelerating Your Time to Value IMAGINE Strategy and Roadmap
Hadoop Job Oriented Training Agenda
1 Hadoop Job Oriented Training Agenda Kapil CK [email protected] Module 1 M o d u l e 1 Understanding Hadoop This module covers an overview of big data, Hadoop, and the Hortonworks Data Platform. 1.1 Module
Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum
Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All
<Insert Picture Here> Oracle and/or Hadoop And what you need to know
Oracle and/or Hadoop And what you need to know Jean-Pierre Dijcks Data Warehouse Product Management Agenda Business Context An overview of Hadoop and/or MapReduce Choices, choices,
Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
Lecture 10 - Functional programming: Hadoop and MapReduce
Lecture 10 - Functional programming: Hadoop and MapReduce Sohan Dharmaraja Sohan Dharmaraja Lecture 10 - Functional programming: Hadoop and MapReduce 1 / 41 For today Big Data and Text analytics Functional
Course Outline: Course: Implementing a Data Warehouse with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning
Course Outline: Course: Implementing a Data with Microsoft SQL Server 2012 Learning Method: Instructor-led Classroom Learning Duration: 5.00 Day(s)/ 40 hrs Overview: This 5-day instructor-led course describes
Customer Insight Appliance. Enabling retailers to understand and serve their customer
Customer Insight Appliance Enabling retailers to understand and serve their customer Customer Insight Appliance Enabling retailers to understand and serve their customer. Technology has empowered today
Using an In-Memory Data Grid for Near Real-Time Data Analysis
SCALEOUT SOFTWARE Using an In-Memory Data Grid for Near Real-Time Data Analysis by Dr. William Bain, ScaleOut Software, Inc. 2012 ScaleOut Software, Inc. 12/27/2012 IN today s competitive world, businesses
MOC 20461C: Querying Microsoft SQL Server. Course Overview
MOC 20461C: Querying Microsoft SQL Server Course Overview This course provides students with the knowledge and skills to query Microsoft SQL Server. Students will learn about T-SQL querying, SQL Server
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
Introduction to Hadoop
Introduction to Hadoop 1 What is Hadoop? the big data revolution extracting value from data cloud computing 2 Understanding MapReduce the word count problem more examples MCS 572 Lecture 24 Introduction
Universal PMML Plug-in for EMC Greenplum Database
Universal PMML Plug-in for EMC Greenplum Database Delivering Massively Parallel Predictions Zementis, Inc. [email protected] USA: 6125 Cornerstone Court East, Suite #250, San Diego, CA 92121 T +1(619)
Artur Borycki. Director International Solutions Marketing
Artur Borycki Director International Solutions Agenda! Evolution of Teradata s Unified Architecture Analytical and Workloads! Teradata s Reference Information Architecture Evolution of Teradata s" Unified
Advanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
Big Data: Using ArcGIS with Apache Hadoop. Erik Hoel and Mike Park
Big Data: Using ArcGIS with Apache Hadoop Erik Hoel and Mike Park Outline Overview of Hadoop Adding GIS capabilities to Hadoop Integrating Hadoop with ArcGIS Apache Hadoop What is Hadoop? Hadoop is a scalable
Click Stream Data Analysis Using Hadoop
Governors State University OPUS Open Portal to University Scholarship Capstone Projects Spring 2015 Click Stream Data Analysis Using Hadoop Krishna Chand Reddy Gaddam Governors State University Sivakrishna
Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778
Implementing Data Models and Reports with Microsoft SQL Server 2012 MOC 10778 Course Outline Module 1: Introduction to Business Intelligence and Data Modeling This module provides an introduction to Business
CERULIUM TERADATA COURSE CATALOG
CERULIUM TERADATA COURSE CATALOG Cerulium Corporation has provided quality Teradata education and consulting expertise for over seven years. We offer customized solutions to maximize your warehouse. Prepared
End Small Thinking about Big Data
CITO Research End Small Thinking about Big Data SPONSORED BY TERADATA Introduction It is time to end small thinking about big data. Instead of thinking about how to apply the insights of big data to business
Programming Hadoop 5-day, instructor-led BD-106. MapReduce Overview. Hadoop Overview
Programming Hadoop 5-day, instructor-led BD-106 MapReduce Overview The Client Server Processing Pattern Distributed Computing Challenges MapReduce Defined Google's MapReduce The Map Phase of MapReduce
Introduction to Hadoop
1 What is Hadoop? Introduction to Hadoop We are living in an era where large volumes of data are available and the problem is to extract meaning from the data avalanche. The goal of the software tools
Big Data and Apache Hadoop s MapReduce
Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23
BIG DATA SOLUTION DATA SHEET
BIG DATA SOLUTION DATA SHEET Highlight. DATA SHEET HGrid247 BIG DATA SOLUTION Exploring your BIG DATA, get some deeper insight. It is possible! Another approach to access your BIG DATA with the latest
III JORNADAS DE DATA MINING
III JORNADAS DE DATA MINING EN EL MARCO DE LA MAESTRÍA EN DATA MINING DE LA UNIVERSIDAD AUSTRAL PRESENTACIÓN TECNOLÓGICA IBM Alan Schcolnik, Cognos Technical Sales Team Leader, IBM Software Group. IAE
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION
ORACLE BUSINESS INTELLIGENCE, ORACLE DATABASE, AND EXADATA INTEGRATION EXECUTIVE SUMMARY Oracle business intelligence solutions are complete, open, and integrated. Key components of Oracle business intelligence
Cloudera Certified Developer for Apache Hadoop
Cloudera CCD-333 Cloudera Certified Developer for Apache Hadoop Version: 5.6 QUESTION NO: 1 Cloudera CCD-333 Exam What is a SequenceFile? A. A SequenceFile contains a binary encoding of an arbitrary number
MapReduce Jeffrey Dean and Sanjay Ghemawat. Background context
MapReduce Jeffrey Dean and Sanjay Ghemawat Background context BIG DATA!! o Large-scale services generate huge volumes of data: logs, crawls, user databases, web site content, etc. o Very useful to be able
Implementing a Data Warehouse with Microsoft SQL Server 2012
Implementing a Data Warehouse with Microsoft SQL Server 2012 Module 1: Introduction to Data Warehousing Describe data warehouse concepts and architecture considerations Considerations for a Data Warehouse
Introducing Oracle Exalytics In-Memory Machine
Introducing Oracle Exalytics In-Memory Machine Jon Ainsworth Director of Business Development Oracle EMEA Business Analytics 1 Copyright 2011, Oracle and/or its affiliates. All rights Agenda Topics Oracle
How To Learn To Use Big Data
Information Technologies Programs Big Data Specialized Studies Accelerate Your Career extension.uci.edu/bigdata Offered in partnership with University of California, Irvine Extension s professional certificate
Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463)
Implementing a Data Warehouse with Microsoft SQL Server 2012 (70-463) Course Description Data warehousing is a solution organizations use to centralize business data for reporting and analysis. This five-day
UNIFY YOUR (BIG) DATA
UNIFY YOUR (BIG) DATA ANALYTIC STRATEGY GIVE ANY USER ANY ANALYTIC ON ANY DATA Scott Gnau President, Teradata Labs [email protected] t Unify Your (Big) Data Analytic Strategy Technology excitement:
SQL Databases Course. by Applied Technology Research Center. This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases.
SQL Databases Course by Applied Technology Research Center. 23 September 2015 This course provides training for MySQL, Oracle, SQL Server and PostgreSQL databases. Oracle Topics This Oracle Database: SQL
Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05
Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970
Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777
Implementing a Data Warehouse with Microsoft SQL Server 2012 MOC 10777 Course Outline Module 1: Introduction to Data Warehousing This module provides an introduction to the key components of a data warehousing
Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
Big Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
SQL Performance for a Big Data 22 Billion row data warehouse
SQL Performance for a Big Data Billion row data warehouse Dave Beulke dave @ d a v e b e u l k e.com Dave Beulke & Associates Session: F19 Friday May 8, 15 8: 9: Platform: z/os D a v e @ d a v e b e u
Microsoft Big Data. Solution Brief
Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,
Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5 Days
Lincoln Land Community College Capital City Training Center 130 West Mason Springfield, IL 62702 217-782-7436 www.llcc.edu/cctc Implementing Data Models and Reports with Microsoft SQL Server 20466C; 5
P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland
P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive
Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database
White Paper Using Attunity Replicate with Greenplum Database Using Attunity Replicate for data migration and Change Data Capture to the Greenplum Database Abstract This white paper explores the technology
Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
COSC 6397 Big Data Analytics. 2 nd homework assignment Pig and Hive. Edgar Gabriel Spring 2015
COSC 6397 Big Data Analytics 2 nd homework assignment Pig and Hive Edgar Gabriel Spring 2015 2 nd Homework Rules Each student should deliver Source code (.java files) Documentation (.pdf,.doc,.tex or.txt
Understanding Data Warehouse Needs Session #1568 Trends, Issues and Capabilities
Understanding Data Warehouse Needs Session #1568 Trends, Issues and Capabilities Dr. Frank Capobianco Advanced Analytics Consultant Teradata Corporation Tracy Spadola CPCU, CIDM, FIDM Practice Lead - Insurance
Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot
www.etidaho.com (208) 327-0768 Data Mining, Predictive Analytics with Microsoft Analysis Services and Excel PowerPivot 3 Days About this Course This course is designed for the end users and analysts that
Completing the Big Data Ecosystem:
Completing the Big Data Ecosystem: in sqrrl data INC. August 3, 2012 Design Drivers in Analysis of big data is central to our customers requirements, in which the strongest drivers are: Scalability: The
Connecting Hadoop with Oracle Database
Connecting Hadoop with Oracle Database Sharon Stephen Senior Curriculum Developer Server Technologies Curriculum The following is intended to outline our general product direction.
Accelerating Hadoop MapReduce Using an In-Memory Data Grid
Accelerating Hadoop MapReduce Using an In-Memory Data Grid By David L. Brinker and William L. Bain, ScaleOut Software, Inc. 2013 ScaleOut Software, Inc. 12/27/2012 H adoop has been widely embraced for
Data processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 OVERVIEW About this Course Data warehousing is a solution organizations use to centralize business data for reporting and analysis.
16.1 MAPREDUCE. For personal use only, not for distribution. 333
For personal use only, not for distribution. 333 16.1 MAPREDUCE Initially designed by the Google labs and used internally by Google, the MAPREDUCE distributed programming model is now promoted by several
Understanding Your Customer Journey by Extending Adobe Analytics with Big Data
SOLUTION BRIEF Understanding Your Customer Journey by Extending Adobe Analytics with Big Data Business Challenge Today s digital marketing teams are overwhelmed by the volume and variety of customer interaction
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines
Big Data Technology Map-Reduce Motivation: Indexing in Search Engines Edward Bortnikov & Ronny Lempel Yahoo Labs, Haifa Indexing in Search Engines Information Retrieval s two main stages: Indexing process
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
Course Outline. Module 1: Introduction to Data Warehousing
Course Outline Module 1: Introduction to Data Warehousing This module provides an introduction to the key components of a data warehousing solution and the highlevel considerations you must take into account
Bringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox [@willcoxmnk], Director Big Data Centre of Excellence (Teradata
Bringing Intergalactic Data Speak (a.k.a.: SQL) to Hadoop Martin Willcox [@willcoxmnk], Director Big Data Centre of Excellence (Teradata International) 4 th June 2015 Agenda A (very!) short history of
IBM Netezza High Capacity Appliance
IBM Netezza High Capacity Appliance Petascale Data Archival, Analysis and Disaster Recovery Solutions IBM Netezza High Capacity Appliance Highlights: Allows querying and analysis of deep archival data
Workshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
Oracle Big Data Discovery Unlock Potential in Big Data Reservoir
Oracle Big Data Discovery Unlock Potential in Big Data Reservoir Gokula Mishra Premjith Balakrishnan Business Analytics Product Group September 29, 2014 Copyright 2014, Oracle and/or its affiliates. All
Implementing a Data Warehouse with Microsoft SQL Server 2012
Course 10777A: Implementing a Data Warehouse with Microsoft SQL Server 2012 Length: Audience(s): 5 Days Level: 200 IT Professionals Technology: Microsoft SQL Server 2012 Type: Delivery Method: Course Instructor-led
Big Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
How to Enhance Traditional BI Architecture to Leverage Big Data
B I G D ATA How to Enhance Traditional BI Architecture to Leverage Big Data Contents Executive Summary... 1 Traditional BI - DataStack 2.0 Architecture... 2 Benefits of Traditional BI - DataStack 2.0...
Oracle Business Intelligence EE. Prab h akar A lu ri
Oracle Business Intelligence EE Prab h akar A lu ri Agenda 1.Overview 2.Components 3.Oracle Business Intelligence Server 4.Oracle Business Intelligence Dashboards 5.Oracle Business Intelligence Answers
Course ID#: 1401-801-14-W 35 Hrs. Course Content
Course Content Course Description: This 5-day instructor led course provides students with the technical skills required to write basic Transact- SQL queries for Microsoft SQL Server 2014. This course
In-Memory Analytics for Big Data
In-Memory Analytics for Big Data Game-changing technology for faster, better insights WHITE PAPER SAS White Paper Table of Contents Introduction: A New Breed of Analytics... 1 SAS In-Memory Overview...
