Optimization of Analytic Data Flows for Next Generation Business Intelligence Applications
1 Optimization of Analytic Data Flows for Next Generation Business Intelligence Applications
Umeshwar Dayal, Kevin Wilkinson, Alkis Simitsis, Malu Castellanos, Lupita Paz
HP Labs, Palo Alto, CA, USA
Copyright 2010 Hewlett-Packard Development Company, L.P.
2 OUTLINE
- Next Generation Business Intelligence
- Analytic Data Flows
- Challenges in Analytic Data Flow Design and Optimization
- QoX Framework
- QoX-driven Optimization of Data Integration Flows
- Micro-benchmarks
- Open Problems
- Summary
3 TRADITIONAL BUSINESS INTELLIGENCE ARCHITECTURE
BI: technologies, tools, and practices for collecting, integrating, analyzing, and presenting large volumes of information to enable better decision making
- Focus on enterprise OLTP data sources
- Batch ETL (Extract-Transform-Load) pipeline
- Front-end analytics for strategic decision making
4 EMERGENCE OF AN INFLECTION POINT
Sensors, real time events, unstructured data and web contents have the promise of transforming the way we manage our customers, resources, environments, health, ...
"The shift to the web makes available unprecedented quantities of structured and unstructured databases about human activities ... and sensing technologies are making available great quantities of data that provide unprecedented scope and resolution ... the availability of rich streams of data, create(s) an inflection point ... for generating insights and guiding decision making ... To date, we have only scratched the surface of the potential for learning from these large-scale data sets ..."
- From Data to Knowledge to Action: A Global Enabler for the 21st Century, Computing Community Consortium, August 2010
"Analytics, and specifically technologies that enable real-time analysis of large data volumes, social media analytics, and mobile BI will continue to dominate the enterprise agenda"
- Computer World, Jan
5 Live BI: Decisions at any time scale
[Figure: spectrum from Traditional Business Intelligence to Live Business Intelligence, with decreasing time scales for response (decision latencies) and increasing need for automation. Labels: Executives, Model Builders, Operations staff, Business Control Systems, Enterprise Operational systems, Operational Data Stores, Enterprise Data Warehouse]
6 ANALYTICS DATA FLOWS
End-to-end Analytics Data Flow
- Back-end Analytics: Streaming; Information extraction, access; Transformation (cleaning, de-noising, entity identification, etc.); Loading
- Front-end Analytics: Analysis (querying, reporting, data mining, modeling, analytics, visualization operations)
Challenges
- Long development cycles
- Little support for optimization or automation
- Possibly many objectives: Correctness, Fault-tolerance, Scalability, Maintainability, Cost
- Many types of data sources: Structured/Unstructured, Stored/Streaming, Internal/External
- Complex flows, involving ETL, stream processing, queries, analytics, visualization operations
- Possibly executed on multiple engines
7 NEXT-GENERATION: LIVE BI
[Architecture figure]
- Phases: Development & Deployment-time; Execution-time
- Components: Interaction and Delivery; Data Flow Composer and Editor; Federation and Orchestration; Optimizer
- Execution Engines: RDBMS; Hadoop; ETL engine; massively parallel, large memory cluster; Stream engine; Analytics Engine
- Data Stores: RDB; Unstructured File Data; Cloud Storage; other
- Streaming data sources (sensor data, event logs, social media, web data)
8 EXAMPLE FLOW 1: STRUCTURED DATA -- TPC-H SOURCES
Sales of products in response to marketing campaign
- Sources: Store 1 Lineitem ... Store n Lineitem, Orders, CmpnDim
- Union (Store 1 ... Store n Lineitem) -> Surrogate key generate -> Lineitem
- Compute sales = quantity * unitcost
- Filter on date range
- Join on orderkey (get relevant sales for date range)
- Filter on cmpnkey (get products, date range)
- Join on prodkey (get relevant sales for products)
- Rollup sales on prodkey, cmpnkey
- Target: RptSalesByProdCmpn
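The steps of Example Flow 1 can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the record layout (dicts with `orderkey`, `prodkey`, `quantity`, `unitcost`, `orderdate`), the campaign fields `start` and `end`, and the function name are all hypothetical, and surrogate key generation is omitted.

```python
from collections import defaultdict
from datetime import date


def sales_by_product_campaign(store_lineitems, orders, campaigns):
    """Union -> compute sales -> join on orderkey -> campaign filter -> rollup."""
    # Union: merge the per-store Lineitem extracts into one recordset.
    lineitems = [row for store in store_lineitems for row in store]
    # Lookup used for the join on orderkey (assumed field names).
    order_dates = {o["orderkey"]: o["orderdate"] for o in orders}

    totals = defaultdict(float)
    for li in lineitems:
        sales = li["quantity"] * li["unitcost"]      # compute sales
        odate = order_dates.get(li["orderkey"])      # join on orderkey
        if odate is None:
            continue
        for c in campaigns:                          # join on prodkey
            in_range = c["start"] <= odate <= c["end"]  # filter on date range
            if c["prodkey"] == li["prodkey"] and in_range:
                totals[(li["prodkey"], c["cmpnkey"])] += sales  # rollup
    return dict(totals)


# Tiny made-up inputs, purely for illustration.
stores = [
    [{"orderkey": 1, "prodkey": "P1", "quantity": 2, "unitcost": 10.0}],
    [{"orderkey": 2, "prodkey": "P1", "quantity": 1, "unitcost": 5.0},
     {"orderkey": 3, "prodkey": "P2", "quantity": 4, "unitcost": 1.0}],
]
orders = [
    {"orderkey": 1, "orderdate": date(2010, 6, 10)},
    {"orderkey": 2, "orderdate": date(2010, 8, 1)},
    {"orderkey": 3, "orderdate": date(2010, 6, 15)},
]
campaigns = [{"cmpnkey": "C1", "prodkey": "P1",
              "start": date(2010, 6, 1), "end": date(2010, 6, 30)}]
report = sales_by_product_campaign(stores, orders, campaigns)
```

With these inputs only order 1 falls inside the campaign window for product P1, so the report holds a single row for (P1, C1).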
9 EXAMPLE FLOW 2: UNSTRUCTURED DATA -- SENTIMENT ANALYSIS
10 EXAMPLE FLOW 3 (STRUCTURED & UNSTRUCTURED DATA)
Correlate Sales and Sentiments for products in response to marketing campaign
- Sources: Tweets, CmpnDim, RptSalesByProdCmpn
- Filter on any mention of enterprise or its products -> Relevant tweets
- Sentiment analysis for products by tweet -> RptSAbyTwtProd
- Surrogate key generate
- Filter on cmpnkey (get products, date range)
- Filter on date range (get tweets during campaign)
- Join on prodkey (get tweets for products)
- Rollup sentimentscore on prodkey, cmpnkey, attribute (aggregate sentiment scores)
- Join on prodkey, cmpnkey
- Target: RptSAbyProdCmpn
11 EXAMPLE FLOW 4: EVENT AND STREAM ANALYTICS
- Inputs: sensor data and external event streams; contextual data (structured, unstructured); event history
- Event Discovery and Creation: Stream pre-processing -> Detect Patterns/Events -> Events (Event Stream)
- Event Analysis: Event Correlation -> Complex Events; Diagnosis & Root Cause Analysis -> Diagnoses; Prediction -> Predicted Events; Action Assessment -> Recommended Actions
12 FROM DESIGN TO EXECUTION [Dayal, et al., EDBT 09] [Wilkinson, Simitsis ER 10]
- Flow design editor -> xlm -> Logical Data Flow Graph
- Optimizer, driven by QoX Objectives (Performance, Fault-tolerance, Maintainability, Freshness, Scalability, Availability, Cost) -> Optimized Physical Data Flow Graph
- Optimized Physical Data Flow Graph -> xlm -> Engine-specific codegen -> tool-specific execution instructions
- Execution: Sources -> Scripts, Hadoop, DBMS, ETL Tool -> Target
[Figure: the sentiment analysis flow from document batch file to DASHBOARD, including a splitter on Doc Id (hash partition, 3x parallel), a merger on Doc id (3x parallel), and a recovery point]
13 Quality objectives
- Functional correctness: the optimization of the initial flow does not affect the flow semantics or correctness
- Performance: measures the flow completion in terms of time and resources used
- Freshness: the latency between the occurrence of an event at a source system and the reflection of that event at the target system
- Reliability: the ability of the flow to complete successfully within the specified time window despite failures
14 QUALITY OBJECTIVES
- Performance: usually throughput, total execution time
- Reliability: probability that a flow will complete correctly (~ mean time between failures)
- Recoverability: ability to restore the flow after a failure (~ mean time to recover)
- Scalability: ability to process higher volumes of data or higher data rates
- Freshness: latency between source data and data in the DW
- Consistency: accuracy (correctness) of the data in the DW
- Availability: probability that the data is available for queries
- Maintainability: ability to operate at specified service levels
- Flexibility: ability to respond to evolving requirements
- Affordability: cost of integration project
- Traceability: ability to track lineage (provenance) of data
- Auditability: ability to satisfy legal/regulatory compliance requirements
15 TRADEOFFS AMONG QoX OBJECTIVES
[Figure: alternative designs of one flow compared on Perf, Rec, Rel, and Cost. Design labels: Fastest, Recoverable, Reliable. Variant labels: w/o RPs, w/ 2 RPs, w/ 6 RPs, w/o RPs + n/a. Flow labels: Performance, STATE=AZ, STATE=CA, STATE=NV, REC]
16 QoX Optimization [Simitsis, et al., ICDE 10] [Simitsis, et al., SIGMOD 09]
Logical optimization
- Flow restructuring
- Parallelization, partitioning
- Collapsing chains
- Reordering operators
- Inserting recovery (persistence) points
- Replication
Physical optimization
- Implementations of operators
- Materialize vs. Virtualize
- Choice of execution engines
- Data shipping vs. function shipping
Optimization for performance and fault-tolerance
Given a data flow F and an objective function OF, with:
- n: input recordset size
- w: execution time window
- k: number of faults to be tolerated
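To make the roles of w and k concrete, the sketch below checks whether a linear flow with recovery points still fits the time window when k failures occur. The cost model is an assumption made for illustration and is not the objective function from the cited papers: each failure is taken to hit at the end of the longest segment between recovery points, forcing that segment to be redone once. All function names and numbers are hypothetical.

```python
def worst_case_time(op_times, rp_after, rp_cost, k):
    """Worst-case completion time of a linear flow under k failures.

    op_times: time per operator, in flow order.
    rp_after: set of operator indices after which a recovery point is written.
    rp_cost:  time to write one recovery point.
    Assumed model: every failure forces one re-run of the longest segment
    between consecutive recovery points (including writing that point).
    """
    base = sum(op_times) + rp_cost * len(rp_after)
    segments, current = [], 0.0
    for i, t in enumerate(op_times):
        current += t
        if i in rp_after:
            current += rp_cost
            segments.append(current)
            current = 0.0
    segments.append(current)  # tail segment after the last recovery point
    return base + k * max(segments)


def meets_window(op_times, rp_after, rp_cost, k, w):
    """True if the flow finishes within the time window w despite k failures."""
    return worst_case_time(op_times, rp_after, rp_cost, k) <= w


ops = [4, 6, 5, 5]  # made-up operator times
no_rp = worst_case_time(ops, set(), rp_cost=1, k=2)
one_rp = worst_case_time(ops, {1}, rp_cost=1, k=2)
```

In this toy model a single recovery point adds a small fixed cost but shortens the work repeated per failure, which is the kind of performance versus recoverability trade-off the optimizer has to weigh.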
17 STATE SPACE SEARCH
[Figure: state space graph, from the initial state to the min-cost state]
18 OPTIMIZATION TACTICS: TRANSITIONS IN THE STATE SPACE GRAPH
- factorize/distribute
- swap
- addRP (add recovery point)
- partition
- replicate
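A state space search over flow designs can be sketched as follows. This is a minimal illustration restricted to the swap transition; the cost model (per-row cost and selectivity per operator), the exhaustive breadth-first strategy, and the assumption that any two adjacent operators may be swapped without changing the flow semantics are simplifications introduced here, not the authors' algorithm.

```python
from collections import deque


def flow_cost(flow, n):
    """Cost of a linear flow on n input rows.

    Each operator is (name, cost_per_row, selectivity): it pays
    cost_per_row for every input row and passes on rows * selectivity.
    """
    total, rows = 0.0, float(n)
    for _name, cost_per_row, selectivity in flow:
        total += cost_per_row * rows
        rows *= selectivity
    return total


def swaps(flow):
    """Yield every state reachable by swapping two adjacent operators."""
    for i in range(len(flow) - 1):
        nxt = list(flow)
        nxt[i], nxt[i + 1] = nxt[i + 1], nxt[i]
        yield tuple(nxt)


def min_cost_state(initial, n):
    """Exhaustive breadth-first search from the initial state."""
    initial = tuple(initial)
    seen, queue, best = {initial}, deque([initial]), initial
    while queue:
        state = queue.popleft()
        if flow_cost(state, n) < flow_cost(best, n):
            best = state
        for nxt in swaps(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best


# Made-up operators: a selective filter placed after an expensive computation.
initial = (("compute_sales", 2.0, 1.0),
           ("filter_date", 1.0, 0.5),
           ("lookup", 5.0, 1.0))
best = min_cost_state(initial, 1000)
```

Exhaustive search is only feasible for tiny flows; for realistic flows the state space grows too quickly, which is why heuristics for pruning it matter.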
19 OPTIMIZATION FOR MULTI-ENGINE EXECUTION
- Characterize the execution of operations on different execution engines
- Implement micro-benchmarks for individual operators
- Learn cost functions to plug into optimization objective
- Interpolate from micro-benchmarks
- Compose costs for individual operations into cost of flow
- Account for resource contention
- Extend transitions considered by the optimizer to include implementation and engine choices
- Extend heuristics for state space search
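The interpolation and composition steps above can be sketched as follows. Everything here is hypothetical: the benchmark numbers are invented rather than taken from the talk, costs are composed by simple addition, and data shipping between engines and resource contention are ignored.

```python
from bisect import bisect_left


def interpolate(points, size):
    """Piecewise-linear estimate of run time from (size, seconds) points.

    points must be sorted by size and contain at least two entries; sizes
    outside the measured range are extrapolated from the nearest segment.
    """
    xs = [x for x, _ in points]
    i = min(max(bisect_left(xs, size), 1), len(xs) - 1)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (size - x0) / (x1 - x0)


def cheapest_plan(flow, benchmarks, size):
    """Pick the cheapest engine per operator and sum the estimates."""
    total, plan = 0.0, []
    for op in flow:
        estimates = {eng: interpolate(pts, size)
                     for eng, pts in benchmarks[op].items()}
        engine = min(estimates, key=estimates.get)
        plan.append((op, engine))
        total += estimates[engine]
    return total, plan


# Invented micro-benchmark results: (data size, seconds) per operator/engine.
benchmarks = {
    "sort": {"hadoop": [(1, 60.0), (10, 150.0), (100, 900.0)],
             "pdb": [(1, 10.0), (10, 100.0), (100, 1200.0)]},
    "join": {"hadoop": [(1, 80.0), (10, 200.0), (100, 1100.0)],
             "pdb": [(1, 5.0), (10, 60.0), (100, 700.0)]},
}
total, plan = cheapest_plan(["sort", "join"], benchmarks, 55)
```

Choosing engines independently per operator is the simplest possible composition; a fuller model would add the cost of moving data whenever consecutive operators run on different engines.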
20 Example: Sentiment Analysis Flow Logical Model
Doc Info = (Doc id, Time Stamp, Source type, Source URL, Author Profile, GeoTag, Score)
Author Profile = (id, name, , location, gender, )
Operators and their outputs, in flow order:
- Data Sources: document batch file
- Preprocessing (Adaptor) -> (Doc Info, Text body)
- Remove Docs from Blacklisted Authors -> (Doc Info, Text body)
- Sentence Detector -> [Sentence]
- Tokenizer -> [Token id, Token], order <Doc Id, Sentence Id, Token Loc>
- POS Tagging -> (Doc Info, [Token id, Token, POS Tag])
- Compound Phrases (phrase dict.) -> [Token, POS Tag], order <Doc Id, Sentence Id, Token Loc>
- Negation Detection (negation dict.) -> [Token, POS Tag, Negation Flag]
- Attribute Extraction (attribute dict.) -> [Token, POS Tag, Negation Flag, Attribute Flag]
- Sentiment Word Detection (sentiment word dict.) -> [Token, POS Tag, Negation Flag, Attribute Flag, Sentiment Flag]
- Relate Sentiment to Attribute -> [Attribute, Sentiment Value]
- Aggregate by Doc -> (Doc Info, [Attribute, Sentiment Value])
Outputs: attribute sentiment value, document sentiment value
21 Example: Sentiment Analysis Flow Physical Model
Doc Info = (Doc id, Time Stamp, Source type, Source URL, Author Profile, GeoTag, Score)
Author Profile = (id, name, , location, gender, )
Operators grouped by execution engine:
- Sources: Data Sources (document batch file)
- Scripts: Preprocessing (Adaptor); vertical slice (profile info); Splitter on Doc Id, hash partition, 3x parallel
- Hadoop: Remove Docs from Blacklisted Authors; Sentence Detector; Tokenizer; POS Tagging
- Scripts: Merger on Doc id, 3x parallel; create recovery point
- DBMS: Compound Phrases (phrase dict.); Negation Detection (negation dict.); Attribute Extraction (attribute dict.); Sentiment Word Detection (sentiment word dict.)
- ETL Tool: Relate Sentiment to Attribute; Join on doc Id (profile info); Aggregate by Doc
- Target: DASHBOARD (attribute sentiment value, document sentiment value, profile info)
22 MICRO-BENCHMARKS FOR CONVENTIONAL (ETL, RELATIONAL) OPERATORS
Experiments on sort, join, surrogate-key, etc.
Example: Sort
- os-sort: home-grown distributed sort implementation
- hd-sort: Hadoop-based sort (used Pig Latin)
- pdb-sort: used a commercial, parallel DBMS
Dataset: TPC-H lineitem (table columns: SF, Size (GBs), Rows (x10^6))
23 Sort (Comparison)
Hadoop vs. Custom script vs. pdb (with and without loading) (TPC-H data SF: 1, 10, 100)
Pig script (hd-sort):
    sf$sf = load 'lineitem.tbl.sf$sf' using PigStorage(' ') as (f1,f2,...,f17);
    ord1_$nd = order sf$sf by f1 parallel $nd;
    ...
    ord10_$nd = order ord9_$nd by f10 parallel $nd;
    store ord$op_$nd into 'res_sf$sf_ord$op-$nd.dat' using PigStorage(' ');
C and shell scripts (os-sort):
- split file: filesize/#nodes (leftovers go to chunk #1)
- transfer data
- sort chunks on nodes
- transfer back and merge (hierarchical)
(to be fair with hd-sort and etl-sort, avoid pipelining)
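The os-sort steps listed above can be mimicked in a single process to show the logic. This is a sketch only: the real implementation used C and shell scripts across nodes, whereas here the "nodes" are plain Python lists, the transfer steps are omitted, and all function names are hypothetical.

```python
import heapq


def split(rows, nodes):
    """Split rows into `nodes` chunks of len(rows) // nodes each;
    leftover rows go to the first chunk."""
    size, extra = divmod(len(rows), nodes)
    chunks, start = [], 0
    for i in range(nodes):
        end = start + size + (extra if i == 0 else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


def hierarchical_merge(chunks):
    """Merge sorted chunks pairwise, level by level, until one remains."""
    while len(chunks) > 1:
        chunks = [list(heapq.merge(*chunks[i:i + 2]))
                  for i in range(0, len(chunks), 2)]
    return chunks[0]


def os_sort(rows, nodes):
    """Split, sort each chunk (as each node would), then merge hierarchically."""
    return hierarchical_merge([sorted(c) for c in split(rows, nodes)])


result = os_sort([5, 3, 9, 1, 7, 2, 8, 0, 6, 4], nodes=3)
```

Each merge level finishes completely before the next one starts, which mirrors the slide's note about avoiding pipelining so that the comparison with the other implementations stays fair.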
24 Sort (Tuning)
- Hadoop: e.g., #reducers
- Shell scripts: e.g., analyze sort internal phases
25 MICRO-BENCHMARKS FOR UNCONVENTIONAL OPERATORS
Experiments on Sentiment Analysis operators
- individual operators
- the whole sentiment analysis flow treated as a single operator
Implementations
- Single node
- Distributed, scripts
- Distributed, Hadoop
Datasets
- Small- to medium-sized document corpus: up to 200,000 documents
- Large corpus: ~30 million documents
26 MICRO-BENCHMARKS FOR UNCONVENTIONAL OPERATORS (SENTIMENT ANALYSIS)
- Single node, small to medium-sized corpus: use linear regression to learn interpolation function
- Single node, large corpus; effect of limited memory
- 12-node Hadoop cluster
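Learning an interpolation function by linear regression can be sketched with an ordinary least-squares fit of run time against corpus size. The measurements below are invented for illustration (they are not results from the talk), and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Estimate run time for a corpus of x documents."""
    slope, intercept = model
    return slope * x + intercept


# Made-up single-node measurements: (documents, seconds).
docs = [10_000, 50_000, 100_000, 200_000]
secs = [25.0, 105.0, 205.0, 405.0]
model = fit_line(docs, secs)
estimate = predict(model, 150_000)
```

A straight line is only a reasonable model while the corpus fits in memory; the large-corpus case on the slide is exactly where such a fit would be expected to break down.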
27 SENTIMENT ANALYSIS (COMPARISON)
Single node vs. Distributed with Hadoop vs. Distributed without Hadoop
28 OPEN PROBLEMS: FUTURE WORK
- Micro-benchmarks of other operators: ETL, relational, content analytics, front-end analytics, stream processing
- Cost functions: interpolation, composition
- Benchmarks and objective functions for other QoX measures
- Optimization strategies for physical optimization
- Assign segments of the flow to execution engines
- Where, when, and what to materialize
29 SUMMARY
- Design of data flows for BI applications is a very expensive, largely manual activity: little support for automation, optimization
- Situation gets worse with next gen BI: fuse back-end and front-end operations into end-to-end analytic data flows
- Many different types of sources, more complex operators, multiple execution engines
- Layered methodology: QoX-driven optimization allows trading off different quality objectives
- Need benchmarks and micro-benchmarks to drive the design and optimization
- Many challenges remain
30 Thank You
Business Processes Meet Operational Business Intelligence
Business Processes Meet Operational Business Intelligence Umeshwar Dayal, Kevin Wilkinson, Alkis Simitsis, Malu Castellanos HP Labs, Palo Alto, CA, USA Abstract As Business Intelligence architectures evolve
More informationData Integration Flows for Business Intelligence
Data Integration Flows for Business Intelligence Umeshwar Dayal HP Labs Palo Alto, Ca, USA umeshwar.dayal@hp.com Malu Castellanos HP Labs Palo Alto, Ca, USA malu.castellanos@hp.com Alkis Simitsis HP Labs
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationTrafodion Operational SQL-on-Hadoop
Trafodion Operational SQL-on-Hadoop SophiaConf 2015 Pierre Baudelle, HP EMEA TSC July 6 th, 2015 Hadoop workload profiles Operational Interactive Non-interactive Batch Real-time analytics Operational SQL
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationReal Time Analytics for Big Data. NtiSh Nati Shalom @natishalom
Real Time Analytics for Big Data A Twitter Inspired Case Study NtiSh Nati Shalom @natishalom Big Data Predictions Overthe next few years we'll see the adoption of scalable frameworks and platforms for
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationGetting Real Real Time Data Integration Patterns and Architectures
Getting Real Real Time Data Integration Patterns and Architectures Nelson Petracek Senior Director, Enterprise Technology Architecture Informatica Digital Government Institute s Enterprise Architecture
More informationHow To Use Hp Vertica Ondemand
Data sheet HP Vertica OnDemand Enterprise-class Big Data analytics in the cloud Enterprise-class Big Data analytics for any size organization Vertica OnDemand Organizations today are experiencing a greater
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationTesting Big data is one of the biggest
Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing
More informationHadoop Introduction. Olivier Renault Solution Engineer - Hortonworks
Hadoop Introduction Olivier Renault Solution Engineer - Hortonworks Hortonworks A Brief History of Apache Hadoop Apache Project Established Yahoo! begins to Operate at scale Hortonworks Data Platform 2013
More information5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014
5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for
More informationThe Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn
The Big Data Ecosystem at LinkedIn Roshan Sumbaly, Jay Kreps, and Sam Shah LinkedIn Presented by :- Ishank Kumar Aakash Patel Vishnu Dev Yadav CONTENT Abstract Introduction Related work The Ecosystem Ingress
More informationIl mondo dei DB Cambia : Tecnologie e opportunita`
Il mondo dei DB Cambia : Tecnologie e opportunita` Giorgio Raico Pre-Sales Consultant Hewlett-Packard Italiana 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject
More informationSimplifying Big Data Analytics: Unifying Batch and Stream Processing. John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!!
Simplifying Big Data Analytics: Unifying Batch and Stream Processing John Fanelli,! VP Product! In-Memory Compute Summit! June 30, 2015!! Streaming Analy.cs S S S Scale- up Database Data And Compute Grid
More informationUnified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia
Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationApigee Insights Increase marketing effectiveness and customer satisfaction with API-driven adaptive apps
White provides GRASP-powered big data predictive analytics that increases marketing effectiveness and customer satisfaction with API-driven adaptive apps that anticipate, learn, and adapt to deliver contextual,
More informationCustomized Report- Big Data
GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.
More informationBig Data and Analytics in Government
Big Data and Analytics in Government Nov 29, 2012 Mark Johnson Director, Engineered Systems Program 2 Agenda What Big Data Is Government Big Data Use Cases Building a Complete Information Solution Conclusion
More informationBuilding a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering
Building a real-time, self-service data analytics ecosystem Greg Arnold, Sr. Director Engineering Self Service at scale 6 5 4 3 2 1 ? Relational? MPP? Hadoop? Linkedin data 350M Members 25B 3.5M 4.8B 2M
More informationSQL Server Administrator Introduction - 3 Days Objectives
SQL Server Administrator Introduction - 3 Days INTRODUCTION TO MICROSOFT SQL SERVER Exploring the components of SQL Server Identifying SQL Server administration tasks INSTALLING SQL SERVER Identifying
More informationAn Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics
An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,
More informationOracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>
s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationData Warehouse (DW) Maturity Assessment Questionnaire
Data Warehouse (DW) Maturity Assessment Questionnaire Catalina Sacu - csacu@students.cs.uu.nl Marco Spruit m.r.spruit@cs.uu.nl Frank Habers fhabers@inergy.nl September, 2010 Technical Report UU-CS-2010-021
More informationAligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
More informationUnified Batch & Stream Processing Platform
Unified Batch & Stream Processing Platform Himanshu Bari Director Product Management Most Big Data Use Cases Are About Improving/Re-write EXISTING solutions To KNOWN problems Current Solutions Were Built
More informationHow To Use Big Data For Telco (For A Telco)
ON-LINE VIDEO ANALYTICS EMBRACING BIG DATA David Vanderfeesten, Bell Labs Belgium ANNO 2012 YOUR DATA IS MONEY BIG MONEY! Your click stream, your activity stream, your electricity consumption, your call
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationBig Data Are You Ready? Jorge Plascencia Solution Architect Manager
Big Data Are You Ready? Jorge Plascencia Solution Architect Manager Big Data: The Datafication Of Everything Thoughts Devices Processes Thoughts Things Processes Run the Business Organize data to do something
More informationSession 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges
Session 1: IT Infrastructure Security Vertica / Hadoop Integration and Analytic Capabilities for Federal Big Data Challenges James Campbell Corporate Systems Engineer HP Vertica jcampbell@vertica.com Big
More informationHow to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW
How to make BIG DATA work for you. Faster results with Microsoft SQL Server PDW Roger Breu PDW Solution Specialist Microsoft Western Europe Marcus Gullberg PDW Partner Account Manager Microsoft Sweden
More informationAssociate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2
Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue
More informationTHE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS
THE DEVELOPER GUIDE TO BUILDING STREAMING DATA APPLICATIONS WHITE PAPER Successfully writing Fast Data applications to manage data generated from mobile, smart devices and social interactions, and the
More informationSafe Harbor Statement
Safe Harbor Statement The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment
More informationOracle9i Data Warehouse Review. Robert F. Edwards Dulcian, Inc.
Oracle9i Data Warehouse Review Robert F. Edwards Dulcian, Inc. Agenda Oracle9i Server OLAP Server Analytical SQL Data Mining ETL Warehouse Builder 3i Oracle 9i Server Overview 9i Server = Data Warehouse
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More informationAzure Data Lake Analytics
Azure Data Lake Analytics Compose and orchestrate data services at scale Fully managed service to support orchestration of data movement and processing Connect to relational or non-relational data
More informationPALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management
PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their
More informationInformation Architecture
The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to
More informationBIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
More informationA Brief Introduction to Apache Tez
A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value
More informationSQL Server 2012 Performance White Paper
Published: April 2012 Applies to: SQL Server 2012 Copyright The information contained in this document represents the current view of Microsoft Corporation on the issues discussed as of the date of publication.
More informationAddressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO. Big Data Everywhere Conference, NYC November 2015
Addressing Risk Data Aggregation and Risk Reporting Ben Sharma, CEO Big Data Everywhere Conference, NYC November 2015 Agenda 1. Challenges with Risk Data Aggregation and Risk Reporting (RDARR) 2. How a
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationAdvanced In-Database Analytics
Advanced In-Database Analytics Tallinn, Sept. 25th, 2012 Mikko-Pekka Bertling, BDM Greenplum EMEA 1 That sounds complicated? 2 Who can tell me how best to solve this 3 What are the main mathematical functions??
More informationManaging Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database
Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationTrustworthiness of Big Data
Trustworthiness of Big Data International Journal of Computer Applications (0975 8887) Akhil Mittal Technical Test Lead Infosys Limited ABSTRACT Big data refers to large datasets that are challenging to
More informationAugmented Search for Web Applications. New frontier in big log data analysis and application intelligence
Augmented Search for Web Applications New frontier in big log data analysis and application intelligence Business white paper May 2015 Web applications are the most common business applications today.
More informationBig Data. Fast Forward. Putting data to productive use
Big Data Putting data to productive use Fast Forward What is big data, and why should you care? Get familiar with big data terminology, technologies, and techniques. Getting started with big data to realize