Chapter 2: Data Analytics Life Cycle


Section 1. Data Analytics Lifecycle Overview

The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur in several phases at once. For most phases in the lifecycle, movement can be either forward or backward. This iterative depiction is intended to portray a real project more closely: aspects of the project move forward and may return to earlier stages as new information is uncovered and team members learn more about the various stages of the project. This enables participants to move iteratively through the process and drive toward operationalizing the project work.

The following are the key roles for a successful analytics project:

Business User: Someone who understands the domain area and usually benefits from the results. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role.

Project Sponsor: Responsible for the genesis of the project. Provides the impetus and requirements for the project and defines the core business problem. Generally provides the funding and gauges the degree of value from the final outputs of the working team. This person sets the priorities for the project and clarifies the desired outputs.

Project Manager: Ensures that key milestones and objectives are met on time and at the expected quality.

Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs of the working team. These responsibilities may include providing access to key databases or tables and ensuring that the appropriate security levels are in place for the data repositories.

Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox, which was discussed in Chapter 1, "Introduction to Big Data Analytics." Whereas the DBA sets up and configures the databases to be used, the data engineer executes the actual data extractions and performs substantial data manipulation to facilitate the analytics. The data engineer works closely with the data scientist to help shape data in the right ways for analyses.

Data Scientist: Provides subject matter expertise for analytical techniques and data modeling, and applies valid analytical techniques to given business problems. Ensures overall analytics objectives are met. Designs and executes analytical methods and approaches with the data available to the project.

Most of these roles are not new; the last two, data engineer and data scientist, are the more recent additions.

Section 2. Background and Overview of Data Analytics Lifecycle

The Data Analytics Lifecycle defines analytics process best practices spanning discovery to project completion. The lifecycle draws from established methods in the realm of data analytics and decision science. This synthesis was developed after gathering input from data scientists and consulting established approaches that provided input on pieces of the process. The processes that were consulted include these:

Scientific method [3], in use for centuries, still provides a solid framework for thinking about and deconstructing problems into their principal parts. One of the most valuable ideas of the scientific method relates to forming hypotheses and finding ways to test ideas.

CRISP-DM [4] provides useful input on ways to frame analytics problems and is a popular approach for data mining.

Tom Davenport's DELTA framework [5]: The DELTA framework offers an approach for data analytics projects, including the context of the organization's skills, datasets, and leadership engagement.

Doug Hubbard's Applied Information Economics (AIE) approach [6]: AIE provides a framework for measuring intangibles and provides guidance on developing decision models, calibrating expert estimates, and deriving the expected value of information.

"MAD Skills" by Cohen et al. [7] offers input for several of the techniques mentioned in Phases 2-4 that focus on model planning, execution, and key findings.

Section 3. Overview of the Data Analytics Lifecycle

Phase 1-Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data.

Phase 2-Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) to get data into the sandbox. ELT and ETL are sometimes referred to together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data (Section 2.3.4).

Phase 3-Model planning: In Phase 3, the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Phase 4-Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or whether it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).

Phase 5-Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines whether the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

Phase 6-Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
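The difference between ETL and ELT in Phase 2 can be sketched in a few lines. The example below is only an illustration under stated assumptions: an in-memory SQLite database stands in for the analytic sandbox, and the table names, sample rows, and helper functions (clean_amount, etl, elt) are hypothetical, not taken from the chapter. In the ETL path the data is cleaned before it is loaded; in the ELT path the raw data is loaded first and transformed inside the sandbox, so the raw rows remain available.

```python
import sqlite3

# Hypothetical raw records, standing in for an extract from a source system.
RAW_ROWS = [
    ("c001", " 120.50 ", "2014-01-03"),
    ("c002", "80", "2014-01-04"),
    ("c001", "n/a", "2014-01-05"),  # amount that cannot be parsed
]


def clean_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def etl(sandbox):
    """ETL: transform in application code, then load only the cleaned rows."""
    sandbox.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL, day TEXT)")
    cleaned = [(c, clean_amount(a), d) for c, a, d in RAW_ROWS]
    cleaned = [row for row in cleaned if row[1] is not None]
    sandbox.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(sandbox):
    """ELT: load the raw rows unchanged, then transform inside the sandbox."""
    sandbox.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT, day TEXT)")
    sandbox.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    sandbox.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT customer, CAST(TRIM(amount) AS REAL) AS amount, day
        FROM sales_raw
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


# An in-memory database plays the role of the analytic sandbox here.
sandbox = sqlite3.connect(":memory:")
etl(sandbox)
elt(sandbox)

etl_rows = sandbox.execute("SELECT * FROM sales_etl ORDER BY day").fetchall()
elt_rows = sandbox.execute("SELECT * FROM sales_elt ORDER BY day").fetchall()
raw_count = sandbox.execute("SELECT COUNT(*) FROM sales_raw").fetchone()[0]
```

Both paths end with the same cleaned table, but only the ELT path keeps the unparseable record in the sandbox (in sales_raw), which can matter when the team later wants to inspect what was filtered out.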
Key Concepts to Keep in Mind: Developing Initial Hypotheses

Developing a set of IHs is a key facet of the discovery phase. This step involves forming ideas that the team can test with data. Generally, it is best to come up with a few primary hypotheses to test and then be creative about developing several more.

These IHs form the basis of the analytical tests the team will use in later phases and serve as the foundation for the findings in Phase 5. Hypothesis testing from a statistical perspective is covered in greater detail in Chapter 3, "Review of Basic Data Analytic Methods Using R."

2.3 Phase 2: Data Preparation

Preparing the Analytic Sandbox

Do I have enough good quality data to start building the model?

The first subphase of data preparation requires the team to obtain an analytic sandbox (also commonly referred to as a workspace), in which the team can explore the data without interfering with live production databases. Consider an example in which the team needs to work with a company's financial data.

Section 4: The following common tools are used for the Model Planning Phase

R [14] has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code. In addition, it has the ability to interface with databases via an ODBC connection and execute statistical tests and analyses against Big Data via an open source connection. These two factors make R well suited to performing statistical tests and analytics on Big Data. As of this writing, R contains nearly 5,000 packages for data analysis and graphical representation. New packages are posted frequently, and many companies are providing value-add services for R (such as training, instruction, and best practices), as well as packaging it in ways to make it easier to use and more robust. This phenomenon is similar to what happened with Linux in the 1990s, when companies appeared to package Linux and make it easier for companies to consume and deploy. Use R with file extracts for offline analysis and optimal performance, and use RODBC connections for dynamic queries and faster development.

SQL Analysis services [15] can perform in-database analytics of common data mining functions, involved aggregations, and basic predictive models.

SAS/ACCESS [16] provides integration between SAS and the analytics sandbox via multiple data connectors such as ODBC, JDBC, and OLE DB. SAS itself is generally used on file extracts, but with SAS/ACCESS, users can connect to relational databases (such as Oracle or Teradata) and data warehouse appliances (such as Greenplum or Aster), files, and enterprise applications (such as SAP and Salesforce.com).

2.5 Phase 4: Model Building
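As a rough illustration of the Phase 4 activities described earlier (developing separate datasets for training and testing, then building and checking a model), the following sketch uses only the Python standard library. The data, the 70/30 split, and the function names (split_rows, fit_line, mean_abs_error) are assumptions made for the example; the chapter does not prescribe a particular split or model.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(model, rows):
    """Average absolute difference between predictions and observed values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in rows) / len(rows)


# Hypothetical data: y is roughly 3x + 5 with a small deterministic wobble.
data = [(x, 3 * x + 5 + (1 if x % 2 else -1)) for x in range(20)]

train, test = split_rows(data)
model = fit_line(train)                      # built on the training set only
test_error = mean_abs_error(model, test)     # evaluated on unseen rows
```

Keeping the test rows out of the fitting step is the point of the split: the error measured on them gives a more honest estimate of how the model would behave on new data than the error on the training rows would.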

The following are more Common Tools for the Data Preparation Phase

Several tools are commonly used for this phase:

Hadoop [10] can perform massively parallel ingest and custom analysis for web traffic parsing, GPS location analytics, genomic analysis, and combining of massive unstructured data feeds from multiple sources.

Alpine Miner [11] provides a graphical user interface (GUI) for creating analytic workflows, including data manipulations and a series of analytic events such as staged data-mining techniques (for example, first select the top 100 customers, and then run descriptive statistics and clustering) on PostgreSQL and other Big Data sources.

OpenRefine (formerly called Google Refine) [12] is "a free, open source, powerful tool for working with messy data." It is a popular GUI-based tool for performing data transformations, and it's one of the most robust free tools currently available.

Similar to OpenRefine, Data Wrangler [13] is an interactive tool for data cleaning and transformation. Wrangler was developed at Stanford University and can be used to perform many transformations on a given dataset. In addition, data transformation outputs can be put into Java or Python. The advantage of this feature is that a subset of the data can be manipulated in Wrangler via its GUI, and then the same operations can be written out as Java or Python code to be executed against the full, larger dataset offline in a local analytic sandbox.

Section 4: The following is a list of the key outputs for each of the main stakeholders of an analytics project and what they usually expect at the conclusion of a project.

Business User typically tries to determine the benefits and implications of the findings to the business.

Project Sponsor typically asks questions related to the business impact of the project, the risks and return on investment (ROI), and the way the project can be evangelized within the organization (and beyond).

Project Manager needs to determine if the project was completed on time and within budget and how well the goals were met.

Business Intelligence Analyst needs to know if the reports and dashboards he manages will be impacted and need to change.

Data Engineer and Database Administrator (DBA) typically need to share their code from the analytics project and create a technical document on how to implement it.

Data Scientist needs to share the code and explain the model to her peers, managers, and other stakeholders.

Although these seven roles represent many interests within a project, these interests usually overlap, and most of them can be met with four main deliverables.

Chapter Summary: This chapter described the Data Analytics Lifecycle, which is an approach to managing and executing analytical projects. This approach describes the process in six phases:

1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communicate results
6. Operationalize

Data science teams use these steps to identify problems and perform rigorous investigation of the datasets needed for in-depth analysis.

Big Data Pilot Program

Requirements and Planning: Define and articulate Big Data solution requirements, scoping, and planning.

Solution Outline: Create a conceptual high-level view of the solution by defining its components and scope.

Macro/Micro Design: Produce the top-level logical design for the big data solution, including the data integration system, the data repositories, the analytics system, and the access system. Deliver the final, detailed set of blueprints for building the big data solution.

Build: Construct the big data solution and environments, including the data integration system, the data repositories, the analytics system, and the access system.

Deploy: Implement the big data solution in the production environment and deliver it to the user community. Provide training and knowledge transfer.

Full Scale

In Big Data, an area that is still in its infancy, full-scale experimentation is essential to test, evaluate, and validate new ideas. For these experiments to be meaningful, they need to be carried out at full scale, with the tools and the amounts and types of data that are specific to Big Data. For most organizations, however, especially smaller ones, it is difficult to meet these conditions.

Exercises:

1. In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

A team would invest most of the project time in Phase 1: Discovery, because of the time needed to work through the following steps:
o Learning the business domain
o Learning the resources
o Framing the problem
o Identifying key stakeholders
o Interviewing the analytics sponsor
o Developing initial hypotheses
o Identifying potential data sources

Phase 5: Communicate results would take the least time, because it consists of sharing the findings through presentations and conferences and promoting them through social media and blogs.

2. What are the benefits of doing a pilot program before a full-scale rollout of a new analytical methodology? Discuss this in the context of the mini case study.

A pilot project is a project carried out prior to a full-scale rollout of new algorithms or functionality. The pilot can have a more limited scope and be rolled out only to the line of business, products, or services affected by the new methods. The team's ability to quantify the benefits and share them in a compelling way with the stakeholders will determine whether the work moves forward into a pilot project and is ultimately run in a production environment. It is therefore critical to identify the benefits and state them clearly in the final presentation.

3. What kinds of tools would be used in the following phases, and for which kinds of use scenarios?

a. Phase 2: Data preparation

Phase 2 includes the steps to explore, preprocess, and condition data prior to modeling and analysis. In this phase the team needs to create a robust environment for exploring the data that is separate from the production environment. This is done by preparing an analytic sandbox. The following tools are known to be used for Phase 2, Data Preparation:

Hadoop [10] can perform massively parallel ingest and custom analysis for web traffic parsing, GPS location analytics, genomic analysis, and combining of massive unstructured data feeds from multiple sources.

Alpine Miner [11] provides a graphical user interface (GUI) for creating analytic workflows, including data manipulations and a series of analytic events such as staged data-mining techniques (for example, first select the top 100 customers, and then run descriptive statistics and clustering) on PostgreSQL and other Big Data sources.

OpenRefine (formerly called Google Refine) [12] is "a free, open source, powerful tool for working with messy data." It is a popular GUI-based tool for performing data transformations, and it's one of the most robust free tools currently available.

Similar to OpenRefine, Data Wrangler [13] is an interactive tool for data cleaning and transformation. Wrangler was developed at Stanford University and can be used to perform many transformations on a given dataset. In addition, data transformation outputs can be put into Java or Python. The advantage of this feature is that a subset of the data can be manipulated in Wrangler via its GUI, and then the same operations can be written out as Java or Python code to be executed against the full, larger dataset offline in a local analytic sandbox.
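The staged workflow mentioned for Alpine Miner (first select the top 100 customers, then run descriptive statistics) can be approximated in plain Python to make the idea concrete. This is a minimal sketch, not how any of the tools above implement it: the customer data is invented, the function names (top_customers, describe) are hypothetical, and the clustering stage is left out.

```python
import statistics


def top_customers(spend_by_customer, n=100):
    """Stage 1: keep the n customers with the highest total spend."""
    ranked = sorted(spend_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n])


def describe(values):
    """Stage 2: basic descriptive statistics for a collection of numbers."""
    values = list(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


# Hypothetical input: 250 customers whose spend grows with their id number.
spend = {f"cust{i:03d}": float(i * 10) for i in range(1, 251)}

top = top_customers(spend, n=100)    # stage 1 output feeds stage 2
summary = describe(top.values())
```

Writing each stage as its own function mirrors the benefit described for Data Wrangler: the steps can be tried on a small subset first and then run unchanged against the full dataset in the sandbox.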

b. Phase 4: Model building

Common Tools for the Model Building Phase: There are many tools available to assist in this phase, focused primarily on statistical analysis or data mining software. Common tools in this space include, but are not limited to, the following:

Commercial Tools:

SAS Enterprise Miner [17] allows users to run predictive and descriptive models based on large volumes of data from across the enterprise. It interoperates with other large data stores, has many partnerships, and is built for enterprise-level computing and analytics.

SPSS Modeler [18] (provided by IBM and now called IBM SPSS Modeler) offers methods to explore and analyze data through a GUI.

Matlab [19] provides a high-level language for performing a variety of data analytics, algorithms, and data exploration.

Alpine Miner [11] provides a GUI front end for users to develop analytic workflows and interact with Big Data tools and platforms on the back end.

STATISTICA [20] and Mathematica [21] are also popular and well-regarded data mining and analytics tools.

Free or Open Source Tools:

R and PL/R [14]: R was described earlier in the model planning phase, and PL/R is a procedural language for PostgreSQL with R. Using this approach means that R commands can be executed in-database. This technique provides higher performance and is more scalable than running R in memory.

Octave [22], a free software programming language for computational modeling, has some of the functionality of Matlab. Because it is freely available, Octave is used in major universities when teaching machine learning.

WEKA [23] is a free data mining software package with an analytic workbench. The functions created in WEKA can be executed within Java code.

Python is a programming language that provides toolkits for machine learning and analysis, such as scikit-learn, numpy, scipy, and pandas, and related data visualization using matplotlib.

SQL in-database implementations, such as MADlib [24], provide an alternative to in-memory desktop analytical tools. MADlib provides an open-source machine learning library of algorithms that can be executed in-database, for PostgreSQL or Greenplum.

End of Chapter 2


Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland

P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland P4.1 Reference Architectures for Enterprise Big Data Use Cases Romeo Kienzler, Data Scientist, Advisory Architect, IBM Germany, Austria, Switzerland IBM Center of Excellence for Data Science, Cognitive

More information

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers

Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers 60 Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative Analysis of the Main Providers Business Intelligence. A Presentation of the Current Lead Solutions and a Comparative

More information

INTEROPERABILITY OF SAP BUSINESS OBJECTS 4.0 WITH GREENPLUM DATABASE - AN INTEGRATION GUIDE FOR WINDOWS USERS (64 BIT)

INTEROPERABILITY OF SAP BUSINESS OBJECTS 4.0 WITH GREENPLUM DATABASE - AN INTEGRATION GUIDE FOR WINDOWS USERS (64 BIT) White Paper INTEROPERABILITY OF SAP BUSINESS OBJECTS 4.0 WITH - AN INTEGRATION GUIDE FOR WINDOWS USERS (64 BIT) Abstract This paper presents interoperability of SAP Business Objects 4.0 with Greenplum.

More information

Three Open Blueprints For Big Data Success

Three Open Blueprints For Big Data Success White Paper: Three Open Blueprints For Big Data Success Featuring Pentaho s Open Data Integration Platform Inside: Leverage open framework and open source Kickstart your efforts with repeatable blueprints

More information
