How To Write A Data Analysis Project
Section 1. Data Analytics Lifecycle Overview

The Data Analytics Lifecycle is designed specifically for Big Data problems and data science projects. The lifecycle has six phases, and project work can occur in several phases at once. For most phases in the lifecycle, the movement can be either forward or backward. This iterative depiction of the lifecycle is intended to more closely portray a real project, in which aspects of the project move forward and may return to earlier stages as new information is uncovered and team members learn more about various stages of the project. This enables participants to move iteratively through the process and drive toward operationalizing the project work.

The following are the key roles for a successful analytics project:

Business User: Someone who understands the domain area and usually benefits from the results. This person can consult and advise the project team on the context of the project, the value of the results, and how the outputs will be operationalized. Usually a business analyst, line manager, or deep subject matter expert in the project domain fulfills this role.

Project Sponsor: Responsible for the genesis of the project. Provides the impetus and requirements for the project and defines the core business problem. Generally provides the funding and gauges the degree of value from the final outputs of the working team. This person sets the priorities for the project and clarifies the desired outputs.

Project Manager: Ensures that key milestones and objectives are met on time and at the expected quality.

Database Administrator (DBA): Provisions and configures the database environment to support the analytics needs of the working team. These responsibilities may include providing access to key databases or tables and ensuring the appropriate security levels are in place related to the data repositories.

Data Engineer: Leverages deep technical skills to assist with tuning SQL queries for data management and data extraction, and provides support for data ingestion into the analytic sandbox, which was discussed in Chapter 1, "Introduction to Big Data Analytics." Whereas the DBA sets up and configures the databases to be used, the data engineer executes the actual data extractions and performs substantial data manipulation to facilitate the analytics. The data engineer works closely with the data scientist to help shape data in the right ways for analyses.
Data Scientist: Provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems. Ensures overall analytics objectives are met. Designs and executes analytical methods and approaches with the data available to the project.

Although most of these roles are not new, the last two roles, data engineer and data scientist, have grown in prominence as interest in Big Data has increased.

Section 2. Background and Overview of Data Analytics Lifecycle

The Data Analytics Lifecycle defines analytics process best practices spanning discovery to project completion. The lifecycle draws from established methods in the realm of data analytics and decision science. This synthesis was developed after gathering input from data scientists and consulting established approaches that provided input on pieces of the process. Several of the processes that were consulted include these:

Scientific method [3], in use for centuries, still provides a solid framework for thinking about and deconstructing problems into their principal parts. One of the most valuable ideas of the scientific method relates to forming hypotheses and finding ways to test ideas.

CRISP-DM [4] provides useful input on ways to frame analytics problems and is a popular approach for data mining.

Tom Davenport's DELTA framework [5]: The DELTA framework offers an approach for data analytics projects, including the context of the organization's skills, datasets, and leadership engagement.

Doug Hubbard's Applied Information Economics (AIE) approach [6]: AIE provides a framework for measuring intangibles and provides guidance on developing decision models, calibrating expert estimates, and deriving the expected value of information.

"MAD Skills" by Cohen et al. [7] offers input for several of the techniques mentioned in Phases 2-4 that focus on model planning, execution, and key findings.

Section 3. Overview of the Data Analytics Lifecycle

Phase 1 - Discovery: In Phase 1, the team learns the business domain, including relevant history such as whether the organization or business unit has attempted similar projects in the past from which it can learn. The team assesses the resources available to support the project in terms of people, technology, time, and data. Important activities in this phase include framing the business problem as an analytics challenge that can be addressed in subsequent phases and formulating initial hypotheses (IHs) to test and begin learning the data; a minimal sketch of testing such a hypothesis appears below.
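Hypothesis testing from a statistical perspective is covered in Chapter 3 using R. As an illustration only, here is a minimal sketch in Python using scipy.stats; the data file, column names, and significance level are all hypothetical stand-ins, not examples from the chapter.

```python
# Minimal sketch: testing an initial hypothesis (IH) such as
# "customers enrolled in the loyalty program spend more per order."
# The CSV file, column names, and alpha level are hypothetical.
import pandas as pd
from scipy import stats

orders = pd.read_csv("orders.csv")  # hypothetical extract from the sandbox

loyalty = orders.loc[orders["loyalty_member"] == 1, "order_value"]
others = orders.loc[orders["loyalty_member"] == 0, "order_value"]

# Welch's two-sample t-test; does not assume equal variances.
t_stat, p_value = stats.ttest_ind(loyalty, others, equal_var=False)

alpha = 0.05
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: spending differs between groups.")
else:
    print("Insufficient evidence to reject the null hypothesis.")
```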
Phase 2 - Data preparation: Phase 2 requires the presence of an analytic sandbox, in which the team can work with data and perform analytics for the duration of the project. The team needs to execute extract, load, and transform (ELT) or extract, transform, and load (ETL) processes to get data into the sandbox; the two are sometimes abbreviated together as ETLT. Data should be transformed in the ETLT process so the team can work with it and analyze it. In this phase, the team also needs to familiarize itself with the data thoroughly and take steps to condition the data (Section 2.3.4). A minimal ELT sketch appears after this list of phases.

Phase 3 - Model planning: Phase 3 is model planning, where the team determines the methods, techniques, and workflow it intends to follow for the subsequent model building phase. The team explores the data to learn about the relationships between variables and subsequently selects key variables and the most suitable models.

Phase 4 - Model building: In Phase 4, the team develops datasets for testing, training, and production purposes. In addition, in this phase the team builds and executes models based on the work done in the model planning phase. The team also considers whether its existing tools will suffice for running the models, or whether it will need a more robust environment for executing models and workflows (for example, fast hardware and parallel processing, if applicable).

Phase 5 - Communicate results: In Phase 5, the team, in collaboration with major stakeholders, determines if the results of the project are a success or a failure based on the criteria developed in Phase 1. The team should identify key findings, quantify the business value, and develop a narrative to summarize and convey findings to stakeholders.

Phase 6 - Operationalize: In Phase 6, the team delivers final reports, briefings, code, and technical documents. In addition, the team may run a pilot project to implement the models in a production environment.
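As promised above, here is a minimal ELT sketch for Phase 2 in Python: extract a raw file, load it into a local sandbox, and transform it inside the sandbox. SQLite stands in for the analytic sandbox, and the file, table, and column names are hypothetical.

```python
# Minimal ELT sketch: extract a source file, load it into a local
# analytic sandbox (SQLite here as a stand-in), then transform inside
# the sandbox with SQL. File name, table, and columns are hypothetical.
import sqlite3
import pandas as pd

# Extract: read a raw export supplied by the data engineer.
raw = pd.read_csv("transactions_raw.csv")

# Load: write the untransformed data into the sandbox first (ELT),
# preserving the raw granularity for later analyses.
conn = sqlite3.connect("analytic_sandbox.db")
raw.to_sql("transactions_raw", conn, if_exists="replace", index=False)

# Transform: derive an analysis-ready table inside the sandbox.
conn.execute("""
    CREATE TABLE IF NOT EXISTS transactions_clean AS
    SELECT customer_id,
           DATE(txn_ts)  AS txn_date,
           SUM(amount)   AS daily_amount
    FROM transactions_raw
    WHERE amount IS NOT NULL
    GROUP BY customer_id, DATE(txn_ts)
""")
conn.commit()
conn.close()
```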
Key Concepts to Keep in Mind: Developing Initial Hypotheses

Developing a set of IHs is a key facet of the discovery phase. This step involves forming ideas that the team can test with data. Generally, it is best to come up with a few primary hypotheses to test and then be creative about developing several more. These IHs form the basis of the analytical tests the team will use in later phases and serve as the foundation for the findings in Phase 5. Hypothesis testing from a statistical perspective is covered in greater detail in Chapter 3, "Review of Basic Data Analytic Methods Using R."

Preparing the Analytic Sandbox (Phase 2: Data Preparation)

Do I have enough good-quality data to start building the model? The first subphase of data preparation requires the team to obtain an analytic sandbox (also commonly referred to as a workspace), in which the team can explore the data without interfering with live production databases; the ELT sketch above shows one minimal way to load such a sandbox. Consider an example in which the team needs to work with a company's financial data.

Section 4. The following common tools are used for the Model Planning Phase:

R [14] has a complete set of modeling capabilities and provides a good environment for building interpretive models with high-quality code. In addition, it has the ability to interface with databases via an ODBC connection and execute statistical tests and analyses against Big Data via an open source connection. These two factors make R well suited to performing statistical tests and analytics on Big Data. As of this writing, R contains nearly 5,000 packages for data analysis and graphical representation. New packages are posted frequently, and many companies are providing value-add services for R (such as training, instruction, and best practices), as well as packaging it in ways to make it easier to use and more robust. This phenomenon is similar to what happened with Linux in the early 1990s, when companies appeared to package Linux and make it easier for companies to consume and deploy. Use R with file extracts for offline analysis and optimal performance, and use RODBC connections for dynamic queries and faster development.

SQL Analysis Services [15] can perform in-database analytics of common data mining functions, involved aggregations, and basic predictive models.

SAS/ACCESS [16] provides integration between SAS and the analytics sandbox via multiple data connectors such as ODBC, JDBC, and OLE DB. SAS itself is generally used on file extracts, but with SAS/ACCESS, users can connect to relational databases (such as Oracle or Teradata) and data warehouse appliances (such as Greenplum or Aster), files, and enterprise applications (such as SAP and Salesforce.com).
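The text recommends R with RODBC for dynamic queries during model planning. As a comparable sketch only, here is the same idea in Python, using the standard-library sqlite3 module against the hypothetical sandbox created earlier to explore relationships between candidate variables; the table and column names are assumptions.

```python
# Minimal model planning sketch: pull candidate variables from the
# sandbox with a dynamic query, then inspect pairwise relationships
# to guide variable and model selection. Names are hypothetical.
import sqlite3
import pandas as pd

conn = sqlite3.connect("analytic_sandbox.db")
df = pd.read_sql_query(
    "SELECT customer_id, daily_amount, txn_date FROM transactions_clean",
    conn,
)
conn.close()

# Aggregate to one row per customer before correlating.
per_customer = df.groupby("customer_id").agg(
    total_amount=("daily_amount", "sum"),
    active_days=("txn_date", "nunique"),
)

# A correlation matrix is a cheap first look at variable relationships.
print(per_customer.corr())
```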
The following are more common tools for the Data Preparation Phase. Several tools are commonly used for this phase:

Hadoop [10] can perform massively parallel ingest and custom analysis for web traffic parsing, GPS location analytics, genomic analysis, and combining of massive unstructured data feeds from multiple sources.

Alpine Miner [11] provides a graphical user interface (GUI) for creating analytic workflows, including data manipulations and a series of analytic events such as staged data-mining techniques (for example, first select the top 100 customers, and then run descriptive statistics and clustering) on PostgreSQL and other Big Data sources.

OpenRefine (formerly called Google Refine) [12] is "a free, open source, powerful tool for working with messy data." It is a popular GUI-based tool for performing data transformations, and it's one of the most robust free tools currently available.

Similar to OpenRefine, Data Wrangler [13] is an interactive tool for data cleaning and transformation. Wrangler was developed at Stanford University and can be used to perform many transformations on a given dataset. In addition, data transformation outputs can be put into Java or Python. The advantage of this feature is that a subset of the data can be manipulated in Wrangler via its GUI, and then the same operations can be written out as Java or Python code to be executed against the full, larger dataset offline in a local analytic sandbox. A sketch of this pattern appears after the stakeholder list below.

Section 5. The following is a list of the key outputs for each of the main stakeholders of an analytics project and what they usually expect at the conclusion of a project:

Business User typically tries to determine the benefits and implications of the findings to the business.

Project Sponsor typically asks questions related to the business impact of the project, the risks and return on investment (ROI), and the way the project can be evangelized within the organization (and beyond).

Project Manager needs to determine if the project was completed on time and within budget and how well the goals were met.

Business Intelligence Analyst needs to know if the reports and dashboards he manages will be impacted and need to change.

Data Engineer and Database Administrator (DBA) typically need to share their code from the analytics project and create a technical document on how to implement it.
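Returning to the Data Wrangler point above: transformations prototyped interactively on a sample can be exported as code and replayed against the full dataset. Here is a minimal, hypothetical sketch of that pattern in Python; the cleaning steps, file names, and columns are assumptions, not output from the Wrangler tool itself.

```python
# Minimal sketch of the Wrangler-style pattern: a cleaning transform
# developed on a small sample is captured as a function, then applied
# to the full dataset in chunks. Files and columns are hypothetical.
import pandas as pd

def clean(chunk: pd.DataFrame) -> pd.DataFrame:
    # The same steps that were prototyped interactively on the sample.
    chunk = chunk.dropna(subset=["customer_id"])
    chunk["state"] = chunk["state"].str.strip().str.upper()
    chunk["amount"] = pd.to_numeric(chunk["amount"], errors="coerce")
    return chunk

# Validate on the sample first ...
sample = pd.read_csv("customers_sample.csv")
print(clean(sample).head())

# ... then replay against the full extract without loading it at once.
chunks = pd.read_csv("customers_full.csv", chunksize=100_000)
cleaned = pd.concat(clean(c) for c in chunks)
cleaned.to_csv("customers_clean.csv", index=False)
```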
Data Scientist needs to share the code and explain the model to her peers, managers, and other stakeholders.

Although these seven roles represent many interests within a project, these interests usually overlap, and most of them can be met with four main deliverables.

Chapter Summary

This chapter described the Data Analytics Lifecycle, which is an approach to managing and executing analytical projects. This approach describes the process in six phases:

1. Discovery
2. Data preparation
3. Model planning
4. Model building
5. Communicate results
6. Operationalize

These steps are used by data science teams to identify problems and perform rigorous investigation of the datasets needed for in-depth analysis.

Big Data Pilot Program

Requirements and Planning: Define and articulate Big Data solution requirements, scoping, and planning.

Solution Outline: Create a conceptual, high-level view of the solution by defining the components of the solution and the scope.

Macro/Micro Design: Produce the top-level logical design for the Big Data solution, including the data integration system, data repositories, the analytics system, and the access system. Deliver the final, detailed set of blueprints for building the Big Data solution.

Build: Construct the Big Data solution and environments, including the data integration system, the data repositories, the analytics system, and the access system.

Deploy: Implement the Big Data solution in the production environment and deliver it to the user community, including training and knowledge transfer.
Full Scale

In Big Data, an area that is still in its infancy, full-scale experimentation is essential to test, evaluate, and validate new ideas. But for these experiments to be meaningful, they need to be carried out at full scale, with the tools and the amounts and types of data that are specific to Big Data. However, for most organizations, especially smaller ones, it is difficult to meet these conditions.

Exercises

1. In which phase would the team expect to invest most of the project time? Why? Where would the team expect to spend the least time?

A team would invest most of the project time in Phase 1: Discovery because of the amount of time needed for the following steps:

o Learning the business domain
o Learning the resources
o Framing the problem
o Identifying key stakeholders
o Interviewing the analytics sponsor
o Developing initial hypotheses
o Identifying potential data sources

Phase 5: Communicate Results would take the least time, because the work at that point consists of presentations and conferences, as well as promotion through social media and blogs.

2. What are the benefits of doing a pilot program before a full-scale rollout of a new analytical methodology? Discuss this in the context of the mini case study.

A pilot project can refer to a project prior to a full-scale rollout of new algorithms or functionality. This pilot can be a project with a more limited scope and rollout to the line of business, products, or services affected by the new methods. The team's ability to quantify the benefits and share them in a compelling way with the stakeholders will determine whether the work will move forward into a pilot project and ultimately be run in a production environment. Therefore, it is critical to identify the benefits and state them in a clear way in the final presentation.
3. What kinds of tools would be used in the following phases, and for which kinds of use scenarios?

a. Phase 2: Data preparation

Phase 2 includes the steps to explore, preprocess, and condition data prior to modeling and analysis. In this phase, the team needs to create a robust environment in which to explore the data, separate from the production environment. This is done by preparing an analytic sandbox. The tools commonly used for Phase 2 are those described earlier: Hadoop [10] for massively parallel ingest and custom analysis of massive, multi-source data feeds; Alpine Miner [11] for GUI-based analytic workflows on PostgreSQL and other Big Data sources; OpenRefine [12] for GUI-based transformation of messy data; and Data Wrangler [13] for interactive data cleaning and transformation, with the option of exporting the transformations as Java or Python code to be executed against the full, larger dataset.
b. Phase 4: Model building

Common Tools for the Model Building Phase: There are many tools available to assist in this phase, focused primarily on statistical analysis or data mining software. Common tools in this space include, but are not limited to, the following:

Commercial Tools:

SAS Enterprise Miner [17] allows users to run predictive and descriptive models based on large volumes of data from across the enterprise. It interoperates with other large data stores, has many partnerships, and is built for enterprise-level computing and analytics.

SPSS Modeler [18] (provided by IBM and now called IBM SPSS Modeler) offers methods to explore and analyze data through a GUI.

MATLAB [19] provides a high-level language for performing a variety of data analytics, algorithms, and data exploration.

Alpine Miner [11] provides a GUI front end for users to develop analytic workflows and interact with Big Data tools and platforms on the back end.

STATISTICA [20] and Mathematica [21] are also popular and well-regarded data mining and analytics tools.

Free or Open Source Tools:

R and PL/R [14]: R was described earlier in the model planning phase, and PL/R is a procedural language for PostgreSQL with R. Using this approach means that R commands can be executed in-database. This technique provides higher performance and is more scalable than running R in memory.

Octave [22], a free software programming language for computational modeling, has some of the functionality of MATLAB. Because it is freely available, Octave is used in major universities when teaching machine learning.
WEKA [23] is a free data mining software package with an analytic workbench. The functions created in WEKA can be executed within Java code.

Python is a programming language that provides toolkits for machine learning and analysis, such as scikit-learn, numpy, scipy, and pandas, and related data visualization using matplotlib (a minimal model building sketch using scikit-learn appears after this list).

SQL in-database implementations, such as MADlib [24], provide an alternative to in-memory desktop analytical tools. MADlib provides an open-source machine learning library of algorithms that can be executed in-database, for PostgreSQL or Greenplum.
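To illustrate the Phase 4 workflow with the Python toolkits just mentioned, here is a minimal sketch: split data into training and test sets, fit a model, and evaluate it with scikit-learn. The data and features are synthetic stand-ins, not an example from the chapter.

```python
# Minimal model building sketch with scikit-learn: create training
# and test datasets, fit a model, and evaluate it, mirroring the
# Phase 4 activities described above. Data and features are synthetic.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))                 # four candidate variables
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # synthetic target

# Hold out a test set so the model is evaluated on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate against the held-out test set before considering production.
preds = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, preds):.3f}")
```

End of Chapter 2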