Analytics Best Practices: The Analytical Hub

WHITE PAPER
Analytics Best Practices: The Analytical Hub
Sponsored by: Composite Software, www.compositesw.com
Rick Sherman, Athena IT Solutions

TABLE OF CONTENTS
Introduction
Section 1: Business Need
Section 2: Definition
Section 3: Architecture Design Principles
Section 4: Architecture Options
    Analytics: Business Analytics and Advanced Analytics
    Analytical Hub Platform
    Predictive Modeling
    Data Access and Integration
Section 5: Advice

INTRODUCTION

The whitepaper "A Better Way to Fuel Analytical Needs" discussed the key inhibitors to implementing analytics and enabling self-service business intelligence (BI). It made four key recommendations for overcoming the barriers to pervasive and self-service BI:

1. Establish an overall data-integration portfolio
2. Add data virtualization to the data integration portfolio
3. Differentiate analytical discovery from recurring business analysis
4. Create self-service data environments for self-service BI

In the fourth recommendation, two architectural frameworks, analytical sandboxes and analytical hubs, were mentioned as the foundation to create self-service data environments for self-service BI. The purpose of this paper is to focus on the specific business needs and technology solutions for implementing analytical hubs.

SECTION 1: BUSINESS NEED

Enterprises are flooded with a deluge of data about their customers, prospects, business processes, suppliers, partners and competitors. It comes from traditional internal systems, cloud applications, social networking and mobile communications. With the flood of new data comes the opportunity for business people to perform new types of analysis to gain greater insight into their business and customers.

Enterprises have been expanding their traditional BI footprint to provide much more comprehensive and timely reporting for their business. These investments are valuable, but they are limited to analyzing how a business has performed historically. Looking beyond historical data, there's a significant business opportunity in analyzing what the future may hold, e.g., predictive modeling, or examining customer behavior from sources outside the enterprise, e.g., social media.

This shift to forward-looking analytics dictates changes both for the business and IT. Traditionally, IT received detailed data requirements, used ETL tools to extract data and load it into a data warehouse (DW), and then provided business people with read-only access to that data. It's a long process, too long for people working with advanced analytics (we call them data scientists). They typically do not know all the data they need until they start modeling, and need great flexibility for processing data. IT needs to change to a supporting role with an analytical hub, and relinquish control to the data scientists. IT needs to understand that data scientists are much more data savvy than traditional BI users, and can be trusted with data.

Analytics Best Practices: The Analytical Hub, 2013 Athena IT Solutions

SECTION 2: DEFINITION

The goal of an analytical hub is to allow the analytical elite, such as data scientists, to perform advanced analytics and predictive modeling in a timely, scalable and comprehensive manner. They need it for developing predictive models or proactive analysis that will be used in business processes and for decision-making.

Before analytical hubs, the advanced analytical elite resorted to building their own makeshift hubs, typically in a reactive manner. Too often, these makeshift hubs were severely resource-constrained, so data scientists wasted their time on infrastructure rather than analytics. The intent of the analytical hub is to provide the dedicated storage, tools and processing resources to establish a foundation for recurring discovery needs.

The key components of an analytical hub (Figure 1: Analytical Hub - Functional Layers) include:

- Business analytics: contains the self-service BI tools used for discovery and situational analysis
- Advanced analytics: contains analytical tools used for statistical analysis, predictive modeling, data mining and data visualization
- Analytical hub platform: provides the processing, storage and networking capabilities
- Predictive modeling: analytical servers, e.g., statistical databases and predictive modeling engines
- Data access and delivery: enables accessing and/or integrating a variety of data sources and types
- Data sources: sourced from within and outside the enterprise; they can be big data (unstructured) and transactional data (structured), e.g., extracts, feeds, messages, spreadsheets and documents

[Figure 1: Analytical Hub - Functional Layers]

Data commonly comes from an enterprise data warehouse and various business applications; however, that is rarely sufficient for advanced analytics. Syndicated data feeds, data publicly available on the web, Big Data sources, unstructured data and spreadsheets are typically used to supplement traditional enterprise BI data sources.

The analytical hub provides data scientists with the ability to gather and, either physically or virtually, integrate data from these diverse data sources. Contrary to the traditional IT-managed data environment, data scientists need the flexibility to gather data regardless of, and sometimes in spite of, its quality in order to perform analysis.

SECTION 3: ARCHITECTURE DESIGN PRINCIPLES

When creating analytical hubs, follow these design principles to provide the right enterprise environment:

Data from everywhere needs to be accessible and integrated in a timely fashion

Expanding beyond traditional internal BI sources is necessary as data scientists examine such areas as the behavior of a company's customers and prospects; exchange data with partners, suppliers and governments; gather machine data; acquire attitudinal survey data; and examine econometric data. Unlike internal systems that IT can use to manage data quality, many of these new data sources are incomplete and inconsistent, forcing data scientists to leverage the analytical hub to clean the data or synthesize it for analysis.

Advanced analytics has been inhibited by the difficulty in accessing data and by the length of time it takes for traditional IT approaches to physically integrate it. The analytical hub needs to enable data scientists to get the data they need in a timely fashion, either by physically integrating it or by accessing virtually integrated data. Data virtualization speeds time-to-analysis and avoids the productivity drain and error-prone trap of physically integrating data.

Building solutions must be fast, iterative and repeatable

Today's competitive business environment and fluctuating economy are putting the pressure on businesses to make fast, smart decisions. Predictive modeling and advanced analytics enable those decisions to be informed. Data scientists need to get data and create tentative models fast, change variables and data to refine the models, and do it all over again as behavior, attitudes, products, competition and the economy change. The analytical hub needs to be architected to ensure that solutions can be built to be fast, iterative and repeatable.

The advanced analytics elite needs to run the show

IT has traditionally managed the data and application environments. In this custodial role, IT has controlled access and has gone through a rigorous process to ensure that data is managed and integrated as an enterprise asset. The enterprise, and IT, needs to entrust data scientists with the responsibility to understand and appropriately use data of varying quality in creating their analytical solutions. Data is often imperfect, but data scientists are the business's trusted advisors who have the knowledge required to be the decision-makers.

Solution models must be integrated back into business processes

When predictive models are built, they often need to be integrated into business processes to enable more informed decision-making. After the data scientists build the models, there is a hand-off to IT to perform the necessary integration and support their ongoing operation.

Sufficient infrastructure must be available for conducting advanced analytics

This infrastructure must be scalable and expandable as the data volumes, integration needs and analytical complexities naturally increase. Insufficient infrastructure has historically limited the depth, breadth and timeliness of advanced analytics as data scientists often used makeshift environments.
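The "fast, iterative and repeatable" principle can be illustrated with a minimal sketch. It assumes a toy data set and a single-variable least-squares fit; the field names (`visits`, `emails`, `spend`) and the helper `refine` are hypothetical and are not taken from the paper. The point is only that one function re-runs the whole fit-and-compare step, so the same refinement can be repeated whenever variables or data change.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mean_abs_error(model, xs, ys):
    """Average absolute gap between predictions and observed values."""
    a, b = model
    return mean(abs(y - (a + b * x)) for x, y in zip(xs, ys))

def refine(rows, target, candidates):
    """Fit one tentative model per candidate variable and rank them.

    Errors are measured on the same rows used for fitting, which is
    enough for a sketch but not for a real model evaluation.
    """
    ys = [r[target] for r in rows]
    errors = {}
    for name in candidates:
        xs = [r[name] for r in rows]
        errors[name] = mean_abs_error(fit_line(xs, ys), xs, ys)
    best = min(errors, key=errors.get)
    return best, errors

# Hypothetical customer data, invented for illustration only.
rows = [
    {"visits": 1, "emails": 5, "spend": 12},
    {"visits": 2, "emails": 1, "spend": 14},
    {"visits": 3, "emails": 4, "spend": 16},
    {"visits": 4, "emails": 2, "spend": 18},
]
best, errors = refine(rows, "spend", ["visits", "emails"])
print(best, errors)
```

When new data arrives or a new candidate variable is proposed, the data scientist calls `refine` again with the updated inputs; nothing about the procedure needs to be rebuilt.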

SECTION 4: ARCHITECTURE OPTIONS

See Figure 2: Analytical Hub - Architecture for the overall analytical hub with its components: business analytics, advanced analytics, hub platform, predictive analytics and data access to a variety of data sources. Architectural options are outlined for each layer using the design principles above.

[Figure 2: Analytical Hub - Architecture]

Analytics: Business Analytics and Advanced Analytics

The goal of the business analytics layer is to provide the analytical tools to support self-service BI. The technology selected in this layer needs to support data scientists who are conducting their own analytics rather than relying on IT. This layer is often used in the initial steps of the analysis to determine data availability.

The goal of the advanced analytics layer is to provide the front-end advanced analytical tools for data scientists; the companion back-end or server-based technology is in the predictive analytics layer. This layer is used to perform the advanced analytics processes and to develop predictive models.

Some important considerations when you are designing the analytical hub include:

Multiple BI analytical styles

Data scientists use different analytical styles depending on the type of analysis they are performing, the data volume and the data variety. Business analytical styles include: data discovery, On-Line Analytical Processing (OLAP), ad-hoc, dashboards, scorecards and reporting. Advanced analytics styles include: predictive analytics, statistical modeling, data visualization and Big Data analytics. It is important to provide the many analytical styles that are needed by data scientists and enable them to use these tools in a self-service mode.

Multiple BI delivery and access platforms

The analytical hub needs to provide access from and delivery to analytics performed on the desktop, in the cloud, on mobile devices (tablets and smartphones), and in Microsoft Office applications. This enables data scientists to perform their analysis on the most appropriate platform for their needs.

In contrast to business analysis that is typically read-only, advanced analytical tools will be performing read and write operations on data hosted in the hub platform, particularly when developing predictive or statistical models. Data scientists need these write operations to be self-service, which requires write access to databases typically not provided to business people.

Analytical Hub Platform

The analytical hub platform is the back-end or server-based development environment for data scientists. Whereas the analytical sandbox supports ad-hoc analysis exclusively, the analytical hub supports ad-hoc analysis along with the recurring creation and refinement of models. The hub development environment needs to support the following data management functions:

- Gathering data
    - Staging area for extracts
    - Model input data
- Data integration
    - Physically integrating data
    - Virtually integrating data
- Model management

The predictive analytics layer is depicted logically as a separate layer in Figure 2: Analytical Hub - Architecture; however, it's necessary to examine important architectural alternatives to determine if that layer should physically be hosted on the hub platform.

There are many architectural choices for hosting processing and storage capabilities. Architectural options exist for analytical processing, in-memory business analytics and database:

ANALYTICAL PROCESSING

BI appliances vs. traditional distributed servers

Analytical hubs typically start on traditional distributed servers that IT manages and supports. Enterprises often deploy in this type of environment because it meets initial data and processing needs, and because of their experience with these platforms. Depending on the analytical sophistication and data volumes, a BI platform dedicated to deploying analytical hubs may be the only platform capable of meeting these needs. Many of the advances in servers, storage, database, BI and data-integration processing have been used in the design of the BI appliances. There is a wide variation in the underlying architectures, and an enterprise needs to evaluate what best fits its needs and budget.

On-premise vs. cloud infrastructure

Another architectural consideration is whether all the components of an analytical hub should be on the traditional on-premise platform, or if some or all can be moved onto the cloud. Historically, the cloud options have been limited, but that has dramatically changed. Often, cloud components are seen as a cost- and resource-effective solution that speeds up time-to-solution.

IN-MEMORY (OR IN-DATABASE) BUSINESS ANALYTICS

A significant advancement that has enabled more in-depth and speedier analytics has been leveraging the advances in memory on the devices on which BI and predictive analytics are performed, and on the BI appliance if it is part of the architecture. In-memory analytics architectural options include in-memory analytics in the BI tools, as part of the database or on the BI appliance platform.

DATABASE OPTIONS

The traditional database deployment option for BI solutions has been relational databases, but there are more options available based on advances in technology and increased data variety. Options include:

- Relational vs. columnar vs. others
- Structured vs. unstructured (particularly Big Data)
- Hybrid mix of above

Predictive Modeling

The purpose of the predictive modeling layer is to provide the analytical engines, such as statistical databases and predictive analytics, for data scientists to develop the forward-looking models, such as predicting customer behavior or fraud detection, for the enterprise. This layer may need to support a combination of the following analytical methods:

- Statistical modeling
- Predictive modeling
- Forecasting
- Data mining
- Descriptive modeling
- Econometrics
- Operations research
- Optimization
- Simulation
- Textual analytics

IT retains its traditional role of managing these various analytical engines (if on-premise versus a hosted cloud service); however, unlike other enterprise applications, the data scientist owns and manages its content. This change in roles is crucial to success, particularly to time-to-solution.

There are various architectural alternatives for this layer. Each analytical engine, if more than one is used, will have its own infrastructure requirements, potentially making for a complex environment. Besides being its own physical environment, predictive modeling may be a logical layer that is incorporated into the analytical hub platform or deployed as a set of services in the cloud.

Data Access and Integration

Business people typically perform data access and integration by accessing an application (silos) directly, using a data warehouse, or with a combination, where they likely will use spreadsheets as the superglue creating a data shadow system. The analytical hub needs to provide business people with the ability to access, filter, augment and combine data from many sources and in many varieties from within and outside their enterprise.

With self-service BI, the goal was truly shifting the analytical workload to the business. With data access and integration, however, the goal is not self-service data integration, but rather empowerment. Typically, data integration has emphasized physically integrating the data into a DW or another application. This has proven to be very time consuming, resulting in significant backlogs and limiting business analytics. In addition, business people have often been granted limited access to non-integrated data to protect them from potential inconsistencies. The data access and integration layer needs to empower the business people to get the data they need as quickly as possible, recognizing that getting the best available data, even if not perfect, is better than making a decision with incomplete data or by using a data shadow system.

There are several considerations for the architectural options of this layer:

Data access

The access options, provided that security and privacy requirements are met, include querying sources directly, data services, using local files and data virtualization. The first three alternatives are all point-to-point access where the data scientist must know about the source, secure access and then navigate the source. Data virtualization (below) is an architectural option that creates a data source catalog that can be saved, shared and documented for business analysts and augmented by the IT staff.

Self-service data integration

Today, data scientists rely on IT-built reporting or custom extracts fed by data-integration tools, and then use spreadsheets to fill the gaps. Gathering requirements and designing and building the IT-built extracts severely slows down the time-to-solution. The analytics hub leverages analytics tools, such as data discovery or data virtualization, to enable the business analyst to perform this functionality. In addition, there is a new generation of data integration tools, such as ELT (Extract, Load and Transform), that are easy enough for data-savvy data scientists to use. This wave of self-service data integration tools often can work in batch, real-time and through services, as well as being able to integrate structured, unstructured and big data.

Augmenting enterprise data sources

Often, critical data needed to classify, filter and analyze is not available from enterprise sources, but may require an external data feed or an import from another business group. The hub needs to provide the storage and ability to extract that data, and then import it into the environment.

Data virtualization versus ETL data integration

Data integration, data management and building a consistent, clean and conformed data warehouse will continue to be the responsibility of the IT group. The data-integration capability will expand beyond traditional ETL to include data virtualization.

Data virtualization empowers business people in a couple of ways. First, it enables them to expand the data used in their analysis without requiring that it be physically integrated. Second, they do not have to get IT involved (via business requirements, data modeling, ETL and BI design) every time data needs to be added. This iterative and agile approach supports data discovery more productively for both business and IT.

Data virtualization eliminates the undocumented, overlapping and time-consuming point-to-point direct access connections that business people got stuck doing in the past with their data shadow systems. With data virtualization, IT and business people can add data sources into a repository that will document them, identify relationships between sources and uses, and encourage reuse. To the business analyst, the virtualization repository provides an information catalog to the relevant data needed for their analysis.
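The idea of a documented source catalog with virtually integrated views can be sketched in a few lines. This is a toy illustration under stated assumptions, not the API of any data virtualization product: the classes `Source` and `Catalog`, the method `virtual_join`, and the source names `crm_customers` and `web_sentiment` are all hypothetical. Each source is registered with a description and a fetch function, and the join runs against the sources at query time instead of copying data into a warehouse first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]

@dataclass
class Source:
    name: str
    description: str                      # documentation kept with the source
    fetch: Callable[[], Iterable[Row]]    # called at query time, nothing is copied up front

class Catalog:
    """Hypothetical repository of documented, reusable data sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, source: Source) -> None:
        self._sources[source.name] = source

    def describe(self) -> List[str]:
        return [f"{s.name}: {s.description}" for s in self._sources.values()]

    def virtual_join(self, left: str, right: str, key: str) -> Iterator[Row]:
        """Inner-join two registered sources on `key` when the view is read."""
        index: Dict[object, Row] = {}
        for row in self._sources[right].fetch():
            index[row[key]] = row
        for row in self._sources[left].fetch():
            match = index.get(row[key])
            if match is not None:
                yield {**row, **match}

# Stand-ins for an enterprise application and an external feed.
crm_rows = [{"customer_id": 1, "name": "Acme"}, {"customer_id": 2, "name": "Globex"}]
sentiment_rows = [{"customer_id": 1, "sentiment": 0.8}]

catalog = Catalog()
catalog.register(Source("crm_customers", "Customer master from CRM", lambda: crm_rows))
catalog.register(Source("web_sentiment", "External social media feed", lambda: sentiment_rows))

first = list(catalog.virtual_join("crm_customers", "web_sentiment", "customer_id"))

# The underlying feed changes; the next read of the view reflects it.
sentiment_rows.append({"customer_id": 2, "sentiment": 0.3})
second = list(catalog.virtual_join("crm_customers", "web_sentiment", "customer_id"))
```

Because the view is resolved on each read, adding a source or a new row does not require a new extract; a physical ETL copy would have to be reloaded to see the change.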

SECTION 5: ADVICE

To conclude, we offer some key advice for designing and operating analytical hubs that enables the analytical elite to conduct their situational analysis quickly and then act upon their insights:

Build for the advanced analytical elites, not the masses

The advanced analytical elite, i.e., data scientists and superpower users, are the people who build predictive models and create forward-looking analytics. They are the Top Guns of the analytical elite who often have a statistical background and are typically more data-savvy than IT. They do not need IT to create BI solutions for them; they need IT to create the analytical hub in which they develop the analytical solutions. Trust them. Create the analytical hub. And then get out of their way!

Create an enterprise information backbone and integration pipeline

Predictive models are data hungry, needing ever increasing volumes and variety at an ever faster pace. IT needs to continue to manage enterprise applications and extend BI solutions as the trusted enterprise information backbone for all types of business people to use. In addition, IT needs to establish an enterprise information pipeline for data scientists and others who need to go beyond the information backbone.

Embrace data virtualization and a hybrid data view mixing physically and virtually integrated data. Virtualization enables business relationships and metrics to be built into the data view without having to go through the lengthy ETL integration process. In addition, it enables you to include various data types and data sources that should not be physically integrated.

Do not be afraid to try something new

The technologies and design approaches for advanced analytics, predictive modeling and data integration are continually evolving in terms of capabilities, scale and total cost of ownership. Also, the vendor landscape has been vibrant with startups bringing new technologies to the market, while mergers and acquisitions consolidate and expand existing product capabilities.

To meet the demands of data scientists, analytical hubs need to be designed differently than the standard production BI solution. Do not be afraid to try new database, in-memory, virtualization and integration technologies from new vendors. Meeting the needs of forward-looking analytics is going to mean thinking out of the box.

Data may be dirty, inconsistent or incomplete

IT's charter is to provide cleansed and consistent data, which is the correct goal for the typical BI solution, but data scientists often need raw data that may be dirty, inconsistent or incomplete. Data scientists often need to tap dirty data because that is the best that is available at the time for them to develop their model. Much of the behavioral and attitudinal data that is used in models is outside the control of the enterprise and will never be clean.

Trust that data scientists understand how to use both the clean and dirty data. Their modeling techniques take into account the shortcomings of the data that is available, which is why they often use many sources to improve the accuracy of their models.
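One simple way several imperfect sources can compensate for each other is sketched below. It is an assumption-laden illustration rather than a technique described in the paper: the function `coalesce_sources` and the sample records are hypothetical, and real modeling work would also weigh source reliability and conflicting values. The sketch fills each missing field from the first source, in priority order, that has a value for it.

```python
def coalesce_sources(key, sources):
    """Merge records by `key`, taking each field from the first source that has it.

    `sources` is ordered from most to least trusted. A value of None is
    treated as missing, so a later source can fill the gap.
    """
    merged = {}
    for source in sources:
        for record in source:
            target = merged.setdefault(record[key], {})
            for field, value in record.items():
                if value is not None and target.get(field) is None:
                    target[field] = value
    return merged

# Hypothetical, incomplete records from two sources.
crm = [
    {"id": 1, "age": None, "region": "East"},
    {"id": 2, "age": 41, "region": None},
]
survey = [
    {"id": 1, "age": 34, "region": "West"},
    {"id": 3, "age": 29, "region": "North"},
]

profiles = coalesce_sources("id", [crm, survey])
```

Here customer 1 keeps the CRM region but gains an age from the survey, customer 2 still has no region because neither source supplies one, and customer 3 exists only because the second source was consulted.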

About the Author: Rick Sherman is the founder of Athena IT Solutions, a firm that provides business intelligence, data integration and data warehouse consulting, training and vendor services. In addition to having more than 25 years of experience in BI solutions, Rick writes on IT topics and is a frequent speaker at industry events. He blogs at The Data Doghouse and can be reached at rsherman@athena-solutions.com.

For More Information: To learn more about how Composite Software can simplify information access at your enterprise, please contact us.
info@compositesw.com
Phone (650) 227-8200
Fax (650) 227-8199
www.compositesw.com
Composite Software, 2655 Campus Drive, Suite 200, San Mateo, CA 94403

For More Information: To learn more about how Athena IT Solutions can increase the success of your BI, data integration or data warehouse project, please contact us.
info@athena-solutions.com
Phone (978) 897-3322
Fax (978) 461-0809
www.athena-solutions.com
Athena IT Solutions, Two Clock Tower Place, Suite 540, Maynard, MA 01754