Annex: Concept Note. Big Data for Policy, Development and Official Statistics New York, 22 February 2013



Annex: Concept Note
Friday Seminar on Emerging Issues
Big Data for Policy, Development and Official Statistics
New York, 22 February 2013

How is Big Data different from just very large databases? [1]

Traditionally, data processing for analytic purposes followed a fairly static blueprint: create modest amounts of structured data with stable data models. Data processing, analysis and integration tools are used to extract, transform and load the data from enterprise applications and administrative databases to a staging area, where data quality checks and data normalization (hopefully) occur and the data is modeled into neat rows and tables. The modeled, cleansed data is then loaded into an enterprise data warehouse. This routine occurs on a scheduled basis, usually daily, weekly, monthly or annually, and sometimes more frequently.

From there, data warehouse administrators create and schedule regular reports to run against normalized data stored in the warehouse or some other dissemination facility. These reports are distributed to a wide range of users in government, business, the media and the community at large. Administrators also create dashboards and other limited visualization tools for executives and management.

Analysts, meanwhile, use data analytics tools and engines to run more advanced analytics against the warehouse or other dissemination facility or, more often, against sample data migrated to a local data mart due to size limitations. Non-expert users perform basic data visualization and limited analytics against the data warehouse via front-end business intelligence tools. Data volumes in traditional data warehouses rarely exceeded multiple terabytes (and even that much is rare), as large volumes of data strain warehouse resources and degrade performance.

The changing nature of Big Data

The advent of the Web, mobile devices and other technologies such as sensor networks has caused a fundamental change to the nature of data. Big Data has important, distinct qualities that differentiate it from traditional institutional data.

[1] Most of the text shown here comes from a Big Data Manifesto from the Wikibon Community by Jeff Kelly; see http://wikibon.org/wiki/v/big_data:_hadoop,_business_analytics_and_beyond

Data are no longer centralized, highly structured and easily manageable, but are highly distributed, loosely structured (if structured at all), and increasingly large in volume.

[Figure. Source: Microsoft]

Specifically:

Volume: The amount of data created both inside corporations and outside the firewall via the web, mobile devices, IT infrastructure and other sources is increasing exponentially each year.

Type: The variety of data types is increasing, namely unstructured text-based data and semi-structured data like social media data, location-based data and log-file data.

Speed: The speed at which new data is being created, and the need for real-time analytics to derive business value from it, is increasing thanks to digitization of transactions, the emergence of sensor networks, mobile computing and the sheer number of internet and mobile device users.

Broadly speaking, Big Data is generated by a range of sources, including:

Mobile Devices: There are over 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smartphones and tablets, also make it easier to use social media and other data-generating applications. Mobile devices also collect and transmit location data.

Internet Transactions: Billions of online purchases, funds transfers, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit card issuers, credit agencies and others.

Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters, and temperature and other sensors, all create semi-structured log data that record every action.

Social Networking and Media: There are currently over 700 million Facebook users, 250 million Twitter users and 156 million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points, whether structured, semi-structured or unstructured, sometimes called Data Exhaust.

[Figure. Source: The Informatica Blog]

New approaches to Big Data processing and analytics

Traditional data warehouses and other data management tools are not designed for processing and analyzing Big Data in a time- or cost-efficient manner. Namely, data must be organized into relational tables (neat rows and columns) before a traditional enterprise data warehouse can ingest it. Due to the time and manpower needed, applying such structure to vast amounts of unstructured data is impractical. Further, scaling up a traditional enterprise data warehouse to accommodate potentially petabytes of data would require unrealistic financial investments in new, often (depending on the vendor) proprietary hardware. Data warehouse performance would also suffer due to a single choke point for loading data.

Therefore new ways of processing and analyzing Big Data are required. There are a number of approaches to processing and analyzing Big Data, but most have some common characteristics. Namely, they take advantage of commodity hardware to enable scale-out, parallel processing techniques; employ non-relational data storage capabilities in order to process unstructured and semi-structured data; and apply advanced analytics and data visualization technology to Big Data to convey insights to end-users.

[Figure. Source: Wikibon 2012]

In order to take full advantage of Big Data, however, enterprises must take further steps. Namely, they must employ staff with the knowledge and skills to deploy advanced analytics techniques on the processed data to reveal meaningful insights. People with this knowledge and these skills are now often described as Data Scientists, performing this sophisticated work in one of a handful of languages or approaches, including Hadoop, SAS and R. The results of this analysis can then be operationalized via Big Data applications, either homegrown or off-the-shelf. Other vendors are developing business intelligence-style applications to allow non-power users to interact with Big Data directly.

The context of Official Statistics

National Statistical Offices (NSOs) have started to explore how best to harness the phenomenon of Big Data in their mission to supply quality statistics for improving economic performance, social well-being and environmental sustainability. Some of the issues [2] raised are:

Should NSOs expand their business operations to take on the opportunities of using Big Data for official government purposes?

Should NSOs take on a new mission as a trusted third party whose role would be to certify the statistical quality of many of these newly emerging private sector sources?

Should NSOs become a clearing house for statistics from non-traditional sources that meet their quality standards?

Should NSOs use non-traditional sources to supplement (and perhaps replace) their official series?

How might NSOs acquire people with the knowledge and skills to effectively take advantage of Big Data for official statistics purposes?

For example, the Billion Prices Project collects price information over the internet and computes a price index to estimate inflation. The index is published daily with a three-day lag, as opposed to the official inflation numbers, which are published monthly with an even longer lag. A quick turnaround allows for early detection of inflation trends and may allow policy makers to tailor policies in a much more timely manner. If governments wanted to, they could already let Big Data play a role in providing some information on areas that are currently under the responsibility of NSOs.
[2] These issues are being considered by the High-Level Group for Strategic Developments in Business Architecture in Statistics, which reports to the Conference of European Statisticians.
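To make the idea of a daily price index built from online prices more concrete, the following is a minimal Python sketch. It is an illustration only: this note does not say which formula the Billion Prices Project uses, so the sketch swaps in a chained Jevons index (the geometric mean of day-on-day price ratios), a standard elementary index formula. The product names, prices and function names are all hypothetical.

```python
from math import exp, log


def jevons_relative(prev_prices, curr_prices):
    """Geometric mean of price ratios for products observed on both days.

    Assumes all prices are strictly positive.
    """
    common = [p for p in prev_prices if p in curr_prices]
    if not common:
        raise ValueError("no products observed on both days")
    log_sum = sum(log(curr_prices[p] / prev_prices[p]) for p in common)
    return exp(log_sum / len(common))


def chained_index(daily_prices, base=100.0):
    """Chain day-on-day Jevons relatives into an index starting at `base`."""
    index = [base]
    for prev, curr in zip(daily_prices, daily_prices[1:]):
        index.append(index[-1] * jevons_relative(prev, curr))
    return index


# Hypothetical scraped prices: one dict per day, product id -> price.
observations = [
    {"milk": 1.00, "bread": 2.00},
    {"milk": 1.10, "bread": 2.00, "eggs": 3.00},
    {"milk": 1.10, "bread": 2.20, "eggs": 3.00},
]

index = chained_index(observations)
print([round(value, 2) for value in index])
```

Matching only products seen on two consecutive days lets new products (here, eggs) enter the basket without distorting the index on the day they first appear. A production index would also need weighting, outlier handling and quality checks, which this sketch leaves out.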

The attraction of Big Data lies in the sheer amount of data which could be available in, or near, real time. Potentially, Big Data could be used as intelligence to better respond to emergency situations. Satellite imaging or information gathered from mobile devices can be used in both developed and developing countries. Big Data presents an opportunity for the official statistical community to better meet its mission of disseminating timely and quality statistics. Building on the experiences of the private and public sector, NSOs, and national and international statistical systems more generally, have an opportunity to expand into an area that could provide a new range of relevant information in a timely manner.

The use of Big Data has a number of upsides but also many challenges related to security, privacy, analysis and interpretation. Analyses and results emerging from the use of Big Data should be properly checked and documented for their quality, validity and limitations. Practical challenges with Big Data include using commercial infrastructure (capacity and computational power) to store, mine and analyse Big Data, and developing the appropriate enterprise architectures within statistical organizations. In moving from traditional data collection to the procurement and use of Big Data, the statistical community will also need to address the skill gap around Big Data administration and Big Data analytics, or Data Science.

In order for Big Data to truly gain mainstream adoption and achieve its full potential for official statistical purposes, it is critical that the statistical community does not ignore Big Data, but recognizes the use of Big Data as part of its information management model, prepares an inventory of the state of play and formulates the implications for official statistics.