Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics

Size: px
Start display at page:

Download "Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics"

Transcription

1 Big Data Terminology - Key to Predictive Analytics Success Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics

2 Outline Big Data Phenomena Terminology Role Background on ISO TC69 NIST Initiative Joint ISO TC69 and NIST Effort Current and Future Developments Oct Joint Technical Communities Conference 2

3 Big Data Phenomena See: Olap.com Oct Joint Technical Communities Conference 3

4 Wiki Land Related Topics Apache Accumulo Hortonworks Apache Hadoop NoSQL Big Memory Programming in R Data mining Sqrrl Cask (company) Talend Cloudera Tranreality gaming Data Analytics Accel. Lib Tuple space HPCC Systems Unstructured data Internet of Things Spark Map Reduce Oct Joint Technical Communities Conference 4

5 Big Data Phenomena Average salaries (Bill Snyder, InfoWorld): MapReduce: $127,315 Cloudera: $126,816 HBase: $126,369 Pig: $124,563 Flume: $123,186 Hadoop: $121,313 Hive: $120,873 Zookeeper: $118,567 Data Architect: $118, Oct Joint Technical Communities Conference 5

6 Google search counts 9/20/2015 Data: 5.59 billion Big: 4.07 billion Big data: 753 million Business analytics: Business intelligence: Predictive analytics: Statistics lies: Machine learning: Kardashian: 127 million 108 million 12.4 million 217 million 60 million 221 million Oct Joint Technical Communities Conference 6

7 Opportunities/Challenges Managers/Deans/Provosts throwing $$$ at Big Data Discipline Discipline = Statistics / Computer Science / Info Sciences / MIS / Industrial Engineering / Business College Skill Set: IT + Statistics One Thing Abundantly, Massively, Redundantly Clear: Big Data means different things to different people Overwhelming Statistical types need partners to get the data; IT folks need help with the (simple) statistical analyses Oct Joint Technical Communities Conference 7

8 Even our Sponsored Research folks got involved Oct Joint Technical Communities Conference 8

9 Is Terminology Necessary? Google Search (electronic dictionary)? Familiar with definitions ability to write them Lack of standardization does it matter? (think C p, C pk ) 沟 通 コミュニケーション viestintä επικοινωνία 의사소통 общение iletişim спілкування การต ดต อส อสาร sự truyền đạt ا ت ص ال Oct Joint Technical Communities Conference 9

10 Big Data Definition DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions Abstract sets the playing field: Big Data is a term used to describe the new deluge of data in our networked, digitized, sensorladen, information-driven world. While great opportunities exist with Big Data, it can overwhelm traditional technical approaches and its growth is outpacing scientific and technological advances in data analytics Oct Joint Technical Communities Conference 10

11 Brief History Moore s Law (1965): transistors on boards doubling every 2 years (or, computing capability) Growth of Data Volumes: doubling every 1.5 yrs Development of relational data bases Increasing amounts of unstructured data Hardware pushed Software pushed via massive parallel computing Oct Joint Technical Communities Conference 11

12 Drilling down into Big Data Four key aspects of the Big Data Revolution (1) characteristics of the datasets, (2) analysis of the datasets, (3) performance of the systems that handle the data, (4) the business considerations of cost effectiveness Oct Joint Technical Communities Conference 12

13 Big Data V s NIST working definition: Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. Volume (data set size) Variety (multiple data types, various domains and repositories) Velocity (rate of flow or creation of data) Variability (non-constancy of other V s) Oct Joint Technical Communities Conference 13

14 The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets. Horizontal scaling: clustering multiple resources to act as one system Vertical scaling: limited by physical constraints (e.g., Moore s law) Oct Joint Technical Communities Conference 14

15 Definitions Big Data engineering includes advanced techniques that harness independent resources for building scalable data systems when the characteristics of the datasets require new architectures for efficient storage, manipulation, and analysis. Non-relational models, frequently referred to as NoSQL, refer to logical data models that do not follow relational algebra for the storage and manipulation of data. A federated database system is a type of meta-database management system, which transparently maps multiple autonomous database systems into a single federated database Oct Joint Technical Communities Conference 15

16 More NIST Definitions Schema-on-read is the application of a data schema through preparation steps such as transformations, cleansing, and integration at the time the data is read from the database. Computational portability is the movement of the computation to the location of the data. The data science paradigm is extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and hypothesis testing. The data lifecycle is the set of processes that transform raw data into actionable knowledge. Analytics is the synthesis of knowledge from information Oct Joint Technical Communities Conference 16

17 Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process A data scientist is a practitioner who has sufficient knowledge in the overlapping regimes of business needs, domain knowledge, analytical skills, and software and systems engineering to manage the end-to-end data processes through each stage in the data lifecycle Oct Joint Technical Communities Conference 17

18 Skills Needed in Data Science Oct Joint Technical Communities Conference 18

19 Even More V s Veracity: accuracy of the data Value: value of the analytics to the organization Volatility: tendency for data structures to change over time Validity: appropriateness of the data for its intended use Some non-v s: quality control, metadata, and data provenance Oct Joint Technical Communities Conference 19

20 Oxford English Dictionary: big data n. Computing (also with capital initials) data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data. Cathy O Neill: Big data is more than one thing, but an important aspect is its use as a rhetorical device, something that can be used to deceive or mislead or overhype Oct Joint Technical Communities Conference 20

21 ISO TC69 Applications of Statistical Methods Standardization in the application of statistical methods, including generation, collection (planning and design), analysis, presentation and interpretation of data. (http://www.iso.org) No Specific Big Data Role TC69 Chair (Dr. M. Boulanger) launched ad hoc committee to work with ISO/IEC JTCG1/WG9 (i.e., NIST effort) Oct Joint Technical Communities Conference 21

22 ISO TC69 Structure SC1 Terminology and symbols SC4 Applications of statistical methods in process management SC5 Acceptance sampling SC6 Measurement methods and results SC7 Applications of statistical and related techniques for the implementation of Six Sigma SC8 Application of statistical and related methodology for new technology TC 69/WG 3 Statistical interpretation of data Oct Joint Technical Communities Conference 22

23 NIST Initiative, Draft Documents Volume 1, Definitions Volume 2, Taxonomies Volume 3, Use Cases and General Requirements Volume 4, Security and Privacy Volume 5, Architectures White Paper Survey Volume 6, Reference Architecture Volume 7, Standards Roadmap Oct Joint Technical Communities Conference 23

24 Joint Effort ISO TC69: NIST Group Standardization and Statistics Expertise IT expertise including comprehensive proposed architecture/ecosystem for Big Data Oct Joint Technical Communities Conference 24

25 NIST Initiative via ISO/IEC JTC 1/WG9 Big Data Public Working Group (NBD-PWG) has developed a Big Data Interoperability Framework DRAFT NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies Particular interest is Figure 1 from this Document (one of 7 volumes)

26 Figure 1

27 Simplified Diagram Data Provider Big Data Application Provider Collection Preparation/Curation Analytics Visualization Access Data Consumer Information Technology Realm

28 1. Collection 2. Preparation/Curation 3. Analytics 4. Visualization 5. Access Components of Big Data Application Provider

29 Collection Moving the data from its repository to an accessible location (large volumes and confidentiality) What/which data (merge); granularity; Sampling (if any) Prior to data preparation and cleansing Interactive/Iterative process likely Statistician connects Data Provider and Data Consumer

30 Preparation/Curation Validation (e.g., check sums, format checks) Cleansing (eliminate clear mistakes, duplicate records; not Outlier removal however ) Conversion (hiding sensitive data; summarization?) Partition implementation (data >> PC storage; schema on read)

31 Analytics NPD-PWD thinks of analytics as the steps of discovery that equate to rapid hypothesis-test cycle for finding value in large data sets (i.e., agile analytics such as correlations and trends) Structured versus unstructured data (containing value) Possible parallel environment for analysis May not have the raw data in a table (summary tables or relational data bases)

32 Analytics (continued) Complexity issues (execution time of method) Latency (real-time or streaming, near real-time or interactive, batch or offline) Human-in-the-loop analytics lifecycle (e.g., discovery, hypothesis, hypothesis testing)

33 Visualization 1. Exploratory data visualization for data understanding (e.g., browsing, outlier detection, boundary conditions) 2. Explicatory visualization for analytical results (e.g., confirmation, near real-time presentation of analytics, interpreting analytic results) 3. Explanatory visualization to tell the story (e.g., reports, business intelligence, summarization)

34 Visualization (continued) Note similarity to Exploratory Data Analysis Confirmatory Data Analysis Implementation Feasible with very large data sets Data rich allows training, validation, verification Not necessary to squeeze all information out

35 Access Could be an explicit implementation of the actionable items as a culmination of the previous four categories. The access activity of the Big Data Application Provider should mirror all actions of the Data Provider, since the Data Consumer may view this system as the Data Provider for their follow-on tasks. = score future observations based on an analytical model (including cleaning/aggregation). Interpretation of 5 categories is a Work in Progress

36 Summary Big Data Public Working Group (NBD-PWG) has developed a Big Data Interoperability Framework useful to ISO/TC69 for developing future statistical standards of use to both us (developers and users) and them (IT people who rely on us plus the End Users) Some standards in TC69 apply now (but gap analysis in order) We need them and they need us! Hardware/Software/Info Tech material handled by NBD- PWG (we can focus on statistics) Invitation to participate with JTC/WG9 is a great opportunity. Take advantage now or some other entity will.

Comprehensive Analytics on the Hortonworks Data Platform

Comprehensive Analytics on the Hortonworks Data Platform Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page

More information

Reference Architecture, Requirements, Gaps, Roles

Reference Architecture, Requirements, Gaps, Roles Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

Big Data Systems and Interoperability

Big Data Systems and Interoperability Big Data Systems and Interoperability Emerging Standards for Systems Engineering David Boyd VP, Data Solutions Email: dboyd@incadencecorp.com Topics Shameless plugs and denials What is Big Data and Why

More information

ANALYTICS CENTER LEARNING PROGRAM

ANALYTICS CENTER LEARNING PROGRAM Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals

More information

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management

Big Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must

More information

NIST Big Data Phase I Public Working Group

NIST Big Data Phase I Public Working Group NIST Big Data Phase I Public Working Group Reference Architecture Subgroup May 13 th, 2014 Presented by: Orit Levin Co-chair of the RA Subgroup Agenda Introduction: Why and How NIST Big Data Reference

More information

The InterNational Committee for Information Technology Standards INCITS Big Data

The InterNational Committee for Information Technology Standards INCITS Big Data The InterNational Committee for Information Technology Standards INCITS Big Data Keith W. Hare JCC Consulting, Inc. April 2, 2015 Who am I? Senior Consultant with JCC Consulting, Inc. since 1985 High performance

More information

DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions

DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions NIST Special Publication 1500-1 DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions NIST Big Data Public Working Group Definitions and Taxonomies Subgroup Draft Version 1 April 6, 2015

More information

The Future of Data Management

The Future of Data Management The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class

More information

Survey of Big Data Architecture and Framework from the Industry

Survey of Big Data Architecture and Framework from the Industry Survey of Big Data Architecture and Framework from the Industry NIST Big Data Public Working Group Sanjay Mishra May13, 2014 3/19/2014 NIST Big Data Public Working Group 1 NIST BD PWG Survey of Big Data

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Big Data Analytics Roadmap Energy Industry

Big Data Analytics Roadmap Energy Industry Douglas Moore, Principal Consultant, Architect June 2013 Big Data Analytics Energy Industry Agenda Why Big Data in Energy? Imagine Overview - Use Cases - Readiness Analysis - Architecture - Development

More information

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics

BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are

More information

NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies

NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies NIST Special Publication 1500-2 NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies Final Version 1 NIST Big Data Public Working Group Definitions and Taxonomies Subgroup This publication

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

WHITE PAPER. Four Key Pillars To A Big Data Management Solution

WHITE PAPER. Four Key Pillars To A Big Data Management Solution WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...

More information

Big data for the Masses The Unique Challenge of Big Data Integration

Big data for the Masses The Unique Challenge of Big Data Integration Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

Dominik Wagenknecht Accenture

Dominik Wagenknecht Accenture Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

#TalendSandbox for Big Data

#TalendSandbox for Big Data Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

Luncheon Webinar Series May 13, 2013

Luncheon Webinar Series May 13, 2013 Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration

More information

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services

Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

COMP9321 Web Application Engineering

COMP9321 Web Application Engineering COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Standard Big Data Architecture and Infrastructure

Standard Big Data Architecture and Infrastructure Standard Big Data Architecture and Infrastructure Wo Chang Digital Data Advisor Information Technology Laboratory (ITL) National Institute of Standards and Technology (NIST) wchang@nist.gov May 20, 2016

More information

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data

Alexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are

More information

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing

Data-Intensive Programming. Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti

More information

ISO/IEC JTC1 SC32. Next Generation Analytics Study Group

ISO/IEC JTC1 SC32. Next Generation Analytics Study Group November 13, 2013 ISO/IEC JTC1 SC32 Next Generation Analytics Study Group Title: Author: Project: Status: Big Data Efforts Keith W. Hare Discussion Paper References: 1/6 1 NIST Big Data Public Working

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Data Management in SAP Environments

Data Management in SAP Environments Data Management in SAP Environments the Big Data Impact Berlin, June 2012 Dr. Wolfgang Martin Analyst, ibond Partner und Ventana Research Advisor Data Management in SAP Environments Big Data What it is

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

Are You Big Data Ready?

Are You Big Data Ready? ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain

More information

Big Data and Industrial Internet

Big Data and Industrial Internet Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Manifest for Big Data Pig, Hive & Jaql

Manifest for Big Data Pig, Hive & Jaql Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,

More information

Data processing goes big

Data processing goes big Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Chapter 6. Foundations of Business Intelligence: Databases and Information Management

Chapter 6. Foundations of Business Intelligence: Databases and Information Management Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Instructional Model for Building effective Big Data Curricula for Online and Campus Education

Instructional Model for Building effective Big Data Curricula for Online and Campus Education Instructional Model for Building effective Big Data Curricula for Online and Campus Education Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University

More information

Constructing a Data Lake: Hadoop and Oracle Database United!

Constructing a Data Lake: Hadoop and Oracle Database United! Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014

Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014 Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Introduction to Big Data and the Lambda Architecture

Introduction to Big Data and the Lambda Architecture Introduction to Big Data and the Lambda Architecture Marc Schöni Meinrad Weiss April 2014 BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 1 What

More information

Foundations of Business Intelligence: Databases and Information Management

Foundations of Business Intelligence: Databases and Information Management Foundations of Business Intelligence: Databases and Information Management Wienand Omta Fabiano Dalpiaz 1 drs. ing. Wienand Omta Learning Objectives Describe how the problems of managing data resources

More information

Big Data and Analytics: Challenges and Opportunities

Big Data and Analytics: Challenges and Opportunities Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Big Data and Data Science. The globally recognised training program

Big Data and Data Science. The globally recognised training program Big Data and Data Science The globally recognised training program Certificate in Big Data Analytics Duration 5 days Big Data and Data Science enables value creation from data, through the use of calculative

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

IBM Big Data in Government

IBM Big Data in Government IBM Big in Government Turning big data into smarter decisions Deepak Mohapatra Sr. Consultant Government IBM Software Group dmohapatra@us.ibm.com The Big Paradigm Shift 2 Big Creates A Challenge And an

More information

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015

Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Datenverwaltung im Wandel - Building an Enterprise Data Hub with

Datenverwaltung im Wandel - Building an Enterprise Data Hub with Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

The Future of Data Management with Hadoop and the Enterprise Data Hub

The Future of Data Management with Hadoop and the Enterprise Data Hub The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.

Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Apache Kylin Introduction Dec 8, 2014 @ApacheKylin

Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager lukhan@ebay.com @lukehq Yang Li Architect & Tech Leader yangli9@ebay.com Agenda What s Apache Kylin? Tech Highlights Performance

More information

Big Data Analytics Platform @ Nokia

Big Data Analytics Platform @ Nokia Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform

More information

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya

Chapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Data-intensive HPC: opportunities and challenges. Patrick Valduriez

Data-intensive HPC: opportunities and challenges. Patrick Valduriez Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Agile Business Intelligence Data Lake Architecture

Agile Business Intelligence Data Lake Architecture Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning

How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume

More information

Cloud and Big Data Standardisation

Cloud and Big Data Standardisation Cloud and Big Data Standardisation EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam

More information

Big Data must become a first class citizen in the enterprise

Big Data must become a first class citizen in the enterprise Big Data must become a first class citizen in the enterprise An Ovum white paper for Cloudera Publication Date: 14 January 2014 Author: Tony Baer SUMMARY Catalyst Ovum view Big Data analytics have caught

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

Big Data Challenges in Bioinformatics

Big Data Challenges in Bioinformatics Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres Jordi.Torres@bsc.es Talk outline! We talk about Petabyte?

More information

Internals of Hadoop Application Framework and Distributed File System

Internals of Hadoop Application Framework and Distributed File System International Journal of Scientific and Research Publications, Volume 5, Issue 7, July 2015 1 Internals of Hadoop Application Framework and Distributed File System Saminath.V, Sangeetha.M.S Abstract- Hadoop

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

This Symposium brought to you by www.ttcus.com

This Symposium brought to you by www.ttcus.com This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Move Data from Oracle to Hadoop and Gain New Business Insights

Move Data from Oracle to Hadoop and Gain New Business Insights Move Data from Oracle to Hadoop and Gain New Business Insights Written by Lenka Vanek, senior director of engineering, Dell Software Abstract Today, the majority of data for transaction processing resides

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Data Integration Checklist

Data Integration Checklist The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

IBM InfoSphere BigInsights Enterprise Edition

IBM InfoSphere BigInsights Enterprise Edition IBM InfoSphere BigInsights Enterprise Edition Efficiently manage and mine big data for valuable insights Highlights Advanced analytics for structured, semi-structured and unstructured data Professional-grade

More information

Ganzheitliches Datenmanagement

Ganzheitliches Datenmanagement Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist

More information