Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics
|
|
- Erica McCormick
- 8 years ago
- Views:
Transcription
1 Big Data Terminology - Key to Predictive Analytics Success Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics
2 Outline Big Data Phenomena Terminology Role Background on ISO TC69 NIST Initiative Joint ISO TC69 and NIST Effort Current and Future Developments Oct Joint Technical Communities Conference 2
3 Big Data Phenomena See: Olap.com Oct Joint Technical Communities Conference 3
4 Wiki Land Related Topics Apache Accumulo Hortonworks Apache Hadoop NoSQL Big Memory Programming in R Data mining Sqrrl Cask (company) Talend Cloudera Tranreality gaming Data Analytics Accel. Lib Tuple space HPCC Systems Unstructured data Internet of Things Spark Map Reduce Oct Joint Technical Communities Conference 4
5 Big Data Phenomena Average salaries (Bill Snyder, InfoWorld): MapReduce: $127,315 Cloudera: $126,816 HBase: $126,369 Pig: $124,563 Flume: $123,186 Hadoop: $121,313 Hive: $120,873 Zookeeper: $118,567 Data Architect: $118, Oct Joint Technical Communities Conference 5
6 Google search counts 9/20/2015 Data: 5.59 billion Big: 4.07 billion Big data: 753 million Business analytics: Business intelligence: Predictive analytics: Statistics lies: Machine learning: Kardashian: 127 million 108 million 12.4 million 217 million 60 million 221 million Oct Joint Technical Communities Conference 6
7 Opportunities/Challenges Managers/Deans/Provosts throwing $$$ at Big Data Discipline Discipline = Statistics / Computer Science / Info Sciences / MIS / Industrial Engineering / Business College Skill Set: IT + Statistics One Thing Abundantly, Massively, Redundantly Clear: Big Data means different things to different people Overwhelming Statistical types need partners to get the data; IT folks need help with the (simple) statistical analyses Oct Joint Technical Communities Conference 7
8 Even our Sponsored Research folks got involved Oct Joint Technical Communities Conference 8
9 Is Terminology Necessary? Google Search (electronic dictionary)? Familiar with definitions ability to write them Lack of standardization does it matter? (think C p, C pk ) 沟 通 コミュニケーション viestintä επικοινωνία 의사소통 общение iletişim спілкування การต ดต อส อสาร sự truyền đạt ا ت ص ال Oct Joint Technical Communities Conference 9
10 Big Data Definition DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions Abstract sets the playing field: Big Data is a term used to describe the new deluge of data in our networked, digitized, sensorladen, information-driven world. While great opportunities exist with Big Data, it can overwhelm traditional technical approaches and its growth is outpacing scientific and technological advances in data analytics Oct Joint Technical Communities Conference 10
11 Brief History Moore s Law (1965): transistors on boards doubling every 2 years (or, computing capability) Growth of Data Volumes: doubling every 1.5 yrs Development of relational data bases Increasing amounts of unstructured data Hardware pushed Software pushed via massive parallel computing Oct Joint Technical Communities Conference 11
12 Drilling down into Big Data Four key aspects of the Big Data Revolution (1) characteristics of the datasets, (2) analysis of the datasets, (3) performance of the systems that handle the data, (4) the business considerations of cost effectiveness Oct Joint Technical Communities Conference 12
13 Big Data V s NIST working definition: Big Data consists of extensive datasets primarily in the characteristics of volume, variety, velocity, and/or variability that require a scalable architecture for efficient storage, manipulation, and analysis. Volume (data set size) Variety (multiple data types, various domains and repositories) Velocity (rate of flow or creation of data) Variability (non-constancy of other V s) Oct Joint Technical Communities Conference 13
14 The Big Data paradigm consists of the distribution of data systems across horizontally coupled, independent resources to achieve the scalability needed for the efficient processing of extensive datasets. Horizontal scaling: clustering multiple resources to act as one system Vertical scaling: limited by physical constraints (e.g., Moore s law) Oct Joint Technical Communities Conference 14
15 Definitions Big Data engineering includes advanced techniques that harness independent resources for building scalable data systems when the characteristics of the datasets require new architectures for efficient storage, manipulation, and analysis. Non-relational models, frequently referred to as NoSQL, refer to logical data models that do not follow relational algebra for the storage and manipulation of data. A federated database system is a type of meta-database management system, which transparently maps multiple autonomous database systems into a single federated database Oct Joint Technical Communities Conference 15
16 More NIST Definitions Schema-on-read is the application of a data schema through preparation steps such as transformations, cleansing, and integration at the time the data is read from the database. Computational portability is the movement of the computation to the location of the data. The data science paradigm is extraction of actionable knowledge directly from data through a process of discovery, hypothesis, and hypothesis testing. The data lifecycle is the set of processes that transform raw data into actionable knowledge. Analytics is the synthesis of knowledge from information Oct Joint Technical Communities Conference 16
17 Data science is the empirical synthesis of actionable knowledge from raw data through the complete data lifecycle process A data scientist is a practitioner who has sufficient knowledge in the overlapping regimes of business needs, domain knowledge, analytical skills, and software and systems engineering to manage the end-to-end data processes through each stage in the data lifecycle Oct Joint Technical Communities Conference 17
18 Skills Needed in Data Science Oct Joint Technical Communities Conference 18
19 Even More V s Veracity: accuracy of the data Value: value of the analytics to the organization Volatility: tendency for data structures to change over time Validity: appropriateness of the data for its intended use Some non-v s: quality control, metadata, and data provenance Oct Joint Technical Communities Conference 19
20 Oxford English Dictionary: big data n. Computing (also with capital initials) data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges; (also) the branch of computing involving such data. Cathy O Neill: Big data is more than one thing, but an important aspect is its use as a rhetorical device, something that can be used to deceive or mislead or overhype Oct Joint Technical Communities Conference 20
21 ISO TC69 Applications of Statistical Methods Standardization in the application of statistical methods, including generation, collection (planning and design), analysis, presentation and interpretation of data. ( No Specific Big Data Role TC69 Chair (Dr. M. Boulanger) launched ad hoc committee to work with ISO/IEC JTCG1/WG9 (i.e., NIST effort) Oct Joint Technical Communities Conference 21
22 ISO TC69 Structure SC1 Terminology and symbols SC4 Applications of statistical methods in process management SC5 Acceptance sampling SC6 Measurement methods and results SC7 Applications of statistical and related techniques for the implementation of Six Sigma SC8 Application of statistical and related methodology for new technology TC 69/WG 3 Statistical interpretation of data Oct Joint Technical Communities Conference 22
23 NIST Initiative, Draft Documents Volume 1, Definitions Volume 2, Taxonomies Volume 3, Use Cases and General Requirements Volume 4, Security and Privacy Volume 5, Architectures White Paper Survey Volume 6, Reference Architecture Volume 7, Standards Roadmap Oct Joint Technical Communities Conference 23
24 Joint Effort ISO TC69: NIST Group Standardization and Statistics Expertise IT expertise including comprehensive proposed architecture/ecosystem for Big Data Oct Joint Technical Communities Conference 24
25 NIST Initiative via ISO/IEC JTC 1/WG9 Big Data Public Working Group (NBD-PWG) has developed a Big Data Interoperability Framework DRAFT NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies Particular interest is Figure 1 from this Document (one of 7 volumes)
26 Figure 1
27 Simplified Diagram Data Provider Big Data Application Provider Collection Preparation/Curation Analytics Visualization Access Data Consumer Information Technology Realm
28 1. Collection 2. Preparation/Curation 3. Analytics 4. Visualization 5. Access Components of Big Data Application Provider
29 Collection Moving the data from its repository to an accessible location (large volumes and confidentiality) What/which data (merge); granularity; Sampling (if any) Prior to data preparation and cleansing Interactive/Iterative process likely Statistician connects Data Provider and Data Consumer
30 Preparation/Curation Validation (e.g., check sums, format checks) Cleansing (eliminate clear mistakes, duplicate records; not Outlier removal however ) Conversion (hiding sensitive data; summarization?) Partition implementation (data >> PC storage; schema on read)
31 Analytics NPD-PWD thinks of analytics as the steps of discovery that equate to rapid hypothesis-test cycle for finding value in large data sets (i.e., agile analytics such as correlations and trends) Structured versus unstructured data (containing value) Possible parallel environment for analysis May not have the raw data in a table (summary tables or relational data bases)
32 Analytics (continued) Complexity issues (execution time of method) Latency (real-time or streaming, near real-time or interactive, batch or offline) Human-in-the-loop analytics lifecycle (e.g., discovery, hypothesis, hypothesis testing)
33 Visualization 1. Exploratory data visualization for data understanding (e.g., browsing, outlier detection, boundary conditions) 2. Explicatory visualization for analytical results (e.g., confirmation, near real-time presentation of analytics, interpreting analytic results) 3. Explanatory visualization to tell the story (e.g., reports, business intelligence, summarization)
34 Visualization (continued) Note similarity to Exploratory Data Analysis Confirmatory Data Analysis Implementation Feasible with very large data sets Data rich allows training, validation, verification Not necessary to squeeze all information out
35 Access Could be an explicit implementation of the actionable items as a culmination of the previous four categories. The access activity of the Big Data Application Provider should mirror all actions of the Data Provider, since the Data Consumer may view this system as the Data Provider for their follow-on tasks. = score future observations based on an analytical model (including cleaning/aggregation). Interpretation of 5 categories is a Work in Progress
36 Summary Big Data Public Working Group (NBD-PWG) has developed a Big Data Interoperability Framework useful to ISO/TC69 for developing future statistical standards of use to both us (developers and users) and them (IT people who rely on us plus the End Users) Some standards in TC69 apply now (but gap analysis in order) We need them and they need us! Hardware/Software/Info Tech material handled by NBD- PWG (we can focus on statistics) Invitation to participate with JTC/WG9 is a great opportunity. Take advantage now or some other entity will.
Comprehensive Analytics on the Hortonworks Data Platform
Comprehensive Analytics on the Hortonworks Data Platform We do Hadoop. Page 1 Page 2 Back to 2005 Page 3 Vertical Scaling Page 4 Vertical Scaling Page 5 Vertical Scaling Page 6 Horizontal Scaling Page
More informationReference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
More informationANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
More informationBig Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014
Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools
More informationBIG DATA TECHNOLOGY. Hadoop Ecosystem
BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big
More informationApache Hadoop: The Big Data Refinery
Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data
More informationThe Future of Data Management
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah (@awadallah) Cofounder and CTO Cloudera Snapshot Founded 2008, by former employees of Employees Today ~ 800 World Class
More informationBig Data Systems and Interoperability
Big Data Systems and Interoperability Emerging Standards for Systems Engineering David Boyd VP, Data Solutions Email: dboyd@incadencecorp.com Topics Shameless plugs and denials What is Big Data and Why
More informationBig Data and New Paradigms in Information Management. Vladimir Videnovic Institute for Information Management
Big Data and New Paradigms in Information Management Vladimir Videnovic Institute for Information Management 2 "I am certainly not an advocate for frequent and untried changes laws and institutions must
More informationNIST Big Data Phase I Public Working Group
NIST Big Data Phase I Public Working Group Reference Architecture Subgroup May 13 th, 2014 Presented by: Orit Levin Co-chair of the RA Subgroup Agenda Introduction: Why and How NIST Big Data Reference
More informationBig Data Analytics Roadmap Energy Industry
Douglas Moore, Principal Consultant, Architect June 2013 Big Data Analytics Energy Industry Agenda Why Big Data in Energy? Imagine Overview - Use Cases - Readiness Analysis - Architecture - Development
More informationThe InterNational Committee for Information Technology Standards INCITS Big Data
The InterNational Committee for Information Technology Standards INCITS Big Data Keith W. Hare JCC Consulting, Inc. April 2, 2015 Who am I? Senior Consultant with JCC Consulting, Inc. since 1985 High performance
More informationHDP Hadoop From concept to deployment.
HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some
More informationBIG DATA What it is and how to use?
BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14
More informationSurvey of Big Data Architecture and Framework from the Industry
Survey of Big Data Architecture and Framework from the Industry NIST Big Data Public Working Group Sanjay Mishra May13, 2014 3/19/2014 NIST Big Data Public Working Group 1 NIST BD PWG Survey of Big Data
More informationNIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies
NIST Special Publication 1500-2 NIST Big Data Interoperability Framework: Volume 2, Big Data Taxonomies Final Version 1 NIST Big Data Public Working Group Definitions and Taxonomies Subgroup This publication
More informationAdvanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
More informationDRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions
NIST Special Publication 1500-1 DRAFT NIST Big Data Interoperability Framework: Volume 1, Definitions NIST Big Data Public Working Group Definitions and Taxonomies Subgroup Draft Version 1 April 6, 2015
More informationWHITE PAPER. Four Key Pillars To A Big Data Management Solution
WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...
More informationLuncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
More informationBig data for the Masses The Unique Challenge of Big Data Integration
Big data for the Masses The Unique Challenge of Big Data Integration White Paper Table of contents Executive Summary... 4 1. Big Data: a Big Term... 4 1.1. The Big Data... 4 1.2. The Big Technology...
More informationBIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
More information#TalendSandbox for Big Data
Evalua&on von Apache Hadoop mit der #TalendSandbox for Big Data Julien Clarysse @whatdoesdatado @talend 2015 Talend Inc. 1 Connecting the Data-Driven Enterprise 2 Talend Overview Founded in 2006 BRAND
More informationThe Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
More informationHadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012. Viswa Sharma Solutions Architect Tata Consultancy Services
Hadoop Beyond Hype: Complex Adaptive Systems Conference Nov 16, 2012 Viswa Sharma Solutions Architect Tata Consultancy Services 1 Agenda What is Hadoop Why Hadoop? The Net Generation is here Sizing the
More informationA Tour of the Zoo the Hadoop Ecosystem Prafulla Wani
A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to
More informationBig Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
More informationDominik Wagenknecht Accenture
Dominik Wagenknecht Accenture Improving Mainframe Performance with Hadoop October 17, 2014 Organizers General Partner Top Media Partner Media Partner Supporters About me Dominik Wagenknecht Accenture Vienna
More informationESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
More informationTransforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
More informationHadoop and Map-Reduce. Swati Gore
Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data
More informationIntegrating a Big Data Platform into Government:
Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government
More informationCOMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
More informationIBM Big Data in Government
IBM Big in Government Turning big data into smarter decisions Deepak Mohapatra Sr. Consultant Government IBM Software Group dmohapatra@us.ibm.com The Big Paradigm Shift 2 Big Creates A Challenge And an
More informationTesting 3Vs (Volume, Variety and Velocity) of Big Data
Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used
More informationData Management in SAP Environments
Data Management in SAP Environments the Big Data Impact Berlin, June 2012 Dr. Wolfgang Martin Analyst, ibond Partner und Ventana Research Advisor Data Management in SAP Environments Big Data What it is
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationChukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84
Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics
More informationStandard Big Data Architecture and Infrastructure
Standard Big Data Architecture and Infrastructure Wo Chang Digital Data Advisor Information Technology Laboratory (ITL) National Institute of Standards and Technology (NIST) wchang@nist.gov May 20, 2016
More informationAgile Business Intelligence Data Lake Architecture
Agile Business Intelligence Data Lake Architecture TABLE OF CONTENTS Introduction... 2 Data Lake Architecture... 2 Step 1 Extract From Source Data... 5 Step 2 Register And Catalogue Data Sets... 5 Step
More informationBig Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies
Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08
More informationAre You Big Data Ready?
ACS 2015 Annual Canberra Conference Are You Big Data Ready? Vladimir Videnovic Business Solutions Director Oracle Big Data and Analytics Introduction Introduction What is Big Data? If you can't explain
More informationBIG DATA TOOLS. Top 10 open source technologies for Big Data
BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed
More informationBig Data must become a first class citizen in the enterprise
Big Data must become a first class citizen in the enterprise An Ovum white paper for Cloudera Publication Date: 14 January 2014 Author: Tony Baer SUMMARY Catalyst Ovum view Big Data analytics have caught
More informationBig Data Zurich, November 23. September 2011
Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,
More informationChapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationData-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
More informationData-Intensive Programming. Timo Aaltonen Department of Pervasive Computing
Data-Intensive Programming Timo Aaltonen Department of Pervasive Computing Data-Intensive Programming Lecturer: Timo Aaltonen University Lecturer timo.aaltonen@tut.fi Assistants: Henri Terho and Antti
More informationBig Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
More informationChapter 6 Basics of Data Integration. Fundamentals of Business Analytics RN Prasad and Seema Acharya
Chapter 6 Basics of Data Integration Fundamentals of Business Analytics Learning Objectives and Learning Outcomes Learning Objectives 1. Concepts of data integration 2. Needs and advantages of using data
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More informationImplement Hadoop jobs to extract business value from large and varied data sets
Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to
More informationManifest for Big Data Pig, Hive & Jaql
Manifest for Big Data Pig, Hive & Jaql Ajay Chotrani, Priyanka Punjabi, Prachi Ratnani, Rupali Hande Final Year Student, Dept. of Computer Engineering, V.E.S.I.T, Mumbai, India Faculty, Computer Engineering,
More informationBarriers. So what is Big Data?!
Barriers So what is Big Data?! Big Data is the modern scale at which we are defining or data usage challenges. Big Data begins at the point where need to seriously start thinking about the technologies
More informationHow To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI
More informationBig Data and Industrial Internet
Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015
More informationKeywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
More informationIntroduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data
Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give
More informationBig Data and Data Science. The globally recognised training program
Big Data and Data Science The globally recognised training program Certificate in Big Data Analytics Duration 5 days Big Data and Data Science enables value creation from data, through the use of calculative
More informationHortonworks & SAS. Analytics everywhere. Page 1. Hortonworks Inc. 2011 2014. All Rights Reserved
Hortonworks & SAS Analytics everywhere. Page 1 A change in focus. A shift in Advertising From mass branding A shift in Financial Services From Educated Investing A shift in Healthcare From mass treatment
More informationIntroduction to Hadoop. New York Oracle User Group Vikas Sawhney
Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop
More informationBig Data Analytics Platform @ Nokia
Big Data Analytics Platform @ Nokia 1 Selecting the Right Tool for the Right Workload Yekesa Kosuru Nokia Location & Commerce Strata + Hadoop World NY - Oct 25, 2012 Agenda Big Data Analytics Platform
More informationArchitecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing
Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics
More informationInstructional Model for Building effective Big Data Curricula for Online and Campus Education
Instructional Model for Building effective Big Data Curricula for Online and Campus Education Big Data course and Learning Model for Online education (LMO) at the Laureate Online Education (University
More informationHow To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
More informationBIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
More informationConstructing a Data Lake: Hadoop and Oracle Database United!
Constructing a Data Lake: Hadoop and Oracle Database United! Sharon Sophia Stephen Big Data PreSales Consultant February 21, 2015 Safe Harbor The following is intended to outline our general product direction.
More informationHow to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
More informationBig Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
More informationBIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
More informationHortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015
Hortonworks and ODP: Realizing the Future of Big Data, Now Manila, May 13, 2015 We Do Hadoop Fall 2014 Page 1 HDP delivers a comprehensive data management platform GOVERNANCE Hortonworks Data Platform
More informationISO/IEC JTC1 SC32. Next Generation Analytics Study Group
November 13, 2013 ISO/IEC JTC1 SC32 Next Generation Analytics Study Group Title: Author: Project: Status: Big Data Efforts Keith W. Hare Discussion Paper References: 1/6 1 NIST Big Data Public Working
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationIntroduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p.
Introduction p. xvii Introduction to Big Data Analytics p. 1 Big Data Overview p. 2 Data Structures p. 5 Analyst Perspective on Data Repositories p. 9 State of the Practice in Analytics p. 11 BI Versus
More informationThe Future of Data Management with Hadoop and the Enterprise Data Hub
The Future of Data Management with Hadoop and the Enterprise Data Hub Amr Awadallah Cofounder & CTO, Cloudera, Inc. Twitter: @awadallah 1 2 Cloudera Snapshot Founded 2008, by former employees of Employees
More informationDatenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
More informationGanzheitliches Datenmanagement
Ganzheitliches Datenmanagement für Hadoop Michael Kohs, Senior Sales Consultant @mikchaos The Problem with Big Data Projects in 2016 Relational, Mainframe Documents and Emails Data Modeler Data Scientist
More informationBringing the Power of SAS to Hadoop. White Paper
White Paper Bringing the Power of SAS to Hadoop Combine SAS World-Class Analytic Strength with Hadoop s Low-Cost, Distributed Data Storage to Uncover Hidden Opportunities Contents Introduction... 1 What
More informationAlexander Nikov. 5. Database Systems and Managing Data Resources. Learning Objectives. RR Donnelley Tries to Master Its Data
INFO 1500 Introduction to IT Fundamentals 5. Database Systems and Managing Data Resources Learning Objectives 1. Describe how the problems of managing data resources in a traditional file environment are
More informationCloud and Big Data Standardisation
Cloud and Big Data Standardisation EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationWorkshop on Hadoop with Big Data
Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly
More informationData Integration Checklist
The need for data integration tools exists in every company, small to large. Whether it is extracting data that exists in spreadsheets, packaged applications, databases, sensor networks or social media
More informationW H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract
W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the
More informationManaging Data in Motion
Managing Data in Motion Data Integration Best Practice Techniques and Technologies April Reeve ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO SINGAPORE SYDNEY
More informationHow To Turn Big Data Into An Insight
mwd a d v i s o r s Turning Big Data into Big Insights Helena Schwenk A special report prepared for Actuate May 2013 This report is the fourth in a series and focuses principally on explaining what s needed
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationModern Data Architecture for Predictive Analytics
Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters
More informationHDP Enabling the Modern Data Architecture
HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationBig Data Standardisation in Industry and Research
Big Data Standardisation in Industry and Research EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University
More informationGetting Started with Hadoop. Raanan Dagan Paul Tibaldi
Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop
More informationCollaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.
Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!
More informationApache Kylin Introduction Dec 8, 2014 @ApacheKylin
Apache Kylin Introduction Dec 8, 2014 @ApacheKylin Luke Han Sr. Product Manager lukhan@ebay.com @lukehq Yang Li Architect & Tech Leader yangli9@ebay.com Agenda What s Apache Kylin? Tech Highlights Performance
More informationBig Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
More informationExecutive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...
Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data
More informationThe 4 Pillars of Technosoft s Big Data Practice
beyond possible Big Use End-user applications Big Analytics Visualisation tools Big Analytical tools Big management systems The 4 Pillars of Technosoft s Big Practice Overview Businesses have long managed
More information