Big Data Big Data/Data Analytics & Software Development




Big Data
Big Data/Data Analytics & Software Development
Danairat T.
danairat@gmail.com, 081-559-1446

Agenda
- Big Data Overview
- Business Cases and Benefits
- Hadoop Technology
- Architecture
- Big Data Development Process
- Summary

Big Data Overview

What is Big Data
- Big data analytics is concerned with the analysis of large volumes of transaction/event data and behavioral analysis of human/human and human/system interactions. (Gartner)
- Big data represents the collection of technologies that handle large data volumes well beyond that inflection point and for which, at least in theory, hardware resources required to manage data by volume track to an almost straight line rather than a curve. (IDC)

Structured .. Non Structured

Level             Example
Structured        Relational database
Semi-structured   XML data files
Quasi-structured  Text documents
Unstructured      Images and video

A new class of problems has emerged which demands an ability to accept and manage data without advance knowledge of its structure or format.

Unstructured Data Growth Trends

Big Data, An Integrated Architecture (diagram)
Stages: Capture, Store/Process, Integrate, Organize, Analyze, Govern
Data types: Structured, Semi-structured, Unstructured
Components shown: Master & Ref Data; Transaction Data; Machine Generated; Social Media; Text, Image, Video, Audio; DBMS (OLTP); Hadoop Cluster; MapReduce; Key-Value Data Store; DB Replic; ETL/ELT; ChangeDC; Real-Time; Message-Based; ODS; Data Warehouse; Data Marts; Streaming (CEP Engine); Reporting & Dashboards; Alerting; EPM; BI Applications; Big Data; Text Analytics and Search; In-Database Analytics; Advanced Analytics; Visual Discovery; Management; Security, Governance
Source: oracle.com

What is Big Data (diagram)
Data sources shown: Social, Blog, Smart Meter
Characteristics: Volume, Velocity, Variety, Value
Source: oracle.com

Business Cases and Benefits

Hadoop Use Cases
- CRM, Customer Analysis, Social Marketing
- Telco: Network Analysis, Quality of Service
- Public Service: Crime Analysis, Flooding Alert, City Planning
- Healthcare: Patient Safety, EMR, Next Best Action (NBA)
- Retail: Everyday Low Price, Offer better quality products, Next Best Action (NBA)
- Finance: Risk Management, Loan Origination, Credit Line, Wealth Management
- HCM, Talent Management, Social Analytics, etc.

What does a Big Data World look like? Utilities
What they collect:
- Smart metering: monitors power usage
How they use it:
- Better demand planning
- Better targeted marketing
- Better targeted products based on individuals' power needs
Big Data means:
- The ability to predict demand at household level
- Reduced exposure to the spot market

Personal Health Improvement Cloud (diagram)
1. Blood Pressure, Sleep Tracking, Heart Rate Tracking, Coach Tracking (each through an Integrated Medical Device)
2. Government creates/revises National Health Program
3. Public/Private Hospital executes Health Program with integrated EHR/EMR Systems
4. Health Tracking, Health check-up records
Also shown: Suggested Health Improvement (Secured Personal Access), Big Data
Actors: Government Officer, Doctor, Nurse, Officer

Public Healthcare Management of Outbreak Through Early Detection of Clusters

Hadoop Technology

What is Hadoop?
- A scalable fault-tolerant distributed system for data storage and processing
- Its scalability comes from the marriage of:
    - HDFS: Self-Healing High-Bandwidth Clustered Storage
    - MapReduce: Fault-Tolerant Distributed Processing
- Operates on structured and complex data
- A large and active ecosystem (many developers and additions like HBase, Hive, Pig, ...)
- Open source under the Apache License
- http://wiki.apache.org/hadoop/
apache.org/hadoop/

Hadoop History
- 2002-2004: Doug Cutting and Mike Cafarella started working on Nutch
- 2003-2004: Google publishes GFS and MapReduce papers
- 2004: Cutting adds DFS & MapReduce support to Nutch
- 2006: Yahoo! hires Cutting, Hadoop spins out of Nutch
- 2007: NY Times converts 4 TB of archives over 100 EC2 instances
- 2008: Web-scale deployments at Yahoo!, Facebook, Last.fm
- April 2008: Yahoo! does fastest sort of a TB, 3.5 minutes over 910 nodes
- May 2009: Yahoo! does fastest sort of a TB, 62 seconds over 1,460 nodes; Yahoo! sorts a PB in 16.25 hours over 3,658 nodes
- June 2009, Oct 2009: Hadoop Summit, Hadoop World
- September 2009: Doug Cutting joins Cloudera

Basic Architecture (diagram built up over five slides)
- Client
- HDFS: Name Node, plus a Data Node on each worker
- MapReduce: Job Tracker, plus a Task Tracker on each worker
- Each worker pairs a Data Node with a Task Tracker; the final slide shows the cluster growing from three such workers to nine

HDFS: Hadoop Distributed File System
- Block Size = 64MB
- Replication Factor = 3
- Cost/GB is a few ¢/month vs $/month
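As a rough worked example of what the block size and replication factor above imply, the sketch below estimates how many blocks and how much raw disk one file would occupy. The function name `hdfs_footprint` is hypothetical and this is plain arithmetic, not a call into any Hadoop API; it assumes the last block only takes up the bytes it actually holds.

```python
import math

BLOCK_SIZE_MB = 64       # block size quoted on the slide
REPLICATION_FACTOR = 3   # replication factor quoted on the slide


def hdfs_footprint(file_size_mb: float) -> dict:
    """Estimate block count and raw storage for one file (illustrative only)."""
    # A file is split into fixed-size blocks; the last one may be partial.
    blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        "blocks": blocks,
        # Every block is stored REPLICATION_FACTOR times across the cluster.
        "block_replicas": blocks * REPLICATION_FACTOR,
        # Assumption: a partial last block uses only its real size on disk.
        "raw_storage_mb": file_size_mb * REPLICATION_FACTOR,
    }


if __name__ == "__main__":
    # A 1 GB file: 16 blocks, 48 block replicas, 3072 MB of raw storage.
    print(hdfs_footprint(1024))
```

Under these assumptions, usable capacity is roughly one third of raw disk, which is worth keeping in mind when sizing a cluster.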

MapReduce: Distributed Processing
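The slide itself is only a diagram, so the following is a minimal single-process sketch of the map, shuffle, and reduce steps using word count as the job. It runs in plain Python rather than on Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical. It also assumes words are separated by whitespace; Thai text is normally written without spaces between words, so a real Thai word count would need a word segmenter in the map step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence on an in-memory input."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big", "data hadoop"]))
```

On a cluster, the map and reduce calls would run in parallel on different nodes and the shuffle would move data between them; here everything runs in one process purely to show the data flow.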

Hadoop
- MapReduce (Job Scheduling/Execution System)
- HDFS (Hadoop Distributed File System)

Hadoop Ecosystem
- Hive, Sqoop, ZooKeeper, HBase, Flume
- MapReduce (Job Scheduling/Execution System)
- HDFS (Hadoop Distributed File System)

Use The Right Tool For The Right Job
Relational Databases: when to use?
- Interactive Reporting (<1sec)
- Multistep Transactions
- Lots of Inserts/Updates/Deletes
Hadoop: when to use?
- Affordable Storage/Compute
- Structured or Not (Agility)
- Resilient Auto Scalability

Example: Thai Content Analytics
Thai WordCount: input content

Example: Thai Content Analytics
Thai WordCount: the results

Example - Hive
Hive lets non-Java developers query data in Hadoop with SQL.

    hive (default)> select * from test_tbl;
    OK
    1 USA
    62 Indonesia
    63 Philippines
    65 Singapore
    66 Thailand
    Time taken: 0.287 seconds
    hive (default)>

Note: only text content with a column separator is supported.
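To illustrate what "text content with a column separator" means for a table like test_tbl, here is a small sketch that reads the same five rows from delimited text in plain Python. It does not use Hive; the tab separator, the column meanings (code, name), and the function name `select_all` are assumptions made for illustration, since the slide does not show the table definition.

```python
import csv
import io
from typing import List, Tuple

# Assumption: rows are stored as tab-separated text, one record per line.
SAMPLE_ROWS = "1\tUSA\n62\tIndonesia\n63\tPhilippines\n65\tSingapore\n66\tThailand\n"


def select_all(text: str, separator: str = "\t") -> List[Tuple[int, str]]:
    """Return every row as (code, name), similar in spirit to SELECT *."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    # Skip empty lines; each remaining row is expected to have two columns.
    return [(int(row[0]), row[1]) for row in reader if row]


if __name__ == "__main__":
    for code, name in select_all(SAMPLE_ROWS):
        print(code, name)
```

The point of the sketch is that the file itself carries no schema: the separator and the column types are supplied by whoever reads it, which is what a Hive table definition does for text files.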

Big Data Development Process

Big Data Development Process

Guideline
- Targeted Users
- Target Opportunities
- Data Scientist
- Data Source/Type
- Data Capturing Approach
- Data Processing and Visualize Planning

Architecture Planning
- Technology Architecture
- Big Data Ecosystem (Hadoop Ecosystem)
- Sizing
- Integration
- Security
- Administration and Operation Planning

Big Data Development
- Develop Use Cases
- Set up Big Data Pseudo-distribution Mode
- Set up HDFS
- Develop Data Capturing System
- Develop Data Analytic: MapReduce, Hive, R, etc.
- Integrate result to Enterprise Analytic System

Operation and Support
- Set up Big Data Cluster Mode
- Monitor HDFS utilization and capacity planning
- Monitor Job Tracker availability
- Monitor Data Capturing System
- Upgrade or Patch Big Data Hadoop ecosystem
- System admin. Training
- Helpdesk Training
- End-User Training (Analytic Results)

System Evaluation
- Adoption Rates for each analytics results
- No. of Missing Analytic Results
- No. of Missing Data
- Lost hours per month
- Avg. of each Analytic Result Response Time
- No. of Technology System Failure per month

Summary

Big Data, An Integrated Architecture (diagram repeated from the Big Data Overview section)
Stages: Capture, Store/Process, Integrate, Organize, Analyze, Govern
Source: oracle.com

Thank you very much