BIG DATA TRENDS AND TECHNOLOGIES




THE WORLD OF DATA IS CHANGING Cloud

WHAT IS BIG DATA?
Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualization. We are entering an era in which sensors collect data in our physical world and deliver it to networks that aggregate and analyze the information. Big data defines us and will increasingly dictate how we live in a fully interconnected world.

WHAT IS BIG DATA?
Forrester's Brian Hopkins describes big data as techniques and technologies that make handling data at extreme scale economical.

DIGITAL DATA AGE
BEFORE 1990: PHOTOGRAPHS AND VIDEO TAKEN OVER AN ENTIRE LIFETIME BY ONE PROFESSIONAL OCCUPY AROUND 10 GIGABYTES
2010: PHOTOGRAPHS AND VIDEO TAKEN IN ONE YEAR BY ORDINARY PEOPLE TAKE UP ABOUT 5 GB

WHERE DATA COME FROM
PHOTOGRAPHS
VIDEO
MACHINE LOGS
RFID READER
VEHICLE GPS TRACE
RETAIL TRANSACTION
FINANCIAL TRANSACTION

DIGITAL DATA FACTS AND FIGURES

STORAGE CAPACITY AND TRANSFER RATE
1990: 1,000 MB WITH A 4.4 MB/S TRANSFER RATE. It takes approximately 227 s, or about 4 minutes, to read the whole disk.
2011: 1,000 GB WITH A 135 MB/S TRANSFER RATE. It takes approximately 7,600 s, or about 2 hours, to read the whole disk.

MORE OXEN BETTER THAN ONE BIGGER OX?
READING 2 TERABYTES FROM 1 DISK TAKES ABOUT 4 HOURS
READING 2 TB FROM 1,000 DISKS IN PARALLEL TAKES ABOUT 15 SECONDS
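The read-time figures on the two slides above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original deck: it assumes binary units (1 GB = 1,024 MB), assumes every disk sustains the full 135 MB/s, and ignores seek time and network overhead. The function name `read_time_seconds` is illustrative.

```python
MB_PER_GB = 1024               # assumption: binary units
MB_PER_TB = 1024 * MB_PER_GB


def read_time_seconds(size_mb: float, rate_mb_per_s: float, disks: int = 1) -> float:
    """Seconds to read size_mb when the data is split evenly over `disks`
    drives that are all read in parallel at the same sustained rate."""
    return size_mb / disks / rate_mb_per_s


# Whole-disk reads from the "storage capacity and transfer rate" slide.
t_1990 = read_time_seconds(1_000, 4.4)
t_2011 = read_time_seconds(1_000 * MB_PER_GB, 135)

# One disk versus 1,000 disks in parallel, for the same 2 TB of data.
t_one_disk = read_time_seconds(2 * MB_PER_TB, 135)
t_parallel = read_time_seconds(2 * MB_PER_TB, 135, disks=1_000)

print(f"1990 disk: {t_1990:.0f} s ({t_1990 / 60:.1f} min)")
print(f"2011 disk: {t_2011:.0f} s ({t_2011 / 3600:.1f} h)")
print(f"2 TB, 1 disk: {t_one_disk / 3600:.1f} h")
print(f"2 TB, 1,000 disks: {t_parallel:.1f} s")
```

Capacity grew about a thousandfold between the two disks while the transfer rate grew only about thirtyfold, which is why spreading reads over many disks pays off.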

HADOOP
Inspired by Google's GOOGLE FILE SYSTEM (GFS) and MAPREDUCE papers, circa 2004
Created by Doug Cutting
Hadoop Distributed File System (HDFS): reliable data storage
DATA IS DISTRIBUTED AND REPLICATED OVER MULTIPLE MACHINES
DESIGNED FOR LARGE FILES (TB, PB, OR LARGER)
MapReduce: high-performance parallel data processing

HADOOP DISTRIBUTED FILE SYSTEM

MAP/REDUCE ADVANTAGES
SCALABLE: Automatically Parallelizes Map & Reduce Operations; Supports 1,000s of Processors and Petabytes of Data
FAULT TOLERANT: Replicated Data in HDFS; Failed Jobs Are Automatically Restarted without Loss of the Rest of the Jobs
ELASTIC AND FLEXIBLE: Degree of Parallelism Can Be Determined at Runtime; Flexible Data Model and Programming
AFFORDABLE AND EASY TO USE: Open Source and Designed to Work on Commodity Hardware; Two Routines: Map & Reduce
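The "two routines" idea can be illustrated with a word count, the customary first MapReduce example. The sketch below is plain Python running in a single process, not Hadoop code: `map_words`, `reduce_counts`, and `run_job` are hypothetical names, and the grouping step inside `run_job` stands in for the shuffle that a real cluster performs between the map and reduce phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map routine: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Reduce routine: sum all the counts emitted for a single word."""
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, group the pairs by key (the 'shuffle'), then run reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


result = run_job(["big data big clusters", "data on clusters"])
print(result)
```

Because each call to `map_words` looks at only one line and each call to `reduce_counts` looks at only one key, a framework is free to run many of them at once on different machines, which is where the automatic parallelism comes from.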

HADOOP ARCHITECTURE

HADOOP ADVANTAGES
DISTRIBUTED: DATA IS REPLICATED AND PROCESSED ACROSS THE CLUSTER
FAULT TOLERANT: KEEPS WORKING WHEN NODES FAIL
SELF HEALING: REBALANCES FILES ACROSS THE CLUSTER
SCALABLE: JUST BY ADDING NEW NODES
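Why replication gives fault tolerance can be shown with a toy placement model. This is an illustrative sketch only: the round-robin policy and the names `place_blocks` and `readable_blocks` are assumptions made for the example, whereas real HDFS placement is decided by the NameNode and is rack-aware. The replication factor of 3 matches the HDFS default.

```python
from typing import Dict, List, Set


def place_blocks(num_blocks: int, nodes: List[str], replication: int = 3) -> Dict[int, List[str]]:
    """Assign every block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement: Dict[int, List[str]], failed: Set[str]) -> List[int]:
    """Blocks that still have at least one replica on a node that has not failed."""
    return [
        block
        for block, holders in placement.items()
        if any(node not in failed for node in holders)
    ]


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(num_blocks=8, nodes=nodes)
survivors = readable_blocks(placement, failed={"node2", "node4"})
print(len(survivors), "of", len(placement), "blocks still readable")
```

With three replicas per block, losing any two nodes still leaves at least one copy of every block, so the file stays readable while the missing replicas are re-created on the remaining nodes.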

HADOOP FACTS
OPEN SOURCE
BATCH / OFF-LINE ORIENTED
DATA AND I/O INTENSIVE (READ)
HADOOP IS NOT A RELATIONAL DATABASE
HADOOP IS NOT AN OLTP SYSTEM AND NOT A STRUCTURED DATA STORE OF ANY KIND

HADOOP STACK
HIVE: DATA WAREHOUSE PLATFORM ON HADOOP
HBASE: TABLE STORAGE ON HADOOP
CASSANDRA: DATA STORE
ZOOKEEPER: a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services
FLUME, PIG, etc.

WHO'S USING HADOOP?
TWITTER: WHO TO FOLLOW
YAHOO: SEARCH ASSIST
LINKEDIN: PEOPLE YOU MAY KNOW
YOUTUBE: VIDEO SUGGESTIONS
FACEBOOK: FRIENDS YOU MAY KNOW AND ALMOST EVERYTHING
AMAZON, EBAY, GOOGLE

LEVERAGES TRADITIONAL AND NEW CAPABILITIES
TRADITIONAL: Relational Database Management System
NEW: Petabyte-Scale Services

Microsoft's approach to Big Data
Immersive insight, wherever you are: analyze Big Data with familiar tools; immersive insights from any data; simple JavaScript-based programming
Connecting with the world's data: share your data with the world via Azure Marketplace; enrich with social media data via Social Analytics; advanced analytics with Hadoop
Any data, any size, anywhere: simplicity and manageability of Windows brought to Hadoop; extended data warehousing with Hadoop; scale and elasticity of the cloud

MICROSOFT BIG DATA ANALYTICS
Sqoop-based Hadoop connectors for SQL Server that move data seamlessly between Hadoop and SQL Server or SQL Server Parallel Data Warehouse.
A new Hive ODBC Driver and an Excel Hive Add-in that let customers move data from Hive directly into Excel, or into Microsoft BI tools such as PowerPivot, for analysis.

Extending your Enterprise Data Warehouse with Hadoop
Benefits: Integration with Microsoft Enterprise Data Warehouses; Integration with enterprise BI solutions; Deeper insights from structured and unstructured data
Key Features: Microsoft SQL Server connector for Apache Hadoop with SQOOP (SQL to Hadoop); SQL Server Parallel Data Warehouse connector for Apache Hadoop with SQOOP

Delivering insights to everyone by enabling big data analysis with familiar end-user tools
Benefits: Interaction and analysis of unstructured data in Hadoop from Microsoft Excel
Key Features: Hive add-in for Excel

Unlocking new insights from all data with Microsoft BI tools
Benefits: Familiar BI tools with structured and unstructured data
Key Features: Hive ODBC Driver integrates Hadoop with SQL Server Analysis Services, PowerPivot, and Power View

Simplifying programming on Hadoop with JavaScript
Benefits: MapReduce programs in JavaScript; Simplified programming; Simplified deployment of MapReduce jobs
Key Features: New JavaScript libraries for Hadoop; Deploy JavaScript Hadoop jobs from a simple web browser

Providing choice of deployment options
Benefits: Elastic peta-scale analytics on Microsoft's cloud platform; Enterprise-class Big Data platform on-premises
Key Features: Hadoop-based service on Windows Azure platform; Hadoop-based distribution on Windows Server

Connecting Hadoop to the world via Windows Azure Marketplace
Benefits: Sharing of data and insights through Windows Azure Marketplace; Mashing up of internal and public data sets via Data Explorer
Key Features: Integration with Windows Azure Marketplace; Integration with third-party data and services

Simplicity and manageability of Windows to Hadoop
Benefits: Simplified management of Hadoop on Windows; Enterprise-class security; Easy setup on-premises and in the cloud
Key Features: Smart packaging of Hadoop on premises; Integration with Microsoft System Center; Integration with Windows Server Active Directory; Fast deployment of Hadoop on Azure

A holistic BIG DATA Solution from Microsoft spanning relational and non-relational Worlds
[Figure: solution layers]
INSIGHTS: SELF-SERVICE, COLLABORATIVE, MOBILE, REAL-TIME, OPERATIONAL, PREDICTIVE
DATA ENRICHMENT: DISCOVER AND RECOMMEND, TRANSFORM AND CLEAN, SHARE AND GOVERN
MARKETPLACE: External Data and Services
DATA MANAGEMENT: RELATIONAL, NON-RELATIONAL, MULTIDIMENSIONAL, STREAMING

Hadoop on Windows & Azure: Roadmap (timeline: CY H2 2011 to 2012)
INSIGHTS: Excel Integration Preview 2; Hive Add-in for Excel; PowerPivot Add-in for Excel; Power View for SharePoint
DATA ENRICHMENT: Hadoop Connectors; Azure Data Market; Hive ODBC Driver Preview 2; Azure Labs Data Explorer; Social Analytics; Data Hub (Private Data Market)
DATA MANAGEMENT: Hadoop on Azure Private CTP; Hadoop on Server Private TAP; Hadoop Core & Common; JavaScript Framework; Hadoop on Azure Preview 2; More capacity; Disaster Recovery for HDFS; Support for Mahout; Hadoop on Azure GA; Portal Integration & Billing; Azure SDK integration; Hadoop on Server GA; JavaScript, Pig, Hive, HBase; Active Directory Integration; System Center Integration

Resources:
http://www.microsoft.com/bigdata
http://hadoop.apache.org
http://www.cloudera.com
http://www.youtube.com
https://www.hadooponazure.com/