BIG DATA TRENDS AND TECHNOLOGIES
THE WORLD OF DATA IS CHANGING Cloud
WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analytics, and visualization. We are entering an era in which sensors collect data in our physical world and deliver it to networks that aggregate and analyze the information. Big data defines us and will increasingly dictate how we live in a fully interconnected world.
WHAT IS BIG DATA? Forrester's Brian Hopkins describes big data as techniques and technologies that make handling data at extreme scale economical.
DIGITAL DATA AGE. BEFORE 1990: THE PHOTOGRAPHS AND VIDEO TAKEN OVER AN ENTIRE LIFETIME BY ONE PROFESSIONAL OCCUPY AROUND 10 GIGABYTES. 2010: THE PHOTOGRAPHS AND VIDEO TAKEN IN ONE YEAR BY AN ORDINARY PERSON TAKE UP ABOUT 5 GB.
WHERE DATA COME FROM: PHOTOGRAPHS, VIDEO, MACHINE LOGS, RFID READERS, VEHICLE GPS TRACES, RETAIL TRANSACTIONS, FINANCIAL TRANSACTIONS
DIGITAL DATA FACTS AND FIGURES
STORAGE CAPACITY AND TRANSFER RATE. 1990: 1,000 MB WITH A 4.4 MB/S TRANSFER RATE. It takes approximately 227 s, or about 4 minutes, to read the whole disk. 2011: 1,000 GB WITH A 135 MB/S TRANSFER RATE. It takes approximately 7,600 s, or about 2 hours, to read the whole disk.
MORE OXEN BETTER THAN ONE BIGGER OX? READING 2 TERABYTES FROM 1 DISK TAKES ABOUT 4 HOURS. WHAT ABOUT READING THE SAME 2 TB FROM 1,000 DISKS IN PARALLEL? IT TAKES ABOUT 15 SECONDS.
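The single-disk versus parallel-read comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 135 MB/s transfer rate quoted for the 2011 disk, decimal units (1 TB = 1,000,000 MB), data striped evenly across the disks, and no coordination or network overhead. The function name read_time_seconds is hypothetical.

```python
MB_PER_TB = 1_000_000   # assumption: decimal units
RATE_MB_PER_S = 135.0   # the 2011 transfer rate quoted in the slides


def read_time_seconds(total_mb, rate_mb_per_s, disks=1):
    """Seconds needed to read total_mb striped evenly over `disks` drives,
    each read in parallel at rate_mb_per_s (idealized, no overhead)."""
    if disks < 1 or rate_mb_per_s <= 0:
        raise ValueError("disks must be >= 1 and rate must be positive")
    return (total_mb / disks) / rate_mb_per_s


single = read_time_seconds(2 * MB_PER_TB, RATE_MB_PER_S)
parallel = read_time_seconds(2 * MB_PER_TB, RATE_MB_PER_S, disks=1000)

print(f"2 TB from 1 disk:      {single / 3600:.1f} hours")   # 4.1 hours
print(f"2 TB from 1,000 disks: {parallel:.1f} seconds")      # 14.8 seconds
```

Under these assumptions each of the 1,000 disks only has to deliver 2 GB, which is where the roughly 15 seconds comes from. Real clusters see less than this ideal speedup.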
HADOOP. Inspired by Google's GOOGLE FILE SYSTEM (GFS) and MAPREDUCE papers. Created by Doug Cutting circa 2004. Hadoop Distributed File System (HDFS): reliable data storage. DATA IS DISTRIBUTED AND REPLICATED OVER MULTIPLE MACHINES. DESIGNED FOR LARGE FILES (TB, PB, OR LARGER). MapReduce: high-performance parallel data processing.
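To make "distributed and replicated over multiple machines" concrete, here is a toy sketch of how a large file might be cut into fixed-size blocks with several copies of each block placed on different nodes. It is not HDFS code: the 128 MB block size and replication factor of 3 are assumed values, the node names are made up, and the round-robin placement is a deliberate simplification (real HDFS placement also takes racks into account).

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size for this sketch
REPLICATION = 3      # assumed number of copies per block


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into blocks and map each block id to the list of nodes
    holding a copy. Hypothetical round-robin policy, for illustration only."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(num_blocks):
        # Start at a different node for each block and take `replication`
        # consecutive nodes, so the copies of one block never share a node.
        placement[block_id] = [
            nodes[(block_id + offset) % len(nodes)]
            for offset in range(replication)
        ]
    return placement


nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_blocks(1024, nodes)  # a 1 GB file becomes 8 blocks
for block_id, replicas in placement.items():
    print(block_id, replicas)
```

Because every block lives on three different machines, losing any single node still leaves two copies of each block it held.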
HADOOP DISTRIBUTED FILE SYSTEM
MAP/REDUCE ADVANTAGES. SCALABLE: Automatically parallelizes map and reduce operations, supporting 1,000s of processors and petabytes of data. FAULT TOLERANT: Data is replicated in HDFS; failed tasks are automatically restarted without losing the rest of the job. ELASTIC AND FLEXIBLE: The degree of parallelism can be determined at runtime; flexible data model and programming. AFFORDABLE AND EASY TO USE: Open source and designed to work on commodity hardware; only two routines to write, Map and Reduce.
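The "two routines" idea can be illustrated with the classic word count, written here as a single-process Python sketch rather than with the Hadoop API. The function names (map_words, shuffle, reduce_counts, word_count) are hypothetical. In a real cluster the framework runs many map and reduce calls in parallel on different machines and performs the shuffle step itself.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    # Each line could be mapped on a different machine; here it is a loop.
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


lines = ["big data big clusters", "data moves to compute", "big ideas"]
result = word_count(lines)
print(result["big"], result["data"])  # 3 2
```

Because map_words looks at one line at a time and reduce_counts looks at one key at a time, neither routine needs to know how many machines are involved. That independence is what lets the framework parallelize them automatically.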
HADOOP ARCHITECTURE
HADOOP ADVANTAGES. DISTRIBUTED: DATA IS REPLICATED AND PROCESSED ACROSS THE CLUSTER. FAULT TOLERANT: KEEPS WORKING WHEN NODES FAIL. SELF-HEALING: REBALANCES FILES ACROSS THE CLUSTER. SCALABLE: JUST ADD NEW NODES.
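The fault-tolerance and self-healing points can be sketched as a small simulation: when a node fails, every block it held is copied again onto a surviving node until each block is back at its target number of replicas. This is a toy model, not Hadoop code. The function name heal, the node names, and the "least loaded node first" policy are all assumptions made for illustration.

```python
from collections import Counter


def heal(placement, failed_node, nodes, replication=3):
    """Return a new block-to-nodes mapping with the failed node removed and
    each block re-replicated onto the least-loaded surviving nodes."""
    live = [n for n in nodes if n != failed_node]
    # Drop the lost replicas; new lists are built so the input is not mutated.
    healed = {
        block: [n for n in replicas if n != failed_node]
        for block, replicas in placement.items()
    }
    # Count how many blocks each surviving node already stores.
    load = Counter(n for replicas in healed.values() for n in replicas)
    target = min(replication, len(live))
    for block, replicas in healed.items():
        if not replicas:
            raise RuntimeError(f"block {block} has no surviving replica")
        while len(replicas) < target:
            candidates = [n for n in live if n not in replicas]
            chosen = min(candidates, key=lambda n: load[n])
            replicas.append(chosen)
            load[chosen] += 1
    return healed


nodes = ["node1", "node2", "node3", "node4"]
placement = {
    0: ["node1", "node2", "node3"],
    1: ["node2", "node3", "node4"],
    2: ["node3", "node4", "node1"],
}
healed = heal(placement, "node3", nodes)
for block, replicas in healed.items():
    print(block, replicas)
```

Adding a node to the list and re-running a placement step is the same mechanism seen from the other side, which is why scaling out and recovering from failure look alike in this kind of design.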
HADOOP FACTS. OPEN SOURCE. BATCH / OFF-LINE ORIENTED. DATA AND I/O INTENSIVE (READ). HADOOP IS NOT A RELATIONAL DATABASE. HADOOP IS NOT AN OLTP SYSTEM AND NOT A STRUCTURED DATA STORE OF ANY KIND.
HADOOP STACK. HIVE: DATA WAREHOUSE PLATFORM ON HADOOP. HBASE: TABLE STORAGE ON HADOOP. CASSANDRA: DATA STORE. ZOOKEEPER: a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. FLUME, PIG, etc.
WHO'S USING HADOOP? TWITTER: WHO TO FOLLOW. YAHOO: SEARCH ASSIST. LINKEDIN: PEOPLE YOU MAY KNOW. YOUTUBE: VIDEO SUGGESTIONS. FACEBOOK: FRIENDS YOU MAY KNOW AND ALMOST EVERYTHING. ALSO AMAZON, EBAY, GOOGLE.
LEVERAGES TRADITIONAL AND NEW CAPABILITIES. TRADITIONAL: Relational Database Management System. NEW: Petabyte-Scale Services.
Microsoft's approach to Big Data. Immersive Insight, Wherever You Are: Analyze Big Data with familiar tools; Immersive insights from any data; JavaScript-based simple programming. Connecting with the World's Data: Share your data with the world via Azure Marketplace; Enrich with social media data via Social Analytics; Advanced analytics with Hadoop. Any Data, Any Size, Anywhere: Simplicity and manageability of Windows to Hadoop; Extended data warehousing with Hadoop; Scale and elasticity of cloud.
MICROSOFT BIG DATA ANALYTICS. Hadoop connectors for SQL Server, based on SQOOP, that move data seamlessly between Hadoop and SQL Server or SQL Server Parallel Data Warehouse. A new Hive ODBC Driver and an Excel Hive Add-in that enable customers to move data from Hive directly into Excel, or into Microsoft BI tools such as PowerPivot, for analysis.
Extending your Enterprise Data Warehouse with Hadoop. Benefits: Integration with Microsoft Enterprise Data Warehouses; Integration with enterprise BI solutions; Deeper insights from structured and unstructured data. Key Features: Microsoft SQL Server connector for Apache Hadoop with SQOOP (SQL to Hadoop); SQL Server Parallel Data Warehouse connector for Apache Hadoop with SQOOP.
Delivering insights to everyone by enabling big data analysis with familiar end-user tools. Benefits: Interaction and analysis of unstructured data in Hadoop from Microsoft Excel. Key Features: Hive add-in for Excel.
Unlocking new insights from all data with Microsoft BI tools. Benefits: Familiar BI tools with structured and unstructured data. Key Features: Hive ODBC Driver integrates Hadoop with SQL Server Analysis Services, PowerPivot, and Power View.
Simplifying programming on Hadoop with JavaScript. Benefits: MapReduce programs in JavaScript; Simplified programming; Simplified deployment of MapReduce jobs. Key Features: New JavaScript libraries for Hadoop; Deploy JavaScript Hadoop jobs from a simple web browser.
Providing a choice of deployment options. Benefits: Elastic peta-scale analytics on Microsoft's cloud platform; Enterprise-class Big Data platform on-premises. Key Features: Hadoop-based service on the Windows Azure platform; Hadoop-based distribution on Windows Server.
Connecting Hadoop to the world via Windows Azure Marketplace. Benefits: Sharing of data and insights through Windows Azure Marketplace; Mashing up of internal and public data sets via Data Explorer. Key Features: Integration with Windows Azure Marketplace; Integration with third-party data and services.
Simplicity and manageability of Windows to Hadoop. Benefits: Simplified management of Hadoop on Windows; Enterprise-class security; Easy setup on-premises and in the cloud. Key Features: Smart packaging of Hadoop on-premises; Integration with Microsoft System Center; Integration with Windows Server Active Directory; Fast deployment of Hadoop on Azure.
A holistic BIG DATA solution from Microsoft spanning relational and non-relational worlds. INSIGHTS: SELF-SERVICE, OPERATIONAL, PREDICTIVE, MOBILE, REAL-TIME, COLLABORATIVE. DATA ENRICHMENT: DISCOVER AND RECOMMEND, TRANSFORM AND CLEAN, SHARE AND GOVERN, MARKETPLACE (External Data and Services). DATA MANAGEMENT: RELATIONAL, NON-RELATIONAL, MULTIDIMENSIONAL, STREAMING.
Hadoop on Windows & Azure: Roadmap (CY H2 2011 to 2012). INSIGHTS: Excel Integration Preview 2 Hive Add-in for Excel PowerPivot Add-in for Excel Power View for SharePoint. DATA ENRICHMENT: Hadoop Connectors Azure Data Market Hive ODBC Driver Preview 2 Azure Labs Data Explorer Social Analytics Data Hub (Private Data Market). DATA MANAGEMENT: Hadoop on Azure Private CTP Hadoop on Server Private TAP Hadoop Core & Common JavaScript Framework Hadoop on Azure Preview 2 More capacity Disaster Recovery for HDFS Support for Mahout Hadoop on Azure GA Portal Integration & Billing Azure SDK integration Hadoop on Server GA JavaScript, Pig, Hive, HBase Active Directory Integration System Center Integration.
Resources: http://www.microsoft.com/bigdata http://hadoop.apache.org http://www.cloudera.com http://www.youtube.com https://www.hadooponazure.com/