Big Data on Microsoft Platform

Size: px
Start display at page:

Download "Big Data on Microsoft Platform"

Transcription

1 Big Data on Microsoft Platform Prepared by GJ Srinivas Corporate TEG - Microsoft Page 1

2 Contents 1. What is Big Data? Characteristics of Big Data Enter Hadoop Microsoft Big Data Solutions Microsoft Big Data Solutions for Hadoop Hadoop Data ImportExport on Microsoft Platform Conclusion...6 Page 2

3 What is Big Data? With the time Data is just growing and the growth is unprecedented. There were times when enterprise level organizations used to deal with few GB s of data which itself was huge and challenging to manage. Now the time has changed where needs of managing data has moved from Terabytes to Petabytes. Big data is data which is Big in size to manage. It s a word applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage and process the data within certain elapsed time. Big data is in some cases new and in some cases it is been there already. It is all about and around data explosion, large volumes but it also has other facets. When we are talking about Big data, it is about structured and unstructured data. In many cases it is unstructured or semi structured like for example weblogs, data coming from machines, plants or energy censers RFID, social data, call detail records, astronomy, atmospheric science, genomics, biogeochemical, biological, medical records, photography or video archives..etc., There are those types of data on which people want to analyze and make decisions. Gartner says on the world of data "By 2012 organization that build a modern information management system will outperform their peers financially by 20%" Characteristics of Big Data There are multiple facets of Big Data which are typically called as 4V s of Big data. These typically characterize Big Data. Volume: Exceeds Physical limits of vertical scalability Velocity: Decision window small compared to data change rate Variety: Many different formats make integration expensive Variability: It s not same as Variety. Many Options or variable interpretations confound analysis. A typical Big Data patterns are where it starts with holding data in a Digital box like storage where all potentially valuable data retained then analyzed later. Once data stored then mine the data for insight and feed to the downstream systems. Use this Information production as a factory for consumable feeds and discover, enrich and publish the data. Finally optimize the loop by analyzing ambient data, monitor the system and optimize the behavior. Enter Hadoop Big data is based on Hadoop. This is the most visible face of Big data. Hadoop is a technology developed in open source Apache community which enables analysis on semi structured and structured data where data is distributed across commodity servers with in a cluster. Page 3

4 Two major observations of Hadoop are Storage and processing Storage (Hadoop Distributed File System layer) HDFS is all about storage. It is basically Infrastructure which hadoop implemented to store the data across multiple nodes. HDFS also takes care of application scenarios to make it resilience to server or node failures or job failures. Processing (Map reduce layer) Every query in the Hadoop system is a job and this job creates a bunch of tasks. These tasks will process the data, shuffle the results, reduce them, aggregate them and then bring the results back to the client applications. Map Reduce layer consists of Job tracker and task tracker which works in tandem with HDFS. Beauty of hadoop is it distributes the data and calculates it locally then brings the results back. Hadoop = HDFS + MapReduce Microsoft Big Data Solutions Microsoft provided a range of solutions to address big data challenges. Data Warehouse solutions from SQL Server 2008 R2/2012, Fast Track Data Warehouse, Business Data Warehouse, Parallal Data Warehouse offers a robust and scalable platform for storing and analyzing data in a traditional data warehouse. Among these all PDW offers customers: Enterprise-class performance that handles massive volumes to over 600 TB. Microsoft provide LINQ to HPC which enables the development and deployment of data intensive applications using technologies such as.net and Language Integrated Query (LINQ), to express their data analytics algorithms. LINQ to HPC can integrate with SQL Server 2008 R2 and the portfolio of BI offerings from Microsoft such as SSRS, SSAS, Power Pivot, and Excel. Microsoft Big Data Solutions for Hadoop Hadoop is not a Microsoft solution. In order to work with Apache Hadoop, Microsoft has come up with three Hadoop Deployment models on Windows Platform. 1. On premise 2. On Cloud 3. Elastic MapReduce Page 4

5 Figure 1: Microsoft Big Data Solution On premise distribution: This is a dedicated on premise cluster. If you want to run Hadoop on a standalone or distributed windows servers located in your own infrastructure, then Hadoop services for on premise distribution provides such option. This distribution has great things like integration with your active directory, System center that makes it enterprise manageable setup. On cloud distribution: This is a Dedicated Cloud Cluster. Hadoop services would be hosted on Cloud and will not go away. This distribution provides some elasticity based on how you setup storage and partitioning. Basically this is all about running Hadoop on azure. Hadoop based services for windows azure are available through invitation. More on this can be found at Elastic MapReduce (EMR): Setting up Apache Hadoop in your infrastructure is not simple. EMR makes this easy and also Elastic MapReduce model is ultimate elastic. If you want to get the hadoop cluster up and running very fast, run the calculations, pull back the results then do not care anymore. It s a simple management deployment process. It is all self-managed on the azure but you will be managing actual processing. Hadoop Data ImportExport on Microsoft Platform Data can be imported or exported from Hadoop storage system in various ways from Microsoft Platform. For proceeding with Data transfer, FTPS and ODBC server ports must be opened where FTPS Page 5

6 port 2226 is required if want to remote desktop and work on the Hadoop system. ODBC Server Port is for connecting from end user tools like Power view, Power Pivot and Excel. Once the ports are open, through various interfaces data can be brought in to the cluster. For example from Manage cluster interface, data can be bought in from windows azure blog storage and Data markets. Couples of other ways are JavaScript console and SQOOP connectors. Interacting with Hadoop Traditional way to interact with Hadoop was to write your own jobs written in JAVA. If you are not interested in going to the level of understanding how the MapReduce job been executed then you can use javascript. JavaScript is an interactive development environment along with charting and graphing. HIVE and PIG interfaces would help in interacting with Hadoop. SQOOP connectors would help in transferring data between relational and Hadoop storage systems. HIVE interface would interest to the Data professionals because it uses a language called HIVQL. We can create structures over the data where will create Meta data layer. For example Weblogs would structure in the table format with which you can interact with common tools like Excel, Power View or Power Pivot. HIVE in other words is a Data warehouse system for hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop compatible file systems. More details can be found on Hive project, examples, documentation and use cases at PIG is another data flow language but more like a scripting language to transform and analyze large data sets on Hadoop. It is an abstraction layer to interact with Hadoop. Here you write scripts that get converted to MapReduce Jobs. Those who are familiar with SQL love and embrace HIVE first. Those who are familiar with scripting language would love and embrace PIG first. PIG is a data-flow platform to transform and analyze large data sets on Hadoop. It can be extensible through user defined functions and methods. More can be learned from You can use other languages like Perl to get the data and work on. SQOOP helps in pulling data from SQL into Hadoop or pull data from Hadoop into SQL. Every SQOOP connector is bi-directional. It allows executing MapReduce jobs to transfer data in parallel with fault tolerance in the HDFS. These connectors can be downloaded from Conclusion Three flavored Microsoft Hadoop distributions full fill the various analyses and decision support needs on Unstructured and semi structured data. Integration of Semi and unstructured data with the RDBMS through SQOOP connecters extend the analysis scope. With the various connectors for Microsoft SQL server, Parallel Data warehousing along with end user ODBC connectors, Microsoft made Hadoop accessible to a broader class of end users, developers and IT professionals. Page 6

7 References Figure1 source: F8BC1B62D397/MicrosoftBigDataSolutionSheet.pdf Page 7

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

Microsoft Big Data. Solution Brief

Microsoft Big Data. Solution Brief Microsoft Big Data Solution Brief Contents Introduction... 2 The Microsoft Big Data Solution... 3 Key Benefits... 3 Immersive Insight, Wherever You Are... 3 Connecting with the World s Data... 3 Any Data,

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Introduction to Big Data and the Lambda Architecture

Introduction to Big Data and the Lambda Architecture Introduction to Big Data and the Lambda Architecture Marc Schöni Meinrad Weiss April 2014 BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 1 What

More information

The Inside Scoop on Hadoop

The Inside Scoop on Hadoop The Inside Scoop on Hadoop Orion Gebremedhin National Solutions Director BI & Big Data, Neudesic LLC. VTSP Microsoft Corp. Orion.Gebremedhin@Neudesic.COM B-orgebr@Microsoft.com @OrionGM The Inside Scoop

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal BIG DATA AND MICROSOFT Susie Adams CTO Microsoft Federal THE WORLD OF DATA IS CHANGING Cloud What s making this possible? Electrical efficiency of computers doubles every year and ½. Laptops and mobile

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Microsoft SQL Server 2012 with Hadoop

Microsoft SQL Server 2012 with Hadoop Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the

More information

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop

Lecture 32 Big Data. 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop Lecture 32 Big Data 1. Big Data problem 2. Why the excitement about big data 3. What is MapReduce 4. What is Hadoop 5. Get started with Hadoop 1 2 Big Data Problems Data explosion Data from users on social

More information

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP

TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP Pythian White Paper TAMING THE BIG CHALLENGE OF BIG DATA MICROSOFT HADOOP ABSTRACT As companies increasingly rely on big data to steer decisions, they also find themselves looking for ways to simplify

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Modernizing Your Data Warehouse for Hadoop

Modernizing Your Data Warehouse for Hadoop Modernizing Your Data Warehouse for Hadoop Big data. Small data. All data. Audie Wright, DW & Big Data Specialist Audie.Wright@Microsoft.com O 425-538-0044, C 303-324-2860 Unlock Insights on Any Data Taking

More information

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov

An Industrial Perspective on the Hadoop Ecosystem. Eldar Khalilov Pavel Valov An Industrial Perspective on the Hadoop Ecosystem Eldar Khalilov Pavel Valov agenda 03.12.2015 2 agenda Introduction 03.12.2015 2 agenda Introduction Research goals 03.12.2015 2 agenda Introduction Research

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Please give me your feedback

Please give me your feedback Please give me your feedback Session BB4089 Speaker Claude Lorenson, Ph. D and Wendy Harms Use the mobile app to complete a session survey 1. Access My schedule 2. Click on this session 3. Go to Rate &

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Bringing Big Data to People

Bringing Big Data to People Bringing Big Data to People Microsoft s modern data platform SQL Server 2014 Analytics Platform System Microsoft Azure HDInsight Data Platform Everyone should have access to the data they need. Process

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM David Chappell SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM A PERSPECTIVE FOR SYSTEMS INTEGRATORS Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Business

More information

BIG DATA IS MESSY PARTNER WITH SCALABLE

BIG DATA IS MESSY PARTNER WITH SCALABLE BIG DATA IS MESSY PARTNER WITH SCALABLE SCALABLE SYSTEMS HADOOP SOLUTION WHAT IS BIG DATA? Each day human beings create 2.5 quintillion bytes of data. In the last two years alone over 90% of the data on

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved

Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Perspective Big Data Framework for Healthcare using Hadoop

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software?

Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? Hadoop 只 支 援 用 Java 開 發 嘛? Is Hadoop only support Java? 總 不 能 全 部 都 重 新 設 計 吧? 如 何 與 舊 系 統 相 容? Can Hadoop work with existing software? 可 以 跟 資 料 庫 結 合 嘛? Can Hadoop work with Databases? 開 發 者 們 有 聽 到

More information

Hadoop on Windows Azure: Hive vs. JavaScript for Processing Big Data

Hadoop on Windows Azure: Hive vs. JavaScript for Processing Big Data Hive vs. JavaScript for Processing Big Data For some time Microsoft didn t offer a solution for processing big data in cloud environments. SQL Server is good for storage, but its ability to analyze terabytes

More information

Information Architecture

Information Architecture The Bloor Group Actian and The Big Data Information Architecture WHITE PAPER The Actian Big Data Information Architecture Actian and The Big Data Information Architecture Originally founded in 2005 to

More information

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved.

Collaborative Big Data Analytics. Copyright 2012 EMC Corporation. All rights reserved. Collaborative Big Data Analytics 1 Big Data Is Less About Size, And More About Freedom TechCrunch!!!!!!!!! Total data: bigger than big data 451 Group Findings: Big Data Is More Extreme Than Volume Gartner!!!!!!!!!!!!!!!

More information

Testing 3Vs (Volume, Variety and Velocity) of Big Data

Testing 3Vs (Volume, Variety and Velocity) of Big Data Testing 3Vs (Volume, Variety and Velocity) of Big Data 1 A lot happens in the Digital World in 60 seconds 2 What is Big Data Big Data refers to data sets whose size is beyond the ability of commonly used

More information

Using Tableau Software with Hortonworks Data Platform

Using Tableau Software with Hortonworks Data Platform Using Tableau Software with Hortonworks Data Platform September 2013 2013 Hortonworks Inc. http:// Modern businesses need to manage vast amounts of data, and in many cases they have accumulated this data

More information

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal

Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal Business Analytics In a Big Data World Ted Malone Solutions Architect Data Platform and Cloud Microsoft Federal Information has gone from scarce to super-abundant. That brings huge new benefits. The Economist

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi

Getting Started with Hadoop. Raanan Dagan Paul Tibaldi Getting Started with Hadoop Raanan Dagan Paul Tibaldi What is Apache Hadoop? Hadoop is a platform for data storage and processing that is Scalable Fault tolerant Open source CORE HADOOP COMPONENTS Hadoop

More information

Whitepaper: Solution Overview - Breakthrough Insight. Published: March 7, 2012. Applies to: Microsoft SQL Server 2012. Summary:

Whitepaper: Solution Overview - Breakthrough Insight. Published: March 7, 2012. Applies to: Microsoft SQL Server 2012. Summary: Whitepaper: Solution Overview - Breakthrough Insight Published: March 7, 2012 Applies to: Microsoft SQL Server 2012 Summary: Today s Business Intelligence (BI) platform must adapt to a whole new scope,

More information

Testing Big data is one of the biggest

Testing Big data is one of the biggest Infosys Labs Briefings VOL 11 NO 1 2013 Big Data: Testing Approach to Overcome Quality Challenges By Mahesh Gudipati, Shanthi Rao, Naju D. Mohan and Naveen Kumar Gajja Validate data quality by employing

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015

Step by Step: Big Data Technology. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Step by Step: Big Data Technology Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 25 August 2015 Data Sources IT Infrastructure Analytics 2 B y 2015, 20% of Global 1000 organizations

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

Big Data Course Highlights

Big Data Course Highlights Big Data Course Highlights The Big Data course will start with the basics of Linux which are required to get started with Big Data and then slowly progress from some of the basics of Hadoop/Big Data (like

More information

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics

Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Big Data Open Source Stack vs. Traditional Stack for BI and Analytics Part I By Sam Poozhikala, Vice President Customer Solutions at StratApps Inc. 4/4/2014 You may contact Sam Poozhikala at spoozhikala@stratapps.com.

More information

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12

Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12 Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Navigating the Big Data infrastructure layer Helena Schwenk

Navigating the Big Data infrastructure layer Helena Schwenk mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining

More information

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

An Oracle White Paper June 2012. High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database An Oracle White Paper June 2012 High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database Executive Overview... 1 Introduction... 1 Oracle Loader for Hadoop... 2 Oracle Direct

More information

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics

An Oracle White Paper November 2010. Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics An Oracle White Paper November 2010 Leveraging Massively Parallel Processing in an Oracle Environment for Big Data Analytics 1 Introduction New applications such as web searches, recommendation engines,

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse

SQL Server 2012 PDW. Ryan Simpson Technical Solution Professional PDW Microsoft. Microsoft SQL Server 2012 Parallel Data Warehouse SQL Server 2012 PDW Ryan Simpson Technical Solution Professional PDW Microsoft Microsoft SQL Server 2012 Parallel Data Warehouse Massively Parallel Processing Platform Delivers Big Data HDFS Delivers Scale

More information

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh

Hadoop: A Framework for Data- Intensive Distributed Computing. CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 1 Hadoop: A Framework for Data- Intensive Distributed Computing CS561-Spring 2012 WPI, Mohamed Y. Eltabakh 2 What is Hadoop? Hadoop is a software framework for distributed processing of large datasets

More information

Big Data for the Rest of Us Technical White Paper

Big Data for the Rest of Us Technical White Paper Big Data for the Rest of Us Technical White Paper Treasure Data - Big Data for the Rest of Us 1 Introduction The importance of data warehousing and analytics has increased as companies seek to gain competitive

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

SQL Server 2014. What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.

SQL Server 2014. What s New? Christopher Speer. Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft. SQL Server 2014 What s New? Christopher Speer Technology Solution Specialist (SQL Server, BizTalk Server, Power BI, Azure) v-cspeer@microsoft.com The evolution of the Microsoft data platform What s New

More information

BIG DATA What it is and how to use?

BIG DATA What it is and how to use? BIG DATA What it is and how to use? Lauri Ilison, PhD Data Scientist 21.11.2014 Big Data definition? There is no clear definition for BIG DATA BIG DATA is more of a concept than precise term 1 21.11.14

More information

Implement Hadoop jobs to extract business value from large and varied data sets

Implement Hadoop jobs to extract business value from large and varied data sets Hadoop Development for Big Data Solutions: Hands-On You Will Learn How To: Implement Hadoop jobs to extract business value from large and varied data sets Write, customize and deploy MapReduce jobs to

More information

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science

Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science A Seminar report On Hadoop Submitted in partial fulfillment of the requirement for the award of degree of Bachelor of Technology in Computer Science SUBMITTED TO: www.studymafia.org SUBMITTED BY: www.studymafia.org

More information

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com;

Microsoft Big Data Solutions. Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Microsoft Big Data Solutions Anar Taghiyev P-TSP E-mail: b-anarta@microsoft.com; Why/What is Big Data and Why Microsoft? Options of storage and big data processing in Microsoft Azure. Real Impact of Big

More information

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap

Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau

Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau Big Data Analytics on Cab Company s Customer Dataset Using Hive and Tableau Dipesh Bhawnani 1, Ashish Sanwlani 2, Haresh Ahuja 3, Dimple Bohra 4 1,2,3,4 Vivekanand Education, India Abstract Project focuses

More information

Understanding Microsoft s BI Tools

Understanding Microsoft s BI Tools Understanding Microsoft s BI Tools The purpose of this document is to provide a high level understanding of what tools Microsoft has to support the concepts of data warehousing, business intelligence,

More information

Modern Data Warehousing

Modern Data Warehousing Modern Data Warehousing Cem Kubilay Microsoft CEE, Turkey & Israel Time is FY15 Gartner Survey April 2014 Piloting on premise 15% 10% 4% 14% 57% 2014 5% think Hadoop will replace existing DW solution (2013:

More information

Advanced Big Data Analytics with R and Hadoop

Advanced Big Data Analytics with R and Hadoop REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze

Data Warehouse and Hive. Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Data Warehouse and Hive Presented By: Shalva Gelenidze Supervisor: Nodar Momtselidze Decision support systems Decision Support Systems allowed managers, supervisors, and executives to once again see the

More information

Big Data for Investment Research Management

Big Data for Investment Research Management IDT Partners www.idtpartners.com Big Data for Investment Research Management Discover how IDT Partners helps Financial Services, Market Research, and Investment Management firms turn big data into actionable

More information

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN

Hadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current

More information

How Companies are! Using Spark

How Companies are! Using Spark How Companies are! Using Spark And where the Edge in Big Data will be Matei Zaharia History Decreasing storage costs have led to an explosion of big data Commodity cluster software, like Hadoop, has made

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014

Big Data. White Paper. Big Data Executive Overview WP-BD-10312014-01. Jafar Shunnar & Dan Raver. Page 1 Last Updated 11-10-2014 White Paper Big Data Executive Overview WP-BD-10312014-01 By Jafar Shunnar & Dan Raver Page 1 Last Updated 11-10-2014 Table of Contents Section 01 Big Data Facts Page 3-4 Section 02 What is Big Data? Page

More information

Big Data and Industrial Internet

Big Data and Industrial Internet Big Data and Industrial Internet Keijo Heljanko Department of Computer Science and Helsinki Institute for Information Technology HIIT School of Science, Aalto University keijo.heljanko@aalto.fi 16.6-2015

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

Are You Ready for Big Data?

Are You Ready for Big Data? Are You Ready for Big Data? Jim Gallo National Director, Business Analytics April 10, 2013 Agenda What is Big Data? How do you leverage Big Data in your company? How do you prepare for a Big Data initiative?

More information

Virtualizing Apache Hadoop. June, 2012

Virtualizing Apache Hadoop. June, 2012 June, 2012 Table of Contents EXECUTIVE SUMMARY... 3 INTRODUCTION... 3 VIRTUALIZING APACHE HADOOP... 4 INTRODUCTION TO VSPHERE TM... 4 USE CASES AND ADVANTAGES OF VIRTUALIZING HADOOP... 4 MYTHS ABOUT RUNNING

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges

Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Managing Cloud Server with Big Data for Small, Medium Enterprises: Issues and Challenges Prerita Gupta Research Scholar, DAV College, Chandigarh Dr. Harmunish Taneja Department of Computer Science and

More information

Big Data: Overview and Roadmap. 2015 eglobaltech. All rights reserved.

Big Data: Overview and Roadmap. 2015 eglobaltech. All rights reserved. Big Data: Overview and Roadmap 2015 eglobaltech. All rights reserved. What is Big Data? Large volumes of complex and variable data that require advanced techniques and technologies to enable capture, storage,

More information

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look

IBM BigInsights Has Potential If It Lives Up To Its Promise. InfoSphere BigInsights A Closer Look IBM BigInsights Has Potential If It Lives Up To Its Promise By Prakash Sukumar, Principal Consultant at iolap, Inc. IBM released Hadoop-based InfoSphere BigInsights in May 2013. There are already Hadoop-based

More information

Data Governance in the Hadoop Data Lake. Michael Lang May 2015

Data Governance in the Hadoop Data Lake. Michael Lang May 2015 Data Governance in the Hadoop Data Lake Michael Lang May 2015 Introduction Product Manager for Teradata Loom Joined Teradata as part of acquisition of Revelytix, original developer of Loom VP of Sales

More information

Course MS20467C Designing Self-Service Business Intelligence and Big Data Solutions

Course MS20467C Designing Self-Service Business Intelligence and Big Data Solutions 3 Riverchase Office Plaza Hoover, Alabama 35244 Phone: 205.989.4944 Fax: 855.317.2187 E-Mail: rwhitney@discoveritt.com Web: www.discoveritt.com Course MS20467C Designing Self-Service Business Intelligence

More information

A Brief Introduction to Apache Tez

A Brief Introduction to Apache Tez A Brief Introduction to Apache Tez Introduction It is a fact that data is basically the new currency of the modern business world. Companies that effectively maximize the value of their data (extract value

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

SQLSaturday #399 Sacramento 25 July, 2015. Big Data Analytics with Excel

SQLSaturday #399 Sacramento 25 July, 2015. Big Data Analytics with Excel SQLSaturday #399 Sacramento 25 July, 2015 Big Data Analytics with Excel Presenter Introduction Peter Myers Independent BI Expert Bitwise Solutions BBus, SQL Server MCSE, SQL Server MVP since 2007 Experienced

More information

What is a Petabyte? Gain Big or Lose Big; Measuring the Operational Risks of Big Data. Agenda

What is a Petabyte? Gain Big or Lose Big; Measuring the Operational Risks of Big Data. Agenda April - April - Gain Big or Lose Big; Measuring the Operational Risks of Big Data YouTube video here http://www.youtube.com/watch?v=o7uzbcwstu April, 0 Steve Woolley, Sr. Manager Business Continuity Dennis

More information

Ubuntu and Hadoop: the perfect match

Ubuntu and Hadoop: the perfect match WHITE PAPER Ubuntu and Hadoop: the perfect match February 2012 Copyright Canonical 2012 www.canonical.com Executive introduction In many fields of IT, there are always stand-out technologies. This is definitely

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY

DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Big Data Analytics DAMA NY DAMA Day October 17, 2013 IBM 590 Madison Avenue 12th floor New York, NY Tom Haughey InfoModel, LLC 868 Woodfield Road Franklin Lakes, NJ 07417 201 755 3350 tom.haughey@infomodelusa.com

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

Course 20467: Designing Self-Service Business Intelligence and Big Data Solutions

Course 20467: Designing Self-Service Business Intelligence and Big Data Solutions Course 20467: Designing Self-Service Business Intelligence and Big Data Solutions Type:Course Audience(s):IT Professionals Technology:Microsoft SQL Server Level:300 This Revision:C Delivery method: Instructor-led

More information

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo

Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Sensor Network Messaging Service Hive/Hadoop Mr. Apichon Witayangkurn apichon@iis.u-tokyo.ac.jp Department of Civil Engineering The University of Tokyo Contents 1 Introduction 2 What & Why Sensor Network

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Big Data Analytics OverOnline Transactional Data Set

Big Data Analytics OverOnline Transactional Data Set Big Data Analytics OverOnline Transactional Data Set Rohit Vaswani 1, Rahul Vaswani 2, Manish Shahani 3, Lifna Jos(Mentor) 4 1 B.E. Computer Engg. VES Institute of Technology, Mumbai -400074, Maharashtra,

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information