Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

Save this PDF as:
 WORD  PNG  TXT  JPG

Size: px
Start display at page:

Download "Big Data Buzzwords From A to Z. By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012"

Transcription

1 Big Data Buzzwords From A to Z By Rick Whiting, CRN 4:00 PM ET Wed. Nov. 28, 2012

2 Big Data Buzzwords Big data is one of the, well, biggest trends in IT today, and it has spawned a whole new generation of technology to handle it. And, with new technologies come new buzzwords: acronyms, technical terms, product names, etc. Even the phrase "big data" itself can be confusing. Many think of "lots of data" when they hear it, but big data is much more than just data volume. Here, in alphabetical order, are some of the buzzwords we think you need to be familiar with.

3 ACID An acronym for Atomicity, Consistency, Isolation and Durability, ACID is a set of requirements or properties that, when adhered to, ensure the data integrity of database transactions during processing. While ACID has been around for a while, the explosion in transaction data volumes has focused more attention on the need for meeting ACID provisions when working with big data.

4 Big Data IT systems today pump out data that's "big" on volume, velocity and variety. Volume: IDC estimates that the volume of world information will reach 2.7 zettabytes this year (that's 2.7 billion terabytes) and that's doubling every two years. Velocity: It's not just the amount of data that's causing headaches for IT managers, but the increasingly rapid speed at which data is flowing from financial systems, retail systems, websites, sensors, RFID chips and social networks like Facebook, Twitter, etc. Variety: Going back five, maybe 10 years, IT mostly dealt with alphanumeric data that was easy to store in neat rows and columns in relational databases. No longer. Today, unstructured data, such as Tweets and Facebook posts, documents, Web content and so on, is all part of the big data mix.

5 Columnar (or Column-Oriented) Database Some new-generation databases (such as the open-source Cassandra and HP's Vertica) are designed to store data by column rather than by row as traditional SQL databases do. Their design provides faster disk access, improving their performance when handling big data. Columnar databases are especially popular for data-intensive business analytics applications.

6 Data Warehousing The concept of data warehousing, copying data from multiple operational IT systems into a secondary, off-line database for business analytics applications, has been around for about 25 years. But as data volumes explode, data warehouse systems are rapidly changing. They need to store more data --and more kinds of data -- making their management a challenge. And where 10 or 20 years ago data might have been copied into a data warehouse system on a weekly or monthly basis, data warehouses today are refreshed far more frequently with some even updated in real time.

7 ETL Extract, transform and load (ETL) software is used when moving data from one database, such as one supporting a banking application transaction processing system, to another, such as a data warehouse system used for business analytics. Data often needs to be reformatted and cleaned up when being transferred from one database to another. The performance demands on ETL tools have increased as data volumes have grown exponentially and data processing speeds have accelerated.

8 Flume Flume, a technology in the Apache Hadoopfamily (others include HBase, Hive, Oozie, Pig and Whirr), is a framework for populating Hadoopwith data. The technology uses agents scattered across application servers, Web servers, mobile devices and other systems to collect data and transfer it to a Hadoopsystem. A business, for example, could use Apache Flume running on a Web server to collect data from Twitter posts for analysis.

9 Geospatial Analysis One trend fueling big data is the increasing volume of geospatial data being generated and collected by IT systems today. A picture may be worth 1,000 words, so it's no surprise the growing number of maps, charts, photographs and other geographicbased content is a major driver of today's big data explosion. Geospatial analysis is a specific form of data visualization (see "V" for visualization) that overlays data on geographical maps to help users better understand the results of big data analysis.

10 Hadoop Hadoopis an open-source platform for developing distributed, data-intensive applications. It's controlled by the Apache Software Foundation. Hadoopwas created by Yahoo developer Doug Cutting, who based it on Google Labs' MapReduceconcept and named it after his infant son's toy elephant. Bonus "H" entries, or HBase, is a nonrelational database developed as part of the Hadoopproject. The Hadoop Distributed Filesystem(HDFS) is a key component of Hadoop. And, Hive is a data warehouse system built on Hadoop.

11 In-Memory Database Computers generally retrieve data from disk drives as they process transactions or perform queries. But, that can be too slow when IT systems are working with big data. In-memory database systems utilize a computer's main memory to store frequently used data, greatly reducing processing times. In-memory database products include SAP HANA and the Oracle Times Ten In-Memory Database.

12 Java Java is a programming language developed at Sun Microsystems and released in Hadoop and a number of other big data technologies were built using Java, and it remains a dominant development technology in the big data world.

13 Kafka Kafka is a high-throughput, distributed messaging system originally developed at LinkedIn to manage the service's activity stream (data about a Website's usage) and operational data processing pipeline (about the performance of server components). Kafka is effective for processing large volumes of streaming data -- a key issue in many big data computing environments. Storm, developed by Twitter, is another stream-processing technology that's catching on. The Apache Software Foundation has taken Kafka on as an open-source project. No jokes about buggy software, please...

14 Latency Latency is the delay when data is being delivered from one point to another or the amount of delay for a system, such as an application, to respond to another. While the term isn't new, you're hearing it more often today as data volumes grow and IT systems struggle to keep up. "Low latency" is good; "high latency" is bad.

15 Map/reduce Map/reduce is a way of breaking up a complex problem into smaller chunks, distributing them across many computers and then reassembling them into a single answer. Google's search system utilizes map/reduce concepts and the company has a framework with the brand name MapReduce. In 2004, Google released a white paper describing its use of map/reduce. Doug Cutting recognized its potential and developed the first release of Hadoop that also incorporates map/reduce concepts.

16 NoSQL Databases Most mainstream databases (such as the Oracle Database and Microsoft SQL Server) are based on a relational architecture and use structured query language (SQL) for development and data management. But a new generation of database systems dubbed "NoSQL" (which some now say stands for "Not only SQL") is based on architectures that proponents argue are better for handling big data. Some NoSQL databases are designed for scalability and flexibility whereas others are more efficient at handling documents and other unstructured data. Examples include Hadoop/HBase, Cassandra, MongoDB and CouchDB, while some big vendors like Oracle have launched their own NoSQL products.

17 Oozie Apache Oozieis an open-source workflow engine that's used to help manage processing jobs for Hadoop. Using Oozie, a series of jobs can be defined in multiple languages, such as Pig and MapReduce, and then linked to each other. That allows a programmer to launch a data analysis query once a job to collect data from an operational application has finished, for example.

18 Pig Pig, another Apache Software Foundation project, is a platform for analyzing huge data sets. At its core, it's a programming language for developing parallel computation queries that run on Hadoop.

19 Quantitative Data Analysis Quantitative data analysis is the use of complex mathematical or statistical modeling to explain financial and business behavior or even predict future behavior. With the exploding volumes of data being collected today, quantitative data analysis has become more complex. But more data also holds the promise of more data analysis opportunities for companies that know how to use it to gain better visibility and insights into their businesses and spot market trends. One problem: There's a serious shortage of people with these kinds of analytical skills. Consulting firm McKinsey says there is a need for 1.5 million additional analysts and managers with big data analysis skills in the U.S.

20 Relational Database Relational database management systems, including IBM's DB2, Microsoft's SQL Server and the Oracle Database, are the most widely used type of database today. Most corporate transaction processing systems run on RDBMs, from banking applications to retail point-ofsale systems to inventory management applications. But, some argue that relational databases may be unable to keep up with today's exploding volume and variety of data. RDBMs, for example, were designed with alphanumeric data in mind and aren't as effective when working with unstructured data.

21 Sharding As databases become ever larger, they become more difficult to work with. Shardingis a form of database partitioning that breaks a database up into smaller, more easily managed parts. Specifically, a database is partitioned horizontally to separately manage rows in a database table. Shardingallows segments of a huge database to be distributed across multiple servers, improving the overall speed and performance of the database. Bonus "S" entry: Sqoopis an opensource tool for moving data from non- Hadoopsources, such as relational databases, into Hadoop.

22 Text Analytics One of the contributors to the big data problem is the increasing amount of text being collected from social media sites like Twitter and Facebook, external news feeds and even within a company for analysis. Because text is unstructured (unlike structured data typically stored in relational databases), mainstream business analytics tools often falter when faced with text. Text analytics uses a range of techniques -- from key word search to statistical analysis to linguistic approaches -- to derive insight from text-based data.

23 Unstructured Data Until recent years, most data was structured, the kind of alphanumeric information (such as financial data from sales transactions) that could be easily stored in a relational database and analyzed by business intelligence tools. But, a big chunk of the 2.7 zettabytesof stored data today is unstructured, such as text-based documents, tweets, photos posted on Flickr, videos posted on YouTube and so on. (Fun fact: Thirtyfive hours of content are uploaded to YouTube every minute.) Processing, storing and analyzing all that messy unstructured stuff are often challenges for today's IT systems.

24 Visualization As the volume of data grows, it becomes increasingly difficult for people to understand it using static charts and graphs. That's led to the development of a new generation of data visualization and analysis tools that present data in new ways to help people make sense of huge amounts of information. These tools include color-coded heat maps, three-dimensional graphs, animated visualizations that show changes over time and geospatial representations that overlay data on geographical maps. Today's advanced data visualization tools are also more interactive, such as allowing a user to zoom in on a data subset for closer inspection.

25 Whirr Apache Whirr is a set of libraries for running big data cloud services. More specifically, it speeds up the development of Hadoop clusters on virtual infrastructure such as Amazon EC2 and Rackspace.

26 XML Extensible Markup Language is used to transport and store data (not to be confused with HTML, which is used to display data). With XML, programmers can create common data formats and share both the information and the format through the Web. Because XML documents can be very large and complex, they are often seen as contributing to IT organization's big data challenges.

27 Yottabyte A yottabyteis a data storage benchmark that's equal to 1,000 zettabytes. The total amount of data stored worldwide is expected to reach 2.7 zettabytesthis year, up 48 percent from 2011, according to an IDC calculation. So we're a long way from reaching the yottabyte threshold -- although with the rate of big data growth, it might come sooner than we think. Just to review, a zettabyteis one sextillion bytes of data. It's equal to 1,000 exabytes, 1 million petabytes and 1 billion terabytes.

28 ZooKeeper ZooKeeperwas created by the Apache Software Foundation to help Hadoop users manage and coordinate Hadoop nodes across a distributed network. Closely integrated with HBase, the database associated with Hadoop, ZooKeeperis a centralized service for maintaining configuration information, naming services, distributed synchronization and other group services. IT managers use it to implement reliable messaging, synchronize process execution and implement redundant services.

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014

Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Big Data, Why All the Buzz? (Abridged) Anita Luthra, February 20, 2014 Defining Big Not Just Massive Data Big data refers to data sets whose size is beyond the ability of typical database software tools

More information

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani

A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani A Tour of the Zoo the Hadoop Ecosystem Prafulla Wani Technical Architect - Big Data Syntel Agenda Welcome to the Zoo! Evolution Timeline Traditional BI/DW Architecture Where Hadoop Fits In 2 Welcome to

More information

BIG DATA TRENDS AND TECHNOLOGIES

BIG DATA TRENDS AND TECHNOLOGIES BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.

More information

BIG DATA TECHNOLOGY. Hadoop Ecosystem

BIG DATA TECHNOLOGY. Hadoop Ecosystem BIG DATA TECHNOLOGY Hadoop Ecosystem Agenda Background What is Big Data Solution Objective Introduction to Hadoop Hadoop Ecosystem Hybrid EDW Model Predictive Analysis using Hadoop Conclusion What is Big

More information

Hadoop vs Apache Spark

Hadoop vs Apache Spark Innovate, Integrate, Transform Hadoop vs Apache Spark www.altencalsoftlabs.com Introduction Any sufficiently advanced technology is indistinguishable from magic. said Arthur C. Clark. Big data technologies

More information

Firebird meets NoSQL (Apache HBase) Case Study

Firebird meets NoSQL (Apache HBase) Case Study Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 thomas.steinmaurer@scch.at www.scch.at Michael Zwick DI

More information

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage

www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage www.pwc.com/oracle Next presentation starting soon Business Analytics using Big Data to gain competitive advantage If every image made and every word written from the earliest stirring of civilization

More information

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing

Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Architecting for Big Data Analytics and Beyond: A New Framework for Business Intelligence and Data Warehousing Wayne W. Eckerson Director of Research, TechTarget Founder, BI Leadership Forum Business Analytics

More information

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook

Hadoop Ecosystem Overview. CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Hadoop Ecosystem Overview CMSC 491 Hadoop-Based Distributed Computing Spring 2015 Adam Shook Agenda Introduce Hadoop projects to prepare you for your group work Intimate detail will be provided in future

More information

The Next Wave of Data Management. Is Big Data The New Normal?

The Next Wave of Data Management. Is Big Data The New Normal? The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

So What s the Big Deal?

So What s the Big Deal? So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data

More information

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014

Forecast of Big Data Trends. Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Forecast of Big Data Trends Assoc. Prof. Dr. Thanachart Numnonda Executive Director IMC Institute 3 September 2014 Big Data transforms Business 2 Data created every minute Source http://mashable.com/2012/06/22/data-created-every-minute/

More information

III Big Data Technologies

III Big Data Technologies III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat

ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web

More information

A Survey on Big Data Concepts and Tools

A Survey on Big Data Concepts and Tools A Survey on Big Data Concepts and Tools D. Rajasekar 1, C. Dhanamani 2, S. K. Sandhya 3 1,3 PG Scholar, 2 Assistant Professor, Department of Computer Science and Engineering, Sri Krishna College of Engineering

More information

White Paper: What You Need To Know About Hadoop

White Paper: What You Need To Know About Hadoop CTOlabs.com White Paper: What You Need To Know About Hadoop June 2011 A White Paper providing succinct information for the enterprise technologist. Inside: What is Hadoop, really? Issues the Hadoop stack

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP

Infomatics. Big-Data and Hadoop Developer Training with Oracle WDP Big-Data and Hadoop Developer Training with Oracle WDP What is this course about? Big Data is a collection of large and complex data sets that cannot be processed using regular database management tools

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

Big Data Zurich, November 23. September 2011

Big Data Zurich, November 23. September 2011 Institute of Technology Management Big Data Projektskizze «Competence Center Automotive Intelligence» Zurich, November 11th 23. September 2011 Felix Wortmann Assistant Professor Technology Management,

More information

Application Development. A Paradigm Shift

Application Development. A Paradigm Shift Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the

More information

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores

Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Data Virtualization Composite Data Virtualization And NOSQL Data Stores Composite Software October 2010 TABLE OF CONTENTS INTRODUCTION... 3 BUSINESS AND IT DRIVERS... 4 NOSQL DATA STORES LANDSCAPE...

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Tap into Hadoop and Other No SQL Sources

Tap into Hadoop and Other No SQL Sources Tap into Hadoop and Other No SQL Sources Presented by: Trishla Maru What is Big Data really? The Three Vs of Big Data According to Gartner Volume Volume Orders of magnitude bigger than conventional data

More information

Big Data Analytics - Accelerated. stream-horizon.com

Big Data Analytics - Accelerated. stream-horizon.com Big Data Analytics - Accelerated stream-horizon.com Legacy ETL platforms & conventional Data Integration approach Unable to meet latency & data throughput demands of Big Data integration challenges Based

More information

Hadoop Big Data for Processing Data and Performing Workload

Hadoop Big Data for Processing Data and Performing Workload Hadoop Big Data for Processing Data and Performing Workload Girish T B 1, Shadik Mohammed Ghouse 2, Dr. B. R. Prasad Babu 3 1 M Tech Student, 2 Assosiate professor, 3 Professor & Head (PG), of Computer

More information

HDP Hadoop From concept to deployment.

HDP Hadoop From concept to deployment. HDP Hadoop From concept to deployment. Ankur Gupta Senior Solutions Engineer Rackspace: Page 41 27 th Jan 2015 Where are you in your Hadoop Journey? A. Researching our options B. Currently evaluating some

More information

BIRT in the World of Big Data

BIRT in the World of Big Data BIRT in the World of Big Data David Rosenbacher VP Sales Engineering Actuate Corporation 2013 Actuate Customer Days Today s Agenda and Goals Introduction to Big Data Compare with Regular Data Common Approaches

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Chapter 1. Contrasting traditional and visual analytics approaches

Chapter 1. Contrasting traditional and visual analytics approaches Chapter 1 Understanding Big Data Analytics In This Chapter Defining Big Data Understanding Big Data Analytics Contrasting traditional and visual analytics approaches The era of Big Data is upon us. The

More information

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM

Using Big Data for Smarter Decision Making. Colin White, BI Research July 2011 Sponsored by IBM Using Big Data for Smarter Decision Making Colin White, BI Research July 2011 Sponsored by IBM USING BIG DATA FOR SMARTER DECISION MAKING To increase competitiveness, 83% of CIOs have visionary plans that

More information

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com

Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com Big Data Are You Ready? Thomas Kyte http://asktom.oracle.com The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84

Chukwa, Hadoop subproject, 37, 131 Cloud enabled big data, 4 Codd s 12 rules, 1 Column-oriented databases, 18, 52 Compression pattern, 83 84 Index A Amazon Web Services (AWS), 50, 58 Analytics engine, 21 22 Apache Kafka, 38, 131 Apache S4, 38, 131 Apache Sqoop, 37, 131 Appliance pattern, 104 105 Application architecture, big data analytics

More information

HDP Enabling the Modern Data Architecture

HDP Enabling the Modern Data Architecture HDP Enabling the Modern Data Architecture Herb Cunitz President, Hortonworks Page 1 Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform) Founded in 2011 Original 24 architects,

More information

Big Data: Tools and Technologies in Big Data

Big Data: Tools and Technologies in Big Data Big Data: Tools and Technologies in Big Data Jaskaran Singh Student Lovely Professional University, Punjab Varun Singla Assistant Professor Lovely Professional University, Punjab ABSTRACT Big data can

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!

More information

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics

Hadoop Evolution In Organizations. Mark Vervuurt Cluster Data Science & Analytics In Organizations Mark Vervuurt Cluster Data Science & Analytics AGENDA 1. Yellow Elephant 2. Data Ingestion & Complex Event Processing 3. SQL on Hadoop 4. NoSQL 5. InMemory 6. Data Science & Machine Learning

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Big Data Explained. An introduction to Big Data Science.

Big Data Explained. An introduction to Big Data Science. Big Data Explained An introduction to Big Data Science. 1 Presentation Agenda What is Big Data Why learn Big Data Who is it for How to start learning Big Data When to learn it Objective and Benefits of

More information

Speak<geek> Tech Brief. RichRelevance Distributed Computing: creating a scalable, reliable infrastructure

Speak<geek> Tech Brief. RichRelevance Distributed Computing: creating a scalable, reliable infrastructure 3 Speak Tech Brief RichRelevance Distributed Computing: creating a scalable, reliable infrastructure Overview Scaling a large database is not an overnight process, so it s difficult to plan and implement

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

Microsoft SQL Server 2012 with Hadoop

Microsoft SQL Server 2012 with Hadoop Microsoft SQL Server 2012 with Hadoop Debarchan Sarkar Chapter No. 1 "Introduction to Big Data and Hadoop" In this package, you will find: A Biography of the author of the book A preview chapter from the

More information

Introduction to Big Data and the Lambda Architecture

Introduction to Big Data and the Lambda Architecture Introduction to Big Data and the Lambda Architecture Marc Schöni Meinrad Weiss April 2014 BASEL BERN BRUGG LAUSANNE ZUERICH DUESSELDORF FRANKFURT A.M. FREIBURG I.BR. HAMBURG MUNICH STUTTGART VIENNA 1 What

More information

Sentimental Analysis using Hadoop Phase 2: Week 2

Sentimental Analysis using Hadoop Phase 2: Week 2 Sentimental Analysis using Hadoop Phase 2: Week 2 MARKET / INDUSTRY, FUTURE SCOPE BY ANKUR UPRIT The key value type basically, uses a hash table in which there exists a unique key and a pointer to a particular

More information

BIG DATA TOOLS. Top 10 open source technologies for Big Data

BIG DATA TOOLS. Top 10 open source technologies for Big Data BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

Data Management in SAP Environments

Data Management in SAP Environments Data Management in SAP Environments the Big Data Impact Berlin, June 2012 Dr. Wolfgang Martin Analyst, ibond Partner und Ventana Research Advisor Data Management in SAP Environments Big Data What it is

More information

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal

BIG DATA AND MICROSOFT. Susie Adams CTO Microsoft Federal BIG DATA AND MICROSOFT Susie Adams CTO Microsoft Federal THE WORLD OF DATA IS CHANGING Cloud What s making this possible? Electrical efficiency of computers doubles every year and ½. Laptops and mobile

More information

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies

Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data, Cloud Computing, Spatial Databases Steven Hagan Vice President Server Technologies Big Data: Global Digital Data Growth Growing leaps and bounds by 40+% Year over Year! 2009 =.8 Zetabytes =.08

More information

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform...

Executive Summary... 2 Introduction... 3. Defining Big Data... 3. The Importance of Big Data... 4 Building a Big Data Platform... Executive Summary... 2 Introduction... 3 Defining Big Data... 3 The Importance of Big Data... 4 Building a Big Data Platform... 5 Infrastructure Requirements... 5 Solution Spectrum... 6 Oracle s Big Data

More information

Big Data and the Cloud Trends, Applications, and Training

Big Data and the Cloud Trends, Applications, and Training Big Data and the Cloud Trends, Applications, and Training Stavros Christodoulakis MUSIC/TUC Lab School of Electronic and Computer Engineering Technical University of Crete stavros@ced.tuc.gr Data Explosion

More information

Bringing Big Data into the Enterprise

Bringing Big Data into the Enterprise Bringing Big Data into the Enterprise Overview When evaluating Big Data applications in enterprise computing, one often-asked question is how does Big Data compare to the Enterprise Data Warehouse (EDW)?

More information

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld

#mstrworld. Tapping into Hadoop and NoSQL Data Sources in MicroStrategy. Presented by: Trishla Maru. #mstrworld Tapping into Hadoop and NoSQL Data Sources in MicroStrategy Presented by: Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop? Customer Case

More information

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION

GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION GAIN BETTER INSIGHT FROM BIG DATA USING JBOSS DATA VIRTUALIZATION Syed Rasheed Solution Manager Red Hat Corp. Kenny Peeples Technical Manager Red Hat Corp. Kimberly Palko Product Manager Red Hat Corp.

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK OVERVIEW ON BIG DATA SYSTEMATIC TOOLS MR. SACHIN D. CHAVHAN 1, PROF. S. A. BHURA

More information

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem:

Chapter 6 8/12/2015. Foundations of Business Intelligence: Databases and Information Management. Problem: Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Chapter 6 Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:

More information

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database

Managing Big Data with Hadoop & Vertica. A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Managing Big Data with Hadoop & Vertica A look at integration between the Cloudera distribution for Hadoop and the Vertica Analytic Database Copyright Vertica Systems, Inc. October 2009 Cloudera and Vertica

More information

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru

Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy. Presented by: Jeffrey Zhang and Trishla Maru Tapping Into Hadoop and NoSQL Data Sources with MicroStrategy Presented by: Jeffrey Zhang and Trishla Maru Agenda Big Data Overview All About Hadoop What is Hadoop? How does MicroStrategy connects to Hadoop?

More information

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions

Big Data Solutions. Portal Development with MongoDB and Liferay. Solutions Big Data Solutions Portal Development with MongoDB and Liferay Solutions Introduction Companies have made huge investments in Business Intelligence and analytics to better understand their clients and

More information

Turn "Big Data" into Business Value with Real-Time BI. Timo Elliott, March 2012

Turn Big Data into Business Value with Real-Time BI. Timo Elliott, March 2012 Turn "Big Data" into Business Value with Real-Time BI Timo Elliott, March 2012 1 Legal Disclaimer The information in this presentation is confidential and proprietary to SAP and may not be disclosed without

More information

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES

BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES BIG DATA ANALYTICS REFERENCE ARCHITECTURES AND CASE STUDIES Relational vs. Non-Relational Architecture Relational Non-Relational Rational Predictable Traditional Agile Flexible Modern 2 Agenda Big Data

More information

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY

WA2192 Introduction to Big Data and NoSQL EVALUATION ONLY WA2192 Introduction to Big Data and NoSQL Web Age Solutions Inc. USA: 1-877-517-6540 Canada: 1-866-206-4644 Web: http://www.webagesolutions.com The following terms are trademarks of other companies: Java

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

There s no way around it: learning about Big Data means

There s no way around it: learning about Big Data means In This Chapter Chapter 1 Introducing Big Data Beginning with Big Data Meeting MapReduce Saying hello to Hadoop Making connections between Big Data, MapReduce, and Hadoop There s no way around it: learning

More information

Integrating Cloudera and SAP HANA

Integrating Cloudera and SAP HANA Integrating Cloudera and SAP HANA Version: 103 Table of Contents Introduction/Executive Summary 4 Overview of Cloudera Enterprise 4 Data Access 5 Apache Hive 5 Data Processing 5 Data Integration 5 Partner

More information

How Cisco IT Built Big Data Platform to Transform Data Management

How Cisco IT Built Big Data Platform to Transform Data Management Cisco IT Case Study August 2013 Big Data Analytics How Cisco IT Built Big Data Platform to Transform Data Management EXECUTIVE SUMMARY CHALLENGE Unlock the business value of large data sets, including

More information

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS

WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS WINDOWS AZURE DATA MANAGEMENT AND BUSINESS ANALYTICS Managing and analyzing data in the cloud is just as important as it is anywhere else. To let you do this, Windows Azure provides a range of technologies

More information

Customized Report- Big Data

Customized Report- Big Data GINeVRA Digital Research Hub Customized Report- Big Data 1 2014. All Rights Reserved. Agenda Context Challenges and opportunities Solutions Market Case studies Recommendations 2 2014. All Rights Reserved.

More information

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014

5 Keys to Unlocking the Big Data Analytics Puzzle. Anurag Tandon Director, Product Marketing March 26, 2014 5 Keys to Unlocking the Big Data Analytics Puzzle Anurag Tandon Director, Product Marketing March 26, 2014 1 A Little About Us A global footprint. A proven innovator. A leader in enterprise analytics for

More information

WHITE PAPER. Four Key Pillars To A Big Data Management Solution

WHITE PAPER. Four Key Pillars To A Big Data Management Solution WHITE PAPER Four Key Pillars To A Big Data Management Solution EXECUTIVE SUMMARY... 4 1. Big Data: a Big Term... 4 EVOLVING BIG DATA USE CASES... 7 Recommendation Engines... 7 Marketing Campaign Analysis...

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Big Data Success Step 1: Get the Technology Right

Big Data Success Step 1: Get the Technology Right Big Data Success Step 1: Get the Technology Right TOM MATIJEVIC Director, Business Development ANDY MCNALIS Director, Data Management & Integration MetaScale is a subsidiary of Sears Holdings Corporation

More information

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns

How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns How Transactional Analytics is Changing the Future of Business A look at the options, use cases, and anti-patterns Table of Contents Abstract... 3 Introduction... 3 Definition... 3 The Expanding Digitization

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford

SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems

More information

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6

How Big Is Big Data Adoption? Survey Results. Survey Results... 4. Big Data Company Strategy... 6 Survey Results Table of Contents Survey Results... 4 Big Data Company Strategy... 6 Big Data Business Drivers and Benefits Received... 8 Big Data Integration... 10 Big Data Implementation Challenges...

More information

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012

Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster. Nov 7, 2012 Extending the Enterprise Data Warehouse with Hadoop Robert Lancaster Nov 7, 2012 Who I Am Robert Lancaster Solutions Architect, Hotel Supply Team rlancaster@orbitz.com @rob1lancaster Organizer of Chicago

More information

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here>

Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Architectures for Big Data Analytics A database perspective

Architectures for Big Data Analytics A database perspective Architectures for Big Data Analytics A database perspective Fernando Velez Director of Product Management Enterprise Information Management, SAP June 2013 Outline Big Data Analytics Requirements Spectrum

More information

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367

HBase A Comprehensive Introduction. James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 HBase A Comprehensive Introduction James Chin, Zikai Wang Monday, March 14, 2011 CS 227 (Topics in Database Management) CIT 367 Overview Overview: History Began as project by Powerset to process massive

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Data Services Advisory

Data Services Advisory Data Services Advisory Modern Datastores An Introduction Created by: Strategy and Transformation Services Modified Date: 8/27/2014 Classification: DRAFT SAFE HARBOR STATEMENT This presentation contains

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Turn "Big Data" into Business Value with Real-Time BI. Timo Elliott, March 2012

Turn Big Data into Business Value with Real-Time BI. Timo Elliott, March 2012 Turn "Big Data" into Business Value with Real-Time BI Timo Elliott, March 2012 1 Legal Disclaimer The information in this presentation is confidential and proprietary to SAP and may not be disclosed without

More information

BIG DATA CHALLENGES AND PERSPECTIVES

BIG DATA CHALLENGES AND PERSPECTIVES BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

OPERATIONAL SCENARIOS USING THE MICROSOFT DATA PLATFORM

OPERATIONAL SCENARIOS USING THE MICROSOFT DATA PLATFORM David Chappell OPERATIONAL SCENARIOS USING THE MICROSOFT DATA PLATFORM A GUIDE FOR IT LEADERS Sponsored by Microsoft Corporation Copyright 2016 Chappell & Associates Contents Microsoft s Data Platform:

More information

Transforming the Telecoms Business using Big Data and Analytics

Transforming the Telecoms Business using Big Data and Analytics Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe

More information

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM

SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM David Chappell SELLING PROJECTS ON THE MICROSOFT BUSINESS ANALYTICS PLATFORM A PERSPECTIVE FOR SYSTEMS INTEGRATORS Sponsored by Microsoft Corporation Copyright 2014 Chappell & Associates Contents Business

More information

Big Data and Data Science: Behind the Buzz Words

Big Data and Data Science: Behind the Buzz Words Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing

More information

Age of Big data. Presented by: Mohammad Iqbal BCM -2014

Age of Big data. Presented by: Mohammad Iqbal BCM -2014 Age of Presented by: Mohammad Iqbal BCM -2014 Agenda Big? Big evolution from Big? Name Symbol Value Kilobyte KB 10^3 BIG DATA Megabyte MB 10^6 Gigabyte GB 10^9 Terabyte TB 10^12 Petabyte PB 10^15 So large

More information

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum

Big Data Analytics. with EMC Greenplum and Hadoop. Big Data Analytics. Ofir Manor Pre Sales Technical Architect EMC Greenplum Big Data Analytics with EMC Greenplum and Hadoop Big Data Analytics with EMC Greenplum and Hadoop Ofir Manor Pre Sales Technical Architect EMC Greenplum 1 Big Data and the Data Warehouse Potential All

More information