Database Appliances, MapReduce, and NoSQL Compared to RDBMS
- Jonas Kory Ford
- 8 years ago
1 Appliances, MapReduce, and NoSQL Compared to RDBMS
Prepared by Philip Czachorowski, November 17,
2 Disclaimer
All technical information is presented as is. Any mention of products does not constitute an endorsement.
3 Appliances, MapReduce, and NoSQL Compared to RDBMS
- Architecture Overview
  - Shared data
  - Shared nothing
- Big Data
- Big Data Technologies
  - Appliances
  - Big Data File Management
- NoSQL Movement
- Conclusions
4 Shared Data Architecture: DB2 for z/OS
- All data is shared
- A coupling facility (CF) supports distributed lock management and buffer pool coherency
- Up to 32 data sharing members
[Figure: three z/OS LPARs, each running a DB2 data sharing member with its SRBs, all connected to a coupling facility (CF) LPAR]
5 Shared Nothing Architecture (MPP): DB2 for LUW
- No shared data
- Joins of collocated tables are executed locally at each node
- Joins of non-collocated tables require moving data between nodes; several strategies exist
[Figure: DB2 for LUW nodes, each with its own memory, connected by a high-speed interconnect]
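To make the difference between collocated and non-collocated joins concrete, here is a minimal Python sketch of hash-partitioned tables on a shared-nothing cluster. It assumes integer join keys and a toy three-node cluster; the function names (`partition`, `local_join`, `collocated_join`, `redistribute_then_join`) and the sample customers/orders tables are hypothetical. Re-partitioning one table on the join key is only one of the several possible strategies (broadcasting the smaller table to every node is another common one), and this is not how DB2 for LUW itself implements joins.

```python
from collections import defaultdict

NUM_NODES = 3  # assumption: a toy three-node cluster


def partition(rows, key_index, num_nodes):
    """Hash-partition rows on the column at key_index; returns {node: [rows]}."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % num_nodes].append(row)
    return parts


def local_join(left_rows, right_rows, left_key, right_key):
    """Hash join that touches only the rows already stored on one node."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]


def collocated_join(left_parts, right_parts, left_key, right_key, num_nodes):
    """Each node joins only its own partitions. This is correct only when both
    tables were partitioned on the join key, so matching rows share a node."""
    result = []
    for node in range(num_nodes):
        result += local_join(left_parts.get(node, []), right_parts.get(node, []),
                             left_key, right_key)
    return result


def redistribute_then_join(left_parts, right_parts, left_key, right_key, num_nodes):
    """Non-collocated case: re-partition the right table on the join key
    (these rows would cross the interconnect), then join locally."""
    all_right = [row for part in right_parts.values() for row in part]
    moved = partition(all_right, right_key, num_nodes)
    return collocated_join(left_parts, moved, left_key, right_key, num_nodes)


# Hypothetical tables: customers(cust_id, name), orders(order_id, cust_id, amount)
customers = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
orders = [(10, 1, 5.0), (11, 3, 7.5), (12, 1, 2.0)]
cust_parts = partition(customers, 0, NUM_NODES)

# Collocated: orders are distributed on cust_id, the join column.
orders_by_cust = partition(orders, 1, NUM_NODES)
joined = collocated_join(cust_parts, orders_by_cust, 0, 1, NUM_NODES)

# Non-collocated: orders are distributed on order_id, so matches may sit on other nodes.
orders_by_id = partition(orders, 0, NUM_NODES)
naive = collocated_join(cust_parts, orders_by_id, 0, 1, NUM_NODES)  # silently misses rows
joined_after_move = redistribute_then_join(cust_parts, orders_by_id, 0, 1, NUM_NODES)
```

Here `joined` and `joined_after_move` contain the same three rows, while `naive` finds only the one order that happens to land on its customer's node. That missed-rows case is why non-collocated joins have to move data.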
6 Big Data
- Definition: "data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time" (Wikipedia)
- Growth characteristics:
  - Volume: petabytes of data
  - Variety: semi-structured and unstructured data (text, video, logs, sensor data, etc.)
  - Velocity: the rate at which data is being created
- Example sizes:
  - Yahoo captures 450 TB every day (Forrester Report)
  - Yahoo has 100,000 CPUs on 40,000 commodity computers (Forrester Report)
  - Twitter processes 7 TB of data every day (IBM presentation)
  - The world's total data will grow from 800,000 petabytes in 2000 to 32 zettabytes by 2020 (IBM paper)
- Most data is semi-structured or unstructured and is not being stored
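Taken at face value, the figures on this slide imply the following growth rates. This worked arithmetic assumes decimal units (1 ZB = 10^6 PB, 1 PB = 1,000 TB) and a 20-year span from 2000 to 2020.

```latex
\begin{aligned}
\frac{32\ \text{ZB}}{800{,}000\ \text{PB}}
  &= \frac{32 \times 10^{6}\ \text{PB}}{8 \times 10^{5}\ \text{PB}} = 40,
  \qquad 40^{1/20} - 1 \approx 0.20 \quad (\text{about } 20\%\ \text{per year}) \\
450\ \text{TB/day} \times 365\ \text{days} &= 164{,}250\ \text{TB} \approx 164\ \text{PB per year (Yahoo)}
\end{aligned}
```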
7 Big Data Examples
- Web logs
- Video, photographs
- Documents
- Detail event data, like RFID and sensor networks
- Social network data
- Call detail records
- Internet searching
- Scientific and medical data
- Financial events
8 Appliances
- Used for analytics
- Shared nothing architecture for scalability
- Move processing to the data
- Hardware and software packaged together
- Low administrative cost: minimal setup and tuning
9 Appliances: Netezza
- Deployment options
  - Standalone
  - DB2 Analytics Accelerator for z/OS V2.1
- Advanced functions
  - Analytics and statistics
  - Hadoop
10 Appliances: Netezza
IBM Netezza 1000 Data Warehouse Appliance
- 8 core/FPGA pairs per S-Blade
- 12 S-Blades per rack
- Up to 10 racks (960 cores)
- Scales from 1 TB to 1.5 petabytes
[Figure: an SMP host (SQL compiler and optimizer, execution engine) connected over a high-speed interconnect to S-Blades, each with CPU cores, memory, and FPGAs; the FPGAs decompress data and apply restrict and project operations]
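The FPGA stage on this slide (decompress, restrict, project) is an instance of moving processing to the data: each blade discards unneeded rows and columns before anything crosses the interconnect to the host. The Python sketch below models that idea in software only. The comma-separated, zlib-compressed block format, the sample table, and the functions `compress_block`, `scan_block`, and `host_query` are hypothetical; this is not how Netezza's FPGAs or storage format actually work.

```python
import zlib

# Hypothetical storage format: each block holds comma-separated rows, zlib-compressed.
# Assumes values never contain commas or newlines.


def compress_block(rows):
    text = "\n".join(",".join(str(v) for v in row) for row in rows)
    return zlib.compress(text.encode("utf-8"))


def scan_block(block, predicate, columns):
    """Blade-side scan: decompress, restrict (filter rows), project (keep columns)."""
    text = zlib.decompress(block).decode("utf-8")      # decompress
    for line in text.splitlines():
        row = line.split(",")
        if predicate(row):                              # restrict
            yield tuple(row[i] for i in columns)        # project


def host_query(blocks, predicate, columns):
    """Host side: collect only the rows and columns each blade chose to return."""
    results = []
    for block in blocks:  # in an appliance, the blades would scan in parallel
        results.extend(scan_block(block, predicate, columns))
    return results


# Hypothetical sales(region, product, amount) table spread over two blades.
blocks = [
    compress_block([("east", "disk", 120), ("west", "tape", 40)]),
    compress_block([("east", "tape", 75), ("north", "disk", 300)]),
]
east_sales = host_query(blocks, lambda row: row[0] == "east", [1, 2])
```

`east_sales` holds only the product and amount columns of the two "east" rows; the values come back as strings because the toy format is untyped.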
11 MapReduce and Hadoop
- MapReduce
  - Developed by Google
  - Software framework that supports distributed computing on large datasets
  - Executes on large numbers of machines (MPP model, computer grids)
  - Uses commodity hardware
  - Batch oriented
  - Uses GFS for the file system
- Hadoop
  - Open source version of MapReduce, under the Apache project
  - Developed at Yahoo
  - Uses HDFS for the file system
  - Numerous other components developed to extend its functions
12 Hadoop Architecture
[Figure: a jobtracker coordinates tasktrackers; input data in HDFS is divided into splits 1..n, each processed by a map task (Map 1..n); each map task's output is divided into partitions 1..n that feed reduce tasks Reduce 1..n, which write the output data to HDFS]
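The data flow in this figure can be simulated in a single Python process with the classic word-count job. This is a minimal sketch under stated assumptions: `map_fn`, `reduce_fn`, `partition_for`, and `run_job` are hypothetical names, jobtracker/tasktracker scheduling and HDFS storage are not modelled, and the toy partitioner (sum of the key's bytes modulo the number of reducers) stands in for Hadoop's default `HashPartitioner`, which uses the key's hash code instead.

```python
from collections import defaultdict


def map_fn(split):
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def reduce_fn(key, values):
    """Reduce: sum all counts collected for one key."""
    return key, sum(values)


def partition_for(key, num_reducers):
    """Choose the reducer for a key (deterministic toy partitioner)."""
    return sum(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers):
    # Map phase: each split is processed independently, and each map task
    # writes its output into one partition per reducer (Part 1..n).
    map_outputs = []
    for split in splits:
        parts = [defaultdict(list) for _ in range(num_reducers)]
        for key, value in map_fn(split):
            parts[partition_for(key, num_reducers)][key].append(value)
        map_outputs.append(parts)

    # Shuffle + reduce phase: reducer r merges partition r from every map task.
    output = {}
    for r in range(num_reducers):
        merged = defaultdict(list)
        for parts in map_outputs:
            for key, values in parts[r].items():
                merged[key].extend(values)
        for key in sorted(merged):
            k, total = reduce_fn(key, merged[key])
            output[k] = total
    return output


counts = run_job(["the quick fox", "the lazy dog", "The end"], num_reducers=2)
```

Because every occurrence of a word is routed to the same reducer, `counts["the"]` is 3 even though "the" appears in all three splits.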
13 Hadoop-Relational Comparison (Relational vs. Hadoop*)
- Schema. Relational: well formed, with data integrity. Hadoop: any arbitrary data, no integrity checking.
- Programming model. Relational: SQL (declarative). Hadoop: low-level language (procedural).
- Processing model. Relational: real time. Hadoop: batch.
- Data storage. Relational: physical tables with indexes. Hadoop: files as output.
- Query optimization. Relational: extensive. Hadoop: minimal, procedural steps.
- Parallelism. Relational: predetermined by the optimizer. Hadoop: dynamic scheduling of work.
- Consistency. Relational: ACID. Hadoop: BASE.
- Scalability. Relational: an MPP DBMS can theoretically scale to massive numbers of nodes. Hadoop: deployed clusters with thousands of nodes.
- Fault tolerance. Relational: an SQL statement is one unit of work (UOW). Hadoop: checkpoints each function.
- Hardware. Relational: enterprise. Hadoop: commodity.
*Numerous open source add-ons are available to enhance Hadoop functionality.
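The programming-model row (declarative SQL versus procedural code) can be seen by computing the same filtered aggregate both ways. This sketch uses Python's built-in `sqlite3` module as a stand-in relational engine and hand-written map/reduce functions for the procedural side; the `sales` table and its rows are hypothetical, and the procedural half runs in one process rather than on a Hadoop cluster.

```python
import sqlite3
from collections import defaultdict

rows = [("east", 120), ("west", 40), ("east", 75), ("north", 300)]  # hypothetical (region, amount)

# Declarative: state WHAT is wanted; the engine's optimizer decides HOW to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
sql_result = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales WHERE amount > 50 GROUP BY region"
))
conn.close()


# Procedural, MapReduce style: the programmer spells out every step.
def map_step(row):
    region, amount = row
    if amount > 50:  # the filter is hand-coded into the map function
        yield region, amount


def reduce_step(region, amounts):
    return region, sum(amounts)


groups = defaultdict(list)
for row in rows:
    for key, value in map_step(row):
        groups[key].append(value)  # the grouping ("shuffle") is explicit too
mr_result = dict(reduce_step(k, v) for k, v in groups.items())
```

Both approaches yield `{"east": 195, "north": 300}`; the difference is that in the procedural version the filter, grouping, and aggregation order are fixed by the programmer rather than chosen by an optimizer.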
14 Open Source Components Layered on Hadoop
- Coordination service: ZooKeeper
- Distributed service for log data: Flume
- Data mining library: Mahout
- SQL-like language: Jaql (JSON (JavaScript Object Notation) Query Language)
- SQL language implementation: Hive (HiveQL)
- Job control: Oozie (server based)
- Data flow: Pig (Pig Latin)
- Systems: HBase, Cassandra
- File system: HDFS
- Common services: Hadoop Common
15 InfoSphere BigInsights
IBM's Hadoop implementation
- Adds enterprise features
- Implements a more robust infrastructure and connectivity
  - GPFS instead of HDFS
  - ZooKeeper: coordination service for distributed applications
  - Pig: high-level language for executing data flows and for …
  - BigIndex: large-scale indexing
  - Hive: SQL-like language
  - HBase: columnar database
  - Jaql: JSON (JavaScript Object Notation) Query Language
16 Example BigInsights Integration Options
- Netezza
  - BigInsights has a connector to exchange data with Netezza
  - Uses Jaql
  - Can use BigInsights to process massive amounts of data
  - Load into Netezza for analysis
- DB2 for LUW
  - DB2 UDF
  - BigInsights Jaql and JDBC
- InfoSphere Streams
17 NoSQL
- NoSQL: "Not only SQL"
- The original intention was to implement modern web-scale databases
- The name was first used in its current sense in 2009
- Major data store types
  - Key-value stores
  - Document stores
  - Extensible record stores, or wide column stores
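The three major store types differ mainly in how much the store knows about a value. The toy in-memory Python models below are illustrative only: they are not the API of any NoSQL product, and all names and data are hypothetical.

```python
from collections import defaultdict

# 1. Key-value store: the value is opaque; the only operations are put/get by key.
kv_store = {}
kv_store["session:42"] = b'{"user": "ann", "cart": [17, 23]}'  # the store never parses these bytes

# 2. Document store: values are structured documents, and the store can query their fields.
doc_store = {
    "u1": {"name": "Ann", "city": "Austin", "orders": [17, 23]},
    "u2": {"name": "Bob", "tags": ["vip"]},  # no fixed schema: fields differ per document
}


def find(store, field, value):
    """Return ids of documents whose field equals value (a full scan in this toy)."""
    return [doc_id for doc_id, doc in store.items() if doc.get(field) == value]


# 3. Wide-column (extensible record) store: row key -> column family -> column -> value.
#    Rows are sparse: each row stores only the columns it actually has.
wide_store = defaultdict(lambda: defaultdict(dict))
wide_store["user#u1"]["profile"]["name"] = "Ann"
wide_store["user#u1"]["activity"]["last_login"] = "morning"
wide_store["user#u2"]["profile"]["name"] = "Bob"

austin_users = find(doc_store, "city", "Austin")  # ["u1"]
```

A key-value store can only answer "what is stored under this key?", while a document store can look inside values and a wide-column store lets each row carry a different, sparse set of columns.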
18 NoSQL: Characteristics
- Shared nothing scalability
  - Horizontal scaling to large numbers of servers
  - Replication and distribution of data over servers
  - Exploits commodity hardware
- Simplicity
  - Simple API
  - No data schema
  - Optimized for a specific use
- Weaker concurrency model
  - BASE (Basically Available, Soft state, Eventually consistent) instead of ACID (Atomicity, Consistency, Isolation, Durability)
  - CAP theorem (Consistency, Availability, Partition tolerance): only two of the three can be guaranteed at once
- Open source
- No standard data model like the relational model
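"Eventually consistent" can be illustrated with two replicas that accept writes independently and reconcile later. This sketch assumes last-writer-wins reconciliation by timestamp, which is one common choice among several (vector clocks and application-level merges are others); the `Replica` class and `anti_entropy` function are hypothetical and ignore networking, failures, and clock skew.

```python
class Replica:
    """One server holding a copy of the data: key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, ts):
        # Last-writer-wins: keep the version with the highest timestamp.
        # Equal timestamps are not resolved here; real systems add a tie-breaker.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Exchange state in both directions so the two replicas converge."""
    for key, (ts, value) in list(a.data.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.data.items()):
        a.write(key, value, ts)


r1, r2 = Replica("r1"), Replica("r2")
r1.write("cart:ann", ["disk"], ts=1)         # accepted by r1 alone: available, no coordination
stale = r2.read("cart:ann")                   # r2 has not seen the write yet (soft state)
r2.write("cart:ann", ["disk", "tape"], ts=2)  # a later write lands on the other replica
anti_entropy(r1, r2)                          # background repair
converged = (r1.read("cart:ann"), r2.read("cart:ann"))
```

Before repair, `stale` is `None` even though a write has been accepted; afterwards both replicas return `["disk", "tape"]`. An ACID system would instead block or reject one of the writes to keep every read current.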
19 Conclusions
- Critical record-keeping data will remain in RDBMS databases
  - Data integrity and consistency (ACID)
  - Performance
  - SQL standard
- Hadoop opens up opportunities to access and analyze new data
  - Enables businesses to gain new insights
  - Semi-structured as well as some structured data
  - Hadoop acts as an ETL tool, feeding analytical tools
  - Will evolve and mature, learning from RDBMS
- RDBMS and Hadoop will coexist
  - Processing of semi-structured or massive data in Hadoop
  - Analytics in RDBMS and database appliances
- NoSQL is not likely to be embraced by large enterprises
  - Not likely to be used for critical data
  - Lack of a standard language and schema is a big disadvantage
  - RDBMS vendors may learn from NoSQL
- Hadoop is changing rapidly
  - Open source
  - Implementation of higher layers, like Pig, on top of Hadoop to simplify development
  - Addition of RDBMS features into Hadoop
- One size of DBMS will not fit all problems
Aligning Your Strategic Initiatives with a Realistic Big Data Analytics Roadmap 3 key strategic advantages, and a realistic roadmap for what you really need, and when 2012, Cognizant Topics to be discussed
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationCloud Computing at Google. Architecture
Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale
More informationPeers Techno log ies Pv t. L td. HADOOP
Page 1 Peers Techno log ies Pv t. L td. Course Brochure Overview Hadoop is a Open Source from Apache, which provides reliable storage and faster process by using the Hadoop distibution file system and
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationDesign and Analysis of Large Data Processing Techniques
Design and Analysis of Large Data Processing Techniques Madhavi Vaidya Asst Professor VES College, Mumbai Affiliated to Univ of Mumbai Shrinivas Deshpande, Ph.D Associate Professor, HVPM Affiliated To
More informationMapReduce With Columnar Storage
SEMINAR: COLUMNAR DATABASES 1 MapReduce With Columnar Storage Peitsa Lähteenmäki Abstract The MapReduce programming paradigm has achieved more popularity over the last few years as an option to distributed
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!
More informationINTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE
INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe
More informationCSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop)
CSE 590: Special Topics Course ( Supercomputing ) Lecture 10 ( MapReduce& Hadoop) Rezaul A. Chowdhury Department of Computer Science SUNY Stony Brook Spring 2016 MapReduce MapReduce is a programming model
More informationHadoop. MPDL-Frühstück 9. Dezember 2013 MPDL INTERN
Hadoop MPDL-Frühstück 9. Dezember 2013 MPDL INTERN Understanding Hadoop Understanding Hadoop What's Hadoop about? Apache Hadoop project (started 2008) downloadable open-source software library (current
More informationGoogle Bing Daytona Microsoft Research
Google Bing Daytona Microsoft Research Raise your hand Great, you can help answer questions ;-) Sit with these people during lunch... An increased number and variety of data sources that generate large
More information