Big Data and Databases

1 Big Data and Databases
Vijay Gadepally, Lauren Milechin
This work is sponsored by the Department of the Air Force under an Air Force contract. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

2 Outline
- Challenge Overview
- General Strategies
- Database Fundamentals and Technologies
- Up and Coming Technologies

3 Big Data Challenge
Rapidly increasing:
- Data volume
- Data velocity
- Data variety
- Data veracity
[Figure: sources (providers), such as building security, environment, and usage systems; commuter, work, and transport vehicles; student smartphones; classroom tablets; and fitness wearables, produce data for users (deciders): kids, adults, and the elderly. A timeline (10 years ago, 5 years ago, today, in 5 years) shows the gap between things and humans.]

4 Challenge of Data Volume
- Where do I store my data? (figure: from 1 TB total of applications and data, to 2 TB total of data, to a scalable data center)
- How much do I store?
- How do I access it? (figure: flat file, spreadsheet, database, distributed database)
- How do I index it?

5 Challenge of Data Velocity
2011 data generated per minute:
- Facebook: 684,478 pieces of content
- Twitter: 100,000 tweets
- YouTube: 48 hours of new video
- Google: 2,000,000 new queries
Internet population: 2.1 billion people

6 Challenge of Data Velocity

7 Challenge of Data Velocity
2014 data generated per minute:
- Facebook: 2,460,000 pieces of content
- Twitter: 277,000 tweets
- YouTube: 72 hours of new video
- Google: 4,000,000 new queries
Internet population: 2.4 billion people

9 Challenge of Data Velocity
Increase in data generated:
- Facebook: 350 MB/min
- Twitter: 50 MB/min
- YouTube: GB/min
Questions:
- How do I capture my data for processing?
- How do I process the data within the specified time constraints?
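The per-minute figures on the velocity slides turn into per-second rates, growth factors, and daily volumes with simple arithmetic. A minimal sketch, assuming decimal units (1 GB = 1000 MB); the constant and function names are illustrative, and only the numbers come from the slides.

```python
# Per-minute figures quoted on the slides (2011 vs. 2014).
PER_MINUTE_2011 = {"tweets": 100_000, "youtube_hours": 48, "google_queries": 2_000_000}
PER_MINUTE_2014 = {"tweets": 277_000, "youtube_hours": 72, "google_queries": 4_000_000}

# Data-volume increase rates quoted on slide 9, in MB per minute.
MB_PER_MINUTE = {"facebook": 350, "twitter": 50}

MINUTES_PER_DAY = 24 * 60


def per_second(count_per_minute: float) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return count_per_minute / 60


def growth_factors(old: dict, new: dict) -> dict:
    """Ratio of the new rate to the old rate for every metric present in both years."""
    return {k: new[k] / old[k] for k in old if k in new}


def gb_per_day(mb_per_minute: float) -> float:
    """Daily volume in GB, assuming decimal units (1 GB = 1000 MB)."""
    return mb_per_minute * MINUTES_PER_DAY / 1000


if __name__ == "__main__":
    print(f"2014 tweets per second: {per_second(PER_MINUTE_2014['tweets']):.0f}")
    for name, factor in growth_factors(PER_MINUTE_2011, PER_MINUTE_2014).items():
        print(f"{name}: x{factor:.2f}")
    for site, rate in MB_PER_MINUTE.items():
        print(f"{site}: {gb_per_day(rate):.0f} GB/day")
```

At the quoted 350 MB/min, Facebook alone accumulates about 504 GB per day, a sustained ingest rate that has to be planned for before any processing happens.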

11 Challenge of Data Variety
What does the data look like? Tweets, images, text and documents, audio.
How do I index heterogeneous data formats?
- Strings may be easily stored in a database
- Image and document metadata may fit in a traditional database
- Raw images and documents may require a file system or an alternate database
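The split described above (metadata in a database, raw bytes elsewhere) can be sketched with Python's built-in `sqlite3` and the local file system. This is a minimal illustration under assumptions: the `blobs` table layout, the content-addressed file naming, and the payloads are all hypothetical choices, not part of the original material.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def open_catalog() -> sqlite3.Connection:
    """Metadata catalog: small, structured, and easy to query with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE blobs (name TEXT, kind TEXT, size_bytes INTEGER, sha256 TEXT, path TEXT)"
    )
    return conn


def store_blob(conn: sqlite3.Connection, root: Path, name: str, kind: str, data: bytes) -> str:
    """Write raw bytes to the file system and record their metadata in the catalog."""
    digest = hashlib.sha256(data).hexdigest()
    path = root / digest  # content-addressed file name (an illustrative choice)
    path.write_bytes(data)
    conn.execute(
        "INSERT INTO blobs VALUES (?, ?, ?, ?, ?)",
        (name, kind, len(data), digest, str(path)),
    )
    return digest


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    catalog = open_catalog()
    # Hypothetical payloads standing in for a raw image and a document.
    store_blob(catalog, root, "quake.jpg", "image", b"\xff\xd8\xff placeholder bytes")
    store_blob(catalog, root, "report.txt", "document", b"Omg earthquake")
    # Search the small metadata table; read raw bytes from disk only when needed.
    for name, path in catalog.execute("SELECT name, path FROM blobs WHERE kind = 'document'"):
        print(name, Path(path).read_bytes())
```

Queries touch only the compact metadata rows, so the database stays small even when the raw objects are large.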

12 Challenge of Data Variety
How do I fuse heterogeneous data formats to provide a uniform view? Fusion drives:
- Indexing and schema decisions
- Technology selection (databases, storage, etc.)
- Selection of software tools (visualization, language)

13 Challenge of Data Variety
How do I develop algorithms for heterogeneous data formats?
- Images can use high performance computing tools
- Strings and documents require a new algebra to take advantage of high performance computing systems
- Visualization requires merging image data with string data
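One algebra of this kind, and the idea behind D4M (named elsewhere in this deck), is the associative array: a sparse matrix whose row and column keys are strings. Ordinary matrix operations then apply to text. The pure-Python sketch below is an illustration under assumptions, not D4M's API: it builds a document-by-word array from the sample tweets used later in the deck (tweet IDs `t1` to `t3` are hypothetical) and computes `A^T A`, whose entries count how many tweets contain both words.

```python
from collections import defaultdict
from typing import Dict

# A sparse matrix keyed by strings: row key -> {column key -> value}.
AssocArray = Dict[str, Dict[str, int]]


def from_documents(docs: Dict[str, str]) -> AssocArray:
    """Rows are document IDs, columns are words, values are 1 where the word occurs."""
    a: AssocArray = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in text.split():
            a[doc_id][word] = 1
    return dict(a)


def transpose(a: AssocArray) -> AssocArray:
    """Swap row and column keys."""
    t: AssocArray = defaultdict(dict)
    for row, cols in a.items():
        for col, val in cols.items():
            t[col][row] = val
    return dict(t)


def matmul(a: AssocArray, b: AssocArray) -> AssocArray:
    """Sparse product C = A B using ordinary (+, *) arithmetic."""
    c = defaultdict(lambda: defaultdict(int))
    for i, row in a.items():
        for k, a_ik in row.items():
            for j, b_kj in b.get(k, {}).items():
                c[i][j] += a_ik * b_kj
    return {i: dict(cols) for i, cols in c.items()}


if __name__ == "__main__":
    tweets = {
        "t1": "Omg earthquake",
        "t2": "We're gonna have an earthquake",
        "t3": "Omg it's a earthquake",
    }
    a = from_documents(tweets)
    cooccur = matmul(transpose(a), a)  # word x word: number of tweets containing both
    print(cooccur["Omg"]["earthquake"])  # 2
```

The diagonal of the product gives each word's document frequency, so `cooccur["earthquake"]["earthquake"]` is 3.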

14 Challenge of Data Veracity
Does the data need protection?
- How do I balance privacy with availability?
- What level of security is required?
- How do I make data available only to vetted analysts?
- How is data kept secure and private while minimizing impact on analysis?

15 Challenge of Data Veracity
How confident am I in the integrity of my data?
- Where did it come from?
- Who has accessed it?
- Has anyone modified the data stream?
- Has anyone tampered with the data stream?
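One common way to detect modification of a data stream, offered here as an illustrative technique rather than the deck's method, is a hash chain: each record's SHA-256 digest also covers the previous digest, so changing any record breaks every digest after it. The function names and sample records below are hypothetical.

```python
import hashlib
from typing import List

GENESIS = "0" * 64  # fixed starting digest for an empty chain


def chain_digests(records: List[bytes]) -> List[str]:
    """Each digest covers the current record and the previous digest."""
    digests, prev = [], GENESIS
    for record in records:
        prev = hashlib.sha256(prev.encode() + record).hexdigest()
        digests.append(prev)
    return digests


def first_tampered(records: List[bytes], digests: List[str]) -> int:
    """Index of the first record that no longer matches its saved digest, or -1 if intact."""
    recomputed = chain_digests(records)
    for i, (expected, actual) in enumerate(zip(digests, recomputed)):
        if expected != actual:
            return i
    if len(records) != len(digests):
        return min(len(records), len(digests))  # records appended or dropped at the end
    return -1


if __name__ == "__main__":
    stream = [b"sensor=7 temp=21.5", b"sensor=7 temp=21.7", b"sensor=7 temp=21.6"]
    saved = chain_digests(stream)
    stream[1] = b"sensor=7 temp=99.9"  # simulated tampering
    print(first_tampered(stream, saved))  # 1
```

An attacker who can rewrite both the records and the stored digests defeats a plain hash chain, so in practice the digests would be signed or computed with a keyed MAC and kept in a separate, trusted location.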

16 Outline
- Challenge Overview
- General Strategies
- Database Fundamentals and Technologies
- Up and Coming Technologies

17 General Strategy: System Design
[Figure: sources (providers), such as building security, environment, and usage systems; commuter, work, and transport vehicles; student smartphones; classroom tablets; and fitness wearables, feed ingest and enrichment stages that load files and databases. Analytics, a scheduler, and computing resources serve a user interface for users (deciders): kids, adults, and the elderly. Pipeline stages are marked with letters, and the things/humans gap timeline (10 years ago to in 5 years) reappears.]

19 General Strategies: Collection
- Collect, store, and process only useful data
- Intelligently reduce the amount of data through sampling techniques
[Figure: degree distribution (count vs. degree on log scales, degree up to d_max) separating signal from noise in an N-D space; example background model: power-law graph]
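A standard way to reduce a stream of unknown length to a fixed-size uniform sample is reservoir sampling (Algorithm R). It is shown here as one common sampling technique, not as the deck's method; the function name and parameters are illustrative.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible slots
            if j < k:
                reservoir[j] = item  # keep item i with probability k / (i + 1)
    return reservoir


if __name__ == "__main__":
    sample = reservoir_sample(range(1_000_000), 10, random.Random(42))
    print(sorted(sample))
```

A uniform sample is only a baseline. The "intelligent" reduction on the slide would use a background model, such as the power-law degree distribution, to decide which observations look like signal and deserve to be kept.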

23 General Strategy: Privacy-Preserving Technology
[Figure: the system-design pipeline (web and raw data from things (providers), through ingest and enrichment, databases, and analytics with a scheduler and computing resources, to humans (deciders): kids, adults, and the elderly) annotated with threats: data integrity attack, data loss / exfiltration, and insider threat]

24 General Strategy: Privacy-Preserving Technology
Use cryptographic protocols to protect the confidentiality, integrity, and/or availability of data.
- Lots of ongoing research
- Popular techniques: fully homomorphic encryption, multiparty computation, computing on masked data (CMD)
- CMD provides cryptographic protections for the NoSQL Accumulo database, using order-preserving, deterministic, and semantically secure encryption, with a 2-4x performance overhead
[Figure: a plaintext query runs a plaintext analytic to produce a plaintext result; with CMD, the query is encrypted, a masked query runs a masked analytic in the big data cloud, and the masked result is decrypted]
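The reason deterministic encryption supports queries on masked data is that equal plaintexts always map to equal ciphertexts, so a server can match values without seeing them. The sketch below uses HMAC-SHA256 as a stand-in for deterministic encryption. This is an assumption for illustration: unlike a real deterministic cipher, an HMAC cannot be decrypted, and this is not how CMD is implemented. The key, row IDs, and values are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List

# Hypothetical client-side secret; in this sketch it never leaves the client.
KEY = b"hypothetical-client-secret"


def mask(value: str, key: bytes = KEY) -> str:
    """Deterministic keyed mask: equal inputs map to equal tokens under the same key."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()


def mask_rows(rows: Dict[str, str], key: bytes = KEY) -> Dict[str, str]:
    """Client side: replace each plaintext value with its masked token before upload."""
    return {row_id: mask(value, key) for row_id, value in rows.items()}


def build_index(masked_rows: Dict[str, str]) -> Dict[str, List[str]]:
    """Server side: index row IDs by masked token; only tokens are visible here."""
    index: Dict[str, List[str]] = {}
    for row_id, token in masked_rows.items():
        index.setdefault(token, []).append(row_id)
    return index


if __name__ == "__main__":
    index = build_index(mask_rows({"r1": "boston", "r2": "chicago", "r3": "boston"}))
    # The client masks its query term; the server matches tokens without seeing "boston".
    print(index.get(mask("boston"), []))  # ['r1', 'r3']
```

Determinism is also the weakness: the server learns which rows share a value and how often each value occurs. That leakage is why schemes like CMD reserve deterministic encryption for fields that need equality matching and use semantically secure encryption elsewhere.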

25 Outline
- Challenge Overview
- General Strategies
- Database Fundamentals and Technologies
- Up and Coming Technologies

26 Database Fundamentals
- Database: a collection of data and supporting data structures
- Database management system (DBMS): software that provides the interface between users and the database; it is used to define new data and schemas, retrieve (query) data, and administer the database (set security and permissions)
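The define and retrieve roles of a DBMS map onto ordinary SQL statements. This minimal sketch uses Python's built-in `sqlite3`, an embedded DBMS chosen only because it is self-contained; the table and rows are hypothetical. SQLite has no user accounts, so the administration and permissions role is not shown; server DBMSs such as PostgreSQL handle it with `GRANT` and `REVOKE`.

```python
import sqlite3


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    # Define: create a schema for new data.
    conn.execute(
        "CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL, friends INTEGER)"
    )
    # Insert data that conforms to the schema (hypothetical rows).
    conn.executemany(
        "INSERT INTO users (username, friends) VALUES (?, ?)",
        [("alice", 10), ("bob", 250), ("carol", 40)],
    )
    # Retrieve (query): the DBMS plans and executes a declarative request.
    rows = conn.execute(
        "SELECT username FROM users WHERE friends > ? ORDER BY username", (20,)
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [('bob',), ('carol',)]
```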

27 Database Fundamentals: ACID vs. BASE
ACID
- Atomicity: each transaction either fully succeeds or fails
[Figure: a successful transaction vs. a failed transaction]
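Atomicity can be seen with a classic funds transfer: both updates must commit together or not at all. This minimal sketch uses Python's `sqlite3`, where using the connection as a context manager commits on success and rolls back if an exception escapes. The `accounts` table and the transfer rule are hypothetical.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create a hypothetical two-account ledger."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    with conn:
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move funds atomically: both updates commit together or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")  # undoes the debit above
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


if __name__ == "__main__":
    conn = setup()
    transfer(conn, "a", "b", 30)
    try:
        transfer(conn, "a", "b", 500)  # fails partway through
    except ValueError:
        pass
    print(dict(conn.execute("SELECT name, balance FROM accounts")))  # {'a': 70, 'b': 30}
```

The failed transfer had already debited account `a` before raising, yet the rollback leaves no trace of it: the failed transaction is all-or-nothing.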

53 Database Fundamentals: ACID vs. BASE
ACID
- Atomicity: each transaction either fully succeeds or fails
- Consistency: each transaction moves the database from one valid state to another, so all nodes see the same valid data at all times
- Isolation: concurrent transactions produce the same system state as running them one after another (serially)
- Durability: committed transactions remain committed, even after a failure
BASE
- Basically Available, Soft-state services with Eventual consistency
[Figure: ACID vs. BASE, with BigTable as an example system]
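Eventual consistency can be illustrated with replicas that accept writes locally and later exchange state (anti-entropy). In this hypothetical sketch, conflicts are resolved by last writer wins, using a (counter, replica name) version with ties broken by replica name, an arbitrary but deterministic rule. Between merges, a replica may return stale data (soft state); after merging, all replicas converge.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Each stored value carries a (counter, replica_name) version; the larger version wins.
Version = Tuple[int, str]


@dataclass
class Replica:
    name: str
    store: Dict[str, Tuple[Version, str]] = field(default_factory=dict)
    clock: int = 0

    def write(self, key: str, value: str) -> None:
        """Accept a write locally without coordinating with other replicas."""
        self.clock += 1
        self.store[key] = ((self.clock, self.name), value)

    def read(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return entry[1] if entry else None

    def merge_from(self, other: "Replica") -> None:
        """Anti-entropy: keep the newer version of every key (last writer wins)."""
        for key, (version, value) in other.store.items():
            if key not in self.store or version > self.store[key][0]:
                self.store[key] = (version, value)
        self.clock = max(self.clock, other.clock)


if __name__ == "__main__":
    r1, r2 = Replica("r1"), Replica("r2")
    r1.write("status", "ok")
    print(r2.read("status"))  # None: r2 has not heard about the write yet
    r2.write("status", "degraded")
    r1.merge_from(r2)
    r2.merge_from(r1)
    print(r1.read("status"), r2.read("status"))  # degraded degraded
```

Last writer wins is one common conflict rule (Cassandra, listed in the quick reference, resolves conflicting cell writes by timestamp). It is simple but silently discards the losing concurrent write.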

56 Database Fundamentals: CAP Theorem
It is impossible for a distributed system to simultaneously provide all three of:
- Consistency
- Availability
- Partition tolerance
In practice, when a network partition occurs, the system must give up either consistency or availability.
[Figure: the CAP triangle, with BigTable placed on it]
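The trade-off appears as soon as a partition is simulated. In this hypothetical two-node sketch, a CP node refuses requests it cannot serve consistently, while an AP node keeps answering and lets the replicas diverge. With only two nodes, neither side of a split holds a majority, so the CP node refuses both reads and writes. Real CP systems typically keep serving on the majority side of a larger cluster.

```python
from typing import Dict, Optional, Tuple


class PartitionError(Exception):
    """Raised when a CP node refuses a request it cannot serve consistently."""


class Node:
    """One replica in a hypothetical two-node cluster running in CP or AP mode."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data: Dict[str, str] = {}
        self.peer: Optional["Node"] = None
        self.partitioned = False

    def write(self, key: str, value: str) -> None:
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than diverge.
            raise PartitionError("peer unreachable; write rejected")
        self.data[key] = value
        if not self.partitioned and self.peer is not None:
            self.peer.data[key] = value  # synchronous replication while connected
        # AP mode during a partition: the write stays local and replicas diverge.

    def read(self, key: str) -> Optional[str]:
        if self.partitioned and self.mode == "CP":
            raise PartitionError("cannot confirm latest value; read rejected")
        return self.data.get(key)  # AP mode may return stale data


def make_pair(mode: str) -> Tuple[Node, Node]:
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b


def partition(a: Node, b: Node) -> None:
    a.partitioned = b.partitioned = True


if __name__ == "__main__":
    a, b = make_pair("AP")
    a.write("x", "1")
    partition(a, b)
    a.write("x", "2")
    print(a.read("x"), b.read("x"))  # 2 1: available, but inconsistent
```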

57 Database Fundamentals
[Figure: databases (SQL, NoSQL, NewSQL; e.g., Dremel, BigTable) alongside cluster-based parallel processing frameworks (MapReduce, Hadoop, Pregel, Giraph, D4M)]
Slide source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.
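The MapReduce model behind Hadoop splits a job into a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that combines each group. The single-process word count below shows the data flow only. The function names are illustrative, not Hadoop's API; in Hadoop, map and reduce calls run in parallel across many nodes and the shuffle moves data over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input record."""
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key."""
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    counts = word_count(
        ["Omg earthquake", "We're gonna have an earthquake", "Omg it's a earthquake"]
    )
    print(counts["earthquake"], counts["omg"])  # 3 2
```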

58 Database Fundamentals
[Figure: consistency vs. performance, with relational DB systems favoring consistency, NoSQL DB systems favoring performance, and NewSQL DB systems in between]

60 Relational Databases
What it is:
- A database that stores information about data and how it is related
- Highly structured, normalized, table-based
- Predefined schema (organization of data)
- Scales vertically, on high-quality hardware
- Uses SQL as the query interface
- Typically provides full consistency

61 Relational Databases
When to use it:
- Dealing with transactional data
- Problem sizes are moderate
- Need for ACID guarantees
How to use it:
- JDBC (Java Database Connectivity)
- SQL command line

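62 Relational Databases: Example Tables
Tweet Table (columns: Tweet ID, User ID, Location ID, Tweet Text)
- Location wwz4p7jd: "Omg earthquake"
- Location wwh1hss5: "We're gonna have an earthquake"
- Location wwwygbvq: "Omg it's a earthquake"
User Table (columns: User ID, Username, Friends Count)
- Usernames: _zariaaa_, gnvrly_ron, yolvndv; friends count values include 424
Location Table (columns: Location ID, Latitude, Longitude)
- Location IDs: wwh1hss5, wwwygbvq, wwz4p7jd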
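The three tables above can be recreated in SQL and combined with a join. In this sketch (using Python's `sqlite3`), the tweet texts and tweet-to-location pairings follow the slide. The numeric tweet and user IDs and the tweet-to-user assignments are hypothetical. Friend counts and coordinates are left `NULL` because their pairing is not recoverable from the slide.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, friends_count INTEGER);
CREATE TABLE locations (location_id TEXT PRIMARY KEY, latitude REAL, longitude REAL);
CREATE TABLE tweets (
    tweet_id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(user_id),
    location_id TEXT REFERENCES locations(location_id),
    tweet_text TEXT
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Hypothetical user IDs; friend counts unknown, so left NULL.
    conn.executemany(
        "INSERT INTO users (user_id, username) VALUES (?, ?)",
        [(1, "_zariaaa_"), (2, "gnvrly_ron"), (3, "yolvndv")],
    )
    # Coordinates unknown, so left NULL.
    conn.executemany(
        "INSERT INTO locations (location_id) VALUES (?)",
        [("wwz4p7jd",), ("wwh1hss5",), ("wwwygbvq",)],
    )
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?, ?)",
        [
            (1, 1, "wwz4p7jd", "Omg earthquake"),
            (2, 2, "wwh1hss5", "We're gonna have an earthquake"),
            (3, 3, "wwwygbvq", "Omg it's a earthquake"),
        ],
    )
    return conn


def earthquake_tweets(conn: sqlite3.Connection) -> list:
    """Join tweets to their authors through the user_id foreign key."""
    return conn.execute(
        """
        SELECT u.username, t.location_id, t.tweet_text
        FROM tweets AS t
        JOIN users AS u ON u.user_id = t.user_id
        WHERE t.tweet_text LIKE '%earthquake%'
        ORDER BY t.tweet_id
        """
    ).fetchall()


if __name__ == "__main__":
    for row in earthquake_tweets(make_db()):
        print(row)
```

Normalization stores each username once; the join reassembles the full picture at query time, which is exactly the step that becomes expensive when the tables are spread across many machines.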

63 NoSQL Databases
What it is:
- A database based on documents, key-value pairs, graphs, or wide-column stores
- Dynamic schema
- Horizontal scalability
- Typically provides eventual consistency

64 NoSQL Databases
When to use it:
- Large unstructured datasets
- Strong need for high performance
- Only require BASE guarantees
How to use it:
- Python/Java bindings
- Lincoln Laboratory D4M
- Command line

65 NoSQL Databases: Example Tables
The same tweets stored as NoSQL tables:
- Edge table: one row per tweet; each column pairs a field with a value (e.g., FriendCount 424, 541, 693; Location wwh1hss5, wwwygbvq, wwz4p7jd; UserName _zariaaa_, gnvrly_ron, yolvndv; Word Omg, a, an, earthquake)
- Transpose table: the edge table with rows and columns swapped, so all tweets sharing a value (e.g., Word earthquake) can be found with a single row lookup
- Degree table: the number of entries for each column (e.g., Word an: 1)
- Text table: the raw tweet text ("Omg earthquake", "We're gonna have an earthquake", "Omg it's a earthquake")
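The edge, transpose, and degree tables can be derived mechanically from flat records. A minimal sketch using Python dictionaries in place of database tables. Joining field and value with `|` is a convention assumed here, the tweet IDs are hypothetical, and the texts and location IDs follow slide 62. The degree counts each tweet at most once per column, which matches the slide's `Word an: 1`.

```python
from collections import defaultdict
from typing import Dict, Tuple

Table = Dict[str, Dict[str, int]]

# Hypothetical tweet IDs; texts and location IDs follow slide 62.
TWEETS = {
    "t1": {"Location": "wwz4p7jd", "text": "Omg earthquake"},
    "t2": {"Location": "wwh1hss5", "text": "We're gonna have an earthquake"},
    "t3": {"Location": "wwwygbvq", "text": "Omg it's a earthquake"},
}


def explode(tweets: Dict[str, Dict[str, str]]) -> Tuple[Table, Table, Dict[str, int]]:
    """Build edge, transpose, and degree tables from flat tweet records."""
    edge: Table = defaultdict(dict)
    transpose: Table = defaultdict(dict)
    degree: Dict[str, int] = defaultdict(int)
    for tweet_id, fields in tweets.items():
        # Column keys join field and value with "|" (a convention assumed here).
        cols = {f"Word|{w}" for w in fields["text"].split()}
        cols |= {f"{name}|{value}" for name, value in fields.items() if name != "text"}
        for col in cols:
            edge[tweet_id][col] = 1
            transpose[col][tweet_id] = 1
            degree[col] += 1
    return dict(edge), dict(transpose), dict(degree)


if __name__ == "__main__":
    edge, transpose, degree = explode(TWEETS)
    print(sorted(transpose["Word|earthquake"]))  # one row lookup: ['t1', 't2', 't3']
    print(degree["Word|an"])  # 1
```

The transpose table turns "which tweets contain X?" into a single row lookup instead of a full scan, and the degree table lets a query check how common a column is before deciding how to scan.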

66 NoSQL Example: Accumulo

67 Accumulo Design Drivers
1. Cell-level security
- Express common security requirements in the infrastructure, not just in the application
- A data-centric approach encourages secure sharing
2. Scalability
- Near-linear performance improvements at thousands of nodes
- Durable and reliable under the increased failures that come with scale
3. Diverse, interactive analytics
- A sorted key/value core performs well in a diverse set of domains
- Information retrieval, statistics, graph analysis, geo indexing, and more
4. Flexible, adaptive schema
- Start with universal structures and indexing
- Refine the schema over time
Source: Sqrrl Data Inc.
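The sorted key/value core is what makes diverse analytics cheap: because keys are kept in order, every entry whose row starts with a given prefix is contiguous and can be read with one range scan. This is a simplified, single-machine sketch. Real Accumulo keys also carry a visibility label and a timestamp, and the class and method names here are hypothetical. The date-prefixed rows are illustrative data.

```python
import bisect
from typing import List, Tuple

Key = Tuple[str, str, str]  # (row, column family, column qualifier): simplified key


class SortedKV:
    """In-memory store that keeps keys sorted so prefix scans are contiguous."""

    def __init__(self) -> None:
        self._keys: List[Key] = []
        self._values: List[str] = []

    def put(self, key: Key, value: str) -> None:
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite existing entry
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def scan_row_prefix(self, prefix: str) -> List[Tuple[Key, str]]:
        """Return all entries whose row starts with prefix, in sorted order."""
        i = bisect.bisect_left(self._keys, (prefix, "", ""))  # first key >= prefix
        out = []
        while i < len(self._keys) and self._keys[i][0].startswith(prefix):
            out.append((self._keys[i], self._values[i]))
            i += 1
        return out


if __name__ == "__main__":
    kv = SortedKV()
    kv.put(("20140102", "tweet", "t2"), "We're gonna have an earthquake")
    kv.put(("20140101", "tweet", "t1"), "Omg earthquake")
    kv.put(("20140201", "tweet", "t3"), "Omg it's a earthquake")
    for key, value in kv.scan_row_prefix("201401"):  # all of January 2014
        print(key, value)
```

Choosing row keys that put related data next to each other (dates, geographic cells, or the field-value columns of a transpose table) is how a single sorted structure serves retrieval, statistics, graph, and geo workloads.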

68 Accumulo Features
- Visibility labels
- Iterators
- Automatic table splitting
- Support for Apache Thrift proxy
[Figure: visibility labels, iterators, table splitting, Thrift, schema, and D4M mapped to volume, velocity, variety, and veracity]
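Visibility labels attach a boolean expression such as `admin&(audit|finance)` to each cell, and a scan returns the cell only if the reader's authorizations satisfy it. The evaluator below is a simplified, hypothetical sketch of that check. It supports `&`, `|`, and parentheses but does not handle quoted terms or enforce Accumulo's rule that mixed operators must be parenthesized.

```python
import re
from typing import List, Set


def _tokenize(expr: str) -> List[str]:
    """Split an expression into label terms and the operators & | ( )."""
    return re.findall(r"[A-Za-z0-9_.:/-]+|[&|()]", expr)


def visible(expr: str, auths: Set[str]) -> bool:
    """Return True if the authorization set satisfies the visibility expression."""
    if not expr:
        return True  # an empty label is visible to everyone
    tokens = _tokenize(expr)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, so the position advances
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_or()
            pos += 1  # skip the closing ")"
            return value
        return tok in auths  # a bare label is satisfied if the reader holds it

    return parse_or()


if __name__ == "__main__":
    print(visible("admin&(audit|finance)", {"admin", "audit"}))  # True
    print(visible("admin&(audit|finance)", {"audit", "finance"}))  # False
```

Because the check runs per cell inside the database, two analysts can run the same query and see different subsets of the same table, which is the infrastructure-level security described on the previous slide.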

69 NewSQL Databases
What it is:
- Database systems that aim to match the performance of NoSQL while keeping the ACID guarantees of relational databases
- Usually a scaled-up version of a relational database
- Often uses an array data model; other data models include graph-based data structures and distributed relational tables
- May make use of in-memory processing or specialized hardware

70 NewSQL Databases
When to use it:
- Large multidimensional datasets
- Data that doesn't fit in traditional databases
- You have the volume for NoSQL but need ACID guarantees
How to use it:
- Each has a custom API; e.g., SciDB offers JDBC, SHIM, D4M, and R-SciDB bindings

71 NewSQL Example: SciDB

72 SciDB Design Drivers
- Language bindings: R, Python, Matlab, Julia, ...
- Massively parallel processing (MPP) database
- Array data model
- Complex analytics
- Runs on commodity clusters or in the cloud

73 SciDB Example Schema
- Highly customizable to the application
- Each cell is a strongly typed structure of attributes: <int>, or <double, string, float>, or ...
- Supports nullable attributes, empty cells, and sparse or dense arrays
[Figure: a stock × time array whose cells hold price and volume attributes, e.g., price 15.76 / volume 200; price 234.2 / volume 10; price 17.50 / volume null; price 17.40 / volume 100; price 0.02 / volume null. Stock labels include MSFT, MSFVX, and MT.]
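The cell model above distinguishes two kinds of "missing": an empty cell (no entry at those coordinates) and a null attribute inside an existing cell. A minimal Python sketch of that distinction, not SciDB syntax. The price and volume values come from the slide's figure, but their placement at specific (stock, time) coordinates is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Trade:
    """Cell schema <price: float, volume: int (nullable)>."""
    price: float
    volume: Optional[int]  # None models a null attribute


# Sparse 2-D array indexed by (stock, time); absent coordinates are empty cells.
Array2D = Dict[Tuple[str, int], Trade]

# Values from the slide; the (stock, time) placement is hypothetical.
ARR: Array2D = {
    ("MSFT", 0): Trade(15.76, 200),
    ("MSFT", 1): Trade(17.50, None),
    ("MSFVX", 0): Trade(234.2, 10),
    ("MSFVX", 2): Trade(17.40, 100),
    ("MT", 1): Trade(0.02, None),
}


def cell(arr: Array2D, stock: str, t: int) -> Optional[Trade]:
    """Return the cell, or None if the cell is empty (distinct from a null attribute)."""
    return arr.get((stock, t))


def total_volume(arr: Array2D) -> int:
    """Sum volumes across non-empty cells, skipping null attributes."""
    return sum(c.volume for c in arr.values() if c.volume is not None)


if __name__ == "__main__":
    print(cell(ARR, "MT", 0))  # None: empty cell
    print(cell(ARR, "MSFT", 1))  # cell exists, but its volume is null
    print(total_volume(ARR))  # 310
```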

74 SciDB Features
- Massively parallel database
- Array data model
- Analytic language support
- In-database analytics
[Figure: MPP database, arrays, languages, and analytics mapped to volume, velocity, variety, and veracity]

75 Quick Reference: RDBMS vs. NoSQL vs. NewSQL
Relational databases
- Examples: MySQL, PostgreSQL, Oracle
- Schema: typed columns with relational keys
- Architecture: single-node or sharded
- Guarantees: ACID transactions
- Access: SQL, indexing, joins, and query planning
NoSQL
- Examples: HBase, Cassandra, Accumulo
- Schema: schema-less
- Architecture: distributed, scalable
- Guarantees: eventually consistent
- Access: low-level API (scans and filtering)
NewSQL
- Examples: SciDB, VoltDB, MemSQL
- Schema: strongly typed structure of attributes
- Architecture: distributed, scalable
- Guarantees: ACID transactions (most)
- Access: custom API, JDBC, bindings to popular languages
Slide source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.

76 Outline
- Challenge Overview
- General Strategies
- Database Fundamentals and Technologies
- Up and Coming Technologies

77 On the Horizon: New Technologies and Techniques
New database and processing technology, such as:
- Apache Spark: in-memory distributed processing
- TileDB: database for scientific big data
- S-Store: database tuned for streaming data
New cross-database and storage engine standards, APIs, and practices:
- BigDAWG: an API to simplify big data analytics, currently being designed
- GraphBLAS: an effort to standardize graph algorithms and databases
Advances in privacy-preserving technology:
- SPED: signal processing in the encrypted domain
- Greater efficiency of protocols such as functional encryption and multiparty computation
Tools and technologies will continue to evolve, so it is important to keep students abreast of new developments.
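GraphBLAS expresses graph algorithms as sparse linear algebra over semirings. For example, one step of breadth-first search is a vector-matrix multiply over the Boolean (OR, AND) semiring, masked to exclude already-visited vertices. The pure-Python sketch below mirrors that formulation with sets. It is an illustration, not the GraphBLAS API, where the step would be a masked `GrB_vxm` call. The sample graph is hypothetical.

```python
from typing import Dict, Set

# Sparse adjacency matrix: row vertex -> set of column vertices (edge row -> col).
Graph = Dict[int, Set[int]]


def vxm_or_and(frontier: Set[int], adj: Graph) -> Set[int]:
    """Vector-matrix product over the Boolean (OR, AND) semiring: one-step neighbors."""
    out: Set[int] = set()
    for u in frontier:
        out |= adj.get(u, set())
    return out


def bfs_levels(adj: Graph, source: int) -> Dict[int, int]:
    """BFS level of each reachable vertex, computed by repeated masked products."""
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Mask out visited vertices (GraphBLAS would apply a complemented mask).
        frontier = vxm_or_and(frontier, adj) - set(levels)
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    g: Graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
    print(bfs_levels(g, 0))
```

Casting the algorithm as matrix operations is what lets a standard like GraphBLAS hand the heavy lifting to optimized, parallel sparse-matrix libraries.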

78 Conclusions
- Lots of stuff going on!
- It is very important to understand the details of your dataset, end analytic, and other requirements
- Topics covered: challenge overview (what is the problem?), some general strategies, databases, and upcoming technologies

79 Leading Science and Engineering Research University
- 80 Nobel laureates, 50 National Medal of Science recipients
- Thousands of companies (collectively, the world's 11th largest economy)
- 1000 faculty, employees, students
- $1.4B in annual external research funding: Lincoln Laboratory $800M, other MIT $600M

80 About MIT


Oracle s Big Data solutions. Roger Wullschleger. <Insert Picture Here> s Big Data solutions Roger Wullschleger DBTA Workshop on Big Data, Cloud Data Management and NoSQL 10. October 2012, Stade de Suisse, Berne 1 The following is intended to outline

More information

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data

Introduction to Hadoop HDFS and Ecosystems. Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Introduction to Hadoop HDFS and Ecosystems ANSHUL MITTAL Slides credits: Cloudera Academic Partners Program & Prof. De Liu, MSBA 6330 Harvesting Big Data Topics The goal of this presentation is to give

More information

bigdata Managing Scale in Ontological Systems

bigdata Managing Scale in Ontological Systems Managing Scale in Ontological Systems 1 This presentation offers a brief look scale in ontological (semantic) systems, tradeoffs in expressivity and data scale, and both information and systems architectural

More information

MapReduce with Apache Hadoop Analysing Big Data

MapReduce with Apache Hadoop Analysing Big Data MapReduce with Apache Hadoop Analysing Big Data April 2010 Gavin Heavyside gavin.heavyside@journeydynamics.com About Journey Dynamics Founded in 2006 to develop software technology to address the issues

More information

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010

extensible record stores document stores key-value stores Rick Cattel s clustering from Scalable SQL and NoSQL Data Stores SIGMOD Record, 2010 System/ Scale to Primary Secondary Joins/ Integrity Language/ Data Year Paper 1000s Index Indexes Transactions Analytics Constraints Views Algebra model my label 1971 RDBMS O tables sql-like 2003 memcached

More information

Xiaoming Gao Hui Li Thilina Gunarathne

Xiaoming Gao Hui Li Thilina Gunarathne Xiaoming Gao Hui Li Thilina Gunarathne Outline HBase and Bigtable Storage HBase Use Cases HBase vs RDBMS Hands-on: Load CSV file to Hbase table with MapReduce Motivation Lots of Semi structured data Horizontal

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl

Lecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the

More information

Cloud Scale Distributed Data Storage. Jürmo Mehine

Cloud Scale Distributed Data Storage. Jürmo Mehine Cloud Scale Distributed Data Storage Jürmo Mehine 2014 Outline Background Relational model Database scaling Keys, values and aggregates The NoSQL landscape Non-relational data models Key-value Document-oriented

More information

CISC 432/CMPE 432/CISC 832 Advanced Database Systems

CISC 432/CMPE 432/CISC 832 Advanced Database Systems CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 martin@cs.queensu.ca Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:

More information

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1

Why NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1 Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots

More information

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study

Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Mark Rittman, CTO, Rittman Mead OTN EMEA Tour, May 2016 info@rittmanmead.com www.rittmanmead.com @rittmanmead About the Speaker Mark

More information

Open source large scale distributed data management with Google s MapReduce and Bigtable

Open source large scale distributed data management with Google s MapReduce and Bigtable Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory

More information

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach

www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging

More information

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015

NoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,

More information

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase

Architectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform

More information

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA

A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where

More information

Hadoop IST 734 SS CHUNG

Hadoop IST 734 SS CHUNG Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to

More information

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released

More information

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world

Analytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3

More information

Big Data Technologies Compared June 2014

Big Data Technologies Compared June 2014 Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development

More information

Completing the Big Data Ecosystem:

Completing the Big Data Ecosystem: Completing the Big Data Ecosystem: in sqrrl data INC. August 3, 2012 Design Drivers in Analysis of big data is central to our customers requirements, in which the strongest drivers are: Scalability: The

More information

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013

Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation

More information

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related

Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing

More information

From Spark to Ignition:

From Spark to Ignition: From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for

More information

How graph databases started the multi-model revolution

How graph databases started the multi-model revolution How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the

More information

Structured Data Storage

Structured Data Storage Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct

More information

Native Connectivity to Big Data Sources in MSTR 10

Native Connectivity to Big Data Sources in MSTR 10 Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single

More information

NoSQL for SQL Professionals William McKnight

NoSQL for SQL Professionals William McKnight NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island

Big Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island Big Data Principles and best practices of scalable real-time data systems NATHAN MARZ JAMES WARREN II MANNING Shelter Island contents preface xiii acknowledgments xv about this book xviii ~1 Anew paradigm

More information

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop

Keywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning

More information

Similarity Search in a Very Large Scale Using Hadoop and HBase

Similarity Search in a Very Large Scale Using Hadoop and HBase Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France

More information

A Distributed Storage Schema for Cloud Computing based Raster GIS Systems. Presented by Cao Kang, Ph.D. Geography Department, Clark University

A Distributed Storage Schema for Cloud Computing based Raster GIS Systems. Presented by Cao Kang, Ph.D. Geography Department, Clark University A Distributed Storage Schema for Cloud Computing based Raster GIS Systems Presented by Cao Kang, Ph.D. Geography Department, Clark University Cloud Computing and Distributed Database Management System

More information

The 3 questions to ask yourself about BIG DATA

The 3 questions to ask yourself about BIG DATA The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.

More information

Domain driven design, NoSQL and multi-model databases

Domain driven design, NoSQL and multi-model databases Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra

More information

Big Data Management and Security

Big Data Management and Security Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value

More information

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce

Analytics in the Cloud. Peter Sirota, GM Elastic MapReduce Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of

More information

Cassandra A Decentralized, Structured Storage System

Cassandra A Decentralized, Structured Storage System Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922

More information

Hadoop Ecosystem B Y R A H I M A.

Hadoop Ecosystem B Y R A H I M A. Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open

More information

Applications for Big Data Analytics

Applications for Big Data Analytics Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect

More information

Introduction to Big Data! with Apache Spark" UC#BERKELEY#

Introduction to Big Data! with Apache Spark UC#BERKELEY# Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!

More information

NextGen Infrastructure for Big DATA Analytics.

NextGen Infrastructure for Big DATA Analytics. NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures

More information

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology

International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)

More information

Oracle Big Data Strategy Simplified Infrastrcuture

Oracle Big Data Strategy Simplified Infrastrcuture Big Data Oracle Big Data Strategy Simplified Infrastrcuture Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly

More information

A Brief Outline on Bigdata Hadoop

A Brief Outline on Bigdata Hadoop A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is

More information

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com

Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...

More information

HPC ABDS: The Case for an Integrating Apache Big Data Stack

HPC ABDS: The Case for an Integrating Apache Big Data Stack HPC ABDS: The Case for an Integrating Apache Big Data Stack with HPC 1st JTC 1 SGBD Meeting SDSC San Diego March 19 2014 Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox gcf@indiana.edu http://www.infomall.org

More information

Navigating the Big Data infrastructure layer Helena Schwenk

Navigating the Big Data infrastructure layer Helena Schwenk mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining

More information

Big Data With Hadoop

Big Data With Hadoop With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials

More information

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu

Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects

More information

ROME, 17-10-2013 BIG DATA ANALYTICS

ROME, 17-10-2013 BIG DATA ANALYTICS ROME, 17-10-2013 BIG DATA ANALYTICS BIG DATA FOUNDATIONS Big Data is #1 on the 2012 and the 2013 list of most ambiguous terms - Global language monitor 2 BIG DATA FOUNDATIONS Big Data refers to data sets

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale

Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept

More information

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН

Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database

More information

Introduction to Apache Cassandra

Introduction to Apache Cassandra Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating

More information

Cloud Computing and Advanced Relationship Analytics

Cloud Computing and Advanced Relationship Analytics Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 brian.clark@objectivity.com

More information

A programming model in Cloud: MapReduce

A programming model in Cloud: MapReduce A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value

More information

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &

BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation

More information

Large-Scale Data Processing

Large-Scale Data Processing Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase

More information