Hadoop and NoSQL Basics: Big Data Demystified. NYS Innovation Summit, 12/17/2013. Matt LeMay, @mattlemay

Transcription

1 Hadoop and NoSQL Basics: Big Data Demystified. NYS Innovation Summit, 12/17/2013. Matt LeMay

2 When I want people to think I'm smart, I just say HADOOP really loud.

3

4 Hadoop! There it is. Big Data! Data Science! Algorithms!

5

6

7 ... why are we thinking about this at all?

8 ALL the data created until the year 2003 = ALL the data created every two days

9

10 Writes > 12 terabytes of data per day.

11 *the 451 group

12 ... how did we get here?

13 HIERARCHICAL DATABASE MODEL RELATIONAL DATABASE MODEL DOCUMENT DATABASE MODEL

14 HIERARCHICAL DATABASE MODEL
Fruit
    Orange
    Apple
        Granny Smith
        Honeycrisp
        Red Delicious
    Grape
- Used in early mainframe computing
- Stores data in one-to-many trees
- Not very flexible

15 RELATIONAL DATABASE MODEL
Fruit_Variety | Fruit
Granny Smith  | Apple
Honeycrisp    | Apple
Red Delicious | Apple
Navel         | Orange
- Invented in 1970 by Edgar F. Codd at IBM
- Stores data in tuples, which resemble rows of a table
- Still the most widely used database model

16 RELATIONAL DATABASE MODEL
Fruit_ID | Fruit_Name
1        | Orange
2        | Apple
3        | Grape
Variety_ID | Variety_Name  | Fruit_ID
1          | Granny Smith  | 2
2          | Honeycrisp    | 2
3          | Red Delicious | 2
4          | Navel         | 1
... can also store hierarchical data!

17 RELATIONAL DATABASE MODEL
Fruit_ID | Fruit_Name
1        | Orange
2        | Apple
3        | Grape
Variety_ID | Variety_Name  | Fruit_ID
1          | Granny Smith  | 2
2          | Honeycrisp    | 2
3          | Red Delicious | 2
4          | Navel         | 1
Has a rigid structure, or schema.

18 RELATIONAL DATABASE MODEL
Fruit_ID | Fruit_Name
1        | Orange
2        | Apple
3        | Grape
Variety_ID | Variety_Name  | Fruit_ID
1          | Granny Smith  | 2
2          | Honeycrisp    | 2
3          | Red Delicious | 2
4          | Navel         | 1
Uses unique keys for consistency across tables.

19 DOCUMENT DATABASE MODEL
Red Delicious Apple
Honeycrisp Apple
Navel Orange
Granny Smith Apple
- Doesn't have a single structure or schema that each entry must follow
- Popularized by Lotus Notes, first released in 1989
- SO TRENDY

20 DOCUMENT DATABASE MODEL
{
    "Fruits": [
        {
            "Type": "Apple",
            "Variety": "Red Delicious"
        },
        {
            "Name": "Granny Smith Apple"
        },
        "Navel Orange"
    ]
}
CAN have structured elements, but structure doesn't need to be consistent across entries.
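A minimal Python sketch of what "structure doesn't need to be consistent across entries" means for the code that reads such a document. The `describe` helper is hypothetical and exists only for illustration; the JSON is the slide's example with its quotation marks restored.

```python
import json

# The slide's document: three entries, three different shapes.
raw = """
{
  "Fruits": [
    {"Type": "Apple", "Variety": "Red Delicious"},
    {"Name": "Granny Smith Apple"},
    "Navel Orange"
  ]
}
"""
doc = json.loads(raw)


def describe(entry):
    """Hypothetical helper: turn any of the three entry shapes into one display name."""
    if isinstance(entry, str):      # bare string entry
        return entry
    if "Name" in entry:             # object with a single Name field
        return entry["Name"]
    return f'{entry["Variety"]} {entry["Type"]}'  # object with Type and Variety


names = [describe(entry) for entry in doc["Fruits"]]
print(names)
```

The database accepts all three shapes without complaint, so the job of coping with the differences moves into application code like `describe`.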

21 RIGID
HIERARCHICAL DATABASE MODEL
RELATIONAL DATABASE MODEL
DOCUMENT DATABASE MODEL
FLEXIBLE

22 RIGID
HIERARCHICAL DATABASE MODEL
RELATIONAL DATABASE MODEL
DOCUMENT DATABASE MODEL
FLEXIBLE

23 Relational Database is to Document Database as Excel Spreadsheet is to Word Document.

24 Relational Database is to Document Database as Excel Spreadsheet is to Word Document ... as SQL is to NoSQL.

25 Relational Database is to Document Database as Excel Spreadsheet is to Word Document ... as SQL is to NoSQL* (*... mostly / sorta. Stay tuned!)

26 SQL, or Structured Query Language, is a language for getting data into and out of a relational database.
SELECT Variety_Name FROM fruits WHERE fruit_id = 2
Variety_Name
------------
Granny Smith
Honeycrisp
Red Delicious
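To try the query, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The slides do not name the two tables, so `fruits` and `varieties` are assumed names (the slide's query calls the table it reads from `fruits`), and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

# Assumed table names and column types; the data is taken from the slides.
conn.executescript("""
CREATE TABLE fruits (
    fruit_id   INTEGER PRIMARY KEY,
    fruit_name TEXT NOT NULL
);
CREATE TABLE varieties (
    variety_id   INTEGER PRIMARY KEY,
    variety_name TEXT NOT NULL,
    fruit_id     INTEGER NOT NULL REFERENCES fruits(fruit_id)
);
""")
conn.executemany("INSERT INTO fruits VALUES (?, ?)",
                 [(1, "Orange"), (2, "Apple"), (3, "Grape")])
conn.executemany("INSERT INTO varieties VALUES (?, ?, ?)",
                 [(1, "Granny Smith", 2), (2, "Honeycrisp", 2),
                  (3, "Red Delicious", 2), (4, "Navel", 1)])

# The slide's query, pointed at the assumed varieties table.
rows = conn.execute(
    "SELECT variety_name FROM varieties WHERE fruit_id = 2 ORDER BY variety_id"
).fetchall()
apple_varieties = [row[0] for row in rows]

# The shared key lets a join rebuild the fruit/variety hierarchy.
pairs = conn.execute("""
    SELECT f.fruit_name, v.variety_name
    FROM varieties AS v JOIN fruits AS f ON f.fruit_id = v.fruit_id
    ORDER BY v.variety_id
""").fetchall()

# The rigid schema at work: a variety that points at a missing fruit is refused.
try:
    conn.execute("INSERT INTO varieties VALUES (5, 'Mystery', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(apple_varieties, pairs, rejected)
```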

27 Depending on who you ask, NoSQL means NOT SQL or NOT ONLY SQL.

28 (in fact, some characterize NoSQL as a movement, not a particular technology or set of technologies.)

29 SQL databases are highly standardized. NoSQL databases are highly fragmented.

30 SQL databases are highly standardized. NoSQL databases are highly fragmented. Some are document model databases, some use a variation of a key-value store.
Document Databases

31 So, what are the characteristics of NoSQL databases* that make them so trendy and exciting? * Generally

32 Relational databases have strict schemas dictating the structure of data. NoSQL databases are generally schemaless, even when they use key-value stores.

33 NoSQL databases are generally schemaless, even when they use key-value stores.
- More flexible
- Can start entering data before deciding on how that data will be formatted
- Less structured, less consistent

34 NoSQL databases are generally schemaless, even when they use key-value stores.
- More flexible
- Can start entering data before deciding on how that data will be formatted
- Less structured, less consistent

35 Relational databases can scale up (on one computer) but not easily out (across many computers). NoSQL databases are designed to scale out across many computers.

36 NoSQL databases are designed to scale out across many computers.
- Lots of machines == BIG data
- Can scale quickly if needed
- No single point of failure
- More complicated to set up
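One common way to spread records across machines is to hash each record's key and use the result to pick a machine. The sketch below is only an illustration of that idea, not how any particular NoSQL product works; `shard_for` and `place` are hypothetical names, and the three "machines" are just Python lists.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard from a stable hash of the key (same key, same shard, every run)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def place(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = ["Honeycrisp Apple", "Granny Smith Apple", "Red Delicious Apple", "Navel Orange"]
shards = place(keys, num_shards=3)
for shard_id, stored in shards.items():
    print(shard_id, stored)
```

Part of the "more complicated to set up" cost shows up here: with plain modulo hashing, changing the number of machines moves most keys to a different shard, so real systems usually need a more careful placement scheme.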

37 Relational databases traditionally read and write information directly to a disk drive. Many NoSQL databases store information in memory, and/or include robust built-in caching in memory.

38 NoSQL databases store information in memory, and/or include robust built-in caching in memory.
- Faster
- Memory is more expensive than disk
- Potential reliability issues

39 Relational databases follow the ACID model (atomicity, consistency, isolation, durability). NoSQL databases generally do not follow the ACID model.

40 NoSQL databases generally do not follow the ACID model.
- More freedom to handle requests in a way that honors the uniqueness of things
- Much greater room for (potentially serious) errors
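To make the "A" in ACID concrete, here is a small sketch of atomicity using Python's built-in `sqlite3` module. The `stock` table and the `move` helper are invented for illustration; the point is only that a transaction either applies every step or none of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical inventory table; the CHECK constraint forbids negative stock.
conn.execute(
    "CREATE TABLE stock (fruit TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("Apple", 5), ("Orange", 0)])
conn.commit()


def move(conn, src, dst, n):
    """Move n units from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE stock SET qty = qty + ? WHERE fruit = ?", (n, dst))
            conn.execute("UPDATE stock SET qty = qty - ? WHERE fruit = ?", (n, src))
        return True
    except sqlite3.IntegrityError:
        return False


first = move(conn, "Apple", "Orange", 3)    # fits: Apple 5 -> 2, Orange 0 -> 3
second = move(conn, "Apple", "Orange", 10)  # would make Apple negative, so it is undone

balances = dict(conn.execute("SELECT fruit, qty FROM stock"))
print(first, second, balances)
```

In the second call the first `UPDATE` has already run when the second one fails, and the rollback takes it back. A store without this guarantee could be left with ten extra oranges that came from nowhere, which is the kind of "potentially serious" error the slide warns about.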

41 Relational databases represent data as rows and columns. NoSQL databases often represent data in formats such as JSON, which are native to many programming languages.

42 NoSQL databases often represent data in formats such as JSON, which are native to many programming languages.
- Easier, faster for programmers
- Harder for non-programmers

43 SO WAIT, THOUGH, how the f*** do you find anything in a NoSQL database????

44

45 HADOOP is an open source framework for doing MapReduce. MapReduce is one way to make sense of a document database. (That's how GOOGLE does it.)

46 MapReduce has two core steps: Map and Reduce. ... both are pretty much what they sound like.

47 This is what it actually looks like:
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)
function reduce(String word, Iterator partialcounts):
    // word: a word
    // partialcounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialcounts:
        sum += ParseInt(pc)
    emit (word, sum)

48 MAP: For a given document, emit each word, phrase, or item with a count of 1 every time it appears.
function map(String name, String document):
    // name: document name
    // document: document contents
    for each word w in document:
        emit (w, 1)

49 REDUCE: NOW, take all of those maps from every document, and reduce them to a single list of items and counts.
function reduce(String word, Iterator partialcounts):
    // word: a word
    // partialcounts: a list of aggregated partial counts
    sum = 0
    for each pc in partialcounts:
        sum += ParseInt(pc)
    emit (word, sum)
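The pseudocode translates almost line for line into Python. This is a single-process sketch for reading purposes, not Hadoop code: `map_doc`, `reduce_word`, and `map_reduce` are illustrative names, and the grouping step in the middle stands in for the shuffle that a real framework performs between the two phases.

```python
from collections import defaultdict


def map_doc(name, document):
    """MAP: emit (word, 1) once for every word occurrence in one document."""
    for word in document.split():
        yield (word, 1)


def reduce_word(word, partial_counts):
    """REDUCE: add up all the partial counts collected for one word."""
    return (word, sum(partial_counts))


def map_reduce(documents):
    # Group the emitted pairs by word (the framework normally does this part).
    grouped = defaultdict(list)
    for name, text in documents.items():
        for word, count in map_doc(name, text):
            grouped[word].append(count)
    return dict(reduce_word(word, counts) for word, counts in grouped.items())


# Document names are made up; the contents are the four fruit entries from the talk.
documents = {
    "doc1": "Honeycrisp Apple",
    "doc2": "Granny Smith Apple",
    "doc3": "Red Delicious Apple",
    "doc4": "Navel Orange",
}
counts = map_reduce(documents)
print(counts)
```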

50 Documents: Honeycrisp Apple; Granny Smith Apple; Red Delicious Apple; Navel Orange

51 Documents: Honeycrisp Apple; Granny Smith Apple; Red Delicious Apple; Navel Orange
MAP
(Red, 1) (Delicious, 1) (Apple, 1)
(Honeycrisp, 1) (Apple, 1)
(Navel, 1) (Orange, 1)
(Granny, 1) (Smith, 1) (Apple, 1)

52 Documents: Honeycrisp Apple; Granny Smith Apple; Red Delicious Apple; Navel Orange
MAP
(Red, 1) (Delicious, 1) (Apple, 1)
(Honeycrisp, 1) (Apple, 1)
(Navel, 1) (Orange, 1)
(Granny, 1) (Smith, 1) (Apple, 1)
REDUCE
(Red, 1) (Delicious, 1) (Apple, 3) (Honeycrisp, 1) (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1)

53 Documents: Honeycrisp Apple; Granny Smith Apple; Red Delicious Apple; Navel Orange
MAP
(Red, 1) (Delicious, 1) (Apple, 1)
(Honeycrisp, 1) (Apple, 1)
(Navel, 1) (Orange, 1)
(Granny, 1) (Smith, 1) (Apple, 1)
REDUCE
(Red, 1) (Delicious, 1) (Apple, 3) (Honeycrisp, 1) (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1)

54 The hard work is distributed

55 The hard work is distributed The easy work is centralized

56 COMP 1: Honeycrisp Apple; Red Delicious Apple
COMP 2: Granny Smith Apple; Navel Orange
... but what if we've got our documents stored on multiple machines?

57 COMP 1: Honeycrisp Apple; Red Delicious Apple
COMP 2: Granny Smith Apple; Navel Orange
MAP (COMP 1): (Red, 1) (Delicious, 1) (Apple, 1) (Honeycrisp, 1) (Apple, 1)
MAP (COMP 2): (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1) (Apple, 1)

58 COMP 1: Honeycrisp Apple; Red Delicious Apple
COMP 2: Granny Smith Apple; Navel Orange
MAP (COMP 1): (Red, 1) (Delicious, 1) (Apple, 1) (Honeycrisp, 1) (Apple, 1)
MAP (COMP 2): (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1) (Apple, 1)
REDUCE (COMP 1): (Red, 1) (Delicious, 1) (Apple, 2) (Honeycrisp, 1)
REDUCE (COMP 2): (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1) (Apple, 1)

59 COMP 1: Honeycrisp Apple; Red Delicious Apple
COMP 2: Granny Smith Apple; Navel Orange
MAP (COMP 1): (Red, 1) (Delicious, 1) (Apple, 1) (Honeycrisp, 1) (Apple, 1)
MAP (COMP 2): (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1) (Apple, 1)
REDUCE (COMP 1): (Red, 1) (Delicious, 1) (Apple, 2) (Honeycrisp, 1)
REDUCE (COMP 2): (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1) (Apple, 1)
REDUCE (combined): (Red, 1) (Delicious, 1) (Apple, 3) (Honeycrisp, 1) (Navel, 1) (Orange, 1) (Granny, 1) (Smith, 1)
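The two-machine version can be sketched the same way. Here the "machines" are just dictionary entries in one Python process, and the split of documents between COMP 1 and COMP 2 is inferred from the per-machine counts on the slides; `map_doc` and `reduce_pairs` are illustrative names. Each machine maps and reduces its own documents, and only the small partial results travel to the final reduce.

```python
from collections import Counter


def map_doc(document):
    """MAP: one (word, 1) pair per word occurrence."""
    return [(word, 1) for word in document.split()]


def reduce_pairs(pairs):
    """REDUCE: sum the counts for each word. Works on raw pairs or on partial totals."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


machines = {
    "COMP 1": ["Honeycrisp Apple", "Red Delicious Apple"],
    "COMP 2": ["Granny Smith Apple", "Navel Orange"],
}

# The hard work is distributed: every machine handles only its own documents.
partials = {}
for machine, docs in machines.items():
    local_pairs = [pair for doc in docs for pair in map_doc(doc)]
    partials[machine] = reduce_pairs(local_pairs)

# The easy work is centralized: merge the already-summed partial counts.
final = reduce_pairs(
    (word, count) for partial in partials.values() for word, count in partial.items()
)
print(partials["COMP 1"]["Apple"], partials["COMP 2"]["Apple"], final["Apple"])
```

The same `reduce_pairs` function serves both rounds because addition does not care how the counts were grouped, which is what makes it safe to reduce locally first.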

60 Is this the easiest way to count apples?

61 NOT

62

63 * relational database *

64

65 Tweet Text: I am so happy! / Tweet Location: Albuquerque, NM / User Home: New York, NY
Tweet Text: #FML #FML #FML / Tweet Location: Palo Alto, CA / User Home: San Francisco, CA

66 Tweet Text: I am so happy! / Tweet Location: Albuquerque, NM / User Home: New York, NY
Tweet Text: #FML #FML #FML / Tweet Location: Palo Alto, CA / User Home: San Francisco, CA
MAP (WITH MATH + SENTIMENT), giving (Distance in Miles, Sentiment Score):
(1808, +.9)
(33, -.6)

67 Tweet Text: I am so happy! / Tweet Location: Albuquerque, NM / User Home: New York, NY
Tweet Text: #FML #FML #FML / Tweet Location: Palo Alto, CA / User Home: San Francisco, CA
MAP (WITH MATH + SENTIMENT), giving (Distance in Miles, Sentiment Score):
(1808, +.9)
(33, -.6)
REDUCE
(1808, +.9) (33, -.6)

68 Tweet Text: I am so happy! / Tweet Location: Albuquerque, NM / User Home: New York, NY
Tweet Text: #FML #FML #FML / Tweet Location: Palo Alto, CA / User Home: San Francisco, CA
MAP (WITH MATH + SENTIMENT), giving (Distance in Miles, Sentiment Score):
(1808, +.9)
(33, -.6)
REDUCE
(1808, +.9) (33, -.6)
RINSE AND REPEAT LIKE A MILLION TIMES
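A rough sketch of what a "map with math + sentiment" step might look like. Everything specific here is an assumption made for illustration: the city coordinates are approximate, the distance is straight-line (great-circle) rather than whatever the talk used, and the two-entry `TOY_LEXICON` is a stand-in for a real sentiment model, so the numbers will not match the slide's (1808, +.9) and (33, -.6) exactly.

```python
import math

# Approximate (latitude, longitude) pairs, assumed for illustration.
CITY_COORDS = {
    "Albuquerque, NM": (35.08, -106.65),
    "New York, NY": (40.71, -74.01),
    "Palo Alto, CA": (37.44, -122.14),
    "San Francisco, CA": (37.77, -122.42),
}
# Hypothetical two-word sentiment lexicon; a real one would be far larger.
TOY_LEXICON = {"happy": 1.0, "#fml": -1.0}
EARTH_RADIUS_MILES = 3958.8


def miles_between(place_a, place_b):
    """Great-circle distance between two known cities (haversine formula)."""
    lat1, lon1 = (math.radians(v) for v in CITY_COORDS[place_a])
    lat2, lon2 = (math.radians(v) for v in CITY_COORDS[place_b])
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sentiment(text):
    """Average the lexicon scores of the words we recognize; 0.0 if none match."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    scores = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def map_tweet(tweet):
    """MAP: one tweet in, one (distance from home in miles, sentiment score) pair out."""
    distance = round(miles_between(tweet["location"], tweet["home"]))
    return (distance, sentiment(tweet["text"]))


tweets = [
    {"text": "I am so happy!", "location": "Albuquerque, NM", "home": "New York, NY"},
    {"text": "#FML #FML #FML", "location": "Palo Alto, CA", "home": "San Francisco, CA"},
]
results = [map_tweet(tweet) for tweet in tweets]
print(results)
```

Each tweet is mapped independently of every other tweet, which is why this step can be spread across as many machines as there is data.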

69 ... none of this is magic.

70 ... in fact, the magic part is just a precursor to doing the actual hard work.

71

72 danah boyd and Kate Crawford's Six Provocations for Big Data:
1. Automating Research Changes the Definition of Knowledge
2. Claims to Objectivity and Accuracy are Misleading
3. Bigger Data are Not Always Better Data
4. Not All Data Are Equivalent
5. Just Because it is Accessible Doesn't Make it Ethical
6. Limited Access to Big Data Creates New Digital Divides

73 What about THE FUTURE?

74 RIGID
HIERARCHICAL DATABASE MODEL
RELATIONAL DATABASE MODEL
DOCUMENT DATABASE MODEL
FLEXIBLE

75 RIGID
HIERARCHICAL DATABASE MODEL
?
RELATIONAL DATABASE MODEL
DOCUMENT DATABASE MODEL
FLEXIBLE

76

77

78

79 Further Reading:
Martin Fowler on NoSQL:
Helpful Stack Overflow thread:
Finding Friends with MapReduce:
Choosing a Database That's Right for Your Business:
Demystifying the Role of Big Data in Marketing: mar/12/big-data-marketing-demystified
The NoSQL Movement:
Big Data Tools Cost Too Much, Do Too Little: hadoop_no_sql_dont_believe_the_hype/
Is Big Data an Economic Big Dud?:
Six Provocations for Big Data:

Lecture 10 - Functional programming: Hadoop and MapReduce

Lecture 10 - Functional programming: Hadoop and MapReduce Lecture 10 - Functional programming: Hadoop and MapReduce Sohan Dharmaraja Sohan Dharmaraja Lecture 10 - Functional programming: Hadoop and MapReduce 1 / 41 For today Big Data and Text analytics Functional

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect

More information

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05

Introduction to NoSQL Databases. Tore Risch Information Technology Uppsala University 2013-03-05 Introduction to NoSQL Databases Tore Risch Information Technology Uppsala University 2013-03-05 UDBL Tore Risch Uppsala University, Sweden Evolution of DBMS technology Distributed databases SQL 1960 1970

More information

InfiniteGraph: The Distributed Graph Database

InfiniteGraph: The Distributed Graph Database A Performance and Distributed Performance Benchmark of InfiniteGraph and a Leading Open Source Graph Database Using Synthetic Data Objectivity, Inc. 640 West California Ave. Suite 240 Sunnyvale, CA 94086

More information

Apache HBase. Crazy dances on the elephant back

Apache HBase. Crazy dances on the elephant back Apache HBase Crazy dances on the elephant back Roman Nikitchenko, 16.10.2014 YARN 2 FIRST EVER DATA OS 10.000 nodes computer Recent technology changes are focused on higher scale. Better resource usage

More information

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example

MapReduce. MapReduce and SQL Injections. CS 3200 Final Lecture. Introduction. MapReduce. Programming Model. Example MapReduce MapReduce and SQL Injections CS 3200 Final Lecture Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design

More information

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing

Evaluating NoSQL for Enterprise Applications. Dirk Bartels VP Strategy & Marketing Evaluating NoSQL for Enterprise Applications Dirk Bartels VP Strategy & Marketing Agenda The Real Time Enterprise The Data Gold Rush Managing The Data Tsunami Analytics and Data Case Studies Where to go

More information

Big Data and Apache Hadoop s MapReduce

Big Data and Apache Hadoop s MapReduce Big Data and Apache Hadoop s MapReduce Michael Hahsler Computer Science and Engineering Southern Methodist University January 23, 2012 Michael Hahsler (SMU/CSE) Hadoop/MapReduce January 23, 2012 1 / 23

More information

Lecture Data Warehouse Systems

Lecture Data Warehouse Systems Lecture Data Warehouse Systems Eva Zangerle SS 2013 PART C: Novel Approaches in DW NoSQL and MapReduce Stonebraker on Data Warehouses Star and snowflake schemas are a good idea in the DW world C-Stores

More information

Open Source Technologies on Microsoft Azure

Open Source Technologies on Microsoft Azure Open Source Technologies on Microsoft Azure A Survey @DChappellAssoc Copyright 2014 Chappell & Associates The Main Idea i Open source technologies are a fundamental part of Microsoft Azure The Big Questions

More information

Open source, high performance database

Open source, high performance database Open source, high performance database Anti-social Databases: NoSQL and MongoDB Will LaForest Senior Director of 10gen Federal will@10gen.com @WLaForest 1 SQL invented Dynamic Web Content released IBM

More information

SQL Simple Queries. Chapter 3.1 V3.0. Copyright @ Napier University Dr Gordon Russell

SQL Simple Queries. Chapter 3.1 V3.0. Copyright @ Napier University Dr Gordon Russell SQL Simple Queries Chapter 3.1 V3.0 Copyright @ Napier University Dr Gordon Russell Introduction SQL is the Structured Query Language It is used to interact with the DBMS SQL can Create Schemas in the

More information

NoSQL. Thomas Neumann 1 / 22

NoSQL. Thomas Neumann 1 / 22 NoSQL Thomas Neumann 1 / 22 What are NoSQL databases? hard to say more a theme than a well defined thing Usually some or all of the following: no SQL interface no relational model / no schema no joins,

More information

The Internet of Things and Big Data: Intro

The Internet of Things and Big Data: Intro The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific

More information

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12

Introduction to NoSQL Databases and MapReduce. Tore Risch Information Technology Uppsala University 2014-05-12 Introduction to NoSQL Databases and MapReduce Tore Risch Information Technology Uppsala University 2014-05-12 What is a NoSQL Database? 1. A key/value store Basic index manager, no complete query language

More information

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015

Clustering Big Data. Efficient Data Mining Technologies. J Singh and Teresa Brooks. June 4, 2015 Clustering Big Data Efficient Data Mining Technologies J Singh and Teresa Brooks June 4, 2015 Hello Bulgaria (http://hello.bg/) A website with thousands of pages... Some pages identical to other pages

More information

How To Handle Big Data With A Data Scientist

How To Handle Big Data With A Data Scientist III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution

More information

Infrastructures for big data

Infrastructures for big data Infrastructures for big data Rasmus Pagh 1 Today s lecture Three technologies for handling big data: MapReduce (Hadoop) BigTable (and descendants) Data stream algorithms Alternatives to (some uses of)

More information

Data Modeling for Big Data

Data Modeling for Big Data Data Modeling for Big Data by Jinbao Zhu, Principal Software Engineer, and Allen Wang, Manager, Software Engineering, CA Technologies In the Internet era, the volume of data we deal with has grown to terabytes

More information

Exploration of Non-Relational Database Models. Swayze Smartt. Department of Computer Science. Wake Forest University. Spring 2011 Honors Thesis

Exploration of Non-Relational Database Models. Swayze Smartt. Department of Computer Science. Wake Forest University. Spring 2011 Honors Thesis Smartt 1 Exploration of Non-Relational Database Models Swayze Smartt Department of Computer Science Wake Forest University Spring 2011 Honors Thesis Advised by Dr. Stan Thomas Smartt 2 Abstract While relational

More information

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu

Distributed Aggregation in Cloud Databases. By: Aparna Tiwari tiwaria@umail.iu.edu Distributed Aggregation in Cloud Databases By: Aparna Tiwari tiwaria@umail.iu.edu ABSTRACT Data intensive applications rely heavily on aggregation functions for extraction of data according to user requirements.

More information

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013

Database Management System Choices. Introduction To Database Systems CSE 373 Spring 2013 Database Management System Choices Introduction To Database Systems CSE 373 Spring 2013 Outline Introduction PostgreSQL MySQL Microsoft SQL Server Choosing A DBMS NoSQL Introduction There a lot of options

More information

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc

Big Data, Fast Data, Complex Data. Jans Aasman Franz Inc Big Data, Fast Data, Complex Data Jans Aasman Franz Inc Private, founded 1984 AI, Semantic Technology, professional services Now in Oakland Franz Inc Who We Are (1 (2 3) (4 5) (6 7) (8 9) (10 11) (12

More information

Socialprise: Leveraging Social Data in the Enterprise Rev 0109

Socialprise: Leveraging Social Data in the Enterprise Rev 0109 Socialprise: Leveraging Social Data in the Enterprise Rev 0109 Contents I. Socialprise: Capturing Smart Insights into Agile Relationships II. Socialprise Applications: Getting the Who, What and When of

More information

Cloud Computing at Google. Architecture

Cloud Computing at Google. Architecture Cloud Computing at Google Google File System Web Systems and Algorithms Google Chris Brooks Department of Computer Science University of San Francisco Google has developed a layered system to handle webscale

More information

PostgreSQL as a Schemaless Database. Christophe Pettus PostgreSQL Experts, Inc. OSCON 2013

PostgreSQL as a Schemaless Database. Christophe Pettus PostgreSQL Experts, Inc. OSCON 2013 PostgreSQL as a Schemaless Database. Christophe Pettus PostgreSQL Experts, Inc. OSCON 2013 Welcome! I m Christophe. PostgreSQL person since 1997. Consultant with PostgreSQL Experts, Inc. cpettus@pgexperts.com

More information

AllegroGraph. a graph database. Gary King gwking@franz.com

AllegroGraph. a graph database. Gary King gwking@franz.com AllegroGraph a graph database Gary King gwking@franz.com Overview What we store How we store it the possibilities Using AllegroGraph Databases Put stuff in Get stuff out quickly safely Stuff things with

More information

Can the Elephants Handle the NoSQL Onslaught?

Can the Elephants Handle the NoSQL Onslaught? Can the Elephants Handle the NoSQL Onslaught? Avrilia Floratou, Nikhil Teletia David J. DeWitt, Jignesh M. Patel, Donghui Zhang University of Wisconsin-Madison Microsoft Jim Gray Systems Lab Presented

More information

Databases for text storage

Databases for text storage Databases for text storage Jonathan Ronen New York University jr4069@nyu.edu December 1, 2014 Jonathan Ronen (NYU) databases December 1, 2014 1 / 24 Overview 1 Introduction 2 PostgresSQL 3 MongoDB Jonathan

More information

NoSQL a view from the top

NoSQL a view from the top Red Stack Tech Ltd James Anthony Technology Director NoSQL a view from the top Part 1 1 Contents Introduction...Page 3 Key Value Stores..... Page 4 Column Family Data Stores.. Page 6 Document Data Stores...Page

More information

Introduction to Big Data Training

Introduction to Big Data Training Introduction to Big Data Training The quickest way to be introduce with NOSQL/BIG DATA offerings Learn and experience Big Data Solutions including Hadoop HDFS, Map Reduce, NoSQL DBs: Document Based DB

More information

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment

Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment Hacettepe University Department Of Computer Engineering BBM 471 Database Management Systems Experiment Subject NoSQL Databases - MongoDB Submission Date 20.11.2013 Due Date 26.12.2013 Programming Environment

More information

Introduction to Hadoop

Introduction to Hadoop Introduction to Hadoop Miles Osborne School of Informatics University of Edinburgh miles@inf.ed.ac.uk October 28, 2010 Miles Osborne Introduction to Hadoop 1 Background Hadoop Programming Model Examples

More information

SmartArrays and Java Frequently Asked Questions

SmartArrays and Java Frequently Asked Questions SmartArrays and Java Frequently Asked Questions What are SmartArrays? A SmartArray is an intelligent multidimensional array of data. Intelligent means that it has built-in knowledge of how to perform operations

More information

these three NoSQL databases because I wanted to see a the two different sides of the CAP

these three NoSQL databases because I wanted to see a the two different sides of the CAP Michael Sharp Big Data CS401r Lab 3 For this paper I decided to do research on MongoDB, Cassandra, and Dynamo. I chose these three NoSQL databases because I wanted to see a the two different sides of the

More information

An Oracle White Paper February 2011. Hadoop and NoSQL Technologies and the Oracle Database

An Oracle White Paper February 2011. Hadoop and NoSQL Technologies and the Oracle Database An Oracle White Paper February 2011 Hadoop and NoSQL Technologies and the Oracle Database Disclaimer The following is intended to outline our general product direction. It is intended for information purposes

More information

Oracle Big Data SQL Technical Update

Oracle Big Data SQL Technical Update Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical

More information

Data Discovery, Analytics, and the Enterprise Data Hub

Data Discovery, Analytics, and the Enterprise Data Hub Data Discovery, Analytics, and the Enterprise Data Hub Version: 101 Table of Contents Summary 3 Used Data and Limitations of Legacy Analytic Architecture 3 The Meaning of Data Discovery & Analytics 4 Machine

More information

Overview. Introduction to Database Systems. Motivation... Motivation: how do we store lots of data?

Overview. Introduction to Database Systems. Motivation... Motivation: how do we store lots of data? Introduction to Database Systems UVic C SC 370 Overview What is a DBMS? what is a relational DBMS? Why do we need them? How do we represent and store data in a DBMS? How does it support concurrent access

More information

INTRODUCING AZURE SEARCH

INTRODUCING AZURE SEARCH David Chappell INTRODUCING AZURE SEARCH Sponsored by Microsoft Corporation Copyright 2015 Chappell & Associates Contents Understanding Azure Search... 3 What Azure Search Provides...3 What s Required to

More information

Teradata s Big Data Technology Strategy & Roadmap

Teradata s Big Data Technology Strategy & Roadmap Teradata s Big Data Technology Strategy & Roadmap Artur Borycki, Director International Solutions Marketing 18 March 2014 Agenda > Introduction and level-set > Enabling the Logical Data Warehouse > Any

More information

Introduction to Parallel Programming and MapReduce

Introduction to Parallel Programming and MapReduce Introduction to Parallel Programming and MapReduce Audience and Pre-Requisites This tutorial covers the basics of parallel programming and the MapReduce programming model. The pre-requisites are significant

More information

Big Systems, Big Data

Big Systems, Big Data Big Systems, Big Data When considering Big Distributed Systems, it can be noted that a major concern is dealing with data, and in particular, Big Data Have general data issues (such as latency, availability,

More information

Why Semantic Analysis is Better than Sentiment Analysis. A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights

Why Semantic Analysis is Better than Sentiment Analysis. A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights Why Semantic Analysis is Better than Sentiment Analysis A White Paper by T.R. Fitz-Gibbon, Chief Scientist, Networked Insights Why semantic analysis is better than sentiment analysis I like it, I don t

More information

Civil Contractors :Interview case study Industry: Construction

Civil Contractors :Interview case study Industry: Construction BUILDING PROJECT MANAGEMENT SOLUTIONS THE WAY PROJECT MANAGERS THINK Civil Contractors :Interview case study Industry: Construction How would you describe your business? We manage the construction of earthworks,

More information

An Approach to Implement Map Reduce with NoSQL Databases

An Approach to Implement Map Reduce with NoSQL Databases www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 8 Aug 2015, Page No. 13635-13639 An Approach to Implement Map Reduce with NoSQL Databases Ashutosh

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce

CS 4604: Introduc0on to Database Management Systems. B. Aditya Prakash Lecture #13: NoSQL and MapReduce CS 4604: Introduc0on to Database Management Systems B. Aditya Prakash Lecture #13: NoSQL and MapReduce Announcements HW4 is out You have to use the PGSQL server START EARLY!! We can not help if everyone

More information

1. INTRODUCTION TO RDBMS

1. INTRODUCTION TO RDBMS Oracle For Beginners Page: 1 1. INTRODUCTION TO RDBMS What is DBMS? Data Models Relational database management system (RDBMS) Relational Algebra Structured query language (SQL) What Is DBMS? Data is one

More information

An Open Source NoSQL solution for Internet Access Logs Analysis

An Open Source NoSQL solution for Internet Access Logs Analysis An Open Source NoSQL solution for Internet Access Logs Analysis A practical case of why, what and how to use a NoSQL Database Management System instead of a relational one José Manuel Ciges Regueiro

More information

Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R

Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R Mastering Disaster Recovery: Business Continuity and Virtualization Best Practices W H I T E P A P E R Table of Contents Introduction.......................................................... 3 Challenges

More information

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration

Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration Lab 4.4 Secret Messages: Indexing, Arrays, and Iteration This JavaScript lab (the last of the series) focuses on indexing, arrays, and iteration, but it also provides another context for practicing with

More information

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD ARCHITECT @ METAMARKETS

INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD ARCHITECT @ METAMARKETS INTRODUCING DRUID: FAST AD-HOC QUERIES ON BIG DATA MICHAEL DRISCOLL - CEO ERIC TSCHETTER - LEAD ARCHITECT @ METAMARKETS MICHAEL E. DRISCOLL CEO @ METAMARKETS - @MEDRISCOLL Metamarkets is the bridge from

More information

MapReduce (in the cloud)

MapReduce (in the cloud) MapReduce (in the cloud) How to painlessly process terabytes of data by Irina Gordei MapReduce Presentation Outline What is MapReduce? Example How it works MapReduce in the cloud Conclusion Demo Motivation:

More information
