Big Data and Databases
- Martin Greer
- 8 years ago
Transcription
1 Big Data and Databases Vijay Gadepally Lauren Milechin This work is sponsored by the Department of the Air Force under Air Force Contract F C. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
2 Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies
3 Big Data Challenge Kids Adults Elderly Users (deciders) Rapidly increasing - Data volume - Data velocity - Data variety - Data veracity Things Gap Humans 10 Years Ago 5 Years Ago Today In 5 Years Sources (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
4 Challenge of Data Volume Where do I store my data? How much do I store? 1 TB total Applications & Data 2 TB total Data Scalable Data Center Data Flat file Spreadsheet How do I access it? Database Distributed database How do I index it?
5 Challenge of Data Velocity 2011 Data Generated Per Minute Facebook: 684,478 pieces of content Twitter: 100,000 tweets YouTube: 48 hours of new video Google: 2,000,000 new queries Internet Population: 2.1 Billion people
6 Challenge of Data Velocity
7 Challenge of Data Velocity 2014 Data Generated Per Minute Facebook: 2,460,000 pieces of content Twitter: 277,000 tweets YouTube: 72 hours of new video Google: 4,000,000 new queries Internet Population: 2.4 Billion people
8 Challenge of Data Velocity Increase in Data Generated Facebook: 350 MB/min Twitter: 50 MB/min YouTube: GB/min
9 Challenge of Data Velocity Increase in Data Generated Facebook: 350 MB/min Twitter: 50 MB/min How do I capture my data for processing? YouTube: GB/min How do I process the data within the specified time constraints?
10 Challenge of Data Variety What does the data look like? Tweets Images Text and Documents Audio
11 Challenge of Data Variety What does the data look like? Tweets Images Text and Documents Audio How do I index heterogeneous data formats? Strings may be easily stored in a database Image and document metadata may fit in a traditional database Raw images/documents may require a file system or alternate database
12 Challenge of Data Variety What does the data look like? Tweets Images Text and Documents Audio How do I fuse heterogeneous data formats to provide a uniform view? Fusion drives Indexing/schema decisions Technology (databases, storage, etc.) selection Selection of software (visualization, language) tools
13 Challenge of Data Variety What does the data look like? Tweets Images Text and Documents Audio How do I develop algorithms for heterogeneous data formats? Images can use High Performance Computing tools Strings and documents require a new algebra to take advantage of High Performance Computing systems Visualization requires merging image with string data
14 Challenge of Data Veracity Does the data need protection? How do I balance privacy with availability? What level of security is required? How do I make data available only to vetted analysts? How is data kept secure and private while minimizing impact on analysis?
15 Challenge of Data Veracity Does the data need protection? How confident am I in the integrity of my data? Where did it come from? Who has accessed it? Has anyone modified data stream? Has anyone tampered with the data stream?
16 Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies
17 General Strategy: System Design Kids Adults Elderly Users (deciders) User Interface Things Files Ingest & Enrichment Ingest & Enrichment Ingest Databases Humans Analytics B Gap C E D 10 Years Ago 5 Years Ago Today In 5 Years Scheduler Computing Sources (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
18 General Strategies: Collection Collect, Store, and Process only Useful Data [Figure: Degree Distribution, count versus degree]
19 General Strategies: Collection Collect, Store, and Process only Useful Data Intelligently Reduce the Amount of Data through Sampling Techniques [Figure: Degree Distribution, count versus degree, with SIGNAL and NOISE regions in N-D SPACE] Example background model: Power Law Graph
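The slide does not name a specific sampling technique. As one possible illustration, the sketch below uses reservoir sampling, a standard way to keep a fixed-size uniform sample from a stream whose length is not known in advance. The function name `reservoir_sample`, the sample size, and the synthetic stream are assumptions made for this example, not part of the original talk.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Replace a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = item
    return sample


# Hypothetical stream of record IDs; only 1,000 of them are stored.
records = range(100_000)
subset = reservoir_sample(records, k=1000, seed=0)
print(len(subset))
```

Only the reservoir is held in memory, so storage stays constant no matter how fast or how long the stream runs.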
20 General Strategy: Privacy-Preserving Technology Kids Adults Elderly Humans (deciders) Web Raw Data Ingest & Enrichment Ingest & Enrichment Ingest Databases Analytics B C E D Scheduler Computing Things (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
21 General Strategy: Privacy-Preserving Technology Kids Adults Elderly Humans (deciders) Web Raw Data Ingest & Enrichment Ingest & Enrichment Ingest Databases Analytics B C Data Integrity Data Integrity Attack E D Scheduler Computing Things (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
22 General Strategy: Privacy-Preserving Technology Kids Adults Elderly Humans (deciders) Web Raw Data Ingest & Enrichment Ingest & Enrichment Ingest Databases Analytics B C Data Integrity Data Integrity Attack Data Loss / Exfiltration E D Scheduler Computing Things (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
23 General Strategy: Privacy-Preserving Technology Kids Adults Elderly Humans (deciders) Web Raw Data Ingest & Enrichment Ingest & Enrichment Ingest Databases Analytics B C Data Integrity Data Integrity Attack Data Loss / Exfiltration Insider Threat E D Scheduler Computing Things (providers) Building Security Building Environment Building Usage Commuter Vehicles Work Vehicles Transport Vehicles Student Smartphones Classroom Tablets Fitness Wearables
24 General Strategy: Privacy-Preserving Technology Use Cryptographic Protocols to Protect the Confidentiality, Integrity, and/or Availability of Data Lots of ongoing research Popular techniques: Fully Homomorphic Encryption Multiparty Computation Computing on Masked Data (CMD) Cryptographic protections for the NoSQL Accumulo database Uses order-preserving, deterministic, and semantically secure encryption 2-4x performance overhead [Figure: CMD data flow. A plaintext query is encrypted into a masked query and sent to the big data cloud; the masked analytic result is decrypted into a plaintext analytic result.]
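The deterministic part of the masking idea can be sketched in a few lines. This is not the CMD implementation: a keyed hash (HMAC-SHA256) stands in for deterministic encryption, so equal plaintexts always map to the same token and the server can match queries without seeing plaintext. Unlike real encryption, a keyed hash cannot be decrypted. The key, the index contents, and the names `mask` and `server_lookup` are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical client-side secret; it is never sent to the server.
SECRET_KEY = b"client-side key (hypothetical)"


def mask(value: str) -> str:
    """Deterministically mask a value: equal inputs give equal tokens."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The client masks the index keys before uploading them.
plaintext_index = {"earthquake": ["tweet1", "tweet2"], "omg": ["tweet1"]}
server_index = {mask(word): ids for word, ids in plaintext_index.items()}


def server_lookup(masked_query: str) -> list:
    """The server matches masked tokens only; it never sees the plaintext word."""
    return server_index.get(masked_query, [])


# Plaintext query -> masked query -> result.
result = server_lookup(mask("earthquake"))
print(result)
```

Determinism is what makes equality queries possible, and it is also what leaks repetition patterns, which is one reason the slide lists semantically secure encryption alongside it.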
25 Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies
26 Database Fundamentals Database Collection of data and supporting data structures Database Management System (DBMS) Software that provides interface between user and database Define new data and schema data Retrieve (Query) data DB administration: set security and permissions BigTable
27 Database Fundamentals ACID Atomicity - each transaction either fully succeeds or fails BASE Successful Transaction Failed Transaction
28 Database Fundamentals ACID Atomicity - each transaction either fully succeeds or fails Consistency - all nodes see same valid data all the time BASE
29 Database Fundamentals ACID Atomicity - each transaction either fully succeeds or fails Consistency - all nodes see same valid data all the time Isolation - concurrent transactions result in system state obtained from serial transactions BASE
47 Database Fundamentals ACID Atomicity - each transaction either fully succeeds or fails Consistency - all nodes see same valid data all the time Isolation - concurrent transactions result in system state obtained from serial transactions Durability - committed transactions remain committed BASE Transaction
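Atomicity can be demonstrated with Python's built-in `sqlite3` module. The sketch below is an illustration only: the `accounts` table, the balances, and the `transfer` helper are hypothetical. A transfer that would drive a balance negative violates a CHECK constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)       # succeeds fully
failed = transfer(conn, "alice", "bob", 500)  # fails fully, credit is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The credit is applied before the debit on purpose, so the failed transfer shows that a partially applied transaction leaves no trace.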
50 Database Fundamentals ACID BASE Atomicity - each transaction either fully succeeds or fails Consistency - all nodes see same valid data all the time Isolation - concurrent transactions result in system state obtained from serial transactions Durability - committed transactions remain committed Basically Available Soft-state services with Eventual-consistency
53 Database Fundamentals ACID BASE Atomicity - each transaction either fully succeeds or fails Consistency - all nodes see same valid data all the time Isolation - concurrent transactions result in system state obtained from serial transactions Durability - committed transactions remain committed Basically Available Soft-state services with Eventual-consistency ACID BASE BigTable
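Eventual consistency can be sketched with two in-memory replicas. This toy model is an assumption made for illustration, not how any particular database works: each replica accepts writes locally (basically available), may briefly hold stale data (soft state), and converges once the replicas exchange updates using a last-write-wins rule on a version number.

```python
class Replica:
    """Toy replica that stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the entry with the highest version.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Anti-entropy pass: pull every entry the other replica holds.
        for key, (version, value) in other.data.items():
            self.write(key, value, version)


a, b = Replica(), Replica()
a.write("friends", 424, version=1)
b.write("friends", 425, version=2)  # newer write lands on the other replica

stale = a.read("friends")  # replica a has not seen the newer write yet

a.sync_from(b)
b.sync_from(a)
print(stale, a.read("friends"), b.read("friends"))
```

Between the second write and the sync, readers of replica `a` see an older value; that window is the price paid for accepting writes without coordination.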
54 Database Fundamentals CAP Theorem Impossible for a distributed system to simultaneously provide: Consistency Availability Partition Tolerance
56 Database Fundamentals CAP Theorem Impossible for a distributed system to simultaneously provide: Consistency Availability Partition Tolerance Consistency Availability Partition Tolerance BigTable
57 Database Fundamentals SQL NoSQL NewSQL DATABASES Cluster Dremel BigTable PARALLEL PROCESSING MapReduce Hadoop Pregel D4M Giraph Slide Source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.
58 Database Fundamentals Consistency Relational DB Systems NewSQL DB Systems NoSQL DB Systems Performance
60 Relational Databases What it Is Database that stores information about data and how it is related Highly structured normalized table based database Predefined schema/organization of data Vertically scalable with good quality hardware Use SQL as query interface Typically provide full consistency Examples
61 Relational Databases Who Uses It When to Use It Dealing with transactional data Problem sizes are moderate Need for ACID guarantees How to Use It JDBC (Java Database Connectivity) SQL command line
62 Relational Databases Tweet Table Tweet ID User ID Location ID Tweet Text wwz4p7jd Omg earthquake wwh1hss5 We're gonna have an earthquake wwwygbvq Omg it's a earthquake User ID Username Friends Count _zariaaa_ gnvrly_ron yolvndv 424 UserTable Location ID Latitude Longitude wwh1hss wwwygbvq wwz4p7jd Location Table
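The normalized layout on this slide can be sketched with Python's built-in `sqlite3` module. The ID columns did not survive in the transcription, so the integer IDs below, the pairing of tweets with users and locations, and the friend counts assigned to each user are illustrative assumptions; the usernames, tweet texts, and location IDs are taken from the slide. The location table is omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        username TEXT,
        friends_count INTEGER
    );
    CREATE TABLE tweets (
        tweet_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        location_id TEXT,
        tweet_text TEXT
    );
    """
)
# Hypothetical IDs and row pairings (see the note above).
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(1, "_zariaaa_", 424), (2, "gnvrly_ron", 541), (3, "yolvndv", 693)],
)
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?, ?)",
    [
        (10, 1, "wwz4p7jd", "Omg earthquake"),
        (11, 2, "wwh1hss5", "We're gonna have an earthquake"),
        (12, 3, "wwwygbvq", "Omg it's a earthquake"),
    ],
)

# A join reassembles the related rows through the shared user_id key.
rows = conn.execute(
    """
    SELECT u.username, t.tweet_text
    FROM tweets AS t
    JOIN users AS u ON u.user_id = t.user_id
    WHERE t.tweet_text LIKE 'Omg%'
    ORDER BY t.tweet_id
    """
).fetchall()
print(rows)
```

Each fact is stored once, and the query planner does the work of relating the tables at read time.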
63 NoSQL Databases What it Is Database based on documents, key-value pairs, graphs, or wide-column stores Dynamic schema Horizontal scalability Typically provide eventual consistency Examples
64 NoSQL Databases Who Uses It When to Use It Large unstructured datasets Strong need for high performance Only require BASE guarantees How to Use It Python/Java bindings Lincoln Laboratory D4M Command Line
65 NoSQL Databases Edge Table Degree Table Degree FriendCount 424 FriendCount 541 FriendCount 693 Latitude Latitude Latitude Location wwh1hss5 Location wwwygbvq Location wwz4p7jd UserID UserName _zariaaa_ UserName gnvrly_ron UserName yolvndv Word Omg Word a Word an Word earthquake FriendCount FriendCount FriendCount Latitude Word an 1 Word earthquake FriendCount 424 FriendCount 541 Word an Word earthquake Transpose Table Text Table Text Omg earthquake We're gonna have an earthquake Omg it's a earthquake
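The four tables named on this slide (edge, transpose, degree, and text) can be sketched with plain dictionaries. This is a simplified illustration rather than the D4M library itself: the row keys `t1` to `t3`, the `Field|value` column naming, and the pairing of usernames with tweets are assumptions. Only the usernames and tweet texts come from the slide.

```python
from collections import defaultdict

# Hypothetical row IDs and user/tweet pairings.
records = {
    "t1": {"UserName": "_zariaaa_", "Text": "Omg earthquake"},
    "t2": {"UserName": "gnvrly_ron", "Text": "We're gonna have an earthquake"},
    "t3": {"UserName": "yolvndv", "Text": "Omg it's a earthquake"},
}

edge = defaultdict(dict)       # row -> {column: 1}
transpose = defaultdict(dict)  # column -> {row: 1}
degree = defaultdict(int)      # column -> number of rows containing it
text = {}                      # row -> raw text, kept out of the index

for row_id, rec in records.items():
    # Explode every field value into its own column name.
    columns = [f"UserName|{rec['UserName']}"]
    columns += [f"Word|{w}" for w in sorted(set(rec["Text"].split()))]
    for col in columns:
        edge[row_id][col] = 1
        transpose[col][row_id] = 1
        degree[col] += 1
    text[row_id] = rec["Text"]

# Which tweets mention "Omg"? A single row lookup in the transpose table.
print(sorted(transpose["Word|Omg"]))
# How common is "earthquake"? A single lookup in the degree table.
print(degree["Word|earthquake"])
```

Storing both the edge table and its transpose means any value can be found with one row lookup, and the degree table tells a query how many rows to expect before it scans them.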
66 NoSQL Example - Accumulo
67 Accumulo Design Drivers
1 Cell-Level Security Express common security requirements in the infrastructure, not just in the application Data-centric approach encourages secure sharing
2 Scalability Near linear performance improvements at thousands of nodes Durable and reliable under increased failures that come with scale
3 Diverse, Interactive Analytics Sorted key/value core performs well in a diverse set of domains Information retrieval, statistics, graph analysis, geo indexing, and more
4 Flexible, Adaptive Schema Start with universal structures and indexing Refine the schema over time
Source: Sqrrl Data Inc
68 Accumulo Features Visibility Labels Iterators Automatic table splitting Support for Apache Thrift proxy Visibility Iterator Table-split Thrift Schema D4M volume velocity variety veracity
69 NewSQL Databases What it Is Database systems that emulate performance of NoSQL along with ACID guarantees of Relational Databases Usually scaled up version of a relational database Often uses array data model Other data models include graph-based data structures and distributed relational tables May make use of in-memory processing or specialized hardware Examples
70 NewSQL Databases Who Uses It When to Use It Large multidimensional datasets Data that doesn't fit in traditional databases Have the volume for NoSQL, but need ACID guarantees How to Use It Each has a custom API Ex: SciDB uses JDBC, SHIM, D4M, R-SciDB binding
71 NewSQL Example: SciDB
72 SciDB Design Drivers SciDB R, Python, Matlab, Julia, Massive Parallel Processing Database Array data model Complex analytics Commodity clusters or cloud
73 SciDB Example Schema Highly customizable to application Each cell is a strongly-typed structure of attributes: <int>, or <double, string, float>, or Nullable attributes, empty cells, sparse, or dense [Figure: two-dimensional array with dimensions stock (MSFT, MSFVX, MT) and time; example cells: price: 15.76, volume: 200; price: 234.2, volume: 10; price: 17.50, volume: null; price: 17.40, volume: 100; price: 0.02, volume: null]
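The array data model described here can be approximated in plain Python. This sketch does not use SciDB or its query language: a dictionary keyed by (stock, time) stands in for a sparse two-dimensional array, and a small dataclass stands in for the strongly typed cell. The attribute values are taken from the slide, but their placement at particular (stock, time) coordinates is an assumption, since the figure layout was lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    """Strongly typed cell: every cell carries the same attributes."""
    price: float
    volume: Optional[int]  # nullable attribute


# Sparse array: coordinates that are absent are empty cells.
# The coordinates below are hypothetical.
array = {
    ("MSFT", 0): Cell(15.76, 200),
    ("MSFT", 1): Cell(17.50, None),
    ("MSFT", 2): Cell(17.40, 100),
    ("MSFVX", 0): Cell(234.2, 10),
}


def cell_at(array, stock, time):
    """Return the cell at a coordinate, or None if the cell is empty."""
    return array.get((stock, time))


def total_volume(array, stock):
    """Sum volume along the time dimension, skipping null attributes."""
    return sum(
        cell.volume
        for (s, _time), cell in array.items()
        if s == stock and cell.volume is not None
    )


print(total_volume(array, "MSFT"))
print(cell_at(array, "MSFVX", 1))
```

The distinction between an empty cell (no entry at that coordinate) and a null attribute (an entry whose `volume` is `None`) mirrors the two kinds of missing data the slide lists.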
74 SciDB Features Massive Parallel Database Array Data Model Analytic language support In-database analytics MPP DB Array Languages Analytics volume velocity variety veracity
75 Quick Reference: RDBMS vs. NoSQL vs. NewSQL
Relational Databases. Examples: MySQL, PostgreSQL, Oracle. Schema: Typed columns with relational keys. Architecture: Single-node or sharded. Guarantees: ACID transactions. Access: SQL, indexing, joins, and query planning.
NoSQL. Examples: HBase, Cassandra, Accumulo. Schema: Schema-less. Architecture: Distributed, scalable. Guarantees: Eventually consistent. Access: Low-level API (scans and filtering).
NewSQL. Examples: SciDB, VoltDB, MemSQL. Schema: Strongly-typed structure of attributes. Architecture: Distributed, scalable. Guarantees: ACID transactions (most). Access: Custom API, JDBC, bindings to popular languages.
Slide Source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.
76 Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies
77 On The Horizon New Technologies and Techniques New database and processing technology such as: Apache Spark: In-memory distributed processing TileDB: Database for scientific big data S-Store: Database tuned for streaming data New cross-database and storage engine standards, APIs, and practices: BigDawg: An API to simplify big data analytics, currently being designed GraphBLAS: An effort to standardize graph algorithms and databases Advances in privacy-preserving technology: SPED: Signal processing in the encrypted domain Greater efficiency of protocols such as Functional Encryption and Multiparty Computation Tools and technologies will continue to evolve; important to keep students abreast of new developments
78 Conclusions Lots of stuff going on! Very important to understand details of your dataset, end analytic, and other requirements Topics covered: Challenge overview (What is the problem?) Some general strategies Databases Upcoming technologies
79 Leading Science and Engineering Research University 80 Nobel laureates, 50 National Medal of Science recipients Thousands of companies (11th largest world economy) 1000 faculty, employees, students $1.4B in annual external research funding Lincoln Laboratory $800M Other MIT $600M
80 About MIT
CISC 432/CMPE 432/CISC 832 Advanced Database Systems Course Info Instructor: Patrick Martin Goodwin Hall 630 613 533 6063 martin@cs.queensu.ca Office Hours: Wednesday 11:00 1:00 or by appointment Schedule:
More informationWhy NoSQL? Your database options in the new non- relational world. 2015 IBM Cloudant 1
Why NoSQL? Your database options in the new non- relational world 2015 IBM Cloudant 1 Table of Contents New types of apps are generating new types of data... 3 A brief history on NoSQL... 3 NoSQL s roots
More informationOracle Big Data Spatial & Graph Social Network Analysis - Case Study
Oracle Big Data Spatial & Graph Social Network Analysis - Case Study Mark Rittman, CTO, Rittman Mead OTN EMEA Tour, May 2016 info@rittmanmead.com www.rittmanmead.com @rittmanmead About the Speaker Mark
More informationOpen source large scale distributed data management with Google s MapReduce and Bigtable
Open source large scale distributed data management with Google s MapReduce and Bigtable Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory
More informationwww.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach
www.objectivity.com Choosing The Right Big Data Tools For The Job A Polyglot Approach Nic Caine NoSQL Matters, April 2013 Overview The Problem Current Big Data Analytics Relationship Analytics Leveraging
More informationNoSQL Databases. Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015
NoSQL Databases Institute of Computer Science Databases and Information Systems (DBIS) DB 2, WS 2014/2015 Database Landscape Source: H. Lim, Y. Han, and S. Babu, How to Fit when No One Size Fits., in CIDR,
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform
More informationA COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA
A COMPARATIVE STUDY OF NOSQL DATA STORAGE MODELS FOR BIG DATA Ompal Singh Assistant Professor, Computer Science & Engineering, Sharda University, (India) ABSTRACT In the new era of distributed system where
More informationHadoop IST 734 SS CHUNG
Hadoop IST 734 SS CHUNG Introduction What is Big Data?? Bulk Amount Unstructured Lots of Applications which need to handle huge amount of data (in terms of 500+ TB per day) If a regular machine need to
More informationX4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
General announcements In-Memory is available next month http://www.oracle.com/us/corporate/events/dbim/index.html X4-2 Exadata announced (well actually around Jan 1) OEM/Grid control 12c R4 just released
More informationAnalytics March 2015 White paper. Why NoSQL? Your database options in the new non-relational world
Analytics March 2015 White paper Why NoSQL? Your database options in the new non-relational world 2 Why NoSQL? Contents 2 New types of apps are generating new types of data 2 A brief history of NoSQL 3
More informationBig Data Technologies Compared June 2014
Big Data Technologies Compared June 2014 Agenda What is Big Data Big Data Technology Comparison Summary Other Big Data Technologies Questions 2 What is Big Data by Example The SKA Telescope is a new development
More informationCompleting the Big Data Ecosystem:
Completing the Big Data Ecosystem: in sqrrl data INC. August 3, 2012 Design Drivers in Analysis of big data is central to our customers requirements, in which the strongest drivers are: Scalability: The
More informationComplexity and Scalability in Semantic Graph Analysis Semantic Days 2013
Complexity and Scalability in Semantic Graph Analysis Semantic Days 2013 James Maltby, Ph.D 1 Outline of Presentation Semantic Graph Analytics Database Architectures In-memory Semantic Database Formulation
More informationChapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related
Chapter 11 Map-Reduce, Hadoop, HDFS, Hbase, MongoDB, Apache HIVE, and Related Summary Xiangzhe Li Nowadays, there are more and more data everyday about everything. For instance, here are some of the astonishing
More informationFrom Spark to Ignition:
From Spark to Ignition: Fueling Your Business on Real-Time Analytics Eric Frenkiel, MemSQL CEO June 29, 2015 San Francisco, CA What s in Store For This Presentation? 1. MemSQL: A real-time database for
More informationHow graph databases started the multi-model revolution
How graph databases started the multi-model revolution Luca Garulli Author and CEO @OrientDB QCon Sao Paulo - March 26, 2015 Welcome to Big Data 90% of the data in the world today has been created in the
More informationStructured Data Storage
Structured Data Storage Xgen Congress Short Course 2010 Adam Kraut BioTeam Inc. Independent Consulting Shop: Vendor/technology agnostic Staffed by: Scientists forced to learn High Performance IT to conduct
More informationNative Connectivity to Big Data Sources in MSTR 10
Native Connectivity to Big Data Sources in MSTR 10 Bring All Relevant Data to Decision Makers Support for More Big Data Sources Optimized Access to Your Entire Big Data Ecosystem as If It Were a Single
More informationNoSQL for SQL Professionals William McKnight
NoSQL for SQL Professionals William McKnight Session Code BD03 About your Speaker, William McKnight President, McKnight Consulting Group Frequent keynote speaker and trainer internationally Consulted to
More informationOracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
More informationBig Data JAMES WARREN. Principles and best practices of NATHAN MARZ MANNING. scalable real-time data systems. Shelter Island
Big Data Principles and best practices of scalable real-time data systems NATHAN MARZ JAMES WARREN II MANNING Shelter Island contents preface xiii acknowledgments xv about this book xviii ~1 Anew paradigm
More informationKeywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning
More informationSimilarity Search in a Very Large Scale Using Hadoop and HBase
Similarity Search in a Very Large Scale Using Hadoop and HBase Stanislav Barton, Vlastislav Dohnal, Philippe Rigaux LAMSADE - Universite Paris Dauphine, France Internet Memory Foundation, Paris, France
More informationA Distributed Storage Schema for Cloud Computing based Raster GIS Systems. Presented by Cao Kang, Ph.D. Geography Department, Clark University
A Distributed Storage Schema for Cloud Computing based Raster GIS Systems Presented by Cao Kang, Ph.D. Geography Department, Clark University Cloud Computing and Distributed Database Management System
More informationThe 3 questions to ask yourself about BIG DATA
The 3 questions to ask yourself about BIG DATA Do you have a big data problem? Companies looking to tackle big data problems are embarking on a journey that is full of hype, buzz, confusion, and misinformation.
More informationDomain driven design, NoSQL and multi-model databases
Domain driven design, NoSQL and multi-model databases Java Meetup New York, 10 November 2014 Max Neunhöffer www.arangodb.com Max Neunhöffer I am a mathematician Earlier life : Research in Computer Algebra
More informationBig Data Management and Security
Big Data Management and Security Audit Concerns and Business Risks Tami Frankenfield Sr. Director, Analytics and Enterprise Data Mercury Insurance What is Big Data? Velocity + Volume + Variety = Value
More informationAnalytics in the Cloud. Peter Sirota, GM Elastic MapReduce
Analytics in the Cloud Peter Sirota, GM Elastic MapReduce Data-Driven Decision Making Data is the new raw material for any business on par with capital, people, and labor. What is Big Data? Terabytes of
More informationCassandra A Decentralized, Structured Storage System
Cassandra A Decentralized, Structured Storage System Avinash Lakshman and Prashant Malik Facebook Published: April 2010, Volume 44, Issue 2 Communications of the ACM http://dl.acm.org/citation.cfm?id=1773922
More informationHadoop Ecosystem B Y R A H I M A.
Hadoop Ecosystem B Y R A H I M A. History of Hadoop Hadoop was created by Doug Cutting, the creator of Apache Lucene, the widely used text search library. Hadoop has its origins in Apache Nutch, an open
More informationApplications for Big Data Analytics
Smarter Healthcare Applications for Big Data Analytics Multi-channel sales Finance Log Analysis Homeland Security Traffic Control Telecom Search Quality Manufacturing Trading Analytics Fraud and Risk Retail:
More informationNoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre
NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Why NoSQL? In the last thirty years relational databases have been the default choice for serious data storage. An architect
More informationIntroduction to Big Data! with Apache Spark" UC#BERKELEY#
Introduction to Big Data! with Apache Spark" UC#BERKELEY# So What is Data Science?" Doing Data Science" Data Preparation" Roles" This Lecture" What is Data Science?" Data Science aims to derive knowledge!
More informationNextGen Infrastructure for Big DATA Analytics.
NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures
More informationInternational Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 ISSN 2278-7763. BIG DATA: A New Technology
International Journal of Advancements in Research & Technology, Volume 3, Issue 5, May-2014 18 BIG DATA: A New Technology Farah DeebaHasan Student, M.Tech.(IT) Anshul Kumar Sharma Student, M.Tech.(IT)
More informationOracle Big Data Strategy Simplified Infrastrcuture
Big Data Oracle Big Data Strategy Simplified Infrastrcuture Selim Burduroğlu Global Innovation Evangelist & Architect Education & Research Industry Business Unit Oracle Confidential Internal/Restricted/Highly
More informationA Brief Outline on Bigdata Hadoop
A Brief Outline on Bigdata Hadoop Twinkle Gupta 1, Shruti Dixit 2 RGPV, Department of Computer Science and Engineering, Acropolis Institute of Technology and Research, Indore, India Abstract- Bigdata is
More informationLambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: bdg@qburst.com Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
More informationHPC ABDS: The Case for an Integrating Apache Big Data Stack
HPC ABDS: The Case for an Integrating Apache Big Data Stack with HPC 1st JTC 1 SGBD Meeting SDSC San Diego March 19 2014 Judy Qiu Shantenu Jha (Rutgers) Geoffrey Fox gcf@indiana.edu http://www.infomall.org
More informationNavigating the Big Data infrastructure layer Helena Schwenk
mwd a d v i s o r s Navigating the Big Data infrastructure layer Helena Schwenk A special report prepared for Actuate May 2013 This report is the second in a series of four and focuses principally on explaining
More informationBig Data With Hadoop
With Saurabh Singh singh.903@osu.edu The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
More informationCloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu
Lecture 4 Introduction to Hadoop & GAE Cloud Application Development (SE808, School of Software, Sun Yat-Sen University) Yabo (Arber) Xu Outline Introduction to Hadoop The Hadoop ecosystem Related projects
More informationROME, 17-10-2013 BIG DATA ANALYTICS
ROME, 17-10-2013 BIG DATA ANALYTICS BIG DATA FOUNDATIONS Big Data is #1 on the 2012 and the 2013 list of most ambiguous terms - Global language monitor 2 BIG DATA FOUNDATIONS Big Data refers to data sets
More informationThe Internet of Things and Big Data: Intro
The Internet of Things and Big Data: Intro John Berns, Solutions Architect, APAC - MapR Technologies April 22 nd, 2014 1 What This Is; What This Is Not It s not specific to IoT It s not about any specific
More informationAffordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale
WHITE PAPER Affordable, Scalable, Reliable OLTP in a Cloud and Big Data World: IBM DB2 purescale Sponsored by: IBM Carl W. Olofson December 2014 IN THIS WHITE PAPER This white paper discusses the concept
More informationВовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН
Вовченко Алексей, к.т.н., с.н.с. ВМК МГУ ИПИ РАН Zettabytes Petabytes ABC Sharding A B C Id Fn Ln Addr 1 Fred Jones Liberty, NY 2 John Smith?????? 122+ NoSQL Database
More informationIntroduction to Apache Cassandra
Introduction to Apache Cassandra White Paper BY DATASTAX CORPORATION JULY 2013 1 Table of Contents Abstract 3 Introduction 3 Built by Necessity 3 The Architecture of Cassandra 4 Distributing and Replicating
More informationCloud Computing and Advanced Relationship Analytics
Cloud Computing and Advanced Relationship Analytics Using Objectivity/DB to Discover the Relationships in your Data By Brian Clark Vice President, Product Management Objectivity, Inc. 408 992 7136 brian.clark@objectivity.com
More informationA programming model in Cloud: MapReduce
A programming model in Cloud: MapReduce Programming model and implementation developed by Google for processing large data sets Users specify a map function to generate a set of intermediate key/value
More informationBIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
More informationLarge-Scale Data Processing
Large-Scale Data Processing Eiko Yoneki eiko.yoneki@cl.cam.ac.uk http://www.cl.cam.ac.uk/~ey204 Systems Research Group University of Cambridge Computer Laboratory 2010s: Big Data Why Big Data now? Increase
More information