Medical Big Data Workshop 12:30-5pm Star Conference Room. #MedBigData15
- Marcia Sims
- 8 years ago
Transcription
1 Medical Big Data Workshop 12:30-5pm Star Conference Room #MedBigData15
2 Welcome! Today's Goals: Introduce you to the Big CSAIL; introduce you to the popular MIMIC II dataset; overview of database technologies; network and meet new people; come up with some cool ideas.
3 The Team: Sam Madden, Mornin Feng, Ikaro Silva, Tristan Naumann, Jeremy Kepner, Alex Poliakov, Lauren Edwards, Vijay Gadepally
4 Agenda: Introduction and Welcome; About the ISTC Program; About the MIMIC II Dataset; Break; Database Technologies; Project Ideas; Group Discussion (more ideas!); Closing
8 Database Technologies
9 Database Fundamentals. Database: collection of data and supporting data structures. Database Management System (DBMS): software that provides the interface between user and database. Common user-DBMS interactions: defining new data, new schema, etc.; updating data; retrieving (querying) data; DB administration, security, permissions, etc.
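The common user-DBMS interactions listed on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the workshop demo; the `patients` table and its columns are hypothetical.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# 1. Define a new schema.
conn.execute(
    "CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# 2. Define new data.
conn.executemany(
    "INSERT INTO patients (id, name, age) VALUES (?, ?, ?)",
    [(1, "A", 64), (2, "B", 71)],
)

# 3. Update data.
conn.execute("UPDATE patients SET age = ? WHERE id = ?", (65, 1))

# 4. Retrieve (query) data.
rows = conn.execute("SELECT id, name, age FROM patients ORDER BY id").fetchall()
conn.close()
```

Administration, security, and permissions are handled by server-based systems such as PostgreSQL or MySQL and are not shown here.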
10 A Brief History of Open-Source Big Data. NoSQL DATABASES: Cluster, BigTable, Dremel, NewSQL. PARALLEL PROCESSING: MapReduce, Hadoop, Pregel, D4M, Giraph
11 Relational Databases. What it is: a database that stores information about data and how it is related. Table-based: a table contains n rows when you have n data entries. Predefined schema/organization of data. Vertically scalable (depends on hardware power; scales with better hardware). Uses SQL as the query interface. Typically provides full consistency (only one version of stored data in the whole cluster). Use cases: strong need for consistent results (for example, dealing with $$); willing to trade performance for accuracy; need for ACID guarantees. Examples: MySQL, PostgreSQL, Oracle
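The "dealing with $$" use case can be illustrated with a small atomic transfer. This is a minimal sketch using Python's built-in sqlite3 module; the `accounts` table, the CHECK constraint, and the `transfer` helper are assumptions made for the example, not something shown in the workshop.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit
        # applied just before it has been rolled back as well.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

ok = transfer(conn, 1, 2, 30)       # enough funds: both updates are committed
failed = transfer(conn, 1, 2, 500)  # overdraft: neither update is kept
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

The second call credits account 2 before the debit fails, so the unchanged balance of account 2 afterwards shows that the whole transaction was undone, not just the failing statement.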
12 Non-Relational Databases. What it is: a database based on documents, key-value pairs, graphs, or wide-column stores. No standard schema definitions to adhere to; dynamic schema. Horizontal scalability (usually runs on COTS hardware; scales with more systems). Typically provides eventual (NoSQL) consistency: there may be different valid versions of the same data in the cluster with different values. Non-Relational/Distributed Databases use cases: OK with BASE (basically available, soft state, eventual consistency) guarantees. Examples: Accumulo, Cassandra, MongoDB, Google Bigtable
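Eventual consistency can be pictured with a toy model of two replicas that resolve conflicts by keeping the newest write. This is a deliberately simplified sketch: the `Replica` class, the `sync` function, and the last-write-wins rule are assumptions for illustration and do not describe how Accumulo, Cassandra, or any other listed system is implemented.

```python
class Replica:
    """One node holding key -> (timestamp, value)."""

    def __init__(self):
        self.store = {}

    def write(self, key, value, timestamp):
        # Keep the incoming version only if it is newer than what we have.
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange state so both replicas end up with the newest version of each key."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)


node_a, node_b = Replica(), Replica()
node_a.write("hr:patient42", 80, timestamp=1)
sync(node_a, node_b)

# A newer write reaches only node_b; until the next sync the replicas disagree.
node_b.write("hr:patient42", 95, timestamp=2)
before = (node_a.read("hr:patient42"), node_b.read("hr:patient42"))

sync(node_a, node_b)
after = (node_a.read("hr:patient42"), node_b.read("hr:patient42"))
```

Between the second write and the second sync a reader may see either value depending on which node answers, which is the "different valid versions" situation described on the slide.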
13 Comparing Relational and Non-Relational Databases. Quick Reference: RDBMS vs. NoSQL
Relational databases (MySQL, PostgreSQL, Oracle): typed columns with relational keys; single-node or sharded; ACID transactions; SQL, indexing, joins, and query planning.
NoSQL (HBase, Cassandra, Accumulo): schema-less; distributed, scalable; eventually consistent; low-level API (scans and filtering).
14 Demo
22 Possible Projects
23 The MIMIC II Dataset. The Multiparameter Intelligent Monitoring in Intensive Care (MIMIC II) dataset provides a realistic and challenging corpus of data. Made up of 2 parts: Clinical Dataset and Waveform Dataset. More information: http://physionet.org/mimic2
24 Clinical Dataset. Contents:
General: patient demographics, hospital admission and discharge dates, room tracking, death dates (in or out of the hospital), ICD-9 codes, unique code for health care provider and type (RN, MD, RT, etc.). All dates are surrogate dates due to privacy issues, but time intervals (even those between multiple admissions of the same patient) are preserved.
Physiological: hourly vital sign metrics, SAPS, SOFA, ventilator settings, etc.
Medications: IV meds, provider order entry data, etc.
Lab Tests: chemistry, hematology, ABGs, imaging, etc.
Fluid Balance: intake (solutions, blood, etc.) and output (urine, estimated blood loss, etc.).
Notes & Reports: discharge summary, nursing progress notes, etc.; cardiac catheterization, ECG, radiology, and echo reports.
Currently stored in a relational database.
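One way surrogate dates can hide real dates while preserving intervals is to shift all of a patient's timestamps by the same offset. The slide does not say how MIMIC II generates its surrogate dates, so the constant per-patient offset below, the `shift_dates` helper, and the sample events are all assumptions for illustration.

```python
from datetime import datetime, timedelta

def shift_dates(events, offset_days):
    """Shift every (name, timestamp) event of one patient by the same offset."""
    delta = timedelta(days=offset_days)
    return [(name, ts + delta) for name, ts in events]

# Hypothetical events for a single patient, including a readmission.
events = [
    ("admit", datetime(2010, 1, 1, 8, 0)),
    ("discharge", datetime(2010, 1, 5, 12, 0)),
    ("readmit", datetime(2010, 3, 1, 9, 30)),
]
shifted = shift_dates(events, offset_days=36525)

# Intervals between consecutive events, before and after shifting.
original_gaps = [b[1] - a[1] for a, b in zip(events, events[1:])]
shifted_gaps = [b[1] - a[1] for a, b in zip(shifted, shifted[1:])]
```

Because the same offset is applied to every event, length of stay and time between admissions are unchanged, which is the property the slide highlights.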
25 Waveform Dataset. The waveform database contains thousands of recordings of multiple physiologic signals ("waveforms") and time series of vital signs ("numerics") collected from bedside patient monitors in adult and neonatal intensive care units (ICUs). Examples: ECG signals, arterial blood pressure, respiration
26 MIMIC II Dataset. Very useful, but with many major challenges: messy; erroneous; unstructured components; heterogeneous data types. The MIMIC II dataset provides insight into the challenges associated with real datasets.
27 Project Ideas. Common Themes: Cleaning; Analytics; Viz; 2015 Challenge
28 Project Ideas. Meant to be interactive! Please jump in with your thoughts or questions. We've thought of a few projects along the following themes: automatic pre-processing/cleaning of data; analytics; visualization. Discussion for people of different backgrounds and expertise.
29 Theme: Automatic Pre-Processing of Medical Big Data. Big data means big problems in working with data collected over time. Challenges: Volume, Velocity, Variety, Veracity (privacy). "Big" can be a relative term: it depends on your hardware, analytics, and types of data. Big can be anywhere from gigabytes to terabytes.
30 Clean Data. Look for possibly erroneous regions and extract points of interest based on physical or clinical information. The project will perform a literature review to find characteristics of a correct signal, develop a codebase that can read in waveforms and apply tests or comparisons against ideal data, and extract regions that do not conform.
31 Outlier Detection/Substitution. Look for signals or parts of signals that are outliers based on the statistics of the signal and biological limits (for example, having a heart beat above a threshold or 100 standard deviations above the mean). Some useful tools: SCORPION/dbwipes. Potential projects may perform a literature review of current outlier detection algorithms and possible biological/physical limits. Reference: http://web.mit.edu/mfeng/www/papers/artifact_cr.pdf Reference: http://web.mit.edu/mfeng/www/papers/ICASSP13_HanMumaFengZoubir_draft.pdf
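A minimal sketch of combining a biological limit with a statistical test is shown below. It swaps the mean and standard deviation mentioned on the slide for the median and median absolute deviation (MAD), which are less distorted by the outliers themselves. The `find_outliers` helper, the 20-250 bpm limits, and the 3.5 cutoff are illustrative assumptions, not clinically validated values or the method of the cited references.

```python
from statistics import median

def find_outliers(samples, low, high, z_max=3.5):
    """Return indices of samples outside [low, high] or with a large robust z-score."""
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    flagged = []
    for i, x in enumerate(samples):
        out_of_range = not (low <= x <= high)
        # 0.6745 scales the MAD so the score is comparable to a standard z-score.
        robust_z = 0.6745 * abs(x - med) / mad if mad > 0 else 0.0
        if out_of_range or robust_z > z_max:
            flagged.append(i)
    return flagged

# Hypothetical heart-rate samples in beats per minute.
heart_rate = [72, 75, 71, 74, 300, 73, 70, 0, 76, 74]
flagged = find_outliers(heart_rate, low=20, high=250)
```

Here the samples at indices 4 and 7 (300 and 0 bpm) are flagged by both the range check and the robust score; what to substitute for them is a separate design decision.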
32 Outlier Detection/Substitution (2). Look for anomalies in the rate of change of signals, which may indicate errors in data collection. Signals are non-stationary, and it may be necessary to use the entire signal and not just pieces. A project may be to develop filters that can look for regions of statistically or biologically anomalous rates of change.
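A simple rate-of-change filter could look like the sketch below. The `rapid_changes` helper, the sample arterial blood pressure values, and the 20 mmHg per second limit are hypothetical and only illustrate the idea of flagging implausibly fast changes.

```python
def rapid_changes(times, values, max_rate):
    """Return indices i where the change from sample i-1 to i is faster than max_rate."""
    flagged = []
    for i in range(1, len(values)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            # Non-increasing timestamps are themselves a data collection error.
            flagged.append(i)
            continue
        rate = abs(values[i] - values[i - 1]) / dt
        if rate > max_rate:
            flagged.append(i)
    return flagged

times = [0, 1, 2, 3, 4, 5]        # seconds
abp = [90, 91, 92, 140, 93, 92]   # mmHg, hypothetical samples
jumps = rapid_changes(times, abp, max_rate=20)
```

A single spike produces two flagged indices, one for the jump up and one for the jump back down. Because the signals are non-stationary, a fixed limit like this is only a starting point.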
33 Detection of Human Bias. Often, there are errors in a dataset when human intervention is required. For example, someone may enter 100 kg instead of 100 lbs. The project will go through entries where human bias may exist and look for possible errors.
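The kg-versus-lbs example suggests one possible check: compare each recorded weight with the patient's typical weight and flag entries that differ by roughly the pound-to-kilogram factor. The `flag_unit_swaps` helper, the 10% tolerance, and the sample weights are assumptions for illustration.

```python
from statistics import median

LB_PER_KG = 2.20462

def flag_unit_swaps(weights_kg, tolerance=0.1):
    """Flag entries whose ratio to the median weight is near 2.2 or 1/2.2.

    Assumes all weights are positive and nominally recorded in kilograms.
    """
    typical = median(weights_kg)
    flagged = []
    for i, w in enumerate(weights_kg):
        ratio = w / typical
        entered_lb_as_kg = abs(ratio - LB_PER_KG) / LB_PER_KG < tolerance
        entered_kg_as_lb = abs(ratio - 1 / LB_PER_KG) * LB_PER_KG < tolerance
        if entered_lb_as_kg or entered_kg_as_lb:
            flagged.append(i)
    return flagged

# Hypothetical repeated weight entries for one patient; the third value looks
# like a weight in pounds recorded in the kilogram field.
weights = [45.2, 45.5, 100.0, 45.0, 45.8]
suspects = flag_unit_swaps(weights)
```

A flagged entry is only a candidate error; a real change in weight of this size is unlikely but would still need clinical review.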
34 Detecting Incorrect Signal Leads. A common problem is when signal leads are mixed up (for example, ECG lead V with IV). The project will look for signal characteristics associated with different leads and go through the dataset to extract erroneous connections. Reference: https://github.com/ikarosilva/patienttracking
35 Theme: MIMIC Analytics. Developing a set of medically relevant analytics that leverage the relational and time series portions of the dataset. It will be great to have physical/medical practitioners involved!
36 Market Basket Medication. Use pattern analysis and text mining to predict the next medication for a particular patient. Will involve looking at patterns of how medications were prescribed or taken by patients. For example, patients who take medication X have a tendency to take medication Y. Reference: http:// pii/s
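The "patients who take X tend to take Y" idea corresponds to the confidence measure used in market basket analysis. The sketch below computes it for every ordered pair of medications; the `pair_confidence` helper and the medication baskets are made up for illustration and are not drawn from MIMIC II.

```python
from collections import Counter
from itertools import combinations

def pair_confidence(baskets):
    """Return {(x, y): fraction of baskets containing x that also contain y}."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = set(basket)
        item_counts.update(items)
        for x, y in combinations(sorted(items), 2):
            pair_counts[(x, y)] += 1
            pair_counts[(y, x)] += 1
    return {pair: n / item_counts[pair[0]] for pair, n in pair_counts.items()}

# Each basket is the set of medications recorded for one hypothetical patient.
baskets = [
    {"heparin", "insulin"},
    {"heparin", "insulin", "furosemide"},
    {"heparin"},
    {"insulin", "furosemide"},
]
confidence = pair_confidence(baskets)
```

In this toy data every patient on furosemide is also on insulin, so that pair has confidence 1.0, while only two of the three heparin patients are on insulin. Predicting the next medication would also need the order and timing of prescriptions, which this sketch ignores.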
37 Relating Waveform and Structured Data. Use timing data, pattern matching, and io_events to look at the relationship between different measurements. For example, find the relationship between blood pressure and urine output. May need to control for possible interventions, age range, gender, etc.
38 Patient Cohorts. Determine how patients are clustered. This can be very useful for some of the other analytics. Possible approaches: cluster patients based on waveform statistics; cluster on physical characteristics (age, gender, etc.); cluster based on medication/intervention.
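As one possible approach to clustering on physical characteristics and waveform statistics, the sketch below runs a plain k-means (Lloyd's algorithm) on two features per patient. The choice of k-means, the `kmeans` helper, the features (age, mean heart rate), and the starting centroids are assumptions; in practice features would be scaled first and a library implementation would be preferable.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on tuples of numbers; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        centroids = updated
    return centroids, labels

# Hypothetical patients described by (age in years, mean heart rate in bpm).
patients = [(25, 70), (30, 72), (28, 68), (75, 95), (80, 98), (78, 92)]
centroids, labels = kmeans(patients, centroids=[patients[0], patients[3]])
```

The resulting labels split the six hypothetical patients into a younger and an older group, which could then serve as cohorts for the other analytics.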
39 Theme: Visualization. An important aspect of big data is big visualization. Data visualizations can aid in the exploration of knowledge and, especially in complex datasets such as MIMIC, can help with the generation of insight.
40 MIMIC Explorer. Design a visualization framework (web or otherwise) to allow visualization of the relational and non-relational components of the MIMIC dataset. Ideally, the explorer should be able to perform some basic analytics to get estimates of data. For example, group patients by gender, age, time of admission, etc.
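The grouping an explorer might offer can be sketched as a count of patients per gender and age band. The record layout, the `age_band` and `summarize` helpers, and the 20-year band width are hypothetical and do not reflect the MIMIC II schema.

```python
from collections import Counter

def age_band(age, width=20):
    """Map an age to a label such as '20-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

def summarize(patients):
    """Count patients per (gender, age band)."""
    return Counter((p["gender"], age_band(p["age"])) for p in patients)

# Hypothetical patient records.
patients = [
    {"gender": "F", "age": 34},
    {"gender": "M", "age": 67},
    {"gender": "F", "age": 39},
    {"gender": "M", "age": 81},
]
summary = summarize(patients)
```

Counts like these are the kind of quick estimate a bar chart or histogram in the explorer could display.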
41 Visualize Analytics. Work with other groups to develop visualization for their analytics/data pre-processing tasks. Visualize data on new hardware such as Google Glass, Oculus Rift, etc. For example, port the MIMIC II explorer to different visualization hardware.
42 Open Floor. Anyone in the audience working on something interesting?
43 How to get started? Contact us! Get an account on MIT SuperCloud to get access to a computing cluster. MIT SuperCloud has the MIMIC II data readily available.
46 Contact us! Sam Madden: Vijay Gadepally Get an account on MIT Systems.
More informationBig Data Dimensional Analysis
Big Data Dimensional Analysis Vijay Gadepally & Jeremy Kepner MIT Lincoln Laboratory, Lexington, MA 02420 {vijayg, jeremy}@ ll.mit.edu Abstract The ability to collect and analyze large amounts of data
More informationSQL + NOSQL + NEWSQL + REALTIME FOR INVESTMENT BANKS
Enterprise Data Problems in Investment Banks BigData History and Trend Driven by Google CAP Theorem for Distributed Computer System Open Source Building Blocks: Hadoop, Solr, Storm.. 3548 Hypothetical
More informationInternational Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
More informationArchitectural patterns for building real time applications with Apache HBase. Andrew Purtell Committer and PMC, Apache HBase
Architectural patterns for building real time applications with Apache HBase Andrew Purtell Committer and PMC, Apache HBase Who am I? Distributed systems engineer Principal Architect in the Big Data Platform