Big Data and Databases

Big Data and Databases
Vijay Gadepally (vijayg@ll.mit.edu)
Lauren Milechin (lauren.milechin@ll.mit.edu)
This work is sponsored by the Department of the Air Force under Air Force Contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies

Big Data Challenge
Rapidly increasing:
- Data volume
- Data velocity
- Data variety
- Data veracity
[Figure: timeline (10 years ago, 5 years ago, today, in 5 years) showing a growing gap between Things and Humans. Users (deciders): kids, adults, elderly. Sources (providers): building security, building environment, building usage; commuter vehicles, work vehicles, transport vehicles; student smartphones, classroom tablets, fitness wearables.]

Challenge of Data Volume
- Where do I store my data?
- How much do I store?
- How do I access it?
- How do I index it?
[Figure: storage options (flat file, spreadsheet, database, distributed database) and capacities (1 TB total for applications and data, 2 TB total for data, scalable data center for data).]

Challenge of Data Velocity
2011: Data Generated Per Minute
- Facebook: 684,478 pieces of content
- Twitter: 100,000 tweets
- YouTube: 48 hours of new video
- Google: 2,000,000 new queries
- Internet population: 2.1 billion people

Challenge of Data Velocity
2014: Data Generated Per Minute
- Facebook: 2,460,000 pieces of content
- Twitter: 277,000 tweets
- YouTube: 72 hours of new video
- Google: 4,000,000 new queries
- Internet population: 2.4 billion people

Challenge of Data Velocity
Increase in data generated, 2011 to 2014:
- Facebook: 350 MB/min
- Twitter: 50 MB/min
- YouTube: 24 to 48 GB/min
Questions:
- How do I capture my data for processing?
- How do I process the data within the specified time constraints?
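To get a feel for what these per-minute rates mean for storage planning, the sketch below converts them to daily volumes. It uses only the rates listed on the slide. It assumes decimal units (1 GB = 1000 MB) and a constant rate over the whole day, both of which are simplifications.

```python
MINUTES_PER_DAY = 24 * 60

# Per-minute increases from the slide, in MB/min (decimal units assumed).
rates_mb_per_min = {
    "Facebook": 350,
    "Twitter": 50,
    "YouTube (low)": 24_000,
    "YouTube (high)": 48_000,
}


def daily_volume_gb(mb_per_min):
    """Convert a constant MB/min rate into GB/day."""
    return mb_per_min * MINUTES_PER_DAY / 1000


daily_gb = {name: daily_volume_gb(rate) for name, rate in rates_mb_per_min.items()}

for name, gb in daily_gb.items():
    print(f"{name}: {gb:,.0f} GB/day")
```

Under these assumptions the YouTube increase alone is tens of terabytes per day, which is why capture and processing deadlines become design constraints rather than afterthoughts.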

Challenge of Data Variety
What does the data look like? Tweets, images, text and documents, audio
How do I index heterogeneous data formats?
- Strings may be easily stored in a database
- Image and document metadata may fit in a traditional database
- Raw images/documents may require a file system or an alternate database

Challenge of Data Variety
What does the data look like? Tweets, images, text and documents, audio
How do I fuse heterogeneous data formats to provide a uniform view?
Fusion drives:
- Indexing/schema decisions
- Technology (databases, storage, etc.) selection
- Selection of software (visualization, language) tools

Challenge of Data Variety
What does the data look like? Tweets, images, text and documents, audio
How do I develop algorithms for heterogeneous data formats?
- Images can use High Performance Computing tools
- Strings and documents require a new algebra to take advantage of High Performance Computing systems
- Visualization requires merging image data with string data

Challenge of Data Veracity
Does the data need protection?
- How do I balance privacy with availability?
- What level of security is required?
- How do I make data available only to vetted analysts?
- How is data kept secure and private while minimizing impact on analysis?

Challenge of Data Veracity
Does the data need protection?
How confident am I in the integrity of my data?
- Where did it come from?
- Who has accessed it?
- Has anyone modified the data stream?
- Has anyone tampered with the data stream?

Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies

General Strategy: System Design
[Figure: system architecture connecting Things (sources/providers) to Humans (users/deciders: kids, adults, elderly). Components: files, ingest & enrichment, ingest, databases, analytics (A through E), scheduler, computing, user interface. The Things/Humans gap is shown over time (10 years ago, 5 years ago, today, in 5 years). Sources: building security, building environment, building usage; commuter vehicles, work vehicles, transport vehicles; student smartphones, classroom tablets, fitness wearables.]

General Strategies: Collection
- Collect, store, and process only useful data
- Intelligently reduce the amount of data through sampling techniques
[Figure: degree distribution on log-log axes (count vs. degree, with d_max marked), separating SIGNAL from NOISE in an N-D SPACE. Example background model: power-law graph.]
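The slide does not say which sampling technique to use. As one illustration, the sketch below uses reservoir sampling (Algorithm R), which keeps a fixed-size uniform random sample of a stream whose length is not known in advance. The function name, the sample size, and the seed are choices made for this example, not something taken from the slides.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return k items chosen uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # overwriting a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=100, seed=0)
print(len(sample))
```

Memory use depends only on k, not on the stream length, so the same code can sit in an ingest path that never stores the full stream.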

General Strategy: Privacy-Preserving Technology
[Figure: the system design diagram (web, raw data, ingest & enrichment, ingest, databases, analytics A through E, scheduler, computing) between Things (providers) and Humans (deciders), annotated with three threats: data integrity attack, data loss / exfiltration, and insider threat.]

General Strategy: Privacy-Preserving Technology
Use cryptographic protocols to protect the confidentiality, integrity, and/or availability of data
- Lots of ongoing research
- Popular techniques:
  - Fully Homomorphic Encryption
  - Multiparty Computation
  - Computing on Masked Data (CMD)
- CMD:
  - Cryptographic protections for the NoSQL Accumulo database
  - Uses order-preserving, deterministic, and semantically secure encryption
  - 2-4x performance overhead
[Figure: a plaintext query is encrypted into a masked query, the CMD big data cloud runs a masked analytic and returns a masked result, and the result is decrypted into a plaintext result.]
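The idea behind deterministic masking can be sketched in a few lines: equal plaintexts map to equal masked tokens, so a server can answer equality queries without seeing the plaintext. The code below is not the CMD implementation. It substitutes a keyed hash (HMAC-SHA256) for deterministic encryption, which means tokens cannot be decrypted and the client must keep its own copy of the plaintext. The key, record IDs, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical client-side secret; the server never sees it.
SECRET_KEY = b"client-side key (hypothetical)"


def mask(value: str) -> str:
    """Deterministically mask a value with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Client side: mask the indexed field before upload.
records = [("tweet-1", "earthquake"), ("tweet-2", "sunny"), ("tweet-3", "earthquake")]
server_store = [(record_id, mask(word)) for record_id, word in records]


def server_lookup(store, masked_query):
    """Server side: equality match on masked tokens only."""
    return [
        record_id
        for record_id, token in store
        if hmac.compare_digest(token, masked_query)
    ]


matches = server_lookup(server_store, mask("earthquake"))
print(matches)
```

Deterministic masking leaks which records share a value, which is the usual trade-off against semantically secure encryption; range queries would need an order-preserving scheme, which this sketch does not attempt.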

Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies

Database Fundamentals
- Database: collection of data and supporting data structures
- Database Management System (DBMS): software that provides the interface between user and database
  - Define new data and schema
  - Retrieve (query) data
  - DB administration: set security and permissions

Database Fundamentals: ACID and BASE
ACID
- Atomicity: each transaction either fully succeeds or fully fails
- Consistency: all nodes see the same valid data all the time
- Isolation: concurrent transactions result in the system state that would be obtained from serial transactions
- Durability: committed transactions remain committed
BASE
- Basically Available, Soft-state services with Eventual consistency
[Figure: example database systems, including BigTable, placed between ACID and BASE.]
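Atomicity is easiest to see with a transaction that fails halfway. The sketch below uses Python's built-in sqlite3 module and a hypothetical accounts table (the table, names, and amounts are invented for illustration). A transfer consists of two updates; when the second one violates a constraint, the first is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(ok, failed, balances(conn))
```

In the failed transfer the credit to bob is applied first and then undone, so no partial state is ever visible after the transaction ends.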

Database Fundamentals: CAP Theorem
It is impossible for a distributed system to simultaneously provide all three of:
- Consistency
- Availability
- Partition tolerance
[Figure: diagram of consistency, availability, and partition tolerance, with BigTable as an example system.]

Database Fundamentals
[Figure: timeline from 1995 to 2016 with two tracks, DATABASES and PARALLEL PROCESSING. Labels: SQL, NoSQL, NewSQL, Cluster, Dremel, BigTable, MapReduce, Hadoop, Pregel, D4M, Giraph.]
Slide Source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.

Database Fundamentals
[Figure: relational, NewSQL, and NoSQL DB systems plotted on consistency vs. performance axes.]

Relational Databases: What It Is
- Database that stores information about data and how it is related
- Highly structured, normalized, table-based database
- Predefined schema/organization of data
- Vertically scalable with good-quality hardware
- Uses SQL as the query interface
- Typically provides full consistency
Examples: [logos not preserved in the transcript]

Relational Databases
Who Uses It: [logos not preserved in the transcript]
When to Use It
- Dealing with transactional data
- Problem sizes are moderate
- Need for ACID guarantees
How to Use It
- JDBC (Java Database Connectivity)
- SQL command line

Relational Databases

Tweet Table
Tweet ID  | User ID | Location ID | Tweet Text
096360448 | 67555   | wwz4p7jd    | Omg earthquake
544019456 | 67554   | wwh1hss5    | We're gonna have an earthquake
600791040 | 67556   | wwwygbvq    | Omg it's a earthquake

User Table
User ID | Username   | Friends Count
67554   | _zariaaa_  | 541
67555   | gnvrly_ron | 693
67556   | yolvndv    | 424

Location Table
Location ID | Latitude  | Longitude
wwh1hss5    | 33.951186 | -118.328370
wwwygbvq    | 37.754312 | -122.164388
wwz4p7jd    | 38.337154 | -122.670192
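The three tables are linked by User ID and Location ID, so reassembling a full tweet record takes a join. The sketch below loads the slide's rows into an in-memory SQLite database and runs that join. The SQL table and column names are chosen for this example, and the tweet ID is stored as text so that the leading zero survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT, friends_count INTEGER);
CREATE TABLE locations (location_id TEXT PRIMARY KEY, latitude REAL, longitude REAL);
CREATE TABLE tweets (
    tweet_id    TEXT PRIMARY KEY,
    user_id     INTEGER REFERENCES users(user_id),
    location_id TEXT REFERENCES locations(location_id),
    tweet_text  TEXT
);
""")

conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (67554, "_zariaaa_", 541),
    (67555, "gnvrly_ron", 693),
    (67556, "yolvndv", 424),
])
conn.executemany("INSERT INTO locations VALUES (?, ?, ?)", [
    ("wwh1hss5", 33.951186, -118.328370),
    ("wwwygbvq", 37.754312, -122.164388),
    ("wwz4p7jd", 38.337154, -122.670192),
])
conn.executemany("INSERT INTO tweets VALUES (?, ?, ?, ?)", [
    ("096360448", 67555, "wwz4p7jd", "Omg earthquake"),
    ("544019456", 67554, "wwh1hss5", "We're gonna have an earthquake"),
    ("600791040", 67556, "wwwygbvq", "Omg it's a earthquake"),
])

# Follow both foreign keys to rebuild one row per tweet.
rows = conn.execute("""
SELECT u.username, l.latitude, l.longitude, t.tweet_text
FROM tweets AS t
JOIN users AS u ON u.user_id = t.user_id
JOIN locations AS l ON l.location_id = t.location_id
ORDER BY t.tweet_id
""").fetchall()

for row in rows:
    print(row)
```

Normalizing this way stores each user and location once, at the cost of a join at query time.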

NoSQL Databases: What It Is
- Database based on documents, key-value pairs, graphs, or wide-column stores
- Dynamic schema
- Horizontal scalability
- Typically provides eventual consistency
Examples: [logos not preserved in the transcript]

NoSQL Databases
Who Uses It: [logos not preserved in the transcript]
When to Use It
- Large unstructured datasets
- Strong need for high performance
- Only require BASE guarantees
How to Use It
- Python/Java bindings
- Lincoln Laboratory D4M
- Command line

NoSQL Databases
[Figure: the tweet data stored in four NoSQL tables.
- Edge Table: rows are tweet IDs (096360448, 544019456, 600791040); columns are field/value pairs (FriendCount 424, FriendCount 541, FriendCount 693, Latitude 33.951186, Latitude 37.754312, Latitude 38.337154, Location wwh1hss5, Location wwwygbvq, Location wwz4p7jd, UserID 67556, UserName _zariaaa_, UserName gnvrly_ron, UserName yolvndv, Word Omg, Word a, Word an, Word earthquake).
- Transpose Table: the Edge Table with rows and columns swapped.
- Degree Table: a count per field/value pair, for example FriendCount 424: 1, FriendCount 541: 1, FriendCount 693: 1, Latitude 33.951186: 1, Word an: 1, Word earthquake: 3.
- Text Table: the raw text per tweet ID.]

Text Table
Tweet ID  | Text
096360448 | Omg earthquake
544019456 | We're gonna have an earthquake
600791040 | Omg it's a earthquake
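The four tables can be derived mechanically from the tweet records. The sketch below does this with plain Python dictionaries. It is not the D4M API or an Accumulo client; the `field|value` key format, the helper name `explode`, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Tweet records assembled from the relational tables (latitude/longitude omitted).
tweets = {
    "096360448": {"UserID": "67555", "UserName": "gnvrly_ron", "FriendCount": "693",
                  "Location": "wwz4p7jd", "Text": "Omg earthquake"},
    "544019456": {"UserID": "67554", "UserName": "_zariaaa_", "FriendCount": "541",
                  "Location": "wwh1hss5", "Text": "We're gonna have an earthquake"},
    "600791040": {"UserID": "67556", "UserName": "yolvndv", "FriendCount": "424",
                  "Location": "wwwygbvq", "Text": "Omg it's a earthquake"},
}


def explode(record):
    """Turn one record into 'field|value' column keys; text is split into words."""
    keys = set()
    for field, value in record.items():
        if field == "Text":
            keys.update(f"Word|{word}" for word in value.split())
        else:
            keys.add(f"{field}|{value}")
    return keys


# Edge table: tweet ID -> {column key: 1}
edge = {tweet_id: {key: 1 for key in explode(rec)} for tweet_id, rec in tweets.items()}

# Transpose table (column key -> tweet IDs) and degree table (column key -> count).
transpose = defaultdict(dict)
degree = Counter()
for tweet_id, columns in edge.items():
    for key in columns:
        transpose[key][tweet_id] = 1
        degree[key] += 1

# Text table: tweet ID -> raw text
text = {tweet_id: rec["Text"] for tweet_id, rec in tweets.items()}

print(degree["Word|earthquake"], sorted(transpose["Word|Omg"]))
```

With this layout, "which tweets mention a word" is a single row lookup in the transpose table, and the degree table shows how many matches to expect before the lookup is run.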

NoSQL Example: Accumulo

Accumulo Design Drivers
1. Cell-Level Security
   - Express common security requirements in the infrastructure, not just in the application
   - Data-centric approach encourages secure sharing
2. Scalability
   - Near-linear performance improvements at thousands of nodes
   - Durable and reliable under the increased failures that come with scale
3. Diverse, Interactive Analytics
   - Sorted key/value core performs well in a diverse set of domains
   - Information retrieval, statistics, graph analysis, geo indexing, and more
4. Flexible, Adaptive Schema
   - Start with universal structures and indexing
   - Refine the schema over time
Source: Sqrrl Data Inc

Accumulo Features
- Visibility labels
- Iterators
- Automatic table splitting
- Support for Apache Thrift proxy
[Figure: matrix relating features (Visibility, Iterator, Table-split, Thrift, Schema, D4M) to volume, velocity, variety, and veracity.]
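Visibility labels are what make cell-level security work: each cell carries a label, and a scan returns only the cells whose label is satisfied by the reader's authorizations. The sketch below illustrates the filtering idea only. It is not Accumulo's API or its label syntax: labels are represented here as a list of AND-groups that are ORed together, and the cells, label names, and function names are hypothetical.

```python
def visible(visibility, authorizations):
    """True if the authorizations satisfy at least one AND-group of the label."""
    return any(group <= authorizations for group in visibility)


# (row, column, visibility as OR of AND-groups, value) -- all hypothetical
cells = [
    ("tweet-1", "text", [frozenset({"public"})], "Omg earthquake"),
    ("tweet-1", "location",
     [frozenset({"admin"}), frozenset({"analyst", "vetted"})],  # admin OR (analyst AND vetted)
     "wwz4p7jd"),
]


def scan(cells, authorizations):
    """Return only the cells the given authorizations are allowed to see."""
    auths = frozenset(authorizations)
    return [(row, col, value) for row, col, vis, value in cells if visible(vis, auths)]


print(scan(cells, {"public"}))
print(scan(cells, {"public", "analyst", "vetted"}))
```

Because the check runs per cell inside the scan, two users can query the same table and receive different subsets of it without any application-level filtering.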

NewSQL Databases: What It Is
- Database systems that emulate the performance of NoSQL while providing the ACID guarantees of relational databases
- Usually a scaled-up version of a relational database
- Often uses an array data model
- Other data models include graph-based data structures and distributed relational tables
- May make use of in-memory processing or specialized hardware
Examples: [logos not preserved in the transcript]

NewSQL Databases
Who Uses It: [logos not preserved in the transcript]
When to Use It
- Large multidimensional datasets
- Data that doesn't fit in traditional databases
- Have the volume for NoSQL, but need ACID guarantees
How to Use It
- Each has a custom API
- Example: SciDB uses JDBC, SHIM, D4M, and the R-SciDB binding

NewSQL Example: SciDB

SciDB Design Drivers
- Language support: R, Python, Matlab, Julia, ...
- Massively Parallel Processing (MPP) database
- Array data model
- Complex analytics
- Commodity clusters or cloud

SciDB Example Schema
- Highly customizable to the application
- Each cell is a strongly typed structure of attributes, e.g. <int> or <double, string, float>
- Nullable attributes, empty cells, sparse or dense
[Figure: two-dimensional array with dimensions stock (MSFT, MSFVX, MT, ...) and time (12342778213, 12342778214, 12342778215, ...). Example cells: price: 15.76, volume: 200; price: 234.2, volume: 10; price: 17.50, volume: null; price: 17.40, volume: 100; price: 0.02, volume: null.]

SciDB Features
- Massively parallel database
- Array data model
- Analytic language support
- In-database analytics
[Figure: matrix relating features (MPP DB, Array, Languages, Analytics) to volume, velocity, variety, and veracity.]

Quick Reference: RDBMS vs. NoSQL vs. NewSQL

Relational Databases
- Examples: MySQL, PostgreSQL, Oracle
- Schema: typed columns with relational keys
- Architecture: single-node or sharded
- Guarantees: ACID transactions
- Access: SQL, indexing, joins, and query planning

NoSQL
- Examples: HBase, Cassandra, Accumulo
- Schema: schema-less
- Architecture: distributed, scalable
- Guarantees: eventually consistent
- Access: low-level API (scans and filtering)

NewSQL
- Examples: SciDB, VoltDB, MemSQL
- Schema: strongly typed structure of attributes
- Architecture: distributed, scalable
- Guarantees: ACID transactions (most)
- Access: custom API, JDBC, bindings to popular languages

Slide Source: S. Sawyer, B. D. O'Gwynn, A. Tran, T. Yu. Understanding Query Performance in Accumulo. HPEC 2013.

Outline Challenge Overview General Strategies Database Fundamentals and Technologies Up and Coming Technologies

On the Horizon: New Technologies and Techniques
- New database and processing technology, such as:
  - Apache Spark: in-memory distributed processing
  - TileDB: database for scientific big data
  - S-Store: database tuned for streaming data
- New cross-database and storage engine standards, APIs, and practices:
  - BigDawg: an API to simplify big data analytics, currently being designed
  - GraphBLAS: an effort to standardize graph algorithms and databases
- Advances in privacy-preserving technology:
  - SPED: signal processing in the encrypted domain
  - Greater efficiency of protocols such as Functional Encryption and Multiparty Computation
- Tools and technologies will continue to evolve, so it is important to keep students abreast of new developments

Conclusions
- Lots of stuff going on!
- Very important to understand the details of your dataset, end analytic, and other requirements
- Topics covered:
  - Challenge overview (What is the problem?)
  - Some general strategies
  - Databases
  - Upcoming technologies

About MIT
- Leading science and engineering research university
- 80 Nobel laureates, 50 National Medal of Science recipients
- Thousands of companies (11th largest world economy)
- 1000 faculty, 10000 employees, 10000 students
- $1.4B in annual external research funding
  - Lincoln Laboratory: $800M
  - Other MIT: $600M