Big Data and Future Networks: A Perspective from the United States




Hisashi Kobayashi (小林久志)
Princeton University and National Institute of Information and Communications Technology

Acknowledgments
- Prof. Tadao Saito, Toyota InfoTechnology Center
- Dr. Nozomu Nishinaga, Mr. Masahiro Kiyokawa, and Mr. Hiroaki Yano, NICT
- Prof. Mung Chiang, Princeton University
- Dr. Evangelos Eleftheriou, IBM Zurich Research Lab
- Mr. Kaiser Fung, author of "Numbers Rule Your World"
- Dr. Kazuo Iwano, Mitsubishi Corporation
- Prof. Brian L. Mark, George Mason University
- Prof. Dipankar Raychaudhuri, Rutgers University
- Prof. Phuoc Tran-Gia, University of Würzburg
- Prof. Howard Wactlar, CMU and NSF CISE Directorate
- Prof. Philip Yu, University of Illinois at Chicago
2 Big Data and Future Network Design Hisashi Kobayashi

Outline
- How Much Information? How Big is Data?
- President Obama's Open Government Initiative
- President Obama's Big Data Initiative
- Big Data in Science and Technology Research: NITRD Program, NSF, DARPA, DOE
- Big Data in Enterprises
- Call for Data Science and Data Scientists
- Big Data and Networks
- References

HOW MUCH INFORMATION? HOW BIG IS DATA?

Source: The World of Data (by IBM): http://adamov.net.ru/images/share/the-world-of-large-scale-data-processing.jpg

How Much Data Was Out There? [Kobayashi et al. 2005]
- Online (disk drives, file systems): 300 petabytes
- Offline (magnetic tape, CDs): 8 exabytes
- Analog data (paper, film, videotape): 200 exabytes
Petabyte = 1,000,000,000,000,000 bytes = 10^15 bytes; exabyte = 1,000,000,000,000,000,000 bytes = 10^18 bytes.
cf. 2003 report by a U.C. Berkeley research group. Source: http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

Some Big Numbers
- 0.43 × 10^18 seconds: the age of the universe (13.77 billion years)
- 5 exabytes: all words ever spoken by human beings, as text (Roy Williams, Caltech, 1993)
- 21 exabytes/month: global Internet traffic in 2007 (Padmasree Warrior, Cisco, March 2010)
- 160 exabytes: digital information created, captured, and replicated worldwide in 2007 (International Data Corporation, 2007)
- 42 zettabytes: all words ever spoken by human beings, if digitized as 6 kHz, 16-bit audio (Mark Liberman, U. Penn, 2003)

SI prefixes: kilo 10^3, mega 10^6, giga 10^9, tera 10^12, peta 10^15, exa 10^18, zetta 10^21, yotta 10^24.
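The figures on this slide mix seconds, years, and decimal byte prefixes, and they can be checked with a little arithmetic. The sketch below is an illustration added here, not part of the original talk. It assumes decimal (SI) prefixes, as in the prefix list on this slide, and a Julian year of 365.25 days. The helper name `to_si` is hypothetical.

```python
# Decimal (SI) prefixes, largest first: kilo = 10^3 ... yotta = 10^24.
SI_PREFIXES = [
    ("yotta", 24), ("zetta", 21), ("exa", 18), ("peta", 15),
    ("tera", 12), ("giga", 9), ("mega", 6), ("kilo", 3),
]

# Julian year (assumption): 365.25 days, about 3.156e7 seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def to_si(value: float, unit: str = "bytes") -> str:
    """Express a quantity with the largest SI prefix that keeps it >= 1."""
    for name, exp in SI_PREFIXES:
        scale = 10 ** exp
        if value >= scale:
            return f"{value / scale:.3g} {name}{unit}"
    return f"{value:.3g} {unit}"


if __name__ == "__main__":
    age_s = 13.77e9 * SECONDS_PER_YEAR
    print(f"Age of the universe: {age_s:.3g} s")  # about 0.43 x 10^18 s
    print(to_si(5e18))    # all words ever spoken, as text
    print(to_si(42e21))   # the same, as 6 kHz 16-bit audio
    print(to_si(300e15))  # online storage in the 2003 estimate
```

The same helper also converts the IBM figure of 2.5 quintillion (2.5 × 10^18) bytes per day to 2.5 exabytes.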

Source: Asigra Info Graphic: http://thumbnails.visually.netdna-cdn.com/big-data-infographic_504f4d2f5bd2f.jpg

Source: The Retailer's Guide: http://venturebeat.files.wordpress.com/2012/11/retailersbigdata_final.png

Source: http://www.weforum.org/reports/personaldata-emergence-new-asset-class (World Economic Forum, January 2011, Davos, Switzerland)

Every day, we create 2.5 quintillion (2.5 × 10^18) bytes (i.e., 2.5 exabytes) of data, so much that 90% of the data in the world today has been created in the last two years alone. [IBM]
Raw data has little value by itself. We must process data and extract information in a usable form:
- Big Data tools, e.g., Apache Hadoop, MapReduce
- Data science (data mining, machine learning)
- Scalable statistical analysis techniques, which still need to be advanced
We must then turn the information into valuable action, e.g., Amazon.com, a better government.

Open Government Initiative
"My administration is committed to creating an unprecedented level of openness in Government. We will work together to ensure the public trust and establish a system of transparency, public participation, and collaboration. Openness will strengthen our democracy and promote efficiency and effectiveness in Government."
-- President Barack Obama, 01/21/09

- Government should be transparent: to promote accountability and provide information to citizens.
- Government should be participatory: knowledge is widely dispersed in society, and public officials benefit from having access to that knowledge.
- Government should be collaborative: we should use innovative tools, methods, and systems to cooperate with nonprofit organizations, businesses, and individuals in the private sector.

Open Government Directive
1. Publish Government Information Online
2. Improve the Quality of Government Information
3. Create and Institutionalize a Culture of Open Government
4. Create an Enabling Policy Framework for Open Government
-- Peter R. Orszag, Director, Office of Management and Budget, 12/8/09
http://www.whitehouse.gov/sites/default/files/omb/assets/memoranda_2010/m10-06.pdf

Source: Howard Wactlar, NSF CISE Directorate, at NIST Big Data Meeting, June 2012

President Obama's Big Data Initiative
- To advance state-of-the-art technologies to collect, store, preserve, manage, analyze, and share Big Data.
- To accelerate the pace of discovery in science and engineering, strengthen national security, and transform teaching and learning.
- To expand the workforce needed to develop and use Big Data technologies.
More than $200 million in new commitments through six Federal departments and agencies.
-- Office of Science and Technology Policy (OSTP), announced on March 29, 2012

BIG DATA IN SCIENCE AND TECHNOLOGY RESEARCH

NITRD (Networking and Information Technology Research and Development) Program
- Provides a framework in which many Federal agencies coordinate their R&D efforts on networking and IT.
- Operates under the aegis of the NITRD Subcommittee of the National Science and Technology Council (NSTC)'s Committee on Technology.
- The National Coordination Office (NCO) supports the NITRD Program by providing technical expertise, planning, and coordination, and by serving as the Program's central point of contact.


The NITRD Program's focus areas:
- Big Data (BD)
- Cyber Physical Systems (CPS)
- Cyber Security and Information Assurance (CSIA)
- Health Information Technology R&D (Health IT R&D)
- Human Computer Interaction and Information Management (HCI&IM)
- etc.


Source: Howard Wactlar, NSF CISE Directorate, at NIST Big Data Meeting, June 2012

Source: Howard Wactlar, NSF CISE Directorate, at NIST Big Data Meeting, June 2012

DARPA XDATA Program
- Investment: $25 million/year
- Develop computational techniques and software tools for both semi-structured (e.g., tabular, relational, categorical, meta-data) and unstructured (e.g., text documents, message traffic) data:
  - Scalable algorithms for processing imperfect data in distributed data stores;
  - Effective human-computer interaction tools for rapidly customizable visual reasoning.

DOE's Scalable Data Management, Analysis and Visualization (SDAV) Institute ($25 million over 5 years)
Project Leader: Dr. Arie Shoshani, Lawrence Berkeley National Laboratory

BIG DATA IN ENTERPRISES


The Big Data market will exceed $50B worldwide by 2017. Source: http://sourcedigit.com/700-big-data-market-size-forecasts-2012-17/

The Big Data Market: IDC Japan's Forecast
- 2011: ¥14.25 billion
- 2012: ¥19.7 billion
- 2016: ¥76.5 billion
The current Big Data market is just over 0.1% of the ¥13 trillion total IT market.

Another forecast is much bigger (by an order of magnitude). Source: http://www.microsoft.com/ja-jp/sqlserver/2012/big-data/default.aspx

Big Data: The Management Revolution
- Success story of Amazon.com: 30-40% annual growth in 2008-2012 [HBR]
- Data analytics (DA) will replace the HiPPO (Highest Paid Person's Opinion) [HBR]
- Data analysts (or data scientists) are in short supply.
[HBR]: Harvard Business Review, October 2012: http://hbr.org/archive-toc/br1210; Japanese edition: Diamond Harvard Business Review, "Big Data: Year One of Competition" (ビッグデータ競争元年), February 2013

Big Data in Enterprises (cont'd)
- Big Data exceeds the processing capacity of conventional relational database systems.
- Big Data primarily addresses the database (DB)/data warehousing (DWH) aspect of data analysis.
- Apache Hadoop is the first major technology for Big Data:
  - Distributed data storage
  - Parallel data-analysis algorithms

Apache Hadoop
- A distributed computational framework that can process a wide range of datasets.
- High-performance parallel data processing using MapReduce.
- Reliable data storage using the Hadoop Distributed File System (HDFS).
- Data is typically accessed through NoSQL ("Not only SQL") tools rather than conventional SQL alone.
Typical users seem obsessed with the quantity, not the quality, of data. More thought should be given to how data are collected and selected [Kaiser Fung].
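To make the MapReduce model concrete, here is a minimal single-process sketch of the classic word-count job. It does not use Hadoop's actual Java API. The functions `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names that mirror the map, shuffle/sort, and reduce stages. Hadoop would run those stages in parallel across HDFS blocks on many nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc: str) -> Iterator[Tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input split.
    for word in doc.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle/sort: group all intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reducer: combine all values that share one key.
    return key, sum(values)


def word_count(docs: List[str]) -> Dict[str, int]:
    # In Hadoop, each doc would be a separate split handled by its own mapper.
    intermediate = chain.from_iterable(map_phase(d) for d in docs)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    splits = ["big data needs big networks", "networks carry big data"]
    print(word_count(splits))
```

The point of the model is that `map_phase` and `reduce_phase` are pure per-record and per-key functions, so the framework is free to distribute them. Only the shuffle requires moving data between nodes.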

How to Handle the 3 Vs [IBM]
1. Volume:
- Massively parallel processing (e.g., Greenplum data computing)
- Distributed computing platform (e.g., Apache Hadoop)
2. Velocity:
- Processing of streaming data to keep storage requirements practical (e.g., Large Hadron Collider at CERN)
- Instantaneous response in some applications (e.g., financial trading)
3. Variety:
- Need to deal with diverse data types and sources (e.g., text from SNS, data from sensors, image data, GPS data from mobile phones, etc.)
[IBM] http://www-01.ibm.com/software/data/bigdata
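For the velocity point, one common way to keep storage practical is to retain compact summaries of a stream instead of the raw data. Welford's online algorithm illustrates this: it maintains the count, mean, and variance of a stream in constant memory. It is a generic technique chosen for this sketch and is not the method used at CERN. The class name `RunningStats` and the sample readings are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Welford's online algorithm: O(1) memory regardless of stream length."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Incorporate one new observation; the raw value can then be discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")

    @property
    def stdev(self) -> float:
        return math.sqrt(self.variance)


if __name__ == "__main__":
    stats = RunningStats()
    for reading in (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0):  # e.g. sensor samples
        stats.update(reading)
    print(stats.n, stats.mean, stats.variance)
```

Welford's update is preferred over accumulating a sum and a sum of squares because it avoids catastrophic cancellation when the variance is small relative to the mean.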

Big Data Platform
- Data Warehousing (DWH): Store large volumes of information from multiple sources.
- Hadoop-based Analytics: Reduce the cost of analyzing massive data.
- Unstructured Database (as well as RDB) and NoSQL
- Stream Computing: Continuously analyze data to take action in real time.
- Text Analytics (or Text Mining): Analyze the textual content of unstructured information, using information retrieval, data mining, machine learning, statistics, and computational linguistics.
- Data Visualization Tools (or Infographics): Real-time processing and dashboard presentation, e.g., Tableau [http://www.tableausoftware.com/], Spotfire [http://spotfire.tibco.jp/], etc.

Some Vendors of Big Data Tools
- Greenplum (http://en.wikipedia.org/wiki/greenplum): founded in 2003; acquired by EMC in 2010
- Netezza (http://en.wikipedia.org/wiki/netezza): founded in 2000; acquired by IBM in 2010 for $1.7B
- SPSS (http://ja.wikipedia.org/wiki/spss): founded in 1988; acquired by IBM in 2009 for $1.2B
- Vertica (acquired by HP)
- Oracle, SAP, and Microsoft also provide Big Data tools.
For Japan, see Nikkei Computer, January 10, 2013 issue.

Call for Better DATA SCIENCE and More DATA SCIENTISTS

- Try to gain insights from data, instead of presenting all collected data.
- Study and extend classical statistical techniques:
  - Exploratory Data Analysis (EDA)
  - Time Series Analysis
  - Hidden Markov Models (HMMs)
  - Bayesian Statistics and MCMC
  - etc.
- Scalable algorithms and analytics, e.g., the PageRank algorithm (an efficient method for computing the principal eigenvector, i.e., the stationary distribution, of a Markov transition matrix)
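The PageRank remark can be made concrete with power iteration. Repeatedly applying the damped transition matrix to a probability vector converges to its principal eigenvector, which is the chain's stationary distribution. The sketch below assumes the commonly used damping factor 0.85 and spreads the rank of pages without out-links uniformly over all pages. The function name and the three-page graph are hypothetical examples.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Power iteration for the stationary distribution of the PageRank chain."""
    nodes = sorted(set(links) | {w for outs in links.values() for w in outs})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by dangling nodes (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            for w in outs:
                new[w] += damping * rank[v] / len(outs)
        # Stop when the L1 change between iterations is below tol.
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    print(pagerank(graph))
```

Each iteration costs time proportional to the number of links, and the error shrinks roughly by the damping factor per step. This is what makes the method scale to web-sized graphs.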


Important Subfields of Data Mining
- Data stream mining [Aggarwal]:
  - Computer network traffic
  - Web searches
  - Sensor data
- Graph mining [Aggarwal]:
  - Web data
  - Social network analysis
  - Bioinformatics
C. C. Aggarwal (Ed.), Data Streams: Models and Algorithms, Kluwer Academic Publishers.
C. C. Aggarwal and H. Wang (Eds.), Managing and Mining Graph Data, Springer.
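Data stream mining usually has to work in a single pass with bounded memory. The slide does not name specific algorithms. One classic building block, chosen here for illustration, is reservoir sampling (Algorithm R). It keeps a uniform random sample of k items from a stream of unknown length, such as a sample of packet records from network traffic. The function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Algorithm R: uniform sample of k items in one pass, O(k) memory."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    flows = (f"flow-{i}" for i in range(1_000_000))  # stand-in for a traffic stream
    print(reservoir_sample(flows, 5, random.Random(7)))
```

After the whole stream has been read, every item has the same probability k/N of being in the sample, even though N was never known in advance.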


Japan's Serious Shortage of Data Scientists
Number of new graduates with knowledge of data analysis (statistics, machine learning, etc.) in 2008: USA 24,730; China 17,410; India 13,270; Japan 3,400. (Annual change: China +10.4%, Japan -5.3%)
Source: http://blogs.itmedia.co.jp/business20/2012/10/post-2438.html
Number of SAS (Statistical Analysis System) certified professionals: USA 10,544; India 5,907; South Korea 1,381; UK 1,242; Japan 800.
SAS-certified professionals per unit of GDP (USA = 100): USA 100; India 458; South Korea 177; UK 73; Japan 20.
Source: Diamond Harvard Business Review, Feb. 2013

[McKinsey] Big data: The next frontier for innovation, competition, and productivity, McKinsey & Co., May 2011

BIG DATA and NETWORKS

Source: What happens in an Internet Minute? (by Intel): http://www.intel.com/content/dam/www/public/us/en/images/illustrations/embedded-infographic-600-logo.jpg

Big Data vs. Networks
- Networks to cope with Big Data: sufficient storage, bandwidth, and processing.
- Big Data to help design and manage networks: better performance, reliability, and security.
- Big Data and networks for a better world:
  - Transparent government, law enforcement
  - Risk management
  - Innovative applications for value creation, e.g., user behavior tracking and marketing (privacy and security are critical).

Cloud Computing & Networking: A Platform for Big Data
- Cloud computing offers on-demand access to a shared pool of configurable resources.
- Big Data requires a novel approach to meet its storage and processing requirements.
- The cloud can make big data (analytics) accessible to those who could not otherwise use it.
- Disk storage performance can become a problem when storage is shared by many users.

OpenFlow and FLARE Will Help Data Centers Handle Big Data
- Help control the connectivity of data centers for big data analytics via virtualization.
- Especially useful in a multi-tenant data center environment.
- Facilitate load balancing among data centers.
FLARE: Deeply Programmable Network (DPN) architecture by Aki Nakao

ID/Locator Separation and Context-Oriented Service for Big Data
- Here, "contexts" means data attributes, e.g., identity, group association, time, location, etc.
- Data-centric networking (e.g., Named Data Networking, or NDN) appears to be a suitable approach to Big Data, but its performance implications are unclear.
- The GUID (Globally Unique ID) of MobilityFirst also facilitates context-oriented service.
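ID/locator separation can be pictured as a lookup table. It maps a stable identifier to whatever network addresses the endpoint currently has. The toy sketch below is purely hypothetical and is not MobilityFirst's actual name-resolution protocol or API. Deriving the GUID by hashing a public key with SHA-1 is also an assumption made only for illustration.

```python
import hashlib
from typing import Dict, Set


def make_guid(public_key: bytes) -> str:
    """Hypothetical: a flat, location-independent ID derived from a key."""
    return hashlib.sha1(public_key).hexdigest()  # 160-bit ID as 40 hex digits


class NameResolver:
    """Toy ID -> locator table (hypothetical, not a real MobilityFirst API)."""

    def __init__(self) -> None:
        self._table: Dict[str, Set[str]] = {}

    def update(self, guid: str, locators: Set[str]) -> None:
        # Called when an endpoint moves or gains/loses an interface;
        # the GUID itself never changes, only its locators do.
        self._table[guid] = set(locators)

    def lookup(self, guid: str) -> Set[str]:
        # Return a copy so callers cannot mutate the table.
        return set(self._table.get(guid, set()))


if __name__ == "__main__":
    resolver = NameResolver()
    phone = make_guid(b"example-public-key")  # hypothetical key bytes
    resolver.update(phone, {"net-A/10.0.0.7"})
    resolver.update(phone, {"net-B/192.168.1.4", "lte/100.64.0.9"})  # moved, multi-homed
    print(phone, resolver.lookup(phone))
```

Because applications address the GUID rather than a locator, a device can move between networks or become multi-homed without breaking sessions. A real architecture would distribute this table across the network instead of keeping it in one dictionary.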

Optical Technologies: Fast Transport and Processing of Big Data
- Integrated optical paths and optical packets of the AKARI architecture.
- Silicon nanophotonics technology: integrates optical and electrical circuits on a single silicon chip, using a 90 nm CMOS fabrication line.
cf. IBM press release, Dec 10, 2012: http://www-03.ibm.com/press/us/en/pressrelease/39641.wss

Additional Issues that Future Network Architectures Should Address
- Interface to databases:
  - Increasingly unstructured and heterogeneous
  - Requires fast processing and transport
- The database community and the networking community should interact.
  - No FIA (Future Internet Architecture) project addresses database issues.
- Service layer for Big Data applications

References
- [Kobayashi et al. 2005] H. Kobayashi, F. Dolivo, and E. Eleftheriou, "35 Years of Progress in Digital Magnetic Recording," 2005 Eduard Rhein Technology Award Lecture.
- [IBM] http://www-01.ibm.com/software/data/bigdata
- [UCB] http://www.sims.berkeley.edu/research/projects/how-much-info-2003
- [McKinsey] "Big data: The next frontier for innovation, competition, and productivity," McKinsey & Co., May 2011, http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/big_data_the_next_frontier_for_innovation
- [IBM] "IBM Lights Up Silicon Chips to Tackle Big Data," press release, Dec 12, 2012, http://www-03.ibm.com/press/us/en/pressrelease/39641.wss

Appendix
- Big Data Across the Federal Government (4)
- NITRD's Focus (2)
- NSF-NIH Initiative (2)
- McKinsey Global Institute's Report (2)
- 2012 Summer Olympic Games
- Big Numbers: Data Never Sleeps (Fortune Magazine, 7/2012)
- Twitter 2012
- Big Data for Healthcare

Big Data Across the Federal Government (March 29, 2012)
Department of Defense (DOD)
- Defense Advanced Research Projects Agency (DARPA)
  - Anomaly Detection at Multiple Scales (ADAMS) program
  - Cyber-Insider Threat (CINDER) program
Department of Homeland Security (DHS)
- Center of Excellence on Visualization and Data Analytics
Department of Energy (DOE)
- Advanced Scientific Computing Research (ASCR)
- High Performance Storage System (HPSS)

Department of Veterans Affairs (VA)
- Consortium for Healthcare Informatics Research (CHIR)
- Corporate Data Warehouse (CDW)
- Genomic Information System for Integrated Science (GenISIS)
Department of Health and Human Services (HHS)
- Centers for Disease Control and Prevention (CDC): BioSense 2.0 program
- Centers for Medicare & Medicaid Services (CMS): a data warehouse based on Hadoop is being developed; use of XML database technologies is being evaluated.
- Food & Drug Administration (FDA): Virtual Laboratory Environment (VLE)
National Archives and Records Administration (NARA)
- Cyberinfrastructure for a Billion Electronic Records (CI-BER)

National Aeronautics and Space Administration (NASA)
- Earth Science Data and Information System (ESDIS)
- Global Earth Observation System of Systems (GEOSS)
- Planetary Data System (PDS)
- Multimission Archive at Space Telescope Science Institute (MAST)
National Endowment for the Humanities (NEH)
- Digging into Data Challenge
National Institutes of Health (NIH)
- The Cancer Imaging Archive (TCIA)
- Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC)
- Neuroscience Information Framework (NIF)
- Structural Genomics Initiative
- Worldwide Protein Data Bank (wwPDB)
- Biomedical Informatics Research Network (BIRN)
- Collaborative Research in Computational Neuroscience (CRCNS)

National Science Foundation (NSF)
- Core Techniques and Technologies for Advancing Big Data Science & Engineering
- Cyberinfrastructure Framework for 21st Century Science & Engineering (CIF21)
- Data and Software Preservation for Open Science (DASPOS)
- Computational and Data-enabled Science and Engineering (CDS&E) in Mathematical and Statistical Science (CDS&E-MSS)
- Open Science Grid (OSG)
- Theoretical and Computational Astrophysics Networks (TCAN)
National Security Agency (NSA)
- Vigilant Net: A Competition to Foster and Test Cyber Defense Situational Awareness at Scale
- NSA/CSS Commercial Solutions Center (NCSC)
United States Geological Survey (USGS)
- John Wesley Powell Center for Analysis and Synthesis

The NITRD Program's focus areas:
- Big Data (BD)
- Cyber Physical Systems (CPS)
- Cyber Security and Information Assurance (CSIA)
- Health Information Technology R&D (Health IT R&D)
- Human Computer Interaction and Information Management (HCI&IM)

The NITRD Program's focus areas (cont'd):
- High Confidence Software and Systems (HCSS)
- High End Computing (HEC)
- Large Scale Networking (LSN)
- Software Design and Productivity (SDP)
- Social, Economic, and Workforce Implications of IT and IT Workforce Development (SEW)
- Wireless Spectrum Research and Development (WSRD)

NSF-NIH Big Data Initiative
Eight (8) fundamental research projects on Big Data were announced on October 3, 2012.
- Typically one to three investigators per project.
- Total of $15 million, i.e., an average of roughly $1.9 million per project.
1. Eliminating the Data Ingestion Bottleneck in Big-Data Applications, M. Farach-Colton (Rutgers) and M. Bender (Stony Brook)
2. DataBridge: A Sociometric System for Long-Tail Science Data Collection, A. Rajasekar (Univ. of N.C.), G. King (Harvard), and J. Zhan (North Carolina A&T State Univ.)
3. A Formal Foundation for Big Data Management, D. Suciu (Univ. of Washington)

4. Analytical Approaches to Massive Data Computation with Applications to Genomics, E. Upfal (Brown)
5. Distribution-based Machine Learning for High-dimensional Datasets, A. Singh (CMU)
6. GenomesGlore: Core Techniques, Libraries, and Domain Specific Languages for High-Throughput DNA Sequencing, S. Aluru (Iowa State), O. Olukotun (Stanford), and W. Feng (Virginia Tech)
7. Big Tensor Mining: Theory, Scalable Algorithms and Applications, C. Faloutsos (CMU) and N. Sidiropoulos (U. of Minnesota)
8. Discovery and Social Analytics for Large-Scale Scientific Literature, P. Kantor, T. Joachims (Cornell), and D. Blei (Princeton)


Source: Big Data at London Summer Games 2012: http://www.cloudtweaks.com/web/content//big-data-infographic1.jpg

Source: How much data is generated every minute: http://blogs-images.forbes.com/davefeinleib/files/2012/07/big-data-infographic.jpg

Source: Facts about Twitter: http://blog.sironaconsulting.com/.a/6a00d8341c761a53ef016767bafa2c970b-pi

Source: Info Graphic Healthcare IT: http://www.healthcareitconnect.com/wp-content/uploads/2012/10/infographic-big-data.jpg