Data Intensive Science Education

1 Data Intensive Science Education Thomas J. Hacker Associate Professor, Computer & Information Technology Purdue University, West Lafayette, Indiana USA Gjesteprofessor (Visiting Professor), Department of Electrical Engineering and Computer Science University of Stavanger, Norway EU-China-North America Workshop on HPC Cloud and Big Data June 20, 2013 University of Stavanger, Norway

2 Introduction and Motivation Theory and Experiment (1800s) Computational Simulation: the third leg of science for the past 50 years or so (since the 1950s) Data (21st-century science): the fourth leg of science Researchers are flooded with data Tremendous quantity and multiple scales of data Difficult to collect, store, and manage How can we distill meaningful knowledge from data?

3 Data is the 4th Paradigm Science is producing an avalanche of high-resolution digital data All (or most) of the data needs to be accessible over a long period of time Much of the data is not reproducible Example: the NEES project Structures or samples are destroyed through testing Very expensive to rebuild for more tests

4 Data, data every where We are surrounded by data that we want, but it is difficult to find the information that we need "Water, water every where, Nor any drop to drink." Samuel Taylor Coleridge, The Rime of the Ancient Mariner Private, shared, and public data repositories Files on your computer Group documents and files Experimental results Published papers Data are scattered across many systems and devices Personal computer, old diskettes in a box, several systems, old computer systems The Rime of the Ancient Mariner, Plate 32: The Pilot, by Gustave Doré

5 Need for Data Education Data is the 4th paradigm of Science and Engineering We are losing valuable data every day The techniques we were taught for maintaining a lab notebook have not been effectively transferred to computer-based data collection and registration systems So much data is available and collected today that it is no longer possible to keep it on paper

6 Two Examples of Data Intensive Science Two large-scale science and engineering projects illustrate the problems related to data intensive science National Science Foundation George E. Brown Network for Earthquake Engineering Simulation (NEES) Purdue operates the headquarters for NEEScomm, the community of NEES research facilities The Compact Muon Solenoid project Purdue operates a Tier-2 CMS center

7 NSF Network for Earthquake Engineering Simulation (NEES) Safer buildings and civil infrastructure are needed to reduce damage and loss from earthquakes and tsunamis To facilitate research to improve seismic design of buildings and civil infrastructure, the National Science Foundation established NEES NEES Objectives Develop a national, multi-user, research infrastructure to support research and innovation in earthquake and tsunami loss reduction Create an educated workforce in hazard mitigation Conduct broader outreach and lifelong learning activities

8 Vision for NEES Facilitate access to the world's best integrated network of state-of-the art physical simulation facilities Build a cyber-enabled community that shares ideas, data, and computational tools and models. Promote education and training for the next generation of researchers and practitioners. Cultivate partnerships with other organizations to disseminate research results, leverage cyberinfrastructure, and reduce risk by transferring results into practice.

9 NEES Research Facilities NEES has a broad set of experimental facilities Each type of equipment produces unique data Located at 14 sites across the United States Shake Table Tsunami Wave Basin Large-Scale Testing Facilities Centrifuge Field and Mobile Facilities Large-Displacement Facility Cyberinfrastructure

10 Oregon State University University of Minnesota University of Illinois- Urbana University of California Berkeley University of California Davis https://www.nees.org University of Buffalo University of California Santa Barbara Cornell University University of California Los Angeles Rensselaer Polytechnic Institute University of California San Diego University of Nevada Reno University of Texas Austin Lehigh University

11 Large-Scale Testing Facilities Lehigh University Reaction wall, strong floor, dynamic actuators UC-Berkeley Reconfigurable Reaction Wall University of Illinois Urbana-Champaign Multi-Axial Full-Scale Sub-Structured Testing & Simulation (MUST-SIM) University of Minnesota Reaction walls Multi-Axial Subassemblage Testing (MAST) Images: Univ of Minnesota

12 NEEShub at Nees.org

13 Compact Muon Solenoid Project Another example of a big data project Two primary computational goals Move detector data from the Large Hadron Collider at CERN to remote sites for processing Examine detector data for evidence of the Higgs boson ~15 PB/yr of data Applications used by CMS are not inherently parallel Data is split up and distributed across nodes Embarrassingly parallel
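The splitting pattern described above can be sketched in a few lines of Python. This is not CMS code; `process_event` and the event records are hypothetical stand-ins that simply show why independent per-event work is embarrassingly parallel:

```python
from multiprocessing import Pool

def process_event(event):
    # Hypothetical per-event analysis: each event is examined
    # independently, so events can be distributed across workers
    # with no communication between them (embarrassingly parallel).
    return event["energy"] > 100.0  # keep only high-energy events

def analyze(events, workers=4):
    # Split the event list across a pool of workers; the framework
    # (here a process pool, in CMS a batch/grid system) handles the
    # distribution, and results are simply collected at the end.
    with Pool(workers) as pool:
        keep = pool.map(process_event, events)
    return [e for e, k in zip(events, keep) if k]

if __name__ == "__main__":
    events = [{"id": i, "energy": float(i * 30)} for i in range(10)]
    selected = analyze(events)
    print(len(selected))  # number of events passing the cut
```

The same shape scales from a process pool on one machine to thousands of grid nodes, because no per-event task depends on any other.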

14 CMS Project Overview CERN Large Hadron Collider (LHC) The world's largest particle accelerator and collider 17-mile circumference tunnel Providing evidence to support the existence of the Higgs boson Six detector experiments at the LHC ATLAS, CMS, LHCb, ALICE, TOTEM, LHCf Compact Muon Solenoid (CMS) Very large solenoid with a 4 Tesla magnetic field Earth's magnetic field: 60 x 10^-6 Tesla

15 CMS Detector

16

17 Purdue CMS Tier-2 Center Computing Infrastructure ~10,000 computing cores within the Purdue University Community Cluster program Purdue recently (June 18) announced the Conte supercomputer Fastest university-owned supercomputer in the United States 3 PB of disk storage running Hadoop Sharing a 100 Gb/sec network uplink to Indianapolis and Chicago Ultimately connecting to Fermi National Accelerator Laboratory near Chicago Provided 14% of all Tier-2 computing globally in 2012

18 Purdue CMS Tier-2 Center Physicists from around the world submit computational jobs to Purdue Data is copied from the Tier-1 to Purdue storage on user request Simulation codes also run at Purdue, with results pushed up to the Tier-1 center or other Tier-2s International data sharing Data interoperability is designed into the project from the beginning There is one instrument (the CMS detector), which greatly simplifies the sharing and reuse of data compared with a project like NEES

19 Challenges involved in Big Data Performance at Scale How can we effectively match data performance with HPC capabilities? How can we ensure good reliability of these systems? Data Curation Challenges What should we preserve, how should we preserve it, and how can we ensure the long-term viability of the data? Disciplinary Sociology and Cyberinfrastructure How can we effectively promote and support the adoption and use of new technologies? How can we foster the development of new disciplinary practices focused on the long-term accessibility of data?

20 Performance at Scale Petaflop-scale systems are now available for use by researchers Example: the Purdue Conte system announced this week (Rmax 943 TF; Rpeak in the petaflops range) Conte was built with 580 HP ProLiant SL250 Generation 8 (Gen8) servers, each incorporating two Intel Xeon processors and two Intel Xeon Phi coprocessors, integrated with Mellanox 56 Gb/s FDR InfiniBand. Conte has 580 servers (570 at the time of testing) with 9,120 standard cores and 68,400 Phi cores, for a total of 77,520 cores. Big data analytics coupled with petascale systems requires high-bandwidth storage systems Avoid wasteful and expensive CPU stalls Scaling up is along two axes: Large volume of data (example: CMS Project) Large variety and number of files (example: NEES project)

21 Curation Challenges Data production rate is tremendous Volume of data is growing over time Sensor sampling rates are increasing High-definition video Managing data transfer The time required to upload and download data is growing, and can become substantial when there are network bottlenecks Ensuring data integrity Filtering, cleaning, and calibration are often needed before uploading and curating data The community also needs to retain the raw data in case an error is made, or in case a researcher can later distill further insights from the data
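One common building block for the data-integrity step above (a generic sketch, not a description of any specific NEES or CMS tool) is to record a cryptographic checksum when raw data is first captured and verify it after every transfer:

```python
import hashlib

def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    # Stream the file in 1 MB chunks so that even very large raw
    # data files can be checksummed without loading them into memory.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path, expected):
    # Compare against the checksum recorded at capture time; a
    # mismatch means this copy is corrupt and the raw original
    # (which the community retained) must be consulted.
    return file_checksum(path) == expected
```

The recorded checksum travels with the file's metadata, so a repository can detect silent corruption years later without any knowledge of the file's format.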

22 Curation Challenges File type management Data is stored in files through the intermediary of an application This means that the information in the data will be encoded into some kind of format It's difficult (if not impossible) to restrict the file formats used by the research community As these applications change (or disappear) over time, the information encoded in the data may become stranded Risk of stranded data When the file format cannot be precisely identified, we don't know which application can be used as an intermediary for reading the information encoded in the data This leads to stranded data that is useless
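The stranded-data risk starts with being unable to identify a file's format at all. A minimal sketch of magic-number sniffing follows; the signature table here is a tiny illustration, whereas real curation systems consult large format registries:

```python
# A few well-known file "magic numbers". Production curation
# pipelines use far larger registries; this table is illustrative.
MAGIC = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP container (also DOCX/XLSX)",
    b"\x1f\x8b": "gzip-compressed data",
}

def identify(path):
    # Read enough leading bytes to match the longest signature.
    with open(path, "rb") as f:
        head = f.read(max(len(sig) for sig in MAGIC))
    for sig, name in MAGIC.items():
        if head.startswith(sig):
            return name
    return None  # unidentified: a candidate for stranded data
```

When `identify` returns None, the curation pipeline cannot name an application to act as the intermediary, which is exactly the stranding scenario described above.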

23 Curation Challenges Linking computation with data and archived data We will need the ability to quickly search archived data in much more detail than what Google can deliver How can we quickly discover, convert, and transfer archived data to be close to the user and to computation? (especially HPC) Need to match data I/O capabilities with growth in the number of CPU cores and core speed

24 Long-term accessibility We have data in the NEEShub from the 1970s Science: "Rescue of Old Data Offers Lesson for Particle Physicists" by Andrew Curry (Feb 2011) described the need to find old, almost-lost data from a physics experiment from the 1980s The data will need to remain viable and accessible for years into the future

25 Discipline Sociology Sociological factors in data curation Disciplinary differences in how data are archived, how archived data are valued, and how to determine what is worth retaining Who determines what is worth keeping? What is the practice in the specific discipline? International standards and practices in metadata tagging, representing numbers, and even character sets NEES is working with partners in Japan and China, so we need to determine how to represent their data in a common standard framework Terminology for numbers (comma vs. period as the decimal separator; lakh vs. 100,000) Changing the behavior of scientists to value curation and long-term accessibility

26 Managing Curation at Scale How can we efficiently use data curators' time? NEES now has 1.8M files; what will happen in 3 more years? How can we manage 10M files with a limited curation staff? For NEES, we are using the OAIS model as a guideline for designing a curation pipeline for NEES data The OAIS model is proving to be a useful model for thinking about how to undertake data curation We are developing a curation pipeline to help automate curation for the many files in the NEES Project Warehouse

27 Data Analytics There are technologies available today that can be used to provide solutions to these problems High performance computing Parallel file systems MapReduce/Hadoop A sustainable solution requires more than a set of technologies An effective data cyberinfrastructure involves both sociological and technological components What is needed to educate and train researchers to effectively learn to use new technologies?

28 Our approach Developing a joint research and education program in big data analytics among the University of Stavanger, Purdue University, and AMD Research Chunming Rong, Tomasz Wlodarczyk (Stavanger) Thomas Hacker, Ray Hansen, Natasha Nikolaidis (Purdue) Greg Rodgers (AMD Research) Funded by SIU: Strategic Collaboration on Advanced Data Analysis and Communication between Purdue University and University of Stavanger Developing a semester-long joint course in HPC and Big Data Analytics, and a short summer course (to be delivered next week)

29 Planned Course Objectives Students will learn to put modern tools to use in order to analyze large and complex data sets. Students will be able to: Design, construct, test, and benchmark a small data processing cluster (based on Hadoop) Demonstrate knowledge of MapReduce functionality through the development of a MapReduce program Understand the Hadoop job tracker, task tracker, scheduling issues, communications, and resource management Construct programs based on the MapReduce paradigm for typical algorithmic problems Use functional programming concepts to describe data dependencies and analyze the complexity of MapReduce programs
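The classic first MapReduce program for the objective above is word count. A minimal sketch in the style of Hadoop Streaming (where map and reduce are plain scripts over key/value pairs) follows, written as ordinary Python functions so the map, shuffle/sort, and reduce dataflow is visible:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Map phase: emit (word, 1) for every word in the input split.
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    # Reduce phase: in Hadoop, the shuffle/sort step delivers pairs
    # grouped by key; sorting here simulates that guarantee, then
    # the counts for each word are summed.
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The fox"]
    print(dict(reducer(mapper(text))))  # {'the': 3, 'fox': 2, ...}
```

In an actual Hadoop Streaming job, `mapper` and `reducer` would read stdin and write tab-separated lines, with the framework performing the sort between them.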

30 Planned Course Objectives Algorithms Understand algorithmic complexity for worst-case, expected-case, and best-case running time (big-O notation), and the orders of complexity (e.g. N, N^2, log N, NP-hard) Examine a basic algorithm and identify its order of algorithmic complexity File Systems Describe the concepts of a distributed file system, how it differs from a local file system, and the performance of distributed file systems Describe a parallel file system, the performance advantages possible through the use of a parallel file system, and the inherent reliability and fault tolerance mechanisms needed for parallel file systems (examples include OrangeFS and Lustre) Understand peak and sustained bandwidth rates Understand the differences between an RDBMS, a data warehouse, unstructured big data, and keyed files
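One way to make the complexity orders above concrete in a lab exercise (a sketch of a possible assignment, not part of the published syllabus) is to count basic operations directly instead of measuring wall-clock time, contrasting an O(N) scan with an O(log N) search:

```python
def linear_search(items, target):
    # O(N): the number of comparisons grows linearly with input size.
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            break
    return comparisons

def binary_search(items, target):
    # O(log N): each comparison halves the remaining search space
    # (the input must already be sorted).
    comparisons, lo, hi = 0, 0, len(items) - 1
    while lo <= hi:
        comparisons += 1
        mid = (lo + hi) // 2
        if items[mid] == target:
            break
        elif items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return comparisons

if __name__ == "__main__":
    data = list(range(1_000_000))
    print(linear_search(data, 999_999))  # on the order of N
    print(binary_search(data, 999_999))  # on the order of log2(N)
```

Plotting the two counts against N makes the gap between the complexity classes far more vivid than timing noise ever does.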

31 Short Course Format Lecture in the morning followed by lab in the afternoon Labs are built on a set of desktop PCs running Hadoop in an RHEL 6 virtual machine on top of VMware Using pfSense (an open source firewall) to create a secure network connection from the instruction site to the computers running Hadoop We will refine the network and lab equipment setup based on our experience delivering the short course next week

32 Short Course Day 1 Topics Lecture Introduction and motivation for the course History of HPC, big data, and Moore's Law Science domain areas, and problems in each of those areas that motivate the need for data-intensive computing Where are we today, and what is the projected need later? How are things driven by increases in computing power? Definition of big data and big compute, and why we need both combined A mixture of trends, principles, and implementation in historic context that students should understand Parallel application types Introduction to MapReduce Dataflow within MapReduce with plug-ins Labs The hadoop command, HDFS, and Linux basics Hadoop basic examples from lectures

33 Short Course Day 2 Topics Lectures Introduction to MapReduce, continued Combiners More complex MapReduce example (search assist) Hadoop Architecture Motivation for Hadoop Hadoop building blocks (name node, data node, etc.) Fault tolerance and failures, replication, and data-aware scheduling Main components (HDFS, MapReduce, modes (local, distributed, pseudo-distributed), etc.) HDFS GUI Labs We will use combiners and multiple reducers to improve performance, looking at network traffic and data counters to evaluate the results Students will evaluate the performance improvement from each optimization of a MapReduce program Advanced students will gather network and data statistics to explain why each phase improved
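The combiner optimization used in the Day 2 lab can be sketched briefly. In word count the combiner runs the reduce logic locally on each mapper's node, collapsing the per-word 1s into partial counts so far fewer records cross the network during the shuffle (an illustrative sketch, not the lab's actual code):

```python
from collections import Counter

def mapper(lines):
    # Map phase for word count: one (word, 1) record per occurrence.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def combiner(pairs):
    # Runs on the mapper's node before the shuffle: collapse the
    # (word, 1) records into (word, local_count), shrinking the
    # data that must be sent over the network to the reducers.
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

if __name__ == "__main__":
    split = ["to be or not to be", "to be is to do"]
    raw = list(mapper(split))
    combined = combiner(raw)
    # Shuffle volume drops from one record per word occurrence to
    # one record per distinct word on this node.
    print(len(raw), len(combined))
```

The network and data counters the students gather in the lab would show exactly this reduction: shuffle bytes fall roughly in proportion to how often words repeat within a mapper's split.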

34 Short Course Day 3 Topics Lectures Hadoop Architecture, continued Comparison of HDFS with other parallel file system architectures (GoogleFS, Lustre, OrangeFS), and how Hadoop differs from these systems Chaining MapReduce jobs MapReduce algorithms: K-means or other algorithms Schemas for unstructured data using Hive Introduction to data organization Why are we concerned about data organization? What are the impacts of poor organization on performance and correctness? Levels of data organization: data structure, file level, cluster level, data parallelization, organization level How do we deal with large sequential files from a performance perspective, and how would they be represented in a parallel file system (e.g. HDFS)? Lab Hive
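K-means maps onto MapReduce naturally, which is why it appears in the Day 3 lecture alongside job chaining: each iteration is one MapReduce job, and the driver chains jobs until the centroids stop moving. A sketch of one iteration (1-D points for brevity; illustrative, not the course's actual lab code):

```python
from collections import defaultdict

def kmeans_map(points, centroids):
    # Map: emit (nearest_centroid_index, point) for each point.
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: abs(p - centroids[i]))
        yield (nearest, p)

def kmeans_reduce(pairs):
    # Reduce: average the points assigned to each centroid to get
    # the centroid positions for the next iteration.
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: sum(ps) / len(ps) for idx, ps in groups.items()}

def kmeans_iteration(points, centroids):
    # One full MapReduce round; the driver program chains these
    # jobs until the centroids converge (job chaining).
    new = kmeans_reduce(kmeans_map(points, centroids))
    return [new.get(i, c) for i, c in enumerate(centroids)]

if __name__ == "__main__":
    points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    print(kmeans_iteration(points, [0.0, 5.0]))
```

Because each iteration reads its input afresh and writes new centroids, the algorithm also illustrates why chained MapReduce jobs stress the file system between rounds.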

35 Expected Outcomes Provide education and training to researchers to allow them to think effectively about big data and to use these technologies in their research and daily work Improved data collection and management practices by researchers Development of new techniques for collaboration on a joint course across the Atlantic, with a shared lab infrastructure for lab assignments

36 Conclusions There is a need for data-intensive training and education for scientists and engineers Effectively use existing technologies Develop new disciplinary practices for annotating and preserving valuable data Understand the critical need for data curation for the viability and long-term accessibility of data We are developing an education and research program focused on these issues Short course Semester-length joint course at the University of Stavanger and Purdue University Holding a symposium at the CloudCom conference in December DataCom - Symposium on High Performance and Data Intensive Computing Thomas Hacker, Purdue Univ., USA Tomasz Wiktor Wlodarczyk, University of Stavanger, Norway DataCom is organized under CloudCom as two tracks: Big Data and HPC on Cloud


More information

Data Centric Computing Revisited

Data Centric Computing Revisited Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data

More information

Betriebssystem-Virtualisierung auf einem Rechencluster am SCC mit heterogenem Anwendungsprofil

Betriebssystem-Virtualisierung auf einem Rechencluster am SCC mit heterogenem Anwendungsprofil Betriebssystem-Virtualisierung auf einem Rechencluster am SCC mit heterogenem Anwendungsprofil Volker Büge 1, Marcel Kunze 2, OIiver Oberst 1,2, Günter Quast 1, Armin Scheurer 1 1) Institut für Experimentelle

More information

Hadoop implementation of MapReduce computational model. Ján Vaňo

Hadoop implementation of MapReduce computational model. Ján Vaňo Hadoop implementation of MapReduce computational model Ján Vaňo What is MapReduce? A computational model published in a paper by Google in 2004 Based on distributed computation Complements Google s distributed

More information

Overview of HPC Resources at Vanderbilt

Overview of HPC Resources at Vanderbilt Overview of HPC Resources at Vanderbilt Will French Senior Application Developer and Research Computing Liaison Advanced Computing Center for Research and Education June 10, 2015 2 Computing Resources

More information

Apache Hadoop: The Big Data Refinery

Apache Hadoop: The Big Data Refinery Architecting the Future of Big Data Whitepaper Apache Hadoop: The Big Data Refinery Introduction Big data has become an extremely popular term, due to the well-documented explosion in the amount of data

More information

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS

A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS A STUDY ON HADOOP ARCHITECTURE FOR BIG DATA ANALYTICS Dr. Ananthi Sheshasayee 1, J V N Lakshmi 2 1 Head Department of Computer Science & Research, Quaid-E-Millath Govt College for Women, Chennai, (India)

More information

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research

Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Cloud Computing for Research Roger Barga Cloud Computing Futures, Microsoft Research Trends: Data on an Exponential Scale Scientific data doubles every year Combination of inexpensive sensors + exponentially

More information

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing.

Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing. Exploring the roles and responsibilities of data centres and institutions in curating research data a preliminary briefing. Dr Liz Lyon, UKOLN, University of Bath Introduction and Objectives UKOLN is undertaking

More information

SEAIP 2009 Presentation

SEAIP 2009 Presentation SEAIP 2009 Presentation By David Tan Chair of Yahoo! Hadoop SIG, 2008-2009,Singapore EXCO Member of SGF SIG Imperial College (UK), Institute of Fluid Science (Japan) & Chicago BOOTH GSB (USA) Alumni Email:

More information

Data sharing and Big Data in the physical sciences. 2 October 2015

Data sharing and Big Data in the physical sciences. 2 October 2015 Data sharing and Big Data in the physical sciences 2 October 2015 Content Digital curation: Data and metadata Why consider the physical sciences? Astronomy: Video Physics: LHC for example. Video The Research

More information

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk

HPC and Big Data. EPCC The University of Edinburgh. Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk HPC and Big Data EPCC The University of Edinburgh Adrian Jackson Technical Architect a.jackson@epcc.ed.ac.uk EPCC Facilities Technology Transfer European Projects HPC Research Visitor Programmes Training

More information

From Distributed Computing to Distributed Artificial Intelligence

From Distributed Computing to Distributed Artificial Intelligence From Distributed Computing to Distributed Artificial Intelligence Dr. Christos Filippidis, NCSR Demokritos Dr. George Giannakopoulos, NCSR Demokritos Big Data and the Fourth Paradigm The two dominant paradigms

More information

A Study of Data Management Technology for Handling Big Data

A Study of Data Management Technology for Handling Big Data Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 9, September 2014,

More information

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2

Associate Professor, Department of CSE, Shri Vishnu Engineering College for Women, Andhra Pradesh, India 2 Volume 6, Issue 3, March 2016 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Special Issue

More information

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing

An Alternative Storage Solution for MapReduce. Eric Lomascolo Director, Solutions Marketing An Alternative Storage Solution for MapReduce Eric Lomascolo Director, Solutions Marketing MapReduce Breaks the Problem Down Data Analysis Distributes processing work (Map) across compute nodes and accumulates

More information

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc.

Beyond Web Application Log Analysis using Apache TM Hadoop. A Whitepaper by Orzota, Inc. Beyond Web Application Log Analysis using Apache TM Hadoop A Whitepaper by Orzota, Inc. 1 Web Applications As more and more software moves to a Software as a Service (SaaS) model, the web application has

More information

Freshmen Acceptance Rate AAU Public Universities

Freshmen Acceptance Rate AAU Public Universities Freshmen University of California, Berkeley 27% University of California, Los Angeles 27% University of North Carolina, Chapel Hill 37% University of Virginia 38% University of California, San Diego 44%

More information

A Novel Cloud Based Elastic Framework for Big Data Preprocessing

A Novel Cloud Based Elastic Framework for Big Data Preprocessing School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview

More information

Open source Google-style large scale data analysis with Hadoop

Open source Google-style large scale data analysis with Hadoop Open source Google-style large scale data analysis with Hadoop Ioannis Konstantinou Email: ikons@cslab.ece.ntua.gr Web: http://www.cslab.ntua.gr/~ikons Computing Systems Laboratory School of Electrical

More information

Mapping the Universe; Challenge for the Big Data science

Mapping the Universe; Challenge for the Big Data science Mapping the Universe; Challenge for the Big Data science Dr.Utain Sawangwit National Astronomical Research Institute of Thailand utane@narit.or.th Observational Cosmology relies on big surveys and very

More information

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY A PATH FOR HORIZING YOUR INNOVATIVE WORK A REVIEW ON HIGH PERFORMANCE DATA STORAGE ARCHITECTURE OF BIGDATA USING HDFS MS.

More information

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE

INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE INTRODUCTION TO APACHE HADOOP MATTHIAS BRÄGER CERN GS-ASE AGENDA Introduction to Big Data Introduction to Hadoop HDFS file system Map/Reduce framework Hadoop utilities Summary BIG DATA FACTS In what timeframe

More information

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012

Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 Unstructured Data Accelerator (UDA) Author: Motti Beck, Mellanox Technologies Date: March 27, 2012 1 Market Trends Big Data Growing technology deployments are creating an exponential increase in the volume

More information

The Greenplum Analytics Workbench

The Greenplum Analytics Workbench The Greenplum Analytics Workbench External Overview 1 The Greenplum Analytics Workbench Definition Is a 1000-node Hadoop Cluster. Pre-configured with publicly available data sets. Contains the entire Hadoop

More information

Modern Data Architecture for Predictive Analytics

Modern Data Architecture for Predictive Analytics Modern Data Architecture for Predictive Analytics David Smith VP Marketing and Community - Revolution Analytics John Kreisa VP Strategic Marketing- Hortonworks Hortonworks Inc. 2013 Page 1 Your Presenters

More information

The George E. Brown, Jr. Network for Earthquake Engineering Simulation. NEES Experimental Site Backup Plan at NEES@Lehigh

The George E. Brown, Jr. Network for Earthquake Engineering Simulation. NEES Experimental Site Backup Plan at NEES@Lehigh The George E. Brown, Jr. Network for Earthquake Engineering Simulation NEES Experimental Site Backup Plan at NEES@Lehigh Last Modified November 12, 2012 Acknowledgement: This work was supported primarily

More information

www.thinkparq.com www.beegfs.com

www.thinkparq.com www.beegfs.com www.thinkparq.com www.beegfs.com KEY ASPECTS Maximum Flexibility Maximum Scalability BeeGFS supports a wide range of Linux distributions such as RHEL/Fedora, SLES/OpenSuse or Debian/Ubuntu as well as a

More information

NT1: An example for future EISCAT_3D data centre and archiving?

NT1: An example for future EISCAT_3D data centre and archiving? March 10, 2015 1 NT1: An example for future EISCAT_3D data centre and archiving? John White NeIC xx March 10, 2015 2 Introduction High Energy Physics and Computing Worldwide LHC Computing Grid Nordic Tier

More information

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen

Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Unterstützung datenintensiver Forschung am KIT Aktivitäten, Dienste und Erfahrungen Achim Streit Steinbuch Centre for Computing (SCC) KIT Universität des Landes Baden-Württemberg und nationales Forschungszentrum

More information

Buyer s Guide to Big Data Integration

Buyer s Guide to Big Data Integration SEPTEMBER 2013 Buyer s Guide to Big Data Integration Sponsored by Contents Introduction 1 Challenges of Big Data Integration: New and Old 1 What You Need for Big Data Integration 3 Preferred Technology

More information

Hadoop and Map-Reduce. Swati Gore

Hadoop and Map-Reduce. Swati Gore Hadoop and Map-Reduce Swati Gore Contents Why Hadoop? Hadoop Overview Hadoop Architecture Working Description Fault Tolerance Limitations Why Map-Reduce not MPI Distributed sort Why Hadoop? Existing Data

More information

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW

AGENDA. What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story. Our BIG DATA Roadmap. Hadoop PDW AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this

More information

Moving Beyond the Web, a Look at the Potential Benefits of Grid Computing for Future Power Networks

Moving Beyond the Web, a Look at the Potential Benefits of Grid Computing for Future Power Networks Moving Beyond the Web, a Look at the Potential Benefits of Grid Computing for Future Power Networks by Malcolm Irving, Gareth Taylor, and Peter Hobson 1999 ARTVILLE, LLC. THE WORD GRID IN GRID-COMPUTING

More information

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture

Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture Analysis and Optimization of Massive Data Processing on High Performance Computing Architecture He Huang, Shanshan Li, Xiaodong Yi, Feng Zhang, Xiangke Liao and Pan Dong School of Computer Science National

More information

Big Data and Natural Language: Extracting Insight From Text

Big Data and Natural Language: Extracting Insight From Text An Oracle White Paper October 2012 Big Data and Natural Language: Extracting Insight From Text Table of Contents Executive Overview... 3 Introduction... 3 Oracle Big Data Appliance... 4 Synthesys... 5

More information

Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary

Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary Big Science and Big Data Dirk Duellmann, CERN Apache Big Data Europe 28 Sep 2015, Budapest, Hungary 16/02/2015 Real-Time Analytics: Making better and faster business decisions 8 The ATLAS experiment

More information

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India.

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. Impact of Big Data in Oil & Gas Industry Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. New Age Information 2.92 billions Internet Users in 2014 Twitter processes 7 terabytes

More information

Cisco Data Preparation

Cisco Data Preparation Data Sheet Cisco Data Preparation Unleash your business analysts to develop the insights that drive better business outcomes, sooner, from all your data. As self-service business intelligence (BI) and

More information

High Performance Computing (HPC)

High Performance Computing (HPC) High Performance Computing (HPC) High Performance Computing (HPC) White Paper Attn: Name, Title Phone: xxx.xxx.xxxx Fax: xxx.xxx.xxxx 1.0 OVERVIEW When heterogeneous enterprise environments are involved,

More information

Storage Architectures for Big Data in the Cloud

Storage Architectures for Big Data in the Cloud Storage Architectures for Big Data in the Cloud Sam Fineberg HP Storage CT Office/ May 2013 Overview Introduction What is big data? Big Data I/O Hadoop/HDFS SAN Distributed FS Cloud Summary Research Areas

More information

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING

www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING VISUALISATION GPU COMPUTING www.xenon.com.au STORAGE HIGH SPEED INTERCONNECTS HIGH PERFORMANCE COMPUTING GPU COMPUTING VISUALISATION XENON Accelerating Exploration Mineral, oil and gas exploration is an expensive and challenging

More information

Workshop on Hadoop with Big Data

Workshop on Hadoop with Big Data Workshop on Hadoop with Big Data Hadoop? Apache Hadoop is an open source framework for distributed storage and processing of large sets of data on commodity hardware. Hadoop enables businesses to quickly

More information

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect

Big Data on AWS. Services Overview. Bernie Nallamotu Principle Solutions Architect on AWS Services Overview Bernie Nallamotu Principle Solutions Architect \ So what is it? When your data sets become so large that you have to start innovating around how to collect, store, organize, analyze

More information

CERN s Scientific Programme and the need for computing resources

CERN s Scientific Programme and the need for computing resources This document produced by Members of the Helix Nebula consortium is licensed under a Creative Commons Attribution 3.0 Unported License. Permissions beyond the scope of this license may be available at

More information

Graduate School Rankings By U.S. News & World Report: ELECTRICAL/ELECTRONIC ENGINEERING

Graduate School Rankings By U.S. News & World Report: ELECTRICAL/ELECTRONIC ENGINEERING Rank Universities Score 2 University of Illinois, Urbana-Champaign 4.7 3 University of Michigan, Ann Arbor 4.4 4 University of Texas, Austin 4.2 5 Purdue University, West Lafayette 4.1 6 University of

More information

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division

Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division Big Data: A Storage Systems Perspective Muthukumar Murugan Ph.D. HP Storage Division In this talk Big data storage: Current trends Issues with current storage options Evolution of storage to support big

More information

Integrated Grid Solutions. and Greenplum

Integrated Grid Solutions. and Greenplum EMC Perspective Integrated Grid Solutions from SAS, EMC Isilon and Greenplum Introduction Intensifying competitive pressure and vast growth in the capabilities of analytic computing platforms are driving

More information

Mobile Storage and Search Engine of Information Oriented to Food Cloud

Mobile Storage and Search Engine of Information Oriented to Food Cloud Advance Journal of Food Science and Technology 5(10): 1331-1336, 2013 ISSN: 2042-4868; e-issn: 2042-4876 Maxwell Scientific Organization, 2013 Submitted: May 29, 2013 Accepted: July 04, 2013 Published:

More information

Optimizing Data Centers for Big Infrastructure Applications

Optimizing Data Centers for Big Infrastructure Applications white paper Optimizing Data Centers for Big Infrastructure Applications Contents Whether you need to analyze large data sets or deploy a cloud, building big infrastructure is a big job. This paper discusses

More information

Graduate School Rankings By U.S. News & World Report: MECHANICAL ENGINEERING

Graduate School Rankings By U.S. News & World Report: MECHANICAL ENGINEERING Rank Universities Score 1 University of California, Berkeley 4.8 4 Purdue University, West Lafayette 4.2 6 University of Minnesota, Twin Cities 3.9 7 Penn State University, University Park 3.8 7 Texas

More information

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21)

CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) CYBERINFRASTRUCTURE FRAMEWORK FOR 21 ST CENTURY SCIENCE, ENGINEERING, AND EDUCATION (CIF21) Overview The Cyberinfrastructure Framework for 21 st Century Science, Engineering, and Education (CIF21) investment

More information