Distributed Big Data and Analytics (DBDA)
Internet2 CINO Initiative Working Group Co-Chair Meeting
10 August 2015
Chairs
Alex Feltus, Clemson
Sam Gustman, USC
Marc Hoit, NC State
Meeting Objectives
- Discussion of eight submitted use cases
- Discussion of next steps
Use case input from the working group
#   Use Case Title                                    Name                                            Institution
1   Data Analytics of Campus-Scale Power System       Submitted by Alex Feltus; contact Dan Noneaker  Clemson
2   Intelligent Maintenance Systems Center            Submitted by Jane Combs; contact Prof. Jay Lee  Univ of Cincinnati
3   Machine Tool Ball Screw Health Monitoring         Submitted by Jane Combs; contact Prof. Jay Lee  Univ of Cincinnati
4   Bioinformatics                                    Submitted by Jane Combs                         Univ of Cincinnati
5   Computational Fluid Dynamics Research: Aerospace  Submitted by Jane Combs                         Univ of Cincinnati
6   Geography/Climate                                 Submitted by Jane Combs                         Univ of Cincinnati
7   High Energy Physics                               Submitted by Jane Combs                         Univ of Cincinnati
8   Modeling and Simulations                          Submitted by Jane Combs                         Univ of Cincinnati
Data Analytics of Campus-Scale Power System
Submitted by Alex Feltus at Clemson
Contact Dan Noneaker at Clemson
Project/Research Title: Data Analytics of Campus-Scale Power System
Industry Sector: Electric Power Utility
Science Sub-domain: Electrical Engineering
Short Description of Project & Relation to Big Data: The local electric power grid will be heavily instrumented on a campus containing a mix of residential sites, office spaces, industrial-scale electromechanical systems, and distributed energy sources. The instruments will be networked to a server that provides data for use in analytics focused on electric energy consumption, electric-service reliability, power quality, and local grid planning and design. The analytics will support research in areas such as local-grid technologies, distributed control of the electric grid, and power electronics.
Potential Industrial Partners: Duke Energy (Clemson's electric service provider), other electric utilities, power-industry instrumentation and electronics manufacturers, power-system monitoring and control vendors
Other Faculty Involved: All power faculty at Clemson, power research staff at Clemson's CURI site in Charleston, SoC faculty working in data analytics
Best Contact: Dan Noneaker, ECE Dept. Chair, dnoneak@clemson.edu
Big Data Attributes: sensor, near-real-time, distributed, geospatial
Aggregate Data Size: 2 TB 4 TB 2017 16 TB 2020 1 PB
Intelligent Maintenance Systems (IMS) Center
Contact Prof. Jay Lee at University of Cincinnati
Project/Research Title: Utilizing Prognostics & Health Management (PHM) Cloud Technology to Improve Band Sawing Process
Industry Sector: Manufacturing / Industrial Machinery
Science Sub-domain: Data Analytics / Prognostics & Health Management
Short Description of Project & Relation to Big Data: The goal of this project is to acquire a large amount of operating data from band saw machines both in the field and from an in-house test bed. This data is then analyzed using the Watchdog Agent toolkit to assess and predict the health condition of the monitored band saws. Once validated, this approach will be used to construct a commercial cloud-based platform and mobile app for the project sponsor.
Best Contact: Professor Jay Lee
Big Data Attributes: Sensor, Near Real-time
Aggregate Data Size: 5 TB 10 TB 2017 2020
Machine Tool Ball Screw Health Monitoring
Contact Prof. Jay Lee at University of Cincinnati
Project/Research Title: Machine Tool Ball Screw Health Monitoring
Industry Sector: Manufacturing / Industrial Machinery
Science Sub-domain: Data Analysis / Prognostics & Health Management
Short Description of Project & Relation to Big Data: The goal of this project is to conduct multiple run-to-failure tests using commercially available machine tool ball screws and motors, collect the resulting data, and design a data-driven model for health monitoring and prediction of such ball screws. Data from these tests is transferred to and stored on a central server. A mobile app will be developed for monitoring these tests.
Best Contact: Professor Jay Lee
Big Data Attributes: Sensor, Near Real-time
Aggregate Data Size: 5TB 15TB 2017 30TB 2020
Bioinformatics
Project/Research Title: NIH BD2K-LINCS Perturbation Data Coordination and Integration Center
Industry Sector: Health care / Environmental Health / Biomedical Research
Science Sub-domain: Bioinformatics and Data Science
Short Description of Project & Relation to Big Data: The Library of Integrated Network Based Cellular Signatures (LINCS) project is expected to produce masses of data collected from human cells and tissues perturbed with drugs and other molecules. The center's role is to develop new methods to integrate big data, devise intelligent ways to mine and analyze it, build intuitive tools to interact with it, and educate the research community on how best to leverage this trove of information for biomedical research.
Best Contact: Jane Combs, combsje@uc.edu
Big Data Attributes: In biomedical research, these data sources include the diverse, complex, disorganized, massive, and multimodal data being generated by researchers, hospitals, and mobile devices around the world.
Aggregate Data Size: 2017 2020
Computational Fluid Dynamics Research: Aerospace
Project/Research Title: Study of Active and Passive Flow Control Techniques over Turbine Blades
Industry Sector: Aerospace
Science Sub-domain: Mechanical Engineering, Computational Fluid Dynamics
Short Description of Project & Relation to Big Data: Collaborative immersive visualization of large datasets and simulation trajectories to support the study of active and passive flow control techniques and turbine-blade cooling. The goal of such simulations is to provide predictive performance analysis of physical systems that may contain many integrated components and that are described by multiple interacting physical processes.
Best Contact: Jane Combs, combsje@uc.edu
Big Data Attributes: Simulation datasets and data visualization
Aggregate Data Size: 2TB 4TB 2017 6TB 2020
Geography/Climate
Project/Research Title: Toward a Circumarctic Lakes Observation Network (CALON)--Multiscale Observations of Lacustrine Systems
Industry Sector: Geography
Science Sub-domain: Climate
Short Description of Project & Relation to Big Data: Expand on existing lake monitoring sites in northern Alaska by developing a network of regionally representative lakes along environmental gradients, from which we will collect baseline data to assess current physical, chemical, and biological lake characteristics. Download and process hundreds of processed satellite image data sets, each up to 1 TB, with four Internet2 universities and the NSF National Snow and Ice Data Center. Develop and refine data management, visualization, and archiving activities with ACADIS.
Best Contact: Jane Combs, combsje@uc.edu
Big Data Attributes: Satellite image data sets
Aggregate Data Size: 1TB 3TB 2017 2020
High Energy Physics
Project/Research Title: Large Hadron Collider beauty (LHCb) experiment at CERN studying heavy flavor physics
Industry Sector: Nuclear Physics, Energy
Science Sub-domain: Physics
Short Description of Project & Relation to Big Data: The physics focus is studying oscillations of matter into anti-matter and the differences in decay rates of matter and the corresponding anti-matter to mirror-image final states. These studies address the nature of the fundamental interactions between the basic constituents of matter. We move large data files from host laboratories to computers at UC for final analysis. For example, a single file with an LHCb NTUPLE from a small fraction of the data is 3 GB. The size of the data set to be transferred for this analysis will be on the order of 1 TB.
Best Contact: Jane Combs, combsje@uc.edu
Big Data Attributes: LHCb NTUPLE data sets
Aggregate Data Size: 3TB 6TB 2017 6TB 2020
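For a rough sense of scale, the sketch below estimates how many 3 GB NTUPLE files would make up a 1 TB data set and how long moving it might take. The link rates and the efficiency factor are illustrative assumptions, not figures from the use case, and decimal units (1 TB = 1000 GB) are assumed.

```python
import math

GB_PER_TB = 1000  # assumption: decimal units (1 TB = 1000 GB)


def file_count(dataset_tb: float, file_gb: float) -> int:
    """Number of files of size file_gb needed to hold dataset_tb."""
    return math.ceil(dataset_tb * GB_PER_TB / file_gb)


def transfer_hours(dataset_tb: float, link_gbps: float,
                   efficiency: float = 0.5) -> float:
    """Hours to move dataset_tb over a link of link_gbps gigabits/s.

    efficiency is a hypothetical fraction of the nominal link rate
    actually achieved end to end; 0.5 is a placeholder, not a measurement.
    """
    bits = dataset_tb * GB_PER_TB * 8e9          # data set size in bits
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    print("files:", file_count(1, 3))
    for rate in (1, 10):                          # assumed link rates in Gbit/s
        print(f"{rate} Gbit/s: {transfer_hours(1, rate):.2f} h")
```

Under these assumptions the 1 TB set is a few hundred files, and the transfer time scales inversely with the achieved link rate.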
Modeling and Simulation
Project/Research Title: Study of Active and Passive Flow Control Techniques over Turbine Blades
Industry Sector: Consumer Products
Science Sub-domain: Mechanical Engineering, Modeling and Simulation
Short Description of Project & Relation to Big Data: The UC Simulation Center, in collaboration with Procter & Gamble, focuses on high-fidelity numerical simulation of complex flow phenomena with a wide range of length and time scales. The UC Simulation Center is a partnership in which students work on modeling and simulation (M&S) of complex industrial problems associated with porous media, multiphase flows, etc. Predictive performance analysis of these multidisciplinary, multi-scale systems generates terabytes of data.
Best Contact: Jane Combs, combsje@uc.edu
Big Data Attributes: Simulation datasets and data visualization
Aggregate Data Size: 2TB 4TB 2017 6TB 2020
Next Steps
- Schedule deep dive calls between Co-Chairs and Use Case POCs
- Determine co-chair presenters for the August 31 Joint Collaborative Innovation Community call
- Preparation for in-person meetings at TechEx, October 4-7 in Cleveland, Ohio
  - DBDA Innovation Working Group meeting, 75 minutes
    - Review current status, use cases, gather new ideas
  - Collaborative Innovation Community meeting, all 3 working groups together, 90 minutes
    - Each working group presents status
    - Invite new participants and new ideas
  - Innovation hackathon over lunch for new ideas in current innovation areas or new ones
- Monthly team meeting, next one September 12
Thank You