NASA Earth Science Research in Data and Computational Science Technologies Report of the ESTO/AIST Big Data Study Roadmap Team September 2015

I. Background

Over the next decade, the dramatic growth of NASA's Earth Science data collections is projected to outpace the ability of scientists to analyze that data meaningfully. What have been termed the "Vs" of data (volume, variety, velocity, etc.) pose significant challenges for both Earth Science missions and researchers, because traditional methods for developing science data pipelines, distributing scientific datasets, and performing effective analysis will require new approaches. The Intergovernmental Panel on Climate Change (IPCC) Assessment Report 6, for example, predicts that data volumes will grow to tens of petabytes. Future remote sensing projects include an increasing set of data-intensive instruments that will pose severe challenges to existing systems. Likewise, for Earth Science data archives, these massive increases will shift the focus from distributing whole data sets to providing online services for computation and analysis. In addition, instruments flown on NASA Earth-observing satellites will continue to generate data and stress the boundaries of end-to-end data systems. Taken together, these challenges require new thinking in data capture, management, processing, and analysis, both onboard and on the ground. "Big Data" and "Data Science" have been used to describe this data deluge and the discipline devoted to addressing it. For the purposes of this document, Big Data describes a state of data collection and analysis that exceeds the capacity of conventional methods or software systems. This state of affairs necessitates new approaches that will change the paradigm by which data is collected and analyzed.
Data Science focuses on the use of systematic architectural, software, methodological, and algorithmic approaches (e.g., data management, intelligent algorithms, statistics, and visualization) for generating, capturing, managing, analyzing, and making discoveries from massive data sets and streams (i.e., big data), including those from multiple sensors, models, archives, and other sources, to enable research and decision support. A growing number of universities are establishing programs in Data Science, and it is emerging as a critical area of research and technology for advancing scientific discovery. The sheer increase in data volume over the next decade, coupled with the highly distributed and heterogeneous nature of scientific data sets, requires these new approaches. Numerous technologies are under development in multiple communities to address Big Data challenges. While NASA can leverage some of this capability (e.g., map-reduce to scale computation), significant technologies still need to be developed that scale to the data-intensive challenges underlying modern observing systems and the science questions those systems were created to investigate. Existing techniques also do not address the end-to-end nature of NASA's science-driven observational environment. NASA needs a multi-year plan that addresses these challenges as a whole, rather than through incremental, isolated improvements that are unlikely to scale. Such an approach promises significant scientific yield from NASA's missions, instruments, archives, and research community, and it will be necessary for NASA to remain on the critical path to accomplishing science in the Big Data era.

II. Current State: Mission, Instrument and Large-scale Data Analysis Challenges

Currently, large data collections from NASA and other agencies are analyzed with traditional computational and data analysis approaches, which require users to bring data to their desktops and perform the analysis locally. Alternatively, data are hauled to large computational environments that provide centralized analysis via traditional High Performance Computing (HPC). Scientific data archives, however, are not only growing massive but also becoming highly distributed. As a consequence, neither traditional approach is a good basis for optimizing analysis in the future. Further, assumptions across the NASA mission and science data lifecycle, which historically hold that all data can be collected, transmitted, processed, and archived, will not scale as more capable instruments stress legacy systems. A new paradigm is needed to increase the productivity and effectiveness of scientific data analysis.
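When archives are large and distributed, one alternative to hauling data to a desktop or to a central HPC facility is to move the computation to the data, in the spirit of the map-reduce approach mentioned above: each archive reduces its own holdings to a small summary, and only the summaries travel. The following is a minimal Python sketch under stated assumptions: each hypothetical archive holds a plain list of measurements, and the `ArchiveSummary`, `summarize`, and `merge` names are illustrative, not an existing NASA service. Local summaries use Welford's single-pass update, and they are combined with the standard parallel variance formula (Chan et al.), so the global mean and variance match a single pass over all the data.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArchiveSummary:
    """Sufficient statistics computed where the data live (hypothetical)."""
    count: int
    mean: float
    m2: float  # sum of squared deviations from the mean


def summarize(values: Iterable[float]) -> ArchiveSummary:
    """Run at each archive: Welford's single-pass update over local data."""
    count, mean, m2 = 0, 0.0, 0.0
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return ArchiveSummary(count, mean, m2)


def merge(a: ArchiveSummary, b: ArchiveSummary) -> ArchiveSummary:
    """Combine two summaries without access to the underlying data."""
    if a.count == 0:
        return b
    if b.count == 0:
        return a
    n = a.count + b.count
    delta = b.mean - a.mean
    mean = a.mean + delta * b.count / n
    m2 = a.m2 + b.m2 + delta * delta * a.count * b.count / n
    return ArchiveSummary(n, mean, m2)


def global_stats(summaries: List[ArchiveSummary]) -> ArchiveSummary:
    """Fold the per-archive summaries into one global summary."""
    total = ArchiveSummary(0, 0.0, 0.0)
    for s in summaries:
        total = merge(total, s)
    return total


if __name__ == "__main__":
    # Hypothetical holdings at three distributed archives.
    archives = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    result = global_stats([summarize(a) for a in archives])
    variance = result.m2 / (result.count - 1)  # sample variance
    print(result.count, result.mean, variance)  # prints: 6 3.5 3.5
```

Only three numbers per archive cross the network, regardless of how much data each archive holds; statistics that do not decompose this way (e.g., exact medians) need other techniques.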
The new paradigm must recognize that architectural and analytical choices are interrelated and must be carefully coordinated in any system that aims to support efficient, interactive scientific exploration and discovery across massive data collections, from the point of collection (e.g., onboard) to analysis and decision support. In particular, the most effective approach to analyzing a distributed set of massive data may involve some exploration and iteration, which puts a premium on the flexibility afforded by the architectural framework. The framework should enable scientists to assemble workflows efficiently, manage the uncertainties related to data analysis and inference, and optimize deep-dive analytics to enhance scalability. These challenges are not limited to NASA. As identified in the appendix, multiple agencies are confronted with the question of how to draw scientific inference from growing, distributed archives. NASA has already made significant investments in capturing and sharing data from massive, online data systems. The NASA Earth Science Data and Information System (ESDIS), most prominently the Distributed Active Archive Centers (DAACs), provides an excellent foundation for capturing data and building high-quality repositories. Technology investments to date by NASA (e.g., ROSES ACCESS, ROSES AIST, etc.) and the DOE (e.g., Earth System Grid) have focused on developing software services that have improved access to distributed data. The challenge now is to integrate these capabilities through careful architectural approaches and emerging technologies that will allow NASA to continue scaling the entire data lifecycle and supporting the data analysis needs of the Earth Science community. The existing infrastructures are prime candidates for grounding a more distributed, scalable computational environment that optimizes scientific data analysis and moves into the era of Big Data Analytics. An integrated architecture and ecosystem will address the modern data and computing challenges across the NASA mission and science data lifecycle: reproducibility, uncertainty, data fusion, data reduction, data movement, data visualization, cost, and performance.

III. Use Cases

Several use cases have been identified that present challenges to future NASA missions and science. These use cases identify the challenges and required capabilities needed not just to keep pace but to increase the science yield from NASA Earth Science missions, instruments, and data collections. The following table identifies use cases and their data science challenges.[1]

Climate Modeling
  Use case: Formulate hypotheses from observed empirical relationships; simulate current and past conditions under those hypotheses using climate models; test hypotheses by comparing simulations to observations; evaluate the uncertainty of predictions originating from statistical sampling of models and observations.
  Data science challenge: Highly distributed data sources; fusion of different observations; moving computation to the data; data reduction.
  Enabling mission/capability: CMIP6 will move towards exascale archives, requiring new approaches to evaluating models relative to observational data.

Satellite Missions
  Use case: Missions such as NI-SAR and SWOT will generate massive observational data. However, they have different architectural patterns, including compute-intensive, data-intensive, heterogeneous, etc.
  Data science challenge: Massive data rates; data movement challenges; computational scalability, archiving, and distribution; onboard processing for data reduction/analysis; high-volume data transfer for ground processing.
  Enabling mission/capability: NI-SAR and SWOT require new approaches for computation, data movement, data archiving and distribution, and analytics.

Applications: Hydrology (Central Valley of California)
  Use case: Understand groundwater dynamics on a regional scale using satellite, airborne, and in-situ measurements, and compare against predictive models.
  Data science challenge: Distributed computation; highly distributed data sources; data fusion of multiple products; massive new satellite observations.
  Enabling mission/capability: Integration of data from PALSAR-2, Sentinel, GRACE-FO, ASO, and SMAP; scaling to support NI-SAR and SWOT; comparison against models. Requires new architectural approaches for distributed data analytics.

Airborne Missions
  Use case: Airborne missions tend to be much more agile and on-demand. Integrating them into a data ecosystem provides new opportunities to quickly generate and understand various measurements.
  Data science challenge: On-demand architectures; distributed data sources; on-the-fly data processing; onboard processing for data reduction/analysis; high-volume data transfer for ground processing.
  Enabling mission/capability: Current missions such as CARVE and the Airborne Snow Observatory (ASO); future missions such as the proposed EVI-3 and ASO follow-on missions.

[1] NIST has an active working group developing Big Data use cases that includes NASA contributions in

Other science disciplines, such as biology, have similar use cases and have identified technology needs similar to NASA's. These are identified in the section on benchmarking.

IV. Conceptual Architecture: The Data Lifecycle for NASA Earth Science

The data lifecycle is a critical perspective for developing a comprehensive set of capabilities to increase scientific yield from missions. This encompassing approach must include developing capabilities from the point of collection all the way to analysis and extracted understanding. Figure 1 below shows this concept and exposes the need to improve data science capabilities at multiple points of the data lifecycle: from onboard computing, to triage of massive data streams, to data analysis.
This perspective requires architectural considerations for determining and integrating the methodologies and infrastructure used to capture and analyze data across the full lifecycle. To frame the end-to-end concept, the model below provides guidance for the NASA investments and capabilities that will be required to support effectiveness and scalability in the Big Data era. The data lifecycle view reveals that the choices made, and results obtained, when operating on data at one point in the lifecycle inevitably shape, and possibly limit, what can be done with the data at later points. Coordination must therefore be addressed in both directions: ultimately, objectives for scientific data understanding and reproducibility must back-drive the design and development of capabilities early in the data lifecycle so as to enable the desired results to the greatest extent possible (see Figure 2). There is also an outer loop to the data lifecycle: successful data understanding (and possibly, disappointments) drives the next round of science instrument and mission proposals.

Figure 1: Data Lifecycle for NASA Science Missions
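A toy illustration of why early lifecycle choices constrain later analysis: if an onboard step reduces a time series by block-averaging, a short transient event can be smoothed below a downstream detection threshold, whereas a reduction that also keeps each block's maximum preserves it at modest extra cost. This is a minimal Python sketch with a synthetic signal, an assumed block size of 8, and an assumed detection threshold of 5.0; none of these values come from an actual mission or instrument.

```python
from statistics import fmean
from typing import List, Tuple

BLOCK = 8          # assumed onboard reduction factor
THRESHOLD = 5.0    # assumed downstream event-detection threshold


def block_mean(signal: List[float], block: int = BLOCK) -> List[float]:
    """Reduce by averaging each block (keeps 1 value per block)."""
    return [fmean(signal[i:i + block]) for i in range(0, len(signal), block)]


def block_mean_max(signal: List[float],
                   block: int = BLOCK) -> List[Tuple[float, float]]:
    """Reduce to (mean, max) per block (keeps 2 values per block)."""
    return [
        (fmean(signal[i:i + block]), max(signal[i:i + block]))
        for i in range(0, len(signal), block)
    ]


def detect(values: List[float], threshold: float = THRESHOLD) -> bool:
    """Downstream analysis: did any retained value exceed the threshold?"""
    return any(v > threshold for v in values)


if __name__ == "__main__":
    # Synthetic 64-sample stream: quiet background with a one-sample spike.
    signal = [1.0] * 64
    signal[20] = 20.0

    averaged = block_mean(signal)
    with_max = block_mean_max(signal)

    print("raw detects event:      ", detect(signal))
    print("mean-only detects event:", detect(averaged))
    print("mean+max detects event: ", detect([mx for _, mx in with_max]))
```

The spike's block averages to 3.375, below the threshold, so the mean-only product silently loses the event; the downstream science goal (event detection) is what should back-drive the choice of onboard reduction.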

Figure 2: End-to-End NASA Mission/Science Data Lifecycle

The data lifecycle views shown in Figures 1 and 2 identify a number of challenges across the mission lifecycle that affect the various scientific disciplines core to NASA. Earth Science in particular benefits from new approaches to data analysis, given the complex, heterogeneous, high-volume, and distributed nature of the remote sensing data acquired by Earth-observing missions. For example, using these data to compare against and verify climate model simulations is critical to supporting research in global climate change. However, much of this data is managed in highly distributed repositories with different data representations, formats, access methods, and owners. Significant time and effort are required to access, move, and rectify data from different repositories to support the specific analyses of interest to particular researchers. These requirements constrain both the type and scope of analyses that can be performed, and therefore ultimately the scientific hypotheses that can be tested. A major goal of data science is to reduce the cost of large-scale, interactive data analysis while leveraging the richness of massive data sets to reduce uncertainty in the answers to crucial scientific questions. This is not possible in the current environment because it lacks scalability. Data science seeks to enable not just more discipline science (e.g., more data incorporated in analysis, more parameters to compare) but better science (e.g., quantitative assessments of uncertainty, repeatability of results), performed in a similar or shorter amount of time despite growing data sets.
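The access, rectify, and compare steps described above can be sketched in a few lines. This minimal Python example assumes two hypothetical sources with different representations: gridded model output in degrees Celsius keyed by a 1-degree (latitude, longitude) cell, and point observations in Kelvin. It converts units, bins the observations onto the model grid by averaging, and reports bias and RMSE over the cells present in both. Real workflows would also need time alignment, area weighting, and observation error, all omitted here.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (lat index, lon index) on an assumed 1-degree grid


def to_cell(lat: float, lon: float) -> Cell:
    """Map a point to the 1-degree cell whose south-west corner contains it."""
    return (math.floor(lat), math.floor(lon))


def regrid_observations(obs: List[Tuple[float, float, float]]) -> Dict[Cell, float]:
    """Average point observations (lat, lon, kelvin) per cell, in Celsius."""
    sums: Dict[Cell, float] = defaultdict(float)
    counts: Dict[Cell, int] = defaultdict(int)
    for lat, lon, kelvin in obs:
        cell = to_cell(lat, lon)
        sums[cell] += kelvin - 273.15  # unit rectification: K -> degC
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def compare(model: Dict[Cell, float], obs: Dict[Cell, float]) -> Tuple[float, float]:
    """Bias and RMSE of (model - observation) over cells present in both."""
    shared = sorted(model.keys() & obs.keys())
    if not shared:
        raise ValueError("model and observations share no grid cells")
    diffs = [model[c] - obs[c] for c in shared]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


if __name__ == "__main__":
    # Hypothetical model output (degC) and point observations (K).
    model_c = {(34, -119): 21.0, (35, -119): 18.0, (36, -119): 15.0}
    points_k = [(34.2, -118.5, 294.15), (34.7, -118.9, 292.15),
                (35.5, -118.2, 290.15)]
    bias, rmse = compare(model_c, regrid_observations(points_k))
    print(f"bias={bias:.2f} degC, rmse={rmse:.2f} degC")
```

For the sample values both bias and RMSE come out at about 1.0 degC; the model cell with no observations is simply excluded, which is itself a choice that affects the conclusions a researcher can draw.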

V. Capability Gaps and Drivers

A major gap has been the focus on localized analysis of data from individual instruments, rather than a holistic consideration of all the observing systems and how they fit into a big data analytics view and capability. Data systems today are organized around the capture and archiving of data from specific missions or observing capabilities. However, integrating data from multiple instruments (spaceborne, airborne, ground-based) is important for scientific research, in particular for moving from isolated data analysis to knowledge discovery through a big data analytics approach. This includes making data available from multiple sources and integrating those data using intelligent algorithms and methods. In addition, as data grow and more automated methods for data discovery are put in place, there are opportunities to improve the efficiency and effectiveness of ongoing mission operations and to move them towards a data-driven approach in which data are reduced onboard or at ground stations prior to archiving, as well as during fully offline science analysis. New approaches for interpreting data across the lifecycle allow informed decisions to be made at arbitrary points in the lifecycle, so that mission plans can be updated, new relevant data products generated, and so on. The same full view of the data lifecycle will also inform provenance practices, so that relevant details all the way back to the point of collection can be captured to provide a basis for reproducing the end results of scientific understanding. This is a paradigm shift from how mission and science operations and analysis are performed today, largely in separate arenas.[2]

Figure 3 below shows the current approach for organizing NASA Earth Science data systems. The approach focuses extensively on data stewardship, collecting data into organized and community-accessible archives. This approach has enabled NASA to build high-quality data archives, but as the need for systematic approaches to data analysis increases, it is important to take a broader view of how activities across the entire data lifecycle can be organized and integrated. Today, onboard computing, mission operations, science data processing and archiving, and data analysis are performed by largely independent architectures, systems, and components of the data lifecycle.

[2] This state of affairs is reflected in the disjoint NASA programs for science missions vs. research and analysis.

Figure 3: Today, Stewardship Model of NASA Data Systems

Taking a broader view, cyberinfrastructure, machine learning and other intelligent algorithms (analytics), statistics, and visualization should be brought together as integrated architectural solutions that can be scaled to meet NASA's data challenges in enabling missions and science. Figure 4 below shows a shift from data generation and archive stewardship to enabling data analysis through an integrated cyberinfrastructure, in which data, algorithms, computation, and visualization are brought together across a highly distributed data environment to quickly stand up new analytic capabilities for different stakeholders, measurements, science questions, and applications. This requires new thinking, as follows:

1. New architectural approaches across the entire science mission and data lifecycle, to increase science yield by scaling and better integrating each component of the lifecycle.
2. Increased onboard capability to support data reduction, autonomous mission (re-)planning, etc., as part of managing available bandwidth.
3. Integrated analytics as part of the real-time mission pipeline.
4. The ability to quickly construct new analytic centers that can unify archives, computing capabilities, and software services to bring heterogeneous data sets and models/simulations together for analysis.
5. Systems that can react to the increased velocity of data across the lifecycle with new technology approaches for data triage and capture.
6. Interoperability with other agencies, including capture of data from their instruments and integration with their ground systems, archives, and analytic capabilities.
7. Quantitative uncertainty management at all stages of the data lifecycle, to provide the uncertainties required for scientific inference.

Figure 4: Future NASA Mission-Science Data Ecosystem

VI. Proposed NASA Data and Computational Science Technology Areas

As discussed above, as NASA science and exploration missions become more data-intensive, there is a greater need to consider the data lifecycle from the point of collection all the way to extracted understanding, in order to support scalability and full utilization of the data. Furthermore, missions may have no choice but to perform data reduction and intelligent triage across the data lifecycle to determine which data are captured and archived. In addition to considerations of data and software, it is critical that common information models be developed and defined so that consistent definitions of the data are applied (i.e., what are the definitions of the data?). In particular, a rigorous, probabilistic definition of uncertainty should be adopted across all NASA missions and data, so that uncertainty is defined in the same way for all data used in scientific studies. This practice ensures that the data are reliably managed, support discovery, and are fully utilized. It is important to point out that data across this entire lifecycle should not be considered data at rest, but rather data that are discoverable, accessible, and usable to update plans, inform other decisions, support local operations, and, of course, enable science. A well-architected data system, from onboard data capture through ground-based operations to data analysis, must be in place to support all of these objectives. Critical areas of the lifecycle (with considerations at each stage) include:

a) Data Generation: Perform original processing at the sensor/instrument.
b) Data Triage: Make choices at the collection point about which data to keep.
c) Data Compression: Maximize information throughput against available bandwidth.
d) Data Transport: Improve resource efficiencies to enable moving the most data.
e) Data Processing: Increase computation availability.
f) Data Archiving: Scale the capture, management, and distribution of data.
g) Data Visualization: Develop and apply visualization techniques to enable data exploration and insights.
h) Data Analytics: Create services to integrate the analysis of massive, distributed, heterogeneous data, and propagate uncertainties.

In addition, some relevant technologies and architectural approaches are inherently cross-cutting, in that they apply equally to several Earth Science areas of investigation. This perspective is particularly important for architecture, which determines how the various technologies can be integrated into a general data-intensive approach for the NASA Earth Science enterprise.
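Items b) and c) above can be made concrete with a small sketch of onboard triage under a downlink budget: score each data frame, keep the highest-scoring frames whose compressed sizes fit the budget, and record which frames were dropped so later analysis knows the record is incomplete. Everything here is an assumption for illustration: the frame format (raw bytes), the scoring rule (variance of byte values as a crude "interest" proxy), the greedy selection, and the use of Python's standard `zlib` for lossless compression. Real missions would use instrument-specific event detectors and space-qualified compression standards.

```python
import zlib
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Tuple


@dataclass
class Frame:
    frame_id: int
    payload: bytes


def interest_score(frame: Frame) -> float:
    """Crude, assumed proxy for scientific interest: spread of byte values."""
    return pvariance(frame.payload) if len(frame.payload) > 1 else 0.0


def triage(frames: List[Frame],
           budget_bytes: int) -> Tuple[List[Tuple[int, bytes]], List[int]]:
    """Greedily keep the most interesting frames whose compressed size fits.

    Returns (kept, dropped): kept is a list of (frame_id, compressed bytes)
    in selection order; dropped lists frame ids so the gap can be recorded.
    """
    kept: List[Tuple[int, bytes]] = []
    dropped: List[int] = []
    used = 0
    for frame in sorted(frames, key=interest_score, reverse=True):
        packed = zlib.compress(frame.payload, 9)  # lossless, max compression
        if used + len(packed) <= budget_bytes:
            kept.append((frame.frame_id, packed))
            used += len(packed)
        else:
            dropped.append(frame.frame_id)
    return kept, dropped


if __name__ == "__main__":
    quiet = Frame(1, bytes(1000))                 # flat, highly compressible
    busy = Frame(2, bytes(range(256)) * 4)        # varied content
    kept, dropped = triage([quiet, busy], budget_bytes=50)
    print("kept:", [fid for fid, _ in kept], "dropped:", dropped)
```

Greedy selection is simple enough for constrained onboard processors but not optimal; with a tight budget it can drop the most interesting frame because it does not fit, which is exactly the kind of trade-off a lifecycle architecture should make explicit.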
Table 1: Proposed AIST Technology Areas

Big Data Architecture for Earth Science Remote Sensing
  Data lifecycle area(s): Cross-cutting
  Description: Definition of a scalable big data lifecycle architecture for Earth observing systems, identifying how Big Data can scale from onboard computing to data analysis to increase science yield.

Big Data Information Models and Semantics
  Data lifecycle area(s): Cross-cutting
  Description: Advanced semantic technologies for defining, deriving, and integrating heterogeneous ontologies and information models as applied across the entire data lifecycle (onboard, ground-based operations, archives, analysis).

Onboard data science methods for data triage
  Data lifecycle area(s): Data triage
  Description: Onboard data science methods for real-time event detection and planning.

Onboard data science methods for data reduction
  Data lifecycle area(s): Data compression
  Description: Onboard data science methods for data reduction.

Massive Data Movement Technologies
  Data lifecycle area(s): Data transport
  Description: Massive data movement technologies for ground-based networks, from operations through analysis.

Real-time ground-based data science methods
  Data lifecycle area(s): Data processing
  Description: Real-time ground-based data science methods for data reduction and real-time event detection on massive data streams, as part of the data lifecycle architecture.

Open source data processing frameworks
  Data lifecycle area(s): Data processing
  Description: Open source data processing and workflow frameworks that can massively scale on computational infrastructures (HPC, public cloud, etc.) to handle large data streams and products, including near-real-time constraints, as part of the data lifecycle architecture.

Reusable data science methodologies for missions and science
  Data lifecycle area(s): (1) Data processing; (2) Data analytics
  Description: Development of reusable data science methodologies for analysis of data on the ground as part of the data lifecycle architecture. This includes on-demand data analytics for massive data repositories.

Federated data access
  Data lifecycle area(s): Data archives
  Description: Federation of data access from distributed repositories as part of the data lifecycle architecture, moving towards on-demand distributed data analytics.

Massive Data Distribution
  Data lifecycle area(s): Data archives
  Description: Massive data distribution for large-scale repositories and archives, including methods for data reduction, computation, etc., as integrated, on-demand data analytics.

Intelligent search and mining
  Data lifecycle area(s): (1) Data archives; (2) Data analytics
  Description: Methods for intelligent search and mining of massive data. This may include integration of on-demand analytics to perform deep searches.

Visualization of massive data sets
  Data lifecycle area(s): Visualization
  Description: Visualization of massive data sets, including domain-driven data reduction methods.

On-demand distributed data analytics
  Data lifecycle area(s): Data analytics
  Description: On-demand data analytics that can integrate data from archives, repositories, etc., applying data science methods (data reduction, fusion, feature detection, etc.) provided through a computational infrastructure.

Distributed data analytics
  Data lifecycle area(s): Data analytics
  Description: Analysis of data across distributed archives to support Earth system science.

Uncertainty Quantification; Measurement Science
  Data lifecycle area(s): Data analytics
  Description: Quantification and management of uncertainty through all data processing steps, and subsequently through analytical algorithms, as part of a measurement science strategy for data fusion and data science.

Open source data management/science frameworks
  Data lifecycle area(s): (1) Data archives; (2) Data analytics
  Description: Open source data management/science frameworks that can massively scale to handle and manage large data streams and products, including near-real-time constraints, as part of the data lifecycle architecture, for archiving and analytics within a big data cyberinfrastructure.

Computational Infrastructures
  Data lifecycle area(s): (1) Data processing; (2) Data analytics
  Description: Computational infrastructures to scale data analytics using HPC and public cloud. This includes on-demand massive HPC and storage integrated to drive analytics.

Figure 5 shows the mapping of these technologies to the Big Data Analytics architecture concept. A key element, as mentioned, is the integration of the various technologies based on a data-driven architectural view.
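As a concrete reading of the Uncertainty Quantification entry in Table 1, the sketch below propagates input uncertainty through a two-step processing chain by Monte Carlo sampling and compares the result with first-order (linear) error propagation. The processing steps (a calibration gain and offset, then background removal), the independent Gaussian input errors, and all numbers are hypothetical; the point is only that a probabilistic definition of uncertainty can be carried from one processing level to the next.

```python
import math
import random
from statistics import fmean, stdev
from typing import Dict, Tuple

# Hypothetical inputs: name -> (nominal value, 1-sigma uncertainty)
INPUTS: Dict[str, Tuple[float, float]] = {
    "counts": (10.0, 0.5),      # raw instrument counts
    "gain": (2.0, 0.1),         # calibration gain
    "background": (3.0, 1.0),   # background estimate removed at level 2
}
OFFSET = 1.0  # calibration offset, treated as exact


def calibrate(counts: float, gain: float) -> float:
    """Level-1 step (assumed): counts -> calibrated radiance."""
    return gain * counts + OFFSET


def retrieve(radiance: float, background: float) -> float:
    """Level-2 step (assumed): remove background from radiance."""
    return radiance - background


def monte_carlo(n: int = 200_000, seed: int = 42) -> Tuple[float, float]:
    """Propagate independent Gaussian input errors through both steps."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        s = {k: rng.gauss(mu, sigma) for k, (mu, sigma) in INPUTS.items()}
        outputs.append(retrieve(calibrate(s["counts"], s["gain"]),
                                s["background"]))
    return fmean(outputs), stdev(outputs)


def linear(inputs: Dict[str, Tuple[float, float]] = INPUTS) -> Tuple[float, float]:
    """First-order propagation using the analytic partial derivatives."""
    c, sc = inputs["counts"]
    g, sg = inputs["gain"]
    b, sb = inputs["background"]
    value = retrieve(calibrate(c, g), b)
    # dT/dcounts = gain, dT/dgain = counts, dT/dbackground = -1
    var = (g * sc) ** 2 + (c * sg) ** 2 + sb ** 2
    return value, math.sqrt(var)


if __name__ == "__main__":
    print("Monte Carlo (mean, sigma):", monte_carlo())
    print("Linear      (mean, sigma):", linear())
```

For these inputs the linear estimate gives a sigma of sqrt(3), about 1.732. The exact variance of the gain-times-counts product adds a small cross term (0.1 squared times 0.5 squared, i.e. 0.0025), so the Monte Carlo sigma should land near sqrt(3.0025), about 1.733. The two methods agree here because the chain is nearly linear; Monte Carlo becomes the safer choice when processing steps are strongly nonlinear.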

Figure 5: Architecture-Technology Mapping

VIII. Benchmarking

Several U.S. agencies have initiated efforts in Big Data, including technology investments, new initiatives, and new organizations. These efforts are captured in the appendix and summarized below.

National Institutes of Health (NIH): The NIH has initiated a new program in Data Science and appointed an Associate Director to lead the effort. Through its Big Data to Knowledge (BD2K) initiative, it is establishing an NIH Commons (computation, software, standards, etc.) that will provide capabilities to the various NIH institutes that support directed research efforts. Efforts focus on enabling data management and big data analytics capabilities.

National Science Foundation (NSF): The NSF has several initiatives coordinated through the Office of Cyberinfrastructure (OCI). OCI coordinates with various disciplines within the NSF, including the EarthCube program, which seeks to build a national geosciences cyberinfrastructure. Goals of the NSF include: 1) derive knowledge from data; 2) develop new cyberinfrastructures to manage, curate, and serve data; 3) develop new approaches for education and workforce development; and 4) enable new types of interdisciplinary collaboration and community building.

DARPA: DARPA has several programs in Big Data, including the XDATA and Memex programs, which are developing data science frameworks for big data analytics and mechanisms for deep searching of the Internet. DARPA is exploring the use of open source technologies and their application to these programs.

NOAA: NOAA is exploring commercial opportunities to build cyberinfrastructures, including the use of cloud-based computing capabilities and support to scale data management and computation. NOAA is also participating in the Big Earth Data Initiative (BEDI).

Department of Energy (DOE): DOE has been exploring programs in extreme-scale science, particularly as it relates to high performance computing (HPC). The goal is to address the combined challenges of Big Data and Big Compute to develop an exascale computing environment for simulation and data analysis at scale, cutting across various disciplines in energy, biology, and climate.

USGS: USGS is exploring its role in big data, focusing on data capture and integration as well as sharing and leveraging HPC infrastructure across the agency. Programs such as EROS are exploring new architectural approaches to scale for the future.

National Institute of Standards and Technology (NIST): NIST has established a program in Data Science focusing on the development of architectures, use cases, standards, and interoperability. It is also focused on areas including measurement foundations and principles to increase the accuracy of inferences derived from massive data.

In the broadest terms, science unifies the data science technology needs of NASA and other agencies. Beyond NASA science disciplines such as climate science, other disciplines such as biological science are declaring a direct need for new technologies to support science data analysis (often described as "data-intensive science" or "data-driven science").
This should not be surprising, given the vast quantities of data being produced in areas such as genomics and proteomics. Data science methods that automate the extraction, classification, reduction, and discovery of information from massive data sets will have direct value across several science disciplines and agencies.

International agencies, such as the European Space Agency (ESA), have also initiated several efforts in Big Data. In 2013 and 2014, ESA held workshops on Big Data for Space[3] covering several of the technologies described in this report. Results of the most recent conference highlight a movement towards the big data lifecycle, integrated analytics, the integration of high-performance computing (HPC) with data infrastructures, new machine learning techniques, and a shift towards new cyberinfrastructures. International groups such as the International Virtual Observatory Alliance (IVOA), the International Planetary Data Alliance (IPDA), and the Global Organization for Earth System Science Portals (GO-ESSP) have developed efforts to share data and technology across international boundaries in support of worldwide science data analysis.

IX. Ten Year Roadmap of Capabilities

[Reviewer comment: What is this table? It's not explained.]

Today | Near-Term (2-5 Years) | Far-Term (5-10 Years)
Highly curated data repositories | Scalable data science framework | Onboard data science and reduction methods
Access and search methods for distributed repositories | Integrated data analytics | Distributed data analytics and computation
Core data standards | Data provenance for reproducibility | Integrated uncertainty analysis

Remaining Today and Near-Term entries: Machine Learning Techniques [Reviewer comment: Is this about machine learning?]; Unified architecture for data exploration and analysis; On-demand data science methods integrated with data repositories; Data visualization methods; Scalable computing infrastructure.
Remaining Far-Term entries: Virtual observational missions; Virtual, Immersive Visualization Environments for Massive Data.

Disruptive Approaches Leveraging Computational Science (5-10 Years)

Far-Term Disruptive Capabilities:
Onboard data science and reduction methods: Increase onboard autonomy, planning, and data processing as close to the instrument as possible, as part of a broader big data strategy to increase science return.
Distributed data analytics and computation: Shift to a data analytics ecosystem that enables analysis of distributed data from multiple instruments (ground-based, airborne, satellite) as well as comparison against climate models.
Integrated uncertainty analysis: Develop an architectural approach that manages uncertainty levels as data and computational demands change.
Virtual observational missions: Shift towards a data-driven approach for missions in which new instruments integrate into an overall virtual infrastructure; support planning of new science goals and observations from a combination of instruments and archival data.
Virtual, immersive visualization environments for massive data: Enable immersive environments for scientific analysis as well as outreach, using observational data from NASA missions.

[Reviewer comment: I would like to see an additional row in this table devoted to the interaction between the architecture and the (statistical) methods that define analytical algorithms: new statistical methods optimized for a given architecture; which architecture is best given a set of target analyses; the trade-off between uncertainty and cost; etc.]
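The trade-off between uncertainty and cost raised in the comment above can be quantified under simple assumptions: if an estimate averages n independent samples with per-sample spread sigma, its standard error is sigma divided by the square root of n, while cost grows linearly in n, so halving the uncertainty roughly quadruples the cost. This is a minimal Python sketch assuming independent samples, a known sigma, and a fixed hypothetical cost per sample (e.g., CPU-hours); correlated data or fixed overheads would need a richer model.

```python
import math
from typing import List, Tuple


def samples_needed(sigma: float, target_se: float) -> int:
    """Smallest n with sigma / sqrt(n) <= target_se (independent samples)."""
    if sigma <= 0 or target_se <= 0:
        raise ValueError("sigma and target_se must be positive")
    return max(1, math.ceil((sigma / target_se) ** 2))


def cost_curve(sigma: float, cost_per_sample: float,
               targets: List[float]) -> List[Tuple[float, int, float]]:
    """(target standard error, samples, cost) for each requested target."""
    rows = []
    for t in targets:
        n = samples_needed(sigma, t)
        rows.append((t, n, n * cost_per_sample))
    return rows


if __name__ == "__main__":
    # Hypothetical: per-sample spread 2.0 units, 0.5 CPU-hours per sample.
    for target, n, cost in cost_curve(2.0, 0.5, [0.5, 0.25, 0.125]):
        print(f"target SE {target:>6}: {n:>4} samples, {cost:>6.1f} CPU-hours")
```

With these hypothetical numbers, tightening the target standard error from 0.5 to 0.125 raises the sample count from 16 to 256 and the cost from 8 to 128 CPU-hours, which is why architecture choices and statistical targets need to be planned together.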
Technology Prioritization
Technology Name | Priority | Applicable Far-Term Capability
Big Data Architecture for Earth Science Remote Sensing | Highest | Cross-cutting
Big Data Information Models and Semantics | Medium | Distributed data analytics and computation
Onboard data science methods for data triage | Medium | Onboard data science and reduction
Onboard data science methods for data reduction | Medium | Onboard data science and reduction
Massive Data Movement Technologies | Medium | Distributed data analytics and computation
Real-time ground-based data science methods | Highest | Virtual observational missions
Open source data processing frameworks | Highest | Distributed data analytics and computation; Virtual observational missions; Virtual, immersive visualization environments for massive data
Reusable data science methodologies for missions and science | Highest | Virtual observational missions; Distributed data analytics and computation
Federated data access | Medium | Distributed data analytics and computation
Massive Data Distribution | Medium | Distributed data analytics and computation
Intelligent search and mining | Medium | Distributed data analytics and computation
Visualization of massive data sets | Medium | Virtual, immersive visualization environments for massive data
On-demand distributed data analytics | Highest | Distributed data analytics and computation
Distributed data analytics | Highest | Distributed data analytics and computation
Uncertainty Quantification; Measurement Science | Medium | [Reviewer comment: I think this should be Integrated uncertainty analysis. Highest]
Open source data management/science frameworks | Medium | Distributed data analytics and computation; Virtual observational missions; Virtual, immersive visualization environments for massive data
Computational Infrastructures | Highest | Distributed data analytics and computation; Virtual observational missions; Virtual, immersive visualization environments for massive data
X. Overall Recommendations
NASA data technology needs to shift from ad hoc investments across the mission and science data lifecycle to an integrated architecture in which technology investments fit into a broader capability and vision for Earth Science.
The NASA data ecosystem needs to shift from a stewardship model to a data-driven discovery model in which both data management and data discovery are enabled through a systematic data architecture, computational infrastructure, and lifecycle model.
Data architectures should be modeled and assessed as a whole to plan technology capabilities and improvements that ensure scalability and address performance, cost, and uncertainty-management goals.
Data discovery methods should be applied across the entire data lifecycle to support scalability and discovery at each point, from onboard computing, to data processing and archiving, to distributed data analytics.
Architectures should enable flexible trade-offs of where and how to compute, including improved integration of HPC and data infrastructures.
New capabilities should ensure the reproducibility of derived scientific results.
Computational and data science should play an important role in planning new missions, including identifying how data, algorithms, and computing can be applied and integrated to improve overall data discovery.
References
[1] NASA Office of the Chief Technologist, 2014 Roadmap (under development).
[2] National Research Council, Frontiers in Massive Data Analysis, 2013.
[3] President's Council of Advisors on Science and Technology (PCAST), Report on Information Technology, 2011.
[4] American Geophysical Union, Trends in Earth and Space Science.
Appendix A. NASA Mission/Science Data Lifecycle
As the data-intensive nature of NASA science and exploration missions increases, there is a growing need to consider the data lifecycle from the point of collection all the way to the application and use of the data. Decisions must be made across the entire lifecycle to support scalability and use of the data. Furthermore, missions may require that data reduction and intelligent triage be performed on the data itself at points across the lifecycle to identify which data should be captured and archived. Beyond the data and software themselves, it is critical to develop and define common information models so that consistent definitions are applied to the data and it can be accurately managed, discovered, and used. Data across this entire lifecycle should not be treated as data at rest, but as data that is discoverable, accessible, and usable to update plans, support local operations, and enable science. As a result, a well-architected data system, from onboard data capture through ground-based operations and data analysis, must be in place to enable scalability at multiple points across that lifecycle.
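As described above, missions may need to perform data reduction and intelligent triage on the data itself to decide what is captured and archived. The Python sketch below shows one simple way such a triage step could work under a fixed downlink budget: discard corrupted frames and frames above a cloud-fraction threshold, then keep the clearest frames first until the budget is spent. The cloud-fraction score, the 0.5 threshold, the greedy policy, and the `Frame`/`triage` names are all assumptions for illustration; an operational onboard system would use mission-specific science-value scores and flight-qualified software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """One instrument frame awaiting downlink (hypothetical type)."""
    frame_id: int
    size_bytes: int
    cloud_fraction: float  # 0.0 = clear, 1.0 = fully cloudy; assumed computed onboard
    corrupted: bool = False


def triage(frames: list[Frame], budget_bytes: int, max_cloud: float = 0.5) -> list[int]:
    """Return IDs of frames selected for downlink, clearest first.

    Corrupted frames and frames cloudier than max_cloud are dropped. The
    remaining frames are considered in order of increasing cloud fraction
    (ties broken by frame_id) and kept whenever they still fit in the budget.
    """
    candidates = [f for f in frames if not f.corrupted and f.cloud_fraction <= max_cloud]
    candidates.sort(key=lambda f: (f.cloud_fraction, f.frame_id))
    selected: list[int] = []
    remaining = budget_bytes
    for f in candidates:
        if f.size_bytes <= remaining:
            selected.append(f.frame_id)
            remaining -= f.size_bytes
    return selected


if __name__ == "__main__":
    frames = [
        Frame(1, 400, 0.10),
        Frame(2, 300, 0.90),                  # too cloudy: dropped
        Frame(3, 500, 0.05),
        Frame(4, 200, 0.30, corrupted=True),  # corrupted: dropped
        Frame(5, 250, 0.40),
    ]
    print(triage(frames, budget_bytes=1000))  # [3, 1]
```

With a 1000-byte budget the example keeps frames 3 and 1; frame 5 (250 bytes) no longer fits in the remaining 100 bytes. Because the loop keeps scanning after a frame does not fit, smaller frames can still use leftover bandwidth, but this greedy rule is not guaranteed to maximize total retained value (that is a knapsack problem), so the triage policy itself is a design choice within the lifecycle architecture.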
Critical areas of the lifecycle (with considerations at each stage) include:
a) Data Generation: Performing original processing at the sensor/instrument
b) Data Triage: Making choices at the collection point about which data to keep
c) Data Compression: Maximizing information throughput against available bandwidth
d) Data Transport: Improving resource efficiencies to enable moving the most data
e) Data Processing: Increasing computation availability
f) Data Archiving: Scaling the capture, management, and distribution of data
g) Visualization: Developing and applying analysis techniques that enable understanding of massive data through visualization
h) Data Analytics: Creating analytics services to integrate massive, distributed, heterogeneous data
Appendix B. ESTO/AIST Proposed Big Data Technology Thrust Areas
Technology Name | Data Lifecycle Area(s) | Definition
Big Data Architecture for Earth Science Remote Sensing | Cross-Cutting | Definition of a scalable big data lifecycle architecture for earth observing systems, identifying how Big Data can scale from onboard computing to data analysis to increase science yield.
Big Data Information Models and Semantics | Cross-Cutting | Advanced semantic technologies for defining, deriving, and integrating heterogeneous ontologies and information models as applied across the entire data lifecycle (onboard, ground-based operations, archives, analysis).
Onboard data science methods for data triage | Data Triage | Onboard data science methods for real-time event detection and planning.
Onboard data science methods for data reduction | Data Compression | Onboard data science methods for data reduction.
Massive Data Movement Technologies | Data Transport | Massive data movement technologies for ground-based networks, from operations through analysis.
Real-time ground-based data science methods | Data Processing | Real-time ground-based data science methods for data reduction and real-time event detection for massive data streams as part of the data lifecycle architecture.
Open source data processing frameworks | Data Processing | Open source data processing and workflow frameworks that can massively scale to computational infrastructures (HPC, public cloud, etc.), handling large data streams and products, including near-real-time constraints, as part of the data lifecycle architecture.
Reusable data science methodologies for missions and science | (1) Data Processing; (2) Data Analytics | Development of reusable data science methodologies for analysis of data on the ground as part of the data lifecycle architecture. This includes on-demand data analytics for massive data repositories.
Federated data access | Data Archives | Federation of data access from distributed repositories as part of the data lifecycle architecture, moving toward on-demand distributed data analytics.
Massive Data Distribution | Data Archives | Massive data distribution for large-scale repositories and archives, including methods for data reduction, computation, etc., as integrated, on-demand data analytics.
Intelligent search and mining | (1) Data Archives; (2) Data Analytics | Provide methods for intelligent search and mining of massive data. This may include integration of on-demand analytics to perform deep searches.
Visualization of massive data sets | Visualization | Visualization of massive data sets, including domain-driven data reduction methods.
On-demand distributed data analytics | Data Analytics | On-demand data analytics that can integrate data from archives, repositories, etc., applying data science methods (data reduction, fusion, feature detection, etc.) provided through a computational infrastructure.
Distributed data analytics | Data Analytics | Analysis of data across distributed archives to support Earth system science, including application of novel machine learning, statistical methods (e.g., data fusion), and other computational capabilities.
Uncertainty Quantification; Measurement Science | Data Analytics | Management of uncertainty in all phases of the data lifecycle and analysis as part of a measurement science strategy for data fusion and data science.
Open source data management/science frameworks | (1) Data Archives; (2) Data Analytics | Open source data management/science frameworks that can massively scale to handle and manage large data streams and products, including near-real-time constraints, as part of the data lifecycle architecture, for archiving and analytics as part of a big data cyberinfrastructure.
Computational Infrastructures | (1) Data Processing; (2) Data Analytics | Computational infrastructures to scale data analytics using HPC and public cloud. This includes on-demand massive HPC and storage for integration to drive analytics.
Appendix C. Mapping 2014 OCT TA-11 to Proposed ESTO/AIST Big Data Technology Thrust Areas
The 2014 NASA OCT TA-11 Roadmap team identified the following areas:
Flight Computing: Includes technologies to support greater computation and data management at the point of collection onboard. In some cases, data reduction at the point of collection through intelligent triage methods may be required. Flight computing technologies include ultra-reliable, radiation-hardened platforms, which, until recently, have been extremely costly and limited in performance.
Ground Computing: Includes exascale supercomputing and data storage, as well as quantum, cognitive, and other types of advanced computing, for Big Data analysis and high-fidelity physics-based simulations for Earth and space science and for aerospace research and engineering.
Science, Engineering, and Mission Data Lifecycle: As the data-intensive nature of NASA science and exploration missions increases, there is a growing need to consider the data lifecycle from the point of collection all the way to the application and use of the data.
Intelligent Data Understanding: Intelligent data understanding (IDU) refers to the capability to automatically mine and analyze datasets that are large, noisy, and of varying modalities (discrete, continuous, text, graph, etc.) in order to extract or discover information that can be used for further analysis or in decision making, on the ground or onboard. It is closely coupled to the capability to detect and respond to interesting events and/or to generate alerts.
Semantic Technologies: Technologies that enable data understanding, analysis, and automated consulting and operations.
Each of these areas has cross-cutting challenges that are relevant to AIST, as follows.
Flight Computing TA
Technology Name | Description | Applicable Technology Area
High Performance Flight Software | Enables onboard, high-performance autonomy and data processing. | Onboard data science methods for data reduction, real-time event detection, and planning
Ground Computing TA
Technology Name | Description | Applicable Technology Area
Exascale Supercomputer | Provides peak computational capability of 1 exaflop (10^18 floating-point operations per second) for exascale performance of NASA computations, with excellent energy efficiency and reliability, to support NASA's exponentially growing high-end computational needs. | Computational infrastructures to scale data analytics
Automated Exascale Software Development Toolset | Provides automated, exascale application performance monitoring, analysis, tuning, and scaling. | Computational infrastructures to scale data analytics
Exascale Supercomputing File System | Provides online data storage capacity of 1 exabyte, enabling data storage for exascale modeling and simulation (M&S) and data analysis, with sufficient performance and reliability to maintain productivity for a broad array of NASA applications. | Computational infrastructures to scale data analytics
Public Cloud Supercomputer | Provides additional resources for NASA supercomputer users, such as for mission-critical computing in an emergency. | Computational infrastructures to scale data analytics
Mission, Science and Engineering Data Lifecycle TA
Technology Name | Description | Applicable Technology Area
Reference Information System Architecture Frameworks | Provide reference architectures for the end-to-end science and engineering data lifecycle. | Definition of a scalable big data lifecycle architecture for earth observing systems, identifying how Big Data can scale from onboard computing to data analysis to increase science yield.
Distributed Information Architecture Frameworks | Provide reference information architectures to define data across the end-to-end engineering and science data lifecycle. | Advanced semantic technologies for auto-generating ontologies and information models from large, distributed, existing data.
Onboard Data Capture and Triage Methodologies | Apply novel machine learning capabilities onboard to support data reduction, model-based compression, and triage of massive data sets. | Onboard data science methods for data reduction, real-time event detection, and planning.
Real-time Data Triage and Data Reduction Methodologies | Apply novel machine learning capabilities in ground data processing systems to support data reduction and triage of massive data sets. | Real-time ground-based data science methods for data reduction and real-time event detection for massive data streams as part of the data lifecycle architecture.
Scalable Data Processing Frameworks | Provide scalable software processing frameworks for processing scientific and engineering data sets. | Open source data processing frameworks that can massively scale to handle large data streams and products, including near-real-time constraints, as part of the data lifecycle architecture.
Massive Engineering and Science Data Analysis Methodologies | Provide scalable methodologies for analysis of massive data. | Development of reusable data science methodologies for analysis of data on the ground as part of the data lifecycle architecture. This includes on-demand data analytics for massive data repositories.
Remote Data Access Framework | Provides access to and sharing of distributed data sources in a secure environment. | Federation of data access from distributed repositories as part of the data lifecycle architecture, moving toward on-demand distributed data analytics.
Massive Data Movement Services | Develop new technologies for the movement of massive, multi-petabyte data over the network. | Data movement (observations, climate model output, etc.) across the data lifecycle is critical to scaling for Big Data. Develop capabilities to improve movement of data between points of the data lifecycle, from networks to parallel methods for data transfer.
Large-Scale Data Dissemination Environments | Enable scaling of the data infrastructures (software, computation, networks, etc.) required to support large-scale data dissemination. | Massive data distribution for large-scale repositories and archives, including methods for data reduction, computation, etc., as integrated, on-demand data analytics.
Toolset for Massive Model Data | Makes data/information transparent, scalable, and usable when infusing multiple large and diverse datasets into complex models. | Scale tools that provide on-demand data analytics, bring together distributed models and observational data, and provide a computational infrastructure to enable analysis and intercomparison.
On Demand Data Analytics | Provides an analytics service coupled with computational infrastructures. | On-demand data analytics that can integrate data from archives, repositories, etc., applying data science methods (data reduction, fusion, feature detection, etc.) provided through a computational infrastructure.
Intelligent Data Understanding TA 2.1
Technology Name | Description | Applicable Technology Area
Intelligent Data Collection and Prioritization Toolset | Provides a means to reduce the size of the data (e.g., removing clouds), remove corrupted data and/or collect complementary data for value- [...] | Development of reusable data science methodologies for analysis of data on the ground as part of the data lifecycle architecture. [...]


More information

2015 Top 10 Information and Communication Technologies (Technical Insights)

2015 Top 10 Information and Communication Technologies (Technical Insights) 2015 Top 10 Information and Communication Technologies (Technical Insights) Information and Communication technologies that will have the highest impact in 2015 -TI May 2015 Contents Section Slide Number

More information

CYBERINFRASTRUCTURE FRAMEWORK $143,060,000 FOR 21 ST CENTURY SCIENCE, ENGINEERING, +$14,100,000 / 10.9% AND EDUCATION (CIF21)

CYBERINFRASTRUCTURE FRAMEWORK $143,060,000 FOR 21 ST CENTURY SCIENCE, ENGINEERING, +$14,100,000 / 10.9% AND EDUCATION (CIF21) CYBERINFRASTRUCTURE FRAMEWORK $143,060,000 FOR 21 ST CENTURY SCIENCE, ENGINEERING, +$14,100,000 / 10.9% AND EDUCATION (CIF21) Overview The Cyberinfrastructure Framework for 21 st Century Science, Engineering,

More information

COGNITIVE SCIENCE AND NEUROSCIENCE

COGNITIVE SCIENCE AND NEUROSCIENCE COGNITIVE SCIENCE AND NEUROSCIENCE Overview Cognitive Science and Neuroscience is a multi-year effort that includes NSF s participation in the Administration s Brain Research through Advancing Innovative

More information

Propulsion Gas Path Health Management Task Overview. Donald L. Simon NASA Glenn Research Center

Propulsion Gas Path Health Management Task Overview. Donald L. Simon NASA Glenn Research Center Propulsion Gas Path Health Management Task Overview Donald L. Simon NASA Glenn Research Center Propulsion Controls and s Research Workshop December 8-10, 2009 Cleveland, OH www.nasa.gov 1 National Aeronautics

More information

DYNAMIC INFRASTRUCTURE Helping build a smarter planet

DYNAMIC INFRASTRUCTURE Helping build a smarter planet John Sheehy Systems Architect 18 Feb 2009 Building a smarter planet with a dynamic infrastructure DYNAMIC INFRASTRUCTURE Helping build a smarter planet 1 2009 IBM Corporation The world is smaller and flatter.

More information

Role of Analytics in Infrastructure Management

Role of Analytics in Infrastructure Management Role of Analytics in Infrastructure Management Contents Overview...3 Consolidation versus Rationalization...5 Charting a Course for Gaining an Understanding...6 Visibility into Your Storage Infrastructure...7

More information

Strategic Vision. for Stewarding the Nation s Climate Data. Our. NOAA s National Climatic Data Center

Strategic Vision. for Stewarding the Nation s Climate Data. Our. NOAA s National Climatic Data Center Strategic Vision Our for Stewarding the Nation s Climate Data NOAA s National Climatic Data Center M AY 2013 Our Strategic Vision for Stewarding the Nation s Climate Data 2 FOREWORD From the Director The

More information

USGS Community for Data Integration

USGS Community for Data Integration Community of Science: Strategies for Coordinating Integration of Data USGS Community for Data Integration Kevin T. Gallagher USGS Core Science Systems January 11, 2013 U.S. Department of the Interior U.S.

More information

Government Technology Trends to Watch in 2014: Big Data

Government Technology Trends to Watch in 2014: Big Data Government Technology Trends to Watch in 2014: Big Data OVERVIEW The federal government manages a wide variety of civilian, defense and intelligence programs and services, which both produce and require

More information

White Paper. Version 1.2 May 2015 RAID Incorporated

White Paper. Version 1.2 May 2015 RAID Incorporated White Paper Version 1.2 May 2015 RAID Incorporated Introduction The abundance of Big Data, structured, partially-structured and unstructured massive datasets, which are too large to be processed effectively

More information

Integrating a Big Data Platform into Government:

Integrating a Big Data Platform into Government: Integrating a Big Data Platform into Government: Drive Better Decisions for Policy and Program Outcomes John Haddad, Senior Director Product Marketing, Informatica Digital Government Institute s Government

More information

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens

Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens Scalable End-User Access to Big Data http://www.optique-project.eu/ HELLENIC REPUBLIC National and Kapodistrian University of Athens 1 Optique: Improving the competitiveness of European industry For many

More information

The 5G Infrastructure Public-Private Partnership

The 5G Infrastructure Public-Private Partnership The 5G Infrastructure Public-Private Partnership NetFutures 2015 5G PPP Vision 25/03/2015 19/06/2015 1 5G new service capabilities User experience continuity in challenging situations such as high mobility

More information

How To Improve The Performance Of Anatm

How To Improve The Performance Of Anatm EXPLORATORY RESEARCH IN ATM David Bowen Chief ATM 4 th May 2015 1 ATM Research in Europe HORIZON Transport Challenges smart, green and integrated transport FlightPath 2050 five challenges to aviation beyond

More information

Considering the Way Forward for Data Science and International Climate Science

Considering the Way Forward for Data Science and International Climate Science Considering the Way Forward for Data Science and International Climate Science Improving Data Mobility and Management for International Climate Science July 14-16, 2014 Boulder, CO Sara J. Graves, Ph.D.

More information

Network Mission Assurance

Network Mission Assurance Network Mission Assurance Michael F. Junod, Patrick A. Muckelbauer, PhD, Todd C. Hughes, PhD, Julius M. Etzl, and James E. Denny Lockheed Martin Advanced Technology Laboratories Camden, NJ 08102 {mjunod,pmuckelb,thughes,jetzl,jdenny}@atl.lmco.com

More information

From Distributed Computing to Distributed Artificial Intelligence

From Distributed Computing to Distributed Artificial Intelligence From Distributed Computing to Distributed Artificial Intelligence Dr. Christos Filippidis, NCSR Demokritos Dr. George Giannakopoulos, NCSR Demokritos Big Data and the Fourth Paradigm The two dominant paradigms

More information

The National Consortium for Data Science (NCDS)

The National Consortium for Data Science (NCDS) The National Consortium for Data Science (NCDS) A Public-Private Partnership to Advance Data Science Ashok Krishnamurthy PhD Deputy Director, RENCI University of North Carolina, Chapel Hill What is NCDS?

More information

Mission Need Statement for the Next Generation High Performance Production Computing System Project (NERSC-8)

Mission Need Statement for the Next Generation High Performance Production Computing System Project (NERSC-8) Mission Need Statement for the Next Generation High Performance Production Computing System Project () (Non-major acquisition project) Office of Advanced Scientific Computing Research Office of Science

More information

A Vision for Research Excellence in Canada

A Vision for Research Excellence in Canada A Vision for Research Excellence in Canada Compute Canada s Submission to the Digital Research Infrastructure Strategy Consultations Contents A Vision for Research Excellence in Canada 3 Overview of Recommendations

More information

Big Data in Subsea Solutions

Big Data in Subsea Solutions Big Data in Subsea Solutions Subsea Valley Conference 2014 Telenor Arena, Fornebu, April 2-3 Roar Fjellheim, Computas AS Computas AS - Brief company profile Norwegian IT consulting company providing services

More information

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India.

Impact of Big Data in Oil & Gas Industry. Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. Impact of Big Data in Oil & Gas Industry Pranaya Sangvai Reliance Industries Limited 04 Feb 15, DEJ, Mumbai, India. New Age Information 2.92 billions Internet Users in 2014 Twitter processes 7 terabytes

More information

How Big Data is Different

How Big Data is Different FALL 2012 VOL.54 NO.1 Thomas H. Davenport, Paul Barth and Randy Bean How Big Data is Different Brought to you by Please note that gray areas reflect artwork that has been intentionally removed. The substantive

More information

MEETING THE CHALLENGES OF COMPLEXITY AND SCALE FOR MANUFACTURING WORKFLOWS

MEETING THE CHALLENGES OF COMPLEXITY AND SCALE FOR MANUFACTURING WORKFLOWS MEETING THE CHALLENGES OF COMPLEXITY AND SCALE FOR MANUFACTURING WORKFLOWS Michael Feldman White paper November 2014 MARKET DYNAMICS Modern manufacturing increasingly relies on advanced computing technologies

More information

Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics

Big Data Terminology - Key to Predictive Analytics Success. Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics Big Data Terminology - Key to Predictive Analytics Success Mark E. Johnson Department of Statistics University of Central Florida F2: Statistics Outline Big Data Phenomena Terminology Role Background on

More information

NIST Big Data Phase I Public Working Group

NIST Big Data Phase I Public Working Group NIST Big Data Phase I Public Working Group Reference Architecture Subgroup May 13 th, 2014 Presented by: Orit Levin Co-chair of the RA Subgroup Agenda Introduction: Why and How NIST Big Data Reference

More information

Software development for the on demand enterprise. Building your business with the IBM Software Development Platform

Software development for the on demand enterprise. Building your business with the IBM Software Development Platform Software development for the on demand enterprise Building your business with the IBM Software Development Platform An on demand business is an enterprise whose business processes integrated end-to-end

More information

Ad-Hoc Task Force on Big Data NAC Science Committee

Ad-Hoc Task Force on Big Data NAC Science Committee Ad-Hoc Task Force on Big Data NAC Science Committee Dr. Erin Smith Executive Secretary, Big Data Task Force Strategic Integration and Management Division Science Mission Directorate Task Force Membership

More information

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V

ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V ATA DRIVEN GLOBAL VISION CLOUD PLATFORM STRATEG N POWERFUL RELEVANT PERFORMANCE SOLUTION CLO IRTUAL BIG DATA SOLUTION ROI FLEXIBLE DATA DRIVEN V WHITE PAPER Create the Data Center of the Future Accelerate

More information

Big Data, Integration and Governance: Ask the Experts

Big Data, Integration and Governance: Ask the Experts Big, Integration and Governance: Ask the Experts January 29, 2013 1 The fourth dimension of Big : Veracity handling data in doubt Volume Velocity Variety Veracity* at Rest Terabytes to exabytes of existing

More information

ANALYTICS STRATEGY: creating a roadmap for success

ANALYTICS STRATEGY: creating a roadmap for success ANALYTICS STRATEGY: creating a roadmap for success Companies in the capital and commodity markets are looking at analytics for opportunities to improve revenue and cost savings. Yet, many firms are struggling

More information

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management

PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management PALANTIR CYBER An End-to-End Cyber Intelligence Platform for Analysis & Knowledge Management INTRODUCTION Traditional perimeter defense solutions fail against sophisticated adversaries who target their

More information

From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems

From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems From Data to Insight: Big Data and Analytics for Smart Manufacturing Systems Dr. Sudarsan Rachuri Program Manager Smart Manufacturing Systems Design and Analysis Systems Integration Division Engineering

More information

Symposium on the Interagency Strategic Plan for Big Data: Focus on R&D

Symposium on the Interagency Strategic Plan for Big Data: Focus on R&D Symposium on the Interagency Strategic Plan for Big Data: Focus on R&D NAS Board on Research Data and Information October 23, 2014 Big Data Senior Steering Group (BDSSG) Allen Dearry, NIH, Co-Chair Suzi

More information

MANAGING USER DATA IN A DIGITAL WORLD

MANAGING USER DATA IN A DIGITAL WORLD MANAGING USER DATA IN A DIGITAL WORLD AIRLINE INDUSTRY CHALLENGES AND SOLUTIONS WHITE PAPER OVERVIEW AND DRIVERS In today's digital economy, enterprises are exploring ways to differentiate themselves from

More information

Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery

Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery Harnessing the Potential of Data Scientists and Big Data for Scientific Discovery Ed Lazowska, University of Washington Saul Perlmu=er, UC Berkeley Yann LeCun, New York University Josh Greenberg, Alfred

More information

NITRD and Big Data. George O. Strawn NITRD

NITRD and Big Data. George O. Strawn NITRD NITRD and Big Data George O. Strawn NITRD Caveat auditor The opinions expressed in this talk are those of the speaker, not the U.S. government Outline What is Big Data? Who is NITRD? NITRD's Big Data Research

More information

Igniting the Next Industrial Revolution

Igniting the Next Industrial Revolution Igniting the Next Industrial Revolution Defining an M2M Technology Platform for the Industrial Internet M2M Evolution Conference, 30 Jan 2014 Nikhil Chauhan Director Product Marketing, GE Software Sufficiently

More information

Data Isn't Everything

Data Isn't Everything June 17, 2015 Innovate Forward Data Isn't Everything The Challenges of Big Data, Advanced Analytics, and Advance Computation Devices for Transportation Agencies. Using Data to Support Mission, Administration,

More information

Big Data Analytics. Chances and Challenges. Volker Markl

Big Data Analytics. Chances and Challenges. Volker Markl Volker Markl Professor and Chair Database Systems and Information Management (DIMA), Technische Universität Berlin www.dima.tu-berlin.de Big Data Analytics Chances and Challenges Volker Markl DIMA BDOD

More information

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON

BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing

More information

Statistical Analysis and Visualization for Cyber Security

Statistical Analysis and Visualization for Cyber Security Statistical Analysis and Visualization for Cyber Security Joanne Wendelberger, Scott Vander Wiel Statistical Sciences Group, CCS-6 Los Alamos National Laboratory Quality and Productivity Research Conference

More information

What s the Big Deal? Big Data, Cloud & the Internet of Things. Christine Kirkpatrick San Diego Supercomputer Center, UC San Diego

What s the Big Deal? Big Data, Cloud & the Internet of Things. Christine Kirkpatrick San Diego Supercomputer Center, UC San Diego What s the Big Deal? Big Data, Cloud & the Internet of Things Christine Kirkpatrick San Diego Supercomputer Center, UC San Diego A Futurist s Near-Term View The Future Depends on Data Self-driving car

More information

ESA Earth Observation Big Data R&D Past, Present, & Future Activities

ESA Earth Observation Big Data R&D Past, Present, & Future Activities ESA Earth Observation Big Data R&D Past, Present, & Future Activities [Sveinung.Loekken, Jordi.Farres]@esa.int Ground Segment and Mission Operations Department, Earth Observation Programmes Directorate,

More information

Model, Analyze and Optimize the Supply Chain

Model, Analyze and Optimize the Supply Chain Model, Analyze and Optimize the Supply Chain Optimize networks Improve product flow Right-size inventory Simulate service Balance production Optimize routes The Leading Supply Chain Design and Analysis

More information

T a c k l i ng Big Data w i th High-Performance

T a c k l i ng Big Data w i th High-Performance Worldwide Headquarters: 211 North Union Street, Suite 105, Alexandria, VA 22314, USA P.571.296.8060 F.508.988.7881 www.idc-gi.com T a c k l i ng Big Data w i th High-Performance Computing W H I T E P A

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.1 Background The command over cloud computing infrastructure is increasing with the growing demands of IT infrastructure during the changed business scenario of the 21 st Century.

More information

Collaborations between Official Statistics and Academia in the Era of Big Data

Collaborations between Official Statistics and Academia in the Era of Big Data Collaborations between Official Statistics and Academia in the Era of Big Data World Statistics Day October 20-21, 2015 Budapest Vijay Nair University of Michigan Past-President of ISI vnn@umich.edu What

More information

Data-Intensive Science and Scientific Data Infrastructure

Data-Intensive Science and Scientific Data Infrastructure Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific

More information

High Performance Computing

High Performance Computing High Parallel Computing Hybrid Program Coding Heterogeneous Program Coding Heterogeneous Parallel Coding Hybrid Parallel Coding High Performance Computing Highly Proficient Coding Highly Parallelized Code

More information

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out

Big Data Challenges and Success Factors. Deloitte Analytics Your data, inside out Big Data Challenges and Success Factors Deloitte Analytics Your data, inside out Big Data refers to the set of problems and subsequent technologies developed to solve them that are hard or expensive to

More information