From Distributed Computing to Distributed Artificial Intelligence


Dr. Christos Filippidis, NCSR Demokritos
Dr. George Giannakopoulos, NCSR Demokritos

Big Data and the Fourth Paradigm
- The two dominant paradigms for scientific discovery: theory and experiments
- Large-scale computer simulations emerged as the third paradigm in the 20th century
- The fourth paradigm, which seeks to exploit information buried in massive datasets, has emerged as an essential complement to the three existing paradigms
- The complexity and challenge of the fourth paradigm arise from the increasing rate, heterogeneity, and volume of data generation:
  - The Large Hadron Collider (LHC) currently generates tens of petabytes of reduced data per year
  - Observational and simulation data in the climate domain are expected to reach exabytes by 2021
  - Light source experiments are expected to generate hundreds of terabytes per day

LHC Data Challenge
- Starting from this event (particle collision): data collection, data storage, data processing
- You are looking for this signature
- Selectivity: 1 in 10^13
- Like looking for 1 person in a thousand world populations!
- Or for a needle in 20 million haystacks!
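The "thousand world populations" comparison can be checked with one division, reading the selectivity as 1 in 10^13. The sketch below is only a sanity check; the world population figure is an assumption (a rough mid-2010s estimate) and does not come from the slides.

```python
# Assumption: world population of roughly 7.3e9 (mid-2010s estimate, not from the slides).
WORLD_POPULATION = 7.3e9
SELECTIVITY = 1e13  # one interesting event per 10^13 collisions (from the slide)


def world_populations_per_signal(selectivity: float, population: float) -> float:
    """Number of world populations you would have to search to find one person."""
    return selectivity / population


ratio = world_populations_per_signal(SELECTIVITY, WORLD_POPULATION)
print(f"1 in {SELECTIVITY:.0e} is about 1 person in {ratio:,.0f} world populations")
```

Under that assumption the ratio lands between one and two thousand, which matches the order of magnitude quoted on the slide.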

Amount of data from the LHC detectors
- Detectors shown: CMS, ATLAS, LHCb
- ~15 petabytes / year
- ~10^10 events / year
- ~10^3 batch and interactive users
- ~20,000,000 CDs / year
- Height comparison: CD stack with 1 year of LHC data (~20 km), balloon (30 km), Concorde (15 km), Mt. Blanc (4.8 km)
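The CD-stack comparison is simple arithmetic and can be reproduced roughly as follows. This is a back-of-the-envelope sketch: the 700 MB capacity and 1.2 mm thickness per disc are assumptions, not figures from the slides, and the function name is purely illustrative.

```python
DATA_PER_YEAR_BYTES = 15e15  # ~15 PB per year, as quoted on the slide
CD_CAPACITY_BYTES = 700e6    # assumption: 700 MB per CD
CD_THICKNESS_M = 1.2e-3      # assumption: 1.2 mm per bare disc, no cases


def cd_stack(data_bytes: float,
             cd_capacity: float = CD_CAPACITY_BYTES,
             thickness_m: float = CD_THICKNESS_M) -> tuple:
    """Return (number of CDs, stack height in km) needed to hold data_bytes."""
    n_cds = data_bytes / cd_capacity
    height_km = n_cds * thickness_m / 1000.0
    return n_cds, height_km


n_cds, height_km = cd_stack(DATA_PER_YEAR_BYTES)
print(f"{n_cds / 1e6:.1f} million CDs, stack about {height_km:.0f} km high")
```

With these assumptions the result is a little over 20 million discs and a stack in the 20 to 30 km range, the same order of magnitude as the slide's figures.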

Grid / Cloud Technologies

Definition of Grid systems
- Collection of geographically distributed heterogeneous resources
- Most generalized, globalized form of distributed computing
- "An infrastructure that enables flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources" (Ian Foster and Carl Kesselman)

Information about sites: http://goc.grid.sinica.edu.tw/gstat/

Exascale Challenges
- Current petascale systems are unlikely to scale to exascale environments, due to the disparity among computational power, machine memory and I/O bandwidth
- Exascale simulations will not be able to write enough data out to permanent storage to ensure a reliable analysis
- Current Grid infrastructures are not user friendly and are far from efficient for small groups and individuals
- Grid infrastructures, when implemented by HEP VOs, tend to be centralized from the data point of view
- Users demand mobility, efficient data sharing and, at the same time, autonomy

IKAROS Platform
[Architecture diagram. Labels: Data/Metadata-Collector; Ikaros-EG plugin (job creation); content provider + mobile devices; mobile-grid + Wi-Fi, 3G; android.apk clients]

Elastic Transfer (et)
- Create your Personal Storage Cloud
- Directly transfer your files from your workstation to another PC
- Third-party data transfer
- Flexible data & storage sharing
- You are on the road, behind fifteen firewalls, and want to share some web application you're developing locally, or just share a set of files with someone real quick (Reverse HTTP)
- http://www.et-js.org/

Nice! So, now can I...
- Discover whether corruption in politics is a location-based issue?
- Check what is the best route to a house by the sea, with low rent?
- Find the ideal husband/wife?
- Determine how to improve my economy, relying on agriculture?

Well, you kind of can... if you
- can read through petabytes of information
- can determine what is useful and what is not
- contact 30 different organizations hosting the data
- have experts combining the data
- visualize them in a meaningful way
I hope you got the point by now...

So, did we fail?

Bits and pieces
- If you had individual people producing simple statements, decipherable by machines:
  - People need food: <people, need, food>
  - Souvlaki is food: <souvlaki, is, food>
  - Souvlaki contains meat: <souvlaki, contains, meat>
- Could computers combine knowledge to be intelligent?
  - <?, need, meat>: Who needs meat?
  - <souvlaki, contains, ?>: What do I need to make a souvlaki?
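The statements and questions above can be mimicked with a tiny in-memory triple store. This is a minimal sketch, not how any particular RDF engine works; the `match` helper and the use of `None` for the "?" wildcard are illustrative choices.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

# The three statements from the slide, as <subject, predicate, object> triples.
FACTS: List[Triple] = [
    ("people", "need", "food"),
    ("souvlaki", "is", "food"),
    ("souvlaki", "contains", "meat"),
]


def match(pattern: Pattern, facts: Iterable[Triple]) -> List[Triple]:
    """Return every fact that agrees with the pattern; None plays the role of '?'."""
    return [
        fact for fact in facts
        if all(p is None or p == f for p, f in zip(pattern, fact))
    ]


# <souvlaki, contains, ?>: answered directly by one stored statement.
ingredients = [obj for _, _, obj in match(("souvlaki", "contains", None), FACTS)]
# <?, need, meat>: no single statement says this, so plain matching finds nothing.
who_needs_meat = [subj for subj, _, _ in match((None, "need", "meat"), FACTS)]

print(ingredients)     # ['meat']
print(who_needs_meat)  # []
```

The empty second answer is the interesting part: "Who needs meat?" cannot be read off any one statement, so answering it would take reasoning that combines several of them.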

Distributed Artificial Intelligence to the rescue! You start with something like this RDF graph:

You end up with something like...

How does it work?
- You use MACHINES (agents will do fine...)!
- You query LOTS of resources...
- With BILLIONS of small statements
- You REASON upon them
- You provide answers in realistic time
- You visualize the results

Challenges
- Data providers speak different languages
- Data providers can go offline
- Even knowing who to ask is a problem
- Responding in time can be challenging
- The (data) world changes

SemaGrow: Distributed, Heterogeneous, Semantic Query Processing
- Distributed queries over SPARQL endpoints
- On-the-fly mapping across data provider languages
- Adaptive to problematic data providers
- Allows complex queries
- Support for streaming data (sensors!)
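The general idea of querying several providers, translating between their vocabularies and tolerating providers that are offline can be illustrated with a toy simulation. This sketch is not SemaGrow's implementation or API: the `Endpoint` class, `federated_query` function, provider names and data are all hypothetical, and in-memory lists stand in for remote SPARQL endpoints.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


@dataclass
class Endpoint:
    """Hypothetical in-memory stand-in for a remote data provider."""
    name: str
    facts: List[Triple]
    # Maps shared predicate names to this provider's own vocabulary.
    to_local: Dict[str, str] = field(default_factory=dict)
    online: bool = True

    def query(self, pattern: Pattern) -> List[Triple]:
        if not self.online:
            raise ConnectionError(f"{self.name} is offline")
        return [f for f in self.facts
                if all(p is None or p == v for p, v in zip(pattern, f))]


def federated_query(pattern: Pattern, endpoints: List[Endpoint]):
    """Ask every endpoint, mapping predicates on the fly and skipping failures."""
    subj, pred, obj = pattern
    results: List[Triple] = []
    skipped: List[str] = []
    for ep in endpoints:
        local_pred = ep.to_local.get(pred, pred) if pred is not None else None
        to_shared = {local: shared for shared, local in ep.to_local.items()}
        try:
            rows = ep.query((subj, local_pred, obj))
        except ConnectionError:
            skipped.append(ep.name)  # provider is down: carry on with the rest
            continue
        # Translate predicates back into the shared vocabulary.
        results.extend((s, to_shared.get(p, p), o) for s, p, o in rows)
    return results, skipped


# Illustrative providers: one uses a different predicate name, one is offline.
endpoints = [
    Endpoint("food-db", [("souvlaki", "contains", "meat")]),
    Endpoint("recipes-gr", [("souvlaki", "periechei", "pita")],
             to_local={"contains": "periechei"}),
    Endpoint("flaky-db", [("souvlaki", "contains", "onion")], online=False),
]

answers, skipped = federated_query(("souvlaki", "contains", None), endpoints)
print(answers)
print(skipped)
```

A real federated engine would also have to decide which providers are worth asking and bound the response time, which this sketch leaves out.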

Summary
- Distributed computing allows:
  - Generating amazing amounts of data
  - Handling amazing amounts of data
  - Computational availability and fail-over
  - On-demand computation power
  - Security
- Distributed artificial intelligence allows:
  - Asking complex questions over data
  - Combining data
  - Generating knowledge
  - Exploiting knowledge

Thank you!