On Establishing Big Data Breakwaters
|
|
|
- Barbra Harrison
- 10 years ago
- Views:
Transcription
1 On Establishing Big Data Breakwaters with Analytics Dr. - Ing. Morris Riedel Head of Research Group High Productivity Data Processing, Juelich Supercomputing Centre, Germany Adjunct Associated Professor, University of Iceland, Iceland Co-chair Research Data Alliance, Big Data Analytics Interest Group
2 Big Data Waves Surfboards Breakwaters How to engage in the rising tide of big data? Unsolved Questions: Scale Heterogeneity Stewardship Curation Long-Term Access and Storage [1] HLEG Report [2] KE Report Research Challenges: Collection, Trust, Usability Interoperability, Diversity Security, Smart Analytics, Education and training Data publication and access Commercial exploitation New social paradigms Preservation and sustainability Concrete Next Steps
3 Looking into Big Data Waves Volume Variety Structured Data Automobiles Simulations Mobile Locationbased Data Machine Data IMHO, it s great! Velocity Click Stream Image-based analysis large instruments Text Data Context Social Network RFID Smart Meter Customer Data Point of Sale
4 Understanding Big Data Waves Big Data Streams with high velocity Web Actions n optional filters Data stream with data hidden in logs/events Data Sources Measurement Devices Data stream with measured data Computational Simulations Data stream with data of HPC/HTC simulation Data Sinks
5 Crowdsourcing increases # of Big Data Streams Usual Citizens / Citizen Scientist Data streams with data (low trust) Individuals with domain as Hobby Exabytes Data streams with data (moderate trust) Scientific/Engineering Domain Experts Data streams with data (high trust) Petabytes Terabytes
6 Alpha Magnetic Spectrometer ISS What are building blocks of the Universe? Search for 20 TB Cosmic ~400 Antimatter JUROPA++ TB 1-2 Pflop/s + Booster JSC is main facility for AMS computing in Germany Selected Juelich Supercomputing Centre (JSC) Research Examples Simulation Labs (e.g. Climate SimLab) one example out of many JSC SimLabs ~10 years 500 HDF Files ~50 TB 50 TB ~20 TB Towards Exascale Applications with combined characteristics of simulations and analytics
7 The next generation radio telescope for science pushing the limits of the observable universe out by billions of galaxies The square kilometre array 1 PB in 20 seconds LOFAR test site Jülich
8 Large-scale Computational Massively Parallel Applications simulate Reality Better Prediction Accuracy means Bigger Data TOP 500 HPC Systems 11/2013 We are unable to store the output data of all computational simulations/users [3] F. Berman
9 Making use of Big Data is important but How? Netflix Prize Challenge ~2009 Challenge: Improve movie recommender Prize: 1 M $ for team with 10% improvements Open historical data to train a new system Traditional Machine Learning Algorithms used Prize received with Artificial Neural Networks It becomes a strategic differentiator in the market H1N1 Virus Made Headlines ~2009 Nature paper from Google employees Explains how Google is able to predict flus Not only national scale, but down to regions Possible via logged big data - search queries [4] Jeremy Ginsburg et al., Detecting influenza epidemics using search engine query data, Nature 457 (2009) ~1998-today
10 Smart Analytics are Needed to Take Advantage of Big Data The challenge is to understand which analytics make sense Terms & Motivation Understanding climate change, finding alternative energy sources, and preserving the health of an ageing population are all cross-disciplinary problems that require high-performance data storage, smart analytics, transmission and mining to solve. [1] HLEG Report In the data-intensive scientific world, new skills are needed for creating, handling, manipulating, analysing, and making available large amounts of data for re-use by others. [2] KE Report Integration of data analytics with exascale simulations represents a new kind of workflow that will impact both data-intensive science and exascale computing. [5] DOE ASCAC Report
11 Big Data Analytics Interest Group Can we establish a neutral group gathering clear evidence which analytics make sense in Research? Develops community based recommendations on feasible data analytics approaches to address community needs of utilizing large quantities of data Analyzes different scientific research domain applications and their potential use of various big data analytics techniques A systematic classification of feasible combinations of analysis algorithms, analytical tools, data and resource characteristics and scientific queries will be covered in these recommendations
12 Big Data Analytics Interest Group Agricultural Data Interoperability IG Big Data Analytics IG Brokering IG Certification of Digital Repositories IG Community Capability Model WG Data Citation WG Data Foundation and Terminology WG Data in Context IG Data Type Registries WG Defining Urban Data Exchange for Science IG Digital Practices in History and Ethnography IG Engagement Group IG Legal Interoperability IG Long tail of research data IG Marine Data Harmonization IG Metadata IG Metadata Standards Directory WG PID Information Types WG Practical Policy WG Preservation e-infrastructure IG Publishing Data IG Standardization of Data Categories and Codes IG Structural Biology IG Toxicogenomics Interoperability IG UPC Code for Data IG Wheat Data Interoperability WG [6] Research Data Alliance Big Data Analytics IG Web page
13 Big Data Analytics Interest Group New Technology Impact As Challenge for Users NoSQL Databases Example Selected Features Simplicity of design and deployment Horizontal scaling Less constrained consistency models Finer control over availability Simple retrieval and appending... Types Key-Value-based (e.g. Cassandra) Column-based (e.g. Apache Hbase) Document-based (e.g. MongoDB) Graph-based (e.g. Neo4J) String-based Key-Value Stores
14 Big Data Analytics Interest Group New Technology Impact As Challenge for Users Map-Reduce Paradigm vs. Traditional HPC What is the right technique out of many? [7] Computer Vision using Map-Reduce
15 Big Data Analytics Interest Group Selected Data Analytics Techniques Classification? Clustering Regression Groups of data exist New data classified to existing groups No groups of data exist Create groups from data close to each other Identify a line with a certain slope describing the data
16 Big Data Analytics Interest Group Big Data Analytics Techniques require their Parallelization [9] Geoffrey Fox et al. [10] Gesmundo et al. [11] Zhanquan Sun et al. Parallel K-Means Clustering Algorithms Parallel Perceptron Using Map-Reduce Parallel Support Vector Machines
17 Methods & Tools Sampling vs. Big Data Parallelization! Applied Statistics Data Mining Machine Scientific Computing Learning Algorithms Computational Scientist Software Engineer Engineer Data Miner Statistician Data Scientist new DBs Training Data Scientists Big Data Analytics Reference Material for Data Scientists Insights Statistical Data Mining Course HPC B(ig Data) Course Data Scientists with skills of various fields
18 Concrete Use Case Underutilized Unique Big Data Big Data not always means Big Volume, but also Context (uniqueness) In-flight Measurements ( Context ) MOZAIC 1994 today (~20 years): Ozone, water vepor, CO, NOy, + p, T, Wind IAGOS 2011 today ( better data ): + Cloud droplets + CO 2, CH 4, Aerosole Challenge: Large underutilized data collections exist Approach: Initial general data analytics techniques Domain-specific data analysis by experts Big Data Analytics
19 Concrete Use Case Automating Quality Control But Bigger Data not always means Better data Challenge: Large data collection exist with many measurements Approach: Initial general data analytics techniques E.g. automated outlier detection for quality control (Previously only manual sampling, but better?) Domain-specific focussed data validation by experts Big Data Analytics With thanks to Robert Huber, Marum, Bremen
20 ScienceTube Long-term Data Preservation and Curation bears potentials to lower Data Waves and supports big data analytics process [8] ScienceTube Search? References to data? Sharing? Metadata? Trust? Open Data? We need to dive into data Selected Benefits of open data infrastructures for science & engineering: High reliability, so data scientists can count on its availability Open deposit, allowing user-community centres to store data easily Persistent identification, allowing data centres to register a huge amount of markers to track the origins and characteristics of the information Metadata support to allow effective management, use and understanding Avoids re-creation of datasets through easy data lookups and re-use Enables easier identification of duplicates to remove them & save storage
21 Towards Exascale: Applications with combined characteristics of simulations & analytics In-Situ Analytics correlations visual analytics e.g. map-reduce jobs, R-MPI In-situ correlations & data reduction e.g. clustering, classification In-situ statistical data mining exascale application scientific visualization & beyond steering key-value pair DB analytics part visualization part interactive scalable I/O distributed archive computational simulation part in-memory Exascale computer with access to exascale storage/archives Inspired by a [5] DOE ASCAC report
22 References [1] High Level Expert Group on Scientific Data, Riding the Wave,Report to the European Commission, October 2010, Online: [2] Knowledge Exchange Partner, A Surfboard for Riding the Wave, November 2012, Online: [3] Fran Berman, Maximising the Potential of Research Data, January 2013, Online: [4] Jeremy Ginsburg et al., Detecting influenza epidemics using search engine query data, Nature 457 (2009), Online: [5] DOE ASCAC Data Subcommittee Report, Synergistic Challenges in Data-Intensive Science and Exascale Computing, 2013, Online: [6] Research Data Alliance (RDA), Big Data Analytics Web Page, Online: [7] Brandyn White et al., Web-Scale Computer Vision using MapReduce for Multimedia Data Mining, Online: [8] M. Riedel and P. Wittenburg et al. A Data Infrastructure Reference Model with Applications: Towards Realization of a ScienceTube Vision with a Data Replication Service, 2013, Online: [9] G. Fox, MPI and Map-Reduce, Talk at CCGSC 2010 Flat Rock, NC, 2010, Online: [10] Gesmundo et al., Hadoop Perceptron, 2012, Online: [11] Zhanquan Sun et al., Study on Parallel SVM Based on MapReduce, Online:
23 Thanks Talk available at: Contact:
Understanding Big Data Analytics Applications in Earth Science Morris Riedel, Rahul Ramachandran/Kuo Kwo-Sen, Peter Baumann Big Data Analytics
Understanding Big Data Applications in Earth Science Morris Riedel, Rahul Ramachandran/Kuo Kwo-Sen, Peter Baumann Big Data Interest Group Co Chairs are Needed in Big Data-driven Scientific Research The
European Data Infrastructure - EUDAT Data Services & Tools
European Data Infrastructure - EUDAT Data Services & Tools Dr. Ing. Morris Riedel Research Group Leader, Juelich Supercomputing Centre Adjunct Associated Professor, University of iceland BDEC2015, 2015-01-28
High Productivity Data Processing Analytics Methods with Applications
High Productivity Data Processing Analytics Methods with Applications Dr. Ing. Morris Riedel et al. Adjunct Associate Professor School of Engineering and Natural Sciences, University of Iceland Research
Classification Techniques in Remote Sensing Research using Smart Data Analytics
Classification Techniques in Remote Sensing Research using Smart Data Analytics Federated Systems and Data Division Research Group High Productivity Data Processing Morris Riedel Juelich Supercomputing
COMP9321 Web Application Engineering
COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
Big Data and Analytics: Challenges and Opportunities
Big Data and Analytics: Challenges and Opportunities Dr. Amin Beheshti Lecturer and Senior Research Associate University of New South Wales, Australia (Service Oriented Computing Group, CSE) Talk: Sharif
How To Handle Big Data With A Data Scientist
III Big Data Technologies Today, new technologies make it possible to realize value from Big Data. Big data technologies can replace highly customized, expensive legacy systems with a standard solution
Data-intensive HPC: opportunities and challenges. Patrick Valduriez
Data-intensive HPC: opportunities and challenges Patrick Valduriez Big Data Landscape Multi-$billion market! Big data = Hadoop = MapReduce? No one-size-fits-all solution: SQL, NoSQL, MapReduce, No standard,
Scalable Developments for Big Data Analytics in Remote Sensing
Scalable Developments for Big Data Analytics in Remote Sensing Federated Systems and Data Division Research Group High Productivity Data Processing Dr.-Ing. Morris Riedel et al. Research Group Leader,
NoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
Transforming the Telecoms Business using Big Data and Analytics
Transforming the Telecoms Business using Big Data and Analytics Event: ICT Forum for HR Professionals Venue: Meikles Hotel, Harare, Zimbabwe Date: 19 th 21 st August 2015 AFRALTI 1 Objectives Describe
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics
Surfing the Data Tsunami: A New Paradigm for Big Data Processing and Analytics Dr. Liangxiu Han Future Networks and Distributed Systems Group (FUNDS) School of Computing, Mathematics and Digital Technology,
International Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
ESS event: Big Data in Official Statistics. Antonino Virgillito, Istat
ESS event: Big Data in Official Statistics Antonino Virgillito, Istat v erbi v is 1 About me Head of Unit Web and BI Technologies, IT Directorate of Istat Project manager and technical coordinator of Web
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料
Big Data: Opportunities & Challenges, Myths & Truths 資 料 來 源 : 台 大 廖 世 偉 教 授 課 程 資 料 美 國 13 歲 學 生 用 Big Data 找 出 霸 淩 熱 點 Puri 架 設 網 站 Bullyvention, 藉 由 分 析 Twitter 上 找 出 提 到 跟 霸 凌 相 關 的 詞, 搭 配 地 理 位 置
BIG DATA & ANALYTICS. Transforming the business and driving revenue through big data and analytics
BIG DATA & ANALYTICS Transforming the business and driving revenue through big data and analytics Collection, storage and extraction of business value from data generated from a variety of sources are
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON
BIG DATA IN THE CLOUD : CHALLENGES AND OPPORTUNITIES MARY- JANE SULE & PROF. MAOZHEN LI BRUNEL UNIVERSITY, LONDON Overview * Introduction * Multiple faces of Big Data * Challenges of Big Data * Cloud Computing
Big Data and Data Science: Behind the Buzz Words
Big Data and Data Science: Behind the Buzz Words Peggy Brinkmann, FCAS, MAAA Actuary Milliman, Inc. April 1, 2014 Contents Big data: from hype to value Deconstructing data science Managing big data Analyzing
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS
BIG DATA: STORAGE, ANALYSIS AND IMPACT GEDIMINAS ŽYLIUS WHAT IS BIG DATA? describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information
International Journal of Advanced Engineering Research and Applications (IJAERA) ISSN: 2454-2377 Vol. 1, Issue 6, October 2015. Big Data and Hadoop
ISSN: 2454-2377, October 2015 Big Data and Hadoop Simmi Bagga 1 Satinder Kaur 2 1 Assistant Professor, Sant Hira Dass Kanya MahaVidyalaya, Kala Sanghian, Distt Kpt. INDIA E-mail: [email protected]
Challenges for Data Driven Systems
Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2
Why is Internal Audit so Hard?
Why is Internal Audit so Hard? 2 2014 Why is Internal Audit so Hard? 3 2014 Why is Internal Audit so Hard? Waste Abuse Fraud 4 2014 Waves of Change 1 st Wave Personal Computers Electronic Spreadsheets
International Journal of Innovative Research in Information Security (IJIRIS) ISSN: 2349-7017(O) Volume 1 Issue 3 (September 2014)
SURVEY ON BIG DATA PROCESSING USING HADOOP, MAP REDUCE N.Alamelu Menaka * Department of Computer Applications Dr.Jabasheela Department of Computer Applications Abstract-We are in the age of big data which
IBM Solution Framework for Lifecycle Management of Research Data. 2008 IBM Corporation
IBM Solution Framework for Lifecycle Management of Research Data Aspects of Lifecycle Management Research Utilization of research paper Usage history Metadata enrichment Usage Pattern / Citation Collaboration
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing. University of Florida, CISE Department Prof.
CIS 4930/6930 Spring 2014 Introduction to Data Science Data Intensive Computing University of Florida, CISE Department Prof. Daisy Zhe Wang Data Science Overview Why, What, How, Who Outline Why Data Science?
Outline. High Performance Computing (HPC) Big Data meets HPC. Case Studies: Some facts about Big Data Technologies HPC and Big Data converging
Outline High Performance Computing (HPC) Towards exascale computing: a brief history Challenges in the exascale era Big Data meets HPC Some facts about Big Data Technologies HPC and Big Data converging
Big Data and Analytics (Fall 2015)
Big Data and Analytics (Fall 2015) Core/Elective: MS CS Elective MS SPM Elective Instructor: Dr. Tariq MAHMOOD Credit Hours: 3 Pre-requisite: All Core CS Courses (Knowledge of Data Mining is a Plus) Every
DATA EXPERTS MINE ANALYZE VISUALIZE. We accelerate research and transform data to help you create actionable insights
DATA EXPERTS We accelerate research and transform data to help you create actionable insights WE MINE WE ANALYZE WE VISUALIZE Domains Data Mining Mining longitudinal and linked datasets from web and other
Data Requirements from NERSC Requirements Reviews
Data Requirements from NERSC Requirements Reviews Richard Gerber and Katherine Yelick Lawrence Berkeley National Laboratory Summary Department of Energy Scientists represented by the NERSC user community
Big Data Analytics. Tools and Techniques
Big Data Analytics Basic concepts of analyzing very large amounts of data Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research
Big Data a threat or a chance?
Big Data a threat or a chance? Helwig Hauser University of Bergen, Dept. of Informatics Big Data What is Big Data? well, lots of data, right? we come back to this in a moment. certainly, a buzz-word but
HPC technology and future architecture
HPC technology and future architecture Visual Analysis for Extremely Large-Scale Scientific Computing KGT2 Internal Meeting INRIA France Benoit Lange [email protected] Toàn Nguyên [email protected]
Research on Clustering Analysis of Big Data Yuan Yuanming 1, 2, a, Wu Chanle 1, 2
Advanced Engineering Forum Vols. 6-7 (2012) pp 82-87 Online: 2012-09-26 (2012) Trans Tech Publications, Switzerland doi:10.4028/www.scientific.net/aef.6-7.82 Research on Clustering Analysis of Big Data
Advanced Big Data Analytics with R and Hadoop
REVOLUTION ANALYTICS WHITE PAPER Advanced Big Data Analytics with R and Hadoop 'Big Data' Analytics as a Competitive Advantage Big Analytics delivers competitive advantage in two ways compared to the traditional
NextGen Infrastructure for Big DATA Analytics.
NextGen Infrastructure for Big DATA Analytics. So What is Big Data? Data that exceeds the processing capacity of conven4onal database systems. The data is too big, moves too fast, or doesn t fit the structures
Chapter 7. Using Hadoop Cluster and MapReduce
Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research &
BIG DATA Alignment of Supply & Demand Nuria de Lama Representative of Atos Research & Innovation 04-08-2011 to the EC 8 th February, Luxembourg Your Atos business Research technologists. and Innovation
OpenAIRE Research Data Management Briefing paper
OpenAIRE Research Data Management Briefing paper Understanding Research Data Management February 2016 H2020-EINFRA-2014-1 Topic: e-infrastructure for Open Access Research & Innovation action Grant Agreement
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science
CAP4773/CIS6930 Projects in Data Science, Fall 2014 [Review] Overview of Data Science Dr. Daisy Zhe Wang CISE Department University of Florida August 25th 2014 20 Review Overview of Data Science Why Data
Integrating Big Data into the Computing Curricula
Integrating Big Data into the Computing Curricula Yasin Silva, Suzanne Dietrich, Jason Reed, Lisa Tsosie Arizona State University http://www.public.asu.edu/~ynsilva/ibigdata/ 1 Overview Motivation Big
Oracle Big Data SQL Technical Update
Oracle Big Data SQL Technical Update Jean-Pierre Dijcks Oracle Redwood City, CA, USA Keywords: Big Data, Hadoop, NoSQL Databases, Relational Databases, SQL, Security, Performance Introduction This technical
How to use Big Data in Industry 4.0 implementations. LAURI ILISON, PhD Head of Big Data and Machine Learning
How to use Big Data in Industry 4.0 implementations LAURI ILISON, PhD Head of Big Data and Machine Learning Big Data definition? Big Data is about structured vs unstructured data Big Data is about Volume
Big Data & QlikView. Democratizing Big Data Analytics. David Freriks Principal Solution Architect
Big Data & QlikView Democratizing Big Data Analytics David Freriks Principal Solution Architect TDWI Vancouver Agenda What really is Big Data? How do we separate hype from reality? How does that relate
Data-Intensive Science and Scientific Data Infrastructure
Data-Intensive Science and Scientific Data Infrastructure Russ Rew, UCAR Unidata ICTP Advanced School on High Performance and Grid Computing 13 April 2011 Overview Data-intensive science Publishing scientific
Let the data speak to you. Look Who s Peeking at Your Paycheck. Big Data. What is Big Data? The Artemis project: Saving preemies using Big Data
CS535 Big Data W1.A.1 CS535 BIG DATA W1.A.2 Let the data speak to you Medication Adherence Score How likely people are to take their medication, based on: How long people have lived at the same address
SURVEY REPORT DATA SCIENCE SOCIETY 2014
SURVEY REPORT DATA SCIENCE SOCIETY 2014 TABLE OF CONTENTS Contents About the Initiative 1 Report Summary 2 Participants Info 3 Participants Expertise 6 Suggested Discussion Topics 7 Selected Responses
The InterNational Committee for Information Technology Standards INCITS Big Data
The InterNational Committee for Information Technology Standards INCITS Big Data Keith W. Hare JCC Consulting, Inc. April 2, 2015 Who am I? Senior Consultant with JCC Consulting, Inc. since 1985 High performance
Software Engineering for Big Data. CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo
Software Engineering for Big Data CS846 Paulo Alencar David R. Cheriton School of Computer Science University of Waterloo Big Data Big data technologies describe a new generation of technologies that aim
Big Data Analytics. An Introduction. Oliver Fuchsberger University of Paderborn 2014
Big Data Analytics An Introduction Oliver Fuchsberger University of Paderborn 2014 Table of Contents I. Introduction & Motivation What is Big Data Analytics? Why is it so important? II. Techniques & Solutions
Reference Architecture, Requirements, Gaps, Roles
Reference Architecture, Requirements, Gaps, Roles The contents of this document are an excerpt from the brainstorming document M0014. The purpose is to show how a detailed Big Data Reference Architecture
This Symposium brought to you by www.ttcus.com
This Symposium brought to you by www.ttcus.com Linkedin/Group: Technology Training Corporation @Techtrain Technology Training Corporation www.ttcus.com Big Data Analytics as a Service (BDAaaS) Big Data
Big Data Analytics for Space Exploration, Entrepreneurship and Policy Opportunities. Tiffani Crawford, PhD
Big Analytics for Space Exploration, Entrepreneurship and Policy Opportunities Tiffani Crawford, PhD Big Analytics Characteristics Large quantities of many data types Structured Unstructured Human Machine
BIG DATA TOOLS. Top 10 open source technologies for Big Data
BIG DATA TOOLS Top 10 open source technologies for Big Data We are in an ever expanding marketplace!!! With shorter product lifecycles, evolving customer behavior and an economy that travels at the speed
So What s the Big Deal?
So What s the Big Deal? Presentation Agenda Introduction What is Big Data? So What is the Big Deal? Big Data Technologies Identifying Big Data Opportunities Conducting a Big Data Proof of Concept Big Data
Big Data Analytics. Prof. Dr. Lars Schmidt-Thieme
Big Data Analytics Prof. Dr. Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany 33. Sitzung des Arbeitskreises Informationstechnologie,
BIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
Is a Data Scientist the New Quant? Stuart Kozola MathWorks
Is a Data Scientist the New Quant? Stuart Kozola MathWorks 2015 The MathWorks, Inc. 1 Facts or information used usually to calculate, analyze, or plan something Information that is produced or stored by
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA
Tutorial: Big Data Algorithms and Applications Under Hadoop KUNPENG ZHANG SIDDHARTHA BHATTACHARYYA http://kzhang6.people.uic.edu/tutorial/amcis2014.html August 7, 2014 Schedule I. Introduction to big data
Big Data Challenges in Bioinformatics
Big Data Challenges in Bioinformatics BARCELONA SUPERCOMPUTING CENTER COMPUTER SCIENCE DEPARTMENT Autonomic Systems and ebusiness Pla?orms Jordi Torres [email protected] Talk outline! We talk about Petabyte?
BIG DATA TRENDS AND TECHNOLOGIES
BIG DATA TRENDS AND TECHNOLOGIES THE WORLD OF DATA IS CHANGING Cloud WHAT IS BIG DATA? Big data are datasets that grow so large that they become awkward to work with using onhand database management tools.
Big Data Analytics. Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs
1 Big Data Analytics Genoveva Vargas-Solar http://www.vargas-solar.com/big-data-analytics French Council of Scientific Research, LIG & LAFMIA Labs Montevideo, 22 nd November 4 th December, 2015 INFORMATIQUE
Hadoop. http://hadoop.apache.org/ Sunday, November 25, 12
Hadoop http://hadoop.apache.org/ What Is Apache Hadoop? The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using
Big Data Analytics in Space Exploration and Entrepreneurship
Space Society of Silicon Valley Big Data Analytics in Space Exploration and Entrepreneurship Tiffani Crawford, PhD January 14, 2015 Big Data Analytics Data Characteristics Large quantities of many data
BIG DATA-AS-A-SERVICE
White Paper BIG DATA-AS-A-SERVICE What Big Data is about What service providers can do with Big Data What EMC can do to help EMC Solutions Group Abstract This white paper looks at what service providers
A Service for Data-Intensive Computations on Virtual Clusters
A Service for Data-Intensive Computations on Virtual Clusters Executing Preservation Strategies at Scale Rainer Schmidt, Christian Sadilek, and Ross King [email protected] Planets Project Permanent
Industry 4.0 and Big Data
Industry 4.0 and Big Data Marek Obitko, [email protected] Senior Research Engineer 03/25/2015 PUBLIC PUBLIC - 5058-CO900H 2 Background Joint work with Czech Institute of Informatics, Robotics and
Healthcare data analytics. Da-Wei Wang Institute of Information Science [email protected]
Healthcare data analytics Da-Wei Wang Institute of Information Science [email protected] Outline Data Science Enabling technologies Grand goals Issues Google flu trend Privacy Conclusion Analytics
Big Data Management in the Clouds. Alexandru Costan IRISA / INSA Rennes (KerData team)
Big Data Management in the Clouds Alexandru Costan IRISA / INSA Rennes (KerData team) Cumulo NumBio 2015, Aussois, June 4, 2015 After this talk Realize the potential: Data vs. Big Data Understand why we
Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.
Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
ANALYTICS CENTER LEARNING PROGRAM
Overview of Curriculum ANALYTICS CENTER LEARNING PROGRAM The following courses are offered by Analytics Center as part of its learning program: Course Duration Prerequisites 1- Math and Theory 101 - Fundamentals
Introduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21)
CYBERINFRASTRUCTURE FRAMEWORK FOR 21 st CENTURY SCIENCE AND ENGINEERING (CIF21) Goal Develop and deploy comprehensive, integrated, sustainable, and secure cyberinfrastructure (CI) to accelerate research
# Not a part of 1Z0-061 or 1Z0-144 Certification test, but very important technology in BIG DATA Analysis
Section 9 : Case Study # Objectives of this Session The Motivation For Hadoop What problems exist with traditional large-scale computing systems What requirements an alternative approach should have How
Cloud and Big Data Standardisation
Cloud and Big Data Standardisation EuroCloud Symposium ICS Track: Standards for Big Data in the Cloud 15 October 2013, Luxembourg Yuri Demchenko System and Network Engineering Group, University of Amsterdam
Application Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
NITRD and Big Data. George O. Strawn NITRD
NITRD and Big Data George O. Strawn NITRD Caveat auditor The opinions expressed in this talk are those of the speaker, not the U.S. government Outline What is Big Data? Who is NITRD? NITRD's Big Data Research
Where is... How do I get to...
Big Data, Fast Data, Spatial Data Making Sense of Location Data in a Smart City Hans Viehmann Product Manager EMEA ORACLE Corporation August 19, 2015 Copyright 2014, Oracle and/or its affiliates. All rights
How To Scale Out Of A Nosql Database
Firebird meets NoSQL (Apache HBase) Case Study Firebird Conference 2011 Luxembourg 25.11.2011 26.11.2011 Thomas Steinmaurer DI +43 7236 3343 896 [email protected] www.scch.at Michael Zwick DI
Luncheon Webinar Series May 13, 2013
Luncheon Webinar Series May 13, 2013 InfoSphere DataStage is Big Data Integration Sponsored By: Presented by : Tony Curcio, InfoSphere Product Management 0 InfoSphere DataStage is Big Data Integration
The Next Wave of Data Management. Is Big Data The New Normal?
The Next Wave of Data Management Is Big Data The New Normal? Table of Contents Introduction 3 Separating Reality and Hype 3 Why Are Firms Making IT Investments In Big Data? 4 Trends In Data Management
Statistics for BIG data
Statistics for BIG data Statistics for Big Data: Are Statisticians Ready? Dennis Lin Department of Statistics The Pennsylvania State University John Jordan and Dennis K.J. Lin (ICSA-Bulletine 2014) Before
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software. SC13, November, 2013
Intel HPC Distribution for Apache Hadoop* Software including Intel Enterprise Edition for Lustre* Software SC13, November, 2013 Agenda Abstract Opportunity: HPC Adoption of Big Data Analytics on Apache
The? Data: Introduction and Future
The? Data: Introduction and Future Husnu Sensoy Global Maksimum Data & Information Technologies Global Maksimum Data & Information Technologies The Data Company Massive Data Unstructured Data Insight Information
SQL VS. NO-SQL. Adapted Slides from Dr. Jennifer Widom from Stanford
SQL VS. NO-SQL Adapted Slides from Dr. Jennifer Widom from Stanford 55 Traditional Databases SQL = Traditional relational DBMS Hugely popular among data analysts Widely adopted for transaction systems
Conquering the Astronomical Data Flood through Machine
Conquering the Astronomical Data Flood through Machine Learning and Citizen Science Kirk Borne George Mason University School of Physics, Astronomy, & Computational Sciences http://spacs.gmu.edu/ The Problem:
Chapter 6. Foundations of Business Intelligence: Databases and Information Management
Chapter 6 Foundations of Business Intelligence: Databases and Information Management VIDEO CASES Case 1a: City of Dubuque Uses Cloud Computing and Sensors to Build a Smarter, Sustainable City Case 1b:
