High Performance Computing and Big Data. High Performance Computing Curriculum UvA-SARA http://
- Abner Jackson
- 8 years ago
Transcription
1 High Performance Computing and Big Data. High Performance Computing Curriculum UvA-SARA http://
2 Big data was big news in 2012 and probably in 2013 too. The Harvard Business Review talks about it as "The Management Revolution". The Wall Street Journal says "Meet the New Big Data", and "Big Data is on the Rise, Bringing Big Questions".
3 Big Data popularity
4 Where Does Big Data Come From? Big Data is not a specific application type, but rather a trend, or even a collection of trends, spanning multiple application types. Data is growing in multiple ways: more data (volume of data); more types of data (variety of data); faster ingest of data (velocity of data); more accessibility of data (internet, instruments, ...). Data growth and availability exceed an organization's ability to make intelligent decisions based on it. Addison Snell, CEO, Intersect360 Research
5 Data is Big If It is Measured in MW. A good sweet spot for a data center is 15 MW. Facebook's leased data centers are typically between 2.5 MW and 6.0 MW. Facebook's Prineville data center is 30 MW. Google's computing infrastructure uses 260 MW. Robert Grossman, Collin Bennett, University of Chicago, Open Data Group
6 Jim Gray's Vision. "We have to do better at producing tools to support the whole research cycle, from data capture and data curation to data analysis and data visualization. Today, the tools for capturing data both at the mega-scale and at the milli-scale are just dreadful. After you have captured the data, you need to curate it before you can start doing any kind of data analysis, and we lack good tools for both data curation and data analysis. Then comes the publication of the results of your research, and the published literature is just the tip of the data iceberg. By this I mean that people collect a lot of data and then reduce this down to some number of column inches in Science or Nature, or 10 pages if it is a computer science person writing. So what I mean by data iceberg is that there is a lot of data that is collected but not curated or published in any systematic way." Based on the transcript of a talk given by Jim Gray to the NRC-CSTB in Mountain View, CA, on January 11, 2007
7 Advice From Jim Gray: 1. Analysing Big Data requires scale-out solutions, not scale-up solutions (GrayWulf). 2. Move the analysis to the data. 3. Work with scientists to find the most common 20 queries and make them fast. 4. Go from working to working. Robert Grossman, Collin Bennett, University of Chicago, Open Data Group
8 What is the Fundamental Trade Off? Robert Grossman, Collin Bennett, University of Chicago, Open Data Group
9 How much data? Google processes 20 PB a day (2008). Wayback Machine has 3 PB TB/month (3/2009). Facebook has 2.5 PB of user data + 15 TB/day (4/2009). eBay has 6.5 PB of user data + 50 TB/day (5/2009). CERN's Large Hadron Collider (LHC) generates 15 PB a year. "640K ought to be enough for anybody." http://econsultancy.com/nl/blog/ big-data-shifting-the-bell-curve-to-the-long-tail
10 BigData is the new hype. By the end of 2011 the world had an estimated 1.8 zettabytes (1.8 trillion gigabytes) of data, and it is estimated to grow to 35 zettabytes in the next few years.
11 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
12 How We Define Big Data. "Big" in Big Data refers to: big size, which is the primary definition; big complexity rather than big volume: it can be small, and not all large datasets are big data; size matters... but so do accessibility, interoperability and reusability. Define Big Data using the 3 Vs, namely: volume, variety, velocity. Big Data - Back to mine
13 volume, variety, and velocity. Aggregation that used to be measured in petabytes (PB) is now referenced by a new term: zettabytes (ZB). A zettabyte is a trillion gigabytes (GB) or a billion terabytes. In 2010 we crossed the 1 ZB marker, and at the end of 2011 that number was estimated to be 1.8 ZB.
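The unit relationships on this slide can be checked with a few lines of arithmetic. The sketch below assumes decimal (SI) units, where 1 GB is 10^9 bytes; the helper name `convert` is made up for illustration.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: not binary GiB/TiB).
GB = 10**9
TB = 10**12
PB = 10**15
ZB = 10**21

def convert(amount, from_unit, to_unit):
    """Convert an amount from one unit to another; units are byte counts."""
    return amount * from_unit / to_unit

# One zettabyte expressed in gigabytes and in terabytes.
zb_in_gb = convert(1, ZB, GB)    # a trillion GB
zb_in_tb = convert(1, ZB, TB)    # a billion TB
# The 2011 estimate of 1.8 ZB expressed in petabytes.
estimate_2011_pb = convert(1.8, ZB, PB)
```

Under these assumptions, 1.8 ZB is about 1.8 million PB, which is why petabytes stop being a convenient unit at this scale.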
14 volume, variety, and velocity. The variety characteristic of Big Data is really about trying to capture all of the data that pertains to our decision-making process: making sense out of unstructured data, such as opinion, or analysing images.
15 Types of Data: relational data (tables/transactions/legacy data); text data (Web); semi-structured data (XML); graph data: social networks, Semantic Web (RDF); streaming data: you can only scan the data once
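The constraint that streaming data can be scanned only once shapes the algorithms that can be used on it. As a minimal sketch (not taken from the slides), the function below computes count, mean and variance in a single pass using Welford's online update, so no element is ever revisited; the name `running_stats` is an illustrative choice.

```python
def running_stats(stream):
    """Return (count, mean, population variance) after one pass over the stream."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance

# The input is an iterator, so each value can only be consumed once.
readings = iter([2, 4, 4, 4, 5, 5, 7, 9])
count, mean, variance = running_stats(readings)
```

Only three numbers are kept in memory regardless of how long the stream is, which is the property that matters when the data cannot be stored and rescanned.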
16 volume, variety, and velocity. Velocity is the rate at which data arrives at the enterprise and is processed or well understood. In other terms: how long does it take you to do something about it, or to know it has even arrived?
17 volume, variety, velocity, and veracity. Veracity refers to the quality or trustworthiness of the data. A common complication is that the data is saturated with both useful signals and lots of noise (data that can't be trusted).
18
19
20 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
21 Enabler: Data storage
22 Enabler: computation capacity
23 Enabler: Data storage
24 Type of available data
25 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
26 When to Consider a Big Data Solution (user point of view). You're limited by your current platform or environment because you can't process the amount of data that you want to process. You want to involve new sources of data in the analytics, but you can't, because the data doesn't fit into schema-defined rows and columns without sacrificing fidelity or the richness of the data. You need to ingest data as quickly as possible and need to work with a schema-on-demand.
27 When to Consider a Big Data Solution. You're forced into a schema-on-write approach (the schema must be created before data is loaded), but you need to ingest data quickly, or perhaps in a discovery process, and want the cost benefits of a schema-on-read approach (data is simply copied to the file store, and no special transformation is needed) until you know that you've got something that's ready for analysis. The data is arriving too fast at your organization's doorstep for the current analytics platform to handle.
28 When to Consider a Big Data Solution. You want to analyse not just raw structured data, but also semi-structured and unstructured data from a wide variety of sources. You're not satisfied with the effectiveness of your algorithms or models when all, or most, of the data needs to be analysed, or when a sampling of the data isn't going to be nearly as effective.
29 When to Consider a Big Data Solution. You aren't completely sure where the investigation will take you, and you want elasticity of compute, storage, and the types of analytics that will be pursued; all of these became useful as we added more sources and new methods. If your answers to any of these questions are yes, you need to consider a Big Data solution.
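The difference between the schema-on-write and schema-on-read approaches mentioned in these slides can be sketched in a few lines. This is an illustration under assumptions, not a real system: the field names in `SCHEMA`, the JSON encoding and all function names are hypothetical.

```python
import json

# Hypothetical fixed schema used by the schema-on-write store.
SCHEMA = ("user", "amount")

def write_with_schema(store, record):
    """Schema-on-write: validate and reshape the record before it is stored."""
    if set(record) != set(SCHEMA):
        raise ValueError("record does not match the schema")
    store.append(tuple(record[field] for field in SCHEMA))

def write_raw(store, line):
    """Schema-on-read: copy the raw line to the store without transformation."""
    store.append(line)

def read_with_schema(store, fields):
    """Schema-on-read: impose a structure only when the data is queried."""
    for line in store:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield tuple(record.get(field) for field in fields)

raw_store = []
write_raw(raw_store, '{"user": "a", "amount": 3, "note": "late"}')
write_raw(raw_store, '{"user": "b"}')
rows = list(read_with_schema(raw_store, ("user", "note")))
```

In this sketch the raw store accepts both records immediately, including the unexpected `note` field, while `write_with_schema` would reject them; the cost of interpreting the data is deferred to read time.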
30 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
31 What are the choices to store Big Data? Pattern 1: Put the metadata in a database and point to files in a file system. Pattern 2: Put the data into a distributed file system. Pattern 3: Put the data into a NoSQL application. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
32 What are the choices to store Big Data? Pattern 1: Put the metadata in a database and point to files in a file system. [Diagram: user, relational DB.] Search is only done on the metadata. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
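Pattern 1 can be illustrated with a small metadata catalog. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the relational database; the table layout, column names and file paths are assumptions made up for the example.

```python
import sqlite3

def create_catalog(conn):
    """Create the metadata table; the data files themselves stay on the file system."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, instrument TEXT, size_bytes INTEGER)"
    )

def register_file(conn, path, instrument, size_bytes):
    """Record where a file lives and a few searchable attributes."""
    conn.execute(
        "INSERT INTO files (path, instrument, size_bytes) VALUES (?, ?, ?)",
        (path, instrument, size_bytes),
    )

def find_paths(conn, instrument):
    """Search the metadata only and return pointers to the matching files."""
    rows = conn.execute(
        "SELECT path FROM files WHERE instrument = ? ORDER BY path",
        (instrument,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
create_catalog(conn)
register_file(conn, "/data/run1.dat", "telescope", 4_000_000_000)
register_file(conn, "/data/run2.dat", "sequencer", 9_000_000_000)
matches = find_paths(conn, "telescope")
```

The query never opens the large files: it returns paths, and the application then reads those files directly from the file system.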
33 What are the choices to store Big Data? Pattern 2: Put the data into a distributed file system: Google File System (GFS), Hadoop Distributed File System (HDFS), Sector Distributed File System (SDFS). Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
34 What are the choices to store Big Data? Pattern 3: Put the data into a NoSQL database: Google's BigTable, Amazon's Dynamo, Facebook's Cassandra. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
35 Bill Howe, UW
36 MapReduce vs. Databases. A. Pavlo, et al., "A comparison of approaches to large-scale data analysis," in SIGMOD '09: Proceedings of the 35th SIGMOD international conference on Management of data, New York, NY, USA, 2009, pp. Conclusions: at the scale of the experiments we conducted, both parallel database systems displayed a significant performance advantage over Hadoop MR in executing a variety of data-intensive analysis benchmarks.
37 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
38 The problem: TCP was never designed to move large datasets over wide area high performance networks. For loading a webpage, TCP is great. For sustained data transfer, it is far from ideal. Most of the time, even though the connection itself is good (let's say 45 Mbps), transfers are much slower. There are two reasons for slow transfers over fast connections: latency and packet loss, which together bring TCP-based file transfer to a crawl. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
39 TCP Throughput vs RTT and Packet Loss. [Figure: throughput (Mb/s) against round trip time (ms) for LAN, US, US-EU and US-ASIA paths, with one curve per packet loss rate.] Source: Yunhong Gu, 2007, experiments over wide area 1G.
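The way round trip time and packet loss limit a single TCP stream can be approximated with the widely cited Mathis model, throughput ≈ (MSS / RTT) · C / sqrt(p). This formula is not from the slides; it is brought in here as an assumption to illustrate the trend in the figure, with C = sqrt(3/2) and a 1460-byte segment size chosen for the example.

```python
import math

def tcp_throughput_mbps(mss_bytes, rtt_ms, loss_rate):
    """Approximate steady-state throughput of one TCP stream (Mathis model), in Mb/s."""
    c = math.sqrt(3 / 2)  # constant used in the common form of the model
    rtt_s = rtt_ms / 1000
    bytes_per_second = (mss_bytes / rtt_s) * c / math.sqrt(loss_rate)
    return bytes_per_second * 8 / 1e6

# Same loss rate, increasing distance: throughput falls in proportion to RTT.
for label, rtt_ms in [("short path", 10), ("long path", 100), ("very long path", 200)]:
    rate = tcp_throughput_mbps(1460, rtt_ms, 0.001)
    print(f"{label}: RTT {rtt_ms} ms -> about {rate:.1f} Mb/s")
```

Under this model, doubling the round trip time halves the achievable rate, and the rate falls with the square root of the loss rate, which is consistent with long-distance transfers running far below the link capacity.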
40 The solutions: Use parallel TCP streams (GridFTP). Use specialized network protocols (UDT, FAST, etc.). Use RAID to stripe data across disks to improve throughput when reading. These techniques are well understood in HEP and astronomy, but not yet in biology. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
41 Moving 113 GB of Bio-mirror Data. [Table: columns Site, RTT, TCP, UDT, TCP/UDT, Km; rows NCSA, Purdue, ORNL, TACC, SDSC, CSTNET.] GridFTP TCP and UDT transfer times for 113 GB from gridftp.bio-mirror.net/biomirror/blast/ (Indiana USA). All TCP and UDT times in minutes. Source: http://gridftp.bio-mirror.net/biomirror/ Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
42 Case study: CGI 60 genomes. Trace by Complete Genomics showing performance of moving 60 complete human genomes from Mountain View to Chicago using the open source Sector/UDT. Approximately 18 TB at about 0.5 Gb/s on a 1G link. Robert Grossman, University of Chicago, Open Data Group, November 14, 2011
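A quick calculation shows how the sustained rate translates into transfer time for a dataset of this size. The sketch assumes decimal units (1 TB = 10^12 bytes) and two illustrative rates: 0.5 Gb/s, half of a 1G link, and the 45 Mb/s connection used as an example earlier in the deck.

```python
def transfer_time_hours(size_bytes, rate_bits_per_second):
    """Time to move size_bytes at a sustained rate, ignoring protocol overhead."""
    seconds = size_bytes * 8 / rate_bits_per_second
    return seconds / 3600

SIZE_18_TB = 18 * 10**12  # bytes, decimal terabytes

hours_at_half_gigabit = transfer_time_hours(SIZE_18_TB, 0.5e9)  # 80 hours
hours_at_45_mbps = transfer_time_hours(SIZE_18_TB, 45e6)        # roughly 37 days
```

At a sustained 0.5 Gb/s the 18 TB move takes a little over three days; at 45 Mb/s it takes more than a month, which is why the achieved rate matters more than the nominal link speed.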
43 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
44 Scientific e-infrastructure: some challenges to overcome. Collection: How can we make sure that data are collected together with the information necessary to re-use them? Trust: How can we make informed judgements about whether certain data are authentic and can be trusted? How can we judge which repositories we can trust? How can appropriate access and use of resources be granted or controlled? Riding the wave: How Europe can gain from the rising tide of scientific data
45 Scientific e-infrastructure: some challenges to overcome. Usability: How can we move to a situation where non-specialists can overcome the barriers and be able to start sensible work on unfamiliar data? Interoperability: How can we implement interoperability within disciplines and move to an overarching multi-disciplinary way of understanding and using data? How can we find unfamiliar but relevant data resources beyond simple keyword searches, but involving a deeper probing into the data? How can automated tools find the information needed to tackle data? Riding the wave: How Europe can gain from the rising tide of scientific data
46 Scientific e-infrastructure: some challenges to overcome. Diversity: How do we overcome the problems of diversity: heterogeneity of data, but also of backgrounds and data-sharing cultures in the scientific community? How do we deal with the diversity of data repositories and access rules within or between disciplines, and within or across national borders? Security: How can we guarantee data integrity? How can we avoid data poisoning by individuals or groups intending to bias them in their interest? Riding the wave: How Europe can gain from the rising tide of scientific data
47 Content: Defining BigData; Enablers of BigData; BigData is good when ...; Where to store the data?; Moving Big Data; Scientific e-infrastructure: some challenges to overcome, a wish list
48 Scientific e-infrastructure: a wish list. Open deposit, allowing user-community centres to store data easily. Bit-stream preservation, ensuring that data authenticity will be guaranteed for a specified number of years. Format and content migration, executing CPU-intensive transformations on large data sets at the command of the communities. Riding the wave: How Europe can gain from the rising tide of scientific data
49 Scientific e-infrastructure: a wish list. Persistent identification, allowing data centres to register a huge amount of markers to track the origins and characteristics of the information. Metadata support to allow effective management, use and understanding. Maintaining proper access rights as the basis of all trust. A variety of access and curation services that will vary between scientific disciplines and over time. Riding the wave: How Europe can gain from the rising tide of scientific data
50 Scientific e-infrastructure: a wish list. Execution services that allow a large group of researchers to operate on the stored data. High reliability, so researchers can count on its availability. Regular quality assessment to ensure adherence to all agreements. Distributed and collaborative authentication, authorisation and accounting. A high degree of interoperability at format and semantic level. Riding the wave: How Europe can gain from the rising tide of scientific data
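One common building block for the bit-stream preservation item on the wish list is a fixity check: a cryptographic digest is recorded at deposit time and recomputed later to detect silent changes. The sketch below is an assumption about how such a check could look, not a description of any particular repository; it uses SHA-256 from Python's standard `hashlib`, and the function names are illustrative.

```python
import hashlib

def digest_of(chunks):
    """Compute a SHA-256 hex digest over an iterable of byte chunks."""
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()

def is_unchanged(chunks, recorded_digest):
    """Return True when the current content still matches the recorded digest."""
    return digest_of(chunks) == recorded_digest

# At deposit time the digest is stored alongside the data.
recorded = digest_of([b"observation ", b"data"])
# At audit time the content is re-read and compared against the stored digest.
still_ok = is_unchanged([b"observation data"], recorded)
corrupted = not is_unchanged([b"observation dat4"], recorded)
```

Feeding the data in chunks means a large file can be checked without loading it into memory at once; the digest depends only on the byte sequence, not on how it is chunked.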
51
52 Google BigQuery. Google BigQuery is a web service that lets you do interactive analysis of massive datasets, up to billions of rows. Scalable and easy to use, BigQuery lets developers and businesses tap into powerful data analytics on demand. http:// v=p78t_zdwqyk
53 IBM BigInsights. BigInsights = analytical platform for persistent big data. Based on open source and IBM technologies. Distinguishing characteristics: built-in analytics. Big Data: Frequently Asked Questions for IBM InfoSphere BigInsights http://
54 References. T. Hey, S. Tansley, and K. Tolle, Eds., The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft. Available: http://research.microsoft.com/en-us/collaboration/fourthparadigm/ http:// knowledge-creation-data-driven-science http://royalsociety.org/uploadedfiles/ Royal_Society_Content/policy/projects/sape/ SAOE.pdf http:// item/software-sustainability-institute-blog-how-big-big-data
55 Realtime Analytics for Big Data: A Facebook Case Study http://
AGENDA What is BIG DATA? What is Hadoop? Why Microsoft? The Microsoft BIG DATA story Hadoop PDW Our BIG DATA Roadmap BIG DATA? Volume 59% growth in annual WW information 1.2M Zetabytes (10 21 bytes) this
More informationKeywords Big Data Analytic Tools, Data Mining, Hadoop and MapReduce, HBase and Hive tools, User-Friendly tools.
Volume 3, Issue 8, August 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Significant Trends
More informationIntroduction to Big Data the four V's
Chapter 1: Introduction to Big Data the four V's This chapter is mainly based on the Big Data script by Donald Kossmann and Nesime Tatbul (ETH Zürich) Big Data Management and Analytics 15 Goal of Today
More informationIntroduc8on to Apache Spark
Introduc8on to Apache Spark Jordan Volz, Systems Engineer @ Cloudera 1 Analyzing Data on Large Data Sets Python, R, etc. are popular tools among data scien8sts/analysts, sta8s8cians, etc. Why are these
More informationBIG DATA CHALLENGES AND PERSPECTIVES
BIG DATA CHALLENGES AND PERSPECTIVES Meenakshi Sharma 1, Keshav Kishore 2 1 Student of Master of Technology, 2 Head of Department, Department of Computer Science and Engineering, A P Goyal Shimla University,
More informationA Novel Cloud Based Elastic Framework for Big Data Preprocessing
School of Systems Engineering A Novel Cloud Based Elastic Framework for Big Data Preprocessing Omer Dawelbeit and Rachel McCrindle October 21, 2014 University of Reading 2008 www.reading.ac.uk Overview
More informationHere comes the flood Tools for Big Data analytics. Guy Chesnot -June, 2012
Here comes the flood Tools for Big Data analytics Guy Chesnot -June, 2012 Agenda Data flood Implementations Hadoop Not Hadoop 2 Agenda Data flood Implementations Hadoop Not Hadoop 3 Forecast Data Growth
More informationPlay with Big Data on the Shoulders of Open Source
OW2 Open Source Corporate Network Meeting Play with Big Data on the Shoulders of Open Source Liu Jie Technology Center of Software Engineering Institute of Software, Chinese Academy of Sciences 2012-10-19
More informationBIG Big Data Public Private Forum
DATA STORAGE Martin Strohbach, AGT International (R&D) THE DATA VALUE CHAIN Value Chain Data Acquisition Data Analysis Data Curation Data Storage Data Usage Structured data Unstructured data Event processing
More informationExtreme Computing. Big Data. Stratis Viglas. School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk. Stratis Viglas Extreme Computing 1
Extreme Computing Big Data Stratis Viglas School of Informatics University of Edinburgh sviglas@inf.ed.ac.uk Stratis Viglas Extreme Computing 1 Petabyte Age Big Data Challenges Stratis Viglas Extreme Computing
More informationTHE AGE OF BIG DATA. Chula DataScience
THE AGE OF BIG DATA Asst. Prof. Natawut Nupairoj, Ph.D. Mobile Application and System Services Research Group Department of Computing Engineering Chulalongkorn University natawut.n@chula.ac.th Data is
More informationData Centric Computing Revisited
Piyush Chaudhary Technical Computing Solutions Data Centric Computing Revisited SPXXL/SCICOMP Summer 2013 Bottom line: It is a time of Powerful Information Data volume is on the rise Dimensions of data
More informationScience Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins- Diehr wilkinsn@sdsc.edu
Science Gateways What are they and why are they having such a tremendous impact on science? Nancy Wilkins- Diehr wilkinsn@sdsc.edu What is a science gateway? science gateway /sī əәns gāt wā / n. 1. an
More informationData processing goes big
Test report: Integration Big Data Edition Data processing goes big Dr. Götz Güttich Integration is a powerful set of tools to access, transform, move and synchronize data. With more than 450 connectors,
More information.nl ENTRADA. CENTR-tech 33. November 2015 Marco Davids, SIDN Labs. Klik om de s+jl te bewerken
Klik om de s+jl te bewerken Klik om de models+jlen te bewerken Tweede niveau Derde niveau Vierde niveau.nl ENTRADA Vijfde niveau CENTR-tech 33 November 2015 Marco Davids, SIDN Labs Wie zijn wij? Mijlpalen
More informationAn Introduc@on to Big Data, Apache Hadoop, and Cloudera
An Introduc@on to Big Data, Apache Hadoop, and Cloudera Ian Wrigley, Curriculum Manager, Cloudera 1 The Mo@va@on for Hadoop 2 Tradi@onal Large- Scale Computa@on Tradi*onally, computa*on has been processor-
More informationDatenverwaltung im Wandel - Building an Enterprise Data Hub with
Datenverwaltung im Wandel - Building an Enterprise Data Hub with Cloudera Bernard Doering Regional Director, Central EMEA, Cloudera Cloudera Your Hadoop Experts Founded 2008, by former employees of Employees
More informationMaking Sense of Big Data. Dr. Thomas E. Potok Computa2onal Data Analy2cs Group Leader Oak Ridge Na2onal Laboratory potokte@ornl.
Making Sense of Big Data Dr. Thomas E. Potok Computa2onal Data Analy2cs Group Leader Oak Ridge Na2onal Laboratory potokte@ornl.gov 865-574- 0834 ORNL s Big Data Legacy Science National Security Energy
More informationBig Data Management and Analytics
Big Data Management and Analytics Lecture Notes Winter semester 2015 / 2016 Ludwig-Maximilians-University Munich Prof. Dr. Matthias Renz 2015 Based on lectures by Donald Kossmann (ETH Zürich), as well
More informationLecture 10: HBase! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl
Big Data Processing, 2014/15 Lecture 10: HBase!! Claudia Hauff (Web Information Systems)! ti2736b-ewi@tudelft.nl 1 Course content Introduction Data streams 1 & 2 The MapReduce paradigm Looking behind the
More informationNoSQL Data Base Basics
NoSQL Data Base Basics Course Notes in Transparency Format Cloud Computing MIRI (CLC-MIRI) UPC Master in Innovation & Research in Informatics Spring- 2013 Jordi Torres, UPC - BSC www.jorditorres.eu HDFS
More informationDriving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA
WHITE PAPER April 2014 Driving IBM BigInsights Performance Over GPFS Using InfiniBand+RDMA Executive Summary...1 Background...2 File Systems Architecture...2 Network Architecture...3 IBM BigInsights...5
More informationUpdate on the Cloud Demonstration Project
Update on the Cloud Demonstration Project Khalil Yazdi and Steven Wallace Spring Member Meeting April 19, 2011 Project Par4cipants BACKGROUND Eleven Universi1es: Caltech, Carnegie Mellon, George Mason,
More informationData Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
More informationKeywords Big Data, NoSQL, Relational Databases, Decision Making using Big Data, Hadoop
Volume 4, Issue 1, January 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Transitioning
More informationContent Distribu-on Networks (CDNs)
Content Distribu-on Networks (CDNs) Jennifer Rexford COS 461: Computer Networks Lectures: MW 10-10:0am in Architecture N101 hjp://www.cs.princeton.edu/courses/archive/spr12/cos461/ Second Half of the Course
More informationWA2192 Introduction to Big Data and NoSQL EVALUATION ONLY
WA2192 Introduction to Big Data and NoSQL Web Age Solutions Inc. USA: 1-877-517-6540 Canada: 1-866-206-4644 Web: http://www.webagesolutions.com The following terms are trademarks of other companies: Java
More informationComparison of the Frontier Distributed Database Caching System with NoSQL Databases
Comparison of the Frontier Distributed Database Caching System with NoSQL Databases Dave Dykstra dwd@fnal.gov Fermilab is operated by the Fermi Research Alliance, LLC under contract No. DE-AC02-07CH11359
More informationWhy are data sharing and reuse so difficult?
Why are data sharing and reuse so difficult? Chris9ne L. Borgman Professor and Presiden9al Chair in Informa9on Studies University of California, Los Angeles hmp://chris9neborgman.info and UCLA Knowledge
More informationData Management in SAP Environments
Data Management in SAP Environments the Big Data Impact Berlin, June 2012 Dr. Wolfgang Martin Analyst, ibond Partner und Ventana Research Advisor Data Management in SAP Environments Big Data What it is
More informationH T Tech nologies 2013
H T Technologies 2013 HOST: Eric Kavanagh THIS YEAR is Embedded Analytics Predictive analytics solutions exploit patterns found in data to identify risk and opportunities Embedded solutions can provide
More informationApplication Development. A Paradigm Shift
Application Development for the Cloud: A Paradigm Shift Ramesh Rangachar Intelsat t 2012 by Intelsat. t Published by The Aerospace Corporation with permission. New 2007 Template - 1 Motivation for the
More informationHow To Make A Cloud Based Computer Power Available To A Computer (For Free)
Cloud Compu)ng Adam Belloum Ins)tute of Informa)cs University of Amsterdam a.s.z.belloum@uva.nl High Performance compu)ng Curriculum, Jan 2015 hgp://www.hpc.uva.nl/ UvA- SURFsara What is Cloud Compu)ng?
More informationReal World Big Data Architecture - Splunk, Hadoop, RDBMS
Copyright 2015 Splunk Inc. Real World Big Data Architecture - Splunk, Hadoop, RDBMS Raanan Dagan, Big Data Specialist, Splunk Disclaimer During the course of this presentagon, we may make forward looking
More information