Distributed Data Management Summer Semester 2013 TU Kaiserslautern
1 Distributed Data Management, Summer Semester 2013, TU Kaiserslautern. Dr.-Ing. Sebastian Michel, saarland.de
2 Lecture 8+ (DISTRIBUTED) DATA STREAM PROCESSING (INTRODUCTION)
3 So Far: Databases/NoSQL Datastores
- Data is changing, yes, but this is mostly due to inserts and updates to stored data items
- Historic data is kept
- Queries operate on the full data (tables)
- MapReduce is the extreme case: write once, read many times
- Data warehousing, too: periodically loading data into a store for deep(er) analytics
- Data mining
4 Data Stream Management vs. Traditional Data Management
[Figure: a database/store that answers queries with results and is modified by insert, update, and delete]
- At query time, data is accessed as a whole
- Data is persistently stored
- Queries are ad hoc (mainly)
5 Data Stream Management vs. Traditional Data Management (Cont'd)
[Figure: a data stream flowing past a set of queries that emit results]
- Data is moving!
- Continuously generated (assumed infinite!)
- At a high pace
- Queries are (mainly) continuous (a.k.a. standing): registered once, observed forever
- Answers to queries in (near) real time required (often)
- Probabilistic methods for efficiency, or considering only part of the stream (sliding window)
6 Sensor Networks
- (Distributed) sensor networks; Smart Dust
- Mainly numeric measurements of (natural) phenomena
- Computation of queries like max, average, min, quantiles, value > τ
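The queries named on this slide (max, average, min, value > τ) can be maintained with a constant amount of memory per stream. The following is a minimal sketch, not part of the original slides; the class name `RunningStats` and the sample readings are made up for illustration. Quantiles are left out on purpose, since computing them exactly generally needs more than constant memory.

```python
class RunningStats:
    """Constant-memory aggregates over a stream of numeric readings.

    Hypothetical helper for illustration; not taken from the lecture.
    """

    def __init__(self, tau):
        self.tau = tau          # threshold for the "value > tau" query
        self.count = 0          # number of readings seen so far
        self.total = 0.0        # running sum, used for the average
        self.minimum = None
        self.maximum = None
        self.above_tau = 0      # how many readings exceeded tau

    def update(self, value):
        """Consume one reading; the reading itself is not stored."""
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        if value > self.tau:
            self.above_tau += 1

    def average(self):
        return self.total / self.count if self.count else None


# Made-up sensor readings, processed one at a time.
stats = RunningStats(tau=20.0)
for reading in [18.5, 21.0, 19.5, 25.0]:
    stats.update(reading)
```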
7 Mobile Ad-Hoc Networks
- Connections between sensors are ad hoc
- Efficient and reliable routing
- Understanding the (changing) topology
- Power consumption
- Also: vehicular ad-hoc networks
8 Social Sensors
- Explicitly: Snow Tweets (http://snowcore.uwaterloo.ca/snowtweets/)
  - #snowtweets 50.0 cm. at K1A 0A2
  - #snowtweets 10.0 in. at
  - #snowtweets 4 cm at Palmerston North 4414
- Implicitly: by mentioning topics, people, in social communication
[Figure; axis label: time]
9 Earthquake News on Twitter (source: http://blog.socialflow.com/)
10 Stock Market
- Real-time analysis of stock market changes
- Computing statistics over streams, e.g., for decision support
- Opportunities for reacting in real time
- Even with fully automated means: algorithmic trading
11 DBMS vs. DSMS
Database management system (DBMS) | Data stream management system (DSMS)
Persistent data (relations) | Volatile data streams
Random access | Sequential access
One-time queries | Continuous queries
(Theoretically) unlimited secondary storage | Limited main memory
Only the current state is relevant | Consideration of the order of the input
Relatively low update rate | Potentially extremely high update rate
Little or no time requirements | Real-time requirements
Assumes exact data | Assumes outdated/inaccurate data
Plannable query processing | Variable data arrival and data characteristics
Source: http://en.wikipedia.org/wiki/data-stream_management_system
12 Data Stream Model
- The stream of data items is unbounded (available memory is not)
- No way to store the entire stream (how could we, it is (probably) not ending)
- To compute query results, we need to devise algorithms with little memory consumption
13 Overview of Data Stream Topics
- Synopses: concise representations of stream content
  - tailored to tasks, e.g., counting distinct elements
  - usually not exact, but approximations (estimators) of true values
- Windows: focus on a certain recent subset of the data
  - computation of functions/joins over window(s) content
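As an illustration of the window idea, the sketch below keeps only the most recent items and computes an average over them. It is an assumption-laden example rather than the lecture's own code: the class name `SlidingWindowAverage` and the count-based window of size 3 are chosen here for illustration (time-based windows are equally possible).

```python
from collections import deque


class SlidingWindowAverage:
    """Average over the last `size` stream items (count-based window).

    Hypothetical helper for illustration; not taken from the lecture.
    """

    def __init__(self, size):
        self.size = size
        self.window = deque()   # holds at most `size` recent items
        self.total = 0.0        # sum of the items currently in the window

    def update(self, value):
        """Add one item, evict the oldest if needed, return the window average."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            self.total -= self.window.popleft()
        return self.total / len(self.window)


win = SlidingWindowAverage(size=3)
averages = [win.update(v) for v in [1.0, 2.0, 3.0, 4.0, 5.0]]
```

Memory use depends on the window size only, not on the length of the stream.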
14 Data Stream Mining: Teasers
- I tell you integer numbers between 1 and N
- I will tell all but one number
- After N-1 numbers I ask: which number was missing?
15 Data Stream Mining: Teasers (Cont'd)
- Keep a Boolean array of length N: mark the position of each observed number
  - Size required: N
  - Computation at the end: N to find the missing number
- Much better: keep the sum of the numbers: S
  - Missing number is N*(N+1)/2 - S
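The sum-based solution can be sketched in a few lines. The function name `find_missing` and the example input are assumptions made for illustration; the formula N*(N+1)/2 - S is the one given on the slide.

```python
def find_missing(stream, n):
    """Return the one number from 1..n that does not appear in `stream`.

    Only a single running sum is kept, so memory use does not grow with n.
    """
    s = 0
    for x in stream:
        s += x
    # 1 + 2 + ... + n equals n*(n+1)/2; the difference is the missing number.
    return n * (n + 1) // 2 - s


# Example with N = 5: the numbers 3, 1, 5, 2 arrive, so 4 is missing.
missing = find_missing(iter([3, 1, 5, 2]), n=5)
```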
16 Counting Occurrences
- Consider a stream of elements a_1, a_2, a_84, a_41, a_2, a_77, a_231, a_2, a_4, a_54, ...
- How often does a_2 occur?
- How to implement?
  - Keep a counter for each id
  - Required space: #ids (= N)
  - Not feasible if N is very large
17 Probabilistic Counting: Count-Min Sketch
- Keep a 2-dimensional array (h, r)
- h hash functions* that map to the range 0 to (r-1)
[Figure: array with one row per hash function h_1, h_2, h_3, h_4]
- Arriving item a: for each j: array[j, h_j(a)]++
Cormode, Muthukrishnan (2004). An Improved Data Stream Summary: The Count-Min Sketch and its Applications. J. Algorithms 55:
18 Count-Min Sketch: Counting
[Figure: array with one row per hash function h_1, h_2, h_3, h_4; cell values not recoverable]
- How often did we see item a?
- h_1(a) = 4, h_2(a) = 5, h_3(a) = 0, h_4(a) = 2
- Take the minimum of the corresponding values in the 2-d array. Here: 4
- Estimate is never underestimating
- Overestimation is probabilistically bounded
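The update and query steps of the Count-Min sketch can be put together as follows. This is a minimal sketch under stated assumptions, not the reference implementation from the cited paper: the class name `CountMinSketch` is made up, and the row hashes are derived from SHA-256 with the row index as a prefix, which is a convenient stand-in for the hash family the paper assumes. The parameters `h` and `r` follow the slide's notation.

```python
import hashlib


class CountMinSketch:
    """h x r counter array with one hash function per row (illustrative only)."""

    def __init__(self, h, r):
        self.h = h                                   # number of hash functions (rows)
        self.r = r                                   # range of each hash function (columns)
        self.array = [[0] * r for _ in range(h)]

    def _hash(self, j, item):
        # Assumption: prefixing the row index j gives h different hash functions.
        digest = hashlib.sha256(f"{j}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.r

    def add(self, item):
        # Arriving item a: for each j, array[j, h_j(a)]++
        for j in range(self.h):
            self.array[j][self._hash(j, item)] += 1

    def estimate(self, item):
        # Take the minimum of the corresponding counters, one per row.
        return min(self.array[j][self._hash(j, item)] for j in range(self.h))


sketch = CountMinSketch(h=4, r=16)
stream = ["a2", "a84", "a41", "a2", "a77", "a231", "a2", "a4", "a54"]
for item in stream:
    sketch.add(item)
estimate_a2 = sketch.estimate("a2")   # at least 3, the true count of "a2"
```

Collisions can only add to a counter, which is why the estimate may be too high but never too low; a larger `r` makes collisions less likely, and a larger `h` gives more counters to take the minimum over.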
19 Outlook
- Some more basics of stream processing
- A few more fundamentals of data stream mining
- Then, window-based data stream management
- Then going distributed:
  - Distributed DSMS
  - Distributed (ad-hoc) sensor networks
  - Scalable computation of massive data streams (think: Hadoop for data streams)