Teamproject & Project




Teamproject & Project
2 November 2010
Organizer: Prof. Dr. Georg Lausen
Advisors: Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung
dbis

Project Requirements
Study regulations
  Master: 16 ECTS, 480 hours, ~34 h/week per person
  Bachelor: 6 ECTS, 180 hours, ~13 h/week per person
Team size: 3-4 students
Project report
Final presentation
Workload of every student must be clearly distinguishable

Schedule
Time & place: Monday, 14:00-17:00 (c.t.), Room SR 00 007 (MMR), Building 106
Next meeting: Monday, 8 November 2010, 14:00-17:00 (c.t.), Room SR 00 007 (MMR), Building 106
Further schedule
  Meeting with short presentations of all groups
  Further individual meetings upon consultation

Induction phase
  Until Tuesday, 2 November 2010 (today!): project assignment and formation of groups
  Short presentation, 8 November 2010: project introduction, milestones, internal work-sharing
Implementation phase
  Programming & documentation
  10 / 17 January 2011: status report on the milestones (meeting or presentation)
Final presentation: 7 February 2011
Submission of project report: 14 February 2011

Bachelor

Facebook (2010) (1)
  > 500 million active users saving profiles, pictures, comments, news
  > 900 million pages, groups, events, ...
  Usage: > 700 billion minutes per month
How can we handle and analyze such huge amounts of data?
Solution: distributed computing?
Source: (1) Facebook Press Room (22.09.2010), http://www.facebook.com/press/info.php?statistics

Analysis of social networks
Query language with navigational capabilities
  Friend of a Friend (FOAF) queries
  Search for shortest paths
Means of expression
  Start node (e.g. Chris)
  Specification of edges (location steps) (e.g. knows)
  Filter (e.g. age = 18, gender = female)
  Shortest paths

Data basis
  Graph of a social network (Last.fm)
  Friendship relationships
  Properties of users and tracks
Analyzing interesting characteristics
  Six Degrees of Separation
Representation: RDF-like triples (no explicit URIs)
Last.fm: music service with a social network (friendships, tracks, listening profiles, and much more)

Peter knows Simon
Peter knows Frank
Peter country DE
Peter age 25
Frank knows Chris
Frank age 8
...
[Figure: example graph with the nodes Peter, Frank, Chris, Simon, Klaus, the edge labels knows, country, age, and the values 8, 25, 42, 60, DE, CH]
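The triples above can be held as plain (subject, predicate, object) tuples. The following is a minimal sketch, not the project's storage format; the names `TRIPLES` and `build_index` are made up for illustration, and the entries hidden behind the slide's "..." are left out.

```python
from collections import defaultdict

# Triples as listed on the slide; anything behind the "..." is not included.
TRIPLES = [
    ("Peter", "knows", "Simon"),
    ("Peter", "knows", "Frank"),
    ("Peter", "country", "DE"),
    ("Peter", "age", "25"),
    ("Frank", "knows", "Chris"),
    ("Frank", "age", "8"),
]

def build_index(triples):
    """Group objects by (subject, predicate) so one location step is a lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index

index = build_index(TRIPLES)
print(index[("Peter", "knows")])  # ['Simon', 'Frank']
```

Indexing by (subject, predicate) is only one possible choice; it suits queries that start at a known node and follow a named edge.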

Peter :: knows.
Result
  Peter (knows) Frank
  Peter (knows) Simon
  Peter (knows) Klaus

Peter :: knows > knows > age.
Intermediate results
  Peter (knows) Frank
  Peter (knows) Klaus
  Peter (knows) Simon

Peter :: knows > knows > age.
Results
  Peter (knows) Frank (knows) Chris (age) 60
  Peter (knows) Klaus (knows) Simon (age) 42
  Peter (knows) Simon
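A multi-step query such as the one above can be evaluated by extending every partial path by one edge per location step. The sketch below assumes an edge set reconstructed from the triples and results shown on the slides; `evaluate_path` is a hypothetical helper and runs in memory, without Hadoop.

```python
# Edge set assumed from the triples and results shown on the slides.
TRIPLES = [
    ("Peter", "knows", "Frank"), ("Peter", "knows", "Klaus"),
    ("Peter", "knows", "Simon"), ("Frank", "knows", "Chris"),
    ("Klaus", "knows", "Simon"),
    ("Peter", "age", "25"), ("Frank", "age", "8"),
    ("Chris", "age", "60"), ("Simon", "age", "42"),
    ("Peter", "country", "DE"), ("Frank", "country", "DE"),
    ("Klaus", "country", "CH"), ("Simon", "country", "CH"),
]

def evaluate_path(triples, start, steps):
    """Follow each edge label in `steps` in order and return the complete paths."""
    paths = [[start]]
    for label in steps:
        extended = []
        for path in paths:
            for subject, predicate, obj in triples:
                if subject == path[-1] and predicate == label:
                    extended.append(path + [label, obj])
        paths = extended  # paths with no matching edge are dropped here
    return paths

results = evaluate_path(TRIPLES, "Peter", ["knows", "knows", "age"])
for path in results:
    print(" ".join(path))
```

In this sketch the partial path Peter (knows) Simon is dropped at the second step, because the assumed data contains no outgoing knows edge for Simon.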

Peter :: knows > country [equals(de)].
Intermediate results
  Peter (knows) Frank
  Peter (knows) Klaus
  Peter (knows) Simon

Peter :: knows > country [equals(de)].
Result
  Peter (knows) Frank (country) DE
  Peter (knows) Klaus (country) CH
  Peter (knows) Simon (country) CH
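One way to read the filter is as a predicate applied to the value reached by the last location step. The sketch below assumes that reading, and also assumes that equals compares case-insensitively, since the query writes de while the data uses DE. The helpers `step` and `equals` are illustrative names, not part of the query language definition.

```python
# Edge set assumed from the triples and results shown on the slides.
TRIPLES = [
    ("Peter", "knows", "Frank"), ("Peter", "knows", "Klaus"),
    ("Peter", "knows", "Simon"),
    ("Frank", "country", "DE"), ("Klaus", "country", "CH"),
    ("Simon", "country", "CH"),
]

def step(triples, paths, label):
    """Extend every path by all edges with the given label."""
    return [
        path + [label, obj]
        for path in paths
        for subject, predicate, obj in triples
        if subject == path[-1] and predicate == label
    ]

def equals(expected):
    """Assumed filter semantics: case-insensitive comparison with the last value."""
    return lambda value: value.lower() == expected.lower()

candidates = step(TRIPLES, step(TRIPLES, [["Peter"]], "knows"), "country")
matches = [path for path in candidates if equals("de")(path[-1])]
print(matches)  # [['Peter', 'knows', 'Frank', 'country', 'DE']]
```

Under these assumptions, all three candidate paths are computed first and only the path ending in DE passes the filter.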

Peter :: knows(*3).
Results
  Peter (knows) Frank
  Peter (knows) Frank (knows) Chris
  Peter (knows) Klaus
  Peter (knows) Simon
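The result list above is consistent with reading knows(*3) as "up to three knows steps, keeping one shortest path per reached node"; this reading is an assumption, as the slide does not define the operator. Under that assumption, a breadth-first search with a hop limit reproduces the list. `bounded_reach` is a hypothetical helper.

```python
from collections import deque

# "knows" edges assumed from the results shown on the slides.
KNOWS = [
    ("Peter", "Frank"), ("Peter", "Klaus"), ("Peter", "Simon"),
    ("Frank", "Chris"), ("Klaus", "Simon"),
]

def bounded_reach(edges, start, max_hops):
    """Breadth-first search: shortest path to every node within max_hops edges."""
    shortest = {start: [start]}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        path = shortest[node]
        if len(path) - 1 == max_hops:
            continue  # hop limit reached, do not expand further
        for source, target in edges:
            if source == node and target not in shortest:
                shortest[target] = path + [target]
                queue.append(target)
    del shortest[start]  # the start node itself is not a result
    return shortest

reached = bounded_reach(KNOWS, "Peter", 3)
for node, path in reached.items():
    print(node, "via", " -> ".join(path))
```

Because breadth-first search visits nodes in order of distance, Simon is recorded through the direct edge from Peter and the longer route through Klaus is never stored.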

[Figure: Last.fm data schema with the entities User, Track, Album, Artist and the properties Realname, Gender, Country, Age, Duration, Playcount]

User
  knows: 3814884
  topartists: 13101056
  toptracks: 13264340
  topalbums: 13130619
  listenedto: 49830975
  country: 180056
  playcount: 269247
  realname: 144040
  gender: 268310
  age: 153366
Album
  artist: 269900
  tracks: 181624
  playcount: 240347
Track
  artist: 271237
  album: 181624
  topfans: 8970531
  duration: 271205
  playcount: 271205
Artist
  tracks: 271234
  album: 269898
  topfans: 7439313
  toptracks: 7728443
  topalbums: 936749
  similar: 59471056
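If the figures above are read as the number of triples per property (an assumption; the slide does not label them), such statistics could be derived by counting predicates. The sketch below uses a handful of example triples rather than the Last.fm data, and `count_by_predicate` is an illustrative name.

```python
from collections import Counter

# Small illustrative sample, not the Last.fm data set.
SAMPLE = [
    ("Peter", "knows", "Simon"),
    ("Peter", "knows", "Frank"),
    ("Frank", "knows", "Chris"),
    ("Peter", "country", "DE"),
    ("Peter", "age", "25"),
    ("Frank", "age", "8"),
]

def count_by_predicate(triples):
    """Count how many triples use each predicate."""
    return Counter(predicate for _, predicate, _ in triples)

counts = count_by_predicate(SAMPLE)
for predicate, number in counts.most_common():
    print(f"{predicate}: {number}")
```

On a cluster the same computation has the shape of a word count: the map step emits each predicate with a count of one, and the reduce step sums the counts per predicate.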

Handling Large RDF Graphs with MapReduce
[Figure: a query enters the query engine (parser/interpreter), which produces a sequence of MapReduce jobs; these are evaluated on the MapReduce framework over HDFS]

(1) Distributed analysis of social networks
  Parsing of a path query language
  Translating a query into a sequence of MapReduce jobs
  Storing (intermediate) results in the cluster
  Execution of MapReduce jobs with Hadoop
(2) Means of expression
  Start node
  Multiple location steps
  Filter
  Shortest paths
(3) Gain experience in handling MapReduce, HDFS, ...
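The translation of a query into a sequence of MapReduce jobs can be sketched as one join job per location step. The code below is an in-memory simulation in plain Python and does not use the Hadoop API. The function names, the PATH/EDGE tags, and the edge set (reconstructed from the slides) are assumptions made for illustration, and this is only one possible translation.

```python
from collections import defaultdict

# Edge set assumed from the triples and results shown on the slides.
TRIPLES = [
    ("Peter", "knows", "Frank"), ("Peter", "knows", "Klaus"),
    ("Peter", "knows", "Simon"), ("Frank", "knows", "Chris"),
    ("Klaus", "knows", "Simon"),
    ("Peter", "age", "25"), ("Frank", "age", "8"),
    ("Chris", "age", "60"), ("Simon", "age", "42"),
]

def map_phase(paths, triples, label):
    """Key partial paths by their last node and matching edges by their subject."""
    for path in paths:
        yield path[-1], ("PATH", path)
    for subject, predicate, obj in triples:
        if predicate == label:
            yield subject, ("EDGE", obj)

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, label):
    """Join: extend every path at a node by every matching edge leaving that node."""
    for key in sorted(groups):
        partial = [value for tag, value in groups[key] if tag == "PATH"]
        targets = [value for tag, value in groups[key] if tag == "EDGE"]
        for path in partial:
            for target in targets:
                yield path + (label, target)

def run_query(triples, start, steps):
    paths = [(start,)]  # intermediate results carried from job to job
    for label in steps:  # one simulated MapReduce job per location step
        paths = list(reduce_phase(shuffle(map_phase(paths, triples, label)), label))
    return paths

job_output = run_query(TRIPLES, "Peter", ["knows", "knows", "age"])
for path in job_output:
    print(" ".join(path))
```

In a real cluster the list `paths` would correspond to intermediate results stored between jobs, for example in HDFS.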

Master
(Master) Team Project

Study regulations
  Master: 16 ECTS, 480 hours, ~34 h/week per person
  Recommendation: no parallel lectures!
Team size: 3-4 students
Project report
Final presentation

Self-directed investigation of and induction into the required topics:
  Triple stores
  SPARQL
  Resource Description Framework (RDF)
  MapReduce
  Hadoop Distributed File System (HDFS)
  HBase
Workload of every student must be clearly distinguishable

Goal
  Design and implementation of a distributed RDF triple store built on top of Hadoop (MapReduce framework)
General conditions
  SPARQL as query language (at least basic graph patterns + filter)
  Execution in the MapReduce framework (Hadoop)
  Storage strategy using HDFS or HBase
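To make the minimum requirement concrete, the sketch below evaluates a basic graph pattern with a filter over in-memory triples. It is not a SPARQL parser and uses neither Hadoop nor HBase. The patterns correspond roughly to `?x knows ?y . ?y age ?a FILTER(?a > 18)`, and the data and the helper names `match_pattern` and `evaluate_bgp` are assumptions made for illustration.

```python
# Example data assumed from the Bachelor slides; values are stored as strings.
TRIPLES = [
    ("Peter", "knows", "Frank"), ("Peter", "knows", "Klaus"),
    ("Peter", "knows", "Simon"), ("Frank", "knows", "Chris"),
    ("Klaus", "knows", "Simon"),
    ("Peter", "age", "25"), ("Frank", "age", "8"),
    ("Chris", "age", "60"), ("Simon", "age", "42"),
]

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it or check the existing binding
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:  # constant: must match exactly
            return None
    return extended

def evaluate_bgp(triples, patterns, filters=()):
    """Join the triple patterns one by one, then apply the filters."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return [b for b in bindings if all(check(b) for check in filters)]

patterns = [("?x", "knows", "?y"), ("?y", "age", "?a")]
adults = evaluate_bgp(TRIPLES, patterns, [lambda b: int(b["?a"]) > 18])
for row in adults:
    print(row["?x"], "knows", row["?y"], "aged", row["?a"])
```

Each additional triple pattern is a join on the shared variables, which is the part a MapReduce-based store would have to distribute across jobs.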

Now
  Group assignment: 3-4 students per team
  Exchange contact information (e-mail, phone)
  Plan your next team-internal meeting (as soon as possible)
Until next meeting (8 November)
  Get to know the project task
  Investigation and conceptual design
  Determine milestones and schedule (recommendation: ~3 milestones)
  Plan internal work-sharing (according to individual skills)
8 November
  Short presentation of all groups (5-10 minutes)
  Content: project introduction, milestones, and internal work-sharing (perhaps an overview of the planned architecture)
Next meeting
  Monday, 8 November 2010, 14:00-17:00 (c.t.), Room SR 00 007 (MMR), Building 106