Introductory Session: Seminar Web Science and Lab Web Technologies

1 Introductory Session: Seminar Web Science and Lab Web Technologies. Knowledge-Based Systems Group / L3S Research Center, 16 October 2015

2 Seminar Web Science. Who is interested in the seminar? Please enter your name and preferred topic in the list. Contact: Procedure: together with your mentor, you search for, select, read, and discuss one or more recent conference papers. At the end of the semester you give a minute presentation (including questions) on the selected papers. In addition, you should attend the L3S research seminars, which usually take place on Friday afternoons at 2:00 PM in the Multimedia Room at Appelstraße 9A, 15th floor (about every two weeks). If you are sure that you will attend the seminar, please subscribe to the mailing list: https: //

3 Lab Web Technologies and Bachelor/Master Topics. Agenda: lab procedure; presentation of topics (see also: ); distribution of topics

4 Procedure: individual topics, worked on alone or in small groups; regular contact with the supervisor. 6 CP = 180 hours of workload, i.e., about 13 hours/week over a 14-week semester. Completion: short written documentation (max. 5 pages) and a presentation (5 minutes). Questions?

5 Helge Holzmann: Online Web-Archive-Search based on Twitter. Lab Web Technologies. JavaScript! Contact:

6 Andrea Ceroni Automatic Event Validation (Laboratory) Personal Photo Selection (Laboratory/Master) Contact:

7 Andrea Ceroni: Automatic Event Validation. Event validation: determining whether an event occurs in a document or corpus, e.g. (Kim Clijsters, Li Na, Melbourne) on [ to ]? Recently we automated event validation, proposing a supervised model that predicts the occurrence of events in a non-annotated corpus [1]. Web Laboratory goal: build a web user interface to showcase the method: specifying events, retrieving web pages, applying automatic event validation, showing results. [1] A. Ceroni, U. K. Gadiraju, M. Fisichella. Improving Event Detection by Automatically Assessing Validity of Event Occurrence in Text. In CIKM 2015.
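To make the task definition concrete, here is a minimal sketch of a naive keyword baseline for event validation. It is not the supervised model from [1]; it only assumes that an event is given as a list of keywords and that a document "contains" the event when every keyword appears in its text. The function names `event_in_document` and `event_support` are hypothetical and chosen for illustration.

```python
def event_in_document(keywords, text):
    """Return True if every event keyword occurs in the text (case-insensitive).

    Naive baseline for illustration only, not the supervised model of [1].
    """
    lowered = text.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def event_support(keywords, corpus):
    """Return the fraction of documents in the corpus that contain the event."""
    if not corpus:
        return 0.0
    hits = sum(event_in_document(keywords, doc) for doc in corpus)
    return hits / len(corpus)


if __name__ == "__main__":
    # Toy corpus, invented for this example.
    event = ["Kim Clijsters", "Li Na", "Melbourne"]
    corpus = [
        "Kim Clijsters beat Li Na in the final in Melbourne.",
        "Li Na trained in Melbourne ahead of the tournament.",
    ]
    print(event_support(event, corpus))
```

A web interface for the lab project could call a function like `event_support` on the retrieved pages and then swap the baseline for the real model behind the same interface.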

8 Andrea Ceroni: Personal Photo Selection (1). Photo taking is effortless and tolerated nearly everywhere, so we end up with hundreds of photos taken during one event (e.g. a holiday trip). Recently, we proposed a method to automatically select the most important photos from personal collections [1], to keep them enjoyable and accessible. Web Laboratory, Goal 1: build a web user interface to showcase the method: importing photo collections, performing automatic selections, allowing the user to revise the selection. Web Laboratory, Goal 2: build a web user interface to acquire labeled data: uploading photo collections, browsing them, **manually** selecting important photos. Master Thesis: if interested, we can think about a possible thesis on the topic. [1] A. Ceroni, V. Solachidis, C. Niederée, O. Papadopoulou, N. Kanhabua, V. Mezaris. To Keep or not to Keep: An Expectation-oriented Photo Selection Method for Personal Photo Collections. In ICMR 2015.

9 Andrea Ceroni: Personal Photo Selection (2). Recent work has addressed automatically attaching tags and textual captions to images using deep learning [1,2]. This information can be incorporated into the selection model. Some code has been made publicly available by Stanford University [3]. Web Laboratory goals: - re-use the available code as a black box - integrate the code into the selection model. [1] A. Karpathy, F. Li. Deep Visual-Semantic Alignments for Generating Image Descriptions. In CoRR. [2] J. Fu, T. Mei, K. Yang, H. Lu, Y. Rui. Tagging Personal Photos with Transfer Deep Learning. In WWW. [3]

10 Ujwal Gadiraju: Inducing competence-based self-selection of microtasks in the Crowd. ABSTRACT: One of the primary concerns in paid microtask crowdsourcing systems is the quality and reliability of the results produced. In this work, we aim to improve the effectiveness of the crowdsourcing paradigm (in terms of the quality of results produced and the turnover time of the task) by inducing competence-based self-selection of microtasks among crowd workers. This means that workers would only work on the tasks that they believe they can successfully complete based on their competence/skills. VISION: To ensure that workers refrain from participating in microtasks that are beyond their competence, they first need to be aware of their limitations. By providing workers with an assessment of their competence in particular microtasks, we hypothesize that workers can better select microtasks that suit their competence. Crowdsourcing marketplaces can greatly benefit from this by training their workforce to progress towards higher competence and improved reputation. This in turn would help workers qualify for a larger spectrum of tasks, resulting in a greater turnover for workers. Laboratory or Bachelor/Master thesis. Contact: gadiraju@l3s.de

11 Gerhard Gossen Evaluation of Crawler Queues Laboratory Web Technologies or Master Thesis Contact:

12 Asmelash Teka Hadgu: Web Application for Exploring Scholarly Communication; Ranking Scholarly Articles. Lab Web Technologies. Contact:

13 Asmelash Teka Hadgu: Web Application for Exploring Scholarly Communication. With the sheer number of scientific publications coming out these days, it is easy to miss relevant ones, because tracking what is being published means visiting many different digital libraries. The aim of this project is to design an adaptive web application that makes exploring scientific articles (mainly titles, abstracts, authors, Twitter mentions, etc.) an enjoyable experience. Core functionalities: browse recent/popular/seminal scientific articles; show related entities (abstract, author(s), venue, related articles); browse publications by user; browse tweets mentioning articles; explore other related tweets; provide a REST API for other apps to leverage the data. Preferred tools: Python, JavaScript (D3.js), HTML5. What's in it for you? Accelerated learning through coding; potential for a large-scale application; demo publication.

14 Asmelash Teka Hadgu: Ranking Scholarly Articles. The goal of this project is to compute the query-independent importance of scholarly articles using a huge academic graph. In particular, we're interested in developing novel methods that give the best static rank values (e.g., better than PageRank) for scientific articles in a machine learning-to-rank framework, by generating features that distinguish good-quality from poor-quality papers. Examples: consider the reputation of venues; weighted citations; leveraging social signals (mentions on Twitter). Tools: Python, R, or Scala; Spark is a plus. What's in it for you? Dealing with web-scale academic graph data; potential for a publication.
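As a point of reference for the PageRank baseline mentioned on this slide, the following is a minimal pure-Python sketch of PageRank on a citation graph. It assumes the graph is given as a dictionary mapping each paper to the papers it cites; the function name `pagerank`, the damping factor of 0.85, and the fixed iteration count are illustrative choices, not part of the project description. The project itself aims at learned rankings that improve on this kind of static score.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Compute PageRank scores by power iteration.

    citations: dict mapping a paper id to the list of paper ids it cites.
    Papers that cite nothing (dangling nodes) spread their score evenly
    over all papers, so the scores always sum to 1.
    """
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {paper: 1.0 / n for paper in nodes}

    for _ in range(iterations):
        # Every paper starts with the teleportation share.
        new_rank = {paper: (1.0 - damping) / n for paper in nodes}
        dangling = 0.0
        for paper in nodes:
            cited = citations.get(paper, [])
            if cited:
                share = damping * rank[paper] / len(cited)
                for target in cited:
                    new_rank[target] += share
            else:
                dangling += damping * rank[paper]
        # Redistribute the score of dangling papers uniformly.
        for paper in nodes:
            new_rank[paper] += dangling / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Toy citation graph, invented for this example: A and B both cite C.
    toy_graph = {"A": ["C"], "B": ["C"], "C": []}
    for paper, score in sorted(pagerank(toy_graph).items()):
        print(paper, round(score, 3))
```

Features such as venue reputation, citation weights, or Twitter mentions could then be combined with a score like this one as inputs to a learning-to-rank model.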

15 Christoph Hube social networks and dynamic graphs (Laboratory/Bachelor/Master) event detection and prediction using stream data for the financial domain (Laboratory/Bachelor/Master) Contact:

16 Christoph Hube: QualiMaster. European project, runs , qualimaster.eu. Real-time stream data analysis (esp. in the financial domain). Example tasks: implement an algorithm for event detection/event prediction (lab project); create an application for dynamic graphs (Bachelor, Master, ITIS). Contact: hube@l3s.de QualiMaster Project, GA Meeting, September

17 Robert Jäschke: Topics for the Lab Web Technologies related to BibSonomy: 1. Distributed batch infrastructure for user statistics (Java) 2. Integration with Jekyll-Scholar (Ruby) 3. Add-on for Google Docs (JavaScript) 4. Import of publication metadata from ORCID (Java). Bachelor or Master thesis: 1. Temporal classification of archived web pages, see Contact:

18 Robert Jäschke

19 Philipp Kemkes, Ivana Marenzi, Zeon Trevor Fernando: Integration of an online text editor (like Etherpad) into Learnweb. Laboratory project or Bachelor/Master thesis. The aim of this project/thesis is to replace Google Docs in Learnweb. We are looking for a feature-rich text editor that allows real-time collaboration and extensive logging. First, the student should compare existing open source solutions [1]. Then the selected software must be integrated into Learnweb [2]; this includes a single sign-on mechanism, an interface for the creation and deletion of documents, and aggregation of usage logs. Optional topic extension: implementation of a general solution to integrate collaboration tools based on sandstorm.io. [1] [2] Contact: kemkes@l3s.de, marenzi@l3s.de, fernando@l3s.de

20 Philipp Kemkes, Ivana Marenzi, Zeon Trevor Fernando

21 Philipp Kemkes, Ivana Marenzi, Zeon Trevor Fernando: LearnWeb Project, Integration of an Open Source Feature-Rich Text Editor. Editor features: real-time collaboration support; logging capabilities; ease of integration; ease of maintenance; single sign-on mechanism; creation and deletion of documents. Etherpad. Contact: Dr Ivana Marenzi

22 Tuan Tran: Scalable Ad-hoc Entity Linking of Wikipedia Revision Using Apache Spark and Hedera. This project aims at building a large-scale system that can efficiently extract entities from the Wikipedia Revision History Dataset, using the Apache Spark and Hedera frameworks. The student(s) will implement state-of-the-art entity linking algorithms and apply them at large scale (650 GB in .bz2 compression), taking into account time constraints and memory requirements. These algorithms typically rely on graph-based methods, optimizing the coherence between nodes as well as the evolution of their attributes as information is updated in Wikipedia over time. The student(s) will have an opportunity to gain hands-on experience with big data experts and to work with the big computing cluster at L3S. The project will benefit several L3S projects (Alexandria, Eumssi, etc.) and will target a scientific publication at the beginning of next summer. Contact: ttran@l3s.de

23 Tu Ngoc Nguyen ehumanities Toolbox web laboratory or ITIS project Contact:

24 Overview (L = Laboratory, B = Bachelor, M = Master)
Andrea Ceroni (ceroni@l3s.de): 1. Automatic Event Validation (L) 2. Personal Photo Selection (L, M)
Ujwal Gadiraju (gadiraju@l3s.de): Competence in Crowd Microtasks (L, B, M)
Asmelash Teka Hadgu (teka@l3s.de): 1. Web Application for Exploring Scholarly Communication (L) 2. Ranking Scholarly Articles (L)
Helge Holzmann (holzmann@l3s.de): Online Web-Archive-Search with Twitter (L)
Christoph Hube (hube@l3s.de): 1. Social networks and dynamic graphs (L, B, M) 2. Event detection and prediction with streams for financial data (L, B, M)
Robert Jäschke (jaeschke@l3s.de): 1. Distributed batch infrastructure for user statistics (L) 2. BibSonomy integration for Jekyll-Scholar (L) 3. BibSonomy add-on for Google Docs (L) 4. Import of publication metadata from ORCID (L) 5. Temporal classification of archived web pages (B, M)
Philipp Kemkes, Ivana Marenzi, Zeon Trevor Fernando (kemkes@l3s.de): 1. Integration of an online text editor into Learnweb (L, B, M)
Tu Ngoc Nguyen (tunguyen@l3s.de): ehumanities Toolbox (L, M)
Gerhard Gossen (gossen@l3s.de): Evaluation of Crawler Queues (L, M)
Tuan Tran (ttran@l3s.de): 1. Scalable Ad-hoc Entity Linking of Wikipedia Revision (M)

25 Distribution of Topics: short overview by show of hands. If a topic is oversubscribed, please come to an agreement among yourselves. Enter in the list: chosen topic + alternative topic. Contact the supervisor on your own initiative. Deadline: Friday,


Deploy. Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Friction-free self-service BI solutions for everyone Scalable analytics on a modern architecture Apps and data source extensions with APIs Future white label, embed or integrate Power BI Deploy Intelligent

More information

Scholarly Use of Web Archives

Scholarly Use of Web Archives Scholarly Use of Web Archives Helen Hockx-Yu Head of Web Archiving British Library 15 February 2013 Web Archiving initiatives worldwide http://en.wikipedia.org/wiki/file:map_of_web_archiving_initiatives_worldwide.png

More information

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1

Spark ΕΡΓΑΣΤΗΡΙΟ 10. Prepared by George Nikolaides 4/19/2015 1 Spark ΕΡΓΑΣΤΗΡΙΟ 10 Prepared by George Nikolaides 4/19/2015 1 Introduction to Apache Spark Another cluster computing framework Developed in the AMPLab at UC Berkeley Started in 2009 Open-sourced in 2010

More information

Connecting Basic Research and Healthcare Big Data

Connecting Basic Research and Healthcare Big Data Elsevier Health Analytics WHS 2015 Big Data in Health Connecting Basic Research and Healthcare Big Data Olaf Lodbrok Managing Director Elsevier Health Analytics o.lodbrok@elsevier.com t +49 89 5383 600

More information

Google Apps Powered by Import House IT Solutions

Google Apps Powered by Import House IT Solutions Google Apps Powered by Import House IT Solutions Business challenges are changing Information overload Volume of information is increasing radically Sort, file, and find is a broken paradigm Collaboration

More information

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013

Petabyte Scale Data at Facebook. Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Petabyte Scale Data at Facebook Dhruba Borthakur, Engineer at Facebook, SIGMOD, New York, June 2013 Agenda 1 Types of Data 2 Data Model and API for Facebook Graph Data 3 SLTP (Semi-OLTP) and Analytics

More information

Big Data Analytics. Lucas Rego Drumond

Big Data Analytics. Lucas Rego Drumond Big Data Analytics Lucas Rego Drumond Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 36 Outline

More information

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia

Unified Big Data Processing with Apache Spark. Matei Zaharia @matei_zaharia Unified Big Data Processing with Apache Spark Matei Zaharia @matei_zaharia What is Apache Spark? Fast & general engine for big data processing Generalizes MapReduce model to support more types of processing

More information

Creation of Focused Web Archives for Scientists

Creation of Focused Web Archives for Scientists Creation of Focused Web Archives for Scientists, Thomas Risse and Gerhard Gossen L3S Research Center, Hannover, Germany ALEXANDRIA Workshop 15 / 16 September 2014 Hannover 15.09.2014 1 Web Archiving Web

More information

Seminar Algorithms of the Internet

Seminar Algorithms of the Internet HEINZ NIXDORF INSTITUT Seminar Algorithms of the Internet 2004-04-19 1 Motivation The Internet is the public global wide-area interconnection network for computers grows exponentially evolves The evolution

More information

Search in BigData2 - When Big Text meets Big Graph 1. Introduction State of the Art on Big Data

Search in BigData2 - When Big Text meets Big Graph 1. Introduction State of the Art on Big Data Search in BigData 2 - When Big Text meets Big Graph Christos Giatsidis, Fragkiskos D. Malliaros, François Rousseau, Michalis Vazirgiannis Computer Science Laboratory, École Polytechnique, France {giatsidis,

More information

Collaborative Open Market to Place Objects at your Service

Collaborative Open Market to Place Objects at your Service Collaborative Open Market to Place Objects at your Service D6.2.1 Developer SDK First Version D6.2.2 Developer IDE First Version D6.3.1 Cross-platform GUI for end-user Fist Version Project Acronym Project

More information

Extended Abstract Advancement through technology? The analysis of journalistic online-content by using automated tools 1

Extended Abstract Advancement through technology? The analysis of journalistic online-content by using automated tools 1 Extended Abstract Advancement through technology? The analysis of journalistic online-content by using automated tools 1 Jörg Haßler, Marcus Maurer & Thomas Holbach 1. Introduction Without any doubt, the

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015

Hadoop MapReduce and Spark. Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Hadoop MapReduce and Spark Giorgio Pedrazzi, CINECA-SCAI School of Data Analytics and Visualisation Milan, 10/06/2015 Outline Hadoop Hadoop Import data on Hadoop Spark Spark features Scala MLlib MLlib

More information

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization

IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource Virtualization 2011 International Conference on Information and Electronics Engineering IPCSIT vol.6 (2011) (2011) IACSIT Press, Singapore IMAV: An Intelligent Multi-Agent Model Based on Cloud Computing for Resource

More information

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group

MLg. Big Data and Its Implication to Research Methodologies and Funding. Cornelia Caragea TARDIS 2014. November 7, 2014. Machine Learning Group Big Data and Its Implication to Research Methodologies and Funding Cornelia Caragea TARDIS 2014 November 7, 2014 UNT Computer Science and Engineering Data Everywhere Lots of data is being collected and

More information

Course Catalogue. Masters Programme Human-Computer Interaction (MINF-M-120-MCI) 120 credit points

Course Catalogue. Masters Programme Human-Computer Interaction (MINF-M-120-MCI) 120 credit points Course Catalogue Masters Programme Human-Computer Interaction (MINF-M-120-MCI) 120 credit points According to the Examination Regulations from 25.09.2012 Version(2014/12/18) 1 About the Programme of Studies

More information

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney

Introduction to Hadoop. New York Oracle User Group Vikas Sawhney Introduction to Hadoop New York Oracle User Group Vikas Sawhney GENERAL AGENDA Driving Factors behind BIG-DATA NOSQL Database 2014 Database Landscape Hadoop Architecture Map/Reduce Hadoop Eco-system Hadoop

More information

Chapter 7. Using Hadoop Cluster and MapReduce

Chapter 7. Using Hadoop Cluster and MapReduce Chapter 7 Using Hadoop Cluster and MapReduce Modeling and Prototyping of RMS for QoS Oriented Grid Page 152 7. Using Hadoop Cluster and MapReduce for Big Data Problems The size of the databases used in

More information

Internship Opportunities Xerox Research Centre India (XRCI), Bangalore Analytics Research Group

Internship Opportunities Xerox Research Centre India (XRCI), Bangalore Analytics Research Group Analytics Research Group The Analytics Research Group in Xerox Research Centre India (XRCI) is seeking bright Undergraduate, Masters and PhD students for research internships to participate in exciting

More information

PaaS - Platform as a Service Google App Engine

PaaS - Platform as a Service Google App Engine PaaS - Platform as a Service Google App Engine Pelle Jakovits 14 April, 2015, Tartu Outline Introduction to PaaS Google Cloud Google AppEngine DEMO - Creating applications Available Google Services Costs

More information

Streamlining the Process of Business Intelligence with JReport

Streamlining the Process of Business Intelligence with JReport Streamlining the Process of Business Intelligence with JReport An ENTERPRISE MANAGEMENT ASSOCIATES (EMA ) Product Summary from 2014 EMA Radar for Business Intelligence Platforms for Mid-Sized Organizations

More information

SOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901.

SOA, case Google. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901. Faculty of technology management 07.12.2009 Information Technology Service Oriented Communications CT30A8901 SOA, case Google Written by: Sampo Syrjäläinen, 0337918 Jukka Hilvonen, 0337840 1 Contents 1.

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Quick History of Data Management 4000 B C Manual recording From tablets to papyrus to paper A. Payberah 2014 2

More information

SECURITY AND REGULATORY COMPLIANCE OVERVIEW

SECURITY AND REGULATORY COMPLIANCE OVERVIEW Powering Cloud IT SECURITY AND REGULATORY COMPLIANCE OVERVIEW Executive Summary BetterCloud provides critical insights, automated management, and intelligent data security for cloud office platforms. As

More information

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services

Data Analytics at NERSC. Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services Data Analytics at NERSC Joaquin Correa JoaquinCorrea@lbl.gov NERSC Data and Analytics Services NERSC User Meeting August, 2015 Data analytics at NERSC Science Applications Climate, Cosmology, Kbase, Materials,

More information

Toward a community enhanced programming education

Toward a community enhanced programming education Toward a community enhanced programming education Ryo Suzuki University of Tokyo Tokyo, Japan 1253852881@mail.ecc.utokyo.ac.jp Permission to make digital or hard copies of all or part of this work for

More information

the missing log collector Treasure Data, Inc. Muga Nishizawa

the missing log collector Treasure Data, Inc. Muga Nishizawa the missing log collector Treasure Data, Inc. Muga Nishizawa Muga Nishizawa (@muga_nishizawa) Chief Software Architect, Treasure Data Treasure Data Overview Founded to deliver big data analytics in days

More information

Google Cloud Data Platform & Services. Gregor Hohpe

Google Cloud Data Platform & Services. Gregor Hohpe Google Cloud Data Platform & Services Gregor Hohpe All About Data We Have More of It Internet data more easily available Logs user & system behavior Cheap Storage keep more of it 3 Beyond just Relational

More information

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract

W H I T E P A P E R. Deriving Intelligence from Large Data Using Hadoop and Applying Analytics. Abstract W H I T E P A P E R Deriving Intelligence from Large Data Using Hadoop and Applying Analytics Abstract This white paper is focused on discussing the challenges facing large scale data processing and the

More information

Cymon.io. Open Threat Intelligence. 29 October 2015 Copyright 2015 esentire, Inc. 1

Cymon.io. Open Threat Intelligence. 29 October 2015 Copyright 2015 esentire, Inc. 1 Cymon.io Open Threat Intelligence 29 October 2015 Copyright 2015 esentire, Inc. 1 #> whoami» Roy Firestein» Senior Consultant» Doing Research & Development» Other work include:» docping.me» threatlab.io

More information

Exploring Big Data in Social Networks

Exploring Big Data in Social Networks Exploring Big Data in Social Networks virgilio@dcc.ufmg.br (meira@dcc.ufmg.br) INWEB National Science and Technology Institute for Web Federal University of Minas Gerais - UFMG May 2013 Some thoughts about

More information

Concept and Project Objectives

Concept and Project Objectives 3.1 Publishable summary Concept and Project Objectives Proactive and dynamic QoS management, network intrusion detection and early detection of network congestion problems among other applications in the

More information

A Close Look at Drupal 7

A Close Look at Drupal 7 smart. uncommon. ideas. A Close Look at Drupal 7 Is it good for your bottom line? {WEB} MEADIGITAL.COM {TWITTER} @MEADIGITAL {BLOG} MEADIGITAL.COM/CLICKOSITY {EMAIL} INFO@MEADIGITAL.COM Table of Contents

More information