DataStorm: Large-Scale Data Management in Cloud Environments




DataStorm: Large-Scale Data Management in Cloud Environments
INESC-ID Data Management & Information Retrieval Group
1st DataStorm Workshop
DataStorm W01:

Outline
  1. Task H1: Data Acquisition and Information Extraction
  2. Task V4: Cultural Data Resources and Data Processing Infrastructure


Task H1: Data Acquisition and Information Extraction
Goals:
  Exploit textual information present in digital media, i.e., extract structured data from natural language text
  Do this at the terabyte scale
  And across all vertical tasks
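As a toy illustration of "extracting structured data from natural language text", the sketch below turns sentences into typed records with a single regular expression. The pattern, the `BirthFact` record, and the sample text are hypothetical and are not taken from the project; real information extraction systems rely on far richer linguistic models.

```python
import re
from typing import List, NamedTuple


class BirthFact(NamedTuple):
    """Hypothetical structured record produced by the extractor."""
    person: str
    year: int


# Assumed pattern: "<Capitalized Name> was born in <4-digit year>".
BORN_PATTERN = re.compile(
    r"((?:[A-Z][\w'-]+)(?: [A-Z][\w'-]+)*) was born in (\d{4})"
)


def extract_birth_facts(text: str) -> List[BirthFact]:
    """Scan free text and return one record per matching sentence fragment."""
    return [
        BirthFact(match.group(1), int(match.group(2)))
        for match in BORN_PATTERN.finditer(text)
    ]


sample = (
    "Fernando Pessoa was born in 1888. The poet lived in Lisbon. "
    "José Saramago was born in 1922."
)
facts = extract_birth_facts(sample)
for fact in facts:
    print(fact.person, fact.year)
```

The output is a list of records that can be loaded into a database or index, which is the step that makes free text queryable.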

Methodology (Task H1)
Focus on effectiveness
  Deal with the tradeoff between complexity and quality
  Leverage the amount of data available
Focus on efficiency
  Explore automatic optimization of extraction tasks
  Explore massive parallelization of extraction tasks
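The parallelization point can be sketched as a map step applied to each document independently, followed by a merge of the partial results. This is a minimal sketch, not the project's infrastructure: the year extractor, the corpus, and the function names are assumptions made for illustration. A thread pool is used only to keep the example small; CPU-bound extraction in CPython would normally use a process pool or a cluster framework instead.

```python
import re
from collections import Counter
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Iterable, List

# Assumed extraction task: find four-digit year mentions (1000-2099).
YEAR_PATTERN = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def extract_years(document: str) -> List[str]:
    """Map step: extract year mentions from a single document."""
    return YEAR_PATTERN.findall(document)


def count_years(documents: Iterable[str], executor: Executor) -> Counter:
    """Run the map step on every document via the executor, then merge."""
    totals: Counter = Counter()
    for years in executor.map(extract_years, documents):
        totals.update(years)  # reduce step: merge partial results
    return totals


# Hypothetical mini-corpus standing in for a large document collection.
corpus = [
    "The archive was created in 1996 and extended in 2008.",
    "A crawl from 2008 covers the national web domain.",
    "No dates here.",
]

with ThreadPoolExecutor(max_workers=4) as pool:
    year_counts = count_years(corpus, pool)

print(year_counts.most_common())
```

Because each document is processed independently, the same `count_years` function works with any `Executor`, so the degree and kind of parallelism can change without touching the extraction code.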

Participants (Task H1)
  Pável Pereira Calado; Bruno Emanuel da Graça Martins; Helena Isabel de Jesus Galhardas; Helena Sofia Andrade Nunes Pereira Pinto; José Luis Brinquete Borbinha; Mário Jorge Costa Gaspar Silva; Paula Cristina Quaresma da Fonseca Carvalho; Paulo Jorge Fernandes Carreira
  Gonçalo Fernandes Simões; Ivo Miguel da Quinta Anastácio; Luís Miguel Gomes dos Santos Reis Leitão
  + 1 BIM + 2 BIC

Schedule (Task H1)
  Horizontal task: spans the entire project


Task V4: Cultural Data Resources and Data Processing Infrastructure
Goal:
  Large-scale data analytics on Web archive collections
Focus:
  Detecting, resolving and tracking named entities in Web documents
  Extracting contextual information
  Retrieval and visualization of information

Open Problems in Task V4 (and H1)
Information Extraction and Retrieval (in general)
  Improvements to current IE/IR techniques
  Application of IE to a diverse environment such as the Web
Large-scale Information Extraction and Retrieval
  Application to large and dynamic data repositories
  Parallelization/optimization of current IE/IR algorithms

Available Tools and Techniques (Task V4)
Starting points:
  The plethora of existing IE/IR solutions
  Many existing large-scale parallelization solutions
Work within the team:
  Optimization of IE execution plans
  Application of IE/IR in several contexts (social networks, geographic information, bibliography, etc.)
  Work on Web data extraction
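The slides do not say how the team optimizes IE execution plans. As one generic illustration of the idea, the sketch below orders independent, commutative filtering steps by the classic rank cost / (1 - selectivity), which minimizes expected per-document cost when step outcomes are independent. The step names, costs, and selectivities are invented for the example.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import List, Sequence


@dataclass(frozen=True)
class Step:
    name: str
    cost: float         # hypothetical cost of running the step on one document
    selectivity: float  # assumed fraction of documents that pass, in [0, 1)


def expected_cost(plan: Sequence[Step]) -> float:
    """Expected per-document cost when steps run in the given order."""
    total, surviving = 0.0, 1.0
    for step in plan:
        total += surviving * step.cost   # only surviving documents pay
        surviving *= step.selectivity    # some documents are filtered out
    return total


def order_steps(steps: Sequence[Step]) -> List[Step]:
    """Cheap, highly selective steps first: sort by cost / (1 - selectivity)."""
    return sorted(steps, key=lambda s: s.cost / (1.0 - s.selectivity))


steps = [
    Step("relation_classifier", cost=50.0, selectivity=0.5),
    Step("language_filter", cost=1.0, selectivity=0.6),
    Step("entity_tagger", cost=10.0, selectivity=0.2),
]

plan = order_steps(steps)
# Brute-force check over all orderings, feasible only for tiny plans.
best = min(expected_cost(p) for p in permutations(steps))

print([s.name for s in plan], expected_cost(plan), best)
```

Real extraction plans usually have dependencies between operators (a relation classifier may need the entity tagger's output), so an optimizer would have to respect those constraints rather than sort freely.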

Areas of Research (Task V4)
  Continuing work on optimization of IE execution plans
  On-line machine learning algorithms (applied to IE/IR)
  Parallelization of IE algorithms
  Large-scale data visualization
  Large-scale data analysis
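To make "on-line machine learning" concrete, here is a minimal perceptron sketch that updates its weights one example at a time, so it never needs the whole data set in memory. The task (flagging tokens that look like place names), the features, and the tiny labeled stream are assumptions for illustration only; the slides do not name a specific algorithm.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]


class OnlinePerceptron:
    """Binary classifier (labels +1 / -1) trained one example at a time."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def score(self, features: Features) -> float:
        return sum(self.weights.get(name, 0.0) * value
                   for name, value in features.items())

    def predict(self, features: Features) -> int:
        return 1 if self.score(features) > 0 else -1

    def update(self, features: Features, label: int) -> bool:
        """Consume one example; change weights only on a mistake."""
        if self.predict(features) == label:
            return False
        for name, value in features.items():
            self.weights[name] = self.weights.get(name, 0.0) + label * value
        return True


def token_features(token: str) -> Features:
    """Hypothetical features for deciding whether a token is a place name."""
    return {
        "bias": 1.0,
        "capitalized": 1.0 if token[:1].isupper() else 0.0,
        "has_digit": 1.0 if any(c.isdigit() for c in token) else 0.0,
    }


# Simulated stream of (token, label) pairs arriving one by one.
stream: List[Tuple[str, int]] = [
    ("Lisbon", 1), ("archive", -1), ("Porto", 1),
    ("data", -1), ("Coimbra", 1), ("web", -1),
]

model = OnlinePerceptron()
for token, label in stream:
    model.update(token_features(token), label)

print(model.predict(token_features("Braga")), model.predict(token_features("cloud")))
```

Each update touches only the features of the current example, which is what makes this family of algorithms attractive for large and continuously growing collections.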

Beyond the Web... and beyond task V4?
  Digital libraries
  Geographic information systems
  Social networks
  Messages and communication

In sum...
  Task H1: Data Acquisition and Information Extraction
  Task V4: Cultural Data Resources and Data Processing Infrastructure
Challenges and opportunities:
  Application of IE/IR techniques to cultural data resources
  Adaptation of IE/IR techniques to large-scale data problems

Questions?