15.00 15.30 30 XML enabled databases. Non relational databases. Guido Rotondi

Programme of the ESTP training course on BIG DATA: EFFECTIVE PROCESSING AND ANALYSIS OF VERY LARGE AND UNSTRUCTURED DATA FOR OFFICIAL STATISTICS
Rome, 5-9 May 2014. Istat, Piazza Indipendenza 4, Room Vanoni

A laboratory approach to managing very large datasets, which are emerging as primary sources for the most up-to-date statistical processes. Students will be introduced to the appropriate use of technology for managing the ETL processes that result from collecting and loading data from large structured and unstructured sources. The course also provides a collection of methods and techniques to integrate the sources, to compare the archives against reference metadata sets, and to discover and, where possible, resolve source anomalies. Attendees will be introduced to the theoretical fundamentals underlying each methodology presented, and will finally apply them in a real implementation using innovative techniques and algorithms.

Day 1, 5 May 2014: Old and new data manipulation paradigms
9.00-9.15 (15 min) Opening
9.15-9.45 (30 min) Too big to ignore: a matter of balance. Evolution in data management; scenario
9.45-10.15 (30 min) The need for alternative computing paradigms (Antonino Virgillito)
10.15-11.00 (45 min) Classification of data sources
11.15-11.45 (30 min) The Internet of Things
11.45-12.30 (45 min) Case study: synthesising a Big Data driven framework (Diego Zardetto)
12.30-13.00 (30 min) Sharing experiences, expectations and critical aspects (Giulio Barcaroli)
13.00-13.30 (30 min) International activities on Big Data in Official Statistics (Carlo Vaccari)
14.30-15.00 (30 min) XML as integration paradigm. Service Oriented Architecture
15.00-15.30 (30 min) XML-enabled databases. Non-relational databases (Guido Rotondi)
15.45-16.15 (30 min) Handling XML sources. Non-structured XML tables
16.15-16.45 (30 min) Dealing with XSD schemas. Structured XML tables
16.45-17.15 (30 min) Merging XML data in the business process: the Resource Description Framework
17.15-17.30 (15 min) Conclusions

Day 2, 6 May 2014: A roadmap toward Big Data
9.00-9.15 (15 min) Opening
9.15-10.00 (45 min) The Map Reduce programming model (Antonino Virgillito)
10.00-11.00 (60 min) The World of Hadoop (Antonino Virgillito)
11.15-12.15 (60 min) NoSQL databases
12.15-12.45 (30 min) Robust concurrent computing architectures and the Byzantine agreement problem. Single point of control. Single point of failure
12.45-13.30 (45 min) Using Big Data technologies (part one): massive computing (Antonino Virgillito)
14.30-15.30 (60 min) Using Big Data technologies (part two): dealing with unstructured data; examples and applications
15.45-16.30 (45 min) and 16.30-17.15 (45 min) Implementing the Map Reduce programming model on a parallel-enabled database: aggregating functions. Profiling the Map Reduce model on a real enterprise infrastructure. Implementing and evaluating simple Map Reduce algorithms
17.15-17.30 (15 min) Conclusions

Day 3, 7 May 2014: Big Data in Official Statistics
9.00-9.15 (15 min) Opening
9.15-10.00 (45 min) and 10.00-11.00 (60 min) Introduction to Big Data in Official Statistics. The concept of Big Data; overview of Big Data sources. Methodological issues in using Big Data for Official Statistics (Antonino Virgillito, Giulio Barcaroli)
11.15-12.15 (60 min) IT issues in using Big Data for Official Statistics
12.15-13.30 (75 min) Using mobile phones for analyzing the mobility of city users (Antonino Virgillito)
14.30-15.30 (60 min) Improving Labor Force Survey estimates through the effective use of Google Trends
15.45-16.45 (60 min) and 16.45-17.15 (30 min) Internet as a data source: web scraping and text mining for estimating ICT usage by enterprises and public institutions. Privacy, security and safety: recipes for securing data, recipes for disclosure control, trusted computing
17.15-17.30 (15 min) Conclusions

Day 4, 8 May 2014: Improving data availability and processing efficiency
9.00-9.15 (15 min) Opening
9.15-10.00 (45 min) and 10.00-11.00 (60 min) Data location and partitioning. Indexing. Problem splitting. Actor systems. Storage virtualisation. Examples of improving data location and partitioning. Effective use of indexes
11.15-12.15 (60 min) and 12.15-13.00 (45 min) Improving database (serial) operations. Code profiling. Bulk operations. Pipelined functions. Sustained data streaming. Partition swapping. External tables for fast bulk operations. Application of a pipelined function to an ETL process. Managing changes to a large microdata set
13.00-13.30 (30 min) Quasi real-time analytics (Diego Zardetto)
14.30-15.30 (60 min) Fundamentals of parallel computing. Definitions, metrics, workload, critical aspects. Distributed vs symmetric multiprocessing
15.45-16.30 (45 min) and 16.30-17.15 (45 min) Parallel database operations. Scheduled concurrent tasks. Parallel-enabled pipelined functions. Parallel queries. Embedded relational objects, aggregating functions. Self-made parallelism vs controlled tasks. Benefits of parallel data streaming. Multipath data querying. Embedded relational objects. Design of central aggregating functions
17.15-17.30 (15 min) Conclusions

Day 5, 9 May 2014: The analysis of massive datasets
9.00-9.15 (15 min) Opening
9.15-10.15 (60 min) Geometric interpretation of data structures and the introduction of regular languages and expressions
10.15-11.00 (45 min) Getting involved with regular expressions
11.15-12.00 (45 min) Mapping techniques for studying anomalies in structured data: probabilistic ranking of event patterns
12.00-12.45 (45 min) Stochastic characterisation of unstructured data sets
12.45-13.30 (45 min) Characteristics of a Big Data analysis framework: a distributed approach
14.30-15.30 (60 min) Inference techniques used for Official Statistics, part 1 (Diego Zardetto)
15.45-16.45 (60 min) Inference techniques used for Official Statistics, part 2 (Diego Zardetto)
16.45-17.00 (15 min) Where can we go from here? Golden rules
17.00-17.30 (30 min) Final remarks (Giulio Barcaroli, Antonino Virgillito)