Leveraging Big Data Technologies to Support Research in Unstructured Data Analytics




By Françoys Labonté, General Manager
June 16, 2015
Major financial partner
WWW.CRIM.CA

ABOUT CRIM
- Applied research centre in IT
- Dual mission:
  - Provide expertise in IT to support enterprises and organisations in developing innovative products and solutions
  - Contribute to the creation of new knowledge through scientific activities and publications

THREE MAJOR AREAS OF EXPERTISE
1. INTERACTION AND HUMAN-SYSTEMS INTERFACES
   - Voice, movement, emotions
   - Augmented reality
   - User activity-related aspects
2. ADVANCED DATA ANALYTICS
   - Analysis and processing of video, imagery, audio, text
   - Semantics, natural language processing
   - Geospatial imaging
3. ADVANCED ARCHITECTURES AND TECHNOLOGIES FOR DEVELOPMENT AND TESTING
   - Client / cloud / mobile architectural approaches
   - Test modeling and automation
   - Code generation, model inference
   - Development, test and technological management methodologies

THE BIG DATA HYPE
[Figure from Gartner]
At CRIM, for many years:
- Volume: we have been dealing with large data sets (videos, satellite imagery, large text corpora)
- Variety: we have been processing multi-modal data sets (text, images, audio, video)
- Velocity: we have been analyzing continuous data streams (surveillance)
- Visualisation: we have been investigating and developing human-machine interfaces
- Value (actionable items): we have been developing intelligent decision-support systems
SO WHAT IS IT ALL ABOUT?

BIG DATA TECHNOLOGIES
Open up new possibilities to solve complex problems in much simpler ways than before
- Hadoop and related technologies:
  - No limitation on computing resources
  - No need to worry about scaling up
- NoSQL and related technologies:
  - No need to know in advance the relations between the elements in a database
  - Capacity to combine various heterogeneous data sources as needed
- Dynamic data processing (streams):
  - Moving away from the batch processing approach
  - Capacity to develop more adaptive and reactive systems
  - Emergence of machine-to-machine / connected objects / Internet of Things applications
- Data centers and cloud technologies:
  - Data storage and file management are simplified
Promising technologies that do not yet offer simple, stable and mature solutions.
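As a rough illustration of moving from batch to stream processing, the sketch below updates a result each time a reading arrives instead of waiting for the full data set. It is plain Python, not tied to any streaming framework; `sliding_mean` and `READINGS` are hypothetical names and values chosen for the example.

```python
from collections import deque


def sliding_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # old readings fall out automatically
    for reading in stream:
        recent.append(reading)
        yield sum(recent) / len(recent)


# Hypothetical sensor readings; in a real system this would be an open-ended stream.
READINGS = [10.0, 12.0, 14.0, 40.0, 42.0]

means = list(sliding_mean(iter(READINGS)))
print(means)
```

Because each output depends only on a small window of recent values, a system built this way can react to a change (here, the jump to 40.0) as soon as it appears.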

CRIM AND BIG DATA
To continue developing our expertise by leveraging Big Data technologies in advanced analytics, but also in human-systems interactions and in architectures and advanced technologies for software development and testing
- New ways to think about complex problems
- Emphasis on problems involving unstructured data
- Empirical knowledge of Big Data technologies to accompany enterprises and organisations
- Application-driven, with concrete use cases
- Looking for the 5th V: Value
  - We prefer talking about SMART DATA
- Multidisciplinary approach:
  - Data science
  - Advanced analytics / machine learning
  - Visualisation and interaction
  - Business analysts
  - Governance and data quality
  - Product management
  - Data governance
  - Architecture and software development

SMART DATA: ADVANCED ANALYTICS
[Figure: four levels of analytics; axes: value and difficulty]
- Descriptive: What happened?
- Diagnostic: Why?
- Predictive: What will happen?
- Prescriptive: How to make it happen?

THE A2DI PROJECT (ADVANCED ANALYTICS FOR DATA INTELLIGENCE)
Goals
- Develop practical expertise with Big Data technologies (analytics, interaction, visualisation)
- Consolidate CRIM's advanced analytics components
- Build concrete use cases that can serve as an interactive technology showcase
- Foster multidisciplinary projects
- Develop new collaborations and partnerships

THE A2DI INFRASTRUCTURE
[Diagram: processing stages]
- Data collection and preparation
- Storage
- Data enrichment
- Metadata
- Analytics, data mining, machine learning, inference, fusion, statistical, heuristics
- Visualisation
- Decision support
Configurable environment: specific deployments for selected use cases
- OpenStack
- Hadoop / Spark
- Data analytics tools
Partners and external environment

DATA SET FROM OCEAN NETWORKS CANADA
[Diagram: data sources and data types]
Data sources:
- Video & audio streams
- Manual annotations, log files
- Spectrogram, echo sounder, hydrophone
- Vertical profiling system, sonar
- Navigation information, bathymetry, maps
- Fixed cameras and cameras mounted on a rover
- Narrative description
- Ontologies
Data types:
- Streaming data
- Text data
- Multi-dimensional
- Time series
- Geospatial
- Video & image
- Audio
- Relational
- Social network
- Real-time monitoring

USE-CASE #1
Keyword detection from the audio track of submarine maintenance videos
- Approximately 300 hours to process
- Specialized vocabulary in biology and submarine navigation

Apache Spark
- High-level library for processing very large data sets
- Developed at AMPLab (Berkeley) in 2009
- Generalized MapReduce paradigm: 30x faster, with low latency for streaming applications
- Distributed in-memory computing
- Now more popular than Hadoop
- Native integration with: Hadoop, Elasticsearch, Cassandra, RDBMS, Play!, etc.
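The map and reduce steps behind this kind of keyword detection can be sketched on a single machine. This is plain Python, not the Spark API and not CRIM's implementation: the vocabulary, the transcript segments, and the function names are all hypothetical, and the text is assumed to have already been produced by a speech recognizer.

```python
from collections import Counter
from functools import reduce

# Hypothetical domain vocabulary (biology and submarine navigation terms).
KEYWORDS = {"crab", "sponge", "thruster", "winch"}

# Hypothetical transcript segments: (video_id, start_second, recognized_text).
SEGMENTS = [
    ("dive-01", 12.0, "the crab is next to the sponge"),
    ("dive-01", 47.5, "check the port thruster"),
    ("dive-02", 3.2, "another crab near the winch cable"),
]


def map_segment(segment):
    """Map step: count the vocabulary keywords found in one segment."""
    _video_id, _start, text = segment
    return Counter(word for word in text.lower().split() if word in KEYWORDS)


def merge_counts(left, right):
    """Reduce step: merge two partial counts (associative, so order is free)."""
    return left + right


def count_keywords(segments):
    """Apply the map step to every segment, then fold the partial counts."""
    return reduce(merge_counts, map(map_segment, segments), Counter())


totals = count_keywords(SEGMENTS)
print(totals.most_common())
```

Because each segment is mapped independently and the merge is associative, the same two functions could be distributed across a cluster, which is what makes the paradigm attractive for roughly 300 hours of audio.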

ELASTICSEARCH
- Distributed search engine
- NoSQL document database
- High availability
- Linear horizontal scalability
- Widely used in industry
Features:
- Advanced full-text search (Lucene)
- Geospatial queries
- Approximate string matching
- Real-time analytics
- Native integration with: Hadoop (HDFS), Spark, etc.
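Approximate string matching is commonly defined in terms of edit distance, which is useful when a recognizer or annotator misspells a specialized term. The sketch below is a plain-Python illustration of that idea only; it does not call Elasticsearch or reproduce its implementation, and `fuzzy_lookup`, `VOCABULARY`, and the two-edit threshold are assumptions made for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(term, vocabulary, max_edits=2):
    """Return vocabulary words within max_edits of term, closest first."""
    term = term.lower()
    scored = sorted((edit_distance(term, word.lower()), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_edits]


# Hypothetical index vocabulary.
VOCABULARY = ["hydrophone", "bathymetry", "sonar", "rover"]

matches = fuzzy_lookup("hydrofone", VOCABULARY)
print(matches)
```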

USE-CASE #2
Integration of geolocation data
- Keyword positions
- Rover position
- Satellite imagery
- Sonar location

GeoMesa
- Spatio-temporal layer for Accumulo (NoSQL)
- GeoMesa + Accumulo = big data + PostGIS + PostgreSQL
- Storage, querying and processing of vector spatio-temporal big data
- OGC standards support: WMS, WFS, WPS
- Use-cases:
  - Density heatmaps
  - Batch or streaming analytics
  - Spatio-temporal predictive analytics
- Native integration with:
  - Spark (analytics and clustering)
  - GeoServer (web mapping) and OpenLayers (frontend)
  - GeoTrellis for raster geospatial data (satellite imagery, etc.)
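A density heatmap over spatio-temporal data comes down to two steps: filter observations by bounding box and time window, then count them per grid cell. The sketch below shows those steps in plain Python under stated assumptions; it does not use GeoMesa or Accumulo, and the coordinates, the 0.1-degree cell size, and all names are hypothetical.

```python
import math
from collections import Counter
from datetime import datetime

# Hypothetical observations: (longitude, latitude, timestamp).
OBSERVATIONS = [
    (-125.88, 48.31, datetime(2015, 5, 1, 10, 0)),
    (-125.87, 48.32, datetime(2015, 5, 1, 10, 5)),
    (-125.62, 48.44, datetime(2015, 5, 1, 11, 0)),
    (-125.88, 48.31, datetime(2015, 6, 1, 9, 0)),  # outside the time window below
]

# Bounding box as (min_lon, min_lat, max_lon, max_lat).
BBOX = (-126.0, 48.0, -125.0, 49.0)


def in_window(observation, bbox, start, end):
    """True if the observation lies inside the box and in [start, end)."""
    lon, lat, when = observation
    min_lon, min_lat, max_lon, max_lat = bbox
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat and start <= when < end


def density_grid(observations, bbox, start, end, cell_deg=0.1):
    """Count matching observations per (column, row) cell of a regular grid."""
    min_lon, min_lat, _max_lon, _max_lat = bbox
    grid = Counter()
    for observation in observations:
        if in_window(observation, bbox, start, end):
            lon, lat, _when = observation
            column = math.floor((lon - min_lon) / cell_deg)
            row = math.floor((lat - min_lat) / cell_deg)
            grid[(column, row)] += 1
    return grid


heat = density_grid(OBSERVATIONS, BBOX, datetime(2015, 5, 1), datetime(2015, 6, 1))
print(dict(heat))
```

A spatio-temporal store earns its keep by answering the filtering step from an index instead of scanning every observation as this sketch does.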

USE-CASE #3
- Keyword search enhancement with ontologies from Web resources
- Natural language processing
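One common way to enhance keyword search with an ontology is query expansion: the searched term is widened to include narrower or related terms before matching. The sketch below illustrates that idea only and is not the method used in the project; the mini-ontology, the annotations, and the function names are hypothetical, and the substring matching is deliberately naive.

```python
# Hypothetical mini-ontology: each concept maps to narrower or related terms.
ONTOLOGY = {
    "crustacean": ["crab", "shrimp", "lobster"],
    "crab": ["spider crab", "tanner crab"],
    "vehicle": ["rover", "rov"],
}

# Hypothetical video annotations.
ANNOTATIONS = [
    "Tanner crab walking over sediment",
    "Rover repositioned near the vent",
    "Shrimp swarm around the camera",
]


def expand(keyword, ontology, max_depth=2):
    """Return the keyword plus every term reachable within max_depth links."""
    seen = [keyword]
    frontier = [keyword]
    for _ in range(max_depth):
        next_frontier = []
        for term in frontier:
            for related in ontology.get(term, []):
                if related not in seen:
                    seen.append(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return seen


def search(keyword, annotations, ontology):
    """Return annotations containing the keyword or any expanded term."""
    terms = expand(keyword.lower(), ontology)
    return [text for text in annotations if any(term in text.lower() for term in terms)]


results = search("crustacean", ANNOTATIONS, ONTOLOGY)
print(results)
```

None of the annotations contains the word "crustacean", so a literal search would return nothing; the expansion is what surfaces the crab and shrimp entries.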

PLATFORM DEMONSTRATION

ANOTHER BIG DATA PROJECT
- VESTA: Video Evaluation System for Task Analysis
- LEADS research network: Learning Environment Across Disciplines
- Education sciences: how do students learn?
- 6 universities and 11 partner organizations (Canada)
- 13 universities and 4 partner organizations (North America, Europe, Australia)
- Led by Dr Susanne Lajoie (McGill University)

LEADS CONTEXT
Video analysis of students in learning situations
- Video content: typically one student, many tasks
- Audio content: think-aloud, reading, conversation, answering questions
[Diagram labels]
- Video
- Local sources
- Access rights management
- Manual transcripts
- Manual coding
- Data sharing

THE VESTA PLATFORM FEATURES
- A Web-based platform relying on some of the most recent HTML5 features
- 5 semi-automated annotation services:
  - Speaker identification
  - Transcription
  - Audio-text correspondence
  - Transition detection (video)
  - Face detection
- 3 utility services:
  - Annotation storage
  - Load balancing / task dispatching
  - Multimedia file storage
- Access rights management taking into account ethics approval for research protocols
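Load balancing and task dispatching can follow many policies. The sketch below shows one simple option, sending each task to the currently least-loaded worker, and is not a description of how VESTA does it; the worker names and the cost estimates are hypothetical.

```python
import heapq


def dispatch(tasks, workers):
    """Assign each (name, cost) task to the least-loaded worker so far.

    Ties are broken by worker name, because heap entries are (load, name) tuples.
    """
    heap = [(0.0, worker) for worker in workers]
    heapq.heapify(heap)
    assignment = {worker: [] for worker in workers}
    for name, cost in tasks:
        load, worker = heapq.heappop(heap)           # least-loaded worker
        assignment[worker].append(name)
        heapq.heappush(heap, (load + cost, worker))  # record its new load
    return assignment


# Hypothetical annotation tasks with estimated costs in minutes.
TASKS = [
    ("transcription", 30.0),
    ("face_detection", 20.0),
    ("speaker_identification", 10.0),
    ("transition_detection", 5.0),
]

plan = dispatch(TASKS, ["worker-a", "worker-b"])
print(plan)
```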

THE VESTA PLATFORM

CONCLUSIONS
- Big Data offers huge potential, largely underexploited at this time
- As with many fundamental changes, expect a long journey
- Establish an ambitious vision; take modest first steps, but ones with tangible value
- There is no one-size-fits-all approach; it must be tailored to the specific use case
- The question is not "Too Big or not Too Big"; what matters is data intelligence ("Smart Data") that brings concrete value to the organisation
- Big Data technologies can also be used in other contexts
- On top of technological challenges, human challenges will dominate and determine the success or failure of specific initiatives

PITFALLS
"A wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it."
Herbert Simon, "Designing Organizations for an Information-Rich World"
- Not planning enough
- Planning too much
- Weak commitment
- Thinking it will be easy to implement
- Downplaying issues related to change management