Image mining technologies and industrial challenges. Sébastien GILLES, Ph.D. Chief scientist & co-founder sg@ltutech.com www.ltutech.




Image mining technologies and industrial challenges Sébastien GILLES, Ph.D. Chief scientist & co-founder sg@ltutech.com www.ltutech.com

Context

LTU has successfully deployed image mining software in very demanding industrial environments:
- Large data volumes, high throughput
- Clients use the product and build value on it
- Mission-critical tasks
- Requirements: security, availability, failover, fast response time and, of course, quality

There are complex issues that need to be solved:
- Adapt rapidly to changing conditions
  - Market
  - Economic and social environment
  - Technology
- Design a generic and modular technology for multiple reuse
  - Standalone application
  - OEM
  - Integration

LTU corporate background

Founded in 1999, LTU Technologies is a software company focused on image mining technologies.
- Capitalizing on 10 years of high-profile research by the founders at MIT Media Lab, Oxford and INRIA
- 20 employees, headquartered in Paris, with an office in Washington D.C.

Market verticals
- Law enforcement: worldwide deployment at top-level intelligence divisions (incl. FBI). Child exploitation, stolen art, counterfeiting, ID theft.
- Industrial property: patent offices and IP-protection companies run LTU. Trademark search, brand protection, counterfeiting.
- Asset management: e-commerce and publishing companies. Intranet/extranet/Internet integration with Digital Asset Management software.

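LTU DNA vs. MD-5 or SHA-1

[Figure: a query image (photo11.jpg) is matched against an image database using two methods, DNA and MD-5 or SHA-1. Results shown: photo11.jpg, Young_girl.bmp, with match scores 100%, 98%, 97%, 85%, 80%. Result categories: duplicates, clones, similar images.]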
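The contrast drawn on this slide can be illustrated with a small sketch. It is not LTU's DNA algorithm, which the slides do not describe: `toy_signature` is a hypothetical stand-in (a coarse grey-level histogram) chosen only to show that a cryptographic digest changes completely when one pixel changes, while a content-based signature still yields a graded similarity score.

```python
import hashlib


def md5_hex(pixels):
    """Cryptographic digest of the raw pixel bytes."""
    return hashlib.md5(bytes(pixels)).hexdigest()


def toy_signature(pixels, bins=4):
    """Normalised grey-level histogram.

    Hypothetical stand-in for a content signature, NOT LTU's DNA.
    """
    counts = [0] * bins
    for p in pixels:
        counts[p * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def similarity(sig_a, sig_b):
    """Histogram intersection in [0, 1]; 1.0 means identical histograms."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))


# A made-up 64-pixel grey-level "image" and a clone with one altered pixel.
original = [10, 10, 200, 200, 90, 90, 160, 160] * 8
clone = list(original)
clone[0] = 70

same_hash = md5_hex(original) == md5_hex(clone)
score = similarity(toy_signature(original), toy_signature(clone))

print("identical MD-5 digests:", same_hash)
print("signature similarity:", score)
```

With a digest, the only possible answers are "identical file" or "different file"; a signature with a similarity measure can rank duplicates, clones and merely similar images on one scale.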

Investigating with Image-Seeker

[Workflow diagram. Recoverable labels:]
- On the ground: image seizure (HDD or Internet)
- Detection & upload of seized images by local, nat'l or int'l agencies
- Example case record: Case: Barney. Victim? Abuse? Date: 6/2/2004
- Image Database + Case information (AO763456, XW787346, Anne Smith)
- Victim identification; link to other cases
- Retrieval of series of images (AB763235, KW787386, KX9826563)
- Action: validation, evidence of abuse

A Layered Software Architecture

[Architecture diagram. Recoverable layers, top to bottom:]
- Client-specific system built by LTU or partner
  - Java clients, Windows clients (Verity, investigation software, MAM software)
  - .net web service, GUI, 3rd party system
- HTTP: Image-Seeker HTML GUI
  - Apache PHP server
  - PHP plugin API, PHP plugins
- SDK layer (php/java/perl/c/...)
- XML/HTTP: Image-Seeker HTTP API
- LTU Image-Seeker platform
  - Distributable and redundant (CORBA)
  - Modular architecture (polymorphic Java components)
- LTU Components Java API
  - LTU algorithmic components: image processing, DNA computation, data retrieval, automatic keywording
  - Storage/search components: textual search, data storage (databases/files): Oracle, PostgreSQL, Verity, ...

Industrial challenges for DB

General system design considerations
- Multimedia indexing requires CPU-intensive processes
- Forget about the DB server running the MM indexing process
  - Oracle's plug-ins are useless for large-scale multimedia indexing
- N processing nodes and M storage nodes

Ensure a true «operational» scalability
- Distributed, clustered architecture (failover)
- Bottleneck: maintenance
  - Re-index 1M images while maintaining QoS?
  - Synchronization of N image repositories and data warehouses

Adapt data models to multimedia data
- Increased complexity with video, images, web pages, etc.
- Issue: models are task-dependent, due to performance issues

Increase performance, reduce response time
- DBMSs remain slow to access data: need for memory caching
- Fast nearest-neighbour search in high dimensions and large volumes
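To make the last point concrete, here is a minimal baseline sketch of nearest-neighbour search over signatures held in memory rather than fetched from a DBMS. It is an exhaustive scan, so its cost grows linearly with the collection size, which is precisely the problem the slide raises for large volumes. The `index` contents, the image ids and the Euclidean distance are assumptions made for illustration; the slides do not say how LTU stores or compares signatures.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbours(query, index, k=3):
    """Return the k closest (distance, image_id) pairs by exhaustive scan.

    `index` is an in-memory dict mapping image ids to signature vectors.
    """
    scored = ((euclidean(query, vec), image_id) for image_id, vec in index.items())
    return heapq.nsmallest(k, scored)


# Hypothetical 4-dimensional signatures; real ones would be far larger.
index = {
    "img_a": [0.0, 0.0, 0.0, 0.0],
    "img_b": [1.0, 0.0, 0.0, 0.0],
    "img_c": [0.0, 3.0, 4.0, 0.0],
    "img_d": [10.0, 10.0, 10.0, 10.0],
}

results = nearest_neighbours([0.9, 0.0, 0.0, 0.0], index, k=2)
for distance, image_id in results:
    print(image_id, round(distance, 3))
```

A scan like this is a useful correctness reference when evaluating faster approximate or index-based search methods.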

Industrial challenges for Computer Vision

Real-life images are not «clean»
- Highly variable acquisition conditions (uncontrolled lighting, etc.)
- Multiple imaging defects (focus, over/under-exposure, etc.)
- Collection-dependent artefacts

Client requirements vary a lot
- Heterogeneous image types (pictures, drawings, logos, etc.)
- Highly variable definition of «what matters in an image»
- Global vs. local analysis

Performance/quality tradeoffs
- DNA extraction + classification to be performed in near real-time
- Search to be performed in near real-time
- If asked, clients tend to favour quality over performance (insight: Moore's law)
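The "global vs. local analysis" point can be sketched with a toy example. Both descriptors below are illustrative assumptions, not LTU's features: the global one is the mean grey level of the whole image, the local one is the mean grey level of each cell in a grid. Two images with the same global value can differ completely once layout is taken into account.

```python
def global_descriptor(image):
    """Mean grey level over the whole image (a single number)."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def local_descriptor(image, grid=2):
    """Mean grey level of each cell in a grid x grid partition, row by row.

    Assumes the image height and width are divisible by `grid`.
    """
    height, width = len(image), len(image[0])
    cell_h, cell_w = height // grid, width // grid
    means = []
    for gy in range(grid):
        for gx in range(grid):
            cell = [
                image[y][x]
                for y in range(gy * cell_h, (gy + 1) * cell_h)
                for x in range(gx * cell_w, (gx + 1) * cell_w)
            ]
            means.append(sum(cell) / len(cell))
    return means


# Two made-up 4x4 grey-level images: same content, mirrored layout.
bright_left = [[200, 200, 0, 0] for _ in range(4)]
bright_right = [[0, 0, 200, 200] for _ in range(4)]

print(global_descriptor(bright_left), global_descriptor(bright_right))
print(local_descriptor(bright_left), local_descriptor(bright_right))
```

Whether the mirrored image should count as a match depends on the client's definition of what matters in an image, which is why neither form of analysis can be fixed once for all deployments.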

Conclusions

Industry offers many practical, technical challenges
Research solves many theoretical problems
A gap remains that needs to be bridged: a full client solution generally involves several technologies
- NLP, image analysis, AI, etc. (e.g. trademark search)

This means 3 main challenges!
- Integrating those technologies is challenge #1
- Addressing real-life data, scenarios and usages is challenge #2
- Optimizing large-scale, complex systems is challenge #3

Issues:
- Difficulty for academics of obtaining large volumes of real-life data
- Software re-use is too rare in academia

Ideas:
- Validate research algorithms on professional platforms (LTU?)
- Develop common benchmarking efforts on real data