Multimedia Databases
Wolf-Tilo Balke, Philipp Wille
Institut für Informationssysteme, Technische Universität Braunschweig
http://www.ifis.cs.tu-bs.de

0 Organizational Issues
- Lecture: 21.10.2014 to 03.02.2015, 11:30 to 14:00 (approx. 2 lecture hours with a break)
- Exercises, detours, and (non-mandatory) homework
- 4 or 5 credits, depending on the examination rules
- Exams
  - Oral exam
  - Achieving more than 50% of the homework points is advised

0 Organizational Issues
- Recommended literature
  - Schmitt: Ähnlichkeitssuche in Multimedia-Datenbanken, Oldenbourg, 2005
  - Steinmetz: Multimedia-Technologie: Grundlagen, Komponenten und Systeme, Springer, 1999
  - Castelli/Bergman: Image Databases, Wiley, 2002
  - Khoshafian/Baker: Multimedia and Imaging Databases, Morgan Kaufmann, 1996
  - Sometimes: original papers (on our Web page)

0 Organizational Issues
- Course Web page: http://www.ifis.cs.tu-bs.de/teaching/ws-1415/mmdb
  - Contains slides, exercises, related papers, and a video of the lecture
- Any questions? Just drop us an email

1 Introduction
- 1.1 What are multimedia databases?
- 1.2 Multimedia database applications
- 1.3 Evaluation of retrieval techniques

1.1 Multimedia Databases
- What are multimedia databases (MMDB)?
  - Databases + multimedia = MMDB
- We already know databases, so what is multimedia?

1.1 Basic Definitions
- Multimedia
  - The concept of multimedia expresses the integration of different digital media types
  - The integration is usually performed in a document
  - Basic media types are text, image, vector graphics, audio, and video

1.1 Data Types
- Text: text data, spreadsheets, e-mail, ...
- Image: photos (bitmaps), vector graphics, CAD, ...
- Audio: speech and music recordings, annotations, wave files, MIDI, MP3, ...
- Video: dynamic image recordings, frame sequences, MPEG, AVI, ...

1.1 Documents
- Document types
  - Media objects are documents of only one type (not necessarily text)
  - Multimedia objects are general documents that allow an arbitrary combination of different types
- Multimedia data is transferred through the use of a medium

1.1 Basic Definitions
- Medium
  - A medium is a carrier of information in a communication connection
  - It is independent of the transported information
  - The medium used can also change during information transfer

1.1 Medium Example
- Book
  - Communication between author and reader
  - Independent of content
  - Hierarchically built on text and images
  - Reading out loud represents a medium change to sound/audio

1.1 Medium Classification
- Based on receiver type
  - Visual/optical medium
  - Acoustic medium
  - Haptic medium (through the tactile senses)
  - Olfactory medium (through smell)
  - Gustatory medium (through taste)
- Based on time
  - Dynamic
  - Static

1.1 Multimedia Databases
- We have now seen what multimedia is and how it is transported (through some medium)
- But why do we need databases?
  - The most important operations of databases are data storage and data retrieval

1.1 Multimedia Databases
- Persistent storage of multimedia data, e.g.:
  - Text documents
  - Vector graphics, CAD
  - Images, audio, video
- Content-based retrieval
  - Efficient content-based search
  - Standardization of metadata (e.g., MPEG-7, MPEG-21)

1.1 Multimedia Databases
- Stand-alone vs. database storage model?
  - Special retrieval functionality and the corresponding optimization can be provided in both cases
  - But in the second case we also get the general advantages of databases
    - Declarative query language
    - Orthogonal combination of the query functionality
    - Query optimization, index structures
    - Transaction management, recovery, ...

1.1 Historical Overview
[Figure: timeline from 1960 to 2000]
- 1960: Retrieval procedures for text documents (Information Retrieval)
- 1970: Relational databases and SQL
- 1980: Presence of multimedia objects intensifies
- 1990: SQL-92 introduces BLOBs; first multimedia databases

1.1 Commercial Systems
- Relational databases use the data type BLOB (binary large object)
  - Un-interpreted data
  - Retrieval through metadata, e.g., file name, size, author, ...
- Object-relational extensions feature enhanced retrieval functionality
  - Semantic search
  - IBM DB2 Extenders, Oracle Cartridges, ...
  - Integration in the DB through UDFs, UDTs, stored procedures, ...
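
The BLOB approach can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample bytes are hypothetical and chosen only for illustration; the point is that the query can filter on metadata columns only, while the media content stays an opaque byte string.

```python
import sqlite3

# Hypothetical schema: the media bytes are stored as an un-interpreted BLOB,
# and only the metadata columns can be used in search conditions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media (id INTEGER PRIMARY KEY, file_name TEXT, "
    "author TEXT, size_bytes INTEGER, content BLOB)"
)

def insert_media(file_name, author, payload):
    """Store raw bytes together with descriptive metadata."""
    conn.execute(
        "INSERT INTO media (file_name, author, size_bytes, content) "
        "VALUES (?, ?, ?, ?)",
        (file_name, author, len(payload), payload),
    )

# Made-up placeholder bytes, not real image files.
insert_media("sunset.jpg", "alice", b"placeholder image bytes A")
insert_media("harbor.jpg", "bob", b"placeholder image bytes B")

# Retrieval goes through metadata; the database never looks inside the BLOB.
rows = conn.execute(
    "SELECT file_name, size_bytes FROM media WHERE author = ?", ("alice",)
).fetchall()
print(rows)
```

A question such as "which of these images shows a sunset?" cannot be answered by this schema unless someone has written that information into a metadata column, which is the gap that content-based retrieval is meant to close.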

1.1 Requirements
- Requirements for multimedia databases (Christodoulakis, 1985)
  - Classical database functionality
  - Maintenance of unformatted data
  - Consideration of special storage and presentation devices

1.1 Requirements
- To comply with these requirements, the following aspects need to be considered
  - Software architecture: new system or extension of existing databases?
  - Content addressing: identification of the objects through content-based features
  - Performance: improvements using indexes, optimization, etc.
  - User interface: how should the user interact with the system? Separate structure from content!
  - Information extraction: (automatic) generation of content-based features
  - Storage devices: very large storage capacity, redundancy control, and compression
  - Information retrieval: integration of some extended search functionality

1.1 Retrieval
- Retrieval: choosing between data objects
  - Based on a SELECT condition (exact match)
  - or on a defined similarity relation (best match)
- Retrieval may also cover the delivery of the results to the user
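
The difference between exact match and best match can be sketched as follows. The toy collection, the three-dimensional feature vectors, and the choice of Euclidean distance are assumptions made for illustration only; the slides do not prescribe a particular feature or distance.

```python
import math

# Hypothetical toy collection: each object has a metadata field and a
# made-up 3-dimensional feature vector (e.g., an average color).
collection = [
    {"id": 1, "author": "alice", "features": (0.9, 0.4, 0.1)},
    {"id": 2, "author": "bob", "features": (0.1, 0.2, 0.8)},
    {"id": 3, "author": "alice", "features": (0.8, 0.5, 0.2)},
]

def exact_match(objects, author):
    """SELECT-style retrieval: an object either satisfies the condition or not."""
    return [o["id"] for o in objects if o["author"] == author]

def best_match(objects, query_features, k):
    """Similarity retrieval: rank all objects by distance, return the k closest."""
    ranked = sorted(
        objects, key=lambda o: math.dist(o["features"], query_features)
    )
    return [o["id"] for o in ranked[:k]]

exact_ids = exact_match(collection, "alice")
best_ids = best_match(collection, (0.9, 0.4, 0.15), k=2)
print(exact_ids)
print(best_ids)
```

Exact match returns an unordered set defined by a predicate, whereas best match always returns a ranking, even when no object is a perfect fit for the query.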

1.1 Retrieval
- Closer look at the search functionality
  - Semantic search functionality
  - Orthogonal integration of classical and extended functionality
  - Search does not directly access the media objects
  - Extraction, normalization, and indexing of content-based features
  - Meaningful similarity/distance measures

1.1 Content-based Retrieval
- "Retrieve all images showing a sunset!"
- What exactly do these images have in common?

1.1 Schematic View
- Usually 2 main steps (example: image databases)
  - Creating the database: image collection -> digitization -> image analysis and feature extraction -> image database
  - Querying the database: image query -> digitization -> image analysis and feature extraction -> similarity search -> search result

1.1 Detailed View
[Figure: processing steps between query, MM-database, and result]
1. Insert into the database (MM-objects + relational data)
2. Extraction of features (raw data -> feature values)
3. Query preparation (query -> query plan & feature values)
4. Similarity computation & query processing (on feature values, raw & relational data)
5. Result preparation (result data -> result)
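
The five steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture of any real system: `TinyImageDB`, `extract_features`, and `l1_distance` are hypothetical names, an "image" is simply a list of 0-255 intensity values, and a coarse intensity histogram stands in for a real feature extractor.

```python
def extract_features(pixels, bins=4):
    """Stand-in feature extractor: normalized histogram of 0-255 intensities."""
    hist = [0] * bins
    for value in pixels:
        hist[min(value * bins // 256, bins - 1)] += 1
    return [count / len(pixels) for count in hist]

def l1_distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

class TinyImageDB:
    def __init__(self):
        self.raw = {}        # raw media objects
        self.features = {}   # extracted feature values

    def insert(self, name, pixels):
        self.raw[name] = pixels                         # step 1: insert
        self.features[name] = extract_features(pixels)  # step 2: extraction

    def query(self, pixels, k=1):
        q = extract_features(pixels)                    # step 3: query preparation
        ranked = sorted(                                # step 4: similarity computation
            self.features, key=lambda n: l1_distance(self.features[n], q)
        )
        return [(n, self.raw[n]) for n in ranked[:k]]   # step 5: result preparation

db = TinyImageDB()
db.insert("dark", [10, 20, 30, 40])
db.insert("bright", [200, 220, 240, 250])
result = db.query([210, 230, 235, 245], k=1)
print(result[0][0])
```

Note that the similarity computation only touches the stored feature values; the raw objects are read again only when the result is prepared.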

1.1 More Detailed View
[Figure: detailed architecture between query, MM-database, and result; components as labeled in the diagram]
- Query preparation: normalization, segmentation, feature extraction
- Optimization: query plan
- Similarity computation and query processing: feature values, feature index
- Result preparation: medium transformation, format transformation, result data
- Relevance feedback (from result back to query)
- Feature extraction: feature recognition, feature preparation
- Pre-processing: decomposition, normalization, segmentation
- MM-database: BLOBs/CLOBs
- Relational DB: metadata, profile, structure data
- Input: MM-objects, relational data

1.2 Applications
- Lots of multimedia content on the Web
  - Social networking, e.g., Facebook, MySpace, Hi5, etc.
  - Photo sharing, e.g., Flickr, Photobucket, Instagram, Picasa, etc.
  - Video sharing, e.g., YouTube, Metacafe, blip.tv, Liveleak, etc.

1.2 Applications
- Cameras are everywhere
  - In London there are at least 500,000 cameras, and one study showed that a person could expect to be filmed 300 times in a single day

1.2 Applications
- Picasa face recognition
  - Face recognition example
  - Learning phase

1.2 Sample Scenario
- Consider a police investigation of a large-scale drug operation
- Possible generated data:
  - Video data captured by surveillance cameras
  - Audio data captured
  - Image data consisting of still photographs taken by investigators
  - Structured relational data containing background information
  - Geographic information system data

1.2 Sample Scenario
- Possible queries
  - Image query by keywords: a police officer wants to examine pictures of Tony Soprano
    - Query: "retrieve all images from the image library in which Tony Soprano appears"
  - Image query by example: the police officer has a photograph and wants to find the identity of the person in the picture
    - He hopes that someone else has already tagged another photo of this person
    - Query: "retrieve all images from the database in which the person appearing in the (currently displayed) photograph appears"

1.2 Sample Scenario
- Video query (murder case): the police assume that the killer must have interacted with the victim in the recent past
  - Query: "Find all video segments from last week in which Jerry appears"

1.2 Sample Scenario
- Heterogeneous multimedia query:
  - "Find all individuals who have been photographed with Tony Soprano, who have been convicted of attempted murder in New Jersey, and who have recently had electronic fund transfers made into their bank accounts from ABC Corp."

1.2 Characteristics
- So there are different types of queries; what about the MMDB characteristics?
  - Static: high number of search queries (read access), few modifications of the data
  - Dynamic: frequent modifications of the data
  - Passive: the database reacts only to requests from outside
  - Active: the functionality of the database leads to operations at application level
  - Standard search: queries are answered through the use of metadata, e.g., Google image search
  - Retrieval functionality: content-based search on the multimedia repository, e.g., Picasa face recognition

1.2 Example
- Passive static retrieval
  - Art-historical use case
  - Coat of arms: possible hit in a multimedia database

1.2 Example
- Active dynamic retrieval
  - Weather warning through evaluation of satellite photos
  - Extraction: typhoon warning for the Philippines

1.2 Example
- Standard search
  - Queries are answered through the use of metadata, e.g., Google image search
- Retrieval functionality
  - Content-based, e.g., Picasa face recognition

1.3 Retrieval Evaluation
- Basic evaluation of retrieval techniques
  - Efficiency of the system
    - Efficient utilization of system resources
    - Scalable, also over big collections
  - Effectivity of the retrieval process
    - High quality of the result
    - Meaningful usage of the system
- What is more important: an effective retrieval process or an efficient one?
  - Depends on the application!

1.3 Evaluating Efficiency
- Characteristic values to measure efficiency are, e.g.:
  - Memory usage
  - CPU time
  - Number of I/O operations
  - Response time
- Depends on the (hardware) environment
- Goal: the system should be efficient enough!

1.3 Evaluating Effectivity
- Measuring effectivity is more difficult and always depends on the query
- We need to define some query-dependent evaluation measures!
  - Objective quality metrics
  - Independent of the querying interface and the retrieval procedure
- Allows for comparing different systems/algorithms

1.3 Evaluating Effectivity
- Effectivity can be measured with regard to an explicit query
  - Main focus on evaluating the behavior of the system with respect to a query
  - Relevance of the result set
- But effectivity also needs to consider implicit information needs
  - Main focus on evaluating the usefulness, usability, and user friendliness of the system
  - Not relevant for this lecture!

1.3 Relevance
- Relevance as a measure for retrieval: each document is classified as either relevant or irrelevant with respect to the query (binary classification)
  - This classification is performed manually by experts
  - The response of the system to the query is compared to this classification
    - Compare the obtained response with the ideal result

1.3 Involved Sets
- Then apply the automatic retrieval system
[Figure: Venn diagram of the collection with two overlapping sets]
  - searched for (= relevant): experts say "this is relevant"
  - found (= query result): the automatic retrieval says "this is relevant"

1.3 False Positives
- False positives: irrelevant documents classified as relevant by the system
  - False alarms (fa)
  - Needlessly increase the result set
  - Usually inevitable (ambiguity)
  - Can be easily eliminated by the user
[Figure: Venn diagram of the collection; the sets "searched for" and "found" overlap in ca, fd lies only in "searched for", fa only in "found", cd outside both]

1.3 False Negatives
- False negatives: relevant documents classified as irrelevant by the system
  - False dismissals (fd)
  - Dangerous, since they can't be detected easily by the user
    - Are there better documents in the collection which the system didn't return?
  - False alarms are usually not as bad as false dismissals
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]

1.3 Remaining Sets
- Correct positives (correct alarms, ca)
  - All documents correctly classified by the system as relevant
- Correct negatives (correct dismissals, cd)
  - All documents correctly classified by the system as irrelevant
- All four sets are disjoint, and their union is the entire document collection
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]

1.3 Overview
- Confusion matrix: visualizes the effectivity of an algorithm

                              System evaluation
                              relevant    irrelevant
  User evaluation  relevant      ca           fd
                   irrelevant    fa           cd
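
The four cells of the confusion matrix follow directly from set operations on the collection, the expert judgments, and the system's result. The sketch below assumes documents are represented by integer ids; the function name `partition` and the sample sets are made up for illustration.

```python
def partition(collection, relevant, retrieved):
    """Split a collection into the four disjoint evaluation sets."""
    ca = relevant & retrieved                  # correct alarms
    fa = retrieved - relevant                  # false alarms
    fd = relevant - retrieved                  # false dismissals
    cd = collection - (relevant | retrieved)   # correct dismissals
    return ca, fa, fd, cd

collection = set(range(1, 11))   # documents 1..10
relevant = {1, 2, 3, 4}          # "searched for": judged relevant by experts
retrieved = {3, 4, 5}            # "found": returned by the system

ca, fa, fd, cd = partition(collection, relevant, retrieved)
print(sorted(ca), sorted(fa), sorted(fd), sorted(cd))
```

Because the four sets are disjoint and cover the collection, their sizes always add up to the collection size, which is a cheap sanity check for any evaluation script.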

1.3 Interpretation
- Relevant results = fd + ca
  - Handpicked by experts!
- Retrieved results = ca + fa
  - Retrieved by the system
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]

1.3 Precision
- Precision measures the ratio of correctly returned documents relative to all returned documents
  - P = ca / (ca + fa)
  - Value in [0, 1] (1 representing the best value)
- A high number of false alarms means worse results
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]

1.3 Recall
- Recall measures the ratio of correctly returned documents relative to all relevant documents
  - R = ca / (ca + fd)
  - Value in [0, 1] (1 representing the best value)
- A high number of false dismissals means worse results
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]
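
Both formulas translate into one-line functions over the counts ca, fa, and fd. Returning 0.0 when the denominator is zero (empty result set or no relevant documents) is a convention assumed here for the sketch; the slides do not define these edge cases.

```python
def precision(ca, fa):
    """P = ca / (ca + fa): share of returned documents that are relevant."""
    returned = ca + fa
    return ca / returned if returned else 0.0  # assumed convention for empty results

def recall(ca, fd):
    """R = ca / (ca + fd): share of relevant documents that were returned."""
    relevant = ca + fd
    return ca / relevant if relevant else 0.0  # assumed convention for no relevant docs

# Illustrative counts: 2 correct alarms, 8 false alarms, 6 false dismissals.
p = precision(ca=2, fa=8)
r = recall(ca=2, fd=6)
print(p, r)
```

Neither formula uses cd, so the number of correctly dismissed documents has no influence on precision or recall.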

1.3 Precision-Recall Analysis
- Both measures only make sense if considered at the same time
  - E.g., perfect recall can be obtained by simply returning all documents, but then the precision is extremely low
- Can be balanced by tuning the system
  - E.g., smaller result sets lead to better precision at the cost of recall
- Usually the average precision and recall over several queries is considered (macro evaluation)
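
The trade-off can be made concrete with a small sketch. The collection of 100 documents with 5 relevant ones is invented for illustration, and `precision_recall` is a hypothetical helper operating on sets of document ids.

```python
def precision_recall(relevant, retrieved):
    """Return (precision, recall) for one query, given sets of document ids."""
    ca = len(relevant & retrieved)
    p = ca / len(retrieved) if retrieved else 0.0
    r = ca / len(relevant) if relevant else 0.0
    return p, r

collection = set(range(100))   # 100 documents in total
relevant = {0, 1, 2, 3, 4}     # only 5 of them are relevant

# Returning everything: nothing is missed, but almost all results are false alarms.
p_all, r_all = precision_recall(relevant, collection)

# Returning a very small result set: every hit is correct, but most are missed.
p_small, r_small = precision_recall(relevant, {0, 1})

print(p_all, r_all)
print(p_small, r_small)
```

Returning the whole collection gives recall 1.0 at a precision of 5/100, while the two-document result reaches precision 1.0 at a recall of only 2/5, which is why a single measure in isolation says little about retrieval quality.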

1.3 Actual Evaluation
- Alarms (returned elements) are divided into ca and fa
  - Precision is easy to calculate
- Dismissals (not returned elements) are not so trivial to divide into cd and fd, because the entire collection has to be classified
  - Recall is difficult to calculate
- Standardized benchmarks
  - Provided collections and queries
  - Annotated result sets
[Figure: Venn diagram of the collection with the regions fd, ca, fa, and cd]

1.3 Example

  Query     fa   ca   fd   cd   P     R
  Q1         8    2    6    4   0.2   0.25
  Q2         2    8    2    8   0.8   0.8
  Average                       0.5   0.525
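
The averages in the table can be reproduced with a short sketch of macro evaluation: precision and recall are computed per query first and then averaged. The dictionary layout and the function name `macro_average` are assumptions for illustration; the counts are the ones from the table.

```python
# Counts per query, taken from the example table.
queries = {
    "Q1": {"fa": 8, "ca": 2, "fd": 6, "cd": 4},
    "Q2": {"fa": 2, "ca": 8, "fd": 2, "cd": 8},
}

def macro_average(counts_per_query):
    """Average the per-query precision and recall values (macro evaluation)."""
    precisions = [c["ca"] / (c["ca"] + c["fa"]) for c in counts_per_query.values()]
    recalls = [c["ca"] / (c["ca"] + c["fd"]) for c in counts_per_query.values()]
    n = len(counts_per_query)
    return sum(precisions) / n, sum(recalls) / n

avg_p, avg_r = macro_average(queries)
print(round(avg_p, 3), round(avg_r, 3))  # 0.5 0.525
```

Each query contributes equally to the average, regardless of how many documents it returned.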

1.3 Representation
- Precision-recall curves
[Figure: precision-recall curves for System 1, System 2, and System 3; a marked point shows the average precision of System 3 at a recall level of 0.2]
- Which system is the best?
- What is more important: recall or precision?
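
One common way to obtain the points of such a curve for a single query is to walk down the ranked result list and record precision each time another relevant document is found. This is a sketch of that general technique, not necessarily the procedure used for the plot on the slide; the document ids and the function name `pr_points` are hypothetical.

```python
def pr_points(ranked, relevant):
    """Return (recall, precision) pairs, one per relevant document in the ranking."""
    points = []
    hits = 0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points

ranked = ["d3", "d7", "d1", "d9", "d4"]   # system output, best match first
relevant = {"d3", "d1", "d4"}             # expert judgments for this query

points = pr_points(ranked, relevant)
for rec, prec in points:
    print(f"recall={rec:.2f} precision={prec:.2f}")
```

Averaging the precision values of several queries at fixed recall levels then yields one curve per system, which can be compared as in the figure.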

1 Next lecture
- Retrieval of images by color
  - Introduction to color spaces
  - Color histograms
  - Matching