NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE. Venu Govindaraju



Similar documents
Non-negative Matrix Factorization (NMF) in Semi-supervised Learning Reducing Dimension and Maintaining Meaning

Machine Learning: Overview

PSG College of Technology, Coimbatore Department of Computer & Information Sciences BSc (CT) G1 & G2 Sixth Semester PROJECT DETAILS.

Big Data: Rethinking Text Visualization

Visualization methods for patent data

Some Research Challenges for Big Data Analytics of Intelligent Security

: Introduction to Machine Learning Dr. Rita Osadchy

Machine Learning Introduction

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

Cell Phone based Activity Detection using Markov Logic Network

Detection. Perspective. Network Anomaly. Bhattacharyya. Jugal. A Machine Learning »C) Dhruba Kumar. Kumar KaKta. CRC Press J Taylor & Francis Croup

CS Master Level Courses and Areas COURSE DESCRIPTIONS. CSCI 521 Real-Time Systems. CSCI 522 High Performance Computing

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Introduction to Pattern Recognition

Learning outcomes. Knowledge and understanding. Competence and skills

Novelty Detection in image recognition using IRF Neural Networks properties

Learning is a very general term denoting the way in which agents:

Reinventing Business Intelligence through Big Data

Information Management course

AP Physics 1 and 2 Lab Investigations

Software Development Training Camp 1 (0-3) Prerequisite : Program development skill enhancement camp, at least 48 person-hours.

Cursive Handwriting Recognition for Document Archiving

Signature Segmentation from Machine Printed Documents using Conditional Random Field

Chapter 6. The stacking ensemble approach

Steven C.H. Hoi School of Information Systems Singapore Management University

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Applications of Deep Learning to the GEOINT mission. June 2015

The Artificial Prediction Market

Big Data: Image & Video Analytics

Big Data and Marketing

Statistical Models in Data Mining

DATA MINING TECHNIQUES AND APPLICATIONS

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

How To Make Sense Of Data With Altilia

Doctor of Philosophy in Computer Science

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

The Scientific Data Mining Process

p. 3 p. 4 p. 93 p. 111 p. 119

Graduate Co-op Students Information Manual. Department of Computer Science. Faculty of Science. University of Regina

MS1b Statistical Data Mining

Object Recognition. Selim Aksoy. Bilkent University

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Digital Identity & Authentication Directions Biometric Applications Who is doing what? Academia, Industry, Government

Introduction to Data Mining Techniques

Introduction to Data Mining

Introduction. A. Bellaachia Page: 1

Data, Measurements, Features

Introduction to Machine Learning Using Python. Vikram Kamath

The Cyber Threat Profiler

Big Data in the Mathematical Sciences

Machine Learning with MATLAB David Willingham Application Engineer

Data Mining Part 5. Prediction

Statistics for BIG data

An Introduction to Data Mining

Why is Internal Audit so Hard?

Mining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group

Master of Science in Computer Science

BIOINF 585 Fall 2015 Machine Learning for Systems Biology & Clinical Informatics

Feature Selection vs. Extraction

Big Data and Analytics: Challenges and Opportunities

Machine Learning and Data Mining. Fundamentals, robotics, recognition

WHAT IS A SITE MAP. Types of Site Maps. vertical. horizontal. A site map (or sitemap) is a

Role of Social Networking in Marketing using Data Mining

A Semantic Model for Multimodal Data Mining in Healthcare Information Systems. D.K. Iakovidis & C. Smailis

Simplified Machine Learning for CUDA. Umar

2015 Workshops for Professors

Keywords image processing, signature verification, false acceptance rate, false rejection rate, forgeries, feature vectors, support vector machines.

Introduction to Machine Learning Lecture 1. Mehryar Mohri Courant Institute and Google Research

Using multiple models: Bagging, Boosting, Ensembles, Forests

Introduction to Data Mining

Environmental Remote Sensing GEOG 2021

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Foundations of Artificial Intelligence. Introduction to Data Mining

Data Mining Analytics for Business Intelligence and Decision Support

A Systemic Artificial Intelligence (AI) Approach to Difficult Text Analytics Tasks

Winter 2016 Course Timetable. Legend: TIME: M = Monday T = Tuesday W = Wednesday R = Thursday F = Friday BREATH: M = Methodology: RA = Research Area

Statistical Data Mining. Practical Assignment 3 Discriminant Analysis and Decision Trees

Network Machine Learning Research Group. Intended status: Informational October 19, 2015 Expires: April 21, 2016

BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES

Clustering Big Data. Anil K. Jain. (with Radha Chitta and Rong Jin) Department of Computer Science Michigan State University November 29, 2012

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

High Productivity Data Processing Analytics Methods with Applications

Establishing the Uniqueness of the Human Voice for Security Applications

MACHINE LEARNING BASICS WITH R

Online Farsi Handwritten Character Recognition Using Hidden Markov Model

How To Cluster

PDF hosted at the Radboud Repository of the Radboud University Nijmegen

Machine learning for algo trading

Search Taxonomy. Web Search. Search Engine Optimization. Information Retrieval

LOCAL SURFACE PATCH BASED TIME ATTENDANCE SYSTEM USING FACE.

Azure Machine Learning, SQL Data Mining and R

EHR CURATION FOR MEDICAL MINING

Sentiment Analysis. D. Skrepetos 1. University of Waterloo. NLP Presenation, 06/17/2015

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students

Using Data Mining for Mobile Communication Clustering and Characterization

Advanced analytics at your hands

large-scale machine learning revisited Léon Bottou Microsoft Research (NYC)

Programming Exercise 3: Multi-class Classification and Neural Networks

Is a Data Scientist the New Quant? Stuart Kozola MathWorks

COMP9321 Web Application Engineering

Transcription:

NAVIGATING SCIENTIFIC LITERATURE A HOLISTIC PERSPECTIVE Venu Govindaraju

BIOMETRICS DOCUMENT ANALYSIS PATTERN RECOGNITION 8/24/2015 ICDAR- 2015 2 Towards a Globally Optimal Approach for Learning Deep Unsupervised Models Organizing Multiple Experts for Efficient Pattern Recognition Active Pattern Recognition Using Genetic Programming A Complexity Framework for Combination of Classifiers in Verification and Identification Systems Image Processing using Ontology Concepts for Image Segmentation Language Motivated Approaches for Human Action Recognition and Spotting Intrusion Detection using Spatial Information and Behavioral Biometrics Integrating Minutiae Based Fingerprint Matching with Local Correlation Methods Integrating Facial Expressions and Skin Texture in Dace Recognition Stochastic Modeling of High-level Structures in Handwritten Word Recognition Statistical Techniques for Efficient Indexing and Retrieval of Document Images Probabilistic Random Field based Text Identification Enhancing Cyber Security through the use of Synthetic Handwritten CAPTCHAs Language Models and Automatic Topic Categorization for Information Retrieval in Handwritten Documents Methods for Biomedical Image Content Extraction Toward Improved Multimodal Retrieval of Biomedical Articles A Novel Multi-sample Fusion Methodology for Improving Biometric Recognition Enhancement and Retrieval of Low Quality Handwritten Documents A Stochastic Framework for Font Independent Devanagari OCR A Semi Supervised Framework for Handwritten Document Analysis Bayesian Background Models for Retrieval of Handwritten Documents Accents in Handwriting: A Hierarchical Bayesian Approach to Handwriting Analysis Hierarchical and Dynamic-Relational Models for Handwriting Recognition Multilingual Word Spotting in Offline Handwritten Documents A Framework for Fingerprint Enhancement and Feature Detection Minutia-Based Partial Fingerprint Recognition Sequential Pattern Classification without Explicit Feature Extraction Automatic Recognition of Handwritten Medical Forms for Search Engines Exploiting the Gap between Human and Machine Abilities in Handwriting Recognition for Web Security Applications Face Modeling and Biometric Anti-spoofing using Probability Distribution Transfer Learning A Framework for Efficient Fingerprint Identification using a Minutiae Tree

8/24/2015 ICDAR- 2015 3

8/24/2015 ICDAR- 2015 4 1990-2015 Key Innovations Grand Challenges 2015 +

8/24/2015 ICDAR- 2015 5 Old Order - DIA UW English Document Image Database (Phillips, Technical report, 1996, citations: 29) Layout analysis Word, line, zone segmentation, etc. Performance: >90% accuracy Logical structure analysis Text entity, reading order Performance >99% accuracy Text recognition Performance >99% accuracy

8/24/2015 ICDAR- 2015 6 Scientific Process nanos gigantum humeris insidentes Past/Current L E A R N Future D I S C O V E R Wisdom Judgment Understanding STATE OF THE ART DATA Data Numbers Symbols Knowledge Insights Theory Information Experiment Processed Framework data Who/What/When Analysis Meaning KNOWLEDGE

8/24/2015 ICDAR- 2015 7 When knowledge becomes data Variety Big Data Velocity Volume Veracity Source (4Vs of Big Data) : IBM

8/24/2015 ICDAR- 2015 8 Scientific Literature 2009 estimate: 50 million articles; 28 thousand journals 1.8M articles added every year. 23 million articles (Just biomedical literature) Volume 45 million articles Variety 40 million articles Unknown (peer reviewed only) Veracity Roughly, papers double every 10-15 years! [Meadows, 1998, p.16] Velocity

4Vs 8/24/2015 ICDAR- 2015 9 Big Data Side Effects Challenges for ICDAR Community Volume Velocity Variety Veracity References Reinvention Replicability Reputation 4Rs

References Volume Challenge 8/24/2015 ICDAR- 2015 10

Reinvention Velocity Challenge 8/24/2015 ICDAR- 2015 11

Replicability Veracity Challenge 8/24/2015 ICDAR- 2015 12 Nullius in verba On the word of no one" or "Take nobody's word for it"

Reputation Veracity Challenge 8/24/2015 ICDAR- 2015 13 How Many Scientists Does It Take to Write a Paper?

Challenges 8/24/2015 ICDAR- 2015 14

8/24/2015 ICDAR- 2015 15 GRAND CHALLENGE BIG DATA VOLUME VELOCITY VARIETY VERACITY References Reinvention Replicability Cognitive burden Reinventing wheel Authenticity

8/24/2015 ICDAR- 2015 16 Addressing the Cognitive Burden Volume Scientific articles Web Blogs Integrated learning Slides Tutorials Video talks

8/24/2015 ICDAR- 2015 17 Addressing Reinventing the wheel? Velocity Least square with linear constraints: one type of quadratic program in mathematics Isotonic regression: in statistics Trapezoid rule: calculus 17 th century Tai's Model, 1994, 254 citations 17

8/24/2015 ICDAR- 2015 18 Addressing Replicability Velocity Dataset UNLV/ISRI 64 pages, 6796 blocks Heuristics parameters Vertical: 15 pixels Horizontal: 50/30 pixels Classes: Text, Table, Caption, Figs Classifier: Support Vector Machines IBM Journal 1982 Accuracy: 91.73 % at block level

8/24/2015 ICDAR- 2015 19 Addressing Authenticity Veracity Datasets Public Benchmark Published Reputation Authors Lab Journal Experiments Comparative results CODE available!! Citations Only Counts?

8/24/2015 ICDAR- 2015 20 Veracity All citations are not equal Which citation is more trustworthy? Sentiment analysis: Targeted NLP

Veracity Dataset linkages 8/24/2015 ICDAR- 2015 21 MNIST: 60k training, 10k testing images Gradient-based learning applied to document recognition, Lecun et al 1998 (Citations: 3547) Ciresan et al. 2012 Training on automatically augmented dataset: During training the digits are randomly distorted The MCDNN has a very low 0.23% error rate

8/24/2015 ICDAR- 2015 22 OUR NEXT FRONTIER Tables, Graphs Equations Targeted NLP Keyword spotting Multimedia

8/24/2015 ICDAR- 2015 23 Tables Analysis 4Vs

8/24/2015 ICDAR- 2015 24 Graphs Analysis 4Vs

8/24/2015 ICDAR- 2015 25 Equations Analysis Velocity Domain awareness Matrix representation Operators Symbols Document awareness Dependent variables independent variables regression coefficients, error

26 Query Interface Face Recognition FAR (0.0-0.2) vs FRR CVPR Science Advanced Search Options X-Y Plots Tables Nature Figures Original Paper Original Paper Original Paper

8/24/2015 ICDAR- 2015 27 Query: Line graph X Axis Label: AdaBoost iterations Range: 0-5000 Unit: - Y Axis Label: Misclassification Error Range: 0.15-0.30 Unit: - Legend: - Number of lines: 2 Retrieve similar graphs

8/24/2015 ICDAR- 2015 28 Holistic View Linkages SUPERVISED SEMI-SUPERVISED Online cursive handwriting recognition using speech recognition methods;, John Makhoul, Richard Schwartz, and George Chou ICASSP 1994

8/24/2015 ICDAR- 2015 29 Accelerated Discovery +30% DBN +10% Dropout HMM Speech RNN? Handwriting?

8/24/2015 ICDAR- 2015 30 1990-2015 Key Innovations Grand Challenges 2015 +

Handwriting Recognition Key Innovations Lexicons Fusion Retrieval Security

8/24/2015 ICDAR- 2015 32 Summary Grand Challenges 4Vs of Scientific Big Data 4 Rs: References, Reinvention, Replicability, Reputation Grand Opportunities Accelerated Discovery : Supervised linkages, heuristics; Integrate learning channels Key Innovations Handwriting Recognition: Lexicons; Fusion; Retrieval; Security

8/24/2015 ICDAR- 2015 33 Special Thanks to All my students and colleagues especially to colleagues Srirangaraj Setlur and Ifeoma Nwogu

Thank You venu@cubs.buffalo.edu