Role of Automation in the Forensic Examination of Handwritten Items

Similar documents

Document Image Retrieval using Signatures as Queries

Big Data, Machine Learning, Causal Models

Sample Size Designs to Assess Controls

The Basics of Graphical Models

2nd International Workshop on Automated Forensic Handwriting Analysis (AFHA) 2013

The Role of Size Normalization on the Recognition Rate of Handwritten Numerals

Appendices master s degree programme Artificial Intelligence

Master's projects at ITMO University. Daniil Chivilikhin PhD ITMO University

Using Lexical Similarity in Handwritten Word Recognition

What Makes a Good Resource Data Handling Murder Investigation

Statistical Analysis of Signature Features with Respect to Applicability in Off-line Signature Verification

Introduction to Pattern Recognition

SWGFAST. Defining Level Three Detail

UNDERGRADUATE DEGREE PROGRAMME IN COMPUTER SCIENCE ENGINEERING SCHOOL OF COMPUTER SCIENCE ENGINEERING, ALBACETE

Bayesian networks - Time-series models - Apache Spark & Scala

SUN PRAIRIE AREA SCHOOL DISTRICT COURSE POWER STANDARDS. Curriculum Area: Science Course Length: Semester

COMMUNITY UNIT SCHOOL DISTRICT 200. Course Description

What mathematical optimization can, and cannot, do for biologists. Steven Kelk Department of Knowledge Engineering (DKE) Maastricht University, NL

Number of hours in the semester L Ex. Lab. Projects SEMESTER I 1. Economy Philosophy Mathematical Analysis Exam

Handwriting Analysis 2005, 2004, 2002, 1993 by David A. Katz. All rights reserved.

Now we begin our discussion of exploratory data analysis.

Machine Learning Introduction

Georgia Perimeter College Common Course Outline

ANALYZING THE BENEFITS OF USING TABLET PC-BASED FLASH CARDS APPLICATION IN A COLLABORATIVE LEARNING ENVIRONMENT A Preliminary Study

Comparing Artificial Intelligence Systems for Stock Portfolio Selection

TIETS34 Seminar: Data Mining on Biometric identification

Strategic White Paper

Likelihood Approaches for Trial Designs in Early Phase Oncology

Machine Learning.

Online Farsi Handwritten Character Recognition Using Hidden Markov Model

Cursive Handwriting Recognition for Document Archiving

Statistical Models in Data Mining

CRIME SCENE INVESTIGATION THROUGH DNA TRACES USING BAYESIAN NETWORKS

Signature Segmentation from Machine Printed Documents using Conditional Random Field

A Basic Introduction to Missing Data

The Use of Hybrid Regulator in Design of Control Systems

MTH 140 Statistics Videos

Algorithms are the threads that tie together most of the subfields of computer science.

MORPHO CRIMINAL JUSTICE SUITE

2 Signature-Based Retrieval of Scanned Documents Using Conditional Random Fields

Organizing Your Approach to a Data Analysis

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Mining. Practical. Data. Monte F. Hancock, Jr. Chief Scientist, Celestech, Inc. CRC Press. Taylor & Francis Group

Statistical Validation and Data Analytics in ediscovery. Jesse Kornblum

Machine Learning: Overview

Decision Trees and Networks

Bayesian Network Modeling of Hangul Characters for On-line Handwriting Recognition

Viewpoint ediscovery Services

Statistics 3202 Introduction to Statistical Inference for Data Analytics 4-semester-hour course

Forensic Science Curriculum

Likelihood: Frequentist vs Bayesian Reasoning

Certifications and Standards in Academia. Dr. Jane LeClair, Chief Operating Officer National Cybersecurity Institute

SYSM 6304: Risk and Decision Analysis Lecture 5: Methods of Risk Analysis

Online 12 - Sections 9.1 and 9.2-Doug Ensley

Appendices master s degree programme Human Machine Communication

Lecture 9: Introduction to Pattern Analysis

Position Classification Flysheet for Computer Science Series, GS Table of Contents

COURSE DESCRIPTION. Course Number: NM: RISD: 13109A, 13109B. Successful completion of Forensics I (C or better)

Learning is a very general term denoting the way in which agents:

Schedule Risk Analysis Simulator using Beta Distribution

A Regime-Switching Relative Value Arbitrage Rule

The Science of Forensics

DATA MINING IN FINANCE

Bayesian Machine Learning (ML): Modeling And Inference in Big Data. Zhuhua Cai Google, Rice University

A new normalization technique for cursive handwritten words

Measurement and Metrics Fundamentals. SE 350 Software Process & Product Quality

Imputing Values to Missing Data

Part III: Machine Learning. CS 188: Artificial Intelligence. Machine Learning This Set of Slides. Parameter Estimation. Estimation: Smoothing

Form Design Guidelines Part of the Best-Practices Handbook. Form Design Guidelines

An Introduction to Using WinBUGS for Cost-Effectiveness Analyses in Health Economics

Cloud Security Introduction and Overview

COMPUTATIONAL METHODS FOR A MATHEMATICAL THEORY OF EVIDENCE

SAS Certificate Applied Statistics and SAS Programming

Rolle s Theorem. q( x) = 1

Using Artificial Intelligence to Manage Big Data for Litigation

Cool Tools for PROC LOGISTIC

SURVEY REPORT DATA SCIENCE SOCIETY 2014

Algorithmic Trading Session 1 Introduction. Oliver Steinki, CFA, FRM

COMPETENCE ASSURANCE FOR ENFSI FORENSIC PRACTITIONERS

Dan French Founder & CEO, Consider Solutions

Disguised Writings By Steve Cain Board Certified Forensic Document Examiner

Introduction to Markov Chain Monte Carlo

Intro to GIS Winter Data Visualization Part I

Credal Networks for Operational Risk Measurement and Management

Multi-Class and Structured Classification

Transcription:

Role of Automation in the Forensic Examination of Handwritten Items Sargur Srihari Department of Computer Science & Engineering University at Buffalo, The State University of New York NIST, Washington DC, June 3, 2013 1

Plan of Discussion Computational Thinking (CT) What, Why, How, Limitations Reverse Engineering of QDE Automation Tools Individualizing Characteristics Opinion Adequacy Summary and Discussion 2

Computational Thinking (CT) What is it? A way to solve problems, design systems, and understand human behavior Draw on concepts of computer science Why? To flourish in today's world, CT is the way to think and understand the world 3 S. Papert, J. Wing

How is CT done? 1. Abstraction to understand and solve problems more effectively 2. Algorithmic Thinking and Mathematics to develop efficient, fair, and secure solutions 3. Understand scale Efficiency Economic and social reasons 4

CT and Law Long Dream: Logical rules to automate verdict Napoleonic Code (1804) Minimize discretion, maximize predictability of outcome Flounders: vagueness of words and variation of real world Expert system replacements of judiciary Poor record both of success and of uptake Better Inroads: Legal reasoning systems Merely assist in legal decisions E.g., Construct hypotheses for evidence in a crime scene Remind detectives of hypotheses might have missed 5 Mind-expanding avoids pitfalls of mind-narrowing

CT and Forensics CT useful in domains where: Human judgement is involved Knowledge engineering can be performed Starting point for creating artificial intelligence Within forensics: Impression evidence Handwriting, latent prints, footwear marks Handwriting: Success demonstrated in recognition 6

Knowledge Engineering for FDE CAT Principle Comparability Adequacy Time Contemporaneous Characteristics Class Individualizing Seven S s Size, slant, spacing, shading, system, speed, strokes 7

FDE- Exam. of HW Items (ASTM) Determine if Q v Q, K v K, or Q v K For Q and K: Quality (copies?) Distorted (disguised) Type, Range Individualizing characteristics? Comparable? else new K & repeat Differences/similarities for conclusion (5 or 9-pt) Identification, Highly probable same, Probably did, Indications did, No conclusion Indications didn t, Probably didn t, Highly probable didn t, Elimination 8

Word Cloud of ASTM Procedure 9

Pseudo-code for Interactive Forensic Examination (ifox) 10

Tools for Steps in FDE Procedure Quality Distortion Range 1.Individualizing characteristics 2.Comparability (Type) 3.Comparison (Opinion) 4.Adequacy Details Next 11

Individualizing Characteristics? 12

Characteristics of ``th 13

Rarity: measure of Individualization 0.0304 High Probability Low Probability 0.0051 0.00167 14

Statistical Models of Characteristics # Probabilities for full joint distribution= 4,799 No. of Parameters if we assume independence= 19 # Probabilities= 287,999 (cursive) 809,999 (hand-print) 15

What if we assume independence? True Joint Probabilities: Prob (height,weight) P(a,b) b 0 (heavy) b 1 (light) P(a) (height) a 0 (tall) 0.6 0.05 0.65 a 1 (short) 0.05 0.3 0.35 P(b) (weight) 0.65 0.35 P(tall, light) < P(short,light) 0.05<0.3 Short & light six times more probable than tall and light: Correct! Assuming Independence P(a,b) b 0 (heavy) b 1 (light) P(a) (height) a 0 (tall) 0.42 0.23 0.65 a 1 (short) 0.23 0.12 0.35 P(b) (weight) 0.65 0.35 P(tall,light) > P(short,light) 0.23 >0.12 Tall & light, twice probability of short & light: Wrong!

Compromise Solution: PGMs 100 parameters 99 parameters (cursive) 17 77 parameters (hand-print)

Learning PGMs from Data Bayesian Networks (directed graphs) Markov Networks (undirected graphs) Learning algorithms: Determine pairwise independences using chi-squared tests Determine quality of model using log-loss Problem is NP-hard use approximate solutions 18

Type Determination Cursive vs. Handprint f 1 : Discreteness Ratio of isolated character count(icc) to word count (WC) f 2 : Loopiness 19 Ratio of interior to exterior contours

Opinion 20

Adequacy 21

Availability of Automation Tools Interactive tools rather than automation Incorporate CT Abstraction: Aids to organize thought process Algorithms and Mathematics Scalability Potential of Large quantities quickly analyzed Mind Expanding Probability allows considering characteristics otherwise ignored, or discounting identified ones Value of small amounts of information 22

Status of Automation Tools Interactive tools rather than automation Work in Progress Characteristics Data needs to be collected Learning statistical models Learning PGMs is current topic in ML Inference algorithms Type determination Opinion Mapping 23

Summary Computational Thinking + Forensics = Computational Forensics Solve using Abstraction Algorithms Mathematics Scale

Summary Reverse engineering of QDE Available in ASTM standards, other QD literature Steps amenable to automation tools Data Collection Modeling distribution of characteristics Type determination Likelihood Ratios (Opinion) Confidence Intervals (Adequacy)

How does this fit with fully Automated Systems? Systems such as FISH, CEDAR-FOX and FLASH-ID narrow down possibilities Interactive systems (ifox) will assist the document examiner in going the last mile E.g., associate probabilities with their observations 26