Style Characterization of Machine Printed Texts




Andrew D. Bagdanov

This book was typeset by the author using LaTeX2ε. The main body of the text is set in the Computer Modern family of fonts. The images and figures are included in the text in Encapsulated PostScript format (PostScript™ is a trademark of Adobe Systems Incorporated).
Printing: Febodruk BV, Enschede, The Netherlands.
Copyright © 2004 by Andrew D. Bagdanov. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without written permission from the author.
ISBN 90-5776-122-X

Style Characterization of Machine Printed Texts
ACADEMIC DISSERTATION
for the acquisition of the degree of doctor at the University of Amsterdam, on the authority of the Rector Magnificus prof. mr. P.F. van der Heijden, before a committee appointed by the Doctorate Board (College voor Promoties), to be defended in public in the Aula of the University on Wednesday, 26 May 2004, at 10:00,
by
Andrew David Bagdanov
born in Burbank, California, United States of America

Doctoral committee:
Promotor: Prof. dr. ir. A.W.M. Smeulders
Co-promotor: dr. M. Worring
Other members: Prof. dr. P. van Emde Boas, Prof. dr. ir. F.C.A. Groen, Prof. dr. G. Nagy, Prof. dr. ir. R.J.H. Scha, dr. T. Gevers, dr. R. Hamberg, dr. ir. H.J.A.M. Heijmans
Faculty: Faculteit der Natuurwetenschappen, Wiskunde & Informatica (Faculty of Science, Mathematics and Computer Science), Kruislaan 403, 1098 SJ Amsterdam, the Netherlands

The work described in this thesis was supported by the ICES-KIS MIA project and Océ Nederland.

The work described in this thesis has been carried out within the graduate school ASCI (Advanced School for Computing and Imaging), at the Intelligent Sensory Information Systems group of the University of Amsterdam. ASCI dissertation series number 102.

This dissertation is dedicated to my mother and the memory of my father.

Contents
0 Prelude
1 Introduction
  1.1 Elements of style
  1.2 Context and scope
    1.2.1 Context
    1.2.2 Scope
  1.3 Organization of this thesis
2 Characterizing layout style using first order Gaussian graphs
  2.1 Definitions and basic concepts
    2.1.1 First order Gaussian graphs
    2.1.2 Technicalities
    2.1.3 Reflections
  2.2 Clustering and classification
    2.2.1 Hierarchical clustering of FOGGs
    2.2.2 Classification using first order Gaussian graphs
  2.3 Experiments
    2.3.1 Test data
    2.3.2 Classifiers
    2.3.3 Experimental results
    2.3.4 Computational efficiency
    2.3.5 Analysis
  2.4 Discussion
3 Multi-scale visual style characterization with rectangular granulometries
  3.1 Document genre
  3.2 Granulometries
  3.3 Document representation
    3.3.1 Rectangular size distributions
    3.3.2 Efficiency
    3.3.3 Feature space reduction and interpretation
  3.4 Experimental results
    3.4.1 Genre classification
    3.4.2 Document image retrieval
  3.5 Discussion
4 Probing textual style with local vertical granulometries
  4.1 Another look at granulometries
    4.1.1 The key observation
    4.1.2 Introducing localization
  4.2 An efficient word spotter
  4.3 A generative model
    4.3.1 Glyph distributions
    4.3.2 A generative word model
  4.4 Experiments and illustration
    4.4.1 Typeface classification
    4.4.2 Word spotting
  4.5 Discussion
5 Autocorrelation-driven restoration of scanned color halftones
  5.1 Halftone process color
    5.1.1 Halftone color reproduction
    5.1.2 Scanned color halftones
  5.2 Diffusion of scanned color halftones
    5.2.1 Linear diffusion filtering
    5.2.2 Nonlinear diffusion filtering
  5.3 Measuring local autocorrelation
  5.4 Experiments
  5.5 Discussion
6 A functional approach to software design in image processing research environments
  6.1 Introduction
  6.2 A critique of pure reason
    6.2.1 Analysis
  6.3 Design considerations
    6.3.1 Goals
    6.3.2 Choice of language
    6.3.3 Previous work
  6.4 Architecture
  6.5 Primitive types and operations
    6.5.1 Types and typing
    6.5.2 Primitive image operations
  6.6 Backend substitution
  6.7 Case studies
    6.7.1 Linear scale space
    6.7.2 Complete lattice morphology
    6.7.3 An algebraic expression compiler
  6.8 Discussion
7 Summary and concluding remarks
  7.1 Summary
  7.2 Concluding remarks
Bibliography
Samenvatting (Summary in Dutch)
Acknowledgements