Next Generation Artificial Vision Systems
Reverse Engineering the Human Visual System
Anil Bharath
Maria Petrou
Imperial College London
ARTECH HOUSE
BOSTON LONDON
artechhouse.com
Contents
Preface xiii
CHAPTER 1 The Human Visual System: An Engineering Challenge 1
1.1 Introduction 1
1.2 Overview of the Human Visual System 2
1.2.1 The Human Eye 3
1.2.1.1 Issues to Be Investigated 8
1.2.2 Lateral Geniculate Nucleus (LGN) 10
1.2.3 The V1 Region of the Visual Cortex 12
1.2.3.1 Issues to Be Investigated 14
1.2.4 Motion Analysis and V5 15
1.2.4.1 Issues to Be Investigated 15
1.3 Conclusions 15
References 17
PART I The Physiology and Psychology of Vision 19
CHAPTER 2 Retinal Physiology and Neuronal Modeling 21
2.1 Introduction 21
2.2 Retinal Anatomy 21
2.3 Retinal Physiology 25
2.4 Mathematical Modeling: Single Cells of the Retina 27
2.5 Mathematical Modeling: The Retina and Its Functions 28
2.6 A Flexible, Dynamical Model of Retinal Function 30
2.6.1 Foveal Structure 31
2.6.2 Differential Equations 32
2.6.3 Color Mechanisms 34
2.6.4 Foveal Image Representation 36
2.6.5 Modeling Retinal Motion 37
2.7 Numerical Simulation Examples 38
2.7.1 Parameters and Visual Stimuli 38
2.7.2 Temporal Characteristics 39
2.7.3 Spatial Characteristics 41
2.7.4 Color Characteristics 43
2.8 Conclusions 45
References 46
CHAPTER 3 A Review of V1 51
3.1 Introduction 51
3.2 Two Aspects of Organization and Functions in V1 52
3.2.1 Single-Neuron Responses 52
3.2.2 Organization of Individual Cells in V1 53
3.2.2.1 Orientation Selectivity 55
3.2.2.2 Color Selectivity 56
3.2.2.3 Scale Selectivity 57
3.2.2.4 Phase Selectivity 58
3.3 Computational Understanding of the Feed-Forward V1 58
3.3.1 V1 Cell Interactions and Global Computation 59
3.3.2 Theory and Model of Intracortical Interactions in V1 61
3.4 Conclusions 62
References 63
CHAPTER 4 Testing the Hypothesis That V1 Creates a Bottom-Up Saliency Map 69
4.1 Introduction 69
4.2 Materials and Methods 73
4.3 Results 75
4.3.1 Interference by Task-Irrelevant Features 76
4.3.2 The Color-Orientation Asymmetry in Interference 81
4.3.3 Advantage for Color-Orientation Double Feature but Not Orientation-Orientation Double Feature 84
4.3.4 Emergent Grouping of Orientation Features by Spatial Configurations 87
4.4 Discussion 92
4.5 Conclusions 98
References 99
PART II The Mathematics of Vision 103
CHAPTER 5 V1 Wavelet Models and Visual Inference 105
5.1 Introduction 105
5.1.1 Wavelets 105
5.1.2 Wavelets in Image Analysis and Vision 107
5.1.3 Wavelet Choices 107
5.1.4 Linear vs Nonlinear Mappings 112
5.2 A Polar Separable Complex Wavelet Design 113
5.2.1 Design Overview 113
5.2.2 Filter Designs: Radial Frequency 114
5.2.3 Angular Frequency Response 116
5.2.4 Filter Kernels 118
5.2.5 Steering and Orientation Estimation 119
5.3 The Use of V1-Like Wavelet Models in Computer Vision 120
5.3.1 Overview 120
5.3.2 Generating Orientation Maps 121
5.3.3 Corner Likelihood Response 123
5.3.4 Phase Estimation 123
5.4 Inference from V1-Like Representations 124
5.4.1 Vector Image Fields 125
5.4.2 Formulation of Detection 126
5.4.3 Sampling of (B,X) 127
5.4.4 The Notion of "Expected" Vector Fields 128
5.4.5 An Analytic Example: Uniform Intensity Circle 129
5.4.6 Vector Model Plausibility and Extension 129
5.4.7 Vector Fields: A Variable Contrast Model 130
5.4.8 Plausibility by Demonstration 131
5.4.9 Plausibility from Real Image Data 132
5.4.10 Divisive Normalization 133
5.5 Evaluating Shape Detection Algorithms 135
5.5.1 Circle-and-Square Discrimination Test 135
5.6 Grouping Phase-Invariant Feature Maps 138
5.6.1 Keypoint Detection Using DTCWT 138
5.7 Summary and Conclusions 140
References 141
CHAPTER 6 Beyond the Representation of Images by Rectangular Grids 145
6.1 Introduction 145
6.2 Linear Image Processing 145
6.2.1 Interpolation of Irregularly Sampled Data 146
6.2.1.1 Kriging 146
6.2.1.2 Iterative Error Correction 151
6.2.1.3 Normalized Convolution 153
6.2.2 DFT from Irregularly Sampled Data 156
6.3 Nonlinear Image Processing 157
6.3.1 V1-Inspired Edge Detection 158
6.3.2 Beyond the Conventional Data Representations and Object Descriptors 162
6.3.2.1 The Trace Transform 162
6.3.2.2 Features from the Trace Transform 165
6.4 Reverse Engineering Some Aspects of the Human Visual System 167
6.5 Conclusions 168
References 169
CHAPTER 7 Reverse Engineering of Human Vision: Hyperacuity and Super-Resolution 171
7.1 Introduction 171
7.2 Hyperacuity and Super-Resolution 172
7.3 Super-Resolution Image Reconstruction Methods 173
7.3.1 Constrained Least Squares Approach 174
7.3.2 Projection onto Convex Sets 177
7.3.3 Maximum A Posteriori Formulation 180
7.3.4 Markov Random Field Prior 180
7.3.5 Comparison of the Super-Resolution Methods 183
7.3.6 Image Registration 183
7.4 Applications of Super-Resolution 184
7.4.1 Application in Minimally Invasive Surgery 184
7.4.2 Other Applications 187
7.5 Conclusions and Further Challenges 188
References 188
CHAPTER 8 Eye Tracking and Depth from Vergence 191
8.1 Introduction 191
8.2 Eye-Tracking Techniques 192
8.3 Applications of Eye Tracking 195
8.3.1 Psychology/Psychiatry and Cognitive Sciences 195
8.3.2 Behavior Analysis 196
8.3.3 Medicine 197
8.3.4 Human-Computer Interaction 199
8.4 Gaze-Contingent Control for Robotic Surgery 200
8.4.1 Ocular Vergence for Depth Recovery 202
8.4.2 Binocular Eye-Tracking Calibration 204
8.4.3 Depth Recovery and Motion Stabilization 206
8.5 Discussion and Conclusions 209
References 210
CHAPTER 9 Motion Detection and Tracking by Mimicking Neurological Dorsal/Ventral Pathways 217
9.1 Introduction 217
9.2 Motion Processing in the Human Visual System 218
9.3 Motion Detection 219
9.3.1 Temporal Edge Detection 221
9.3.2 Wavelet Decomposition 224
9.3.3 The Spatiotemporal Haar Wavelet 225
9.3.4 Computational Cost 230
9.4 Dual-Channel Tracking Paradigm 230
9.4.1 Appearance Model 231
9.4.2 Early Approaches to Prediction 232
9.4.3 Tracking by Blob Sorting 233
9.5 Behavior Recognition and Understanding 237
9.6 A Theory of Tracking 239
9.7 Concluding Remarks 241
References 242
PART III Hardware Technologies for Vision 249
CHAPTER 10 Organic and Inorganic Semiconductor Photoreceptors Mimicking the Human Rods and Cones 251
10.1 Introduction 251
10.2 Phototransduction in the Human Eye 253
10.2.1 The Physiology of the Eye 253
10.2.2 Phototransduction Cascade 255
10.2.2.1 Light Activation of the Cascade 257
10.2.2.2 Deactivation of the Cascade 258
10.2.3 Light Adaptation of Photoreceptors: Weber-Fechner's Law 258
10.2.4 Some Engineering Aspects of Photoreceptor Cells 259
10.3 Phototransduction in Silicon 260
10.3.1 CCD Photodetector Arrays 262
10.3.2 CMOS Photodetector Arrays 263
10.3.3 Color Filtering 265
10.3.4 Scaling Considerations 268
10.4 Phototransduction with Organic Semiconductor Devices 269
10.4.1 Principles of Organic Semiconductors 270
10.4.2 Organic Photodetection 271
10.4.3 Organic Photodiode Structure 273
10.4.4 Organic Photodiode Electronic Characteristics 274
10.4.4.1 Photocurrent and Efficiency 274
10.4.4.2 The Equivalent Circuit and Shunt Resistance 277
10.4.4.3 Spectral Response Characteristics 281
10.4.5 Fabrication 281
10.4.5.1 Contact Printing 282
10.4.5.2 Printing on CMOS 284
10.5 Conclusions 285
References 286
CHAPTER 11 Analog Retinomorphic Circuitry to Perform Retinal and Retinal-Inspired Processing 289
11.1 Introduction 289
11.2 Principles of Analog Processing 290
11.2.1 The Metal Oxide Semiconductor Field Effect Transistor 292
11.2.1.1 Transistor Operation 293
11.2.1.2 NMOS and PMOS Devices 293
11.2.1.3 Transconductance Characteristics 293
11.2.1.4 Inversion Characteristics 294
11.2.1.5 MOSFET Weak Inversion and Biological Gap Junctions 295
11.2.2 Analog vs Digital Methodologies 296
11.3 Photoelectric Transduction 296
11.3.1 Logarithmic Sensors 297
11.3.2 Feedback Buffers 298
11.3.3 Integration-Based Photodetection Circuits 298
11.3.4 Photocurrent Current-Mode Readout 300
11.4 Retinomorphic Circuit Processing 300
11.4.1 Voltage-Mode Resistive Networks 301
11.4.1.1 Limitations with This Approach 303
11.4.2 Current-Mode Approaches to Receptive Field Convolution 303
11.4.2.1 Improved Horizontal Cell Circuitry 305
11.4.2.2 Novel Bipolar Circuitry 305
11.4.2.3 Bidirectional Current-Mode Processing 306
11.4.2.4 Dealing with Multiple High-Impedance Processing Channels 307
11.4.2.5 The Current Comparator 310
11.4.3 Reconfigurable Fields 312
11.4.4 Intelligent Ganglion Cells 314
11.4.4.1 ON-OFF Ganglion Cells 315
11.4.4.2 Pulse Width Encoding 316
11.5 Address Event Representation 317
11.5.1 The Arbitration Tree 318
11.5.2 Collisions 322
11.5.3 Sparse Coding 322
11.5.4 Collision Reduction 323
11.6 Adaptive Foveation 324
11.6.1 System Algorithm 325
11.6.2 Circuit Implementation 326
11.6.3 The Future 329
11.7 Conclusions 330
References 330
CHAPTER 12 Analog V1 Platforms 335
12.1 Analog Processing: Obsolete? 335
12.2 The Cellular Neural Network 340
12.3 The Linear CNN 340
12.4 CNNs and Mixed Domain Spatiotemporal Transfer Functions 342
12.5 Networks with Temporal Derivative Diffusion 345
12.5.1 Stability 348
12.6 A Signal Flow Graph-Based Implementation 349
12.6.1 Continuous Time Signal Flow Graphs 349
12.6.2 On SFG Relations with the MLCNN 352
12.7 Examples 355
12.7.1 A Spatiotemporal Cone Filter 355
12.7.2 Visual Cortical Receptive Field Modeling 360
12.8 Modeling of Complex Cell Receptive Fields 362
12.9 Summary and Conclusions 363
References 364
CHAPTER 13 From Algorithms to Hardware Implementation 367
13.1 Introduction 367
13.2 Field Programmable Gate Arrays 367
13.2.1 Circuit Design 369
13.2.2 Design Process 369
13.3 Mapping Two-Dimensional Filters onto FPGAs 369
13.4 Implementation of Complex Wavelet Pyramid on FPGA 370
13.4.1 FPGA Design 370
13.4.2 Host Control 373
13.4.3 Implementation Analysis 374
13.4.4 Performance Analysis 375
13.4.4.1 Corner Detection 377
13.4.5 Conclusions 377
13.5 Hardware Implementation of the Trace Transform 377
13.5.1 Introduction to the Trace Transform 377
13.5.2 Computational Complexity 381
13.5.3 Full Trace Transform System 382
13.5.3.1 Acceleration Methods 382
13.5.3.2 Target Board 383
13.5.3.3 System Overview 383
13.5.3.4 Top-Level Control 384
13.5.3.5 Rotation Block 384
13.5.3.6 Functional Blocks 386
13.5.3.7 Initialization 386
13.5.4 Flexible Functionals for Exploration 387
13.5.4.1 Type A Functional Block 388
13.5.4.2 Type B Functional Block 388
13.5.4.3 Type C Functional Block 389
13.5.5 Functional Coverage 389
13.5.6 Performance and Area Results 389
13.5.7 Conclusions 391
13.6 Summary 391
References 392
CHAPTER 14 Real-Time Spatiotemporal Saliency 395
14.1 Introduction 395
14.2 The Framework Overview 396
14.3 Realization of the Framework 398
14.3.1 Two-Dimensional Feature Detection 398
14.3.2 Feature Tracker 399
14.3.3 Prediction 404
14.3.4 Distribution Distance 406
14.3.5 Suppression 410
14.4 Performance Evaluation 411
14.4.1 Adaptive Saliency Responses 411
14.4.2 Complex Scene Saliency Analysis 412
14.5 Conclusions 413
References 413
Acronyms and Abbreviations 415
About the Editors 419
List of Contributors 420
Index 423