Designing an Embedded Video Processing Camera Using a 16-bit Microprocessor for Surveillance System

Koichi Sato, Brian L. Evans and J. K. Aggarwal
Computer and Vision Research Center, Embedded Signal Processing Laboratory
Department of Electrical and Computer Engineering
The University of Texas at Austin, Austin, TX 78712, USA
{koh@mail.utexas.edu, bevans@ece.utexas.edu, aggarwaljk@mail.utexas.edu}

Abstract

This paper describes the design and implementation of a hybrid intelligent surveillance system consisting of an embedded system and a personal computer (PC)-based system. The embedded system performs some of the image processing tasks and sends the processed data to a PC. The PC tracks persons and recognizes two-person interactions using a grayscale side-view image sequence captured by a stationary camera. Based on our previous research, we explored the optimum division of tasks between the embedded system and the PC, simulated the embedded system using dataflow models in Ptolemy, and prototyped the embedded system in real-time hardware and software using a 16-bit CISC microprocessor. This embedded system processes one frame in 89 ms, which is within three frame periods of a 30 Hz video system. In addition, the real-time embedded system prototype uses 5.7K bytes of program memory, 854 bytes of internal data memory and 2M bytes of external DRAM.

1. Introduction

Tracking, recognizing and detecting objects using a video sequence are topics of significant interest in computer vision. Aggarwal and Cai [1] reported a variety of methods for analyzing human motion. In [2-3], Haritaoglu, Harwood and Davis tracked both single and multiple humans in outdoor image sequences using a PC. In [4], Oliver, Rosario and Pentland recognized two-person interactions from perspective-view image sequences; they tracked persons and recognized their interactions using trajectory patterns.

Meanwhile, some researchers have developed applications that include one or more embedded systems. In [5], Pentland proposed a wearable device that sees people using an image sensor and understands the environment; such equipment can act or respond appropriately without detailed instructions from humans. In [6], Mahonen proposed a wireless intelligent surveillance camera system that consists of a digital camera, a general-purpose processor or DSP for image processing, and a wireless radio modem.

In [7], Shirai et al. introduced a real-time surveillance system built on parallel DSPs (TI TMS320C40). Their system consists of several boards connected in series and computes optical flow in floating point at faster than 30 frames per second. In this system, a DSP is located between two memories: it processes the image data and transfers it from the video memory to the other memory, which is connected to the next processing stage.

In [8], we proposed a surveillance system that recognizes interactive human activities using a side-view image sequence. The system consists of a single monochrome video camera, a video-capturing device and a PC. The system is located along a sidewalk and thus captures human movement from a side view. Persons in the sequences move horizontally, and the pattern of their motion is used to determine the interaction.

Our surveillance system made use of outdoor, side-view image sequences. The external conditions for our system create potential difficulties. First, outdoor image sequences tend to contain shadows and moving trees and leaves, which makes segmentation more difficult. Second, side-view image sequences are more prone to occlusion than are top-view or perspective images. Our strategy for overcoming these problems was to use the temporal spatio-velocity (TSV) transform [8,9]. The TSV transform estimates pixel velocities from a binary image sequence. Segmentation based on the TSV-transformed images is more stable because it uses both spatial and velocity information.

In the practical implementation of the system, we use several cameras to avoid the limited field of view inherent in a single-camera system. This in turn causes a problem: more PCs are required, because one PC can only achieve real-time performance when processing video data from one camera at a 30 Hz frame rate. To overcome this problem, we propose an embedded system for each camera that performs the fundamental image processing tasks (that is, the human segmentation processing) and sends the output data to a PC. This approach can potentially connect up to 12 embedded cameras simultaneously at 10 frames/s.

In this paper, Section 2 describes the algorithms used in our system and discusses the division of tasks for the embedded systems. Section 3 models and simulates the embedded system using dataflow models in Ptolemy [10]. Section 4 discusses the hardware design. Section 5 draws conclusions.

2. Assignment of tasks to the embedded system

In designing the surveillance system, we considered the optimum division of tasks between the embedded system and the PC-based system to take full advantage of the unique characteristics of each. For example, a statically schedulable, fine-granularity process such as an image differencing operation is more effectively performed on an embedded processor, whereas a database operation is more effectively performed by a PC-based system. The computational complexity of each operation, the data transfer rate, and the overall suitability of each processor were evaluated.

In the discussion below, we first describe the algorithms, followed by a discussion and evaluation of the way we divide the tasks. As a measure of computational complexity, we use the total number of operations such as additions, subtractions, multiplications and comparisons. To determine suitability, we consider the code size, the computational complexity within a loop and the number of branches in the process flow. Figure 1 presents an overview of the system; in the figure, letters A-I represent process stages and numbers 1-9 denote data transfer stages.

[Figure 1. Overview of the single-camera system: Video Sequence -> Crop 40x320 (A) -> Background Subtraction (B) -> Vertical Projection (C) -> TSV Transform (D) -> Binarization (E) -> Labeling (F) -> Blob Feature Extraction (G) -> Blob Association (H) -> Interaction Classification (I) -> Interaction Type, with data transfers 1-9 between stages.]

Crop (A) and Background Subtraction (B)

In order to extract persons, we crop an $S_H \times S_V$ image into an $S_H \times R_V$ region. We set the region at the level of an average human torso, so that all humans are captured by this operation and noise outside the region is eliminated. Then we perform a simple background subtraction and binarization to segment the foreground region using a threshold $Th$:

$$S(x,y) = \begin{cases} 1 & \text{if } |I(x,y) - B(x,y)| > Th \\ 0 & \text{otherwise.} \end{cases} \quad (1)$$

To avoid missing human blobs whose intensities are similar to the background, we use a low threshold, thereby accepting some noise. $S_H \times R_V \times 2$ is chosen as an estimate of the computational complexity per frame, while $S_H \times R_V$ [bits/frame] is an estimate of the data transfer rate.

Vertical Projection (C)

In order to reduce the noise, we perform a vertical projection over the entire $S_H \times R_V$ image and re-binarize the projection value with a threshold $T_H$:

$$H(x) = \begin{cases} 1 & \text{if } \sum_{y=a}^{b} S(x,y) > T_H \\ 0 & \text{otherwise,} \end{cases} \quad (2)$$

where $T_H$ is a threshold that is constant in all situations and $H(x)$ is the resulting binary image of extracted objects. As a result, we get a sequence of one-dimensional binary images. For this process, $S_H \times (R_V + 1) \times 2$ is chosen as an estimate of the computational complexity per frame, while $S_H$ [bits/frame] is an estimate of the data transfer rate.
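To make stages A-C concrete, the following C sketch implements Eqs. (1) and (2). It assumes the parameter values given later in the paper ($S_H = 320$, $R_V = 30$); the crop offset CROP_TOP and the threshold values are illustrative placeholders rather than the authors' settings, and the absolute difference in Eq. (1) is one plausible reading of the subtraction.

```c
#include <stdint.h>

#define S_H      320   /* horizontal image size S_H                 */
#define S_V      240   /* vertical image size S_V                   */
#define R_V       30   /* height of the cropped band R_V            */
#define CROP_TOP 100   /* top row of the torso-level band (assumed) */
#define TH        20   /* subtraction threshold Th (assumed value)  */
#define T_H        5   /* projection threshold T_H (assumed value)  */

/* Stages A+B: crop to the torso-level band and binarize the background
 * difference, Eq. (1): S(x,y) = 1 if |I(x,y) - B(x,y)| > Th, else 0. */
void subtract_binarize(const uint8_t I[S_V][S_H],
                       const uint8_t B[S_V][S_H],
                       uint8_t S[R_V][S_H])
{
    for (int y = 0; y < R_V; y++) {
        for (int x = 0; x < S_H; x++) {
            int d = (int)I[CROP_TOP + y][x] - (int)B[CROP_TOP + y][x];
            if (d < 0) d = -d;
            S[y][x] = (d > TH) ? 1 : 0;
        }
    }
}

/* Stage C: vertical projection and re-binarization, Eq. (2):
 * H(x) = 1 if the column sum of S(x,y) exceeds T_H, else 0. */
void vertical_projection(const uint8_t S[R_V][S_H], uint8_t H[S_H])
{
    for (int x = 0; x < S_H; x++) {
        int sum = 0;
        for (int y = 0; y < R_V; y++)
            sum += S[y][x];
        H[x] = (sum > T_H) ? 1 : 0;
    }
}
```

Both loops are short, branch-poor and statically schedulable, which is exactly the profile that Table 1 below rates as suitable for the embedded processor.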

The TSV Transform (D) and Binarization (E)

The TSV transform, first proposed in [8,9], estimates pixel velocities from binary image sequences. Here, a one-dimensional binary image sequence $H_n(x)$ is converted into an $S_H \times V_V$ TSV image $V_n(x,v)$, where $S_H$ is the horizontal size of the original image and the vertical size $V_V$ is determined by the resolution and range of the velocities:

$$V_n(x,v) = e^{-\lambda} V_{n-1}(x - v, v) + (1 - e^{-\lambda}) H_n(x). \quad (3)$$

For the TSV transform, we compute two multiplications and one addition for each TSV point. Thus, we have $S_H \times V_V \times 3$ computations per frame; the transfer rate is $S_H \times V_V \times 8$ [bits/frame].

[Figure 2. Computing the TSV transform: the TSV image from the previous frame is shifted by v along x (a shear that turns the rectangle into a parallelogram) and weighted by e^{-λ}, while the one-dimensional binary image, stretched vertically to the size of the TSV image, is weighted by 1 - e^{-λ} and added.]

To group pixels with similar velocities, we binarize the TSV image with a fixed threshold $Th_v$:

$$\tilde{V}_n(x,v) = \begin{cases} 1 & \text{if } V_n(x,v) \geq Th_v \\ 0 & \text{otherwise.} \end{cases} \quad (4)$$

For binarization, we compute one comparison at each TSV point. This translates into $S_H \times V_V$ computations per frame and results in a data transfer rate of $S_H \times V_V$ [bits/frame].

Labeling (F), Feature Extraction (G), Blob Association (H) and Interaction Classification (I)

Once $V_n(x,v)$ is binarized, labeling over it yields human blobs (stage F). Features of the human blobs (stage G) are used to associate the blobs across frames (stage H). Finally, the system determines the interaction from the human trajectories (stage I). The computation time for each of these processes depends on the content of the images, so we assumed reasonable numbers for each: for labeling, two comparisons at each point; for feature extraction, 20 labeled blobs with 300 computations per blob.
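A corresponding C sketch of stages D and E follows, under the same caveats: the bin-to-velocity mapping (v = b - 20, covering -20 to +19 pixels/frame), the decay weight e^{-λ} and the threshold Th_v are assumptions for illustration. Floating point is used for readability; a fixed-point formulation would be the natural choice on the 16-bit CPU.

```c
#include <stdint.h>

#define S_H   320   /* horizontal image size                        */
#define V_V    40   /* number of velocity bins                      */
#define V_OFF  20   /* bin-to-velocity offset: v = b - 20 (assumed) */

static const float E_LAMBDA = 0.8f;  /* e^{-lambda} (assumed value)  */
static const float TH_V     = 0.5f;  /* TSV threshold Th_v (assumed) */

/* Stage D, Eq. (3): V_n(x,v) = e^{-l} V_{n-1}(x-v,v) + (1-e^{-l}) H_n(x).
 * Each velocity bin shifts the previous TSV row by its own v, which is
 * the shear ("parallelogram") shown in Figure 2. */
void tsv_update(float V_prev[V_V][S_H], float V_cur[V_V][S_H],
                const uint8_t H[S_H])
{
    for (int b = 0; b < V_V; b++) {
        int v = b - V_OFF;                 /* velocity of this bin  */
        for (int x = 0; x < S_H; x++) {
            int xs = x - v;                /* shifted source column */
            float prev = (xs >= 0 && xs < S_H) ? V_prev[b][xs] : 0.0f;
            V_cur[b][x] = E_LAMBDA * prev + (1.0f - E_LAMBDA) * (float)H[x];
        }
    }
}

/* Stage E, Eq. (4): binarize the TSV image with the fixed threshold. */
void tsv_binarize(const float V[V_V][S_H], uint8_t Vb[V_V][S_H])
{
    for (int b = 0; b < V_V; b++)
        for (int x = 0; x < S_H; x++)
            Vb[b][x] = (V[b][x] >= TH_V) ? 1 : 0;
}
```

Per TSV point this is two multiplications and one addition plus one comparison, matching the complexity estimates above.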

Evaluating Processes by Data Transfer Rate and Computational Complexity

Figure 3 plots the data transfer rate (left axis) and the computational complexity (right axis) for each of the processes in our system. Table 1 summarizes the suitability of each stage for implementation on the embedded system.

[Figure 3. Data transfer rate (kb/s, left axis) and computational complexity (ktimes/s, right axis) for process stages B-I and data transfers 1-9; transfer rates range up to 3072 kb/s and computational complexity up to 1230 ktimes/s.]

Table 1. Suitability of each process for implementation on the embedded system

Step              Code size  Iterative computation  Number of branches  Suitability
A Crop            Small      Simple                 Small               Suitable
B Subtraction     Small      Simple                 Small               Suitable
C V. Projection   Small      Simple                 Small               Suitable
D TSV             Small      Simple                 Small               Suitable
E Binarization    Small      Simple                 Small               Suitable
F Labeling        Small      Medium                 Medium              Fair
G Feature Ext.    Medium     Medium                 Small               Fair
H Association     Large      Complex                Large               Unsuitable
I Classification  Large      Complex                Large               Unsuitable

[Figure 4. Division of tasks: stages A-E are assigned to the embedded system, stages F-I to the PC-based system.]

For the actual computation, we assumed that the system digitizes a 30 fps monochrome video signal into a 320x240 image sequence, and we used $R_V = 30$ pixels and $V_V = 40$ pixels. Based on these considerations, we assigned stages A through E to the embedded system, while stages F through I were handled by the PC-based system. The overall surveillance system is feedforward, and we partitioned the blocks so as to maximize the computation done on the embedded system and minimize the data transfer from the embedded system to the PC.
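As a consistency check (recomputed here from the estimates above, with $S_H = 320$, $R_V = 30$, $V_V = 40$ at the 30 fps base rate), representative values plotted in Figure 3 can be reproduced:

$$\begin{aligned}
\text{stage D computation:}\quad & S_H \times V_V \times 3 \times 30 = 320 \times 40 \times 3 \times 30 = 1152\ \text{ktimes/s},\\
\text{transfer 4 (TSV image):}\quad & S_H \times V_V \times 8 \times 30 = 320 \times 40 \times 8 \times 30 = 3072\ \text{kb/s},\\
\text{transfer 3 (projection):}\quad & S_H \times 30 = 320 \times 30 = 9.6\ \text{kb/s}.
\end{aligned}$$

The drop from 3072 kb/s at transfer 4 to $S_H \times V_V \times 30 = 384$ kb/s after binarization (transfer 5) helps explain why stage E is worth keeping on the embedded side.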

3. Dataflow Modeling and Ptolemy Simulation

Having determined in the previous section the tasks to be assigned to the embedded system, in this section we discuss the design and simulation of the embedded system using Ptolemy [10]. Figure 5 shows a block diagram of the embedded system in Ptolemy, which follows the same stages as the block diagram in Figure 1, and Figure 6 gives the simulation result.

[Figure 5. System design in Ptolemy: Crop (A) -> Background Subtraction (B) -> Vertical Projection (C) -> TSV Transform (D).]

[Figure 6. Results of the Ptolemy simulation at frames 20, 40, 60, 80 and 100.]

In Figure 6, the input image sequence at the 20th, 40th, 60th, 80th and 100th frames (top row) and the resulting TSV image at each corresponding frame (bottom row) are shown. The horizontal axes of the original images and the TSV images correspond; the vertical axes of the TSV images are velocity axes. The intensity represents the measure of existence at that position and velocity. Thus, a bright blob in the TSV image represents a human blob consisting of pixels with similar velocities and locations. Indeed, the bright blobs are located at the same horizontal coordinates as the persons.
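The Ptolemy model itself is a dataflow graph; as a rough C analogue of its per-frame firing schedule, the pipeline composes the stage sketches given in Section 2 (their prototypes are repeated here). This illustrates the feedforward, fixed-rate structure only and is not the Ptolemy model.

```c
#include <stdint.h>

#define S_H 320
#define S_V 240
#define R_V  30
#define V_V  40

/* Stage functions as sketched in Section 2. */
void subtract_binarize(const uint8_t I[S_V][S_H], const uint8_t B[S_V][S_H],
                       uint8_t S[R_V][S_H]);                      /* A+B */
void vertical_projection(const uint8_t S[R_V][S_H], uint8_t H[S_H]); /* C */
void tsv_update(float V_prev[V_V][S_H], float V_cur[V_V][S_H],
                const uint8_t H[S_H]);                            /* D   */
void tsv_binarize(const float V[V_V][S_H], uint8_t Vb[V_V][S_H]); /* E   */

/* One firing of the whole graph: stages A-E on a single frame. */
void process_frame(const uint8_t I[S_V][S_H], const uint8_t B[S_V][S_H],
                   float V_prev[V_V][S_H], float V_cur[V_V][S_H],
                   uint8_t Vb[V_V][S_H])
{
    static uint8_t S[R_V][S_H];   /* output of stages A+B */
    static uint8_t H[S_H];        /* output of stage C    */

    subtract_binarize(I, B, S);   /* A, B: crop + subtraction  */
    vertical_projection(S, H);    /* C: vertical projection    */
    tsv_update(V_prev, V_cur, H); /* D: TSV transform, Eq. (3) */
    tsv_binarize(V_cur, Vb);      /* E: binarization, Eq. (4)  */
    /* Vb is the data the embedded system sends to the PC (transfer 5). */
}
```

Because the graph is feedforward with fixed per-frame rates, a static schedule like this suffices.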

We validated the Ptolemy simulation results by comparing them with our previous PC implementation; the two sets of results were identical.

4. Hardware Design

With the specifications listed in Table 2, we designed hardware to perform the tasks that we simulated in Ptolemy. Figure 7 is a diagram of the embedded system and Figure 8 is a top-view photograph of the hardware.

Table 2. Hardware specification

CPU             Hitachi H8/3048F microprocessor, 16-bit CISC, 16 MHz
Memory          Mitsubishi multi-port DRAM M5M442256 (1 Mbit x 4)
Implementation  Hitachi C Compiler/Assembler

[Figure 7. Diagram of the hardware: the NTSC signal passes through an A/D converter into multi-port DRAMs read by the microprocessor, with a PLL clock generator and an analog front end; the microprocessor sends its output to the PC.]

[Figure 8. Photograph of the hardware.]

Table 3 shows the results of the hardware simulation. The system processes one frame within 89 ms, which means that it computes the data corresponding to one frame within three image frame cycles, achieving 10 frames/s. The code size, internal RAM usage and data transfer rate to the PC are all very small.

Table 3. Results of the hardware simulation

Computation time per frame  89 ms < 100 ms = 3 frame cycles
Code size                   5732 bytes
Internal RAM usage          854 bytes
Data rate to PC             128 kbps

5. Conclusion

We have designed an embedded system that is part of a hybrid system consisting of an embedded system and a PC-based system. We used a low-power CISC microprocessor and four multi-port DRAMs in this application. The execution time of the system is 89 ms per frame, that is, 10 frames/s. The performance bottleneck of the system is the DRAM access time, because a large amount of image data has to be transferred between the CPU and the DRAMs. We used several techniques to reduce the access time, including (i) writing efficient code that minimizes DRAM accesses and (ii) scheduling the DRAM access order so that the CPU can use the DRAM in block transfer mode.
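As an illustration of technique (ii), consider the vertical projection of stage C. Computed column by column, the inner loop strides $S_H$ bytes between accesses, landing almost every read in a different DRAM row; reordered to scan the image row by row with the column sums kept in internal RAM, the reads become sequential and can be served in fast page / block transfer mode. The sketch below contrasts the two orderings; DRAM row size and timing are device-specific and omitted, and this is our reading of the technique rather than the authors' code.

```c
#include <stdint.h>

#define S_H 320
#define R_V  30

/* DRAM-unfriendly: the inner loop strides S_H bytes per access, so
 * nearly every read opens a new DRAM row and forfeits page mode. */
void project_columnwise(const uint8_t S[R_V][S_H], uint16_t sum[S_H])
{
    for (int x = 0; x < S_H; x++) {
        sum[x] = 0;
        for (int y = 0; y < R_V; y++)
            sum[x] += S[y][x];
    }
}

/* DRAM-friendly: the image is read in address order, one row at a
 * time, while the column sums accumulate in fast internal RAM. */
void project_rowwise(const uint8_t S[R_V][S_H], uint16_t sum[S_H])
{
    for (int x = 0; x < S_H; x++)
        sum[x] = 0;
    for (int y = 0; y < R_V; y++)
        for (int x = 0; x < S_H; x++)
            sum[x] += S[y][x];
}
```

Both functions compute the same projection; only the memory access order differs.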

We have discussed a surveillance system using a 16-bit general-purpose processor. However, a DSP is generally considered most appropriate for a system dealing with such large amounts of data; simulation and system design using a DSP might therefore serve this purpose better. We leave that as a future research project.

References

[1] J. K. Aggarwal and Q. Cai, "Human Motion Analysis: A Review," Computer Vision and Image Understanding, vol. 73, no. 3, pp. 428-440, March 1999.
[2] I. Haritaoglu, D. Harwood and L. Davis, "W4: Who? When? Where? What? A real time system for detecting and tracking people," Proc. Int. Conf. on Automatic Face and Gesture Recognition, Nara, Japan, pp. 222-227, April 1998.
[3] I. Haritaoglu and L. Davis, "Hydra: Multiple people detection and tracking using silhouettes," Proc. IEEE Workshop on Visual Surveillance, Fort Collins, Colorado, pp. 6-13, June 1999.
[4] N. Oliver, B. Rosario and A. Pentland, "A Bayesian computer vision system for modeling human interactions," Proc. Int. Conf. on Vision Systems, Gran Canaria, Spain, pp. 255-272, January 1999.
[5] A. Pentland, "Looking at people: sensing for ubiquitous and wearable computing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 107-119, January 2000.
[6] P. Mahonen, "Wireless video surveillance: system concepts," Proc. Int. Conf. on Image Analysis and Processing, pp. 1090-1095, 1999.
[7] Y. Shirai, J. Miura, Y. Mae, M. Shiohara, H. Egawa and S. Sasaki, "Moving object perception and tracking by use of DSP," Proc. Computer Architectures for Machine Perception, pp. 251-256, November 1993.
[8] K. Sato and J. K. Aggarwal, "Tracking and Recognizing Two-person Interaction in Outdoor Image Sequences," Proc. IEEE Workshop on Multi-Object Tracking, Vancouver, Canada, pp. 87-94, July 2001.
[9] K. Sato and J. K. Aggarwal, "Tracking objects using temporal spatio-velocity transform," Proc. IEEE Workshop on Performance Evaluation of Tracking and Surveillance, Kauai, Hawaii, December 2001.
[10] Ptolemy Project, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, http://ptolemy.eecs.berkeley.edu/.