Development of a high-resolution, high-speed vision system using CMOS image sensor technology enhanced by intelligent pixel selection technique

Kenji Tajima *a, Akihiko Numata a, Idaku Ishii b
a Photron Limited, 1-1-8 Fujimi, Chiyoda-ku, Tokyo, JAPAN 102-0071
b Dept. of Artificial Complex Systems Engineering, Graduate School of Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, JAPAN 739-8527

ABSTRACT

We have developed a prototype vision system that achieves both high resolution and a feedback rate of over 1000 Hz while maintaining conventional data transfer speeds, by using the Mm-Vision concept: a technique of intelligently selecting the pixels of interest to reduce the amount of output data. To verify the effectiveness of the system and its concept, several high-speed image processing experiments were conducted with the prototype vision system built around a typical personal computer (PC) and a standard software development environment (C or C++). In this paper, we also discuss dedicated imaging sensors based on the Mm-Vision concept to improve its performance and usability.

Keywords: high-speed imaging, vision system, target tracking, CMOS image sensor, random access

1. INTRODUCTION

In real-world vision systems, there are many applications where both high spatial resolution and high-speed, real-time processing are required simultaneously. Most conventional vision systems, however, are tethered to the standard television formats (e.g. NTSC: 30 Hz, PAL: 25 Hz), which lack the speed needed to record and analyze the dynamic changes exhibited by fast-moving targets. Some vision chips have been proposed that offer high-speed recording at over 1000 FPS (frames per second) combined with per-pixel parallel image processing on a single chip 1), but currently available technologies impose limits on the chip size and the resolution (number of pixels), which prevents such chips from realizing a vision system with high spatial resolution.

As an alternative way to realize high-speed real-time image processing, some vision systems have been developed that combine a high-speed digital video camera recording at over 1000 FPS with dedicated hardware devices that process the camera's image output in real time. The drawback is that high-speed cameras generally read out multiple divided areas of an image frame in parallel, which forces the dedicated hardware to adopt a matching parallel structure and consequently makes the whole system much larger than is desirable. In addition, certain limitations must be placed on the processing algorithm to keep this parallel structure from expanding further as the image frame is divided into more areas.

To overcome these problems (lowered spatial resolution, increased system size, reduced system versatility, and so on, all consequences of speeding up the system), a system concept called the Mm (Mega-pixel and millisecond) Vision system has been proposed, incorporating an electronic visual feedback configuration based on the intelligent pixel selection technique 2). This paper discusses the basic functions of the prototype camera system developed on this Mm-Vision concept.
As verification examples, experiments in high-speed target tracking using several visual feature extraction techniques are discussed to evaluate the system's high-speed performance and image processing functionality, and to derive specifications for an Mm-Vision system with greater versatility and expandability for future development.

* tajima@photron.co.jp; http://www.photron.com, http://www.photron.co.jp
2. MM-VISION CONCEPT

In the great majority of current vision systems, all images shot by the camera are transferred to a processing unit, where all of them are processed in one way or another. Because the time for data transfer from the camera to the processing unit, as well as for the actual processing of the images, increases rapidly with the camera's spatial resolution, it has become increasingly difficult to improve the performance of the entire system. For example, a system that processes 1000 one-megapixel (e.g. 1024 x 1024 pixels) 8-bit image frames per second requires a transfer and processing capacity of 1 GB/sec (gigabyte per second), which in turn requires a very large and complex hardware system.

On the other hand, in many vision systems only a tiny portion of each image frame shot by the camera is actually needed for the final image processing. In the Mm-Vision system, therefore, we have tried to realize a small-scale, low-cost system that avoids the above-mentioned communication and processing bottleneck by transferring, out of the entire image area of the imaging sensor, only the local domain image containing the necessary information, using a feedback configuration via a coordinate transformation circuit. For example, reading out only a 64 x 64 pixel window at 1000 FPS requires about 4 MB/sec instead of 1 GB/sec. The overall performance of an Mm-Vision system improves in inverse proportion to the size of the required local domain and the resulting amount of image processing, without ever sacrificing the resolution of the imaging sensor. The processing flow of the Mm-Vision concept is shown in Fig. 1.

Fig. 1. Basic processing flow with the Mm-Vision concept.

2.1. Imaging sensors for Mm-Vision

The Mm-Vision concept requires an imaging sensor capable of selectively reading out any local domain or discrete pixels within the entire imaging area (random access capability). Imaging sensors are generally divided into two types, one based on CCD and the other on CMOS technology. With CCD-based imaging sensors, the particular readout structure of serially contiguous pixels and transfer lines (Charge Coupled Device) makes a random access capability difficult to achieve. With CMOS technology, on the other hand, the two-dimensional readout circuit inside the sensor allows data to be read out by selectively addressing X and Y coordinates, so random access is comparatively easy to realize. With most commercially available CMOS imaging sensors, however, the readout circuit is fixed to one exclusive readout mode, and CMOS sensors that allow random access to any part of the imaging area at every frame readout are scarce.
Moreover, with CMOS imaging sensors the readout (addressing) method is much like that of DRAM (Dynamic Random Access Memory) devices, which inevitably introduces latency (wait time) when, for example, the readout line is changed. This is not a great problem when scanning the image data in one direction (sequential access) or reading out a rectangular domain, but when the image data is read out from arbitrarily selected pixel locations (random access), latency is generated at the readout of each pixel; accumulated, it becomes considerable and greatly decreases the readout speed. This means that, even if the number of pixels to read out is the same for every frame, the readout time varies greatly with the shape of the target local domain and the order in which the pixels are accessed. Because the Mm-Vision concept improves the total system processing speed by reducing the number of pixels read out of each frame, this latency-induced performance deterioration has a significant effect on the overall performance of the system. An ideal imaging sensor for the Mm-Vision concept should therefore allow random access without any latency, with a readout time proportional only to the number of pixels to be read out. To realize an Mm-Vision system, then, we have two options: to develop a new imaging sensor meeting all the requirements of the Mm-Vision concept, or to combine a commercially available imaging sensor with a high-speed recording capability of over 1000 FPS with external memory (SRAM-based) devices.

2.2. Image processing with Mm-Vision

The Mm-Vision concept places no requirement on the choice of image processing methods. Because the Mm-Vision concept reduces the data amount, image processing algorithms can be simplified and the recording speed (frame rate) improved, so varied manipulations of the image data can be performed either by hardware processing on simple circuits or by software processing on a standard, commercially available personal computer (PC). Also, time-consuming processes such as image expansion, rotation, reduction and geometrical transformation can be carried out while the image data is being read out from the imaging sensor, using random access and affine transformation techniques, without any additional processing time (Fig. 2). In addition, when hardware image processing is required, a dedicated hardware device can be developed rather easily because the data rate is low.

Fig. 2. Examples of affine transformation on the chip.
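As an illustration, the following C++ sketch shows how such a readout-time affine transformation can be driven: for each pixel of the desired output window, the corresponding sensor coordinate is computed and accessed directly, so the transform costs no extra pass over the image. This is a minimal sketch under our own assumptions, not the circuit used in the prototype; the random pixel access is simulated here by indexing a full-frame buffer.

// Minimal sketch of affine readout address generation (an illustration,
// not the prototype's implementation). Rotation, scaling and translation
// are applied while the pixels are read out: each output coordinate is
// mapped to a sensor coordinate and fetched by random access.
#include <cmath>
#include <cstdint>
#include <vector>

struct Frame {
    int w, h;
    std::vector<uint8_t> data;
    uint8_t at(int x, int y) const {             // stands in for a random-access
        if (x < 0 || y < 0 || x >= w || y >= h)  // pixel read from the sensor
            return 0;
        return data[static_cast<size_t>(y) * w + x];
    }
};

// Read a w x h window through the affine map src = R(angle)*scale*dst + center.
std::vector<uint8_t> affine_readout(const Frame& sensor, int w, int h,
                                    double angle, double scale,
                                    double cx, double cy) {
    std::vector<uint8_t> out(static_cast<size_t>(w) * h);
    const double c = std::cos(angle) * scale;
    const double s = std::sin(angle) * scale;
    for (int v = 0; v < h; ++v)
        for (int u = 0; u < w; ++u) {
            // Map the output coordinate (centered on the window) to a
            // sensor coordinate, nearest-neighbor.
            const double du = u - w / 2.0, dv = v - h / 2.0;
            const int x = static_cast<int>(std::lround(c * du - s * dv + cx));
            const int y = static_cast<int>(std::lround(s * du + c * dv + cy));
            out[static_cast<size_t>(v) * w + u] = sensor.at(x, y);
        }
    return out;
}

On an ideal Mm-Vision sensor, each sensor.at() call would correspond to one constant-time random access, so the transformed window is obtained in exactly as many cycles as it has pixels.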
3. MM-VISION PROTOTYPE DESIGN

To verify the Mm-Vision concept, we built an experimental system using a general-purpose CMOS imaging sensor. The system consists of a camera head, a dedicated PCI board, a device driver, a software development kit (SDK) library and a computer. Fig. 3(a) presents an outline view of the system. The camera head and the PCI board are connected by one single cable 7 meters long; image data transfer from the camera head to the computer, control of the camera head and its power supply are all carried over this cable. Settings of the camera head, such as the exposure time and the position and range of the local domain for readout, can be made under software control from the computer using the SDK. When image processing is performed in software, the overall execution speed of the entire system depends on the computer's processing capability.

3.1. Mm-Vision prototype camera head

For this experimental development we selected, from all commercially available sensors, the CMOS imaging sensor that promised the highest possibility of realizing the Mm-Vision concept. Although the sensor has certain limitations because it was not custom-made for this experiment, it allows the location of the local domain for readout to be changed arbitrarily, frame by frame, and supports high-speed output of the digital image data read out from the sensor. The data output uses a high-speed serial communication format based on LVDS (Low Voltage Differential Signaling), chosen for its high-speed operation and the physical flexibility of the transfer cable. The camera head contains an FPGA (Field Programmable Gate Array) that controls the imaging sensor. The image data output from the sensor is also fed into the FPGA, which makes hardware image processing on the image data possible; the prototype camera head implements a 3x3 filter with programmable kernel parameters and an LUT (Look Up Table). Table 1 shows the basic specifications.

Table 1. Mm-Vision prototype camera head specifications
  Imaging device: CMOS image sensor (2/3 inch)
  Maximum resolution: 1280 x 1024 (H x V)
  Pixel size: 6.7 um x 6.7 um
  Electronic shutter: variable (0.1 msec - 26.2 msec)
  Frame rate at full-resolution readout: 14 FPS
  Lens mount: C mount
  AD conversion resolution: 10 bit
  Data clock: 40 MHz
  Sensitivity: 1.8 V / lux.sec
  Sensitive waveband: 400 nm - 1000 nm (without IR cut filter)
  Windowing capability: position changeable at any timing (each frame at least); 2-pixel steps in the X direction, 1-line steps in the Y direction
  Internal hardware processing: 10-bit to 8-bit LUT; 3x3 programmable filter; binarization with programmable threshold
  Others: output data bus 24 bit (maximum); connector and cable MDR 26-pin, 7 m long; power +12 V DC (supplied from the computer via the cable); camera head size 110 mm x 50 mm x 50 mm

3.2. Dedicated PCI board, its device driver and SDK

The image data output from the prototype camera head is captured by the dedicated PCI board and transferred to the main memory of the computer via the PCI bus, where it can be processed with standard software programs. Control of the image data transfer, the camera head and the pertinent settings can be programmed using the SDK.
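As an illustration of this programming model, the following C++ sketch shows how a per-frame window update might look through such an SDK. Every name here (MmHandle, MmOpen, MmSetExposure, MmSetWindow, MmGrab, MmClose) is a placeholder we invented for illustration, not the actual API of the prototype SDK; the stub bodies merely stand in for the real driver calls.

// Hypothetical SDK usage sketch; all names are placeholders, not the
// prototype's real API. The stubs stand in for driver/PCI-board calls.
#include <cstdint>

struct MmHandle { int id; };                       // opaque device handle (stub)
static MmHandle g_dev{0};

MmHandle* MmOpen() { return &g_dev; }              // would open the PCI device
void MmClose(MmHandle*) {}                         // would release it
void MmSetExposure(MmHandle*, double /*msec*/) {}  // electronic shutter setting
void MmSetWindow(MmHandle*, int, int, int, int) {} // readout window (x, y, w, h)
void MmGrab(MmHandle*, uint8_t*) {}                // would DMA one window to host

int main() {
    MmHandle* cam = MmOpen();
    MmSetExposure(cam, 0.5);                       // 0.5 msec shutter
    uint8_t frame[64 * 64];
    int x = 512, y = 480;                          // initial window position
    for (int i = 0; i < 1000; ++i) {
        MmSetWindow(cam, x, y, 64, 64);            // position may change per frame
        MmGrab(cam, frame);                        // capture into main memory
        // ... process 'frame' in software, then update x and y ...
    }
    MmClose(cam);
    return 0;
}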
3.3. Mm-Vision prototype basic performance

The prototype camera head can also be used as a conventional camera; as an example, a picture of a test target taken at full resolution is shown in Fig. 3(b). The imaging sensor used in the prototype camera head has a single-chip color filter array (CFA) built onto it and can readily be used for color imaging. The process that derives RGB color data from the single-chip color sensor (color interpolation) can be implemented either in software or in the FPGA in the camera head.

Fig. 3. (a) Mm-Vision prototype system and (b) sample image taken by the system.

4. EVALUATIONS WITH HIGH-SPEED TARGET TRACKING

Three high-speed target tracking tests were performed with the Mm-Vision prototype system and software-based image processing. The image processing routines were programmed in C++ using the SDK. The computer used for the tests was configured as follows:

  CPU: AMD Athlon XP 2800+
  Memory: 512 MByte
  OS: Windows 2000 Professional
  PCI bus: 32 bit, 33 MHz

4.1. Window tracking using center of gravity

We verified the high-speed performance of the prototype camera head by tracking a small white ball as a target in binarized image data. If the frame rate of the system is sufficiently high relative to the target's movement, the deviation of the target position between frames is minute. In that case, by searching only the local domain image immediately surrounding the center of gravity of the target in the previous frame, the center of gravity of the target in the subsequent frame can be measured. Using this characteristic, we implemented a software algorithm that tracks the target at high speed by moving, frame by frame, the center of a fixed-size search window so that it follows the moving target's center of gravity, as shown in Fig. 4.
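A minimal C++ sketch of this centroid-based window update is shown below, assuming the captured window has already been binarized (for example by the camera head's programmable-threshold binarization). It illustrates the technique, not the authors' exact code.

// Minimal sketch of the centroid-based window update: the center of
// gravity of the white pixels inside the current search window becomes
// the window center for the next frame.
#include <cstdint>

struct Point { int x, y; };

// 'win' is a w x h binary image whose top-left corner lies at (ox, oy)
// in sensor coordinates; returns the next window center, or the previous
// center if the target was lost (no white pixels found).
Point next_window_center(const uint8_t* win, int w, int h,
                         int ox, int oy, Point prev) {
    long sum_x = 0, sum_y = 0, count = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (win[y * w + x] != 0) {             // target (white) pixel
                sum_x += x;
                sum_y += y;
                ++count;
            }
    if (count == 0)
        return prev;                               // keep last position if lost
    return { ox + static_cast<int>(sum_x / count),
             oy + static_cast<int>(sum_y / count) };
}

The search window for the next frame is then positioned so that its center coincides with the returned point, keeping the slowly deviating target inside the window at every frame.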
Table 2 shows the tracking rates measured while varying the size of the search window and the exposure time, in the case of one target and one search window.

Table 2. Tracking rates with one window (Hz)
  Window size   Exposure time (msec)
                0.1     0.3     0.5     0.7     1.0
  32 x 32       3730    2180    1500    1160    860
  64 x 64       1730    1510    1150    940     750
  96 x 96       830     760     730     710     620

Fig. 4. Window tracking using the center of gravity at high frame rate.

The result of an actual tracking run is shown in the following. The search window size is 64 x 64 pixels and the target is a white ball of about 20 mm diameter attached to the tip of a pointer. Fig. 5 shows the experimental environment and the temporal trace of the target while the pointer was circled about twice a second. The target was tracked at the high operating speed and the mega-pixel spatial resolution of the system.

Fig. 5. Target tracking experiment and its results with the prototype system.

4.2. Target tracking using template matching

To verify the image processing capability of the system on grayscale images, we performed another target tracking test using the template matching technique. Matching was carried out frame by frame by computing the SAD (Sum of Absolute Differences) and finding the position where the SAD value is minimized. Here, too, we exploited the characteristic of a high-speed vision system that the deviation of the target position between frames is minute, speeding up the matching process by limiting the SAD comparisons to local domains only. Fig. 6 shows an example of this test. The picture in the upper left corner is the captured image of 64 x 64 pixels, with the windowed portion of 32 x 32 pixels being the template image used for matching; the entire image presents the trace of the tracking. With a small template image of 32 x 32 pixels, matching at a speed of around 1000 FPS was verified on grayscale images of the complexity used in this test.
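The following C++ sketch illustrates the SAD matching step, restricted to a small captured window as described above. It is our own minimal illustration, not the prototype's actual implementation.

// Minimal sketch of SAD-based matching over a small search area. At
// ~1000 FPS the target moves only slightly between frames, so the
// 32 x 32 template is compared only against positions inside the
// 64 x 64 captured window.
#include <climits>
#include <cstdint>
#include <cstdlib>

struct Match { int x, y; long sad; };

// 'img' is iw x ih, 'tpl' is tw x th; both 8-bit grayscale.
Match match_sad(const uint8_t* img, int iw, int ih,
                const uint8_t* tpl, int tw, int th) {
    Match best{0, 0, LONG_MAX};
    for (int y = 0; y + th <= ih; ++y)
        for (int x = 0; x + tw <= iw; ++x) {
            long sad = 0;
            for (int v = 0; v < th && sad < best.sad; ++v)  // early exit
                for (int u = 0; u < tw; ++u)
                    sad += std::abs(int(img[(y + v) * iw + x + u]) -
                                    int(tpl[v * tw + u]));
            if (sad < best.sad)
                best = {x, y, sad};                // new minimum SAD position
        }
    return best;
}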
Fig. 6. Target tracking using template matching with the prototype system.

4.3. Color tracking

As described above, the Mm-Vision prototype system can work with color images. With the capability to distinguish colors, the system can effectively identify a target in situations where identification is difficult in grayscale images. We verified the color processing capability of the system by performing high-speed target tracking using a color feature detection technique. To identify the target, the following quantities are derived from the RGB values:

  max = max(R, G, B)                  (1)
  min = min(R, G, B)                  (2)
  A = (max - min) / 255 x 100         (3)

We calculated the A value defined by the above equations together with the hue value, and compared them with those of the target color to identify it. Fig. 7 shows an example of this test. The picture in the upper left corner is the captured image of 64 x 64 pixels, and the color (red) shown to its right is the color of the tracked target.

Fig. 7. Color target tracking with the prototype system.

This test verified that color target tracking is possible at a frame rate of several hundred frames per second even in situations where tracking the target in grayscale images is extremely difficult.
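A minimal C++ sketch of this color test is shown below. Equation (3) gives the A value directly; the paper does not spell out its hue definition, so the standard HSV hexagonal hue is assumed here, and the thresholds are placeholders.

// Minimal sketch of the color test in Eqs. (1)-(3): compute the A value
// (a saturation measure) and a hue value from R, G, B, then accept a
// pixel whose values fall near those of the target color. The hue
// formula and thresholds below are assumptions, not taken from the paper.
#include <algorithm>
#include <cmath>

struct HueA { double hue; double a; };             // hue in degrees, A in percent

HueA hue_and_a(int r, int g, int b) {
    const int mx = std::max({r, g, b});            // Eq. (1)
    const int mn = std::min({r, g, b});            // Eq. (2)
    const double a = (mx - mn) / 255.0 * 100.0;    // Eq. (3)
    double hue = 0.0;
    if (mx != mn) {                                // hue undefined for gray
        if (mx == r)      hue = 60.0 * (g - b) / (mx - mn);
        else if (mx == g) hue = 60.0 * (b - r) / (mx - mn) + 120.0;
        else              hue = 60.0 * (r - g) / (mx - mn) + 240.0;
        if (hue < 0) hue += 360.0;
    }
    return {hue, a};
}

// Example test against a red target: hue near 0/360 degrees, A large enough.
bool is_target(int r, int g, int b) {
    const HueA p = hue_and_a(r, g, b);
    const double d = std::min(std::abs(p.hue), 360.0 - p.hue);
    return p.a > 30.0 && d < 20.0;                 // thresholds are assumptions
}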
5. CONSIDERATION OF SPECIAL IMAGING SENSORS

The evaluation of the prototype system confirmed the effectiveness of the Mm-Vision concept. However, because the prototype camera system uses a commercially available CMOS sensor, the Mm-Vision concept has not been realized in full: latency makes pixel-by-pixel random access virtually impossible, and the performance improvement slowed considerably because the relative share of latency grew as the rectangular local domain to be read out was made smaller. In this section we discuss the specifications required of an imaging sensor that meets the requirements of the Mm-Vision concept.

5.1. Basic requirements

5.1.1. Random access without any latency

Random access without any latency means that local domain images or discrete pixels are read out in a constant period of time, on a sufficiently fast clock, in whatever order the pixels may be accessed. As a rough guideline for the clock speed (the faster the better), it should at least sustain a frame rate of 30 FPS when reading out 1 Mpixel, i.e. a pixel clock on the order of 30 MHz.

5.1.2. Nondestructive readout

Nondestructive readout is as important as the random access capability, because when, for example, an affine transformation circuit is used to calculate the readout coordinates for a random access operation, the same set of pixels may be read out (accessed) repeatedly. Nondestructive readout is possible with some generally available CMOS imaging sensors, but most of them hold the image information as electric charges stored in capacitors, which discharge over time and lower the signal level. Several solutions to this problem are conceivable. One method we are considering is to convert the image information into digital form and store it in SRAM-based digital memory, after which nondestructive readout is possible without any degradation of the signal level. Fig. 8 shows the conceptual timing of the random access and nondestructive readout operations described in this and the preceding section.

5.1.3. Analog-to-digital converter and peripheral circuits on a chip

One advantage of CMOS imaging sensors is that general CMOS circuitry can be integrated on the same chip, which greatly reduces system size and complexity. For Mm-Vision imaging sensors, therefore, we aim to make the design of peripheral hardware devices easier by implementing the AD converter that digitizes the recorded image data and the drive circuit on the chip. The specifications of the drive circuit must be developed very carefully, because versatility and future expandability may be adversely affected if its features are too strictly constrained.

Fig. 8. Conceptual timing of full random access and its readout.
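The following C++ fragment is a purely behavioral model (our own assumption, not a circuit design) of the readout behavior required in 5.1.1 and 5.1.2: each exposure is digitized once into an SRAM-based frame store, after which any pixel can be read any number of times, in any order, in constant time and without degrading the stored value.

// Behavioral model of an SRAM-backed sensor readout: digitize once per
// frame, then serve latency-free, nondestructive random reads.
#include <cstdint>
#include <vector>

class FrameStore {
public:
    FrameStore(int w, int h) : w_(w), mem_(static_cast<size_t>(w) * h) {}

    // One AD conversion per pixel per frame: latch the whole exposure
    // (10-bit values, conceptually; assumed to hold w*h samples).
    void digitize(const std::vector<uint16_t>& analog_levels) {
        mem_ = analog_levels;
    }

    // Nondestructive random access: constant time, any order, repeatable.
    uint16_t read(int x, int y) const {
        return mem_[static_cast<size_t>(y) * w_ + x];
    }

private:
    int w_;
    std::vector<uint16_t> mem_;
};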
5.2. Proposed architectures for the Mm-Vision image sensor

With the above factors in mind, we propose the following two architectures. Both are built on the same basic structure: the image data is digitized on the sensor chip and stored in SRAM-based digital memory, and all subsequent access from outside is made to that digital memory.

5.2.1. Pixel-parallel architecture

This proposed imaging sensor is based on an architecture that places an AD converter and digital memory for one pixel inside each sensor pixel. Each AD converter then has to perform only one conversion per frame of image data, which makes it easy to speed up the AD conversion and hence the total operation (Fig. 9).

Fig. 9. Example pixel structure of the pixel-parallel architecture.

However, because more circuits and wiring are needed in sensor areas otherwise set aside for the photodetector (e.g. photodiode), either the light-input area (fill factor) is reduced considerably, lowering sensitivity, or the pixel size must be enlarged, which inevitably expands the total chip size. Lowered sensitivity demands much stronger illumination for high-speed framing with short exposure times and causes the user much inconvenience in actual operation. Larger chips bring disadvantages such as increased cost, decreased manufacturing yield and less flexibility in the choice of taking lens. With future advances in miniaturized manufacturing processes, however, these adverse effects are expected to diminish sooner or later. As a derivative idea, an architecture that shares one ADC and digital memory unit among multiple pixels (2 x 2 pixels, for example) instead of a single pixel may also be considered.

5.2.2. Column-parallel architecture

This architecture maintains the same sensitivity and sensitive-area size as conventional sensors, without relying on the latest miniaturized manufacturing processes, by providing an AD converter for each line (or each set of multiple lines) and separating the digital memory from the photodetective area. Its drawback is that each AD converter must convert the image data of a whole line (or set of lines) for every frame, making it less suitable for high-speed operation than the pixel-parallel architecture. An example structure of this architecture is shown in Fig. 10.

Fig. 10. Example structure of the column-parallel architecture.
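A rough back-of-the-envelope comparison (our own arithmetic, not taken from the paper) makes this speed difference concrete: for a 1024 x 1024 sensor read out at 1000 FPS, a pixel-parallel ADC performs 1000 conversions per second, while a column-parallel ADC serving one full column must perform about a million.

// Per-ADC conversion load under the two architectures, for an assumed
// 1024 x 1024 sensor at 1000 FPS (illustrative arithmetic only).
#include <cstdio>

int main() {
    const long h = 1024, fps = 1000;
    const long pixel_parallel = fps;       // one ADC per pixel:
                                           // one conversion per frame
    const long column_parallel = h * fps;  // one ADC per column:
                                           // a whole column per frame
    std::printf("pixel-parallel : %ld conversions/sec per ADC\n", pixel_parallel);
    std::printf("column-parallel: %ld conversions/sec per ADC\n", column_parallel);
    return 0;
}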
6. CONCLUSION

This paper presented a prototype system designed and developed on the Mm-Vision concept, which satisfies both high spatial resolution and high-speed operation simultaneously. Through tests of window tracking using center-of-gravity information, template matching on grayscale images and high-speed target tracking using color feature extraction, we demonstrated that high spatial resolution and high-speed processing can be realized on a relatively small-scale and versatile system. We also discussed the specifications of a special imaging sensor that fully satisfies the requirements of the Mm-Vision concept. Based on the findings of these tests, we plan to develop the specifications of a dedicated imaging sensor and to implement a camera head and, ultimately, a complete vision system. We will also work on application systems based on the Mm-Vision concept.

ACKNOWLEDGEMENTS

The authors would like to acknowledge, with gratitude, the assistance and cooperation of Hiroshi Nadatani and Kiyohito Ochi of Tokyo University of Agriculture and Technology, Kazuki Kato and Shogo Kurozumi of Hiroshima University, and Hiroshi Nagai of Photron Limited.

REFERENCES

1. Masatoshi Ishikawa and Takashi Komuro, "Digital Vision Chips and High-Speed Vision Systems (Invited)", Digest of Technical Papers, Symposium on VLSI Circuits, Kyoto, pp. 1-4, 2001.
2. Idaku Ishii, "High Speed Mega-pixel Vision with Intelligent Scanning", Proc. of the 20th Annual Conference of the Robotics Society of Japan, 3A15, 2002 (in Japanese).