Text Localization & Segmentation in Images, Web Pages and Videos Media Mining I

Multimedia Computing, Universität Augsburg
Rainer.Lienhart@informatik.uni-augsburg.de
www.multimedia-computing.{de,org}

Goal: Text Extraction
  - Locate text of any size at any position in images, web pages and videos
  - Segment and recognize text
  - Encode extracted text as rigid foreground object in MPEG4 (with Yen-Kuang Chen)
  [Plot: PSNR_Y (axis range 27.5 to 31.5) versus bit rate (160 to 195 KBits/sec), comparing Single VOP and Multiple VOP]

Related Work
  1. Y. Zhong, K. Karu and A. K. Jain. Locating Text in Complex Color Images. Pattern Recognition, Vol. 28, No. 10, pp. 1523-1535, October 1995.
  2. Rainer Lienhart and Frank Stuber. Automatic Text Recognition in Digital Videos. In Image and Video Processing IV 1996, Proc. SPIE 2666-20, pp. 180-188, Jan. 1996; also TR-95-036, Dec. 1995.
  3. B.-L. Yeo, B. Liu. Visual Content Highlighting via Automatic Extraction of Embedded Captions on MPEG Compressed Video. IS&T / SPIE Digital Video Compression: Algorithms and Technologies, Feb. 1996.
  4. Rainer Lienhart. Automatic Text Recognition for Video Indexing. Proc. ACM Multimedia 96, Boston, MA, Nov. 1996, pp. 11-20.
  5. S. Sato and T. Kanade. NAME-IT: Association of Face and Name in Video. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 17-19 June 1997.
  6. T. Sato, T. Kanade, E. Hughes, M. Smith. Video OCR for Digital News Archives. IEEE Workshop on Content-Based Access of Image and Video Databases (CAIVD'98), Bombay, India, January 1998.
  7. Anil K. Jain and Bin Yu. Automatic Text Location in Images and Video Frames. Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998.
  8. H. Li, O. Kia and D. Doermann. Text Enhancement in Digital Videos. In Proceedings of SPIE99, Document Recognition and Retrieval, 1999.
  9. Rainer Lienhart and Wolfgang Effelsberg. Automatic Text Segmentation and Text Recognition for Video Indexing. ACM/Springer Multimedia Systems Magazine, Vol. 8, pp. 69-81, Jan. 2000.
  10. Huiping Li, David Doermann, Omid Kia. Automatic Text Detection and Tracking in Digital Video. IEEE Transactions on Image Processing, Vol. 9, No. 1, Jan. 2000.
  11. Daniel Lopresti and Jiangying Zhou. Locating and Recognizing Text in WWW Images. Information Retrieval 2 (Kluwer Academic Publishers), pp. 177-206, 2000.
  12. Axel Wernicke and Rainer Lienhart. On the Segmentation of Text in Videos. IEEE Int. Conference on Multimedia and Expo (ICME2000), Vol. 3, pp. 1511-1514, July 2000.
  More information at www.videoanalysis.org
  Rainer Lienhart, Axel Wernicke. Localizing and Segmenting Text in Images and Videos. IEEE Transactions on Circuits and Systems for Video Technology, pp. 256-268, April 2002.
  [Timeline figure: entries 1 to 12 placed along a time axis marked 1996, 1998 and 2000]

Design Decisions
  What kind of text occurrences?
    - Scene text
    - Overlay text
  With what style attributes?
    - Font size
    - Font type
    - Text color
  In what kind of media data?
    - Image-based
    - Video-based
  [Diagram labels at this point: any, both]
  What should be achieved?
    - Localization
    - Segmentation
    - Recognition
    - Integrated recognition
  How will the results be used?
    - Indexing
    - Object-based video encoding
  [Diagram label at this point: both]

Overview
  OCR result: Dec 25 1998

Text Localization (1/2)

Text Box Consolidation (2/2)
  - Derive initial text bounding boxes
  - Refine bounding boxes
  - Remove text boxes which
      - are too small or too large, or
      - have a bad width-to-height aspect ratio
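
The removal step in the list above can be sketched in a few lines of Python. This is only an illustration: the `Box` type, the function names and all threshold values (`min_h`, `max_h`, `min_ratio`, `max_ratio`) are assumptions made for the example and are not taken from the slides.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned text box candidate (hypothetical type for this sketch)."""
    x: int
    y: int
    w: int
    h: int


def keep_box(box: Box,
             min_h: int = 8, max_h: int = 200,
             min_ratio: float = 1.0, max_ratio: float = 40.0) -> bool:
    """Return True if the box passes the size and aspect-ratio checks.

    All thresholds are illustrative assumptions, not values from the slides.
    """
    if box.w <= 0 or box.h <= 0:
        return False
    # Reject boxes that are too small or too large.
    if not (min_h <= box.h <= max_h):
        return False
    # Reject boxes with an implausible width-to-height ratio.
    ratio = box.w / box.h
    return min_ratio <= ratio <= max_ratio


def filter_boxes(boxes: List[Box]) -> List[Box]:
    """Keep only the candidates that pass keep_box with default thresholds."""
    return [b for b in boxes if keep_box(b)]
```

With these assumed defaults, a 120x20 box is kept, while a 5x3 box (too small) and a 10x100 box (taller than wide) are removed.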

Monitoring + Tracking
  Result: Text Objects

Background Removal
  - Temporal alignment of text lines
  - 3 bitmaps at t, t+45, t+90
  - Low variance image
  - Border floodfilling
  - Binarized image
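
Two of the steps named above, the low variance image and border floodfilling, can be illustrated with a simplified Python sketch. It is not the authors' implementation: it assumes the bitmaps are already temporally aligned grey-value grids of equal size, it works on a boolean mask rather than on colours, and the function names and the `max_var` threshold are hypothetical.

```python
from collections import deque


def low_variance_mask(frames, max_var=25.0):
    """Mark pixels whose grey value is stable across the aligned frames.

    frames: list of equally sized 2D lists of grey values (assumed aligned).
    max_var: assumed variance threshold, chosen only for illustration.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    mask = []
    for r in range(rows):
        row = []
        for c in range(cols):
            vals = [f[r][c] for f in frames]
            mean = sum(vals) / n
            var = sum((v - mean) ** 2 for v in vals) / n
            row.append(var <= max_var)
        mask.append(row)
    return mask


def clear_border_regions(mask):
    """Flood-fill from the border and clear every set region touching it.

    Uses 4-connectivity. The idea is that text should not touch the
    border of its box, so border-connected regions count as background.
    """
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    queue = deque()
    # Seed the fill with all set pixels on the border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and out[r][c]:
                out[r][c] = False
                queue.append((r, c))
    # Spread to set neighbours, clearing them as they are visited.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and out[nr][nc]:
                out[nr][nc] = False
                queue.append((nr, nc))
    return out
```

Applied to three aligned bitmaps (for example those at t, t+45 and t+90), `low_variance_mask` keeps pixels that stay constant over time, and `clear_border_regions` then discards stable regions that reach the box border.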

Experimental Results
  Text localization
    - Image-based: 69.5% (boxes) / 85% (pixels)
    - Video-based: 94.9% (boxes)
  Text segmentation
    - 79.6% correctly segmented
    - 7.6% damaged, but still recognizable
  Text recognition
    - 70% (over all steps)

Demo