An introduction to digital whole slide imaging and whole slide image analysis
Toby C. Cornish, MD, PhD
Assistant Professor of Pathology
July 19th, 2013

Disclosures
- Consultant: DigiPath, Inc.
- Shareholder: DigiPath, Inc.
- Consultant: Accelpath, Inc.
- Shareholder: Accelpath, Inc.

Objectives
At the conclusion of this activity, the learner should:
- Understand the basic principles of whole slide imaging
- Recognize differences in approaches to WSI
- Understand different approaches to whole slide image analysis
- Recognize pitfalls in acquiring, storing and analyzing whole slide images

What is whole slide imaging?
What is it?
- Creation of a single, high magnification digital image of an entire microscopic slide
What use is it?
- Education
- Research
- Publication
- Telepathology / Teleconsultation
- Primary diagnosis

What is whole slide imaging? (continued)
In a nutshell, how does it work?
- An automated microscope scans an entire slide at one or more resolutions
- Digitally knits the consecutive small images together into a single large image
- This virtual slide is stored locally or remotely

What is whole slide imaging? (continued)
- Local viewing of the virtual slide is possible using specialized viewing software
- Remote viewing of the virtual slide requires a server-client arrangement
- Image analysis can be performed on this virtual slide
Cornish Digital Path / Image Analysis

WSI: A layer model
Layers, from top to bottom (the diagram groups them under the labels "Software" and "Hardware"):
- Image analysis
- Digital slide viewer
- Digital slide repository/server
- File format
- Compression
- Whole slide image
- Acquisition software
- Detection
- Optics
- Slide scanning
- Slide handling

Hardware
- Basically, all WSI hardware are robotic/automated microscopes with specialized acquisition software
- Some instruments are more specialized and purpose-built than others
- James Bacus, PhD: BLISS System (c. 1994), the first dedicated virtual microscopy system
- Vendors: Olympus, Nikon, Zeiss, Hamamatsu, Omnyx (GE/UPMC), Philips, BioImagene (Ventana), MikroScan, Digipath, 3DHISTECH, TissueGnostics, Applied Imaging (Ariol), HistoRx, CRi/Caliper, Leica, Aperio, Motic, others
Capabilities vary widely:
- Brightfield only vs. fluorescence only vs. both
- Capacities from 1 slide to 400 or more slides
- Various slide handling mechanisms

Capabilities vary widely (continued):
- 1 x 3 inch slides only vs. 1 x 3 and 2 x 3 inch (whole mount) slides
- Single plane scan vs. z-stacking (vs. limited z-stacking)
- Software features

- Modalities
- Slide handling
- Slide scanning method
- Focusing strategy

Brightfield WSI
- The most common type of scanner
- By far the least complicated & least expensive
- Images are acquired and stored as RGB images
- Use: H&E, IHC, CISH, other visible stains
- Presents unique analysis challenges

Fluorescence WSI
- Same principle as a standard fluorescence microscope: excitation source, excitation filters, emission filters
- Fewer instruments on the market; costly
- Most are brightfield/fluorescence scanners
- Analysis can be more straightforward than brightfield
- Uses: immunofluorescence, FISH
Multispectral WSI
- Combination of a liquid crystal tunable filter and spectral unmixing software
- Very few (1?) products on the market
- Extracts more information from a slide by capturing a spectrum
- Can unmix overlapping chromogenic or fluorescent stains
- Can effectively remove autofluorescence
- Slow

Slide handling
- Single slide/stage
- Standalone autoloaders ("hotels")
- Slide trays
- Slide magazines

Slide scanning
Two approaches to scanning a slide:
- Tiling (Bacus patents)
- Line scanning (Aperio patents)
In both cases, the resulting images (tiles or strips) are fitted together into a single large image, i.e. the whole slide image

Tiling
- The slide is scanned as a series of rectangular tiles
- The highest physical magnification desired is used (e.g. 40x, 20x)
- The tiles are assembled into a WSI either concurrently or after the scanning is finished
Line scanning
- The slide is scanned as a series of long, narrow strips
- The highest physical magnification desired is used (e.g. 40x, 20x)
- The strips are assembled into a WSI either concurrently or after the scanning is finished

Focus strategy: focus every field
- Usually associated with tiling
- After moving the stage, each field is autofocused and the tile is imaged
- Time consuming, ?more accurate

Focus strategy: focus every nth field
- Usually associated with tiling
- Autofocusing occurs every nth field
- Faster, simpler
- Placement of focus points lacks context

Focus strategy: focus map
- Used with line scanning or tiling
- Focus points are distributed over the tissue, forming a surface
- Focus is calculated for intervening tissue
- Faster, ?less accurate

WSI file size
- Once all the image data is acquired, the WSI must be stored on disk
- These files are relatively large
Typical resolution for images with apparent magnification of:
- 40x: 0.25 micron/pixel (mpp)
- 20x: 0.50 micron/pixel (mpp)
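As a rough check on how large these files get, the uncompressed size can be estimated from the scanned area and the resolution in microns per pixel. The following is a minimal sketch, assuming 24-bit RGB pixels and the 15 mm x 15 mm reference area quoted in the file size slides; the function name is hypothetical and "GB" is taken as 1024^3 bytes.

```python
def uncompressed_size_bytes(width_mm, height_mm, mpp, bits_per_pixel=24):
    """Estimate the uncompressed size of a scanned area, in bytes."""
    width_px = round(width_mm * 1000 / mpp)    # mm -> micron -> pixels
    height_px = round(height_mm * 1000 / mpp)
    return width_px * height_px * bits_per_pixel // 8


GB = 1024 ** 3  # assumption: binary gigabytes

size_20x = uncompressed_size_bytes(15, 15, 0.50)  # 30,000 x 30,000 px
size_40x = uncompressed_size_bytes(15, 15, 0.25)  # 60,000 x 60,000 px

print(f"20x: {size_20x / GB:.2f} GB uncompressed")
print(f"40x: {size_40x / GB:.2f} GB uncompressed")
```

Halving the microns per pixel doubles the pixel count along each axis, so the 40x scan is four times the size of the 20x scan of the same area.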
WSI file size
The unofficial standard for quoting scan speed and size (the "standard WSI reference tissue size"): 15 mm x 15 mm
- 20x: 0.5 mpp
  15 mm x 15 mm = 15,000 x 15,000 micron = 30,000 x 30,000 px = 900,000,000 px = 900 Mpx
  900 Mpx x 24 bit/px = 2.51 GB uncompressed
- 40x: 0.25 mpp = 10.1 GB uncompressed

WSI file size
- Due to the large size, WSI data are usually compressed using lossy algorithms
- Typically JPEG2000 or JPEG
- Ratios of 1:20 to 1:10 or so
- 20x: 2.51 GB uncompressed ~ 128 to 256 MB compressed
- 40x: 10.1 GB uncompressed ~ 502 MB to 1 GB compressed

WSI file format
- No standard file format for WSI
- Four non-proprietary/open formats exist:
  - DICOM standard (Supplement 145) approved in 2010; no reference implementation yet
  - JP2, the file container for JPEG2000 codestreams
  - TIFF/BigTIFF with JPEG compression
  - Deep Zoom images
- Numerous proprietary formats
- Results in some degree of vendor lock-in

WSI file format
- Most WSI files contain an image pyramid
- Zoom levels are precalculated and stored in the file
- The image at each zoom level is broken into small tiles (e.g. 256 x 256 px)
- The small tiles are stored in the image file (or directory)
- Each additional channel or z-plane is also present in the pyramid
[Figure credit: DICOM Supplement 145]
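To make the image pyramid concrete, the sketch below lists the zoom levels and tile counts for a 30,000 x 30,000 px image cut into 256 x 256 px tiles. It assumes each level is half the width and height of the level below it and that the pyramid stops once a level fits in a single tile; both are illustrative assumptions, since actual formats differ in the downsample factors they store. The function name is hypothetical.

```python
import math


def pyramid_levels(width_px, height_px, tile=256, downsample=2):
    """Return (level, width, height, tiles_x, tiles_y) for each pyramid level,
    from full resolution down to the first level that fits in one tile."""
    levels = []
    level, w, h = 0, width_px, height_px
    while True:
        tiles_x = math.ceil(w / tile)
        tiles_y = math.ceil(h / tile)
        levels.append((level, w, h, tiles_x, tiles_y))
        if tiles_x == 1 and tiles_y == 1:
            break
        # Assumed: each level is downsampled by a fixed factor
        w = max(1, math.ceil(w / downsample))
        h = max(1, math.ceil(h / downsample))
        level += 1
    return levels


levels = pyramid_levels(30_000, 30_000)
for level, w, h, tx, ty in levels:
    print(f"level {level}: {w} x {h} px, {tx} x {ty} tiles")
```

Because every level is stored as small tiles, a viewer can fetch only the handful of tiles covering the current field of view instead of reading the whole image.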
Digital slide repository
- WSIs are usually stored in a dedicated server or digital slide repository (DSR)
- Stores image data and metadata
- Serves image data to a viewer client for remote access
- May support thick and/or thin viewer clients
- May support image analysis

WSI retrieval
- A viewer client connects to the DSR
- The client requests the area of the image being displayed (green in the figure) at a particular zoom level
- The DSR then sends only the tiles needed to fulfill the request
[Figure credit: DICOM Supplement 145]

Analysis
- Huge topic, many methods
- Approach depends on the modalities, detection methods and what you are trying to measure
- Also depends on what software is available
- First question: how to process the WSI?

Sampling
- Sample the WSI and analyze the extracted images
- This systematic random sampling example uses a grid-based pattern
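Grid-based systematic random sampling can be sketched as follows: one random offset is chosen for the whole grid, and fields are then extracted at a fixed spacing. This is a minimal illustration under assumed values (image size, field size and spacing are all hypothetical), not the scheme used by any particular package.

```python
import random


def systematic_random_sample(width_px, height_px, field_px, step_px, seed=None):
    """Return top-left corners of square fields on a regular grid whose
    origin is shifted by a single random offset."""
    rng = random.Random(seed)
    # One random start point; every other field follows at a fixed interval
    x0 = rng.randrange(step_px)
    y0 = rng.randrange(step_px)
    fields = []
    for y in range(y0, height_px - field_px + 1, step_px):
        for x in range(x0, width_px - field_px + 1, step_px):
            fields.append((x, y))
    return fields


# Hypothetical example: 1,000 px fields every 5,000 px on a 30,000 x 30,000 px WSI
fields = systematic_random_sample(30_000, 30_000, field_px=1_000,
                                  step_px=5_000, seed=1)
print(len(fields), "fields, first at", fields[0])
```

The random offset keeps the sample unbiased, while the fixed spacing spreads the fields evenly over the slide.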
Large area analysis
- Break image into a large number of manageable images, process each image separately
- In reality, each separate image should also include a small surrounding area to avoid cutting bordering objects in half

Analysis
Types of measurement include:
- Area (absolute or relative)
- Count (or density)
- Amount (or intensity)
Detection methods include:
- H&E or other histological stains (Feulgen, trichrome)
- Antibodies (IF or IHC)
- Labeled nucleic acid probes

Analysis pipeline
- Image processing / analysis components are combined into a pipeline
- Components include:
  - Noise reduction & grayscale filtering
  - Segmentation
  - Color deconvolution
  - Binary operations
  - Boolean operations
  - Object detection
  - Many more
- We will discuss a few common examples of pipeline components

Segmentation
- Segmentation is the partitioning of an image into two or more regions
- Uses include:
  - Separate ROI for analysis from the background
  - Separate positively-stained cells from negative cells
  - Separate a TMA into individual cores
  - Separate tissue regions into distinct ROIs for analysis (e.g. tumor vs. normal)
Analysis: segmentation
Many methods of segmentation, including:
- Manual
- Intensity-based
- Color-based
- Classifier-based
- Morphology-based
- Texture-based

Manual segmentation
- Segmentation of a WSI into individual TMA cores
- Segmenting the tunica media in an artery using a freehand polygon tool for drawing regions of interest (ROIs)

Intensity/grayscale segmentation
- Grayscale segmentation can be used to segment tissue from non-tissue [figure labels: Tissue, Background]
- Commonly used with multichannel data
[Figure panels: DAPI (nuclei); anti-insulin (beta cells); anti-glucagon (alpha cells); merged]
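Intensity-based segmentation of a single channel amounts to choosing a threshold and splitting pixels into two classes. The slides do not name a thresholding method; the sketch below uses Otsu's method purely as one common, automatic choice, applied to a small made-up list of 8-bit intensities standing in for a fluorescence channel (dim background, bright nuclei).

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold t that maximizes between-class variance;
    pixels <= t form one class and pixels > t the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low, sum_low = 0, 0  # pixel count and intensity sum at or below t
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue
        if w_high == 0:
            break
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between_var = w_low * w_high * (mean_low - mean_high) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


# Made-up channel data: dim background pixels and bright nuclear pixels
channel = [20] * 50 + [30] * 50 + [200] * 40 + [220] * 60
t = otsu_threshold(channel)
mask = [p > t for p in channel]  # True = foreground (e.g. nuclei)
print("threshold:", t, "foreground pixels:", sum(mask))
```

For brightfield tissue detection the comparison would be reversed, since tissue is darker than the white background.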
Intensity/grayscale segmentation
[Figure: DAPI channel segmented into Nuclei and Background]
So how does one segment brightfield (RGB) images?

Color-based segmentation
- Only useful for color images, i.e. brightfield images
- Two common approaches:
  - Color space segmentation
  - Color deconvolution + intensity-based segmentation

A brief aside on color

RGB color space
- Analogous to human reception of color, CRTs, color digital cameras, etc.
- 1 pixel = 3 values (Red, Green, Blue)
- If 8 bits (256 levels) each, 16.7 M colors in combination
- 24 bit color ("true color")
[Figure labels: 255,0,0; 0,255,0; 0,0,255; 255,255,255]

Many, many other color spaces
- HSV
- CIE Lab
- YUV
- YCbCr
- CMYK
- And others
HSB/HSV color space
- More intuitive for human interaction
- 1 pixel = 3 values: Hue, Saturation, Brightness (Value)
- Hue defines a color; saturation the amount of color present
- The remaining axis defines the brightness

Color space segmentation
- Example ranges: H = 0 to 35 degrees, S = 0 to 100%, B = 0 to 100%
[Figure: HSB colorspace segmentation in TMAJ]

Color deconvolution
- Method for separating out the color components of a given pixel
- Application to IHC: Ruifrok and Johnston. Quantification of histochemical staining by color deconvolution. Anal Quant Cytol Histol. 2001 Aug;23(4):291-9
- Works because color IHC dyes are subtractive in nature

Color
- Additive color model: fluorescence/light
- Subtractive color model: IHC/dyes
[Figure: color deconvolution of ink; a color image of mixed inks separated into Ink 1 and Ink 2; from 4N6site.com]
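Color space segmentation can be sketched as a per-pixel range test in HSB. The example below uses Python's standard `colorsys` module and the hue range quoted above (0 to 35 degrees); the minimum saturation of 20% and the three test colors are assumptions added for illustration, since white and gray pixels have an undefined hue that is reported as 0 and would otherwise fall inside the hue range.

```python
import colorsys


def in_hsb_range(rgb, h_range=(0, 35), s_range=(0, 100), b_range=(0, 100)):
    """Return True if an 8-bit RGB pixel falls inside the given
    hue (degrees), saturation (%) and brightness (%) ranges."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # each component in 0..1
    h_deg, s_pct, b_pct = h * 360, s * 100, v * 100
    return (h_range[0] <= h_deg <= h_range[1]
            and s_range[0] <= s_pct <= s_range[1]
            and b_range[0] <= b_pct <= b_range[1])


# Made-up pixel colors for illustration
brownish = (150, 90, 50)     # hue about 24 degrees
bluish = (70, 80, 160)       # hue about 233 degrees
white = (255, 255, 255)      # saturation 0, hue reported as 0

for name, px in [("brownish", brownish), ("bluish", bluish), ("white", white)]:
    print(name, in_hsb_range(px, s_range=(20, 100)))
```

Applying the same test to every pixel yields a binary mask of the selected color range.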
DAB-H example in RGB space
An RGB image of DAB and hematoxylin-stained tissue, plotted in the RGB colorspace with histogram.

Color vectors for deconvolution
Pure white pixels are present in the upper right corner of the RGB colorspace (white sphere). As dye intensity increases, the pixel color moves along the brown and blue vectors to the pure dye colors (brown and blue spheres). By calculating the vectors, the individual dye contribution at each pixel can be determined.

Color deconvolution results
[Figure panels: RGB; blue / hematoxylin; brown / DAB]

Color deconvolution example
[Figure panels: Hematoxylin; DAB]

Classifier-based segmentation
- WSI of foreskin stained for cell of interest
- What is the cell density in each compartment?
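The idea of recovering each dye's contribution from the color vectors can be sketched in a few lines. This is a simplified illustration in the spirit of Ruifrok and Johnston, not a reproduction of their implementation: pixel values are converted to optical density, and the optical density is expressed as a combination of stain vectors. The hematoxylin and DAB vectors below are commonly quoted values assumed for illustration (real vectors should be measured for a given stain and scanner), and the third vector is taken as the cross product of the other two, which is also an assumption.

```python
import math

# Assumed optical-density vectors (R, G, B) for illustration only
HEMATOXYLIN = (0.650, 0.704, 0.286)
DAB = (0.268, 0.570, 0.776)


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def invert_3x3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return (((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
            ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
            ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det))


# Rows of the stain matrix: hematoxylin, DAB, residual
H_VEC, DAB_VEC = normalize(HEMATOXYLIN), normalize(DAB)
STAINS = (H_VEC, DAB_VEC, normalize(cross(H_VEC, DAB_VEC)))
M_INV = invert_3x3(STAINS)


def optical_density(rgb, white=255):
    # Clamp at 1 so that black pixels do not produce log(0)
    return tuple(-math.log10(max(c, 1) / white) for c in rgb)


def deconvolve(rgb):
    """Return (hematoxylin, DAB, residual) amounts for one RGB pixel."""
    od = optical_density(rgb)
    return tuple(sum(od[k] * M_INV[k][j] for k in range(3)) for j in range(3))


print(deconvolve((255, 255, 255)))  # white background: no dye
print(deconvolve((120, 90, 60)))    # a made-up brownish pixel
```

Thresholding the hematoxylin or DAB amount then reduces the problem to the intensity-based segmentation discussed earlier.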
Classifier-based segmentation
- Four images were used as a training set
- Images were segmented into three classes: Dermis; Epidermis; and Other (including stratum corneum layer)
- Total training time: about 5 minutes
(Inform, courtesy Caliper Life Sciences)

Image Analysis Examples

Area quantification: lung Ca
- BAC in mouse lung, H&E stain
- Manual segmentation using ImageScope [figure layers: Tissue, Tumor]
Area quantification: lung Ca
- Manual segmentation using ImageScope [figure layers: Tissue, Tumor; Tumor Layer]
- Although reasonably straightforward, this method can be very time consuming
- Stereology techniques, which are based on unbiased sampling, might be more efficient
- Stereology tools for WSI are available for Spectrum (third party) and from other sources

Stereology examples
[Figure]

Nuclear staining: Ki67
- Follicular lymphoma
- Calculate number and % positive nuclei (labeling index)
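The labeling index is simply the percentage of counted nuclei that are positive. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def labeling_index(positive_nuclei, negative_nuclei):
    """Percent of counted nuclei that are positive."""
    total = positive_nuclei + negative_nuclei
    if total == 0:
        raise ValueError("no nuclei counted")
    return 100.0 * positive_nuclei / total


# Made-up counts for illustration
index = labeling_index(positive_nuclei=150, negative_nuclei=350)
print(f"labeling index: {index:.1f}%")
```

The same calculation applies whether the counts come from a manual count or from a nuclear detection algorithm.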
Nuclear staining: Ki67
- Aperio Nuclear quantitation algorithm
[Figures: Color deconvolution - hematoxylin; Color deconvolution - Ki67/DAB; Nuclear quantitation: overlay; Nuclear quantitation: nuclear parameters]
[Figure panels: Ki67-DAB/H; Manual; Aperio Overlay]
Comparison with manual counts
[Figure panels: Ki67-DAB/H; Manual; Aperio Overlay]

Pitfalls
When analyzing WSI, there are several issues that must be kept in mind, including:
- Depth of focus
- Image storage
- Image analysis tools
- Interpreting analysis results

Depth of focus
- Most slide scanners do not support z-stacking or support limited z-stacking
- If z-stacking is used, file size can skyrocket
- Tools for analysis of 3D datasets are limited
- Use: FISH, cytology
Image storage
- While TMAs are high density, experiments using whole tissue sections can require massive storage
- Use the lowest magnification practical
  - 100 slides at 4x ~ 820 MB
  - 100 slides at 10x ~ 5 GB
  - 100 slides at 20x ~ 20 GB
  - 100 slides at 40x ~ 80 GB

Image analysis tools
WSIs require specialized analysis packages because:
- The file formats are highly specialized
- WSIs are very large
- Interaction (e.g. annotation or previewing) requires specialized viewers
TMAs that have been segmented into core images are an exception
Most hardware companies will sell analysis software, usually at the premium price typical of image analysis software
Vendor lock-in!

Interpreting analysis results
- "Quantitative analysis" of WSIs is a misleading term
- Usually what is returned is uncalibrated, relative units
- Most stains are not stoichiometric
- Most detection processes are non-linear
- Must keep in mind three levels of measurement:
  - Ordinal scale: measurement provides a rank order only; distance between units not uniform. Non-parametric; median
  - Interval scale: distance between units ("interval") is uniform. Parametric allowed; mean, median
  - Ratio scale: distance between units is uniform and a value of zero is defined; ratios between measurements have meaning. Parametric allowed; mean, median

Acknowledgements
- JHU (current & former): Cory Brayton, Angelo De Marzo, Marc Halushka, Kristen Lecksell, James Morgan, Bora Gurel
- Caliper Life Sciences: James Mansfield