Image Analysis Using the Aperio ScanScope

Allen H. Olson, PhD
Algorithm Development Engineer
Aperio Technologies

INTRODUCTION

Why should I choose the Aperio ScanScope over competing systems for image analysis? This is the fundamental question that I am frequently asked. The answer is simple: studies that take weeks or months with competing systems can be accomplished in only a few days using a ScanScope and its suite of powerful viewing and analysis tools. With the ScanScope system, you scan and analyze entire slides. Not only will you work more efficiently, but your final results will be better, because you are not limited to analyzing only small representative portions of a specimen. Here are some important reasons for choosing Aperio for your image analysis solution.

Entire-Slide Scanning and Analysis: No more hunting around with a microscope, trying to find representative image fields to capture. This time-consuming and labor-intensive step is replaced with a fast, automated scan of the entire slide. You also never have to worry about whether you missed something. The entire digital slide is always available for immediate inspection, along with all analysis results.

One ScanScope Replaces a Number of Automated Microscopes: The entire slide is scanned first, so the scanner is not needed for analysis tasks. After scanning, digital slides are available on your network to any number of users. Users no longer have to compete and wait for expensive microscope systems to become available for analysis; instead, they use networked computers to view and analyze the digital slides.

Scanning First Saves Time: No more waiting around, during the analysis step, for time-consuming image capture. This saves valuable time for the doctor or researcher. In high-throughput situations, new slides are scanned while previously scanned slides are being viewed and analyzed.
Extremely Fast Viewing: Each digital slide is stored as a single file in a TIFF format that allows rapid access to any image location at any magnification. You can download our free ImageScope viewer and experience this firsthand at http://aperio.com/.

Entire Slides Analyzed in Minutes: Because viewing the digital slide is so fast, you can zoom to any magnification very quickly and designate regions for analysis in just a few seconds. Drawing tools are available for rectangular and irregularly shaped regions, as well as exclusion regions, or you can simply analyze the entire digital slide. After you designate the analysis regions and select the desired algorithm, a single click starts the analysis; all results appear on the screen upon completion.

Batch Mode Analysis: You don't have to wait for one analysis job to finish before moving on to the next digital slide. With the Aperio Client/Server software, a job queue is maintained and each job is processed in order. Multiple client machines can be configured to process jobs, so analysis throughput can be scaled as needed. A large queue of jobs can also be processed overnight and be ready for inspection the following morning.
Verifiable: Analysis results, including pseudo-color mark-up images, are stored with the image and can be reviewed for accuracy at any time. The mark-up image shows exactly how the algorithm performed and provides immediate confirmation that the algorithm results are correct. This information is available for review at any point in a study.

Repeatable: Analysis can be repeated for any digital slide and the results compared in just a few minutes. Different parameter settings or algorithms can be run for comparison. You can even add or exclude regions and repeat the entire analysis.

Calibration: No daily calibration procedure is required, because every slide is calibrated as part of the automatic scanning procedure. A clear area is identified and scanned for each slide and is used to normalize camera gain and illumination. As a result, digital RGB values are easily converted to optical density. Pixel size is also measured at the factory for each supplied objective lens and is recorded in every digital slide image.

Export of Result Data: Quantitative results for single slides, as well as for sets of slides, can be exported into a single data file for further interpretation with popular tools like Excel (Microsoft).

No File Size Limitations: No more file folders full of snapshots, each with its own set of analysis results that have to be collated and combined into a single result set. With the Aperio system, you get a single set of results for each slide, with a corresponding mark-up image. All results are readily viewed as an overlay on the input slide image.

Algorithms for IHC and Stain Intensity: The Aperio analysis package includes algorithms for Nuclear, Membrane, Micrometastasis, and Staining Intensity quantification, and the ability to run ImagePro Plus macros on entire digital slides. These are highly advanced algorithms based upon morphological image processing methods.
High Resolution: Standard 20X scanning (and even 40X) allows you to count and measure cellular structures, in addition to area and intensity of staining.

Open Analysis Architecture: A Software Development Kit (SDK) is also available for customers who want to run their own C-code algorithms in the Aperio software framework. The SDK comes with a sample algorithm and instructions for porting any algorithm into the Aperio framework. Only the core image processing logic needs to be supplied by the customer; the SDK supplies all of the higher-level logic for integration into the Aperio analysis architecture.

Third-Party Tools Supported: Popular third-party tools, such as ImagePro Plus (MediaCybernetics) and Matlab (The Mathworks), are also supported. Existing ImagePro macros can be applied to entire digital slides, and modules for viewing and extracting image data into the Matlab environment are available.

TMA Software: The Aperio TMALab software is a powerful tool for managing and analyzing Tissue Micro Array digital slides. It allows you to segment a scanned TMA slide into a number of grids and assign row/column indices to each spot. All of the Aperio analysis capability is built into TMALab and all slide data are stored in a common database. Individual spot images can be extracted into standard TIFF files, and analysis results from groups of slides can be exported into a single Excel-compatible file for further inspection and reporting.

Support After the Sale: The Aperio algorithm package includes support from Aperio's experienced technical staff, who will work with you to refine these algorithms to meet your particular analysis needs. In many cases, they can show you sample analysis results before you make your purchase decision.
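The Calibration point above notes that, once each slide has been normalized against a clear area, digital RGB values are easily converted to optical density. The following is a minimal sketch of that conversion, assuming the usual per-channel form OD = log10(I0 / I), where I0 is the value of the clear (white) reference. The function name and the default reference of 255 are illustrative assumptions and are not taken from the Aperio software.

```python
import math

def rgb_to_optical_density(rgb, white=(255, 255, 255)):
    """Convert one RGB pixel to per-channel optical density.

    Assumes OD = log10(I0 / I), where I0 is the clear-area (white)
    reference for that channel. Channel values are clamped to at
    least 1 so that a fully black channel does not divide by zero.
    """
    return tuple(
        math.log10(reference / max(channel, 1))
        for channel, reference in zip(rgb, white)
    )

# A pixel equal to the white reference has zero optical density;
# darker (more heavily stained) pixels give larger values.
print(rgb_to_optical_density((255, 255, 255)))
print(rgb_to_optical_density((128, 64, 32)))
```

Because the reference comes from a clear area of the same slide, the same pixel darkness maps to the same optical density regardless of day-to-day changes in lamp brightness, which is the practical benefit of per-slide calibration.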
ANALYSIS WORKFLOW

Once your slide has been scanned, you view the image and carry out the analysis steps using the Aperio ImageScope viewer. You can download ImageScope at http://aperio.com. This free product includes the Positive Pixel Count algorithm shown here. The workflow outlined in this section is for a single image being analyzed in real time on the user's computer (local mode processing). Batch mode processing is discussed later in this document. Analysis generally consists of the following steps:

1. Viewing the image. The ImageScope viewer is used at this step to inspect the image. You can rapidly pan and zoom to any desired magnification, at any point on the slide image. A reference thumbnail image shows your current location on the slide.
2. Designating regions for analysis. This step can be skipped if every area of the slide is to be analyzed. Typically, however, some regions of the slide are relevant to the study and others are not. At this step, you use the rectangle and pen tools to mark regions of the image to be included in or excluded from analysis.
3. Selecting the appropriate analysis algorithm. Although algorithm parameters can be adjusted, these adjustments are normally made separately and saved for use at this step. The user simply selects the appropriate algorithm name from a pull-down menu.
4. Analyze. This is done by clicking the Run button. A progress indicator shows percent completion, and results appear on the screen when it reaches 100%.
5. Review and confirm analysis results. Numerical results are difficult to confirm manually; if manual analysis were easy, we wouldn't need the algorithm in the first place. For this reason, all algorithms return a pseudo-color markup image which provides a visual representation of the results. For example, in a nuclear counting algorithm, all nuclei that were counted are shown with a colored line around their border.
In this way, the user can quickly determine whether any nuclei have gone undetected, or whether non-nuclear objects have been identified as nuclei.
6. Make modifications. If the prior step reveals significant errors in the analysis results, the algorithm input parameters or region selections can be modified and the analysis step repeated. This step typically consists only of including and excluding areas on the slide. If algorithm parameters need changing, experienced Aperio staff are ready to help you make the necessary adjustments.
7. Export Results. Whether you are conducting a study that contains a number of slides and want to look for correlations between specimens, or simply want a graphical representation of the numerical data, you can export the results to a common file format that can be imported into popular applications, such as Microsoft Excel.

Figure 1 (top) shows a screenshot from the ImageScope viewer, in which a digital slide has been prepared for analysis. A thumbnail image shows how the current field of view relates to the entire slide image and is used for rapid navigation to the desired area. Using the magnification slider, the image can be quickly zoomed to the desired resolution. Three regions have been designated for analysis using the drawing tools: a rectangular region, a free-hand region, and an exclusion region. The exclusion region provides a simple method for excluding portions of a region which would otherwise be included in the analysis. Each region is also listed in the annotation window. The algorithm window is used to select the desired algorithm; input parameters to the algorithm can be modified in this window, but typically are preset and imported for immediate use. Clicking the Run button on the algorithm window initiates the analysis and updates the screen as shown in Figure 1 (bottom).
The annotation window is updated with numerical measurements and a pseudo-color markup image is superimposed on the original image. The markup image provides a visual representation of the numerical results.
Figure 1. Screenshots from the ImageScope viewer, before (top) and after (bottom) analysis. Callouts in the figure: Exclusion Region, Thumbnail Image, Magnification Slider, Rectangle Region, Free-Hand Region, Algorithm Window, Run Button, Annotation Window, Mark-up Image, Results.
Although the screen resolution in Figure 1 is 5X magnification, the analysis was done at full 20X resolution for this image. This represents quite a large combined area of analysis. Double-clicking on any screen location in ImageScope immediately displays that area at full magnification (Figure 2). It is a simple matter to toggle between viewing the slide image and the markup image, by clicking the Show/Hide Layer button on the annotation window.

Figure 2. Image at 20X (left) with corresponding markup image (right).

The markup image (Figure 2, right) shows how the strong, medium, and weak-brown pixels have been segmented and colored red, orange, and yellow, respectively. Numerical results can be viewed directly in the annotation window in ImageScope, or exported to a flat file that can be imported into popular graphing and reporting programs, like Excel. Macros can also be written and saved in Excel to automate the graphing step (Figure 3).

Figure 3. Analysis results are exported from ImageScope and graphed in Excel, using automated macros.
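The region logic described in the workflow above (rectangular and free-hand inclusion regions, plus exclusion regions that carve out part of an included area) can be illustrated with a short sketch. It assumes each region is available as a list of (x, y) vertices and uses a standard ray-casting point-in-polygon test. The function names are hypothetical; this is not a description of how ImageScope implements its drawing tools.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the last vertex is
    joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray cast to the right of (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def is_analyzed(x, y, include_regions, exclude_regions):
    """A pixel is analyzed if it lies in any inclusion region and in no
    exclusion region. With no inclusion regions, the whole slide counts."""
    included = (not include_regions) or any(
        point_in_polygon(x, y, region) for region in include_regions)
    excluded = any(
        point_in_polygon(x, y, region) for region in exclude_regions)
    return included and not excluded

# One rectangular region with a smaller exclusion region inside it.
rectangle = [(0, 0), (10, 0), (10, 10), (0, 10)]
exclusion = [(4, 4), (6, 4), (6, 6), (4, 6)]
print(is_analyzed(2, 2, [rectangle], [exclusion]))  # inside the rectangle only
print(is_analyzed(5, 5, [rectangle], [exclusion]))  # inside the exclusion
```

Treating an empty inclusion list as "analyze everything" mirrors step 2 of the workflow, which can be skipped when the entire slide is to be analyzed.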
POSITIVE PIXEL COUNT ALGORITHM (PPC)

Purpose: To measure area and intensity of staining for two-color slides. Four categories of staining are measured: negative staining and strong, medium, and weak-positive staining. It is also useful for measuring percent positive by area and average intensity of positive staining.

Methodology: An HSI color model is used to divide the color space into two classes, positive and negative. Thresholds are used to further divide the positive color class into three intensity ranges. This produces the four categories of staining. Each stained pixel is placed into one of the four categories, and total pixel counts are reported, along with average intensity, for each category.

Default Settings: The default input parameters are for brown-positive and blue-negative slides.

The example shown in the previous section (Figure 2) used the PPC algorithm to measure positive brown staining, which is the default behavior of the algorithm. However, PPC can be used in a variety of situations that may not be immediately obvious. For example, it can just as easily be used to measure the percent of fatty vacuoles in liver tissue. Fatty cells are mostly clear in bright-field and show up as circular white regions in Figure 4 (left). The surrounding tissue is typically purple. To use PPC in this situation, the input parameters are configured to measure two levels of positive staining (weak and strong) and a threshold is set to divide the two groups of pixels. The range of positive hue values is set to include the entire color circle, so that every pixel is considered positive and there are no negative pixels.

Figure 4. Input image (left) and markup image (right) for measuring fatty vacuoles in liver.

For this application, the markup image is the two-color image shown in Figure 4 (right), where all of the vacuoles are colored yellow and the tissue is colored dark red. The PPC algorithm results include pixel counts for both of these categories.
The results have been exported to a standard text file and imported into Excel to produce the graph shown in Figure 5. For this section of tissue, fatty vacuoles make up 16% of the area. Comparing the markup image to the input image provides immediate confirmation that the algorithm performed properly in segmenting the fatty vacuoles.

Figure 5. Pie chart graphic from Excel.
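The PPC methodology described above (a hue range that separates positive from negative, then intensity thresholds that split positive pixels into weak, medium, and strong) can be sketched in a few lines. This is an illustration only: the hue is taken from Python's `colorsys.rgb_to_hsv` as a stand-in for the HSI hue, intensity is the mean of R, G, and B, and every threshold value and parameter name below is a made-up assumption rather than an Aperio default.

```python
import colorsys
from collections import Counter

def classify_pixel(r, g, b, hue_center=0.1, hue_width=0.5,
                   weak_max=220, medium_max=175, strong_max=100):
    """Return 'strong', 'medium', 'weak', 'negative', or None (unstained).

    Hue is on a 0..1 color circle; a pixel is positive when its hue is
    within hue_width / 2 of hue_center. Darker pixels (lower intensity)
    are treated as more strongly stained.
    """
    intensity = (r + g + b) / 3.0
    if intensity > weak_max:
        return None  # too bright to count as stained
    hue = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[0]
    distance = abs(hue - hue_center) % 1.0
    distance = min(distance, 1.0 - distance)  # wrap around the circle
    if distance > hue_width / 2.0:
        return "negative"
    if intensity <= strong_max:
        return "strong"
    if intensity <= medium_max:
        return "medium"
    return "weak"

def positive_pixel_count(pixels, **params):
    """Count stained pixels per category and compute percent positive by area."""
    counts = Counter(classify_pixel(r, g, b, **params) for r, g, b in pixels)
    counts.pop(None, None)  # drop unstained pixels
    stained = sum(counts.values())
    positive = stained - counts["negative"]
    percent = 100.0 * positive / stained if stained else 0.0
    return dict(counts), percent

# Brown-positive / blue-negative example with five hand-picked pixels.
pixels = [(150, 90, 40), (190, 140, 100), (220, 190, 160),
          (80, 100, 200), (250, 250, 250)]
print(positive_pixel_count(pixels))
```

The fatty-vacuole configuration corresponds to setting `hue_width=1.0` (the entire color circle, so no pixel is negative), raising `weak_max` to 255 so that clear vacuoles are counted, and giving `medium_max` and `strong_max` the same value so that only two levels remain.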
NUCLEAR ALGORITHM

Purpose: To count and measure nuclei in brown-positive and blue-negative IHC-stained slides, such as ER, PR, and Ki67. Average size and staining intensity are also measured. It is most often used to determine the percentage of nuclei that have stained positive.

Methodology: Positive (brown) and negative (blue) regions are segmented based upon color. Morphological operators (erosion, dilation, connected component labeling, and declustering) are used to further identify individual nuclei. Thresholds for object boundary detection can be set manually, or automatic thresholding can be performed based upon amplitude or edge statistics. Nuclei can be further discriminated from other objects based upon size and shape (roundness, compactness, elongation); this is particularly useful for automatically ignoring connective tissue (stroma) and other unwanted objects. The algorithm returns the number of positive and negative nuclei, as well as the average size and intensity of each type. The percent positive is calculated based upon both nuclear counts and area ratio. A markup image is generated in which the edges of positive nuclei are colored green and the edges of negative nuclei are colored blue.

Default Settings: The default input colors are brown-positive and blue-negative. The default threshold method is manual and is set at 200, on a scale of 0 to 255 (black to white, respectively).

Figure 6. Screenshot showing ER (left) and PR (right) analysis, using the Aperio nuclear algorithm.
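To make the thresholding and connected component labeling steps of the methodology more concrete, here is a deliberately simplified sketch. It assumes that pixels darker than the threshold (200 on the 0 to 255 scale) are foreground, groups 4-connected foreground pixels into objects, and discards objects below a minimum size. The erosion, dilation, declustering, and shape tests are omitted, and the function name and `min_pixels` parameter are hypothetical; this is not the Aperio implementation.

```python
from collections import deque

def label_objects(gray, threshold=200, min_pixels=3):
    """Return a list of objects, each a list of (row, col) pixels.

    `gray` is a 2-D list of values from 0 (black) to 255 (white).
    Pixels darker than `threshold` are foreground. 4-connected
    foreground pixels form one object, and objects with fewer than
    `min_pixels` pixels are discarded.
    """
    rows, cols = len(gray), len(gray[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or gray[r0][c0] >= threshold:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue, pixels = deque([(r0, c0)]), []
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                pixels.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and gray[nr][nc] < threshold):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(pixels) >= min_pixels:
                objects.append(pixels)
    return objects

# A 2x2 dark blob and one isolated dark pixel on a white background.
gray = [
    [255, 255, 255, 255, 255],
    [255,  50,  60, 255, 255],
    [255,  40,  55, 255,  90],
    [255, 255, 255, 255, 255],
]
for pixels in label_objects(gray):
    print(len(pixels), sorted(pixels))
```

The size filter plays the same role as the size and shape discrimination described above: small or oddly shaped objects are rejected before nuclei are counted.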
An advantage of using the Aperio software is that multiple images can be viewed simultaneously. This is particularly useful for analyzing IHC-stained slides, where more than one assay has been prepared. This feature makes it easy to designate the same regions for analysis on each digital slide. The images can even be synchronized so that they pan and zoom together. Figure 6 shows two slides, ER and PR, which have been analyzed with the Aperio nuclear algorithm. Four equivalent regions of analysis have been designated on each slide. A magnified (20X) view of these two digital slides is shown in Figure 7. With a few exceptions, the algorithm has done a very good job of separating clusters of nuclei. Notice that the boundaries of the negative nuclei are colored red and those of the positive nuclei are colored green. This makes it easy to verify at a glance that the algorithm has performed satisfactorily.

Figure 7. Magnified (20X) view of ER (left) and PR (right) digital slides. Note that this field of view shows only a small portion of the overall area analyzed.

The cumulative results (all four regions) for both ER and PR digital slides are shown in Figure 8, after having been imported into Excel for further inspection. The numbers of brown (positive) and blue (negative) nuclei are reported, as well as the percent positive by nuclear ratio (brown/total). Similarly, the pixel counts for the two groups of nuclei are reported, along with a percent positive by pixel ratio. Average intensities and sizes are also reported.

Figure 8. Numerical results for ER and PR nuclear analysis.
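The percent positive by nuclear ratio (brown/total) and by pixel ratio are the same calculation applied to different counts. A short sketch follows, with hypothetical function names and made-up numbers that are not the values in Figure 8. It assumes that cumulative results over several regions are obtained by summing the counts before taking the ratio.

```python
def percent_positive(positive, negative):
    """Percent positive = positive / (positive + negative) * 100."""
    total = positive + negative
    return 100.0 * positive / total if total else 0.0

def cumulative_percent_positive(regions):
    """Combine several regions by summing counts first, then taking the ratio.

    `regions` is a list of (positive, negative) pairs, which may be
    nuclear counts or pixel counts.
    """
    positive = sum(p for p, _ in regions)
    negative = sum(n for _, n in regions)
    return percent_positive(positive, negative)

# Made-up counts for two regions of one slide.
regions = [(10, 30), (20, 40)]
print(cumulative_percent_positive(regions))
```

Summing counts before dividing weights each region by the number of nuclei (or pixels) it contains. Averaging the per-region percentages instead would give a different figure whenever the regions differ in size.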
TMALAB

TMALab is a software application that has been specifically designed to make working with TMA slides more efficient. A Tissue Micro Array (TMA) is a single slide with an array of small circular tissue samples or cores, called spots. The logical grouping of the spots often has two levels. First, the spots are grouped into blocks or grids. Second, each grid is composed of a rectangular array in which a spot is referenced by row and column position. There are also reference spots. An example is shown in Figure 9.

Figure 9. TMA slide, showing grid structure and reference spots. Callouts in the figure: Grid of Spots, Row/Column Order, Reference Spots.

TMALab's key features include:
- Easily locate and identify individual cores within Tissue Micro Arrays.
- View TMA spots at high resolution with rapid panning and zooming.
- Organize spots into folders and identify them by grid/row/column coordinates.
- Associate user-defined data with each spot.
- Accurately and rapidly quantify TMAs using image analysis algorithms.
- Sort, filter, and group spots by data values.
- Export spot metadata for use in third-party applications like Excel.
- Export images for use in third-party applications.

The very same viewing and analysis software available in ImageScope is used in TMALab. This means that you will be using familiar tools to accomplish the same analysis tasks performed on whole slides, only now you have the added structure which allows you to automatically perform that analysis on
a per-spot basis. Results for entire sets of spots can be exported in table format to a single text file for further analysis and graphing.

Adding a TMA slide to TMALab

TMA slides are scanned with the ScanScope just like any other slide, and each scan results in a single digital slide image. Adding a slide to TMALab involves the simple process of browsing to the location of that file on your network and adding it to the TMALab database. Only the file location is added to the database. Spot locations are stored in the database as a set of coordinates within the image file. This eliminates the need to create file folders full of snapshot images for each TMA slide, and makes for efficient viewing of TMA spot images.

Locating the Spots

TMALab provides an easy-to-use graphical tool for identifying the structural layout of spots on a TMA slide. Figure 10 shows a group of spots that have been identified as Grid A. To accomplish this task, the user simply clicks the New Grid button and identifies the corner spots of the new grid, along with the number of columns and rows. Clicking the Center Spots button then automatically finds the locations of the actual spots. The alignment of the spots does not need to be perfect for the automatic procedure to work well, and manual adjustments are also possible. Reference spots (if present) are also added, and this process is repeated for each grid. All this is accomplished in just a few minutes.

Figure 10. Screenshot of TMALab's graphical tool for locating spots on a digital slide.
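As an illustration of how corner spots plus row and column counts can define a grid, the sketch below estimates a nominal center for every spot by bilinear interpolation between four corner positions. It assumes all four corners are supplied as (x, y) image coordinates. The function name is hypothetical, and the automatic refinement performed by the Center Spots button is not modeled.

```python
def grid_spot_centers(top_left, top_right, bottom_left, bottom_right,
                      rows, cols):
    """Estimate nominal spot centers for a rows x cols grid.

    Each corner is an (x, y) coordinate in the slide image. The result
    maps 1-based (row, column) indices to interpolated (x, y) centers.
    """
    centers = {}
    for r in range(rows):
        v = r / (rows - 1) if rows > 1 else 0.0   # 0 at top, 1 at bottom
        for c in range(cols):
            u = c / (cols - 1) if cols > 1 else 0.0  # 0 at left, 1 at right
            # Bilinear weights for the four corners.
            w_tl, w_tr = (1 - u) * (1 - v), u * (1 - v)
            w_bl, w_br = (1 - u) * v, u * v
            x = (w_tl * top_left[0] + w_tr * top_right[0]
                 + w_bl * bottom_left[0] + w_br * bottom_right[0])
            y = (w_tl * top_left[1] + w_tr * top_right[1]
                 + w_bl * bottom_left[1] + w_br * bottom_right[1])
            centers[(r + 1, c + 1)] = (x, y)
    return centers

# A 3-row by 5-column grid spanning a 100 x 50 pixel rectangle.
centers = grid_spot_centers((0, 0), (100, 0), (0, 50), (100, 50), rows=3, cols=5)
print(centers[(2, 3)])
```

Using four corners rather than two lets the nominal grid follow a slightly rotated or skewed array, which fits the observation above that spot alignment does not need to be perfect.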
Viewing the Spot Images

A file-folder structure is used to store the TMA slides. You can organize your slides using this structure in a manner appropriate to the type of work you are doing. When you first open the TMALab application, you see this folder structure and can navigate to the slide or spot that you wish to work with. Figure 11 shows a spot that was located in the previous section. It is located in the Example TMA folder and is highlighted in the data table as Grid A, Row 8, Column 9. A thumbnail of the entire slide is also shown, with a small rectangle that shows the location of this particular spot. The magnification of the spot image can be changed by dragging the Zoom slider.

Figure 11. Screenshot of TMALab application, showing folder structure, data table, and spot that is selected for viewing and analysis. Callouts in the figure: Folder Location, Data Table, Zoom Slider, Thumbnail.

You can type directly into the data table, for example to enter a manual score. Other operations on a spot can be selected from the pull-down menus, or by right-clicking on a spot and selecting the desired operation from the list that appears on the screen (Figure 12). You can open spot images in separate windows, export the spot images to TIFF files, export spot metadata, analyze the spots using the Aperio analysis tools, delete spots, and move spots to other folders within the TMALab folder structure.

Figure 12. Menu produced by right-clicking on data table.
Analyzing Spot Images

Right-clicking on one or more spots and selecting the Analyze Selected Spots option allows you to select an algorithm to use for analysis. Each algorithm has a set of input parameters, which you can either modify directly or load from a previously saved configuration. In most cases, users will load predefined configuration settings. You may also elect to save all or a subset of the analysis results in the data table. A single click of the Start button causes all of the selected spots to be analyzed in order and the results to be stored in the data table. There is no limit to the number of spots that can be queued for analysis, which means you can analyze entire slides, or even sets of slides, at one time. However, TMA analysis takes place on the client machine (your machine), which means that you must wait for the analysis to complete before you can review the results. Analysis of each spot typically takes only a few seconds.

Figure 13. Screenshot showing analysis results for Row 1, Column 4 of Grid A.

Figure 13 is a screenshot showing the TMALab application after analyzing all of the spots in Row 1 of Grid A. The data table shows the new fields that have been populated with the analysis results. The spot image for Column 4 is shown with the markup image overlaid. The Positive Pixel Count algorithm has been applied, and the colors correspond to levels of staining as described earlier in this document. In this example, the entire spot has been analyzed, but it is also possible to use the pen tools to include specific regions for analysis and exclude others. This can be important for cases where only certain tissue types contained within the TMA are to be analyzed.
Figure 14 shows a graph, produced in Excel, of the staining percentages for the Row 1 results. These results were exported by highlighting the spots for Row 1 in the data table and selecting Export Selected Spot Data. It is easy to see from the graph that spots 1, 5, 6, 9, and 10 had very little positive staining. The ability to selectively export large blocks of analysis data and graph the results using familiar software tools, like Excel, is a powerful capability offered by TMALab.

Figure 14. Staining percentages for Grid A, Row 1, using Excel.

Exporting Spot Images

Spot images can also be exported as individual TIFF files, for use in other applications. This is done by highlighting the set of spots and selecting Export Selected Spot Images. Figure 15 shows the folder containing the TIFF files for spot images in Grid A, Row 1. Notice the naming convention that embeds the grid/row/column into the file name.

Figure 15. Screenshot of file folder containing TIFF files of individual spot images.