
Latin American applied research

On-line version ISSN 1851-8796

Lat. Am. appl. res. vol.44 no.3 Bahía Blanca Jul. 2014

 

A local adaptive threshold approach to assist automatic chromosome image segmentation

V. Calzada-Navarrete and C. Torres-Huitzil

Laboratorio de Tecnologias de Información, CINVESTAV-Tamaulipas, México. vcalzada@tamps.cinvestav.mx, ctorres@tamps.cinvestav.mx

Abstract— In cytogenetics, karyotype analysis is used to assess the presence of genetic defects by visualizing chromosome structure in microscopic images. A key step in this process is image thresholding, used to detect and extract objects of interest from the background, as it affects the performance of the subsequent processing steps in image analysis. In this paper, an adaptive local thresholding approach for Q-band chromosome image segmentation is presented. A re-threshold process based on Sauvola's local adaptive technique is applied to extract chromosomes from the background. Local adaptive histogram equalization is applied between the thresholding steps to enhance chromosome segments and reduce the chances of pixel misclassification. The proposed thresholding approach provides a precision of 93%, which is better than other similar approaches when evaluated on a reference image dataset.

Keywords— Local Adaptive Thresholding; Sauvola Algorithm; Re-threshold; Human Chromosomes; Karyotyping.

I. INTRODUCTION

Computer-assisted methods for human chromosome analysis are essential tools in cytogenetic laboratories, since chromosome images provide crucial information about the health of an individual, whereas manual image analysis performed by experts has proved to be a time-consuming, laborious and error-prone task. Chromosome analysis is performed on dividing cells in their metaphase stage. During metaphase, chromosomes are stained to become visible and are imaged with a microscope. In a normal, nucleated human cell, there are 46 chromosomes, represented in clinical routine by a structure called a karyotype (Ji, 1994). The main goal is to identify every chromosome in the image and to assign it to one of the 22 autosome classes or to the two sex chromosomes (XX in females and XY in males). A small deviation from this number of chromosomes and/or from their structure may be considered evidence of physical abnormalities. Chromosome images are prone to partial occlusion, overlapping and touching of chromosomes, which are some of the major factors hindering automatic analysis (Zhao and Kong, 2013).

The first step in chromosome image analysis is to extract, with high confidence, the objects that represent chromosomes or chromosome clusters from the background. Image binarization is the process that converts gray-level or color images into binary ones in order to extract objects of interest from background regions by finding and applying an appropriate threshold to the image pixels. Binarization, or thresholding, is an important step that affects the performance of the subsequent processing stages in the overall flow of image analysis for karyotyping (see Fig. 1). In spite of its conceptual simplicity, thresholding for chromosome image segmentation faces several challenges, such as non-uniform illumination, high variability in chromosome and background fluorescence, banding patterns, blurred chromosome boundaries, and the presence of staining debris (Ji, 1994; Grisan et al., 2009), which have to be overcome for a successful segmentation.

Figure 1: A general overview of the processing flow for chromosome image segmentation, which consists of two main subtasks: object-background separation and chromosome disentangling.

One of the most widely used techniques for initial segmentation is based on a global threshold estimated by the Otsu method (Otsu, 1979). Because chromosome intensities are brighter than the neighboring background, the Otsu method is effective even though the background surface is not globally uniform. However, as chromosomes have high variability in intensity and band patterns, global thresholding is not a suitable method, since it can break a single chromosome into two parts or tend to cluster touching chromosomes. To overcome this problem, Grisan et al. (2009) partitioned the input image into tiles and then applied the Otsu method locally on each tile. Ji (1994) proposed a global threshold with a local re-thresholding scheme to provide a two-step coarse-to-fine segmentation. Recently, Poletti et al. (2012) carried out a study to assess and compare the performance of several algorithms for human chromosome segmentation on a Q-band prometaphase image dataset. Although every algorithm has strong and weak points, local methods have shown better performance than the other categories because of their implicit capability of adapting to background variation. Recall that local thresholding methods find a threshold for each image pixel based on local characteristics and statistics of the pixels within a neighborhood centered on that pixel (Trier and Jain, 1995). When an object's intensity is brighter than that of the neighboring pixels, adaptive thresholding is an effective segmentation method. However, when a number of pixels in the foreground are darker than the neighboring foreground pixels, adaptive thresholding creates holes inside the chromosome.

Based on the capabilities of local adaptive thresholding and the idea of re-thresholding proposed in Ji (1994), this paper proposes an automated local adaptive re-threshold approach for human chromosome segmentation, which outperforms previously reported methods according to standard metrics. The rest of the paper is organized as follows. Section II presents the material and methods used in this work, including a re-threshold method, built on Sauvola's method and adaptive histogram equalization, to extract chromosomes and chromosome clusters from the background. Section III presents the experiments carried out to validate the proposed approach, together with a comparison with other reported methods. Finally, section IV presents the concluding remarks and briefly outlines future work.
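As a reminder of what a global Otsu threshold computes, the following is a minimal pure-Python sketch, not the implementation used in any of the cited works. The function name `otsu_threshold` and the flat list of 8-bit pixel values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value <= t form the low class, the rest form the high class.
    `pixels` is a flat list of integers in [0, levels).
    """
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # low class still empty
        w1 = total - w0
        if w1 == 0:
            break             # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1 / total**2.
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t
```

A tile-based local variant would call such a function once per tile instead of once per image.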

II. MATERIAL AND METHODS

A. Image dataset

The image dataset used in this work was provided by researchers of the Laboratory of Biomedical Imaging, University of Padova, Italy, and is publicly available for download at http://bioimlab.dei.unipd.it. The dataset is composed of 162 images with PAL resolution (768×576 pixels, 8 bits per pixel) acquired from 117 cells, from amniotic fluid and chorionic villi, at the prometaphase stage, containing a total of 5474 chromosomes. The images were acquired from both healthy and pathologic cells, and no selection based on quality or homogeneity was performed (Grisan et al., 2009; Poletti et al., 2012). According to the authors, the images do not necessarily contain a whole set of 46 chromosomes, since in routine laboratory acquisition the whole set may be spread over different images. In order to obtain a set of ground truth images as a reference, the authors manually annotated 37 images of the dataset. The annotations take binary values: 0 for pixels belonging to the background and 1 for pixels that belong to a chromosome (with no distinction between single chromosomes and clusters of chromosomes). The images belong to slightly different stages of the prometaphase and were acquired with different illumination conditions and camera magnifications. They contain both overlapping and touching chromosomes, on average 5.7 and 12.1 per karyotype, respectively (Grisan et al., 2009; Poletti et al., 2012).

B. Sauvola's algorithm

Sauvola's method (Sauvola and Pietikainen, 2000) is a local adaptive threshold scheme that takes a gray-scale image as input. This method adapts the threshold t(x, y) for each pixel according to the following equation:

$$t(x, y) = m(x, y)\left[1 + k\left(\frac{s(x, y)}{R} - 1\right)\right] \qquad (1)$$

where m(x, y) and s(x, y) are the local mean and the local standard deviation, respectively, computed in a window of size w centered on the current pixel. R is the dynamic range of the standard deviation (R = 128 for 8-bit gray-level images) and k is a user-defined parameter usually set to 0.5. The parameter k controls the value of the threshold in the local window: the higher the value of k, the further the threshold falls below the local mean m(x, y). Sauvola's method has been widely used in document binarization as a first step in document analysis tasks such as page segmentation or optical character recognition. Comprehensive evaluations of thresholding techniques for document binarization (Trier and Jain, 1995) show that Sauvola's method works best among the local thresholding techniques. In spite of its advantages, Sauvola's method suffers from several limitations (Lazzara and Thierry, 2013), of which the following are relevant here. Low-contrast objects may be considered as textured background or show-through artifacts, and thus they may be removed or only partially retrieved. Furthermore, the result on textures is very sensitive to the window size, although using a large window may improve the binarization results on textured objects. However, when both small and large objects are present in the same image, Sauvola's method is not able to retrieve all objects correctly. On the one hand, windows that are too large may include data from objects of a different nature. On the other hand, when the window is too small, the statistics inside portions of objects may behave as in the background, because pixel values are locally identical. Since Sauvola's method relies on a minimum of contrast in the window to set a pixel as foreground, in that case it is unable to make a proper threshold choice.
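To make the role of m, s, k and R concrete, the sketch below evaluates the usual form of Sauvola's rule, t = m[1 + k(s/R - 1)], by brute force over the window. It is an illustration rather than the authors' code. Clipping the window at the image borders and labeling a pixel as foreground when it is darker than its threshold (the document-binarization convention) are assumptions. For bright chromosomes on a dark background, this sketch assumes the gray levels are inverted first, a detail the text does not specify.

```python
import math


def sauvola_threshold(image, x, y, w=15, k=0.5, R=128.0):
    """Return Sauvola's threshold t(x, y) for one pixel.

    `image` is a list of rows of gray levels and `w` is the odd window side.
    The window is clipped at the image borders (assumption of this sketch).
    """
    half = w // 2
    rows = range(max(0, y - half), min(len(image), y + half + 1))
    cols = range(max(0, x - half), min(len(image[0]), x + half + 1))
    values = [image[r][c] for r in rows for c in cols]
    m = sum(values) / len(values)                                    # local mean
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))  # local std
    return m * (1.0 + k * (s / R - 1.0))


def binarize(image, w=15, k=0.5, R=128.0):
    """Label a pixel 1 when it is darker than its local threshold, else 0."""
    return [[1 if image[y][x] < sauvola_threshold(image, x, y, w, k, R) else 0
             for x in range(len(image[0]))]
            for y in range(len(image))]
```

In a perfectly flat window s = 0, so the threshold is m(1 - k) and no pixel is marked as foreground, which illustrates why the method needs a minimum of local contrast.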

C. Adaptive histogram equalization

Although rapid progress has been made in developing techniques and tools for acquiring chromosome images, the challenges posed by variations in illumination or low contrast make the task of general chromosome segmentation difficult to solve. Low-contrast images can occur for reasons such as poor or non-uniform lighting conditions (i.e., illumination is distributed non-uniformly within the image), nonlinearity, or a small dynamic range of the imaging sensor. Thus, it is necessary to improve the image contrast to provide a better representation for the subsequent image analysis steps. An effective technique called contrast-limited adaptive histogram equalization (CLAHE) is used for contrast enhancement; it limits the amplification produced by histogram equalization (Pizer et al., 1987). CLAHE uses a small window that slides through every image pixel sequentially, and only the pixels within the current position of the window are histogram equalized; the gray-level mapping for enhancement is computed only from the pixels of the window.
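The contrast-limiting idea can be sketched for a single window as follows. This is not a full CLAHE implementation: interpolation between neighbouring windows is omitted, and the names `clipped_equalization_map`, `equalize_tile` and the `clip_limit` parameter (a cap on the count per histogram bin) are hypothetical choices made for illustration.

```python
def clipped_equalization_map(tile, bins=256, clip_limit=4):
    """Build a gray-level mapping for one tile by clip-limited equalization.

    `tile` is a list of rows of integers in [0, bins).
    """
    hist = [0] * bins
    for row in tile:
        for v in row:
            hist[v] += 1
    # Clip each bin and spread the clipped excess evenly over all bins.
    excess = sum(max(0, h - clip_limit) for h in hist)
    hist = [min(h, clip_limit) + excess / bins for h in hist]
    # The cumulative histogram, rescaled to the gray range, is the mapping.
    total = sum(hist)
    mapping, running = [], 0.0
    for h in hist:
        running += h
        mapping.append(round((bins - 1) * running / total))
    return mapping


def equalize_tile(tile, bins=256, clip_limit=4):
    """Apply the clip-limited mapping to every pixel of the tile."""
    mapping = clipped_equalization_map(tile, bins, clip_limit)
    return [[mapping[v] for v in row] for row in tile]
```

A lower `clip_limit` flattens the histogram more before equalization, which limits how strongly noise in nearly uniform regions is amplified.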

D. Proposed method

In this work, a local adaptive re-threshold scheme, based on Sauvola's method and CLAHE, is proposed for the first step in chromosome segmentation from Q-band images. Because of the limitations of Sauvola's method cited above, a hybrid solution is proposed that takes advantage of two different window sizes for the computation of the statistical measures: a rather small window is used to retrieve the global distribution of small and large objects, and a larger one, applied to a contrast-enhanced image, is used to capture local fine details. The first step in the proposed method is an initial segmentation by thresholding. This step is conservative, as it usually segments a large proportion of the chromosomes and groups touching chromosomes together into clusters. After the first binarization, the original image is masked to extract the pixels that are potentially part of chromosomes. Then contrast enhancement is applied to the resulting image in order to maximize local contrast and to delineate the chromosomes better. The second threshold step assumes that the first binarization has retrieved all the object parts and that the object components are correctly grouped. Thus, Sauvola's method is applied once more, but with a different window size, in order to fit the local contents of the chromosomes appropriately. With this technique, clusters of touching chromosomes may be separated, but single chromosomes are not divided into pieces, as happens with most thresholding methods. Obtaining an accurate binary image that is representative of the geometric characteristics of the chromosomes is of critical importance for the subsequent steps in image segmentation.

The proposed method is conceptually shown in Fig. 2 and it can be summarized in the following steps:

  1. Apply Sauvola's method to the original gray-level image of the chromosomes in order to produce an initial segmentation.
  2. Use the initial binarized image as a mask to extract the pixels that potentially represent chromosomes or chromosome clusters.
  3. Adjust the image intensity values to enhance contrast by using CLAHE.
  4. Apply Sauvola's method again, with a different window size, to obtain a more accurate binary image.
  5. Apply morphological opening and closing operations to close any remaining holes and to remove artifacts that are present after re-thresholding.
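The control flow of the steps above can be sketched as a small driver that chains the stages. The four callables (`threshold1`, `enhance`, `threshold2`, `clean`) are hypothetical placeholders for the two Sauvola passes, CLAHE and the morphological clean-up. Setting masked-out pixels to 0 is also an assumption, since the text does not say which value the background receives.

```python
def rethreshold(image, threshold1, enhance, threshold2, clean):
    """Chain the five steps; each stage is passed in as a callable.

    `image` and all intermediate results are lists of rows.
    """
    # Step 1: initial, conservative binarization.
    mask = threshold1(image)
    # Step 2: keep only the pixels selected by the mask (others set to 0).
    masked = [[p if m else 0 for p, m in zip(prow, mrow)]
              for prow, mrow in zip(image, mask)]
    # Step 3: local contrast enhancement of the masked image.
    enhanced = enhance(masked)
    # Step 4: second binarization, typically with a different window size.
    binary = threshold2(enhanced)
    # Step 5: morphological clean-up of holes and artifacts.
    return clean(binary)
```

Passing the stages as arguments keeps the sketch independent of any particular thresholding or enhancement routine.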


Figure 2: Local adaptive thresholding for chromosome image segmentation.

Note that in the last step morphological operators are used to remove some artifacts in the resulting image. Nuclei are detected and deleted by observing that a large, round object is likely to be a nucleus. Additionally, debris in the input image, caused by stained cytoplasm or other stained particles, is recognized as a very small object (Ji, 1994). It should be pointed out that the proposed thresholding approach was designed and implemented after experimentation on the available data. In the following section, experimental results are presented and discussed.

III. EXPERIMENTAL RESULTS

A. Parameter adjustment and measures

The main goal of the experimentation is to find the parameters for which the re-threshold scheme best decides whether a pixel belongs to the foreground or to the background in the images of the dataset. As stated earlier, this method is designed to be part of a whole chromosome image processing chain, as illustrated in Fig. 1. In the current implementation, a window size of 9×9 pixels and k = 0.2 were used for Sauvola's method in the initial threshold step, whereas a window size of 15×15 pixels and k = 0.2 were used for the second threshold. For adaptive histogram equalization with CLAHE, the image was divided into tiles of 10×10 pixels with 128 bins for the histogram. Figure 3 shows some intermediate results of applying the proposed re-threshold method with these parameters to an input image. An erosion with a circular structuring element with a radius of 13 pixels was used to identify and remove nuclei, whereas a morphological opening with a square structuring element of 3 pixels was used to remove debris and small objects.
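The debris-removal step can be illustrated with a binary opening (erosion followed by dilation) using a 3×3 square structuring element. This is a minimal pure-Python sketch. The function names and the choice to treat pixels outside the image as background are assumptions, not details taken from the implementation described here.

```python
def erode(binary, size=3):
    """Binary erosion with a size x size square; outside pixels count as 0."""
    h, w, half = len(binary), len(binary[0]), size // 2

    def all_set(y, x):
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                yy, xx = y + dy, x + dx
                if not (0 <= yy < h and 0 <= xx < w and binary[yy][xx]):
                    return False
        return True

    return [[1 if all_set(y, x) else 0 for x in range(w)] for y in range(h)]


def dilate(binary, size=3):
    """Binary dilation with a size x size square structuring element."""
    h, w, half = len(binary), len(binary[0]), size // 2

    def any_set(y, x):
        for dy in range(-half, half + 1):
            for dx in range(-half, half + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w and binary[yy][xx]:
                    return True
        return False

    return [[1 if any_set(y, x) else 0 for x in range(w)] for y in range(h)]


def opening(binary, size=3):
    """Remove objects smaller than the structuring element."""
    return dilate(erode(binary, size), size)
```

Objects that cannot contain the whole structuring element vanish in the erosion and are not restored by the dilation, which is why isolated specks disappear while larger blobs keep their extent.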

Figure 3: Some intermediate results of the proposed method for chromosome extraction from background.

In order to evaluate the proposed method quantitatively, a set of common statistical measures of the performance of a binary classification test was used, namely precision, accuracy, recall and F-score. These measures can be computed in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) according to the following equations:

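$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (2)$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (3)$$
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN} \qquad (4)$$
$$F\text{-score} = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{Sensitivity}}{\mathrm{PPV} + \mathrm{Sensitivity}} \qquad (5)$$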

where PPV stands for positive predictive value. TP refers to foreground pixels correctly labeled as foreground. FP refers to background pixels incorrectly labeled as foreground. FN refers to foreground pixels incorrectly labeled as background. Finally, TN refers to background pixels correctly labeled as background.

Sensitivity and PPV, also known as recall and precision, respectively, are of great relevance in chromosome image segmentation. Sensitivity accounts for the ability to correctly identify foreground pixels, as it is defined as the ratio of the number of chromosome pixels identified as such to the number of ground truth chromosome pixels. PPV is the ratio of the number of actual chromosome pixels identified as such to the total number of pixels that the algorithm identified as chromosomes. PPV is more useful than specificity in assessing algorithm performance when the two classes are highly unbalanced toward the negative class, i.e., when there are far fewer chromosome pixels than background pixels; specificity for all methods would saturate towards a meaningless 0.99 (Poletti et al., 2012).
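These measures can be computed directly from a predicted binary image and its ground truth. The sketch below uses the standard definitions of sensitivity (TP/(TP+FN)), PPV (TP/(TP+FP)), accuracy and F-score (the harmonic mean of PPV and sensitivity). The function name `pixel_scores` is hypothetical, and the sketch assumes both images contain at least one foreground pixel so that no denominator is zero.

```python
def pixel_scores(predicted, truth):
    """Compare two binary images (lists of rows of 0/1) pixel by pixel."""
    tp = fp = fn = tn = 0
    for prow, trow in zip(predicted, truth):
        for p, t in zip(prow, trow):
            if p and t:
                tp += 1        # chromosome pixel found
            elif p:
                fp += 1        # background labeled as chromosome
            elif t:
                fn += 1        # chromosome pixel missed
            else:
                tn += 1        # background correctly rejected
    sensitivity = tp / (tp + fn)
    ppv = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f_score = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "ppv": ppv,
            "accuracy": accuracy, "f_score": f_score}
```

Because TN does not appear in sensitivity, PPV or F-score, a large background area does not inflate them, unlike accuracy or specificity.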

B. Evaluation and comparison

The percentage of pixels correctly extracted by the method was measured using the sensitivity and PPV measures on 37 Q-band images from the dataset, yielding the indicators summarized in Table 1, along with results for other thresholding methods used as a baseline for comparison purposes. The baseline results shown in Table 1 are taken and adapted from those presented in Poletti et al. (2012), where methods were classified into three categories: global methods provide a threshold that is constant all over the image, local methods provide a space-variant threshold, and multi-threshold methods identify a number of gray-level intervals separated by different thresholds.

Table 1: Performance on pixel classification for some thresholding methods used for chromosome image segmentation, adapted from Poletti et al. (2012).

The AdT method is based on the interpolation of sampled local Otsu thresholds computed over image tiles, RBLS is an algorithm based on the minimization of the energy of a curve that fits the contour of the objects of interest, PSO-NM is based on fitting the histogram with a Gaussian mixture model via simplex + PSO (Particle Swarm Optimization), and PSO-EM is based on fitting the histogram with a Gaussian mixture model via Expectation-Maximization + PSO. See Poletti et al. (2012) for a survey, description, and discussion of the performance of these algorithms for chromosome image segmentation.

Note that the proposed method provides both competitive sensitivity and competitive PPV compared to the other methods. Each image in the dataset, together with its classification results compared to the ground truth image, was analyzed and recorded. The F-score was used to compare the proposed method with the others using a single metric. From the results presented in Table 1, the proposed algorithm yields better results than the other algorithms listed. It provides an accurate description of chromosome location in order to reduce the computational complexity of the disentanglement step in chromosome image segmentation (Ji, 1989; Legrand et al., 2008; Arachchige et al., 2013). To obtain a global view of the performance of the proposed approach with respect to the parameters of Sauvola's algorithm, some experiments were carried out with different values of k and of the window sizes in the first and second stages, w1 and w2, respectively. Figure 4 summarizes the performance, plotted in terms of PPV against sensitivity, for w1 =7.19 and k from 0.2 to 0.4 in increments of 0.025. Better performance is obtained when w2 is larger than w1 and the value of k is closer to 0.2.


Figure 4: Plot of Positive Predictive Value against Sensitivity obtained for different values of the window size and the k parameter of Sauvola's method.

C. Execution time

Although the computational speed of the implementation of the proposed method is not the main focus of this work, it should be taken into account for practical applications related to chromosome image analysis. To implement local adaptive thresholding efficiently, integral images were used as an intermediate representation from which statistical measures can be computed in a local neighborhood. The use of integral images in binarization algorithms was first introduced by Shafait et al. (2008), allowing local thresholding methods, such as Sauvola's method, to run in a time close to that of global methods. The current implementation uses integral images to compute both means and variances, which need local sums and local sums of squares, respectively. The implementation was written in ANSI C without multithreading, compiled using gcc with the -O3 optimization flag, and run on a MacBook Pro with an Intel Core i7 2.66 GHz processor and 4 GB of main memory. The computation time for the complete thresholding approach is around 326 milliseconds, which shows that the method can be computed within practical time constraints.
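The integral-image idea can be sketched as follows: two summed-area tables, one of the gray levels and one of their squares, give the sum and the sum of squares over any rectangular window with four look-ups each, so the local mean and variance cost the same regardless of the window size. This is an illustrative Python sketch, not the ANSI C implementation described above. The function names and the border clipping are assumptions.

```python
def integral_images(image):
    """Return zero-padded integral images of the gray levels and their squares."""
    h, w = len(image), len(image[0])
    S = [[0] * (w + 1) for _ in range(h + 1)]
    S2 = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            v = image[y][x]
            S[y + 1][x + 1] = v + S[y][x + 1] + S[y + 1][x] - S[y][x]
            S2[y + 1][x + 1] = v * v + S2[y][x + 1] + S2[y + 1][x] - S2[y][x]
    return S, S2


def window_stats(S, S2, x, y, w):
    """Local mean and variance in a w x w window centered on (x, y).

    The window is clipped at the image borders (assumption of this sketch).
    """
    half = w // 2
    h_img, w_img = len(S) - 1, len(S[0]) - 1
    y0, y1 = max(0, y - half), min(h_img, y + half + 1)
    x0, x1 = max(0, x - half), min(w_img, x + half + 1)
    n = (y1 - y0) * (x1 - x0)
    total = S[y1][x1] - S[y0][x1] - S[y1][x0] + S[y0][x0]
    total2 = S2[y1][x1] - S2[y0][x1] - S2[y1][x0] + S2[y0][x0]
    mean = total / n
    variance = max(0.0, total2 / n - mean * mean)  # guard against rounding
    return mean, variance
```

The standard deviation needed by a local threshold is the square root of the returned variance.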

IV. CONCLUSIONS

In this paper, a re-threshold approach for chromosome image segmentation that improves on the results of other methods has been presented. The proposed method is based on Sauvola's method and adaptive histogram equalization. To show the benefits of the proposed approach, Q-band chromosome images from a reference dataset were evaluated. The results showed an improvement in segmentation performance, according to common evaluation measures, compared to other methods. Sauvola's method is probably one of the most widely used methods in document binarization, and the evaluation carried out in this work has shown that it can also compete with current thresholding methods used for chromosome image segmentation. The proposed method provides an accurate description of chromosome location that potentially reduces the computational complexity of the subsequent stages of karyotyping. Current efforts are directed towards chromosome disentangling, i.e., the automated separation and classification of touching or overlapping chromosomes, using a geometrical approach as suggested by some previous works. Preliminary results show the suitability of the proposed re-threshold approach for the first stages of chromosome segmentation.

ACKNOWLEDGMENTS
The authors gratefully acknowledge the partial support received from CONACyT, Mexico, through the research grant No. 99912. The authors also thank Dr. Enrico Grisan for kindly providing the ground truth for the chromosome images.

REFERENCES
1. Arachchige, A.S., J. Samarabandu, J.H.M. Knoll and P.K. Rogan, "Intensity Integrated Laplacian-Based Thickness Measurement for Detecting Human Metaphase Chromosome Centromere Locations," IEEE Trans. Biomed. Eng., 60, 2005-2013 (2013).
2. Grisan, E., E. Poletti and A. Ruggeri, "Automatic segmentation and disentangling of chromosome in Q-band prometaphase images," IEEE Trans. Inf. Technol. B., 13, 575-581 (2009).
3. Ji, L., "Intelligent splitting in the chromosome domain," J. Pattern Recognit., 22, 519-532 (1989).
4. Ji, L., "Fully automatic chromosome segmentation," J. Cytom., 17, 196-208 (1994).
5. Lazzara, G. and G. Thierry, "Efficient multiscale Sauvola's binarization," J. Doc. Anal. Recognit., 1-19 (2013).
6. Legrand, B., C.S. Chang, S. Ong, S. Neo and N. Palanisamy, "Automated identification of chromosome segments involved in translocations by combining spectral karyotyping and banding analysis," IEEE Trans. Syst. Man Cybern., Part A-Syst. Hum., 38, 1374-1384 (2008).
7. Otsu, N., "A threshold selection method from gray-level histograms," J. IEEE Trans. Syst., Man Cybern., 9, 62-66 (1979).
8. Poletti, E., F. Zappeli, A. Ruggeri and E. Grisan, "A review of thresholding strategies applied to human chromosome segmentation," J. Comput. Meth. Programs in Biomed., 108, 679-688 (2012).
9. Pizer, S.M., E.P. Amburn, J.D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. Romeny, J.B. Zimmerman and K. Zuiderveld, "Adaptive histogram equalization and its variations," J. Comput. Vis., Graph. Image Process., 39, 355-368 (1987).
10. Sauvola, O.D. and M. Pietikainen, "Adaptive document image binarization," J. Pattern Recognit., 33, 225-236 (2000).
11. Shafait, F., D. Keysers and T.M. Bruel, "Efficient implementation of local adaptive thresholding techniques using integral images," Proc. SPIE 6815, Document Recognition and Retrieval XV, 1-6 (2008).
12. Trier, J. and A.K. Jain, "Goal-directed evaluation of binarization methods," J. IEEE Trans. Pattern Anal. Mach. Intell., 17, 1191-1201 (1995).
13. Wang, Y.P., Q. Wu, K.R. Castleman and Z. Xiong, "Chromosome image enhancement using multiscale differential operators," IEEE Trans. Med. Imaging., 22, 685-693 (2003).
14. Zhao, Y. and S.E. Kong, "Automated classification of touching or overlapping M-FISH chromosomes by region fusion and homolog pairing," J. Pattern Anal. Applic., 16, 31-39 (2013).

Received: November 22, 2013
Accepted: February 11, 2014
Recommended by Subject Editor: Jorge Solsona