IEEE REVIEWS IN BIOMEDICAL ENGINEERING, VOL. 2, 2009
Potential of Computer-Aided Diagnosis to
Improve CT Lung Cancer Screening
Noah Lee, Student Member, IEEE, Andrew F. Laine, Senior Member, IEEE, Guillermo Márquez,
Jeffrey M. Levsky, and John K. Gohagan
ABSTRACT— The development of low-dose spiral computed tomog-
raphy (CT) has rekindled hope that effective lung cancer screening
might yet be found. Screening is justified when there is evidence that
it will extend lives at reasonable cost and acceptable levels of risk. A
screening test should detect all extant cancers while avoiding unnec-
essary workups. Thus optimal screening modalities have both high
sensitivity and specificity. Due to the present state of technology, ra-
diologists must opt to increase sensitivity and rely on follow-up di-
agnostic procedures to rule out the incurred false positives. There is
evidence in published reports that computer-aided diagnosis tech-
nology may help radiologists alter the benefit–cost calculus of CT
sensitivity and specificity in lung cancer screening protocols. This
review provides insight into the current discussion of the effectiveness
of lung cancer screening and assesses the potential of
state-of-the-art computer-aided diagnosis developments.
INDEX TERMS— Computer-aided diagnosis, lung cancer
screening, machine learning, receiver operating characteris-
tics.
Lung cancer screening is the presumptive identification
of unrecognized malignant tissue in high-risk asymptomatic
individuals. Screening may include medical examinations such
as sputum cytology (SC) [15], [79], chest X-ray (CXR) [15],
low-dose spiral computed tomography (LDCT) [8], [15], [31],
[46], [68], gene expression tests [88], and accompanying computer-aided
diagnosis (CADx) and detection (CADe) schemes
[81]–[87]. Current CAD technology shows potential to improve
CT lung cancer diagnosis, yet the question of whether
state-of-the-art screening technology can decrease the mortality
rate remains unresolved [1], [3]–[5], [9]–[15], [20], [64], [67],
[69]–[75], [77], [80], [86]. To address the mortality question
by direct comparison of LDCT technology with CXR, several
clinical trials are active or planned [1]. The largest and
most advanced of these is the National Lung Screening Trial (NLST) [6],
under the direction of the U.S. National Cancer Institute, in which
the targeted 50 000 asymptomatic former or current heavy
smokers were randomized to receive an initial and three annual
screens by LDCT or CXR. Conclusive results are expected to
be available by 2011. Outside the United States, randomized
controlled trials include the ITALUNG trial, 4 NELSON, 5 and
the UK Lung Cancer Screening Trial (UKLS). 6 Other clinical
trials [90] 7 do not follow the current gold-standard clinical research
guidelines, which call for randomized controlled studies to evaluate
objectively whether a medical intervention for lung
cancer screening is superior.
In this paper, we take a top-down approach in discussing the
potential of CAD for lung cancer screening. We hypothesize
that state-of-the-art CAD technology has the potential to im-
prove LDCT lung cancer screening, but the technology needs
comparative and careful assessment with respect to clinically
relevant performance measures [7], [103]–[105], [118]. We will
provide insight into the current effectiveness of LDCT lung
cancer screening and assess state-of-the-art CAD developments
in commercial and academic research. In this context, this
paper branches into three main components: 1) an overview of
CAD and associated image-processing methodologies to aid
the diagnostic decision process in lung cancer diagnosis; 2) the
various evaluation criteria, in particular, high sensitivity and
high specificity, in order to assess the potential of CAD for lung
cancer screening; and 3) the integration of CAD into clinical
practice.
and poor, men, women, and children. Lung cancer is the
leading cancer killer in the United States, with 1.3 million deaths
worldwide annually [76]. Estimates for 2008 were for 215 020
new lung cancer cases and 161 840 deaths from lung cancer in
the United States [78]. 1 In 2009, the estimated new cases and
deaths are 219 440 and 159 390, respectively. 2 More than 75%
of lung cancers are diagnosed in advanced stages. The average
five-year survival rate after lung cancer diagnosis is about 15%.
If lung cancer is detected at its earliest stage, the five-year survival
rate can reach 70% [14], [16], [19]. Approximately $9.6
billion is spent in the United States each year on lung cancer
treatment. 3 These figures call for effective cancer control and
prevention strategies such as lung cancer screening programs.
Manuscript received February 18, 2009. First published October 16, 2009;
current version published December 09, 2009.
N. Lee and A. F. Laine are with the Heffner Biomedical Imaging Lab, Depart-
ment of Biomedical Engineering, Columbia University, New York, NY 10027
USA (e-mail: nl2168@columbia.edu; laine@columbia.edu).
G. Márquez is with the Early Detection Research Group, National Cancer
Institute, Bethesda, MD 20892 USA (e-mail: marquezg@mail.nih.gov).
J. M. Levsky is with the Division of Cardiothoracic Imaging, Department of
Radiology, Montefiore Medical Center and Albert Einstein College of Medicine,
Bronx, NY 10467 USA (e-mail: jlevsky@montefiore.org).
J. K. Gohagan is with the Basic Prevention Sciences Research Group, National
Cancer Institute, Bethesda, MD 20892 USA (e-mail: gohaganj@mail.nih.gov).
Digital Object Identifier 10.1109/RBME.2009.2034022
4 http://www.cspo.it.
5 http://www.nelsonproject.nl/.
6 http://www.hta.ac.uk/1752.
7 http://clinicaltrials.gov/ct2/show/NCT00963651.
1 http://www.cancer.gov/cancertopics/types/lung (2008).
2 http://www.cancer.gov/cancertopics/types/lung (2009).
3 http://progressreport.cancer.gov.
1937-3333/$26.00 © 2009 IEEE
AUTHORIZED LICENSED USE LIMITED TO: IEEE XPLORE. DOWNLOADED ON MAY 13,2010 AT 11:46:08 UTC FROM IEEE XPLORE. RESTRICTIONS APPLY.
I. INTRODUCTION
CANCER affects everyone: the young and old, the rich
LEE et al. : POTENTIAL OF COMPUTER-AIDED DIAGNOSIS TO IMPROVE CT LUNG CANCER SCREENING
II. HISTORICAL DEVELOPMENT OF CAD
The first concept of CAD was introduced half a century ago
[17], [18], where Lusted talked about automated diagnosis of ra-
diographs by computers in 1955 [77]. Early attempts to build a
CAD system were initiated in the early 1950s. Several decades
of research passed until this dream bore fruit in 1998 with the
first commercial mammography CAD system [77] approved by
the FDA. Large-scale systematic research began in 1980, but
new automated systems were not immediately successful [89].
Development began from an initial concept of a fully automated
computer diagnosis to a computer-aided diagnosis, where the
human relies on the machine as a second reader. Since then,
many CAD applications have been developed to help radiolo-
gists interpret images [77], [89]. CAD remains a major research
subject, and many CAD systems and applications have been proposed.
Currently, the major applications of CAD involve breast
cancer [71], lung cancer [90]–[98], colon cancer, and prostate
cancer. CAD systems have become part of routine clinical
work for detection of breast cancer in some clinics [89].
Starting from ad hoc and heuristic approaches [24], [25], [29],
CAD technology moved to sophisticated machine-learning and
data-mining techniques [82], [93], [98], [106]–[109]. In recent
years, sophisticated machine-learning schemes have been devel-
oped [21]–[23], [90], [110] and entered the field of automated,
semiautomated, and interactive CAD systems [82], [90]. Ma-
chine learning for CAD has become one of the principal re-
search areas in medical imaging and diagnostic radiology. The
reported literature [2], [24], [25], [27], [29], [91], [92], [98], [99]
gives evidence that current CAD schemes used as a second reader
often outperform the manual grading of experts
alone. Nishikawa and Doi provide an in-depth review of
the historical and current developments of CAD from a clinical
perspective [77], [89]. An in-depth review of CAD methodologies
for lung cancer is given by Sluimer et al. [54] and
Chan et al. [86].
image retrieval and search. The field is advancing quickly, with
new CAD schemes being developed and investigated for the task
of lung cancer diagnosis and detection [106]–[109]. In what follows,
we give a brief snapshot of current CAD schemes in
these three areas, together with present and future directions of
investigation.
A. Lung Tissue and Regions of Abnormality Discrimination
The task of discriminating lung tissue and abnormal lung re-
gions involves the analysis of large thoracic three-dimensional
CT image datasets (see Fig. 1). Images containing diffuse abnor-
malities have been especially problematic in nodule screening
due to partial volume effects and ambiguous image artifacts,
making the distinction between nodule tissue and abnormal
lung tissue difficult. Early approaches defined lung boundaries
and used thresholding methods to differentiate between vessels
and nodules [24]–[26]. Subsequent approaches attempted to
remove ambiguous structures by comparisons between neigh-
boring slices [27], while others applied true three-dimensional
algorithms [28]. The notion of subtracting known anatomical
structures to simplify the detection and classification task
was suggested by the work of Mori et al. [29]. The authors
described a method for the automated anatomical labeling of
the tracheobronchial tree extracted from three-dimensional CT
data and its application to virtual bronchoscopy. Work
in this area is manifold, and the reported approaches can
be grouped into binary two-class discrimination [98] and finer
multiclass discrimination [91] into respective tissue types.
Depeursinge et al. [91] presented a texture classification
system for multiclass classification of lung tissue into five
patterns (healthy, emphysema, ground
glass, fibrosis, and micronodules). They used overcomplete
wavelet frames combined with gray-level histogram features
and a k-nearest neighbor (KNN) classifier, obtaining a
classification accuracy of 92.5%. In 2008, the
authors reported a system [92] that integrated additional clinical
context information to perform lung tissue classification, with
a further 8% performance improvement over [91], using
an optimized support vector machine (SVM).
Arzhaeva et al. [98] proposed a system for the localization of
interstitial lesions in chest radiographs. The system used a two-
class supervised classification approach to distinguish between
normal and diseased texture. Texture analysis was performed
by multiscale Gaussian filter banks, linear discriminant analysis
(LDA), and an SVM classifier. Evaluated on 44
abnormal and eight normal cases, the method achieved an area under the ROC
curve (AUC) value of 78%.
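Since AUC values are quoted throughout this review, a minimal sketch of how such a value can be computed from classifier scores may be useful. It uses the rank-based (Mann-Whitney) interpretation of the AUC: the probability that a randomly chosen diseased case scores higher than a randomly chosen normal case, with ties counted as one half. The function name and the example scores are illustrative assumptions and are not taken from any of the cited studies.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney).

    pos_scores: classifier outputs for diseased cases (hypothetical data)
    neg_scores: classifier outputs for normal cases (hypothetical data)
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes must contain at least one case")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0      # diseased case ranked above normal case
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative scores only: 8 of the 9 diseased/normal pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
```

The pairwise form is quadratic in the number of cases, which is acceptable for datasets of the size reported here (tens to hundreds of cases); a sort-based version would be preferable for larger studies.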
Kato et al. [111] presented a bag-of-features approach for
multiclass classification of lung tissue in diffuse lung disease, designed to classify
disease patterns with inhomogeneous texture distributions
within a region of interest (ROI). They used a scale-invariant feature
transform (SIFT) descriptor over many ROI samples for local
feature extraction and to account for translation and rotation invariance.
The authors reported a classification accuracy of 92.8%
using 1109 ROIs from 211 patients.
III. CAD OVERVIEW FOR CT LUNG CANCER SCREENING
CAD systems for lung tissue discrimination, nodule discrimi-
nation, and nodule characterization are increasingly being used
as a second reader to aid the diagnostic decision process and
to reduce the number of overlooked lung cancers. There is ev-
idence in published reports that CAD technology may help ra-
diologists alter the benefit–cost calculus of CT sensitivity and
specificity in lung cancer screening protocols to the benefit of
patients and radiologists alike [85]. Current CAD schemes in-
clude lung tissue discrimination [21]–[29], nodule detection and
classification [34], [36], [39], [43]–[46], [82], [87], [97], inter-
stitial disease detection, differential diagnosis of interstitial dis-
ease, distinction between benign and malignant pulmonary nod-
ules [93], [94], and estimation of malignancy potential as well
as growth measurement [55]. CAD in this context has improved
since 2000, but major challenges persist in three areas: 1) discrimination
of lung tissue and regions of abnormality; 2) nodule
detection and classification; and 3) nodule characterization and
growth measurement.
The pool of existing CAD systems and approaches is broad,
ranging from hybrid image-processing systems [37] including
registration [114], [115] and segmentation [116], [117] to lung
B. Lung Nodule Discrimination
Lung nodule discrimination consists of two main components:
a) nodule detection and b) nodule classification.
Fig. 1. From CT lung data to lung cancer diagnosis. (Top) CT lung dataset from the LIDC database with several hundred slices. (Middle) True positive nodules
with different characteristics (solid, spiculated, and low contrast) surrounded in red. (Bottom) False positive nodules surrounded in yellow.
Recently, a comparative CAD assessment 8 was performed on
the NLST data through standardized databases such as the
Lung Image Database Consortium (LIDC) for the first time.
The results of this assessment have not been published yet.
The European counterparts for comparative CAD assessment are
the NELSON and ANODE09 studies. 9 Other openly available
datasets include the database of Lung Test images from Motol
Environment (Lung TIME) [113]. The Lung TIME database
consists of 157 CT scans with 394 annotated nodules of various
types, including solitary, regular, irregular, pleural, and vessel
attached nodules.
In recent decades, a large body of research has been reported
in the field of lung nodule detection and classification
[2], [31]–[39], [43]–[46], [82], [87], [97], [112]. A central
concern in nodule detection is the high rate of false positives
when sensitivity is increased to detect subtle nodules. A nodule
is deemed a false positive result if it led to a completely negative
workup or more than 12 months of follow-up with no cancer
diagnosis. Reducing false positive rates while maintaining high
sensitivity is still a difficult problem (see Fig. 1). Techniques
include LDA [34], rule-based approaches (a set of “if-then”
statements) [38], combinations of these two [39], artificial
neural networks (ANNs) [40], and maximum-margin based
discriminators such as the SVM [92]. Novel methodologies
for searching have been introduced, which include template
matching for detection [36], unsupervised clustering techniques
[39], and a local density maximum algorithm [41]. Methods
to improve discrimination of nodules from lung tissue include
subtraction of vessels by region-growing [30], knowledge-con-
strained routines based on anatomical models of the thorax
[31], and deformable models [32]. Various approaches have
been taken to define the pleural interface and distinguish
juxta-pleural nodules, including morphological image filtering
[33], [34], curvature analysis of the pleural interface [35],
and adaptive template-matching for the appropriate shape of a
nodule given a location on the pleural wall [36].
Lee et al. [36] proposed a novel technique
based on genetic algorithms and template matching for the detection
of nodules. On 557 sectional
images, the method achieved a detection rate of 72% with
1.1 false positives per sectional image.
Armato et al. [38] reported an extension of their earlier two-
and three-dimensional automated lung CT analysis method [34]
to segment lung volume on a section-by-section basis. A rule-based
[42] approach combined with LDA was applied to reduce
the number of nodule candidates. Evaluated
on 43 CT scans, the method achieved an AUC value of 90% and a nodule detection
sensitivity of 70%, with 1.5 false-positive detections per section.
Li et al. [45] reported on a CAD scheme to help radiolo-
gists improve the detection of pulmonary nodules in chest ra-
diographs by focusing on false positive reduction. They could
reduce the number of false positives to 44.3% with a small in-
crease in the number of true positives of 2.3%.
Katsuragawa et al. [2] described an automated method to dis-
tinguish benign and malignant solitary nodules. Fifty-five chest
radiographs were discriminated using LDA and ANN for fea-
ture combination and classification. Comparisons with manual
grading showed that LDA had an AUC value of 88.6%, whereas
manual identification resulted in an AUC value of 85.4%.
Brown et al. [46] reported patient-specific models for de-
tecting lung nodules for use in screening and follow-up surveil-
lance. Baseline image data facilitated segmentation of subse-
8 http://skynet.ohsu.edu/lungnodule09/.
9 http://anode09.isi.uu.nl/index.php.
quent images so that changes in size and/or shape of nodules
could be measured automatically. The system performed with
an 86% detection rate and an average of 11 false positives per
case on the baseline scans of 17 subjects. Follow-up scans per-
formed with a detection rate of 81%. Brown et al. [62] also de-
veloped an automated system for detecting lung micronodules
and applied it to data from 15 subjects with 77 lung nodules.
Preliminary results indicated that the automated system consid-
erably improved the radiologist’s performance in micronodule
detection but with a compensatory loss of specificity.
Gurcan et al. [39] developed a CAD system for lung nodule
detection on CT images wherein, in the first stage, lung regions
were identified by k-means clustering. After rule-based classification,
LDA was used to further reduce the number of false
positives. They used 1454 CT slices from 34 patients with 63
lung nodules and obtained a sensitivity of 84% with 5.48 false
positives per slice.
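To make the clustering step concrete, the following is a minimal sketch of k-means applied to voxel intensities, separating low-attenuation (air-filled lung) values from higher-attenuation soft tissue. It is an illustration under stated assumptions, not the implementation of [39]: the use of intensity alone as the feature, the initialization, and the example attenuation values are all hypothetical.

```python
def kmeans_1d(values, k=2, max_iters=50):
    """Plain k-means on scalar intensities (illustrative sketch only).

    Centers are initialized evenly across the intensity range, then
    alternately (1) each value is assigned to its nearest center and
    (2) each center is moved to the mean of its assigned values.
    """
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(max_iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [
            sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break                      # converged
        centers = new_centers
    return centers


# Hypothetical attenuation values: three lung-like and three soft-tissue-like voxels.
lung_center, tissue_center = kmeans_1d([-900, -850, -880, 30, 50, 40])
```

In a full pipeline the resulting cluster labels would only delimit candidate lung regions; the rule-based and LDA stages described above would still be needed to reduce false positives.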
Arimura et al. [43] reported a CAD system for nodule detec-
tion using a difference-image technique. They compared several
rule-based schemes for identifying nodules. A massive-training
ANN (MTANN) [44] reduced the false positives. The method
was evaluated on a confirmed cancer database of 106 CT scans
with 109 cancer lesions from 73 patients. They reported a sen-
sitivity of 83% and 5.8 false positives per scan.
Suzuki et al. [82] developed a technique that used a multiple
MTANN (multi-MTANN) for false-positive reduction. The in-
vestigators found that use of the trained multi-MTANN elimi-
nated 68.3% of false-positive findings with a reduction of one
true-positive result. The false-positive rate of the original CAD
scheme was improved from 4.5 to 1.4 false positives per image,
at an overall sensitivity of 81.3%, suggesting that this technique
reduced the false-positive rate of the CAD scheme for nodule
detection on chest radiographs while maintaining a high level
of sensitivity.
Shiraishi et al. [87] investigated the effect of a CAD scheme
on radiologist performance in the detection of lung cancers
on chest radiographs. They combined two independent CAD
schemes for the detection and classification of nodules into one
new CAD scheme by use of a database of 150 chest images.
Performance of the CAD scheme indicated that sensitivity in
detecting lung nodules was 80.6%, with 1.2 false-positive re-
sults per image, and sensitivity and specificity for classification
of nodules by use of the same database for training and testing
the CAD scheme were 87.7% and 66.7%, respectively. The
AUC value for detection of lung cancers improved significantly
from 72.4% without CAD to 77.8% with CAD. Shiraishi
et al. [100] also developed a CAD system for detection
of nodules in the lateral views of chest radiographs in order to
improve the overall performance in combination with a CAD
scheme for posterior–anterior (PA) views.
Murphy et al. [112] presented a large-scale evaluation study
of automatic nodule detection in chest CT using local image fea-
tures (shape index and curvedness) and two successive iterations
of KNN classification for false-positive reduction. On 813 ran-
domly selected scans, a sensitivity of 80% was achieved with an
average of 4.2 false positives/scan. The same group participated
in the ANODE09 benchmark and achieved top performance
among six different CAD systems. Most of the work reported
for nodule detection and classification uses binary two-class de-
cisions to discriminate nodules.
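The detection studies above report an operating point as a sensitivity paired with a false-positive rate per image, section, or scan. A minimal sketch of how such a pair can be tallied from per-scan counts follows. The tuple layout and the example counts are assumptions made for illustration; how each study matched detections to reference nodules is not specified here.

```python
def operating_point(per_scan_counts):
    """Return (sensitivity, false positives per scan).

    per_scan_counts: list of tuples
        (true_nodules_in_scan, true_nodules_detected, false_positive_marks)
    The layout is hypothetical; scans without nodules still contribute
    to the false-positive average.
    """
    total_nodules = sum(n for n, _, _ in per_scan_counts)
    if total_nodules == 0:
        raise ValueError("at least one reference nodule is required")
    detected = sum(d for _, d, _ in per_scan_counts)
    false_positives = sum(f for _, _, f in per_scan_counts)
    sensitivity = detected / total_nodules
    fps_per_scan = false_positives / len(per_scan_counts)
    return sensitivity, fps_per_scan


# Hypothetical counts for four scans: 8 of 10 nodules found, 16 false marks.
sens, fps = operating_point([(3, 2, 5), (2, 2, 3), (0, 0, 4), (5, 4, 4)])
```

Because the denominator of the false-positive rate differs between studies (slice, section, image, or scan), figures such as 1.1 per sectional image and 4.2 per scan are not directly comparable without knowing the number of sections per scan.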
C. Nodule Characterization by Malignancy Potential
The characterization of nodules by their malignancy potential
involves rating nodule candidates on characteristics
such as subtlety, texture, margin, sphericity,
calcification, internal structure, lobulation, spiculation, and
malignancy. A major application of CAD for lung CT is the
classification of nodules by likelihood of malignancy using
automated feature analysis algorithms [47], [96]. Here, the
common approach has been to calculate many features by
which to measure nodules and attempt to find correlations
between particular features (e.g., size, shape, attenuation) and
histologically confirmed cancers. Promising results have been
demonstrated using classifiers based on classical nodule texture
features [48]. More recently, fractal analysis of lung-nodule
interfaces [49] and LDA of multiple features [50] have shown
promise. Other important efforts in distinguishing benign and
malignant nodules are measurement of size change over time
[35], [46] and quantification of nodule uptake of intravenously
administered contrast enhancement [51]. The solitary pul-
monary nodule is a commonly encountered finding that might
represent lung cancer. Morphological characteristics including
lesion size, contour, edge, calcification, nodule density, and
contrast enhancement can help differentiate malignant from
benign nodules. Temporal change in lung nodule size raises
concern for malignancy, while size stability is traditionally
considered an indicator of benignity [52].
Yankelevitz et al. [55] sought to determine the accuracy of
LDCT volumetric measurements of small pulmonary nodules to
assess growth and malignancy via three-dimensional image ex-
traction and isotropic resampling. The synthetic nodule studies
revealed that volume could be measured accurately to within
3%.
Ko et al. [35] developed a CAD system that automatically
identified nodules from chest CT, quantified nodule diameter,
and estimated temporal change in size. High correlation
between the algorithm and thoracic radiologists on change
in nodule size was achieved, as measured by the Spearman rank correlation
coefficient. The automated nodule detection system
identified 86% of 370 nodules in 16 studies from eight patients
with known nodules.
Li et al. [95] evaluated a system to investigate whether a
CAD scheme can assist radiologists in distinguishing small
benign from small malignant nodules on LDCT data. The
dataset used consisted of 28 primary lung cancers (6–20 mm)
and 28 benign nodules. Cancer cases included nodules with
pure ground-glass opacity, mixed ground-glass opacity, and
solid opacity. The AUC of the CAD scheme alone was 83.1%
for distinguishing benign from malignant nodules. The average
AUC value for radiologists improved with the aid of the
CAD scheme from 78.5% to 85.3%. The radiologists'
diagnostic performance with the CAD scheme was
more accurate than that of the CAD scheme alone
and that of radiologists alone. Li et al. [93] also described
the current status of the development and evaluation of CAD
schemes for the detection and characterization of lung nodules
in thin-section CT. They also reviewed a number of observer
performance studies that attempted to assess the
potential clinical usefulness of CAD schemes for nodule
detection and characterization in thin-section CT.
Petkovska et al. [94] studied whether conventional nodule
densitometry or contrast enhancement maps of indeterminate
lung nodules can distinguish benign from malignant nodules.
Conventional nodule densitometry was performed to obtain
the maximum difference in mean enhancement values for each
nodule from a circular ROI. Treating higher enhancement values
as an indicator of malignancy yielded an ROC curve with an AUC
value of 76%. The visually scored magnitude of enhancement
was found to be less effective in distinguishing malignant from
benign lesions, with an AUC value of 62%. The visually scored
pattern of enhancement was found to be more effective with an
average AUC value of 79%.
Zhen et al. [114] proposed a new tumor growth measure for
pulmonary nodules, which could account for tumor deformation
using nonrigid registration combined with nodule detection
and segmentation. They proposed an adaptive doubling time
measure and compared it with the standard
doubling time growth-rate measure. Results were based on
two successive scans for ten benign and nine malignant nodule
datasets.
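For reference, the standard doubling time mentioned above is commonly computed from two volume measurements under an assumption of exponential growth: DT = Δt · ln 2 / ln(V2 / V1). The sketch below implements only this textbook formula, not the adaptive measure of [114]; the function names and the example values are illustrative assumptions.

```python
import math


def sphere_volume(diameter_mm):
    """Volume of a sphere with the given diameter (a common simplification
    when only a nodule diameter is available)."""
    return math.pi * diameter_mm ** 3 / 6.0


def doubling_time(volume_1, volume_2, interval_days):
    """Volume doubling time in days, assuming exponential growth.

    Returns math.inf for an unchanged volume and a negative value for a
    shrinking nodule.
    """
    if volume_1 <= 0 or volume_2 <= 0:
        raise ValueError("volumes must be positive")
    if volume_1 == volume_2:
        return math.inf
    return interval_days * math.log(2.0) / math.log(volume_2 / volume_1)


# Hypothetical nodule: volume doubles between two scans taken 90 days apart.
dt = doubling_time(100.0, 200.0, 90.0)
```

Because volume scales with the cube of diameter, small diameter measurement errors translate into large doubling time errors, which is one reason volumetric measurement accuracy, such as the 3% figure reported in [55], matters for growth assessment.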
Dreiseitl [109] proposed a learning scheme for training multiclass
classifiers by maximizing the volume under the ROC surface,
which could benefit multiclass lung nodule characterization
tasks. Rather than a binary decision on the malignancy
of a test case, a multiclass grading of malignancy
would provide additional measures that could
improve lung cancer screening.
The CAD schemes mentioned above were not applied directly to
the domain of lung cancer screening, yet they provide theoretical
justification and have the potential to be effective tools for improving
CT lung cancer screening. One has to examine carefully the reported
results and their applicability to lung cancer screening.
IV. CLINICALLY RELEVANT SENSITIVITY AND SPECIFICITY
We emphasize that for cancer screening and performance
assessment of available CAD schemes, the focus of attention
should be put on clinically relevant performance measures.
In the context of cancer screening, the sensitivity-specificity
calculus of CAD systems is an essential factor when it comes
to treatment cost and patient outcome. One should consider
that validation of false positives and generation of ground truth are
still tentative. Problems of inter- and intraobserver
variability (see Fig. 2) and manual time-intensive grading call
for minimally invasive methodologies to obtain ground truth
information that would further alter the sensitivity-specificity
calculus of CAD and its potential acceptance. Minimally invasive
lung cancer surgery such as thoracoscopic lobectomy or
video-assisted thoracic surgery (VATS), or even noninvasive
surgeries such as the CyberKnife method [121], are advances
in this direction.
To provide clinically relevant definitions for sensitivity and
specificity [103]–[105], we follow the definitions in [100] and
[101] and point out how these diagnostic performance measures
should be interpreted for cancer screening. Sensitivity and
specificity reflect the rates of false negatives and false positives,
respectively, and are useful in evaluating the effectiveness of screening
methods. Sensitivity is also called the true-positive rate (TPR), and
specificity equals one minus the false-positive rate (FPR). The terms
“positive” and “negative” refer to the presence or absence of lung cancer.
Sensitivity and specificity are defined as follows. The sensitivity
of a screening test is its ability to detect those individuals
with cancer. It is computed by dividing the number of true positives
(TPs) by the total number of cancer cases,
TP + FN, where FN is the number of false negatives. The specificity of a test is its ability to identify those
individuals who do not have cancer. It is computed by
dividing the number of true negatives (TNs) by the sum of the TN and
false-positive (FP) cases. From these proportions, one can compute confidence
intervals [102] and ROC curves that summarize diagnostic performance
for comparative assessment. However, the majority of
published research does not provide confidence intervals even
though they could be obtained from the algorithms.
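The definitions above can be sketched in a few lines, together with one way of attaching a confidence interval to each proportion. The Wilson score interval used here is a common choice for binomial proportions and is an assumption of this sketch; the text does not state which method [102] uses. The counts in the example are hypothetical.

```python
import math


def sensitivity(tp, fn):
    """True-positive rate: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: TN / (TN + FP), i.e. 1 - FPR."""
    return tn / (tn + fp)


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z ** 2 / total
    center = (p + z ** 2 / (2.0 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / total + z ** 2 / (4.0 * total ** 2)) / denom
    )
    return center - half_width, center + half_width


# Hypothetical screening result: 80 of 100 cancers detected,
# 90 of 100 cancer-free individuals correctly cleared.
sens = sensitivity(tp=80, fn=20)
spec = specificity(tn=90, fp=10)
sens_low, sens_high = wilson_interval(80, 100)
```

With only 100 cancer cases the interval around a sensitivity of 80% spans roughly 71% to 87%, which illustrates why point estimates from the small datasets reviewed above should be compared with caution.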
D. Machine Learning for Lung Cancer Screening
In recent years, the machine-learning community has developed
sophisticated tools and learning paradigms for building
CAD schemes that perform well on clinically relevant measures.
Feature selection methods and temporal learning schemes
are being employed successfully for the task of nodule characterization
and growth measurement. Recently, Vapnik et al.
[123] proposed a new framework called learning with hidden information,
which enables the integration of hidden information
and could further improve CAD technology for lung cancer
diagnosis.
In Barreno et al. [106], the authors described a theoretical
analysis on how to combine classifiers with an optimal decision
rule and optimal ROC curve. The combination of different CAD
schemes also found interest in comparative CAD studies such
as the ANODE09 study. The issue of unbalanced class distribu-
tion in medical diagnostic applications and different class im-
portance needs to be addressed when developing CAD schemes
for effective lung cancer screening. Most standard classification
methods, however, are designed to maximize overall accuracy
and cannot incorporate different costs for different classes
explicitly. Liu and Tan et al. [107] proposed a method to di-
rectly maximize the weighted specificity and sensitivity of the
ROC curve. They reported excellent generalization properties
with the ability to assign different error costs to different classes
to account for the difference in the importance of the class dis-
tribution. Mozer et al. [108] took an approach of constrained
optimization to obtain a reduced solution space that directly
models the problem domain and has relevant performance char-
acteristics on a specific target region of the ROC curve. They
showed significant performance improvements in the domain of
telecommunications that could also benefit the application do-
main of lung cancer screening.
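The idea of weighting sensitivity and specificity differently can be illustrated with a much simpler device than the learning methods of [107] and [108]: choosing, after training, the decision threshold that maximizes a weighted sum of the two measures. The sketch below is only that post hoc illustration, with hypothetical scores and labels, and should not be read as an implementation of either cited method.

```python
def best_threshold(scores, labels, w_sens=0.5):
    """Pick the threshold maximizing w_sens * sensitivity + (1 - w_sens) * specificity.

    scores: classifier outputs, higher meaning more suspicious (hypothetical)
    labels: 1 for cancer, 0 for no cancer
    A case is called positive when its score is >= the threshold.
    Returns (objective value, threshold, sensitivity, specificity).
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        sens = tp / n_pos
        spec = (n_neg - fp) / n_neg
        value = w_sens * sens + (1.0 - w_sens) * spec
        if best is None or value > best[0]:
            best = (value, t, sens, spec)
    return best


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
# Weighting sensitivity heavily favors a low threshold; weighting
# specificity heavily favors a high one.
screening_choice = best_threshold(scores, labels, w_sens=0.9)
confirmatory_choice = best_threshold(scores, labels, w_sens=0.1)
```

In a screening setting, where a missed cancer is usually considered more costly than an unnecessary workup, a weight above 0.5 on sensitivity shifts the operating point toward the high-sensitivity region of the ROC curve.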
A. Detection Theory and ROC
The ROC curve was first developed by electrical engineers
and radar engineers during World War II for detecting enemy
objects in battlefields and was soon introduced in psychology to
account for perceptual detection of signals [17], [18]. The use
of ROC in medicine to assess diagnostic test performance was