Paper
17 March 2006
The effect of data set size on computer-aided diagnosis of breast cancer: comparing decision fusion to a linear discriminant
Abstract
Data sets with relatively few observations (cases) are common in medical research, especially when the data are expensive or difficult to collect. Such small samples usually do not provide enough information for computer models to learn data patterns well enough for good prediction and generalization. We used decision fusion as a model that may maintain good classification performance when data are limited. In this study, we investigated the effect of sample size on the generalization ability of both linear discriminant analysis (LDA) and decision fusion. Subsets of large data sets were selected by a bootstrap sampling method, which allowed us to estimate the mean and standard deviation of the classification performance as a function of data set size. We applied the models to two breast cancer data sets and compared them using receiver operating characteristic (ROC) analysis, reporting the area under the ROC curve (AUC) and the partial AUC (pAUC). For the more challenging calcification data set, decision fusion reached its maximum classification performance of AUC = 0.80±0.04 at 50 samples and pAUC = 0.34±0.05 at 100 samples. LDA reached lower performance and required many more cases, with a maximum of AUC = 0.68±0.04 and pAUC = 0.12±0.05 at 450 samples. For the mass data set, the two classifiers performed more similarly, with AUC = 0.92±0.02 and pAUC = 0.48±0.02 at 50 samples for decision fusion and AUC = 0.92±0.03 and pAUC = 0.55±0.04 at 500 samples for LDA.
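The bootstrap procedure described in the abstract (repeatedly drawing subsets of a given size and summarizing the resulting classification performance) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the synthetic two-class data, the out-of-bag evaluation scheme, the function names, and the simple mean-difference scorer (a stand-in for LDA or decision fusion) are all hypothetical, and only the standard library is used.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Stand-in linear scorer (NOT the paper's LDA or decision fusion):
    project each case onto the difference of the two class means."""
    dim = len(X[0])

    def class_mean(cls):
        rows = [x for x, t in zip(X, y) if t == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(dim)]

    m0, m1 = class_mean(0), class_mean(1)
    w = [a - b for a, b in zip(m1, m0)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def auc_vs_size(X, y, sizes, n_boot=30, seed=0):
    """For each training size, return (mean, std) of out-of-bag AUC over bootstrap draws."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        aucs = []
        while len(aucs) < n_boot:
            train = [rng.randrange(len(X)) for _ in range(n)]  # draw with replacement
            drawn = set(train)
            test = [i for i in range(len(X)) if i not in drawn]  # out-of-bag cases
            y_train = [y[i] for i in train]
            y_test = [y[i] for i in test]
            if len(set(y_train)) < 2 or len(set(y_test)) < 2:
                continue  # redraw: both classes are needed to train and to score
            score = fit_mean_difference([X[i] for i in train], y_train)
            aucs.append(auc([score(X[i]) for i in test], y_test))
        results[n] = (statistics.mean(aucs), statistics.stdev(aucs))
    return results


# Synthetic two-class data (hypothetical; not the breast cancer data sets).
data_rng = random.Random(1)
X = [[data_rng.gauss(1.5 * label, 1.0) for _ in range(2)]
     for label in (0, 1) for _ in range(150)]
y = [label for label in (0, 1) for _ in range(150)]

results = auc_vs_size(X, y, sizes=[10, 50, 100])
for size, (mean_auc, std_auc) in sorted(results.items()):
    print(f"n={size:4d}  AUC = {mean_auc:.2f} ± {std_auc:.2f}")
```

Plotting the mean and standard deviation against the subset size gives a learning curve of the kind the abstract summarizes. Any classifier could replace the stand-in scorer, and the evaluation scheme would need to match the study's actual protocol.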
© (2006) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Jonathan L. Jesneck, Loren W. Nolte, Jay A. Baker M.D., and Joseph Y. Lo "The effect of data set size on computer-aided diagnosis of breast cancer: comparing decision fusion to a linear discriminant", Proc. SPIE 6146, Medical Imaging 2006: Image Perception, Observer Performance, and Technology Assessment, 614616 (17 March 2006); https://doi.org/10.1117/12.655235
KEYWORDS
Data fusion
Data modeling
Computer aided diagnosis and therapy
Breast cancer
Binary data
Solid modeling
Tumor growth modeling