Denoising and dimensionality reduction of genomic data

Enrico Capobianco

doi:10.1117/12.609299

23 May 2005

Proceedings Volume 5841, Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III; (2005) https://doi.org/10.1117/12.609299
Event: SPIE Third International Symposium on Fluctuations and Noise, 2005, Austin, Texas, United States

Abstract

Genomics is a challenging research field for many quantitative scientists, and a wide variety of statistical techniques and machine learning algorithms have recently been proposed, inspired by cross-disciplinary work with computational and systems biologists. In genomic applications, the researcher deals with noisy and complex high-dimensional feature spaces: the expression levels of a large number of genes are measured experimentally, but often at just a few time points, which limits the available samples. This imbalance between many features and few samples suggests that standard statistical inference techniques may struggle to produce good general solutions, and that machine learning algorithms may struggle to avoid heavy computational work. One therefore naturally turns to two major aspects of the problem: sparsity and intrinsic dimensionality. Both aspects are studied in this paper, where a single efficient technique, Independent Component Analysis (ICA), is used for both denoising and dimensionality reduction. The numerical results are promising and lead to good-quality gene feature selection, owing to the signal separation power of the decomposition. We investigate how the use of replicates can improve these results, and we deal with noise through a stabilization strategy that combines the estimated components and extracts the most informative biological content from them. Exploiting the inherent level of sparsity is a key issue in genetic regulatory networks, where the connectivity matrix needs to account for the real links among genes and discard many redundancies. Most experimental evidence suggests that real gene-gene connections are only a subset of what is usually mapped onto either a huge gene vector or a typically dense and highly structured network.
Inferring gene network connectivity from expression levels is a challenging inverse problem that is currently stimulating key research in biomedical engineering and systems biology. Several attempts have been made to describe gene networks with only limited interactions, thus exploiting the inherent sparsity of these systems. This in turn suggests that a certain redundancy of links in gene networks, or equivalently their inherent sparsity structure, might allow the essential connections to be identified, so that the inverse problem becomes both well defined and computationally tractable.
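
As an illustration of how Independent Component Analysis can serve both denoising and dimensionality reduction on expression data with many genes and few time points, the following is a minimal sketch only. The abstract does not state which ICA algorithm, preprocessing, or stabilization scheme the paper uses, so everything here is an assumption: a FastICA-style fixed-point iteration with a tanh nonlinearity, synthetic sparse (Laplace) sources, and the hypothetical helper names sym_decorrelate and fast_ica. Note that in this sketch the reconstruction from k components coincides with a rank-k PCA truncation; the ICA step rotates that subspace toward statistically independent, sparse expression modes.

```python
import numpy as np

def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W, making the rows orthonormal."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return (vecs / np.sqrt(vals)) @ vecs.T @ W

def fast_ica(X, k, n_iter=200, tol=1e-8, seed=0):
    """FastICA-style sketch (assumed algorithm, not taken from the paper).

    X is (n_time_points, n_genes). Returns the estimated components S (k, n_genes),
    the mixing matrix A (n_time_points, k), and the per-row mean that was removed.
    """
    n = X.shape[1]
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean
    # Dimensionality reduction: whiten and keep only the k leading principal directions.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    K = (U[:, :k] / s[:k]).T * np.sqrt(n)   # (k, n_time_points) whitening matrix
    Z = K @ Xc                              # whitened data, Z Z^T / n = I
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = np.tanh(W @ Z)
        # Fixed-point update: E[g(y) z^T] - E[g'(y)] W, with g = tanh.
        W_new = (Y @ Z.T) / n - (1.0 - Y ** 2).mean(axis=1)[:, None] * W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    S = W @ Z
    A = np.linalg.pinv(W @ K)
    return S, A, mean

# Synthetic stand-in for expression data: 500 genes, 8 time points, 2 sparse sources.
rng = np.random.default_rng(42)
n_genes, n_times, k = 500, 8, 2
S_true = rng.laplace(size=(k, n_genes))
A_true = rng.standard_normal((n_times, k))
clean = A_true @ S_true
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

S_est, A_est, mean = fast_ica(noisy, k)

# Denoising: rebuild the data from the k retained components only.
denoised = A_est @ S_est + mean
err_noisy = np.mean((noisy - clean) ** 2)
err_denoised = np.mean((denoised - clean) ** 2)

# Gene feature selection: genes with the largest absolute load in each component.
top_genes = np.argsort(-np.abs(S_est), axis=1)[:, :10]
```

With replicated experiments, one could run fast_ica on each replicate and combine the matched components, which is one possible reading of the stabilization strategy mentioned above; the abstract does not specify the actual combination rule.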

Citation

Enrico Capobianco "Denoising and dimensionality reduction of genomic data", Proc. SPIE 5841, Fluctuations and Noise in Biological, Biophysical, and Biomedical Systems III, (23 May 2005); https://doi.org/10.1117/12.609299
