Noise factor analysis for cDNA microarrays

Yoganand Balagurunathan; Naisyin Wang; Edward R. Dougherty; Danh V. Nguyen; Yidong Chen; Michael L. Bittner; Jeffrey M. Trent; Raymond J. Carroll

doi:10.1117/1.1755232

1 July 2004 Noise factor analysis for cDNA microarrays


Journal of Biomedical Optics, Vol. 9, Issue 4, (July 2004). https://doi.org/10.1117/1.1755232

Abstract

A microarray-image model is used that takes into account many factors, including spot morphology, signal strength, background fluorescent noise, and shape and surface degradation. The model yields synthetic images whose appearance and quality reflect that of real microarray images. The model is used to link noise factors to the fidelity of signal extraction with respect to a standard image-extraction algorithm. Of particular interest is the identification of the noise factors and their interactions that significantly degrade the ability to accurately detect the true gene-expression signal. This study uses statistical criteria in conjunction with the simulation of various noise conditions to better understand the noise influence on signal extraction for cDNA microarray images. It proposes a paradigm that is implemented in software. It specifically considers certain kinds of noise in the noise model and sets these at certain levels; however, one can choose other types of noise or use different noise levels. In sum, it develops a statistical package that can work in conjunction with the existing image simulation toolbox.

1. Introduction

The introduction of cDNA microarray technology¹ allows thousands of gene expression values to be measured simultaneously, thereby providing insight into the global gene-expression patterns of cells (tissues) being studied. The approach is powerful for studying the myriad transcription-related pathways involved in cellular growth, differentiation, and transformation.² ³ ⁴ ⁵ The quality of each gene-expression value detected from this measurement technology depends intricately on the image-processing algorithm and interactions. Numerous image-processing tools have been proposed to extract signal intensity from the cDNA arrays. Our study uses a method that applies a statistical test to segment the hybridized region from the background and the inner hole.⁶ Metrics have also been introduced to quantify the extracted data and to better understand the data generation.⁷

Despite the extensive application of cDNA technology, few studies have been devoted to examining the quality and reliability of gene expression signals in terms of how close the detected signals are to the true gene expression levels in a biological sense.⁸ Linking various noise conditions to the signal extraction has been the goal of most image-extraction algorithms, the purpose being to develop better algorithms. Most proposed imaging methods are based on intuitive evidence. This study employs a microarray-image model that takes into account many factors, including spot morphology, signal strength, background fluorescent noise, and shape and surface degradation.⁹ The model yields synthetic images whose appearance and quality reflect that of real microarray images. Here we use the model to link noise factors to the fidelity of signal extraction with respect to a standard image-extraction algorithm.⁶ ⁷ Of particular interest is the identification of the noise factors and their interactions that significantly degrade the ability to accurately detect the true gene-expression signal. This study uses statistical criteria in conjunction with the simulation of various noise conditions to better understand the influence of noise on signal extraction for cDNA microarray images.

Although some principles of experimental design have been proposed for microarray experiments, they have focused primarily on optimizing the yield of information on the biological tissue samples of interest relative to the reference sample¹⁰ ¹¹ and on assessing within- and between-array variability. In this study, we use factorial experiments to systematically identify factors and their interactions that significantly affect the accuracy of detecting the expression signal. Because noise–factor interactions can affect the quality of signal detection in unpredictable ways, a systematic examination of these interactions is needed.

Two points need to be kept in mind regarding the statistical analysis. First, it is generally true that signal detection algorithms can better recover the true signal for images with less severe levels of noise. Thus, when we compare signal estimation for low noise with estimation for high noise, the actual error of estimation should be less for low noise—and this will be borne out. Our concern here, however, lies in a different direction. We want to examine the significance of different levels of various kinds of noise on signal estimation. If there is no significant effect on estimation error relative to different levels of a particular type of noise, then reducing the noise in the image to a lower level will not significantly affect signal detection; however, if there is a significant effect, then it would be worthwhile to try to reduce that type of noise.

A second point is that we are proposing a paradigm implemented in software, and not simply providing results. We have chosen to consider certain kinds of noise in the noise model and to set these at certain levels. One can choose other types of noise or use different noise levels. Clearly, bringing the noise levels closer will reduce the significance of noise effects, whereas moving them farther apart will increase the significance. What we have done is to develop a statistical package to work in conjunction with the existing image simulation toolbox.

2. Image Simulation

This section describes the noise conditions used in the current study. A detailed description of image simulation is given in the original paper.⁹ Figure 1 shows the cDNA spot and model generation with various noise conditions. The addition of noise to the array is broadly divided into three levels: array-level, block-level, and spot-level noise. Detailed distributional descriptions of the various types of noise are given in the appendix; throughout this section, when describing a type of noise, we refer to the appendix by simulation number for specific distributional information. Our experiments involve three noise settings, −1, 0, and +1, ordered from the worst to the least noise (Table 1).

Figure 1

Microarray spot model.

Table 1

Settings for noise parameters.
Index	Noise Type	Level +1: Good	Level 0: Average	Level −1: Bad
1	Sig./background noise(SigBack)	3	2	1.5
2	Expresser or outlier probability rate (OutL)	0.1	0.25	0.5
3	Spike noise (spike) (L_spi,μ_spi,W_spi)	0.01, (500,700), (2,5)	0.015, (700,1000), (2,5)	0.06, (900,1200), (6,10)
4	Snake noise (Snake) (κ_sn,L_s,W_sn,N_seg)	0.15, (10,50),1,2	0.20, (40,70),1,5	0.25, (50,90),2,12
5	Parabolic background with deviation (ParaB) (γ_ch1, γ_ch2)	1, (10, 12)	1, (15, 17)	1, (25, 27)
6	Spot radius: deviation (Spot) (σ_s)	10	20	30
7	Inner hole (InnH) (μ_h,σ_h,μ_v,σ_v)	(4,7,5,8),(4,7, 5,8)	(10,20,5,10),(10,20,5,10)	(35,45,10,20),(35, 45,10,20)
8	Foreground noise (ForeN) (α_m,α_s)	(0, 0,4,7), (0, 0,4,7)	(0,0,5,10), (0,0,5,10)	(0,0,10,15), (0,0,10,15)
9	Edge noise (EdgeN) (δ_ed)	0.3	0.1	0.03
10	Chord noise (Chord) (p₀,p₁,p₂,p₃,p₄)	(0.9,0.07, 0.03)	(0.75,0.15,0.05,0.05)	(0.2,0.35,0.20, 0.15,0.1)
11	Scratch noise (Scratch) (κ_sc,L_s∼U[L_sc1,L_sc2],W_sc, N_sc)	2.5,(9,35),3,2	3.5,(15,45),5,4	4,(25,65),7,10
12	Signal deviation (sigSD) (α)	0.15	0.25	0.35
13	Flat background with deviation (FlatBack) (γ_ch1,γ_ch2)	0, (10,12)	0, (15,17)	0, (25,27)

The analysis of a detection algorithm begins with a ground truth. Here that ground truth refers to a “true” expression intensity that must be estimated by the detection algorithm. A microarray containing N gene expression spots with intensity levels I_k, for k=1,…,N, is simulated by an exponential distribution. Base intensities for the red and green channels, R_k and G_k, respectively, are generated from two independent normal distributions having a mean I_k and standard deviation αI_k, where α is a common coefficient of variation.

A particular gene (RNA) may be over- or underexpressed, and this will show up in the red (test) channel. We refer to such a gene as an expresser or outlier. Outliers are chosen randomly in the model: each gene on the microarray is selected to be an outlier with probability p_outlier. If gene k is selected, then a scaling factor $t_{k} = 10^{\pm b_{k}}$ is applied, where b_k follows a beta distribution, b_k∼Β(1.7,4.8), and where the ± sign is selected with equal probability. Based on the scaling factor, the individual channel intensities are given by $R_{k}^{'} = R_{k} \sqrt{t_{k}}$ and $G_{k}^{'} = G_{k} / \sqrt{t_{k}}$.
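The sampling scheme for the ground truth can be sketched in Python as follows. This is only an illustration under stated assumptions: the mean of the exponential distribution (`mean_intensity`) is not given in this section and is a hypothetical value, the defaults for `alpha` and `p_outlier` are the level 0 settings of Table 1, and the function name is ours rather than part of the simulation toolbox.

```python
import math
import random


def simulate_spot(rng, mean_intensity=3000.0, alpha=0.25, p_outlier=0.25):
    """Return (R', G', t) for one spot; mean_intensity is a hypothetical value."""
    # Spot intensity I_k drawn from an exponential distribution.
    i_k = rng.expovariate(1.0 / mean_intensity)
    # Base channel intensities: independent normals, mean I_k, SD alpha * I_k.
    r_k = rng.gauss(i_k, alpha * i_k)
    g_k = rng.gauss(i_k, alpha * i_k)
    # With probability p_outlier the gene is an expresser (outlier).
    t_k = 1.0
    if rng.random() < p_outlier:
        b_k = rng.betavariate(1.7, 4.8)      # b_k ~ Beta(1.7, 4.8)
        sign = rng.choice((-1.0, 1.0))       # +/- chosen with equal probability
        t_k = 10.0 ** (sign * b_k)
    # R' = R * sqrt(t) and G' = G / sqrt(t), so the ratio R'/G' is scaled by t.
    return r_k * math.sqrt(t_k), g_k / math.sqrt(t_k), t_k


rng = random.Random(0)
spots = [simulate_spot(rng) for _ in range(1600)]  # one 40x40 array
```

Because the base intensities are normal, a draw can occasionally be negative; how the simulation handles that case is not described here, so the sketch leaves such values untouched.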

The dyes commonly used for microarray experiments show nonlinear response characteristics, and different dyes give different responses. This effect is modeled by the nonlinear function

f (x) = a_{3} [a_{0} + x {(1 - e^{- x / a_{1}})}^{a_{2}}]; a_{3} > 1.

R_k′ and G_k′ are transformed by the detection-system response characteristic function, f_R(x) or f_G(x), to obtain realistic fluorescent intensities. The resulting observed fluorescent intensities, R_k″=f_R(R_k′) and G_k″=f_G(G_k′), are the true mean intensities across the kth spot.
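For illustration, the response function f(x) can be written directly in Python. Only the functional form comes from the text; the parameter values in the usage lines are hypothetical and are chosen solely to show the behavior at the two ends of the intensity range.

```python
import math


def dye_response(x, a0, a1, a2, a3):
    """f(x) = a3 * [a0 + x * (1 - exp(-x / a1)) ** a2]."""
    return a3 * (a0 + x * (1.0 - math.exp(-x / a1)) ** a2)


# Hypothetical parameters, for illustration only.
low = dye_response(50.0, a0=0.0, a1=500.0, a2=1.0, a3=1.2)
high = dye_response(50000.0, a0=0.0, a1=500.0, a2=1.0, a3=1.2)
```

With these values the response is strongly compressed at low intensity and approaches the linear form a3·(a0 + x) at high intensity, which is the kind of nonlinearity the model is meant to capture.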

Normally distributed foreground noise of intensity I_f is added pixelwise on the spots (simulation 9 in the appendix). This foreground noise typically has zero mean. It results in spot intensities SR=R_k″+I_f1 and SG=G_k″+I_f2. Figure 2 shows noise addition at various levels. In this figure, and in all subsequent figures illustrating noise, all other noise factors are set at the best level (less variant than +1 level).

Figure 2

Foreground noise variation illustrated at three levels: +1, 0, −1.

Laboratory dust that sticks to the arrays can fluoresce on laser excitation to give high-intensity spikes, and cDNA precipitation can cause high-intensity points. To model both, spike noise is added randomly across the entire slide area at a preset rate, L_spi. Once a pixel is selected for spike noise, the adjacent pixels have a higher probability of being affected. The extent of this spread is fixed by a random number, W_spi, drawn from a uniform distribution, which gives the count of pixels randomly chosen to be influenced by this noise. The intensity, N_S, of the spike noise is governed by an exponential distribution with mean μ_spi. Figure 3 shows spike noise added at different levels.

Figure 3

Spike noise variation illustrated at three levels: +1, 0, −1 (left to right).

Physical handling of the array slides can result in scratch noise (surface scratches), which typically results in low intensity levels. Scratch-noise intensity is parameterized as a ratio, κ_sc, giving the background-to-scratch noise intensity level. Other parameters are the number of strips, strip thickness W_sc, and a random strip length, L_sc (simulation 24 in the appendix). These scratches are placed at random positions on the array and are inclined according to a (discrete) uniformly random angle, θ_sc∈{0,45,90,135,180}. Figure 4 shows scratch noise at different levels.

Figure 4

Scratch noise variation illustrated at three levels: +1, 0, −1 (left to right).

Fine dust particles on the slides can create snake noise upon laser excitation. These snake-noise strips are typically of higher intensity than the signal level. To simulate this noise, multidirectional snake noise is generated, consisting of some number, N_seg, of segments. Analogously to scratch noise, the intensity is parameterized as a ratio, κ_sn, giving the average signal-to-snake noise intensity level. Other parameters are the number of snakes, the snake thickness W_sn, and a random length, L_sn, given as a multiple of the spot size. Figure 5 shows snake noise at different levels.

Figure 5

Snake noise variation illustrated at three levels: +1, 0, −1 (left to right).

The cDNA deposition spot is considered to be circular, with a random radius S (simulation 1 in the appendix). The mean of the radius is set according to the array density, and its variance relates to the consistency of spot size. The standard deviation is a predetermined proportion, k_s, of the mean. The radius mean is set for every block, and randomized over a small range within the array (simulation 12). Depending on the robot arm and printing ability of the pins, the interspot distance, G_sp, may vary. Owing to the physical mechanics of the robot arm, the block size (pixel units) is fixed in most cases. The interspot distance can be set to accommodate spot size and random variations in spot radii. The spot variability at three levels is shown in Fig. 6.

Figure 6

Spot radius deviation illustrated at three levels: +1, 0, −1 (left to right).

Owing to the impact of the print tip on the glass surface, or possibly to the effect of surface tension during the drying process, a significantly lesser amount of cDNA can be deposited near the spot center. An elliptical shape models this inner hole with random horizontal and vertical axes, H and V (simulation 3). Interarray variability in the distributions of H and V is modeled by uniformly distributed means μ_H and μ_V (simulation 14). The choice of the parameters governs the hole shapes. The center position of a hole is allowed to drift over a range (simulation 4). The shape is unaffected by the drift because the contact of the mechanical print tip to the surface is unaffected. Figure 7 shows the noise at different levels.

Figure 7

Inner hole noise variation illustrated at three levels: +1, 0, −1 (left to right).

The irregularity of RNA washout during slide preparation is modeled by chord noise (chord removal). The number, N_c, of chords to be removed for a spot is selected from a discrete distribution on {0,1,2,3,4}, whose elements occur with probabilities p₀, p₁, p₂, p₃, and p₄, respectively. For images with very few pieces cut off, the zero-chord probability p₀ is very high, and the three- and four-chord probabilities are close to 0 (possibly equal to 0). To model interarray variability, the probabilities can be treated as random. This noise parameter is set once for every block; it is not a spot-level noise parameter. Once the number of chords for a spot is determined, the distance, L, of each chord center to the edge is selected from a beta distribution, with interblock variability for the beta distribution being uniformly modeled (simulation 5). Finally, the chord locations are chosen uniformly at random according to an angle θ between 0 and 2π. Figure 8 shows chord noise at different levels.
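A minimal Python sketch of the chord sampling for a single spot is given below. The chord-count probabilities are the level −1 values of Table 1. The beta shape parameters (`beta_a`, `beta_b`) are hypothetical, since their values are given only in the appendix, and the interblock variability of the beta distribution is omitted.

```python
import math
import random


def sample_chords(rng, probs, beta_a=2.0, beta_b=5.0):
    """Return a list of (distance, angle) pairs, one per removed chord.

    probs[i] is the probability of removing i chords; beta_a and beta_b
    are hypothetical shape parameters for the chord-distance distribution.
    """
    n_chords = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return [
        (rng.betavariate(beta_a, beta_b),      # relative distance L to the edge
         rng.uniform(0.0, 2.0 * math.pi))      # chord angle theta
        for _ in range(n_chords)
    ]


rng = random.Random(1)
bad_level = (0.2, 0.35, 0.20, 0.15, 0.1)       # Table 1, level -1
chords = sample_chords(rng, bad_level)
```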

Figure 8

Chord noise variation illustrated at three levels: +1, 0, −1 (left to right).

Owing to the manner in which liquid dries, the spots usually do not have smooth edges. Edge noise is simulated via a parameterized edge-noise algorithm adopted from digital document processing. Edge noise is applied to the outer perimeter of the spot (after chord removal). Figure 9 shows the noise at different levels.

Figure 9

Spot edge variation illustrated at three levels: +1, 0, −1 (left to right).

Many factors contribute to the fluorescent background observed: autofluorescence from the glass surface or the surface of the detection instrument, nonspecific binding of fluorescent residues after hybridization, local contamination from posthybridization slide handling, etc. Background noise is simulated by a normal distribution whose parameters are randomly chosen to describe the process, and for multiple arrays, the interarray difference is modeled by a uniform distribution (simulation 20).

Rather than being constant across the entire microarray, the mean of the background noise may vary, owing to various scanning effects. It can take different shapes: parabolic, positive slope, or negative slope. In this case a function g(x,y) is first generated (parabolic, positive slope, or negative slope) to form a background surface, and normal noise is added to it pixelwise. Figure 10 shows parabolic background noise at different levels.

Figure 10

Variation in signal-to-background noise ratio (SigBack) and parabolic background. SigBack is set at −1, while parabolic background is varied from +1, 0, −1, left to right.

The addition of various noise types makes the microarray highly peaked, with high pixel differences. This stark irregularity can be mitigated by smoothing the image with either a flat or pyramidal convolution kernel. Our simulation study uses a flat smoothing function.
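As a rough illustration of flat smoothing, the following Python sketch replaces each pixel with the unweighted mean of its neighborhood. The kernel size is not stated in this section, so the 3×3 window (`radius=1`) is an assumption, as is the choice to average over only the valid pixels at the image border.

```python
def flat_smooth(image, radius=1):
    """Smooth a 2-D list of pixel values with a flat (box) kernel.

    Border pixels are averaged over the part of the window that lies
    inside the image; this edge handling is an assumption of the sketch.
    """
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            window = [image[i][j] for i in range(r0, r1) for j in range(c0, c1)]
            out[r][c] = sum(window) / len(window)
    return out


# A single bright pixel is spread over its neighborhood.
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
smoothed = flat_smooth(spike)
```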

Once a microarray image has been simulated, the signal extraction toolbox Dearray uses statistical methods to segment the signal and the background pixels.⁶ ⁷ Different levels of significance can be set for this procedure. Once the signal pixels are identified, a trimmed mean of their values gives an estimate of the signal mean. Background information is extracted by taking pixel information from four corners of a given spot to estimate its mean. Actual signal expression is estimated by the difference between the two. If a spot’s irregularity in shape and signal (area of the spot, signal variation, etc.) is reflected by a low-quality metric, then the spot can be flagged. At the final step, a linear corrective normalization procedure is carried out to compensate for variation in the dye response. Ratio intensities are then computed. A logarithmic scale applied to the ratios can be used to map the data to a desirable range.
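The arithmetic of the extraction step, once the signal pixels have been identified, can be sketched as below. This is not the Dearray implementation: the statistical segmentation is not shown, the trimming fraction is a hypothetical value, and the base-2 logarithm is an assumption, since the text specifies only a logarithmic scale.

```python
import math


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the values from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k else ordered
    return sum(kept) / len(kept)


def spot_signal(signal_pixels, corner_pixels, trim_fraction=0.1):
    """Trimmed signal mean minus the mean of the corner (background) pixels."""
    background = sum(corner_pixels) / len(corner_pixels)
    return trimmed_mean(signal_pixels, trim_fraction) - background


def log_ratio(red_signal, green_signal):
    """Log-scaled expression ratio (base 2 assumed for this sketch)."""
    return math.log2(red_signal / green_signal)


red = spot_signal([10.0] * 10, [2.0, 2.0, 2.0, 2.0])
green = spot_signal([4.0] * 10, [2.0, 2.0, 2.0, 2.0])
ratio = log_ratio(red, green)
```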

3. Experimental Design and Statistical Data Analysis

The array model has more than twenty parameterized noise conditions. We consider thirteen commonly occurring noise conditions for this study. These are grouped into four categories, which then correspond to four experiments: (1) background noise, (2) shape noise, (3) surface noise, and (4) weak signal. Each category has five conditions, with some of the thirteen conditions occurring in more than one category. The experiments are described in Table 2. In experiments 1A through 4A, each factor can take on two levels, 0 or 1. In experiments 1B through 4B, the factors take on the levels −1 or 1. Assuming two levels for each noise factor, there are thirty-two conditions for each category. For each condition, 8 replicate arrays are generated so there are 256 arrays per experiment. Each array has 1600 spots in a 40×40 matrix format. These numbers have been chosen to provide sufficient replicates while not resulting in inordinate image-processing time.
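The run counts above follow directly from the design, as the short Python sketch below shows for experiment 1. The factor abbreviations are those of Table 2; the function and variable names are ours.

```python
from itertools import product

FACTORS_EXP1 = ("SigBack", "OutL", "Spike", "Snake", "ParaB")
REPLICATES = 8
SPOTS_PER_ARRAY = 40 * 40


def build_design(levels, factors=FACTORS_EXP1, replicates=REPLICATES):
    """List every (condition, replicate) pair of a two-level full factorial."""
    return [
        (dict(zip(factors, combo)), rep)
        for combo in product(levels, repeat=len(factors))
        for rep in range(replicates)
    ]


runs_1a = build_design((0, 1))    # experiment 1A: levels 0 and +1
runs_1b = build_design((-1, 1))   # experiment 1B: levels -1 and +1
total_spots = len(runs_1b) * SPOTS_PER_ARRAY
```

Each experiment therefore has 2⁵ = 32 conditions, 32 × 8 = 256 arrays, and 256 × 1600 = 409,600 spots before any screening.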

Table 2

Experiments.
Experiment 1: Background–Noise Interactions
Index	Noise Type
1	Sig./background noise(SigBack)
2	Expresser or outlier level (OutL)
3	Spike noise (Spike) (L_spi,μ_spi∼U[e,f],W_spi∼U[g, h])
4	Snake noise (Snake) (κ_sn,L_sn∼U[L_sn1,L_sn2],W_sn,N_seg)
5	Parabolic background with deviation (ParaB) (γ_ch1,γ_ch2)
Experiment 2: Shape–Noise Interactions
Index	Noise Type
1	Spot radius: deviation (Spot) (σ_s)
2	Inner hole (InnH) (μ_h,σ_h,μ_v, σ_v)
3	Foreground noise (ForeN) (α_m,α_s)
4	Edge noise (EdgeN) (δ_ed)
5	Chord noise (Chord) (p₀,p₁,p₂,p₃,p₄)
Experiment 3: Surface–Noise Interactions
Index	Noise Type
1	Spot radius: deviation (Spot) (σ_s)
2	Inner hole (InnH) (μ_h,σ_h,μ_v, σ_v)
3	Snake noise (Snake) (κ_sn,L_sn∼U[L_sn1,L_sn2],W_sn,N_seg)
4	Scratch noise (Scratch) (κ_sc,L_s∼U[L_sc1,L_sc2],W_sc,N_sc)
5	Chord noise (Chord) (p₀,p₁,p₂,p₃,p₄)
Experiment 4: Weak Signal–Noise
Index	Noise Type
1	Signal standard deviation (SigSD) (α)
2	Foreground noise (ForeN) (α_m,α_s)
3	Sig./background noise (SigBack)
4	Flat background with background deviation (FlatBack) (γ_ch1,γ_ch2)
5	Spike noise (Spike) (L_spi,μ_spi∼U[e,f],W_spi∼U[g,])

3.1. Experimental Conditions

The background–noise interaction involves noise that can alter the background and thereby influence signal extraction. Parabolic noise generates a concave background, and at worse noise levels the background is expected to show more deviation. A low signal-to-background ratio (Table 1) reduces the gap between the average signal and background mean levels. Spike and snake noise create surface noise. Expresser variability simulates spots whose genes are over- or underexpressed.

Noise degradations related to spot shapes are grouped together in the shape–noise interaction experiment. These include spot radius, inner-hole variation (from no hole to close to half the spot size), edge noise, and chord removal. Foreground noise is included to check its interaction with these factors.

The third experiment, surface–noise interaction, combines shape variation with surface noise, both snake and scratch.

The last experiment, weak signal–noise interaction, involves alterations in signal level, including foreground noise, spike noise, background unevenness, and signal-to-background ratio. This grouping is suited to analyzing the effects of weak signals on the signal estimation process.

The quality of microarray images is typically assessed by a trained microbiologist in the laboratory after image scanning. In this study, the noise-level parameters used for the different factor levels correspond to the kinds of noise distributions seen in practice. As noted in the original simulation paper,⁹ the exact parameters will vary, depending on the technology, and the ones used in this paper correspond to general conditions observed over many years of application since the development of Dearray in 1997.⁶ Although metrics have been proposed to quantify microarray quality,⁷ there is no direct way to determine the effect of each noise level on the metrics. This is mostly attributed to the multivariate influence of the various degradations on the estimated signal. While it is no doubt true that individual statistical results obtained in this paper may not apply for different noise distributions, the general methodology will apply, and we believe that the conclusions drawn here are indicative of what one might expect with similar technology (for specific issues regarding parameters, refer to the original paper).

To quantify the relation between the factor levels (−1,0,+1), noise levels, and image quality, Table 3 provides measures corresponding to the different experiments and factor levels. All measures, except for the coefficient of variation, are defined at the spot level, and therefore have been averaged across all spots over all replicates. The table includes the means (expectations) of twelve measurements. There are four measurements for the red channel: SR_S.Dev is the standard deviation of the signal intensity; SR_SNR is the signal-to-noise ratio, which is defined as the ratio of the mean signal intensity to the local background standard deviation; SR_Quality (SR_Q in Table 3) is the channel quality metric defined in Ref. 7, which is formed as a minimum of four component qualities involving area, background, consistency, and saturation; and SR_BkDev is the standard deviation of the background intensity. There are four analogous measures for the green channel: SG_S.Dev, SG_SNR, SG_Quality, and SG_BkDev. There are four common measurements: |Error| is the absolute error for the signal estimation; Prop.Area (Pro.Area in Table 3) is the proportional area relative to the mask size; Total-Q is the total quality, which is based on the intensity quality of both channels and the signal-to-noise ratio of both channels; and CV is the coefficient of variation of the intensity. In all experiments, the mean error, E[|Error|], of the actual to estimated signal ratios increases as the degradation increases.

Table 3

Image-quality measurements for the experiments.
Quantitative Measures	Noise Levels for Experiment 1
Quantitative Measures	Good (+1 Level)	Average (0 Level)	Bad (−1 Level)
E[SR_S.Dev]	1176.52	1178.82	1320.937
E[SR_SNR]	111.269	66.143	17.921
E[SR_Q]	0.7333	0.7807	0.9775
E[SR_bkDev]	17.884	32.315	131.853
E[SG_S.Dev]	1181.09	1178.237	1328.67
E[SG_SNR]	105.290	61.987	17.655
E[SG_Q]	0.7514	0.8017	0.9793
E[SG_bkDev]	19.134	34.613	134.66
E[|Error|]	0.0714	0.1402	0.2843
E[Pro.Area]	0.9622	0.9509	0.8491
E[Total-Q]	0.7131	0.7495	0.8744
E[CV]	0.0478	0.1108	0.1805
Quantitative Measures	Noise Levels for Experiment 2
Quantitative Measures	Good (+1 Level)	Average (0 Level)	Bad (−1 Level)
E[SR_S.Dev]	970.14	727.107	569.103
E[SR_SNR]	96.750	77.082	52.770
E[SR_Q]	0.9956	0.9894	0.9528
E[SR_bkDev]	16.83	16.415	16.158
E[SG_S.Dev]	971.78	733.498	575.78
E[SG_SNR]	78.434	64.192	45.785
E[SG_Q]	0.9956	0.9894	0.9504
E[SG_bkDev]	21.079	19.90	18.767
E[|Error|]	0.1675	0.2473	0.5212
E[Pro.Area]	0.9405	0.8775	0.7606
E[Total-Q]	0.9627	0.9489	0.8989
E[CV]	0.0423	0.0423	0.0531
Quantitative Measures	Noise Levels for Experiment 3
Quantitative Measures	Good (+1 Level)	Average (0 Level)	Bad (−1 Level)
E[SR_S.Dev]	987.017	747.165	576.904
E[SR_SNR]	99.803	86.568	60.511
E[SR_Q]	0.9927	0.9878	0.9222
E[SR_bkDev]	17.205	18.097	20.328
E[SG_S.Dev]	989.46	746.12	584.234
E[SG_SNR]	79.880	69.825	50.435
E[SG_Q]	0.9998	0.9880	0.9234
E[SG_bkDev]	21.470	21.65	22.925
E[|Error|]	0.1121	0.3274	0.4992
E[Pro.Area]	0.9350	0.8586	0.7343
E[Total-Q]	0.9904	0.9483	0.8744
E[CV]	0.0419	0.0417	0.0477
Quantitative Measures	Noise Levels for Experiment 4
Quantitative Measures	Good (+1 Level)	Average (0 Level)	Bad (−1 Level)
E[SR_S.Dev]	1160.35	1125.47	1134.62
E[SR_SNR]	48.610	23.032	8.772
E[SR_Q]	0.9905	0.9768	0.9568
E[SR_bkDev]	39.331	87.582	261.48
E[SG_S.Dev]	1160.29	1131.83	1140.364
E[SG_SNR]	41.620	20.620	8.3094
E[SG_Q]	0.9906	0.9768	0.9569
E[SG_bkDev]	46.343	98.420	275.703
E[|Error|]	0.1474	0.4607	0.8199
E[Pro.Area]	0.9432	0.8993	0.83707
E[Total-Q]	0.9243	0.8355	0.61612
E[CV]	0.1209	0.1980	0.2493

While most of the measurements in Table 3 show straightforward effects, there is an apparent anomaly in experiment 1, which treats background characteristics. The mean variation of the background (E[SR_bkDev],E[SG_bkDev]) shows an increase from +1 to −1 level, along with the mean SNR (E[SR_SNR],E[SG_SNR]), which goes from good to bad. Some decrease in the proportional area of the spots is also seen. A paradox occurs with respect to total quality: E[Total-Q] increases as the levels go from +1 to −1. This is due to the effect of the parabolic background on spots in the central portion of the array. There the image gets a very low background standard deviation, which improves the SNR, and therefore improves E[Total-Q].

3.2. Statistical Analysis of Data

For each set of experiments we used a 2^k factorial design, with k=5 experimental factors. Each factor has two levels.¹² ¹³ Since our primary objective is to determine how the experimental noise factors affect the accuracy of detecting gene expression, the appropriate basic response variable considered for analysis is the absolute difference between the detected (estimated) and the true expression ratio at each spot. Because the distribution of these measurements tends to have a long right tail, we analyze the response variable in the log-log scale for the analysis of variance model.¹² More precisely, a constant 1 has been added to a response before taking the log transformation. The goal here is to reduce the potentially dominating influence of extremely large responses, yet not to dramatically increase the transformed absolute differences when the true expression ratios are close to 0, noting that log (0) goes to negative infinity. Here, taking a different transformation can be viewed as evaluating the responses at different scales. One advantage of considering the absolute difference rather than the original difference, beyond its being a meaningful measurement, is that the responses are now all positive so that regardless of what monotone transformation is taken, the relative order among responses is kept. Because of that, even though the outcomes are not transformation invariant among nonlinear monotone transformations, they are less sensitive to the choice of transformation. In fact, we have conducted analyses using other concave transformations as well as rank-based methods, in which cases the conclusion of the analysis remains unchanged.
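One possible reading of the transformation, sketched in Python, applies log(1 + ·) twice to the absolute difference. The exact form is not spelled out in the text, so this should be taken as an assumption that merely satisfies the stated properties: a zero response maps to zero rather than to negative infinity, large responses are strongly compressed, and the order of the responses is preserved.

```python
import math


def log_log_response(abs_diff):
    """Assumed form of the log-log scale: log(1 + log(1 + |difference|))."""
    return math.log1p(math.log1p(abs_diff))


responses = [0.0, 0.05, 0.3, 2.0, 25.0]
transformed = [log_log_response(y) for y in responses]
```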

To further prevent outlying observations from having a dominating influence on the estimated main or interaction effects, we adopt the following screening procedure in our analysis. First, data points with an estimated expression ratio larger than 30 are excluded from the analysis. Such high-ratio points are often excluded in practice. Second, we have performed a regular least-squares estimation procedure¹² and produced studentized residuals¹² for each observation. A data point with an absolute studentized residual greater than 4 is considered an extreme outlying observation and is further excluded from the main analysis. The chance of having an absolute studentized residual greater than 4 is less than 10⁻⁴ (for normally distributed data). The use of studentized residuals gives us a statistically meaningful way to exclude points with very high estimated ratios without requiring a subjective cutoff point lower than 30. This two-part screening procedure eliminates about 1% of the total observations in each experiment.
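The tail probability quoted above can be checked numerically, and the two screening rules can be expressed as a simple filter. In the Python sketch below the studentized residuals are assumed to have been computed already from the least-squares fit; the fit itself is not shown, and the function names are ours.

```python
import math


def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))


def screen(observations, max_ratio=30.0, max_abs_residual=4.0):
    """Keep (estimated_ratio, studentized_residual) pairs that pass both rules."""
    return [
        (ratio, resid)
        for ratio, resid in observations
        if ratio <= max_ratio and abs(resid) <= max_abs_residual
    ]


tail = two_sided_normal_tail(4.0)   # roughly 6.3e-5, below 1e-4
kept = screen([(1.2, 0.5), (31.0, 0.1), (2.0, -4.5)])
```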

We fit an analysis-of-variance model with main effects and two-way and three-way interactions to the remaining data. Results for the main effects and two-way interactions based on F-tests are obtained. We test the significance of the five main effects and all ten first-order interactions simultaneously for each experiment. Thus, we have a total of 15 hypothesis tests per experiment. We use the Bonferroni adjustment¹² to control the familywise error rate (FWER) in multiple testing (testing main effects and first-order interactions). At the α=0.05 level, this gives 0.05/15≈0.0033 as the significance threshold for each test. Thus, the probability of erroneously rejecting any null hypothesis is controlled at 0.05.
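The count of tests and the per-test threshold can be reproduced with a few lines of Python; the variable names are ours.

```python
from math import comb

N_FACTORS = 5
FAMILYWISE_ALPHA = 0.05

n_main_effects = N_FACTORS
n_two_way = comb(N_FACTORS, 2)          # ten first-order interactions
n_tests = n_main_effects + n_two_way    # 15 hypothesis tests per experiment

# Bonferroni: divide the familywise level by the number of tests.
per_test_threshold = FAMILYWISE_ALPHA / n_tests
```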

When there are two levels in each factor, as in all of our experiments, we construct an equivalent t-test for each of the 15 F-tests. By equivalence, we mean that the p value of an F-test is the same as that of the corresponding two-sided t-test. The signed t-test statistics and the p values, when significant, are reported. For each main effect, the t-test statistic is the difference, standardized by its standard error (S.E.), between the estimated effects of the two noise levels. Even though the S.E.s are not identical among all main effects, a consequence of using robust regression procedures, they are within 0.5 of each other. In other words, the size of the t-test statistic reflects the magnitude of changes associated with the noise factor. All the main effect t-test statistics are positive, which simply indicates that a high noise level creates more damage than a low noise level. For each two-way interaction, the t-test statistic is the standardized difference between the estimated cell mean when both high noise factors are present and the cell mean predicted from the outcomes of the individual noise factors, assuming no interaction. A positive t-test statistic indicates a “synergistic” interaction; that is, the damage caused by the presence of both noise factors is worse than the additive effect from individual noise factors. A negative t-test statistic indicates an “antagonistic” interaction, the opposite of a “synergistic” interaction. Finally, throughout, the experimental unit is the individual spot in each array.
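The sign convention for interactions can be illustrated with an unstandardized contrast on the four cell means of two factors. The Python sketch below omits the division by the standard error, and the cell means in the usage lines are made-up numbers chosen only to show the two signs.

```python
def interaction_contrast(cell_means):
    """Observed both-high cell mean minus its additive (no-interaction) prediction.

    cell_means[(a, b)] is the mean response with factor A at level a and
    factor B at level b, where 0 denotes low noise and 1 denotes high noise.
    """
    additive_prediction = (
        cell_means[(1, 0)] + cell_means[(0, 1)] - cell_means[(0, 0)]
    )
    return cell_means[(1, 1)] - additive_prediction


# Made-up cell means, for illustration only.
synergistic = interaction_contrast({(0, 0): 1, (1, 0): 2, (0, 1): 3, (1, 1): 6})
antagonistic = interaction_contrast({(0, 0): 1, (1, 0): 2, (0, 1): 3, (1, 1): 3})
```

A positive contrast corresponds to a synergistic interaction and a negative one to an antagonistic interaction; a value of zero means the two factors combine additively.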

4. Experimental Results

As noted in the introduction, signal-detection algorithms can recover the true signal more easily for images with less severe levels of noise. Thus, when comparing experiments 1A to 4A with experiments 1B to 4B, with the noise level 0 (less severe) and noise level −1 (more severe), respectively, we expect that the true gene expression can be more accurately estimated in experiments 1A to 4A. This means that for data with more noise (−1; experiments 1B to 4B) the difference between the estimated and true expression ratio is greater. This is shown in Fig. 11, where, for all experiments, the distributions of these absolute differences at their most extreme noise level (all 0, or all −1) in log-log scale are presented by box plots. The top and bottom edges of each box correspond to the upper and lower quartiles of the measurements, respectively. The solid dots in the middle give the locations of the medians. Figure 11 clearly shows that the medians and upper quartiles of 1B to 4B are larger than the corresponding medians and upper quartiles of 1A to 4A.

Figure 11

Box plots for absolute differences between the true and estimated expression ratios in a log-log scale. For each experiment, only the responses in the most extreme level (all −1 for Bs and all 0 for As) are plotted. Each box contains the central 50% of the data. The solid dot in the middle gives the location of the median. The top and bottom whiskers reach the largest and smallest nonoutlying observations, respectively, while the circles indicate the locations of outlying observations.
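The box-plot elements named in the caption can be computed as in the sketch below. The caption does not state which rule separates outlying from nonoutlying observations, so the common 1.5 × interquartile-range fence is assumed here; the function name and data are hypothetical.

```python
from statistics import quantiles


def box_stats(values, k=1.5):
    """Quartiles, whisker ends, and outliers for one box plot.

    Assumption: points beyond k * IQR from the box are treated as
    outliers (k = 1.5 is a common convention, not taken from the paper).
    """
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),    # smallest nonoutlying observation
        "whisker_high": max(inside),   # largest nonoutlying observation
        "outliers": [v for v in values if v < lo_fence or v > hi_fence],
    }


# Hypothetical absolute differences for one experiment.
stats = box_stats([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 3.0])
print(stats)
```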

In this paper our interest goes beyond this general statement; it is to determine the kinds of noise reduction that significantly affect signal estimates. For instance, in experiment 1, concerning background noise, if there is a significant difference between levels −1 and 1 for the parabolic background factor (p<0.0033), then lessening the curvature of the parabolic background significantly improves estimation at level α=0.05. We reach this conclusion because the response for the factorial experiment is the absolute difference between the estimated and actual signal values.
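The threshold p<0.0033 quoted above is consistent with dividing α=0.05 equally among the 15 tests of each experiment (5 main effects and 10 two-way interactions). The following minimal sketch assumes that this Bonferroni-style split is how the value arises; the function names are illustrative and are not part of the authors' software.

```python
from math import comb

ALPHA = 0.05
N_FACTORS = 5
# 5 main effects plus all two-way interactions among 5 factors.
N_TESTS = N_FACTORS + comb(N_FACTORS, 2)


def per_test_threshold(alpha=ALPHA, n_tests=N_TESTS):
    """Per-test significance level under an equal split of alpha."""
    return alpha / n_tests


def is_significant(p_value, alpha=ALPHA, n_tests=N_TESTS):
    """True when a single test passes the adjusted threshold."""
    return p_value < per_test_threshold(alpha, n_tests)


print(N_TESTS, round(per_test_threshold(), 4))
print(is_significant(0.0015), is_significant(0.01))
```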

Let us consider experiment 1 in detail, the results being given in Table 4. The four columns of the table correspond to experiment 1B for all signal levels, 1B for low signal levels, 1A for all signal levels, and 1A for low signal levels. For each experiment, data in the low signal level comprise the one-third of the original data points whose true signal values are in their lower tertile. We have considered low signal levels as a case in their own right (besides being included among all signal levels) because signal detection is made more difficult when a signal is low. The table is broken into main effects and interactions. For experiment 1B (level −1 versus level 1) using all signals, all five effects are significant. This means that reducing any of these effects can be helpful. They are also all significant for low signals. Note that all five factors in the experiment directly affect pixel values, either raising or lowering them for the affected pixels, and the difference in degrees between levels −1 and 1 significantly affects signal estimation. The magnitude of the t-test statistics suggests that the high outlier and spike noise levels are more damaging to the image than the others.
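The low-signal subset described above can be obtained by thresholding the true signal at its lower tertile. The sketch below shows one way to do this; the data layout (pairs of true signal and absolute error) and the function name are assumptions made for illustration.

```python
from statistics import quantiles


def low_signal_subset(spots):
    """Keep spots whose true signal lies in the lower tertile.

    `spots` is a list of (true_signal, abs_error) pairs; this layout
    is hypothetical and chosen only for the example.
    """
    signals = [signal for signal, _ in spots]
    lower_cut = quantiles(signals, n=3, method="inclusive")[0]
    return [(s, e) for s, e in spots if s <= lower_cut]


# Nine hypothetical spots with true signals 100, 200, ..., 900.
spots = [(100 * i, 0.01 * i) for i in range(1, 10)]
low = low_signal_subset(spots)
print(len(low), [s for s, _ in low])
```

With nine equally spaced signals, the subset contains the three weakest spots, that is, one-third of the data.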

Table 4

Experiment 1: Background noise. Entries are signed t-test statistics, with p values in parentheses where significant.
Source	Exp. 1B All Levels	Exp. 1B Low Levels	Exp. 1A All Levels	Exp. 1A Low Levels
Main Effects
SigBack	32.60(<0.0001)	15.65(<0.0001)	35.38(<0.0001)	21.81(<0.0001)
OutL	106.80(<0.0001)	42.32(<0.0001)	24.79(<0.0001)	4.02(<0.0001)
Spike	104.77(<0.0001)	94.09(<0.0001)	2.37	5.72(<0.0001)
Snake	3.17(0.0015)	2.99(0.0028)	0.10	0.69
ParaB	28.27(<0.0001)	13.85(<0.0001)	21.57(<0.0001)	11.40(<0.0001)
Interaction
sigBack*outL	−2.10	−3.65(0.0003)	0.85	−0.95
sigBack*spike	−17.25(<0.0001)	−12.44(<0.0001)	2.81	3.38(0.0007)
sigBack*snake	−0.37	0.22	−2.05	1.31
sigBack*paraB	11.20(<0.0001)	7.15(<0.0001)	4.22(<0.0001)	0.44
outL*spike	73.19(<0.0001)	42.35(<0.0001)	3.12(0.0018)	4.05(<0.0001)
outL*snake	0.66	0.40	−1.87	0.83
outL*paraB	−6.57(<0.0001)	−5.67(<0.0001)	−0.10	−1.55
spike*snake	−1.92	−1.27	−0.26	1.73
spike*paraB	−11.61(<0.0001)	−6.50(<0.0001)	1.77	−0.35
snake*paraB	−2.29	−0.10	−1.47	0.35

If we now consider experiment 1A (level 0 versus level 1) for all signals, both spike and snake effects become insignificant. This means that, relative to snake or spike noise, signal estimation is not significantly different at these two levels. Looking at the fourth column, we see that spike noise is still significant for level 0 versus level 1 for low signals. For these, there is a significant difference in performance of the algorithm relative to spike noise.

Interpretation of interactions can often be difficult, but in some cases it can be revealing. For instance, confining ourselves to the case of all signal levels, in experiment 1B we see that there is interaction between the signal-to-background noise and the parabolic effect. This is not surprising because the ratio is affected by the background. The interaction of the outlier effect and spike noise is also reasonable since both produce extreme values on the microarray. The large positive t-test statistic suggests a strong “synergistic” interaction effect throughout all four scenarios. A similar type of interaction is observed when both signal-to-background noise and parabolic-background noise levels are high. Figures 12 and 13 illustrate the mixed visual effects of signal-to-background noise combined with parabolic-background noise and with spike noise, respectively; the underlying true spot-intensity distributions are the same in each part, so only the noise factors contribute to the differences.

Figure 12

Signal-to-background and parabolic noise at (+1,+1), (0,0), (−1,−1) level, from left to right.

Figure 13

Signal-to-background and spike noises at different levels.

For experiment 2 (shape noise), in Table 5 we see that four of the factors are significant for experiment 2B, for all signals or just low signals. Among them, the strongest factors are the inner hole size and the foreground noise. The effect of foreground noise is similar to background noise in that it directly affects pixel values. The effect of low spot radius, large inner hole size, and excessive chord removal is to lessen the signal area, thereby reducing the pixel area over which the signal is to be estimated. Chord removal is not significant in experiment 2A for low signals, which means that at level 0 there is insufficient chord removal to significantly affect signal estimation relative to level 1. The fact that edge noise is not significant in experiment 2B indicates that the imaging algorithm can deal equally well with spot detection at both levels relative to handling edge noise.

Table 5

Experiment 2: Shape noise.
Source	Exp. 2B All Levels	Exp. 2B Low Levels	Exp. 2A All Levels	Exp. 2A Low Levels
Main Effects
Spot	39.83(<0.0001)	24.19(<0.0001)	9.89(<0.0001)	5.32(<0.0001)
InnH	96.02(<0.0001)	54.21(<0.0001)	19.43(<0.0001)	11.78(<0.0001)
ForeN	71.22(<0.0001)	39.65(<0.0001)	21.31(<0.0001)	11.79(<0.0001)
EdgeN	2.41	0.77	3.91(<0.0001)	1.33
Chord	15.65(<0.0001)	8.27(<0.0001)	5.42(<0.0001)	2.21
Interaction
spotR*innH	26.31(<0.0001)	14.95(<0.0001)	3.99(<0.0001)	1.49
spotR*foreN	−0.98	−0.41	1.82	0.71
spotR*edgeN	4.96(<0.0001)	3.36(0.0008)	−0.10	−0.33
spotR*chord	−6.52(<0.0001)	−3.55(0.0004)	−0.45	−0.10
innH*foreN	12.17(<0.0001)	7.60(<0.0001)	−2.01	−0.70
innH*edgeN	0.70	0.41	3.00(0.0027)	2.82
innH*chord	9.01(<0.0001)	4.33(<0.0001)	4.26(<0.0001)	4.13(<0.0001)
foreN*edgeN	0.45	−0.30	−0.22	−1.15
foreN*chord	3.80(0.0001)	2.37	1.69	0.55
edgeN*chord	−3.74(0.0002)	−2.25	−0.41	−1.01

There is an apparent anomaly with regard to edge noise in experiment 2A: edge noise is significant relative to levels 0 and 1, but not with respect to levels −1 and 1. This phenomenon is an “apparent” anomaly because one cannot compare p values across different experiments with full confidence—although we often do make such comparisons in a heuristic mode. Recall that the denominator of the F-statistic contains a variance estimator, and therefore a low variance will tend to make the F-statistic significant. Because the variance is very low in experiment 2A in contrast to experiment 2B, significance in the former and lack of significance in the latter is a reasonable consequence and does not imply that the difference in damage between two levels in experiment 2B is less than that in 2A. The damage effect of edge noise starts to show in 2A when the effects of inner hole size and foreground noise are not as dominating as they are in 2B. In experiment 2B, the effect of edge noise is still present in its significant interaction with both spot radius and chord noise. Figure 14 shows the mixed visual effect between the spot radius and chord noise.
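The role of the variance estimator in the argument above can be seen in a small numerical sketch: the same mean difference between two noise levels gives a large statistic when the spread is small and a small one when the spread is large. An ordinary pooled two-sample t statistic is assumed here (its square is the corresponding two-level F statistic); the data are hypothetical and the paper's robust procedure is not reproduced.

```python
import math
from statistics import mean, variance


def two_sample_t(low, high):
    """Pooled-variance t statistic for high-level minus low-level responses."""
    n1, n2 = len(low), len(high)
    pooled = ((n1 - 1) * variance(low) + (n2 - 1) * variance(high)) / (n1 + n2 - 2)
    return (mean(high) - mean(low)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


# Both settings have the same mean difference (0.1) between levels.
quiet_low = [1.00, 1.02, 0.98, 1.01, 0.99]    # small within-level spread
quiet_high = [1.10, 1.12, 1.08, 1.11, 1.09]
noisy_low = [1.0, 2.0, 0.0, 1.5, 0.5]          # large within-level spread
noisy_high = [1.1, 2.1, 0.1, 1.6, 0.6]

t_quiet = two_sample_t(quiet_low, quiet_high)
t_noisy = two_sample_t(noisy_low, noisy_high)
print(f"t (low variance)  = {t_quiet:.2f}, F = {t_quiet ** 2:.2f}")
print(f"t (high variance) = {t_noisy:.2f}, F = {t_noisy ** 2:.2f}")
```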

Figure 14

Spot radius deviation and chord noise at (+1,+1), (0,0), (−1,−1) levels, from left to right.

Regarding interaction in experiment 2B, the three distinctly geometric factors (spot radius, inner hole, and chord noise) interact significantly for both the overall signal and low-signal cases. This is reasonable because each affects the area over which signal estimation takes place. Interaction is greatly reduced in experiment 2A, particularly for low signals, where only interaction between the inner hole and chord removal is strongly significant. Figures 15 and 16 show the mixed visual effects of the inner hole with spot radius and chord noise, respectively.

Figure 15

Spot radius deviation and inner hole at (+1,+1), (0,0), (−1,−1) levels, left to right.

Figure 16

Inner hole and chord noises at (+1,+1), (0,0), (−1,−1) levels, left to right.

Whereas experiment 2 mixes shape effects with foreground noise and edge noise, experiment 3 mixes them with scratch and snake noise. Table 6 shows a fair amount of consistency between the two experiments with regard to the three geometric factors relative to both main effects and interaction. One notable change is that the interaction between spot radius and chord removal changes from being “antagonistic” in experiment 2 to being “synergistic” in experiment 3. Even though the order of estimated cell means in the four noise level combinations remains the same in both experiments, in experiment 3 the estimated cell mean when both noise factors are present is much higher than in the other three; consequently, a significant “synergistic” interaction is observed. For the most part, snake and scratch noise show no significant main effects. The exception is scratch noise for low signals in experiment 3B. This is quite plausible because scratch noise causes a strip of low values, thereby reducing an already low signal. Note also the interaction of snake and scratch noise in three of the four experiments.
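The point that the sign of an interaction can change while the ordering of the four cell means stays the same can be checked with simple arithmetic. The cell means below are made up for illustration and are not taken from experiments 2 or 3.

```python
def interaction_contrast(m00, m10, m01, m11):
    """Observed both-high cell mean minus its additive prediction."""
    return m11 - (m10 + m01 - m00)


def ordering(m00, m10, m01, m11):
    """Cell labels sorted by increasing mean."""
    cells = {"00": m00, "10": m10, "01": m01, "11": m11}
    return sorted(cells, key=cells.get)


# Hypothetical cell means with identical ordering but different m11.
antagonistic_case = dict(m00=1.0, m10=3.0, m01=2.0, m11=3.5)
synergistic_case = dict(m00=1.0, m10=3.0, m01=2.0, m11=6.0)

print(ordering(**antagonistic_case), interaction_contrast(**antagonistic_case))
print(ordering(**synergistic_case), interaction_contrast(**synergistic_case))
```

In both cases the both-high cell has the largest mean, but only in the second does it exceed the additive prediction, which yields a positive contrast.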

Table 6

Experiment 3: Shape-surface noise.
Source	Exp. 3B All Levels	Exp. 3B Low Levels	Exp. 3A All Levels	Exp. 3A Low Levels
Main Effects
Spot	32.80(<0.0001)	20.70(<0.0001)	6.40(<0.0001)	4.10(<0.0001)
InnH	103.75(<0.0001)	22.13(<0.0001)	15.87(<0.0001)	8.94(<0.0001)
Snake	0.68	1.81	0.17	0.20
Scratch	1.26	5.50(<0.0001)	0.37	0.69
Chord	20.68(<0.0001)	13.55(<0.0001)	1.65	0.40
Interaction
spotR*innH	23.47(<0.0001)	14.44(<0.0001)	−0.14	−0.71
spotR*snake	−6.46(<0.0001)	−4.05(<0.0001)	−2.86	−1.31
spotR*scratch	−3.04(0.0023)	0.57	−0.17	−1.09
spotR*chord	16.56(<0.0001)	8.71(<0.0001)	2.61	0.81
innH*snake	−3.06(0.0022)	−1.51	0.00	−0.44
innH*scratch	−1.79	1.87	0.10	−0.79
innH*chord	14.60(<0.0001)	7.70(<0.0001)	0.17	−2.38
snake*scratch	−3.49(0.0005)	−2.12	−5.49(<0.0001)	−4.24(<0.0001)
snake*chord	8.84(<0.0001)	5.53(<0.0001)	0.92	0.00
scratch*chord	1.66	0.84	2.78	2.04

Experiment 4 concerns signal conditions, in particular, signal deviation, signal-to-background ratio, and foreground noise. These conditions are bound to affect signal estimation, and the main-effects part of Table 7 demonstrates this. The only exception is for low-signal values when comparing levels 0 and 1 in experiment 4A. Since signal deviation is tied to the signal mean, a low signal diminishes this deviation and signal deviation is not significant for low signal values. Figure 17 shows the mixed visual effects between signal-to-background and spike noise. As has been common throughout, overall interaction between the factors is much less relative to levels 0 and 1 than with respect to levels −1 and 1.

Figure 17

Signal-to-background and spike noise variation at (+1,+1), (0,0), (−1,−1) levels, left to right.

Table 7

Experiment 4: Weak-signal noise.
Source	Exp. 4B All Levels	Exp. 4B Low Levels	Exp. 4A All Levels	Exp. 4A Low Levels
Main Effects
SigSD	77.19(<0.0001)	20.87(<0.0001)	20.21(<0.0001)	0.35
ForeN	7.85(<0.0001)	3.52(0.0004)	5.58(<0.0001)	2.16
SigBack	55.55(<0.0001)	25.82(<0.0001)	46.33(<0.0001)	26.34(<0.0001)
FlatBack	67.22(<0.0001)	30.82(<0.0001)	42.74(<0.0001)	24.38(<0.0001)
Spike	74.96(<0.0001)	68.49(<0.0001)	3.01(0.0025)	2.84
Interaction
sigSD*foreN	−1.41	−3.48(0.0005)	−0.22	1.04
sigSD*sigBack	−4.86(<0.0001)	−4.43(<0.0001)	0.22	0.00
sigSD*flatBack	−5.44(<0.0001)	−5.42(<0.0001)	−2.80	−4.13(<0.0001)
sigSD*spike	44.79(<0.0001)	24.89(<0.0001)	0.30	−1.00
foreN*sigBack	−0.42	0.82	2.86	2.19
foreN*flatBack	0.17	0.49	−0.77	−3.72(0.0002)
foreN*spike	−2.29	−0.81	0.57	1.09
sigBack*flatBack	21.72(<0.0001)	11.84(<0.0001)	3.23(0.0012)	1.59
sigBack*spike	−17.92(<0.0001)	−16.97(<0.0001)	1.69	2.33
flatBack*spike	−21.68(<0.0001)	−17.62(<0.0001)	−2.49	−2.10

5. Conclusion

Factorial analysis has been applied to simulated microarray images to study the effects and interaction of noise types at different noise levels. This type of analysis provides a general paradigm for investigating the effects of noise within a comprehensive simulation environment, thereby providing a tool by which one can quantitatively determine which kinds of noise should be mitigated in microarray technology. For instance, from the analysis described in this paper, it can be concluded that elimination of the inner hole and the stabilizing of spot radius will have a strongly beneficial effect on signal estimation. Additional information can be found online.¹⁴

Appendix

Parameter settings for the microarray simulation. The notation N(a,b) denotes the normal distribution with mean a and variance b; U[a,b] is the uniform distribution on the interval [a,b]; U{a,b,c,…} is the uniform distribution on the indicated set of values; Β(a,b) is the beta distribution with parameters a and b; and exp(a) is the exponential distribution with mean a.
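For readers who wish to experiment with these settings, the notation above can be mapped onto standard random-number routines as in the following sketch. The helper names are hypothetical and are not part of the simulation toolbox. Note that N(a,b) is parameterized by its variance, whereas most library routines expect a standard deviation, and that exp(a) is parameterized by its mean rather than its rate.

```python
import math
import random

rng = random.Random(0)  # fixed seed so that the sketch is reproducible


def normal(mean, var):
    """N(a, b): normal with mean a and VARIANCE b."""
    return rng.gauss(mean, math.sqrt(var))


def uniform(a, b):
    """U[a, b]: uniform on the interval [a, b]."""
    return rng.uniform(a, b)


def uniform_set(values):
    """U{a, b, c, ...}: uniform on a finite set of values."""
    return rng.choice(values)


def beta(a, b):
    """B(a, b): beta distribution with parameters a and b."""
    return rng.betavariate(a, b)


def exponential(mean):
    """exp(a): exponential with MEAN a (the library routine takes the rate 1/a)."""
    return rng.expovariate(1.0 / mean)


print(normal(0.0, 4.0), uniform(2.0, 3.0), uniform_set([0, 45, 90, 135, 180]))
print(beta(1.7, 4.8), exponential(5.0))
```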

Level	Simulation	Parameter Descriptions	Distribution
Spot	1. Spot size	S: Spot radius with (μ_s,σ_s²)	S∼N(μ_s,σ_s²)
	2. Spot drift	δ_x,δ_y: Drifting level	δ_x,δ_y∼U(d_a,d_b)
		d_a,d_b: Percentage of spot radius
		P_D: Drift activation probability	D_x=δ_x×S×U[−1,1]
			D_y=δ_y×S×U[−1,1]
		D_x,D_y: Relative drifting
		(X₁′,Y₁′): Drifted center coordinates	$\begin{cases} X_{1}' = X + D_{x} \\ Y_{1}' = Y + D_{y} \end{cases}$ $\begin{cases} X_{2}' = X_{1}' + U[-1, 1] \\ Y_{2}' = Y_{1}' + U[-1, 1] \end{cases}$
		(X₂′,Y₂′): Second channel, where (X,Y) are predefined spot center coordinates
	3. Inner hole size	H, V: Horizontal and vertical axis of the inner elliptical hole	H∼N(μ_H,σ_H) V∼N(μ_V,σ_V)
	4. Inner hole drift	X_C,Y_C: Ideal spot center	X_R=X_C+δc_xR
		X_R,Y_R: First channel coordinates	Y_R=Y_C+δc_yR
		X_G,Y_G: Second channel coordinates	X_G=X_C+δc_xG
		where δc_xG,δc_yG,δc_xR,δc_yR: drift level set at the block level	Y_G=Y_C+δc_yG
	5. Chord removal	P_{N_c}: Chord removal probability {p_k: probability of k chords to be removed from a target spot}	P_{N_c}={p₀,p₁,p₂,p₃,p₄}, where p₀+p₁+p₂+p₃+p₄=1; N_c∼{0,1,2,3,4}
		L: Chord length	L∼B(α_L,β_L)
		θ: Chord position	θ∼U(0,2π)
	6. Spot intensity	β: Mean intensity for the assumed cell system	I_k∼exp(β)
		R_k,G_k:k’th spot (fixed) signal intensities for both channels	R_k∼N(I_k,σ_I) G_k∼N(I_k,σ_I)
		α: Coefficient of variation of signal intensity in the system	σ_I=α×I_k
	7. Expresser or outlier’s intensity	p_outlier: Outlier activation probability; b_k: Outlier control level; t_k: Targeted outlier expression ratio, with equal probability of ± sign; R_k′,G_k′: k’th outlier signal intensities for both channels	Equal probability at 0.05 to 0.10; b_k∼Β(1.7,4.8); t_k=10^{±b_k}; $R_{k}' = R_{k} \sqrt{t_{k}}$; $G_{k}' = G_{k} / \sqrt{t_{k}}$
	8. Channel conditioning	R_k″,G_k″: Prenormalized signal intensity of the spots on red, green channels	R_k″=f₁(R_k′), G_k″=f₂(G_k′)
		a₀,a₁,a₂, and a₃, parameters for response characteristic function	f(x)=[a₀+x(1−e^{−x/a₁})^{a₂}]a₃, where a₃>1
	9. Spot signal variation: foreground noise	SR_k,SG_k: Pixelwise (x,y) signal intensity	SR_k(x,y)∼R_k″+N(μ_{R_k″},σ_R²)
			SG_k(x,y)∼G_k″+N(μ_{G_k″},σ_G²)
		α_s: Within-spot signal coefficient of variation	$\mu_{R_{k}''} = R_{k}'' \times \alpha_{m_{1}},\ \alpha_{m_{1}} \sim U[f_{a_{1}}, f_{b_{1}}]$; $\mu_{G_{k}''} = G_{k}'' \times \alpha_{m_{2}},\ \alpha_{m_{2}} \sim U[f_{a_{2}}, f_{b_{2}}]$
			$\sigma_{R} = R_{k}'' \times \alpha_{s_{1}},\ \alpha_{s_{1}} \sim U[f_{c_{1}}, f_{d_{1}}]$; $\sigma_{G} = G_{k}'' \times \alpha_{s_{2}},\ \alpha_{s_{2}} \sim U[f_{c_{2}}, f_{d_{2}}]$
	10. Edge enhancement	W_ed: Level of enhancement, parameter (μ_e) set for the block	W_ed∼N(μ_e,1)
		N_e: Number of pixels enhanced
	11. Edge noise	Apply edge noise at the set level (δ_ed)
Block	12. Radius parameters	μ_s,k_s: mean and radius deviation factor	μ_s∼U(s_a,s_b), σ_s=k_s×μ_s
		s_a,s_b: bounds of radius, set by block size and interspot gap
	13. Chord parameters	N_c: Chord rate picked with equal probability	N_c∈U{0,1,2,3,4} having weights {p₀,p₁,p₂,p₃,p₄}
		α_L,β_L: Chord distributional parameters	α_L∼U(a_α,b_α),β_L∼U(a_β,b_β)
	14. Inner hole parameters	μ_H,μ_V,σ_H,σ_V: Parameters for inner elliptical hole	μ_H∼U(L_a,L_b)×μ_R, μ_V∼U(L_a,L_b)×μ_R
		μ_R: Mean spot radius in the block	σ_H=α₁×μ_R,σ_V=α₂×μ_R
			α₁∼U(P_a,P_b), α₂∼U(P_a,P_b)
	15. Drift parameters	δc_xG,δc_yG,δc_xR,δc_yR: drift level i, j: Percentage of the spot radius	δc∼U[i,j] δc_xG=δc×U[−1,1],δc_yG=δc×U[−1,1] δc_xR=δc_xG+U[−1,1],δc_yR=δc_yG+ U[−1,1]
	16. Enhancement	l_a,l_b: Range of intensity ratio. Set mean level of enhancement for a block	μ_e∼U(l_a,l_b)
Array	17. Physical dimensions	B_w,B_h: Block size—width, height (distance between first spot centers of any two blocks)	Typical setting for an 8-block, 2-row array (in pixels):
		M_l,M_r,M_t,M_b: Margin settings (left, right, top, bottom)	B_h,B_w=900 M_l,M_r,M_t,M_b=100
		N_pin,N_row: Number of pins in an array, printed equally across N_row number of rows
		NS_w,NS_h: Number of spots along the width (NS_w) and height (NS_h) of the block
	18. Signal-to- noise ratio	SNR: Signal-to-noise level is set for an array
	19. Interspot distance	G_sp: Interspot distance, set for an array
	20. Background	I_{b_ch1},I_{b_ch2}: Background intensity, with parameters set for an array	I_{b_ch1}∼N(μ_b,σ_b₁²), I_{b_ch2}∼N(μ_b,σ_b₂²)
		γ: Background level	γ∼U[a,b]
		Parameter settings:
		-Flat fluorescent background	μ_b=γ,
		-Functional background g(x,y): choice of parabolic, positive or negative slant surface function	μ_b=γ×g(x,y), with σ_b₁=(k_b₁μ_b),σ_b₂=(k_b₂μ_b)
	21. Spike noise	L_spi: Level of spike noise (set in terms of percentage of total pixels)
		N_s: Intensity of the spike noise	N_s∼exp(μ_spi),
		μ_spi: Noise rate	μ_spi∼U[e,f]
		W_spi: Width of the noise cluster	W_spi∼U[g,h]
	22. Edge noise	δ_ed: Set the controlling parameter	δ_ed set as a percentage of maximum intensity value
	23. Snake noise	N_seg: Number of snake tails in an image	N_seg,κ_sn,L_sn,W_sn
		I_sn: Intensity of the noise tail	I_sn∼N(μ_sn,σ_sn),
		κ_sn: Average signal-to-snake noise intensity level	μ_sn=(I_k/κ_sn),σ_sn=k_sn×μ_sn
		L_sn: Length of the segment expressed as multiples of average spot size	L_sn∼U[L_sn1,L_sn2]
		W_sn: Width of the snake noise tail
	24. Scratch noise	N_sc: Number of scratch tails in an image	N_sc,κ_sc,W_sc, θ
		I_sc: Intensity of the scratch noise	I_sc∼N(μ_sc,σ_sc)
		κ_sc: Average background-to- scratch noise intensity level	μ_sc=(μ_b/κ_sc),σ_sc=k_sc×μ_sc
		L_sc: Length of the segment in units of average size of the spots W_sc: Width of the scratch noise θ: Scratch noise inclination	L_sc∼U[L_sc1,L_sc2] θ∈U{0,45,90,135,180} deg
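To make the spot-level rows of the table concrete, the following minimal sketch draws the two-channel intensities of one spot (item 6) and applies the outlier adjustment (item 7). It is a sketch under stated assumptions, not the authors' implementation: σ_I is treated as a standard deviation, the values β=3000 and α=0.1 in the usage line are arbitrary, and the function and key names are hypothetical.

```python
import math
import random


def simulate_spot(rng, beta_mean, cv, p_outlier):
    """Draw one spot's channel intensities, with optional outlier scaling.

    beta_mean : mean intensity of the cell system (beta in the table)
    cv        : coefficient of variation of signal intensity (alpha)
    p_outlier : outlier activation probability
    Assumption: sigma_I = cv * I_k is used as a standard deviation.
    """
    i_k = rng.expovariate(1.0 / beta_mean)   # I_k ~ exp(beta), mean beta
    sigma_i = cv * i_k                       # sigma_I = alpha * I_k
    r_k = rng.gauss(i_k, sigma_i)            # R_k ~ N(I_k, sigma_I)
    g_k = rng.gauss(i_k, sigma_i)            # G_k ~ N(I_k, sigma_I)
    t_k = 1.0                                # ratio multiplier when not an outlier
    if rng.random() < p_outlier:
        b_k = rng.betavariate(1.7, 4.8)      # b_k ~ Beta(1.7, 4.8)
        sign = rng.choice((-1, 1))           # equal probability of +/- sign
        t_k = 10.0 ** (sign * b_k)           # t_k = 10^(+/- b_k)
    # R_k' = R_k * sqrt(t_k), G_k' = G_k / sqrt(t_k): ratio scaled by t_k.
    return {
        "r": r_k,
        "g": g_k,
        "r_prime": r_k * math.sqrt(t_k),
        "g_prime": g_k / math.sqrt(t_k),
        "t_k": t_k,
    }


rng = random.Random(1)
spot = simulate_spot(rng, beta_mean=3000.0, cv=0.1, p_outlier=0.05)
print(spot)
```

Splitting the factor t_k as √t_k on one channel and 1/√t_k on the other changes the red-to-green ratio by exactly t_k while leaving the product of the two intensities unchanged.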

Acknowledgments

Y.B. was supported by the Center for Environmental and Rural Health at Texas A&M University. E.R.D. was supported by the National Human Genome Research Institute. D.V.N. was supported by the National Cancer Institute (CA-90301). R.J.C. was supported by a grant from the National Cancer Institute (CA-57030), and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106). N.W. was supported by the National Cancer Institute (CA-74552).

REFERENCES

1. M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science 270, 467–470 (1995).

2. M. N. Arbeitman, E. E. Furlong, F. Imam, E. Johnson, B. H. Null, B. S. Baker, M. A. Krasnow, M. P. Scott, R. W. Davis, and K. P. White, “Gene expression during the life cycle of Drosophila melanogaster,” Science 297, 2270–2275 (2002).

3. S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, and I. Herskowitz, “The transcriptional program of sporulation in budding yeast,” Science 282, 699–705 (1998).

4. J. DeRisi, L. Penland, P. O. Brown, M. L. Bittner, P. S. Meltzer, M. Ray, Y. Chen, Y. A. Su, and J. M. Trent, “Use of a cDNA microarray to analyse gene expression patterns in human cancer,” Nat. Genet. 14, 457–460 (1996).

5. I. S. Lossos, A. A. Alizadeh, M. Diehn, R. Warnke, Y. Thorstenson, P. J. Oefner, P. O. Brown, D. Botstein, and R. Levy, “Transformation of follicular lymphoma to diffuse large-cell lymphoma: alternative patterns with increased or decreased expression of c-myc and its regulated genes,” Proc. Natl. Acad. Sci. U.S.A. 99, 8886–8891 (2002).

6. Y. Chen, E. R. Dougherty, and M. L. Bittner, “Ratio-based decision and quantitative analysis of cDNA microarrays,” J. Biomed. Opt. 2(4), 364–374 (1997).

7. Y. Chen, V. Kamat, E. R. Dougherty, M. L. Bittner, P. S. Meltzer, and J. Trent, “Ratio statistics of gene expression levels and application to microarray data analysis,” Bioinformatics 18(9), 1207–1215 (2002).

8. D. V. Nguyen, A. B. Arpat, N. Wang, and R. J. Carroll, “DNA microarray experiments: biological and technological aspects,” Biometrics 58(4), 701–717 (2002).

9. Y. Balagurunathan, E. R. Dougherty, Y. Chen, M. L. Bittner, and J. M. Trent, “Simulation of cDNA microarrays via a parameterized random signal model,” J. Biomed. Opt. 7(3), 507–523 (2002).

10. K. Kerr and G. A. Churchill, “Experimental design for gene expression microarrays,” Biostatistics 2, 183–202 (2001).

11. M. K. Kerr and G. A. Churchill, “Statistical design and analysis of gene expression microarrays,” Genet. Res. 77(2), 123–128 (2001).

12.

13.

14.

Notes

Address all correspondence to Dr. Edward R. Dougherty, Texas A&M Univ., Dept. of Electrical Engineering, 111D Zachry, College Station, TX 77843-3128. Tel: 979-694-9538, Fax: 979-845-6259, E-mail: e-dougherty@tamu.edu


