1. Introduction
The introduction of cDNA microarray technology [1] allows thousands of gene-expression values to be measured simultaneously, thereby providing insight into the global gene-expression patterns of the cells (tissues) being studied. The approach is powerful for studying the myriad transcription-related pathways involved in cellular growth, differentiation, and transformation [2-5]. The quality of each gene-expression value detected with this measurement technology depends intricately on the image-processing algorithm and on interactions among noise factors. Numerous image-processing tools have been proposed to extract signal intensity from cDNA arrays. Our study uses a method that applies a statistical test to segment the hybridized region from the background and the inner hole [6]. Metrics have been introduced to quantify the extracted data and to better understand data generation [7]. Despite the extensive application of cDNA technology, few studies have examined the quality and reliability of gene-expression signals in terms of how close the detected signals are to the true gene-expression levels in a biological sense [8]. Linking various noise conditions to signal extraction has been the goal of most image-extraction algorithms, the purpose being to develop better algorithms. Most proposed imaging methods are based on intuitive evidence. This study employs a microarray-image model that takes into account many factors, including spot morphology, signal strength, background fluorescent noise, and shape and surface degradation [9]. The model yields synthetic images whose appearance and quality reflect those of real microarray images. Here we use the model to link noise factors to the fidelity of signal extraction with respect to a standard image-extraction algorithm [6, 7]. Of particular interest is the identification of the noise factors, and their interactions, that significantly degrade the ability to detect the true gene-expression signal accurately.
This study uses statistical criteria in conjunction with the simulation of various noise conditions to better understand the influence of noise on signal extraction for cDNA microarray images. Although some principles of experimental design have been proposed for microarray experiments, they have focused primarily on optimizing the yield of information on the biological tissue samples of interest relative to the reference sample [10, 11] and on assessing within-array and between-array variability. In this study, we use factorial experiments to systematically identify the factors, and their interactions, that significantly affect the accuracy of detecting the expression signal. Because noise–factor interactions can affect the quality of signal detection in unpredictable ways, a systematic examination of these interactions is needed. Two points need to be kept in mind regarding the statistical analysis. First, it is generally true that signal-detection algorithms can better recover the true signal for images with less severe levels of noise. Thus, when we compare signal estimation for low noise with estimation for high noise, the actual error of estimation should be less for low noise, and this will be borne out. Our concern here, however, lies in a different direction. We want to examine the significance of different levels of various kinds of noise on signal estimation. If different levels of a particular type of noise have no significant effect on estimation error, then reducing that noise to a lower level will not significantly affect signal detection; however, if there is a significant effect, then it would be worthwhile to try to reduce that type of noise. A second point is that we are proposing a paradigm implemented in software, and not simply providing results. We have chosen to consider certain kinds of noise in the noise model and to set these at certain levels. One can choose other types of noise or use different noise levels.
Clearly, bringing the noise levels closer together will reduce the significance of noise effects, whereas moving them farther apart will increase the significance. What we have done is to develop a statistical package that works in conjunction with the existing image-simulation toolbox.
2. Image Simulation
This section describes the noise conditions used in the current study. A detailed description of image simulation is given in the original paper [9]. Figure 1 shows the cDNA spot and model generation with various noise conditions. The addition of noise to the array is broadly divided into three levels: array-level, block-level, and spot-level noise. Detailed distributional descriptions of the various types of noise are given in the appendix. Throughout this section, when describing a type of noise, we refer to the appendix for specific distributional information; each such reference cites the simulation number. Our experiments involve three noise settings: −1, 0, and +1, where increasing values correspond to decreasing noise, from the worst (−1) to the least (+1) noise (Table 1).
Table 1
The analysis of a detection algorithm begins with a ground truth. Here that ground truth refers to a “true” expression intensity that must be estimated by the detection algorithm. A microarray containing $N$ gene-expression spots is simulated by drawing intensity levels $I_k$, for $k = 1, \dots, N$, from an exponential distribution. Base intensities for the red and green channels, $R_k$ and $G_k$, respectively, are generated from two independent normal distributions having mean $I_k$ and standard deviation $\alpha I_k$, where $\alpha$ is a common coefficient of variation. A particular gene (RNA) may be over- or underexpressed, and this will show up in the red (test) channel. We refer to such a gene as an expresser or outlier. Outliers are chosen randomly in the model: each gene on the microarray is selected to be an outlier with probability $p_{\text{outlier}}$. If gene $k$ is selected, then a scaling factor $t_k = 10^{\pm b_k}$ is applied, where $b_k$ follows a beta distribution, $b_k \sim \mathrm{B}(1.7, 4.8)$, and where the $\pm$ sign is selected with equal probability. Based on the scaling factor, the individual channel intensities, $R_k'$ and $G_k'$, are given by [equations missing from this text].
The dyes commonly used for microarray experiments show nonlinear response characteristics, and different dyes give different responses. This effect is modeled by a nonlinear function [equation missing from this text]. $R_k'$ and $G_k'$ are transformed by the detection-system response characteristic function, $f_R(x)$ or $f_G(x)$, to obtain realistic fluorescent intensities. The resulting observed fluorescent intensities, $R_k'' = f_R(R_k')$ and $G_k'' = f_G(G_k')$, are the true mean intensities across the $k$th spot.
Normally distributed foreground noise of intensity $I_f$ is added pixelwise on the spots (simulation 9 in the appendix). This foreground noise typically has zero mean. It results in spot intensities $S_R = R_k'' + I_{f1}$ and $S_G = G_k'' + I_{f2}$. Figure 2 shows noise addition at various levels. In this figure, and in all subsequent figures illustrating noise, all other noise factors are set at the best level (less variant than the +1 level).
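The ground-truth generation described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function name, the exponential mean, and the values of α and the outlier probability are hypothetical, not the toolbox's settings. Because the channel-intensity and dye-response equations are not available in this text, the sketch stops at the base intensities and the scaling factor.

```python
import numpy as np

def simulate_ground_truth(n_spots, mean_intensity, alpha, p_outlier, rng):
    """Draw base intensities and outlier scaling factors for one array.

    mean_intensity, alpha and p_outlier are illustrative inputs; the paper's
    actual settings are listed in its appendix, not here.
    """
    # I_k: exponentially distributed intensity level for each spot.
    intensity = rng.exponential(mean_intensity, size=n_spots)
    # R_k, G_k: independent normals with mean I_k and s.d. alpha * I_k.
    red = rng.normal(intensity, alpha * intensity)
    green = rng.normal(intensity, alpha * intensity)
    # Each gene is an outlier (expresser) with probability p_outlier.
    is_outlier = rng.random(n_spots) < p_outlier
    # t_k = 10^(+/- b_k), with b_k ~ Beta(1.7, 4.8) and an equiprobable sign.
    b = rng.beta(1.7, 4.8, size=n_spots)
    sign = rng.choice([-1.0, 1.0], size=n_spots)
    scale = np.where(is_outlier, 10.0 ** (sign * b), 1.0)
    return intensity, red, green, scale

rng = np.random.default_rng(seed=0)
intensity, red, green, scale = simulate_ground_truth(1600, 1000.0, 0.1, 0.05, rng)
```

Since the beta variate lies in (0, 1), every scaling factor falls between 0.1 and 10, so an expresser differs from its base intensity by at most one order of magnitude in either direction.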
Owing to laboratory dust that may stick on the arrays and fluoresce on laser excitation to give high-intensity spikes, or high-intensity points caused by cDNA precipitation, spike noise, at a preset rate, Lspi, is added randomly across the entire slide area. Once a pixel is selected for spike noise, the adjacent pixels have a higher probability of being affected. This is fixed by a random number chosen from a uniform rate, Wspi, which gives a count of pixels randomly chosen to be influenced by this noise. The intensity, NS, of the spike noise is governed by an exponential distribution with mean μspi. Figure 3 shows spike noise added at different levels. Physical handling of the array slides can result in scratch noise (surface scratches), which typically results in low intensity levels. Scratch-noise intensity is parameterized as a ratio, κsc, giving the background-to-scratch noise intensity level. Other parameters are the number of strips, strip thickness Wsc, and a random strip length, Lsc (simulation 24 in the appendix). These scratches are placed at random positions on the array and are inclined according to a (discrete) uniformly random angle, θsc∈{0,45,90,135,180}. Figure 4 shows scratch noise at different levels. Fine dust particles on the slides can create snake noise upon laser excitation. These snake-noise strips are typically of higher intensity than the signal level. To simulate this noise, multidirectional snake noise has been generated consisting of some number, N seg , of segments. Analogously to scratch noise, the intensity is parameterized as a ratio, κsn, giving the average signal-to-snake noise intensity level, the number of snakes, snake thickness Wsn, and a random length, Lsn, given as a multiple of the spot size. Figure 5 shows snake noise at different levels. The cDNA deposition spot is considered to be circular, with a random radius S (simulation 1 in the appendix). 
The mean of the radius is set according to the array density, and its variance relates to the consistency of spot size. The standard deviation is a predetermined proportion, ks, of the mean. The radius mean is set for every block, and randomized over a small range within the array (simulation 12). Depending on the robot arm and printing ability of the pins, the interspot distance, Gsp, may vary. Owing to the physical mechanics of the robot arm, the block size (pixel units) is fixed in most cases. The interspot distance can be set to accommodate spot size and random variations in spot radii. The spot variability at three levels is shown in Fig. 6. Owing to the impact of the print tip on the glass surface, or possibly to the effect of surface tension during the drying process, a significantly lesser amount of cDNA can be deposited near the spot center. An elliptical shape models this inner hole with random horizontal and vertical axes, H and V (simulation 3). Interarray variability in the distributions of H and V is modeled by uniformly distributed means μH and μV (simulation 14). The choice of the parameters governs the hole shapes. The center position of a hole is allowed to drift over a range (simulation 4). The shape is unaffected by the drift because the contact of the mechanical print tip to the surface is unaffected. Figure 7 shows the noise at different levels. The irregularity of RNA washout during slide preparation is modeled by chord noise (chord removal). The number, Nc, of chords to be removed for a spot is selected from a discrete distribution, {0,1,2,3,4}, where the elements of the distribution occur with probabilities p0, p1, p2, p3, and p4, respectively. For images with very few pieces cut off, the zero-chord probability p0 is very high, and the three- and four-chord probabilities are close to 0 (possibly equal to 0). To model interarray variability, the probabilities can be treated randomly. 
This noise parameter is set once per block; that is, it is block-level rather than spot-level noise. Once the number of chords for a spot is determined, the distance, L, of each chord center to the edge is selected from a beta distribution, with interblock variability for the beta distribution being uniformly modeled (simulation 5). Finally, the chord locations are chosen uniformly at random according to an angle θ between 0 and 2π. Figure 8 shows chord noise at different levels.
Owing to the manner in which liquid dries, the spots usually do not have smooth edges. Edge noise is simulated via a parameterized edge-noise algorithm adopted from digital document processing. Edge noise is applied to the outer perimeter of the spot (after chord removal). Figure 9 shows the noise at different levels.
Many factors contribute to the observed fluorescent background: autofluorescence from the glass surface or the surface of the detection instrument, nonspecific binding of fluorescent residues after hybridization, local contamination from posthybridization slide handling, etc. Background noise is simulated by a normal distribution whose parameters are randomly chosen to describe the process, and for multiple arrays, the interarray difference is modeled by a uniform distribution (simulation 20). Rather than being constant across the entire microarray, the mean of the background noise may vary, owing to various scanning effects. It can take different shapes: parabolic, positive slope, or negative slope. In this case a function g(x,y) is first generated (parabolic, positive slope, or negative slope) to form a background surface, and normal noise is added to it pixelwise. Figure 10 shows parabolic background noise at different levels.
The addition of various noise types makes the microarray image highly peaked, with large differences between neighboring pixels. This stark irregularity can be mitigated by smoothing the image with either a flat or a pyramidal convolution kernel. Our simulation study uses a flat smoothing function.
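A minimal sketch of the background model and flat smoothing described above, assuming NumPy. The function names, the orientation of the parabolic surface (lowest at the array center), the slope direction (along the x axis), the kernel size, and all numeric levels are hypothetical choices made for illustration; the actual distributions are those of simulation 20 in the appendix.

```python
import numpy as np

def background_surface(height, width, shape, low, high):
    """Deterministic background mean g(x, y), scaled to span [low, high]."""
    y, x = np.mgrid[0:height, 0:width]
    xn = x / (width - 1)   # normalized column coordinate in [0, 1]
    yn = y / (height - 1)  # normalized row coordinate in [0, 1]
    if shape == "parabolic":
        # Assumed orientation: lowest at the center, highest at the corners.
        g = ((xn - 0.5) ** 2 + (yn - 0.5) ** 2) / 0.5
    elif shape == "positive":
        g = xn             # assumed: mean rises from left to right
    elif shape == "negative":
        g = 1.0 - xn       # assumed: mean falls from left to right
    else:
        raise ValueError(f"unknown background shape: {shape}")
    return low + (high - low) * g

def add_background_noise(surface, sigma, rng):
    """Add zero-mean normal noise pixelwise to the background surface."""
    return surface + rng.normal(0.0, sigma, size=surface.shape)

def flat_smooth(image, size=3):
    """Smooth with a flat (box) kernel of odd width; edges are replicated."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros(image.shape, dtype=float)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + image.shape[0], dx:dx + image.shape[1]]
    return out / (size * size)

rng = np.random.default_rng(seed=0)
surface = background_surface(41, 41, "parabolic", 100.0, 200.0)
background = flat_smooth(add_background_noise(surface, 5.0, rng), size=3)
```

A flat kernel averages each pixel with its neighbors using equal weights, which lowers pixel-to-pixel variance while leaving a slowly varying surface such as g(x,y) essentially unchanged.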
Once a microarray image has been simulated, the signal-extraction toolbox Dearray uses statistical methods to segment the signal and the background pixels [6, 7]. Different levels of significance can be set for this procedure. Once the signal pixels are identified, a trimmed mean of their values gives an estimate of the signal mean. Background information is extracted by taking pixel information from the four corners of a given spot to estimate its mean. Actual signal expression is estimated by the difference between the two. If a spot’s irregularity in shape and signal (area of the spot, signal variation, etc.) is reflected by a low quality metric, then the spot can be flagged. At the final step, a linear corrective normalization procedure is carried out to compensate for variation in the dye response. Ratio intensities are then computed. A logarithmic scale applied to the ratios can be used to map the data to a desirable range.
3. Experimental Design and Statistical Data Analysis
The array model has more than twenty parameterized noise conditions. We consider thirteen commonly occurring noise conditions for this study. These are grouped into four categories, which then correspond to four experiments: (1) background noise, (2) shape noise, (3) surface noise, and (4) weak signal. Each category has five conditions, with some of the thirteen conditions occurring in more than one category. The experiments are described in Table 2. In experiments 1A through 4A, each factor can take on two levels, 0 or 1. In experiments 1B through 4B, the factors take on the levels −1 or 1. Assuming two levels for each noise factor, there are thirty-two conditions for each category. For each condition, 8 replicate arrays are generated, so there are 256 arrays per experiment. Each array has 1600 spots in a 40×40 matrix format. These numbers have been chosen to provide sufficient replicates while not resulting in inordinate image-processing time.
Table 2
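The run counts quoted above follow directly from the two-level, five-factor layout, as the following sketch shows. The function name and the factor labels (taken loosely from the background-noise experiment) are hypothetical; only the levels, the replicate count, and the array size come from the text.

```python
from itertools import product

def factorial_design(factors, levels, replicates):
    """Enumerate every run of a full factorial design with replication."""
    runs = []
    for combo in product(levels, repeat=len(factors)):
        for rep in range(replicates):
            run = dict(zip(factors, combo))
            run["replicate"] = rep
            runs.append(run)
    return runs

# Illustrative labels for the five factors of one experiment.
factors = ["parabolic", "signal_to_background", "spike", "snake", "expresser"]

design_a = factorial_design(factors, (0, 1), replicates=8)   # experiments "A"
design_b = factorial_design(factors, (-1, 1), replicates=8)  # experiments "B"

n_conditions = 2 ** len(factors)          # 32 noise conditions
n_arrays = len(design_b)                  # 32 conditions x 8 replicates = 256
n_spots = n_arrays * 40 * 40              # 1600 spots per array
```

With the spot as the experimental unit, each experiment therefore contributes 256 × 1600 = 409,600 observations before any screening.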
3.1. Experimental Conditions
The background–noise interaction involves noise that can alter the background and thereby influence signal extraction. Parabolic noise generates a concave background, and at different levels the backgrounds are expected to show more deviation. A high signal-to-background noise ratio reduces the gap between the average signal and background mean levels. Spike and snake noise create surface noise. Expresser variability simulates spots with expresser gene expressions. Noise degradations related to spot shapes are grouped together in the shape–noise interaction experiment. These include spot radius, inner-hole variation (from no hole to close to half the spot size), edge noise, and chord removal. Foreground noise is included to check its interaction with these factors. The third experiment, surface–noise interaction, combines shape variation with surface noise, both snake and scratch. The last experiment, weak signal–noise interaction, involves alterations in signal level, including foreground noise, spike noise, background unevenness, and signal-to-background ratio. This grouping is suited to analyzing the effects of weak signals on the signal-estimation process. The quality of microarray images is typically assessed by a trained microbiologist in the laboratory after image scanning. In this study, the noise-level parameters used for the different factor levels correspond to the kinds of noise distributions seen in practice. As noted in the original simulation paper [9], the exact parameters will vary, depending on the technology, and the ones used in this paper correspond to general conditions observed over many years of application since the development of Dearray in 1997 [6]. Although metrics have been proposed to quantify microarray quality [7], there is no direct way to determine the effect of each noise level on the metrics.
This is mostly attributed to the multivariate influence of the various degradations on the estimated signal. While it is no doubt true that individual statistical results obtained in this paper may not apply for different noise distributions, the general methodology will apply, and we believe that the conclusions drawn here are indicative of what one might expect with similar technology (for specific issues regarding parameters, refer to the original paper). To quantify the relation between the factor levels (−1, 0, +1), noise levels, and image quality, Table 3 provides measures corresponding to the different experiments and factor levels. All measures, except for the coefficient of variation, are defined at the spot level, and therefore have been averaged across all spots over all replicates. The table includes the means (expectations) of twelve measurements. There are four measurements for the red channel: SR_S.Dev is the standard deviation of the signal intensity; SR_SNR is the signal-to-noise ratio, which is defined as the ratio of the mean signal intensity to the local background standard deviation; SR_Quality is the channel quality metric defined in Ref. 7, which is formed as a minimum of four component qualities involving area, background, consistency, and saturation; and SR_BkDev is the standard deviation of the background intensity. There are four analogous measures for the green channel: SG_S.Dev, SG_SNR, SG_Quality, and SG_BkDev. There are four common measurements: |Error| is the absolute error for the signal estimation; Prop.Area is the proportional area relative to the mask size; Total-Q is the total quality, which is based on the intensity quality of both channels and the signal-to-noise ratio of both channels; and CV is the coefficient of variation of the intensity. In all experiments, the mean error, E[|Error|], of the actual to estimated signal ratios increases as the degradation increases.
Table 3
While most of the measurements in Table 3 show straightforward effects, there is an apparent anomaly in experiment 1, which treats background characteristics. The mean variation of the background (E[SR_BkDev], E[SG_BkDev]) shows an increase from the +1 to the −1 level, along with the mean SNR (E[SR_SNR], E[SG_SNR]), which goes from good to bad. Some decrease in the proportional area of the spots is also seen. A paradox occurs with respect to total quality: E[Total-Q] increases as the levels go from +1 to −1. This is due to the effect of the parabolic background on spots in the central portion of the array. There the image gets a very low background standard deviation, which improves the SNR, and therefore improves E[Total-Q].
3.2. Statistical Analysis of Data
For each set of experiments we used a $2^k$ factorial design, with $k = 5$ experimental factors. Each factor consists of two levels [12, 13]. Since our primary objective is to determine how the experimental noise factors affect the accuracy of detecting gene expression, the appropriate basic response variable for analysis is the absolute difference between the detected (estimated) and the true expression ratio at each spot. Because the distribution of these measurements tends to have a long right tail, we analyze the response variable in the log-log scale for the analysis-of-variance model [12]. More precisely, a constant 1 has been added to a response before taking the log transformation. The goal here is to reduce the potentially dominating influence of extremely large responses, yet not to dramatically increase the transformed absolute differences when the true expression ratios are close to 0, noting that log(0) goes to negative infinity. Here, taking a different transformation can be viewed as evaluating the responses at different scales.
One advantage of considering the absolute difference rather than the original difference, beyond its being a meaningful measurement, is that the responses are now all positive, so that regardless of what monotone transformation is taken, the relative order among responses is kept. Because of that, even though the outcomes are not invariant under nonlinear monotone transformations, they are less sensitive to the choice of transformation. In fact, we have conducted analyses using other concave transformations as well as rank-based methods, and in those cases the conclusion of the analysis remains unchanged. To further prevent outlying observations from having a dominating influence on the estimated main or interaction effects, we adopt the following screening procedure in our analysis. First, data points with an estimated expression ratio larger than 30 are excluded from the analysis. Such high-ratio points are often excluded in practice. Second, we have performed a regular least-squares estimation procedure [12] and produced studentized residuals [12] for each observation. A data point with an absolute studentized residual greater than 4 is considered an extreme outlying observation and is further excluded from the main analysis. The chance of having an absolute studentized residual greater than 4 is less than $10^{-4}$ (for normally distributed data). The use of studentized residuals gives us a statistically meaningful way to exclude points with very high estimated ratios without requiring a subjective cutoff point lower than 30. This two-part screening procedure eliminates about 1% of the total observations in each experiment. We fit an analysis-of-variance model with main effects, two-way interactions, and three-way interactions to the remaining data. Results for the main effects and two-way interactions based on F-tests are obtained. We test the significance of the five main effects and all ten first-order interactions simultaneously for each experiment.
Thus, we have a total of 15 hypothesis tests per experiment. We use the Bonferroni adjustment [12] to control the family-wise error rate (FWER) in multiple testing (testing main effects and first-order interactions). At the α=0.05 level, this gives 0.05/15≈0.0033 as the significance threshold for each test. Thus, the probability of erroneously rejecting any null hypothesis is controlled at 0.05. When there are two levels in each factor, as in all of our experiments, we construct an equivalent t-test for each of the 15 F-tests. By equivalence, we mean that the p value of an F-test is the same as that of the corresponding two-sided t-test. The signed t-test statistics and the p values, when significant, are reported. For each main effect, the t-test statistic is the difference, standardized by its standard error (S.E.), between the estimated effects of the two noise levels. Even though the S.E.s are not identical among all main effects, a consequence of using robust regression procedures, they are within 0.5 of each other. In other words, the size of the t-test statistic reflects the magnitude of the changes associated with the noise factor. All the main-effect t-test statistics are positive, which simply indicates that the presence of a high noise level creates more damage than that of a low noise level. For each two-way interaction, the t-test statistic is the standardized difference between the estimated cell mean when both high noise factors are present and the cell mean predicted from the outcomes of the individual noise factors, assuming no interaction. A positive t-test statistic indicates a “synergistic” interaction; that is, the damage caused by the presence of both noise factors is worse than the additive effect of the individual noise factors. A negative t-test statistic indicates an “antagonistic” interaction, the opposite of a “synergistic” interaction. Finally, throughout, the experimental unit is the individual spot in each array.
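The quantities used in this section can be checked with a short sketch. It uses only the Python standard library, and the function names are hypothetical. The interaction contrast is shown without standardization by its standard error (an assumption made to keep the example small), so it conveys only the sign convention, not the reported t-test statistic.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under the Bonferroni adjustment."""
    return alpha / n_tests

def two_sided_normal_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2.0))

def interaction_contrast(m_low_low, m_high_low, m_low_high, m_high_high):
    """Observed both-high cell mean minus its additive (no-interaction) prediction.

    Positive values correspond to a synergistic interaction and negative
    values to an antagonistic one. Unstandardized: the S.E. is omitted here.
    """
    effect_a = m_high_low - m_low_low   # effect of factor A alone
    effect_b = m_low_high - m_low_low   # effect of factor B alone
    predicted = m_low_low + effect_a + effect_b
    return m_high_high - predicted

threshold = bonferroni_threshold(0.05, 15)   # 5 main effects + 10 interactions
tail = two_sided_normal_tail(4.0)            # chance of |residual| > 4
```

Here `threshold` rounds to 0.0033 and `tail` is below 10^-4, in agreement with the values stated in the text for normally distributed data.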
4. Experimental Results
As noted in the introduction, signal-detection algorithms can recover the true signal more easily for images with less severe levels of noise. Thus, when comparing experiments 1A to 4A with experiments 1B to 4B, with noise level 0 (less severe) and noise level −1 (more severe), respectively, we expect that the true gene expression can be more accurately estimated in experiments 1A to 4A. This means that for data with more noise (−1; experiments 1B to 4B) the difference between the estimated and true expression ratio is greater. This is shown in Fig. 11, where, for all experiments, the distributions of these absolute differences at their most extreme noise level (all 0, or all −1) in log-log scale are presented by box plots. The top and bottom edges of each box correspond to the upper and lower quartiles of the measurements, respectively. The solid dots in the middle give the locations of the medians. Figure 11 clearly shows that the medians and upper quartiles of 1B to 4B are larger than the corresponding medians and upper quartiles of 1A to 4A. In this paper our interest goes beyond this general statement; it is to determine the kinds of noise reduction that significantly affect signal estimates. For instance, in experiment 1, concerning background noise, if there is a significant difference between levels −1 and 1 for the parabolic background factor (p<0.0033), then lessening the curvature of the parabolic background significantly improves estimation at level α=0.05. We reach this conclusion because the response for the factorial experiment is the absolute difference between the estimated and actual signal values. Let us consider experiment 1 in detail, the results being given in Table 4. The four columns of the table correspond to experiment 1B for all signal levels, 1B for low signal levels, 1A for all signal levels, and 1A for low signal levels.
For each experiment, data in the low signal level comprise the one-third of the original data points whose true signal values are in their lower tertile. We have considered low signal levels as a case in their own right (besides being included among all signal levels) because signal detection is made more difficult when a signal is low. The table is broken into main effects and interactions. For experiment 1B (level −1 versus level 1) using all signals, all five effects are significant. This means that reducing any of these effects can be helpful. They are also all significant for low signals. Note that all five factors in the experiment directly affect pixel values, either raising or lowering them for the affected pixels, and the difference in degrees between levels −1 and 1 significantly affects signal estimation. The magnitude of the t-test statistics suggests that the high outlier and spike noise levels are more damaging to the image than the others. Table 4
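The low-signal subset used in the tables (the one-third of data points whose true signal values lie in the lower tertile) can be selected as in the following sketch, which assumes NumPy; the function name and the example values are hypothetical.

```python
import numpy as np

def low_signal_mask(true_signal):
    """Boolean mask marking spots whose true signal is in the lower tertile."""
    values = np.asarray(true_signal, dtype=float)
    cutoff = np.quantile(values, 1.0 / 3.0)  # upper bound of the lower tertile
    return values <= cutoff

# Illustrative true signal values for 300 spots.
true_signal = np.arange(1, 301)
mask = low_signal_mask(true_signal)
low_signals = true_signal[mask]
```

The same mask can then be applied to the response variable so that the factorial analysis is repeated on the low-signal spots alone.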
If we now consider experiment 1A (level 0 versus level 1) for all signals, both spike and snake effects become insignificant. This means that, relative to snake or spike noise, signal estimation is not significantly different at these two levels. Looking at the fourth column, we see that spike noise is still significant for level 0 versus level 1 for low signals. For these, there is a significant difference in performance of the algorithm relative to spike noise. Interpretation of interactions can often be difficult, but in some cases it can be revealing. For instance, confining ourselves to the case of all signal levels, in experiment 1B we see that there is interaction between the signal-to-background noise and the parabolic effect. This is not surprising because the ratio is affected by the background. The interaction of the outlier effect and spike noise is also reasonable since both produce extreme values on the microarray. The large positive t-test statistic suggests a strong “synergistic” interaction effect throughout all four scenarios. A similar type of interaction is observed when both signal-to-background noise and parabolic-background noise levels are high. Figures 12 and 13 illustrate the mixed visual effects between signal-to-background noise and parabolic-background noise and spike noise, respectively, with the underlying true spot-intensity distributions being the same in each part and with only the noise factors contributing to the differences. For experiment 2 (shape noise), in Table 5 we see that four of the factors are significant for experiment 2B, for all signals or just low signals. Among them, the strongest factors are the inner hole size and the foreground noise. The effect of foreground noise is similar to background noise in that it directly affects pixel values. 
The effect of low spot radius, large inner hole size, and excessive chord removal is to lessen the signal area, thereby reducing the pixel area over which the signal is to be estimated. Chord removal is not significant in experiment 2A for low signals, which means that at level 0 there is insufficient chord removal to significantly affect signal estimation relative to level 1. The fact that edge noise is not significant in experiment 2B indicates that the imaging algorithm can deal equally well with spot detection at both levels relative to handling edge noise. Table 5
There is an apparent anomaly with regard to edge noise in experiment 2A: edge noise is significant relative to levels 0 and 1, but not with respect to levels −1 and 1. This phenomenon is an “apparent” anomaly because one cannot compare p values across different experiments with full confidence—although we often do make such comparisons in a heuristic mode. Recall that the denominator of the F-statistic contains a variance estimator, and therefore a low variance will tend to make the F-statistic significant. Because the variance is very low in experiment 2A in contrast to experiment 2B, significance in the former and lack of significance in the latter is a reasonable consequence and does not imply that the difference in damage between two levels in experiment 2B is less than that in 2A. The damage effect of edge noise starts to show in 2A when the effects of inner hole size and foreground noise are not as dominating as they are in 2B. In experiment 2B, the effect of edge noise is still present in its significant interaction with both spot radius and chord noise. Figure 14 shows the mixed visual effect between the spot radius and chord noise. Regarding interaction in experiment 2B, the three distinctly geometric factors (spot radius, inner hole, and chord noise) interact significantly for both the overall signal and low-signal cases. This is reasonable because each affects the area over which signal estimation takes place. Interaction is greatly reduced in experiment 2A, particularly for low signals, where only interaction between the inner hole and chord removal is strongly significant. Figures 15 and 16 show the mixed visual effects of the inner hole with spot radius and chord noise, respectively. Whereas experiment 2 mixes shape effects with foreground noise and edge noise, experiment 3 mixes them with scratch and snake noise. 
Table 6 shows a fair amount of consistency between the two experiments with regard to the three geometric factors relative to both main effects and interaction. One notable change is that the interaction between spot radius and chord removal changes from being “antagonistic” in experiment 2 to being “synergistic” in experiment 3. Even though the order of estimated cell means in the four noise level combinations remains the same in both experiments, in experiment 3 the estimated cell mean when both noise factors are present is much higher than in the other three; consequently, a significant “synergistic” interaction is observed. For the most part, snake and scratch noise show no significant main effects. The exception is scratch noise for low signals in experiment 3B. This is quite plausible because scratch noise causes a strip of low values, thereby reducing an already low signal. Note also the interaction of snake and scratch noise in three of the four experiments. Table 6
Experiment 4 concerns signal conditions, in particular, signal deviation, signal-to-background ratio, and foreground noise. These conditions are bound to affect signal estimation, and the main-effects part of Table 7 demonstrates this. The only exception is for low-signal values when comparing levels 0 and 1 in experiment 4A. Because signal deviation is tied to the signal mean, a low signal diminishes this deviation, so signal deviation is not significant for low-signal values. Figure 17 shows the mixed visual effects between signal-to-background and spike noise. As has been common throughout, overall interaction between the factors is much less relative to levels 0 and 1 than with respect to levels −1 and 1.
Table 7
5. Conclusion
Factorial analysis has been applied to simulated microarray images to study the effects and interaction of noise types at different noise levels. This type of analysis provides a general paradigm for investigating the effects of noise within a comprehensive simulation environment, thereby providing a tool by which one can quantitatively determine which kinds of noise should be mitigated in microarray technology. For instance, from the analysis described in this paper, it can be concluded that elimination of the inner hole and the stabilizing of spot radius will have a strongly beneficial effect on signal estimation. Additional information can be found online.14
Appendix
Parameter settings for the microarray simulation. The notation N(a,b) denotes the normal distribution with mean a and variance b; U[a,b] is the uniform distribution on the interval [a,b]; U{a,b,c,…} is the uniform distribution on the indicated set of values; Β(a,b) is the beta distribution with parameters a and b; and exp(a) is the exponential distribution with mean a.
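As a hedged illustration of how the notation above maps onto a random number library, the following sketch draws samples with NumPy. All parameter values are hypothetical placeholders and are not the settings used in the simulation, and the function name `sample_parameters` is an assumption made for this example. The main points are that N(a,b) is parameterized by variance while NumPy expects a standard deviation, and that exp(a) is parameterized by its mean rather than its rate.

```python
import numpy as np

def sample_parameters(rng, size):
    """Draw samples following the appendix notation (hypothetical parameter values)."""
    return {
        # N(a, b): b is a variance, so NumPy's scale (a standard deviation) is sqrt(b).
        "normal": rng.normal(loc=0.0, scale=np.sqrt(4.0), size=size),
        # U[a, b]: continuous uniform on the interval (NumPy samples the half-open [a, b)).
        "uniform": rng.uniform(low=2.0, high=5.0, size=size),
        # U{a, b, c, ...}: each listed value is equally likely.
        "discrete": rng.choice([1, 2, 4, 8], size=size),
        # Beta(a, b): beta distribution with shape parameters a and b.
        "beta": rng.beta(2.0, 5.0, size=size),
        # exp(a): a is the mean, which is NumPy's scale parameter (not the rate).
        "exponential": rng.exponential(scale=3.0, size=size),
    }

rng = np.random.default_rng(0)  # fixed seed so the draws are repeatable
draws = sample_parameters(rng, size=200_000)
print({name: round(float(values.mean()), 2) for name, values in draws.items()})
```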
Acknowledgments
Y.B. was supported by the Center for Environmental and Rural Health at Texas A&M University. E.R.D. was supported by the National Human Genome Research Institute. D.V.N. was supported by the National Cancer Institute (CA-90301). R.J.C. was supported by a grant from the National Cancer Institute (CA-57030), and by the Texas A&M Center for Environmental and Rural Health via a grant from the National Institute of Environmental Health Sciences (P30-ES09106). N.W. was supported by the National Cancer Institute (CA-74552).
REFERENCES
1. M. Schena, D. Shalon, R. W. Davis, and P. O. Brown, “Quantitative monitoring of gene expression patterns with a complementary DNA microarray,” Science 270, 467–470 (1995).
2. M. N. Arbeitman, E. E. Furlong, F. Imam, E. Johnson, B. H. Null, B. S. Baker, M. A. Krasnow, M. P. Scott, R. W. Davis, and K. P. White, “Gene expression during the life cycle of Drosophila melanogaster,” Science 297, 2270–2275 (2002).
3. S. Chu, J. DeRisi, M. Eisen, J. Mulholland, D. Botstein, P. O. Brown, and I. Herskowitz, “The transcriptional program of sporulation in budding yeast,” Science 282, 699–705 (1998).
4. J. DeRisi, L. Penland, P. O. Brown, M. L. Bittner, P. S. Meltzer, M. Ray, Y. Chen, Y. A. Su, and J. M. Trent, “Use of a cDNA microarray to analyse gene expression patterns in human cancer,” Nat. Genet. 14, 457–460 (1996).
5. I. S. Lossos, A. A. Alizadeh, M. Diehn, R. Warnke, Y. Thorstenson, P. J. Oefner, P. O. Brown, D. Botstein, and R. Levy, “Transformation of follicular lymphoma to diffuse large-cell lymphoma: alternative patterns with increased or decreased expression of c-myc and its regulated genes,” Proc. Natl. Acad. Sci. U.S.A. 99, 8886–8891 (2002).
6. Y. Chen, E. R. Dougherty, and M. L. Bittner, “Ratio-based decision and quantitative analysis of cDNA microarrays,” J. Biomed. Opt. 2(4), 364–374 (1997).
7. Y. Chen, V. Kamat, E. R. Dougherty, M. L. Bittner, P. S. Meltzer, and J. Trent, “Ratio statistics of gene expression levels and application to microarray data analysis,” Bioinformatics 18(9), 1207–1215 (2002).
8. D. V. Nguyen, A. B. Arpat, N. Wang, and R. J. Carroll, “DNA microarray experiments: biological and technological aspects,” Biometrics 58(4), 701–717 (2002).
9. Y. Balagurunathan, E. R. Dougherty, Y. Chen, M. L. Bittner, and J. M. Trent, “Simulation of cDNA microarrays via a parameterized random signal model,” J. Biomed. Opt. 7(3), 507–523 (2002).
10. K. Kerr and G. A. Churchill, “Experimental design for gene expression microarrays,” Biostatistics 2, 183–202 (2001).
11. M. K. Kerr and G. A. Churchill, “Statistical design and analysis of gene expression microarrays,” Genet. Res. 77(2), 123–128 (2001).
|