Tomokazu Konishi (firstname.lastname@example.org)
Faculty of Bioresource Sciences, Akita Prefectural University, Shimo-Shinjyo, Akita 010-0195, Japan
Gene expression microarray data often contain defects caused by uneven hybridization and dust contamination. The affected data should be removed prior to analysis to prevent loss of analytical accuracy and false positive results. This paper presents a parameter-scanning algorithm that detects such defects on the basis of the character of data distributions. The cell data are scanned exhaustively using a window algorithm, and windows with an index value greater than a threshold are recognized as defects and removed from the array data. The index is calculated from the differences between the target and an ideal standard of hybridization, which is obtained as a trimmed mean among experiments; the index represents the statistical center of the differences in each section. The threshold is derived from a screening level designated by the operator, but its value has only a limited effect on the effectiveness of data cancellation. The validity of the algorithm and the effects of data cancellation are tested using GeneChip data obtained from a series of experiments. The algorithm is demonstrated to greatly improve the reproducibility of measurements while removing only a small amount of faultless data.
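The window scan described above can be illustrated with a minimal Python sketch. It is not the implementation used in this work, and several details are assumptions made only for illustration: the arrays are taken to hold log-scale cell values on a common grid, the median stands in for the "statistical center" of the differences, the index is compared as an absolute value, and the names `trimmed_mean`, `standard_array`, and `scan_defects` as well as the default window size, step, trim fraction, and threshold are hypothetical.

```python
from statistics import median


def trimmed_mean(values, trim_fraction=0.2):
    """Mean after discarding the lowest and highest trim_fraction of values.

    Assumes trim_fraction < 0.5 so that at least one value is kept.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return sum(kept) / len(kept)


def standard_array(arrays, trim_fraction=0.2):
    """Cell-wise trimmed mean among experiments (the assumed ideal standard)."""
    rows, cols = len(arrays[0]), len(arrays[0][0])
    return [
        [trimmed_mean([a[r][c] for a in arrays], trim_fraction) for c in range(cols)]
        for r in range(rows)
    ]


def scan_defects(target, standard, window=3, step=1, threshold=0.5):
    """Return a boolean mask; True marks cells inside a window flagged as a defect."""
    rows, cols = len(target), len(target[0])
    # Differences between the target array and the standard, cell by cell.
    diff = [[target[r][c] - standard[r][c] for c in range(cols)] for r in range(rows)]
    mask = [[False] * cols for _ in range(rows)]
    for r0 in range(0, rows - window + 1, step):
        for c0 in range(0, cols - window + 1, step):
            cells = [
                diff[r][c]
                for r in range(r0, r0 + window)
                for c in range(c0, c0 + window)
            ]
            # Assumed index: absolute median of the differences in the window.
            index = abs(median(cells))
            if index > threshold:
                for r in range(r0, r0 + window):
                    for c in range(c0, c0 + window):
                        mask[r][c] = True
    return mask
```

In this sketch, cells marked `True` in the returned mask would be excluded from subsequent analysis. Because a whole window is cancelled whenever its index exceeds the threshold, a few faultless cells bordering a defect are removed along with it.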