Motivation

DNA copy number aberrations (CNAs) and gene expression (GE) changes provide valuable information for studying chromosomal instability and its consequences in cancer. Several loci are associated with cancer-type specific biological pathways that have been described in the literature: CNAs of chromosome (chr) 7p13 were significantly correlated with the epidermal growth factor receptor signaling pathway in glioblastoma multiforme, chr 13q with NF-kappaB cascades in bladder cancer, and chr 11p with the Reck pathway in breast cancer. In all three data sets, gene sets related to cell cycle/division such as M phase, DNA replication, and cell division were also associated with CNAs. Our results suggest that CNAs are both directly and indirectly correlated with changes in expression and that it is beneficial to examine the indirect effects of CNAs.

1. INTRODUCTION

Nearly all cancers are caused by abnormalities in the DNA (Vogelstein and Kinzler, 2004). Structural changes of chromosomal regions such as aneuploidies, translocations, copy number aberrations (CNAs), and point mutations have been observed in various tumors (Lengauer (2002) analyzed a set of aCGH and GE profiles from the same 14 breast cancer cell lines hybridized on cDNA microarrays. For each gene, they calculated the difference in mean expression between samples with and without amplification, divided by the standard deviations, and compared it with the values obtained from random permutations to estimate statistical significance. They reported that 44% of the highly amplified genes (>2.5 in copy number ratio) were up-regulated and that the percentage decreased with a lower level of amplification. Using the same statistical method, Jarvinen (2006) analyzed CNAs and GEs from laryngeal squamous cell carcinoma cell lines and found that 39% of amplified regions were up-regulated and 14% of deleted regions were down-regulated.
These percentages were lower in the primary tumors: only 18% of amplified regions were up-regulated, and there were no changes in the deleted regions. Chaudhary and Schmidt (2006) stimulated the prostate cancer cell line DU145 with serum and found that a large proportion of genes in deleted regions were down-regulated, but most genes in amplified regions did not show any change in GE. Although different tumor types and quantification methods give varied estimates, these results demonstrate that copy number strongly affects the transcription of the genes contained in the aberration. This direct relationship between structural changes in the DNA and gene expression has been used to identify or verify candidate cancer genes and pathways (Soroceanu (2007) observed in glioblastoma that the DNA loss in PTEN, a known tumor suppressor gene located on chr 10, is highly correlated with over-expression of IGFR or EGFR, both of which are located away from chr 10. In the following, we call the relationship between CNAs and GE in the same location a direct interaction and that between different locations an indirect one.

In the current study, we investigate both the direct and indirect relationships between structural changes measured by aCGH and functional changes measured by expression arrays, by analyzing three data sets in which both copy number and expression were available. For this type of integration, there are several difficulties to overcome. The first is that the choice of data sets is limited. While both aCGH and expression data sets are plentiful, paired data sets with both DNA and RNA data on the same set of patients are scarce. It is possible to infer relationships from unpaired data sets, but that process is prone to false positives. The second issue is that the probes in the two platforms generally vary greatly, both in array type and in resolution.
The newer aCGH arrays have oligonucleotide probes with much higher resolution, but the arrays in the data sets we use are two-channel arrays using Bacterial Artificial Chromosomes (BACs) and thus have a low resolution, on the order of 1 Mb. The platforms for expression data, on the other hand, are generally oligonucleotide arrays with higher resolution. Reconciling the two requires resolving the many-to-one or one-to-many mappings in each chromosomal segment and may require judicious averaging of the probe values in the higher-resolution platform. The third difficulty is that many genes are co-expressed and that CNAs occur simultaneously in multiple locations (Chin and that are highly correlated, using a biclustering approach. Biclustering has been popular in expression profile studies, as it attempts to find a subset of genes having similar expression patterns under a group of conditions. Such an entity is often called a module. For a comparison of various.
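
The many-to-one mapping between expression probes and BAC clones mentioned above can be illustrated with a minimal sketch. It is not the procedure used in this study: the tuple layouts, the function name `average_expression_per_clone`, the clone and probe identifiers, and the choice of a plain arithmetic mean are all assumptions made for illustration. Each expression probe is assigned to the clone whose genomic interval contains it, and the probe values within a clone are averaged.

```python
from collections import defaultdict
from statistics import mean


def average_expression_per_clone(clones, probes):
    """Average the expression probe values that fall inside each BAC clone.

    clones: list of (clone_id, chrom, start, end) tuples (hypothetical layout).
    probes: list of (chrom, position, value) tuples (hypothetical layout).
    Returns {clone_id: mean value}; clones containing no probe are omitted.
    """
    # Group probes by chromosome so each clone only scans its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, pos, value in probes:
        by_chrom[chrom].append((pos, value))

    averaged = {}
    for clone_id, chrom, start, end in clones:
        inside = [v for pos, v in by_chrom[chrom] if start <= pos <= end]
        if inside:
            averaged[clone_id] = mean(inside)
    return averaged


# Toy data: two ~1 Mb clones on chr7 and one on chr10 (illustrative values only).
clones = [
    ("bac_A", "chr7", 0, 1_000_000),
    ("bac_B", "chr7", 1_000_001, 2_000_000),
    ("bac_C", "chr10", 0, 1_000_000),
]
probes = [
    ("chr7", 150_000, 1.0),
    ("chr7", 900_000, 3.0),
    ("chr7", 1_500_000, -0.5),
    ("chr13", 10_000, 2.0),
]
result = average_expression_per_clone(clones, probes)
print(result)
```

In this toy input, the two probes inside `bac_A` are averaged, `bac_B` keeps its single probe value, and `bac_C` is dropped because no probe maps to it. A real analysis would also need a policy for probes that overlap no clone or more than one, which this sketch does not address.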