DNA microarray and next-generation sequencing provide data that can be used for the genetic analysis of multiple quantitative traits such as gene expression levels, transcription factor binding profiles, and epigenetic signatures. ... and RM11-1a) based on DNA microarrays [1,2,9,10,11]. To generate a matched dataset of chromatin accessibility for this set of yeast individuals, we carried out Formaldehyde-Assisted Isolation of Regulatory Elements followed by sequencing (FAIRE-seq) for a total of 96 segregants from the cross of BY4716 and RM11-1a [12]. In this study, we sought to dissect the genetic architecture of the regulation of gene expression and chromatin accessibility by analysing previous data generated in yeast and human based on different technical platforms and experimental designs. Our main goal was to find differences in the overall regulatory structure between open chromatin and gene expression. We were also interested in determining whether the two distant species, yeast and human, differ in genetic regulatory architecture, and in estimating the effect of technical or experimental differences in genotyping and in measuring the quantitative traits.

Methods

Processing of human genotype data

Genotype data from the HapMap project [13] and 1000 Genomes Project [14] for 70 Yoruba (YRI) lymphoblastoid cell lines were used for DNase-seq analysis [8]. The genotype of each single nucleotide polymorphism (SNP) locus was estimated in a Bayesian framework by means of the BIMBAM tool [15], and the genotype estimates were made available at http://eqtl.uchicago.edu/dsQTL_data/GENOTYPES/. We first selected 2,157,286 genetic markers (SNPs) with a minor allele frequency greater than 30%.
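The minor allele frequency filter described above can be illustrated with a minimal sketch. It assumes genotypes are stored as per-individual allele dosages between 0 and 2 (for example, posterior mean genotypes); the function names, SNP identifiers, and toy values are hypothetical and are not taken from the original analysis.

```python
from typing import Dict, List


def minor_allele_frequency(dosages: List[float]) -> float:
    """Return the minor allele frequency of one SNP.

    `dosages` holds one value per individual in [0, 2]: the (expected)
    number of copies of the alternate allele. This encoding is an
    assumption of the sketch.
    """
    if not dosages:
        raise ValueError("no genotypes supplied")
    # Each diploid individual carries two alleles.
    alt_freq = sum(dosages) / (2.0 * len(dosages))
    return min(alt_freq, 1.0 - alt_freq)


def filter_by_maf(genotypes: Dict[str, List[float]], threshold: float) -> List[str]:
    """Keep SNP identifiers whose minor allele frequency exceeds `threshold`."""
    return [
        snp
        for snp, dosages in genotypes.items()
        if minor_allele_frequency(dosages) > threshold
    ]


# Toy data (hypothetical): three SNPs genotyped in four individuals.
toy_genotypes = {
    "snp_common": [0, 1, 2, 1],          # alternate allele frequency 0.5
    "snp_rare": [0, 0, 0, 1],            # alternate allele frequency 0.125
    "snp_mostly_alt": [2, 2, 1.9, 2],    # minor allele is the reference allele
}

kept = filter_by_maf(toy_genotypes, threshold=0.30)
print(kept)
```

Taking the minimum of the alternate allele frequency and its complement keeps the filter symmetric, so a SNP is retained only when both alleles are common in the sample.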
To reduce complexity and ease interpretation, we focused on the genetic variants that can alter the function of a protein (non-synonymous SNPs) or the abundance of a protein (SNPs associated with the expression level of a nearby gene). The SIFT tool [16] was used to identify non-synonymous SNPs. We performed expression QTL mapping as described below and identified SNPs that were associated (p < 10^-5) in cis (within 200 kb from the nearest gene). Taken together, 7,211 SNPs were identified for QTL mapping.

Processing of human gene expression data

RNA-seq data for 69 YRI lymphoblastoid cell lines [5] were downloaded from http://eqtl.uchicago.edu/RNA_Seq_data/results. A total of 18,147 genes were used after normalization to zero mean and unit variance.

Processing of human chromatin accessibility data

DNase-seq data for 70 YRI lymphoblastoid cell lines [8] were downloaded from http://eqtl.uchicago.edu/dsQTL_data/MAPPED_READS/. Sequence reads from multiple replicates for each sample were combined, and F-Seq [17] was run to identify read peaks in each sample. The statistical significance of each peak was determined by fitting the data to a gamma distribution to obtain a p-value (script obtained from the F-Seq authors). A threshold of p < 10^-3 was used to identify significant peaks in each sample. Overlapping peaks across the YRI individuals were merged into a single peak by using the mergeBed command of BEDTools [18], resulting in a total of 265,130 accessible chromatin regions. For each sample, the number of DNase-seq reads mapped to each region was counted, and the read count was normalized as previously suggested [19,20] to obtain normalized chromatin accessibility, which was then further normalized to zero mean and unit variance across the YRI samples. Accessible regions falling on promoters or enhancers were identified based on the chromatin annotation by Ernst et al. [21].
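The peak merging and the final standardization step described above for the DNase-seq data can be sketched as follows. This is an illustration only, not the pipeline used in the study: it stands in for the BEDTools merge with a plain interval union, it assumes BED-style half-open coordinates, it merges intervals that merely touch, and it uses the population standard deviation. The function names and toy coordinates are hypothetical, and the intermediate read-count normalization of [19,20] is not reproduced.

```python
from statistics import mean, pstdev
from typing import List, Tuple

# (chromosome, start, end); half-open BED-style coordinates are assumed.
Interval = Tuple[str, int, int]


def merge_intervals(peaks: List[Interval]) -> List[Interval]:
    """Merge overlapping peaks pooled from all samples into union regions."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(peaks):
        last = merged[-1] if merged else None
        # Overlapping or touching intervals on the same chromosome are joined
        # (treating touching intervals as mergeable is a choice of this sketch).
        if last is not None and last[0] == chrom and start <= last[2]:
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def standardize(values: List[float]) -> List[float]:
    """Scale one region's values across samples to zero mean, unit variance."""
    mu = mean(values)
    sd = pstdev(values)  # population SD; the original choice is not stated
    if sd == 0:
        raise ValueError("constant values cannot be standardized")
    return [(v - mu) / sd for v in values]


# Toy peaks (hypothetical) pooled from several samples.
toy_peaks = [
    ("chr1", 150, 300),
    ("chr1", 100, 200),
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
regions = merge_intervals(toy_peaks)
print(regions)

# Normalized accessibility of one region across three samples (toy values).
z_scores = standardize([2.0, 4.0, 6.0])
print(z_scores)
```

Standardizing each region across samples puts all regions on a common scale, so association statistics are comparable between regions with very different read depth.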
A total of 45,781 chromatin regions were found to reside in active promoters, weak promoters, poised promoters, strong enhancers, and weak enhancers annotated in the GM12878 lymphoblastoid cell line.

Processing of yeast data

Genotype and gene expression microarray data [10] used in previous expression QTL studies [1,2,9] for >100 segregants from a cross between two parental strains of yeast (BY4716 and RM11-1a) were obtained. As previously suggested [22], adjacent genetic markers with fewer than three genotypic mismatches across the yeast strains were merged into an average genotype profile, resulting in 1,533 unique markers. We employed the microarray dataset of normalized expression levels of 5,352 genes as previously used [10]. FAIRE experiments were performed based on the published protocol [23]. The FAIRE-seq data for the 96 yeast strains from our previous work [12] are available at the Gene Expression Omnibus (GEO) database with accession number GSE33466. Briefly, we identified open chromatin regions in 96 yeast segregants by means of F-Seq [17]. The overlapping peaks across the 96 strains were merged into a single peak by using BEDTools [18], resulting in a total of 7,527 accessible.