Thursday, December 23, 2010

Charting histone modifications and the functional organization of mammalian genomes

http://www.nature.com/nrg/journal/v12/n1/pdf/nrg2905.pdf

Friday, December 17, 2010

WNK and Diseases

http://hyper.ahajournals.org/cgi/reprint/51/3/588

http://www.ncbi.nlm.nih.gov/pubmed/18547946

http://www.ncbi.nlm.nih.gov/pubmed/18843116

http://www.ncbi.nlm.nih.gov/pubmed/16221868

http://www.ncbi.nlm.nih.gov/pubmed/19347040

http://www.ncbi.nlm.nih.gov/pubmed/18809789

http://www.ncbi.nlm.nih.gov/pubmed/18955660

Epigenetics: Let's run together

http://www.nature.com/nature/journal/v465/n7299/full/nature09230.html

DC emailed us this article, and I finally have an opportunity to work on the epigenetics of complex diseases.

Thursday, December 16, 2010

Relative Risk (RR): The ratio of the probability of disease in one group to the probability in another group. Although reported less often in SNP studies than odds ratios, the RR is more intuitive (and, for a risk factor, generally lower than the corresponding OR). At least one editor of a scientific journal [1] has indicated that, when judging whether to publish a paper with a new finding, they hope to see an RR of three or more. Note that many SNP publications cited here in SNPedia do not meet this criterion. For a more detailed explanation of RR, see Wikipedia's RR entry.
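
To make the RR/OR difference concrete, here is a minimal Python sketch that computes both measures from the same 2×2 table, assuming cohort-style sampling so that risks can be estimated directly. The counts and the function name `rr_and_or` are hypothetical. For a risk factor the OR sits further from 1 than the RR; the two only converge when the disease is rare.

```python
def rr_and_or(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Return (relative risk, odds ratio) for a 2x2 table of counts.

    Assumes cohort-style sampling, so disease risk in each group is estimable.
    """
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    rr = risk_exposed / risk_unexposed
    odds_ratio = (exp_cases / exp_noncases) / (unexp_cases / unexp_noncases)
    return rr, odds_ratio


# Hypothetical counts: 30 of 100 carriers and 10 of 100 non-carriers develop disease.
rr, odds_ratio = rr_and_or(30, 70, 10, 90)
print(f"RR = {rr:.2f}, OR = {odds_ratio:.2f}")  # RR = 3.00, OR = 3.86
```

Here the RR meets the "three or more" bar while the OR looks even more impressive, which is one reason the two should not be used interchangeably when disease risk is high.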

Population attributable fraction

The percentage of cases of a disease in a given population that are theoretically explained by a certain genotype (or other cause, such as exposure to a mutagen). Put another way, it is the proportion of cases of the disease that would not occur if this genotype (or exposure) did not exist. Example: by itself, even the most significant SNP genotype found so far for schizophrenia probably accounts for only 1–2% of schizophrenia diagnoses. Also known as the population attributable risk.

The population-attributable fraction (F_p) for each SNP was calculated as

$F_p = \dfrac{\sum_i F_i (R_i - 1)}{1 + \sum_i F_i (R_i - 1)}$

where R_i indexes the estimate associated with heterozygous and homozygous carriage of risk-increasing genotypes compared to the normal homozygote, and F_i denotes the genotype frequencies in the controls.
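
A minimal Python sketch of a per-SNP F_p calculation, assuming the standard multi-category form F_p = Σ_i F_i(R_i − 1) / [1 + Σ_i F_i(R_i − 1)], which equals 1 − 1/Σ_i F_i R_i when the frequencies sum to 1 and the reference genotype has R = 1. The genotype frequencies and risk estimates below are made up; in a case–control study the R_i would usually be odds ratios standing in for relative risks.

```python
def population_attributable_fraction(freqs, risks):
    """F_p from control genotype frequencies F_i and risk estimates R_i.

    Include the reference (non-risk homozygous) genotype with R = 1; it then
    contributes nothing to the excess-risk sum.
    """
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("genotype frequencies should sum to 1")
    excess = sum(f * (r - 1.0) for f, r in zip(freqs, risks))
    return excess / (1.0 + excess)


# Hypothetical SNP: control genotype frequencies 0.49 / 0.42 / 0.09 and
# per-genotype risks 1 (reference), 1.2 (heterozygote), 1.44 (risk homozygote).
freqs = [0.49, 0.42, 0.09]
risks = [1.0, 1.2, 1.44]
fp = population_attributable_fraction(freqs, risks)
print(f"F_p = {fp:.3f}")  # F_p = 0.110
```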

See http://www.nature.com/nature/journal/v466/n7302/full/nature09114.html#/affil-auth
See http://www.snpedia.com/index.php/Glossary

See http://www.apo-sys.eu/aposys/Publications/Publications2010-pdf/Prasad%20R.pdf

We calculated the population attributable fraction (PAF) as²⁶

$\mathrm{PAF} = \dfrac{q(\mathrm{OR} - 1)}{1 + q(\mathrm{OR} - 1)}$

where OR is the odds ratio and q is the proportion of exposed individuals (individuals carrying the risk allele) in the control group; here the risk allele is the A allele of rs1934179 in DGKK.
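
For a single exposure, the usual (Levin) form is PAF = q(OR − 1) / [1 + q(OR − 1)]. The Python sketch below assumes "exposed" means carrying at least one copy of the risk allele, so q is estimated from control genotype counts; the counts, the OR and both function names are hypothetical.

```python
def carrier_proportion(n_hom_risk, n_het, n_hom_ref):
    """Proportion of controls carrying at least one copy of the risk allele."""
    return (n_hom_risk + n_het) / (n_hom_risk + n_het + n_hom_ref)


def levin_paf(q, odds_ratio):
    """PAF = q(OR - 1) / [1 + q(OR - 1)], with the OR approximating the relative risk."""
    excess = q * (odds_ratio - 1.0)
    return excess / (1.0 + excess)


# Hypothetical controls: 10 risk homozygotes, 40 heterozygotes, 50 reference homozygotes.
q = carrier_proportion(10, 40, 50)
paf = levin_paf(q, odds_ratio=1.4)
print(f"q = {q:.2f}, PAF = {paf:.3f}")  # q = 0.50, PAF = 0.167
```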

See http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.721.html

The population genetic attributable risk percent (PAR) was estimated for each variant, which defines what percentage of the total risk for lung cancer is due to genetic effect of that variant:

$\mathrm{PAR} = \dfrac{\sum_i p_i(\mathrm{OR}_i - 1)}{1 + \sum_i p_i(\mathrm{OR}_i - 1)} \times 100\%$

where p_i is the prevalence of that i-th genotype associated with lung cancer among control subjects and OR_i is OR associated with that genotype (11, 12). We used the lowest-risk genotype as the reference to estimate ORs in the above logistic regression model with adjustment of covariates. Similarly, we also jointly estimated PAR for the two loci (rs1051730 and rs481134) using haplotype-specific ORs.

See http://cancerres.aacrjournals.org/content/70/8/3128.full
http://www.nejm.org/doi/suppl/10.1056/NEJMoa0810440/suppl_file/nejm_hirschfield_2544sa1.pdf
http://jech.bmj.com/content/55/7/508.full.pdf

GCTA: A Tool for Genome-wide Complex Trait Analysis

http://www.sciencedirect.com/science?_ob=MImg&_imagekey=B8JDD-51PYH57-2-1&_cdi=43612&_user=1072900&_pii=S0002929710005987&_coverDate=12%2F16%2F2010&_sk=%23TOC%2343612%239999%23999999999%2399999%23FLA%23display%23Articles_in_Press%23tagged%23Volume%23first%3D0%23date%23(16_December_2010)%23&view=c&_gw=y&wchp=dGLbVzz-zSkzS&md5=3b4e6c9e00c831a21c418c9a237b12c7&ie=/sdarticle.pdf

http://gump.qimr.edu.au/gcta/

Tuesday, December 14, 2010

How SYY Conducts a GWAS

http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.732.html

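Working through this example carefully should be enough for my current needs.

It cites a 2010 Nature article (ref. 36), linked below, which is also a very useful reference.

http://www.nature.com/nature/journal/v466/n7302/full/nature09114.html#/affil-auth

That article explains how to calculate the population-attributable fraction (F_p) for each SNP.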

Statistical analysis.

PCA was conducted using EIGENSTRAT software. The first two eigenvectors generated were selected for plotting, and an EIGENSTRAT procedure was used to generate adjusted statistics for GWAS. Genome-wide association analysis at the single-marker level and Hardy-Weinberg equilibrium analysis were performed using PLINK. In the replication study, allelic association analysis was conducted using SHEsis³⁵.
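
The idea behind the EIGENSTRAT PCA step can be sketched with numpy: each SNP is centred by 2p and scaled by sqrt(p(1 − p)), and the top principal components of the resulting matrix are plotted. This is an illustration of the technique on simulated data, not EIGENSTRAT's interface; the function name and the two-population simulation are assumptions.

```python
import numpy as np


def genotype_pcs(genotypes, n_pcs=2):
    """Top principal-component scores of an individuals x SNPs dosage matrix (0/1/2).

    Each SNP is centred by 2p and scaled by sqrt(p(1 - p)), where p is the
    sample allele frequency, before a singular value decomposition.
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    p = mean / 2.0
    keep = (p > 0.0) & (p < 1.0)  # monomorphic SNPs have zero variance
    z = (g[:, keep] - mean[keep]) / np.sqrt(p[keep] * (1.0 - p[keep]))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # one row of PC scores per individual


# Simulated example: two populations of 50 individuals with different allele frequencies.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.1, 0.9, size=300)
freq_b = 1.0 - freq_a
pop_a = rng.binomial(2, freq_a, size=(50, 300))
pop_b = rng.binomial(2, freq_b, size=(50, 300))
pcs = genotype_pcs(np.vstack([pop_a, pop_b]))
print(pcs.shape)  # (100, 2)
```

Plotting `pcs[:, 0]` against `pcs[:, 1]` would show the two simulated populations as separate clusters along the first component.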

In the logistic regression analysis considering age and BMI, we used the following model to fit the data: Y = b₀ + (b₁ × ADD) + (b₂ × age) + (b₃ × BMI) + e, where Y represents the phenotype (1 for disease; 0 for normal), ADD represents the additive effects of allele dosage (minor allele) and e represents the random error effect.
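
The ADD term counts copies of the minor allele (0, 1 or 2). A small Python sketch of that coding, assuming genotypes arrive as two-letter strings such as "AG" with "00" marking a missing call; both the input format and the function names are hypothetical.

```python
from collections import Counter


def minor_allele(genotypes):
    """Return the less frequent allele among non-missing two-letter genotype calls."""
    counts = Counter(a for g in genotypes if g != "00" for a in g)
    # Sort by count, then alphabetically, so ties are broken deterministically.
    return sorted(counts, key=lambda allele: (counts[allele], allele))[0]


def additive_dosage(genotypes):
    """Code each genotype as its number of minor alleles (None if missing)."""
    minor = minor_allele(genotypes)
    return [None if g == "00" else g.count(minor) for g in genotypes]


calls = ["AA", "AG", "GG", "AA", "AG", "00"]
print(minor_allele(calls), additive_dosage(calls))  # G [0, 1, 2, 0, 1, None]
```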

Conditional logistic regression was used to test for independent effects of an individual SNP. The basic principle³⁶ is that when several SNPs clustered together are all significantly associated with a trait, two alternative explanations exist. First, LD between the alleles may account for the association of each SNP. In such a scenario, a logistic regression analysis conditioning on any one of the clustered SNPs will remove evidence of association for the other SNPs. Alternatively, the effects of the SNPs may be statistically independent, residing on distinct haplotypes that are independently inherited. In this case, conditioning on one SNP will not change the effect estimate of the other. In our study, for each SNP among the significantly associated SNPs within 2p21 and 9q33.3, we compared the original estimate to an adjusted estimate obtained by entering other SNPs as covariates. SNPs were added by forward selection, one by one based on significance, using the model Y = b₀ + (b_a × SNP_add) + (b₁ × SNP₁) + ... + (b_n × SNP_n) + (b_s × stage) + e, where SNP_add indicates the SNP being added for testing of its independence with SNP₁ through SNP_n. If the estimated P-value of SNP_add was less than 0.05, we considered the effect of SNP_add to be independent from the effects of SNP₁ through SNP_n.
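
A minimal numpy sketch of this conditional test, using a hand-written Newton-Raphson (IRLS) logistic regression with Wald P-values rather than any particular package. In logistic regression the linear predictor models the log-odds of disease, logit P(Y = 1), rather than Y itself. The data are simulated (an assumption for illustration): snp2 is a noisy copy of a causal snp1, so snp2 is strongly associated on its own but not once snp1 is in the model. Covariates such as stage, age or BMI would simply be extra columns in the design matrix.

```python
import math

import numpy as np


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson (IRLS) logistic regression; x must include an intercept column.

    Returns coefficient estimates and their standard errors.
    """
    beta = np.zeros(x.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
        info = x.T @ (x * (mu * (1.0 - mu))[:, None])  # Fisher information
        step = np.linalg.solve(info, x.T @ (y - mu))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
    info = x.T @ (x * (mu * (1.0 - mu))[:, None])
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))


def conditional_p(y, snp_add, conditioned=()):
    """Two-sided Wald P-value for snp_add, adjusting for the SNPs in `conditioned`."""
    x = np.column_stack([np.ones(len(y)), snp_add, *conditioned])
    beta, se = fit_logistic(x, y)
    return math.erfc(abs(beta[1] / se[1]) / math.sqrt(2.0))


# Simulated data: snp2 is snp1 with 10% of genotypes redrawn; only snp1 affects risk.
rng = np.random.default_rng(1)
n = 4000
snp1 = rng.binomial(2, 0.3, size=n).astype(float)
snp2 = snp1.copy()
redraw = rng.random(n) < 0.1
snp2[redraw] = rng.binomial(2, 0.3, size=int(redraw.sum()))
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(-1.0 + 0.5 * snp1)))).astype(float)

p_marginal = conditional_p(y, snp2)
p_conditional = conditional_p(y, snp2, conditioned=[snp1])
# Under the rule above, snp2 counts as independent only if p_conditional < 0.05.
print(f"marginal P = {p_marginal:.2e}, conditional P = {p_conditional:.2f}")
```

In a forward-selection loop, the SNP with the smallest conditional P-value would be moved into `conditioned` at each step until none falls below 0.05.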

Haploview was used for the genome-wide P-value plot (Fig. 1)³⁷. The Q-Q plot was created using the R qq.plot function³⁸. The regional plots were generated using LocusZoom (see URLs). The GWAS and replication data were then combined using meta-analysis. The meta-analysis was conducted using the R 'meta' package. The heterogeneity across the three stages was evaluated using a Q-statistic P-value. The Mantel-Haenszel method was used to calculate the fixed effect estimate³⁹.
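
A plain-Python sketch of the two meta-analysis quantities mentioned here: a fixed-effect Mantel-Haenszel odds ratio pooled over stages, and Cochran's Q with its chi-square P-value. It assumes each stage is summarised as an allelic 2×2 table; Q is computed with inverse-variance weights (the textbook choice), which may differ in detail from what the R 'meta' package reports. The stage counts are hypothetical, and the series-based chi-square tail is only meant for moderate Q values.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the regularized gamma series."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def mantel_haenszel_or(tables):
    """Fixed-effect Mantel-Haenszel OR; each table is (a, b, c, d) = risk/other
    allele counts in cases (a, b) and in controls (c, d)."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def cochran_q(tables):
    """Cochran's Q from inverse-variance weighted log ORs, and its P-value."""
    if len(tables) < 2:
        raise ValueError("heterogeneity needs at least two strata")
    log_ors = [math.log(a * d / (b * c)) for a, b, c, d in tables]
    weights = [1.0 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    return q, chi2_sf(q, len(tables) - 1)


# Hypothetical GWAS, replication 1 and replication 2 allele counts.
stages = [(300, 700, 200, 800), (150, 350, 100, 400), (120, 280, 90, 310)]
print(round(mantel_haenszel_or(stages), 3), cochran_q(stages))
```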

To compare the clinical phenotypes of PCOS patients with different genotypes, one-way analysis of variance was used to analyze the key variables BMI, testosterone and HOMA-IR. Calculations were performed using SPSS 16.0 (SPSS Inc.). Data were expressed as the mean ± s.d. The distribution of each continuous variable was checked with a histogram, and abnormal values were excluded. The least significant difference (LSD) test was used for post hoc analysis. Appropriate transformations (logarithmic, sine or square) were applied as needed to ensure homogeneity of variance. For subgroups with a significant difference in age, covariance analysis was used to control for the age effect in a general linear model. The level of statistical significance was set at P < 0.05 for all statistical analyses.
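
The one-way ANOVA compares between-genotype variation in a phenotype with within-genotype variation. A plain-Python sketch of the F statistic follows; the toy values are illustrative only, and the P-value (which SPSS reports, or `scipy.stats.f_oneway` in Python) and the LSD post hoc test are not covered here.

```python
def one_way_anova_f(groups):
    """F statistic and degrees of freedom for a one-way ANOVA.

    `groups` is a list of lists, e.g. one phenotype (BMI, testosterone, ...)
    split by genotype.
    """
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy values for three genotype groups.
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # (27.0, 2, 6)
```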

Wednesday, December 1, 2010

RNA-Seq: a revolutionary tool for transcriptomics

http://www.nature.com/nrg/journal/v10/n1/full/nrg2484.html

A genome-wide association study of global gene expression: NG 2007

http://www.nature.com/ng/journal/v39/n10/full/ng2109.html

Genetic variants regulating ORMDL3 expression in asthma: Nature 2007

http://www.nature.com/nature/journal/v448/n7152/full/nature06014.html#online-methods

Association testing

Tests of Hardy–Weinberg equilibrium were performed in cases and controls using the genhw procedure (http://www.biostat-resources.com/stata/) and Stata version 9.2, and SNPs showing Hardy–Weinberg disequilibrium in controls (χ² > 25) were excluded. As the data comprised a mixture of unrelated and related cases and controls, we used logistic regression models with robust sandwich estimation of the variance¹⁵ as implemented in the Stata logit function to model clustering of siblings’ genotypes. Simulations using the MRC-A family structures (data available on request) confirmed that this method appropriately controls the Type I error. Heterogeneity of association between the two main strata (UK and Germany) was assessed by a weighted linear combination test using the results of an additive-effects-only regression analysis within each stratum. X-linked markers were analysed by fitting an additive-effects-only logit model that equates the risks of male hemizygotes with female homozygotes. The TRANSMIT program¹⁶ was used to analyse nuclear family data (including parental genotypes), using the sandwich variance estimation option to robustly incorporate information from multiple affected siblings; confidence intervals for odds ratio estimates were computed as described. The false-discovery rate (FDR) method⁵ was used to assess the overall statistical significance of the genome-wide association results, taking into account the multiple hypothesis testing implications inherent in the analysis of more than 300K SNPs. The FDR thresholds were calculated by applying the QVALUE (http://faculty.washington.edu/~jstorey/qvalue/) software package¹⁷.
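
Two steps from this paragraph, sketched in plain Python: a 1-df chi-square test of Hardy–Weinberg equilibrium from control genotype counts (SNPs with χ² > 25 would be dropped), and FDR-adjusted P-values. QVALUE also estimates the proportion of true null hypotheses (π0); with π0 fixed at 1 its q-values reduce to Benjamini–Hochberg adjusted P-values, which is what this simplified sketch computes. The counts and P-values are hypothetical.

```python
def hwe_chi2(n_aa, n_ab, n_bb):
    """1-df chi-square statistic for Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted P-values (Storey q-values with pi0 fixed at 1)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(hwe_chi2(25, 50, 25), hwe_chi2(50, 0, 50))  # 0.0 100.0
print(bh_qvalues([0.01, 0.04, 0.03, 0.5]))
```

The second SNP (no heterozygotes at all) far exceeds the χ² > 25 exclusion threshold.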

Association to transcript abundances

Data from the gene expression experiment were normalized together using the RMA package¹⁸,¹⁹ to remove technical or spurious background variation. An inverse normal transformation was also applied to each trait to limit the influence of outliers. Association analysis was performed with Merlin (FASTASSOC option)²⁰. We estimated an additive effect for each SNP and tested its significance using a score test that adjusts for familiality and takes into account uncertainty in the inference of missing genotypes. In the absence of a positive genomic control test, we did not adjust for stratification. We probabilistically inferred missing genotypes²¹ and adjusted for familiality, but not for linkage signal.
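
A rank-based inverse normal transformation replaces each value by a standard normal quantile of its rank, so an outlier can be no more extreme than the top rank allows. The Python sketch below uses Blom's offset (r − 3/8)/(n + 1/4) and average ranks for ties; which offset the study used is not stated, so that choice and the example values are assumptions.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=3.0 / 8.0):
    """Rank-based inverse normal transformation (Blom offset by default).

    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - offset) / (n - 2 * offset + 1)) for r in ranks]


expression = [5.1, 9.8, 7.3, 120.0, 6.2]  # hypothetical values with one outlier
print([round(z, 2) for z in inverse_normal_transform(expression)])
```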

Understanding human gene expression variation: April 2010

Caucasian population
http://www.nature.com/nature/journal/v464/n7289/full/nature08903.html

Nigerian individuals
http://www.nature.com/nature/journal/v464/n7289/full/nature08872.html

Genome-wide association studies in nephrology research: Review

Am J Kidney Dis. 2010 Oct;56(4):743-58. Epub 2010 Aug 21.

Genome-wide association studies in nephrology research.

Köttgen A.

Renal Division, University Hospital Freiburg, Germany. anna.koettgen@uniklinik-freiburg.de

eQTL Site at U of Chicago

http://eqtl.uchicago.edu/Home.html

[eQTL Browser] Browse eQTLs identified in recent studies in multiple tissues.

Tuesday, November 30, 2010

Concept: Effect Size

EFFECT SIZE

The extent to which a factor influences the risk of the condition under study, rather than simply an indication of whether a factor is significantly related to the condition.

by Krina T. Zondervan and Lon R. Cardon, Nature Reviews Genetics, Vol. 5, p. 89 (February 2004)

Several factors, such as genetic and phenotypic complexity, environmental influences, sub-optimal sampling and data overinterpretation, have been cited as contributors to the lack of success in detecting complex trait loci. Although these and other factors are almost certainly to blame, as a first approximation it is useful to consider the complex trait system in terms of four framework parameters:

the EFFECT SIZE of a disease locus;

the frequency of the disease allele(s);

the frequency of the marker allele(s);

and the extent of LD between the marker and disease locus.

These four parameters are the result of the more subtle hallmarks of MULTIFACTORIAL DISEASES, including interactions among disease loci that are related to effect size or the fact that the disease allele frequencies might reflect a range of mutations at the same locus. This set of parameters provides a convenient summary of the basic aim of the association-study design: to correlate genotypes and disease phenotypes that are obtained from a sample of individuals. In this review, we discuss how we can optimize our chances of finding complex disease associations by examining the interplay of the factors that influence the size of an observed association and, therefore, our ability to find associations. We focus here on genetic variants that are amenable to detection in population-based association studies; that is, we exclude those that are so rare (<0.01–0.001 in frequency) that only family-based studies would realistically provide sufficient numbers of cases to explore the association.

The case–control study and its effect size

The standard measure of effect in the case–control study is the odds ratio (OR), defined as the odds of exposure among cases divided by the odds of exposure among controls.

The OR provides a good approximation of the relative risk of the risk factor in question (that is, the ratio of risk of disease in people with the risk factor to that in people without) if sampling for the case–control study mimics that of a prospective cohort study.
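
Following the definition above (odds of exposure in cases divided by odds of exposure in controls), a short Python sketch of the OR with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical, and tables with zero cells would need a continuity correction that this sketch does not apply.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    odds_ratio = (a / b) / (c / d)  # odds of exposure in cases / odds in controls
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical case-control counts.
print(odds_ratio_ci(120, 380, 80, 420))
```

Because the interval is symmetric on the log scale, it is asymmetric around the OR itself.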

Saturday, November 27, 2010

Wellcome Trust Lectures

http://www.well.ox.ac.uk/lectures

Wednesday, November 17, 2010

A wonderful site

http://manuals.bioinformatics.ucr.edu/home