Tuesday, December 14, 2010

How SYY Conducts a GWAS

http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.732.html

Working through this one example carefully is enough for what I need right now.

He cites a 2010 Nature article (Ref 36), linked below, which is also a valuable reference.

http://www.nature.com/nature/journal/v466/n7302/full/nature09114.html#/affil-auth

It explains how to calculate the population-attributable fraction (Fp) for each SNP.
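
As a reminder of what such a calculation looks like, here is a minimal sketch using Levin's classic formula, Fp = p(RR - 1) / (1 + p(RR - 1)). This is an assumption on my part: the cited article may define Fp differently (for example per genotype), so check its Methods before reusing this. The function name and the input values are made up for illustration.

```python
def population_attributable_fraction(exposure_freq, relative_risk):
    """Levin's formula: Fp = p(RR - 1) / (1 + p(RR - 1)).

    exposure_freq: proportion of the population carrying the risk factor (p).
    relative_risk: relative risk (or an odds ratio used as an approximation).
    """
    if not 0.0 <= exposure_freq <= 1.0:
        raise ValueError("exposure_freq must lie in [0, 1]")
    if relative_risk <= 0.0:
        raise ValueError("relative_risk must be positive")
    excess = exposure_freq * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs, not taken from either paper.
print(population_attributable_fraction(0.3, 1.5))
```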

Statistical analysis.

PCA was conducted using EIGENSTRAT software. The first two eigenvectors generated were selected for plotting, and an EIGENSTRAT procedure was used to generate adjusted statistics for GWAS. Genome-wide association analysis at the single-marker level and Hardy-Weinberg equilibrium analysis were performed using PLINK. In the replication study, allelic association analysis was conducted using SHEsis (ref. 35).
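
To make the Hardy-Weinberg step concrete, here is a small sketch of the textbook chi-square goodness-of-fit test on genotype counts. It is not PLINK's implementation (PLINK can also report an exact test), and the function name is my own.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Arguments are the observed counts of the three genotypes AA, AB and BB.
    Returns (chi-square statistic, P-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("monomorphic marker: the test is undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP in controls.
print(hwe_chi_square(360, 480, 160))
```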
In the logistic regression analysis considering age and BMI, we used the following model to fit the data: Y = b0 + (b1 × ADD) + (b2 × age) + (b3 × BMI) + e, where Y represents the phenotype (1 for disease; 0 for normal), ADD represents the additive effects of allele dosage (minor allele) and e represents the random error effect.
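
The model above is written as Y = ..., but in a logistic fit the right-hand side is the log-odds of being a case. Below is a pure-Python sketch of fitting it by Newton-Raphson with Wald P-values. The paper used PLINK for this, so the code is only an illustration; the simulated data, the effect sizes and the centering of age and BMI are my assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information matrix."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-eta))
        for j in range(k):
            grad[j] += (yi - p) * xi[j]
            for l in range(k):
                info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
    return grad, info


def fit_logistic(X, y, iterations=25):
    """Return (coefficients, standard errors, Wald P-values)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad, info = score_and_information(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    _, info = score_and_information(X, y, beta)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append(math.sqrt(solve(info, unit)[j]))  # sqrt of inverse-information diagonal
    p_values = [math.erfc(abs(b / s) / math.sqrt(2.0)) for b, s in zip(beta, se)]
    return beta, se, p_values


# Simulated data (assumption): columns are intercept, ADD, centered age, centered BMI.
random.seed(1)
rows, status = [], []
for _ in range(500):
    add = sum(random.random() < 0.3 for _ in range(2))  # minor-allele dosage 0/1/2
    age = random.gauss(0.0, 5.0)
    bmi = random.gauss(0.0, 3.0)
    eta = -0.5 + 0.4 * add + 0.02 * age + 0.05 * bmi
    rows.append([1.0, float(add), age, bmi])
    status.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

beta, se, p_values = fit_logistic(rows, status)
print("OR per minor allele:", math.exp(beta[1]), "P:", p_values[1])
```

exp(b1) is the odds ratio per copy of the minor allele, adjusted for age and BMI.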

Conditional logistic regression was used to test for independent effects of an individual SNP. The basic principle (ref. 36) is that when several SNPs clustered together are all significantly associated with a trait, two alternative explanations exist. First, LD between the alleles may account for the association of each SNP. In such a scenario, a logistic regression analysis conditioning on any one of the clustered SNPs will remove evidence of association for the other SNPs. Alternatively, the effects of the SNPs may be statistically independent, residing on distinct haplotypes that are independently inherited. In this case, conditioning on one SNP will not change the effect estimate of the other. In our study, for each SNP among the significantly associated SNPs within 2p21 and 9q33.3, we compared the original estimate to an adjusted estimate obtained by entering other SNPs as covariates. SNPs were added by forward selection, one by one based on significance, using the model Y = b0 + (ba × SNPadd) + (b1 × SNP1) + ... + (bn × SNPn) + (bs × stage) + e, where SNPadd indicates the SNP being added for testing of its independence with SNP1 through SNPn. If the estimated P-value of SNPadd was less than 0.05, we considered the effect of SNPadd to be independent from the effects of SNP1 through SNPn.
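
The forward-selection procedure described above can be sketched as follows. This is my reading of the paragraph, not the authors' code: SNPs are ranked by their marginal P-value, and each later SNP is kept only if its P-value stays below 0.05 once the already selected SNPs and stage are in the model. The SNP names, the simulated LD structure and the effect sizes are invented for the demo.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_logistic(X, y, iterations=15):
    """Newton-Raphson logistic fit; returns (coefficients, Wald P-values)."""
    k = len(X[0])
    beta = [0.0] * k
    for step in range(iterations + 1):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for l in range(k):
                    info[j][l] += p * (1.0 - p) * xi[j] * xi[l]
        if step < iterations:  # the last pass only refreshes the information matrix
            beta = [b + s for b, s in zip(beta, solve(info, grad))]
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        z = beta[j] / math.sqrt(solve(info, unit)[j])
        p_values.append(math.erfc(abs(z) / math.sqrt(2.0)))
    return beta, p_values


def forward_select(snps, stage, y, alpha=0.05):
    """snps maps a SNP name to its list of allele dosages, one per sample."""
    def p_for(candidate, selected):
        names = [candidate] + selected
        X = [[1.0] + [snps[name][i] for name in names] + [stage[i]]
             for i in range(len(y))]
        return fit_logistic(X, y)[1][1]  # P-value of the candidate (column 1)

    order = sorted(snps, key=lambda name: p_for(name, []))
    selected = [order[0]]
    for name in order[1:]:
        if p_for(name, selected) < alpha:
            selected.append(name)
    return selected


# Simulated data (assumption): "proxy" tags "lead" through LD, "indep" has its own effect.
random.seed(7)
snps = {"lead": [], "proxy": [], "indep": []}
stage, y = [], []
for _ in range(1000):
    lead = float(sum(random.random() < 0.3 for _ in range(2)))
    redraw = float(sum(random.random() < 0.3 for _ in range(2)))
    proxy = lead if random.random() < 0.85 else redraw
    indep = float(sum(random.random() < 0.3 for _ in range(2)))
    snps["lead"].append(lead)
    snps["proxy"].append(proxy)
    snps["indep"].append(indep)
    stage.append(random.choice([1.0, 2.0, 3.0]))
    eta = -1.0 + 0.8 * lead + 0.8 * indep
    y.append(1 if random.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

selected = forward_select(snps, stage, y)
print("SNPs retained as independent signals:", selected)
```

A SNP that is associated only through LD with an already selected SNP should usually lose its signal after conditioning, while a truly independent one keeps it.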

Haploview was used for the genome-wide P-value plot (Fig. 1) (ref. 37). The Q-Q plot was created using the R qq.plot function (ref. 38). The regional plots were generated using LocusZoom (see URLs). The GWAS and replication data were then combined using meta-analysis. The meta-analysis was conducted using the R 'meta' package. The heterogeneity across the three stages was evaluated using a Q-statistic P-value. The Mantel-Haenszel method was used to calculate the fixed effect estimate (ref. 39).
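
For the meta-analysis step, here is a rough sketch of a Mantel-Haenszel pooled odds ratio over per-stage 2×2 allele-count tables, plus a Cochran Q heterogeneity test. It is not the R 'meta' package: computing Q around the Mantel-Haenszel estimate with inverse-variance weights is one common convention that I am assuming here, and the counts are invented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df."""
    y = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def mantel_haenszel_or(tables):
    """tables: one (a, b, c, d) tuple per stage, where a and b are risk-allele
    and other-allele counts in cases, c and d the same counts in controls."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def cochran_q(tables, pooled_or):
    """Heterogeneity of per-stage log odds ratios around the pooled estimate."""
    theta = math.log(pooled_or)
    q = 0.0
    for a, b, c, d in tables:
        log_or = math.log(a * d / (b * c))
        weight = 1.0 / (1.0 / a + 1.0 / b + 1.0 / c + 1.0 / d)  # inverse variance
        q += weight * (log_or - theta) ** 2
    return q, chi2_sf(q, len(tables) - 1)


# Hypothetical allele counts for one SNP in three stages.
stages = [(420, 1068, 980, 3210), (610, 1500, 1400, 4700), (300, 760, 650, 2100)]
pooled = mantel_haenszel_or(stages)
q_stat, q_p = cochran_q(stages, pooled)
print("pooled OR:", pooled, "Q:", q_stat, "P(heterogeneity):", q_p)
```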

To compare the clinical phenotypes in the PCOS patients with different genotypes, one-way analysis of variance was used to analyze the key variants of BMI, testosterone and HOMA-IR. Calculations were performed using SPSS 16.0 (SPSS Inc.). Data were expressed as the mean ± s.d. Continuous variables were tested for distribution using a histogram in which abnormal values were excluded. The least significant difference (LSD) test was used for post hoc analysis. Appropriate transformations were applied (logarithmic, sine or square) as needed to ensure homogeneity of variance. For subgroups with a significant difference in age, covariance analysis was used to control for the age effect in a general linear model. The level of statistical significance was set at a P < 0.05 for all statistical analyses.
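
The one-way ANOVA used for the genotype comparison comes down to an F ratio of between-group to within-group mean squares. A minimal sketch follows; the paper used SPSS, and this code neither computes the P-value nor runs the LSD post hoc test. The BMI values are made up.

```python
def one_way_anova(groups):
    """Return (F statistic, between-group df, within-group df).

    groups: one list of measurements per genotype group.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical BMI values for the three genotype groups of one SNP.
bmi_by_genotype = [
    [22.1, 24.3, 23.0, 25.6],
    [23.4, 26.1, 24.8, 25.0],
    [25.2, 27.9, 26.4, 28.1],
]
print(one_way_anova(bmi_by_genotype))
```

The F statistic is then compared against an F distribution with the two returned degrees of freedom to obtain the P-value.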
