GWAS GUI v 0.0.2 Tutorial
README
GWAS GUI is a general tool that gives users graphical overviews of whole-genome association studies of data
sets with various phenotypes. The trait can be any outcome of interest in medical science, such as a
case-control indicator, expression values, height, blood pressure, or many other continuous or categorical
measurements. Once users have tested the associations between different traits and SNPs in genome-wide scans,
they can use this program to visualize the results and explore them. After loading the analysis results into
the browser, the user can either browse the results for a single trait in a genome-wide view or graphically
compare the signals for different traits in a specific region. The program is especially useful for GWAS of
high-dimensional gene expression data, which can contain tens of thousands of phenotypes; such datasets are
usually far too big to explore with conventional tools.
To browse your own dataset, you will first need to prepare a set of input text files: three required and two
optional. All files should be tab-delimited and start with a header row. Each file should start with the
columns described below; the SNP information and results files can also include any number of user-defined columns.
File |
Description |
result.txt |
A text file containing the association results of the GWAS. Each row should start with a trait label and a
marker label. These two labels should be followed by a set of numeric statistics, named in the header row.
For large datasets, it may be convenient to filter this file so that it only includes the interesting SNPs
for each trait (e.g. those associated with that particular trait at p < 0.05).
Sample header: Trait snp statistic1 statistic2 statistic3 ... |
snpinfo.txt |
A text file containing location information and, optionally, additional annotation for
all the SNPs in result.txt. Each row should start with the marker name, chromosome (1, 2, 3,..., 22, X),
and position, followed by any descriptive labels for the SNP (for example, allele labels,
frequency information, and the names of nearby SNPs).
Position should be in megabases, with at least one digit after the decimal point.
Sample header: SNP chr pos col1 .... |
traitinfo.txt |
A text file containing the trait annotation. Each row should start with the trait name (to match result.txt)
and can include a series of descriptive labels or annotations (for example, trait heritability, number of
phenotyped individuals, trait mean, and trait variance are common types of useful information).
Sample header: trait anno1 anno2 |
geneinfo.txt |
An optional text file containing position information for genes of interest. The program plots these genes
to support biological inference.
Sample header: Gene Chr StartPos EndPos |
traitgroup.txt |
An optional text file containing grouping information for the different traits. This file makes it easier to
browse large datasets. Each row should contain just two labels, a trait label and a group label.
Sample header: trait group |
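Before loading a dataset, it can help to confirm that the three required files agree with each other. The
following Python sketch is not part of GWAS GUI; it only assumes the layout described above (tab-delimited,
header row first, trait and marker labels in the first two columns of result.txt, names in the first column
of snpinfo.txt and traitinfo.txt). The function names read_table and check_inputs are hypothetical.

```python
import csv


def read_table(lines):
    """Parse tab-delimited lines with a header row; return (header, rows)."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


def check_inputs(result_lines, snpinfo_lines, traitinfo_lines):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    _, results = read_table(result_lines)
    _, snps = read_table(snpinfo_lines)
    _, traits = read_table(traitinfo_lines)

    known_snps = {row[0] for row in snps}      # first column: marker name
    known_traits = {row[0] for row in traits}  # first column: trait name

    for row_no, row in enumerate(results, start=1):
        if len(row) < 3:
            problems.append(
                f"result.txt data row {row_no}: expected a trait, a marker, "
                "and at least one statistic"
            )
            continue
        trait, marker = row[0], row[1]
        if trait not in known_traits:
            problems.append(
                f"result.txt data row {row_no}: trait '{trait}' not found in traitinfo.txt"
            )
        if marker not in known_snps:
            problems.append(
                f"result.txt data row {row_no}: marker '{marker}' not found in snpinfo.txt"
            )
    return problems
```

Any iterable of text lines works as input, so open file objects can be passed directly, for example
check_inputs(open("result.txt"), open("snpinfo.txt"), open("traitinfo.txt")).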
Example 1:
This example shows the association results in a whole-genome association study of global gene expression.
The association results contain 522 probes and 27823 SNPs. After filtering, there are about 40,000 rows.
Click the caption to download the whole file.
File: result.txt |
Probe SNP LOD Pvalue Effect H2
1007_s_at rs10159998 3.391 7.76e-05 0.387 4.31
1007_s_at rs10185841 3.309 9.48e-05 -0.333 4.45
1007_s_at rs10520650 3.527 5.57e-05 -0.448 4.88
1007_s_at rs1166246 3.005 1.99e-04 0.492 4.03
1007_s_at rs11795624 3.242 1.12e-04 -0.244 2.6
1007_s_at rs11888779 3.542 5.37e-05 -0.668 4.26
1007_s_at rs1241650 3.313 9.38e-05 -0.544 4.19
1007_s_at rs12527958 4.255 9.57e-06 0.38 6.26
1007_s_at rs12537492 3.243 1.11e-04 0.368 4.26
1007_s_at rs12542660 3.099 1.58e-04 0.319 4.45
... ... ... ... ...
|
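The filtering step mentioned above can be done with any tool. As one possible approach, the Python sketch
below keeps only the rows whose p-value falls below a chosen threshold. It is an illustration rather than the
procedure used to produce this example; the function name filter_by_pvalue is hypothetical, and the column
name "Pvalue" is taken from the header shown in the excerpt.

```python
import csv


def filter_by_pvalue(lines, pvalue_column, threshold):
    """Yield the header row, then each data row whose p-value is below threshold."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    column = header.index(pvalue_column)  # raises ValueError if the column is missing
    yield header
    for row in reader:
        if row and float(row[column]) < threshold:
            yield row


# Rows copied from the result.txt excerpt above.
excerpt = [
    "Probe\tSNP\tLOD\tPvalue\tEffect\tH2",
    "1007_s_at\trs10159998\t3.391\t7.76e-05\t0.387\t4.31",
    "1007_s_at\trs10185841\t3.309\t9.48e-05\t-0.333\t4.45",
    "1007_s_at\trs1166246\t3.005\t1.99e-04\t0.492\t4.03",
]
kept = list(filter_by_pvalue(excerpt, "Pvalue", 1e-4))
for row in kept:
    print("\t".join(row))
```

With a threshold of 1e-4, the header and the first two data rows are kept and rs1166246 (p = 1.99e-04) is
dropped. For a full file, pass an open file object instead of the list.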
File: snpinfo.txt |
SNP chr Pos Allele
rs10509971 10 114.981618 A
rs7580303 2 2.065249 C
rs7527281 1 213.591486 C
rs1358064 7 86.58632 G
rs4237768 11 5.963848 G
... ... ... ...
|
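Positions in snpinfo.txt are expected in megabases. If your annotation source reports base pairs instead, a
small conversion such as the Python sketch below may be useful. The helper name bp_to_mb is hypothetical;
integer arithmetic is used so that no digits are lost to floating-point rounding, and at least one digit is
always kept after the decimal point.

```python
def bp_to_mb(position_bp):
    """Convert a base-pair position (int) to a megabase string, e.g. 5000000 -> '5.0'."""
    if position_bp < 0:
        raise ValueError("position must be non-negative")
    whole, remainder = divmod(position_bp, 1_000_000)
    # Six digits cover every base pair within a megabase; drop trailing zeros
    # but always keep at least one digit after the decimal point.
    fraction = f"{remainder:06d}".rstrip("0") or "0"
    return f"{whole}.{fraction}"


print(bp_to_mb(114981618))  # 114.981618
print(bp_to_mb(86586320))   # 86.58632
print(bp_to_mb(5000000))    # 5.0
```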
File: traitinfo.txt |
Probe Mean Variance heritability chr StartPos
1007_s_at 6.04092 0.09056 0.33814 6 30964144
1053_at 7.5231 0.17694 0.46053 7 73090740
117_at 4.67898 0.19058 0.03757 1 158307503
121_at 6.55123 0.07555 0.47644 2 113691169
1255_g_at 2.67409 0.02049 0.06873 6 42248919
... ... ... ... ...
|
File: traitgroup.txt |
Probe Gene
1552590_a_at ABCC12
1552582_at ABCC13
1552583_s_at ABCC13
1552470_a_at ABHD11
1552800_at ABHD11
1552615_at ACACB
1552616_a_at ACACB
1552519_at ACVR1C
1552579_a_at ADAM21
1552266_at ADAM32
1552725_s_at ADAMTS17
... ...
|
File: geneinfo.txt |
geneName chrom txStart txEnd
OR4F5 1 0.058953 0.059871
OR4F3 1 0.357521 0.35846
OR4F16 1 0.357521 0.35846
OR4F29 1 0.357521 0.35846
OR4F3 1 0.610958 0.611897
OR4F16 1 0.610958 0.611897
OR4F29 1 0.610958 0.611897
SAMD11 1 0.850983 0.869824
NOC2L 1 0.869445 0.884542
... ...
|
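To check which genes from geneinfo.txt fall inside a region of interest before plotting, an interval overlap
test is enough. The Python sketch below is only an illustration of that idea and does not describe how GWAS
GUI itself selects genes; the function name genes_in_region is hypothetical, and the column order (name,
chromosome, start, end, all positions in megabases) follows the excerpt above.

```python
import csv


def genes_in_region(lines, chrom, start_mb, end_mb):
    """Return names of genes on chrom whose span overlaps [start_mb, end_mb]."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    hits = []
    for row in reader:
        if not row:
            continue
        name, gene_chrom = row[0], row[1]
        gene_start, gene_end = float(row[2]), float(row[3])
        # Two intervals overlap when each one starts before the other ends.
        if gene_chrom == chrom and gene_start <= end_mb and gene_end >= start_mb:
            hits.append(name)
    return hits


# Rows copied from the geneinfo.txt excerpt above.
excerpt = [
    "geneName\tchrom\ttxStart\ttxEnd",
    "OR4F5\t1\t0.058953\t0.059871",
    "SAMD11\t1\t0.850983\t0.869824",
    "NOC2L\t1\t0.869445\t0.884542",
]
print(genes_in_region(excerpt, "1", 0.85, 0.87))  # ['SAMD11', 'NOC2L']
```

Chromosomes are compared as text so that labels such as X work the same way as numbers.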
Example 2:
This example shows the results of a recent meta-analysis of genome-wide association scans
for HDL-C, LDL-C and triglycerides (Willer et al., 2008; Kathiresan et al., 2008).
The dataset includes all results with a p-value < 0.1.
There are 3 traits, and each trait is associated with about 300K SNPs.
(Click the caption to download the whole file)
File: result.txt |
Trait Marker Zscore P-value Weight
HDL rs8070048 3.466 0.0005276 8656
HDL rs631797 1.838 0.06604 8656
HDL rs547177 -1.701 0.08887 8656
... ... ... ... ...
LDL rs11756925 1.902 0.05714 8589
LDL rs7158516 -2.431 0.01507 8589
LDL rs2867749 2.067 0.03873 8589
... ... ... ... ...
TG rs11164418 -1.722 0.08507 8684
TG rs11766819 1.886 0.05926 8684
TG rs2108487 -1.755 0.07932 8684
... ... ... ... ...
|
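For a multi-trait file like this one, a quick count of rows per trait is a simple way to confirm that the
file was written as intended before loading it. The Python sketch below is one way to do this; the function
name rows_per_trait is hypothetical, and it assumes only that the trait label is the first column.

```python
import csv
from collections import Counter


def rows_per_trait(lines):
    """Count data rows for each trait label (first column) in a result file."""
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the header row
    return Counter(row[0] for row in reader if row)


# Rows copied from the result.txt excerpt above.
excerpt = [
    "Trait\tMarker\tZscore\tP-value\tWeight",
    "HDL\trs8070048\t3.466\t0.0005276\t8656",
    "HDL\trs631797\t1.838\t0.06604\t8656",
    "LDL\trs11756925\t1.902\t0.05714\t8589",
    "TG\trs11164418\t-1.722\t0.08507\t8684",
]
counts = rows_per_trait(excerpt)
for trait, n in sorted(counts.items()):
    print(trait, n)
```

On the full file, the same call on an open file object should report three traits.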
File: snpinfo.txt |
rsID chromosome position Allele1 Allele2
rs10 7 92.221823 a c
rs10000029 4 138.905073 t c
rs10000042 4 5.288052 t c
rs1000005 21 33.35492 c g
rs10000064 4 128.02907 t c
rs10000068 4 36.600681 t c
rs10000075 4 179.725904 t c
... ... ... ...
|
File: traitinfo.txt |
Trait Annotation
HDL HDL-Cholesterol
LDL LDL-Cholesterol
TG triglycerides
|
Example 3:
This example shows the results of a genome-wide association study with a case-control design.
A1 denotes a disease of interest. The file contains the association results for 300K SNPs.
(Click the caption to download the whole file)
File: result.txt |
Trait SNP lod pvalue statistic1 statistic2
A1 rs12709155 6.050 1.305e-07 0.047 1.408
A1 rs4799778 5.356 6.813e-07 0.044 1.123
A1 rs6997978 4.653 3.678e-06 0.034 0.982
A1 rs11834921 4.579 4.392e-06 0.068 1.031
A1 rs4799369 4.541 4.812e-06 0.042 0.876
A1 rs4799368 4.539 4.829e-06 0.042 0.876
A1 rs5746945 4.458 5.866e-06 0.043 0.915
A1 rs1013465 4.458 5.867e-06 0.032 1.018
A1 rs12920222 4.445 6.065e-06 0.044 1.026
A1 rs12127789 4.435 6.203e-06 0.055 0.994
... ... ... ... ...
|
File: snpinfo.txt |
SNP chr Pos
rs4880781 10 0.165653
rs12146291 10 0.178805
rs7476901 10 0.186639
rs9419431 10 0.192356
rs10903451 10 0.193471
... ... ... ...
|
Go to the download page. Enjoy!