University of Michigan Center for Statistical Genetics
 
 

 
 

GWAS GUI v 0.0.2 Tutorial

README
GWAS GUI is a general tool that provides graphical overviews of whole-genome association studies of
datasets with various phenotypes. The trait can be any outcome of interest in medical science, such as
a case-control indicator, expression values, height, blood pressure, and many other continuous or
categorical measurements. Once users have investigated the relationships between traits and SNPs in
their genome-wide scans, they can use this program to visualize the results and look for useful
conclusions. After loading the analysis results into the browser, the user can either browse the
results for a single trait in a genome-wide view or graphically compare the signals for different
traits in a specific region. The program is especially useful for GWAS of high-dimensional gene
expression data, which contain tens of thousands of phenotypes; such datasets are usually far too
large to explore with conventional tools.



To browse your own dataset, you will first need to prepare a set of input text files: three required and two optional. All the files should be tab-delimited and start with a header row. Each file should start with the columns described below; the SNP information and results files can also include any number of user-defined columns.
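Because every input file shares the same layout (a header row, tab-delimited fields, and a fixed set of leading columns that may be followed by extra ones), one small reader can load any of them. The Python sketch below is not part of GWAS GUI; the function name `read_gwas_gui_table` and its `min_columns` parameter are hypothetical, and it assumes exactly one tab between fields and no quoting.

```python
import csv
from typing import Dict, Iterable, List


# Hypothetical helper, not part of GWAS GUI: loads any of the tab-delimited input files.
def read_gwas_gui_table(lines: Iterable[str], min_columns: int) -> List[Dict[str, str]]:
    """Parse a tab-delimited table whose first line is a header row.

    min_columns is the number of required leading columns (e.g. 2 for
    result.txt: trait and marker). Extra user-defined columns are kept.
    Assumes exactly one tab between fields and no quoting.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty; a header row is required")
    if len(header) < min_columns:
        raise ValueError(f"header has {len(header)} columns, expected at least {min_columns}")
    rows: List[Dict[str, str]] = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: {len(fields)} fields, but the header has {len(header)}")
        rows.append(dict(zip(header, fields)))
    return rows
```

For result.txt it could be called on an open file handle with `min_columns=2` (trait and marker); snpinfo.txt would use `min_columns=3` (marker, chromosome, position).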

result.txt (required)
    A text file containing the GWAS association results. Each row should start with a trait label and a marker label, followed by a set of numeric statistics defined in the header row. For large datasets, it may be convenient to filter this file so that it only includes interesting SNPs for each trait (e.g. those associated with that particular trait at p < 0.05).
    Sample header: Trait snp statistic1 statistic2 statistic3 ...

snpinfo.txt (required)
    A text file containing location information and, optionally, additional annotation for all the SNPs in result.txt. Each row should start with the marker name, chromosome (1, 2, 3, ..., 22, X), and position, followed by any descriptive labels for the SNP (for example, allele labels, frequency information, and the names of nearby SNPs). Positions should be given in megabases, with at least one digit after the decimal point.
    Sample header: SNP chr pos col1 ...

traitinfo.txt (required)
    A text file containing the trait annotation. Each row should start with the trait name (matching result.txt) and can include a series of descriptive labels or annotations (for example, trait heritability, number of phenotyped individuals, trait mean, and trait variance are common types of useful information).
    Sample header: trait anno1 anno2

geneinfo.txt (optional)
    A text file containing position information for genes of interest. The program plots these genes to aid biological interpretation.
    Sample header: Gene Chr StartPos EndPos

traitgroup.txt (optional)
    A text file containing grouping information for the different traits, which makes large datasets easier to browse. Each row should contain just two labels: a trait label and a group label.
    Sample header: trait group
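Since result.txt, snpinfo.txt and traitinfo.txt must agree with each other, it can save time to cross-check them before loading them into the browser. The following is a minimal sketch of such a check, assuming the rows have already been split into fields with the header removed. `check_inputs` is a hypothetical helper, not a GWAS GUI feature, and it only tests the rules stated above: chromosomes 1 to 22 and X, positions in megabases with a decimal digit, every marker in result.txt present in snpinfo.txt, and trait names matching traitinfo.txt.

```python
import re
from typing import List, Sequence

# Chromosome labels listed in the snpinfo.txt description.
VALID_CHROMOSOMES = {str(i) for i in range(1, 23)} | {"X"}
# Megabases with at least one digit after the decimal point, e.g. "114.981618".
MB_PATTERN = re.compile(r"^\d+\.\d+$")

Row = Sequence[str]


# Hypothetical pre-flight check, not part of GWAS GUI.
def check_inputs(results: Sequence[Row], snpinfo: Sequence[Row],
                 traitinfo: Sequence[Row]) -> List[str]:
    """Return human-readable problems found (an empty list if none were detected).

    Each argument holds data rows (header already removed) as lists of fields.
    Columns are read by position because header names differ between datasets.
    """
    problems: List[str] = []
    snp_ids = set()
    for row in snpinfo:
        snp, chrom, pos = row[0], row[1], row[2]
        snp_ids.add(snp)
        if chrom not in VALID_CHROMOSOMES:
            problems.append(f"snpinfo.txt: {snp} has unexpected chromosome {chrom!r}")
        if not MB_PATTERN.match(pos):
            problems.append(f"snpinfo.txt: {snp} position {pos!r} is not in megabases "
                            "with a digit after the decimal point")
    trait_ids = {row[0] for row in traitinfo}
    # Report each missing label once, however many result rows mention it.
    missing_markers = sorted({row[1] for row in results if row[1] not in snp_ids})
    missing_traits = sorted({row[0] for row in results if row[0] not in trait_ids})
    problems += [f"result.txt: marker {m} is not in snpinfo.txt" for m in missing_markers]
    problems += [f"result.txt: trait {t} is not in traitinfo.txt" for t in missing_traits]
    return problems
```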



Example 1: This example shows the association results of a whole-genome association study of global gene expression. The results cover 522 probes and 27,823 SNPs; after filtering, about 40,000 rows remain. Click the caption to download the whole file.

 File: result.txt
 
Probe		SNP		LOD	Pvalue		Effect	H2
1007_s_at	rs10159998	3.391	7.76e-05	0.387	4.31
1007_s_at	rs10185841	3.309	9.48e-05	-0.333	4.45
1007_s_at	rs10520650	3.527	5.57e-05	-0.448	4.88
1007_s_at	rs1166246	3.005	1.99e-04	0.492	4.03
1007_s_at	rs11795624	3.242	1.12e-04	-0.244	2.6
1007_s_at	rs11888779	3.542	5.37e-05	-0.668	4.26
1007_s_at	rs1241650	3.313	9.38e-05	-0.544	4.19
1007_s_at	rs12527958	4.255	9.57e-06	0.38	6.26
1007_s_at	rs12537492	3.243	1.11e-04	0.368	4.26
1007_s_at	rs12542660	3.099	1.58e-04	0.319	4.45
...  ...  ... ... ...
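The Example 1 results were filtered before loading, and the description of result.txt suggests keeping, for instance, rows with p < 0.05. A streaming filter along those lines might look like the sketch below. The helper names, the placeholder paths passed to `filter_file`, and the default `Pvalue` column (taken from the header above) are assumptions; the threshold actually used for Example 1 is not stated here.

```python
import csv
from typing import Iterable, Iterator, List


# Hypothetical helper, not part of GWAS GUI.
def filter_by_pvalue(lines: Iterable[str], p_column: str, threshold: float) -> Iterator[List[str]]:
    """Yield the header row, then only the data rows whose p-value is below threshold.

    Rows are processed one at a time, so large result files need not fit in memory.
    A non-numeric p-value (e.g. "NA") raises ValueError.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    p_index = header.index(p_column)  # ValueError if the column is missing
    yield header
    for fields in reader:
        if fields and float(fields[p_index]) < threshold:
            yield fields


# Hypothetical wrapper; in_path and out_path are placeholders.
def filter_file(in_path: str, out_path: str, p_column: str = "Pvalue", threshold: float = 0.05) -> None:
    """Write a filtered, tab-delimited copy of a result file."""
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(filter_by_pvalue(src, p_column, threshold))
```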

 File: snpinfo.txt
 
SNP		chr	Pos		Allele
rs10509971	10	114.981618	A
rs7580303	2	2.065249	C
rs7527281	1	213.591486	C
rs1358064	7	86.58632	G
rs4237768	11	5.963848	G

...  ...  ... ... 
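The Pos column above is in megabases, as snpinfo.txt requires. If your annotation source gives base-pair coordinates instead, they can be converted as in this sketch. `bp_to_mb` is a hypothetical helper, and the six-decimal default is an assumption chosen because it keeps single-base resolution (1 bp = 0.000001 Mb).

```python
# Hypothetical helper, not part of GWAS GUI.
def bp_to_mb(position_bp: int, decimals: int = 6) -> str:
    """Format a base-pair coordinate as megabases for the snpinfo.txt position column.

    Six decimals (the default) keep single-base resolution, since 1 bp = 0.000001 Mb.
    """
    if decimals < 1:
        raise ValueError("snpinfo.txt positions need at least one digit after the decimal point")
    return f"{position_bp / 1_000_000:.{decimals}f}"
```

For example, `bp_to_mb(114981618)` gives `"114.981618"`, matching the first row above.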

 File: traitinfo.txt
 
Probe		Mean		Variance	heritability	chr	StartPos
1007_s_at	6.04092		0.09056		0.33814		6	30964144
1053_at		7.5231		0.17694		0.46053		7	73090740
117_at		4.67898		0.19058		0.03757		1	158307503
121_at		6.55123		0.07555		0.47644		2	113691169
1255_g_at	2.67409		0.02049		0.06873		6	42248919
...  ...  ... ... ...

 File: traitgroup.txt
 
Probe		Gene
1552590_a_at    ABCC12
1552582_at      ABCC13
1552583_s_at    ABCC13
1552470_a_at    ABHD11
1552800_at      ABHD11
1552615_at      ACACB
1552616_a_at    ACACB
1552519_at      ACVR1C
1552579_a_at    ADAM21
1552266_at      ADAM32
1552725_s_at    ADAMTS17
...  ...  
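In this sample, traitgroup.txt groups expression probes by gene symbol, so several probes can share one group. Collecting the traits under each group label is a simple way to see what a grouping will look like before loading it. This is a sketch with a hypothetical helper, `traits_by_group`; it does not describe how GWAS GUI stores groups internally.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


# Hypothetical helper, not part of GWAS GUI.
def traits_by_group(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each group label to its trait labels, keeping the order of the file.

    Each row is a (trait, group) pair, matching the two columns of traitgroup.txt.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for trait, group in rows:
        groups[group].append(trait)
    return dict(groups)
```

With the rows above, the group ABCC13 would hold the two probes 1552582_at and 1552583_s_at.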

 File: geneinfo.txt
 
geneName	chrom	txStart	txEnd
OR4F5	1	0.058953	0.059871
OR4F3	1	0.357521	0.35846
OR4F16	1	0.357521	0.35846
OR4F29	1	0.357521	0.35846
OR4F3	1	0.610958	0.611897
OR4F16	1	0.610958	0.611897
OR4F29	1	0.610958	0.611897
SAMD11	1	0.850983	0.869824
NOC2L	1	0.869445	0.884542
...  ...  
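The positions in this geneinfo.txt sample appear to be in megabases, like snpinfo.txt, and some gene names occur more than once at different locations. To preview which genes would fall in a region you plan to compare across traits, an interval-overlap test like the following sketch can be used. `genes_in_region` is hypothetical, and it assumes the rows have already been parsed into (name, chromosome, start, end) tuples with numeric positions.

```python
from typing import Iterable, List, Tuple

GeneRow = Tuple[str, str, float, float]  # (name, chromosome, start_mb, end_mb)


# Hypothetical helper, not part of GWAS GUI.
def genes_in_region(genes: Iterable[GeneRow], chrom: str,
                    start_mb: float, end_mb: float) -> List[str]:
    """Return names of genes overlapping [start_mb, end_mb] on chrom, each name once."""
    hits: List[str] = []
    for name, gene_chrom, gene_start, gene_end in genes:
        # Two intervals overlap when each one starts before the other ends.
        overlaps = gene_chrom == chrom and gene_start <= end_mb and gene_end >= start_mb
        if overlaps and name not in hits:
            hits.append(name)
    return hits
```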




Example 2: This example shows the results of a recent meta-analysis of genome-wide association scans for HDL-C, LDL-C and triglycerides (Willer et al. 2008; Kathiresan et al. 2008). The dataset includes all results with p-value < 0.1. There are 3 traits, and each trait has results for about 300K SNPs. (Click the caption to download the whole file)

 File: result.txt
 
Trait	Marker	Zscore	P-value	Weight
HDL	rs8070048	3.466	0.0005276	8656
HDL	rs631797	1.838	0.06604	8656
HDL	rs547177	-1.701	0.08887	8656
...	...	...	...	...
LDL	rs11756925	1.902	0.05714	8589
LDL	rs7158516	-2.431	0.01507	8589
LDL	rs2867749	2.067	0.03873	8589
...  ...  ... ... ...
TG	rs11164418	-1.722	0.08507	8684
TG	rs11766819	1.886	0.05926	8684
TG	rs2108487	-1.755	0.07932	8684
...	...	...	...	...
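The Zscore and P-value columns in these sample rows appear consistent with a two-sided test against a standard normal distribution, p = erfc(|z| / sqrt(2)); for instance, z = 1.838 gives about 0.066. If your own meta-analysis output reports only Z-scores, a sketch like this could fill in the p-value column. The helper name is hypothetical, and the two-sided assumption should be checked against how your statistics were computed.

```python
import math


# Hypothetical helper, not part of GWAS GUI; assumes a two-sided normal test.
def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value P(|Z| >= |z|) for a standard normal statistic Z.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), which stays
    accurate for very small p-values where 1 - Phi(|z|) would lose precision.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))
```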

 File: snpinfo.txt
 
rsID	chromosome	position	Allele1	Allele2
rs10	7	92.221823	a	c
rs10000029	4	138.905073	t	c
rs10000042	4	5.288052	t	c
rs1000005	21	33.35492	c	g
rs10000064	4	128.02907	t	c
rs10000068	4	36.600681	t	c
rs10000075	4	179.725904	t	c
...  ...  ... ... 

 File: traitinfo.txt
 
Trait 	Annotation
HDL	HDL-Cholesterol
LDL	LDL-Cholesterol
TG	triglycerides	




Example 3: This example shows the results of a genome-wide association study with a case-control design, where A1 denotes the disease of interest. The file contains the association results for 300K SNPs. (Click the caption to download the whole file)

 File: result.txt
 
Trait		SNP	lod	pvalue		statistic1	statistic2
A1	rs12709155	6.050	1.305e-07	0.047		1.408
A1	rs4799778	5.356	6.813e-07	0.044		1.123
A1	rs6997978	4.653	3.678e-06	0.034		0.982
A1	rs11834921	4.579	4.392e-06	0.068		1.031
A1	rs4799369	4.541	4.812e-06	0.042		0.876
A1	rs4799368	4.539	4.829e-06	0.042		0.876
A1	rs5746945	4.458	5.866e-06	0.043		0.915
A1	rs1013465	4.458	5.867e-06	0.032		1.018
A1	rs12920222	4.445	6.065e-06	0.044		1.026
A1	rs12127789	4.435	6.203e-06	0.055		0.994
...  ...  ... ... ...
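The lod and pvalue columns in this sample (and the LOD and Pvalue columns of Example 1) appear consistent with a 1-degree-of-freedom likelihood-ratio test, in which the chi-square statistic equals 2 ln(10) times the LOD score. Under that assumption, which is ours and not stated by the program, a p-value can be recovered from a LOD score as in the sketch below; `p_from_lod` is a hypothetical helper. For example, a LOD of 6.050 gives roughly 1.30e-07, close to the 1.305e-07 in the first row.

```python
import math


# Hypothetical helper, not part of GWAS GUI; assumes a 1-df likelihood-ratio test.
def p_from_lod(lod: float) -> float:
    """P-value for a LOD score under a 1-df chi-square likelihood-ratio test.

    chi2 = 2 * ln(10) * LOD, and for 1 df P(chi2 > x) = erfc(sqrt(x / 2)).
    """
    if lod < 0:
        raise ValueError("LOD scores are non-negative")
    chi2 = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(chi2 / 2.0))
```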

 File: snpinfo.txt
 
SNP		chr	Pos
rs4880781	10	0.165653
rs12146291	10	0.178805
rs7476901	10	0.186639
rs9419431	10	0.192356
rs10903451	10	0.193471
...  ...  ... ... 

 File: traitinfo.txt
 
Trait 	Annotation	
A1	Case-Control-GWAS



Go to the download page. Enjoy!


 
 

University of Michigan | School of Public Health