Sample Files for FESTA

This section describes the input files and output files for FESTA. It also includes small examples for all types of files associated with FESTA.

Input Files

FESTA uses four kinds of input files: the linkage disequilibrium (LD) file, the map file, the frequency file and the include/exclude file. The LD file is required for the algorithm to run, whereas the other three files can be used to tailor the algorithm. A sample of each of the four kinds of files is given below to illustrate its format.
The input files for FESTA are generated using FUGUE. FUGUE can be used to generate the LD file along with the map and frequency files.
Linkage Disequilibrium (LD) file: This file contains the pairwise LD parameter between all the SNPs (markers) in the region. Therefore, a line in the LD file must contain the names of the SNPs along with the LD between the pair of markers. If a pair is not present in this LD file, it is assumed that they are not in LD, i.e. the r² parameter between them is 0. The first few lines of a small sample file are given below. The '--cols' switch can be used to tell the program which columns in the input LD file contain the information, viz. the marker names and the LD value.
NOTE:
1. We use the r² value for the Linkage Disequilibrium information, but the user can use any measure, such as D', etc.
2. The first line of the LD file is treated as a header line, so it must not contain any LD data.

 File: sample.xt
LABEL1		LABEL2		DELTASQ
rs10199046:119	rs10221549:119	0.77778
rs10199046:119	rs10221616:119	0.77778
rs10199046:119	rs1528799:88.2	0.70718
rs10199046:119	rs2091574:96.2	0.77778
rs10199046:119	rs6712493:116.2	0.77778
rs10199046:119	rs6721908:116.1	1.00000
rs10199046:119	rs6734029:116.1	1.00000
rs10199046:119	rs6737381:116.1	1.00000
rs10199046:119	rs7578318:116.2	1.00000
rs10199046:119	rs7591147:116.1	1.00000
rs10199046:119	rs888107:111.3	1.00000
rs10202962:119	102620507:0	0.00994
rs10202962:119	102892545:0	0.15944
The complete sample LD file can be viewed, in ASCII format, at the following location: Complete Sample LD file.
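As an illustration of the format described above, the following is a minimal Python sketch of reading an LD file. It is not part of FESTA. The function names `read_ld_pairs` and `ld_between` are hypothetical, and the `cols` argument only mimics the idea behind the '--cols' switch using 0-based column indices, which may differ from FESTA's own convention. The sketch skips the header line and treats any pair missing from the file as having an LD value of 0.

```python
import io


def read_ld_pairs(handle, cols=(0, 1, 2)):
    """Read a whitespace-delimited LD file into {frozenset({snp_a, snp_b}): ld}.

    cols gives the 0-based columns of the two marker names and the LD value
    (an assumption of this sketch, not FESTA's '--cols' syntax).
    """
    name_a_col, name_b_col, ld_col = cols
    pairs = {}
    next(handle, None)  # the first line is a header and carries no LD data
    for line in handle:
        fields = line.split()
        if len(fields) <= max(cols):
            continue  # skip blank or short lines
        key = frozenset((fields[name_a_col], fields[name_b_col]))
        pairs[key] = float(fields[ld_col])
    return pairs


def ld_between(pairs, snp_a, snp_b):
    """Return the LD value for a pair; pairs absent from the file count as 0."""
    return pairs.get(frozenset((snp_a, snp_b)), 0.0)


# A few lines taken from the sample LD file above.
sample = io.StringIO(
    "LABEL1\t\tLABEL2\t\tDELTASQ\n"
    "rs10199046:119\trs10221549:119\t0.77778\n"
    "rs10199046:119\trs6721908:116.1\t1.00000\n"
    "rs10202962:119\t102620507:0\t0.00994\n"
)
pairs = read_ld_pairs(sample)
print(ld_between(pairs, "rs10221549:119", "rs10199046:119"))
print(ld_between(pairs, "rs10221549:119", "rs6721908:116.1"))
```

Storing each pair as a `frozenset` makes the lookup independent of the order in which the two markers are named.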
Map (Physical Position) file: This file contains a map (physical position) of the SNPs described by the LD file. It may also contain other SNPs not present in the LD file. A single line in the map file contains three whitespace-separated columns: (i) the first column contains the chromosome number/name, (ii) the second column contains the SNP name, and (iii) the third column contains the position of the SNP in the region (given in kb or in bases). Again, the first few lines of a sample map file are reproduced below.

 File: sample.map
2	rs10199046:119	51.644128
2	rs10202962:119	51.641796
2	rs10221549:119	51.656756
2	rs10221616:119	51.656141
2	rs1206397:100.2	51.642273
2	rs1206413:116.2	51.63795
2	rs1528799:88.2	51.657298
The complete sample map file can be viewed, in ASCII format, at the following location: Complete Sample Map file.
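The three-column layout can be read with a few lines of Python. The sketch below is only an illustration, not FESTA code. The name `read_map` is hypothetical, and positions are kept in whatever unit the file uses.

```python
def read_map(lines):
    """Return {snp_name: (chromosome, position)} from map-file lines."""
    positions = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        chromosome, snp, position = fields
        positions[snp] = (chromosome, float(position))
    return positions


# The first three lines of the sample map file above.
sample_lines = [
    "2\trs10199046:119\t51.644128",
    "2\trs10202962:119\t51.641796",
    "2\trs10221549:119\t51.656756",
]
snp_map = read_map(sample_lines)

# Order the markers along the chromosome by their position.
ordered = sorted(snp_map, key=lambda snp: snp_map[snp][1])
print(ordered)
```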
Frequency file: The frequency file contains allele frequencies of all the SNPs. The format of this file is very specific and its description can be found in detail in the manual. As a quick reference, a part of the sample frequency file is included below.

 File: sample.freq
M rs10199046:119
A   4 0.87500
A   2 0.12500
M rs10202962:119
A   4 0.54237
A   1 0.45763
M rs10221549:119
A   2 0.90000
A   3 0.10000
M rs10221616:119
A   4 0.90000
A   2 0.10000
The complete sample frequency file can be viewed, in ASCII format, at the following location: Complete Sample Frequency file.
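The authoritative description of the frequency format is in the manual. Judging only from the sample above, an `M` line appears to name a marker and each following `A` line appears to give one allele label and its frequency. Under that assumption, a minimal Python sketch of a reader (the names `read_frequencies` and `minor_allele_frequency` are hypothetical) could look like this:

```python
def read_frequencies(lines):
    """Return {marker: {allele_label: frequency}}.

    Assumes 'M <marker>' starts a marker record and 'A <allele> <freq>'
    lists one allele of that marker, as inferred from the sample file.
    """
    freqs = {}
    current = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "M" and len(fields) >= 2:
            current = fields[1]
            freqs[current] = {}
        elif fields[0] == "A" and current is not None and len(fields) >= 3:
            freqs[current][fields[1]] = float(fields[2])
    return freqs


def minor_allele_frequency(allele_freqs):
    """Smallest listed allele frequency (meaningful for markers with two alleles)."""
    return min(allele_freqs.values())


# The first two marker records of the sample frequency file above.
sample_lines = [
    "M rs10199046:119",
    "A   4 0.87500",
    "A   2 0.12500",
    "M rs10202962:119",
    "A   4 0.54237",
    "A   1 0.45763",
]
freqs = read_frequencies(sample_lines)
print(minor_allele_frequency(freqs["rs10202962:119"]))
```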
Include/Exclude files: FESTA can be asked to force certain markers into, or keep certain markers out of, the final tagSNP set. This is done with additional input files: the include file lists markers that must appear in the final tagSNP set, and the exclude file lists markers that must not. A sample include file is shown below. Each line in the include file contains the name of one marker that must be included in the final tagSNP solution. The exclude file format is identical.

 File: sample.include
rs6706917:116.1
rs6707563:116.1
rs6734029:116.1
rs6735432:116.1
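
Because both files are plain lists with one marker name per line, checking a tagSNP set against them is straightforward. The Python sketch below is an illustration only. The function names, the example tagSNP set and the example exclude list are hypothetical and are not FESTA output.

```python
def read_marker_list(lines):
    """Return the set of marker names, one per non-empty line."""
    return {line.strip() for line in lines if line.strip()}


def check_tag_set(tag_snps, include=frozenset(), exclude=frozenset()):
    """Return (missing, forbidden): required markers that are absent from the
    tag set, and excluded markers that are present in it."""
    tag_snps = set(tag_snps)
    missing = set(include) - tag_snps
    forbidden = set(exclude) & tag_snps
    return missing, forbidden


# Contents of the sample include file above.
include = read_marker_list([
    "rs6706917:116.1\n",
    "rs6707563:116.1\n",
    "rs6734029:116.1\n",
    "rs6735432:116.1\n",
])

# Hypothetical exclude list and tagSNP set, made up for this illustration.
exclude = {"rs1528800:116.2"}
tag_snps = ["rs6706917:116.1", "rs6707563:116.1",
            "rs6734029:116.1", "rs1528800:116.2"]

missing, forbidden = check_tag_set(tag_snps, include, exclude)
print(sorted(missing), sorted(forbidden))
```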

Output Files

FESTA has one primary output file, the result file, which contains a summary of the operation and output of the algorithm. In addition to the result file, FESTA can be configured to output two other kinds of files: the Connection Information file and the 'Criterion tagSNP set' file, which contains the names of the markers in one possible solution obtained by optimizing a criterion. This section looks at the output files produced by FESTA.
Result file: The result file comes in different formats depending on how FESTA was configured. It may contain only the greedy results, or both the greedy and greedy-exhaustive tagSNP picking results. It may also contain the physical sizes of the precincts, the sizes of the double covers, etc. A summary of the results is included at the end of the file. Three sample result files are shown below, along with an explanation.

The following result file contains only the greedy results of the FESTA algorithm.

 File: Algorithm_Results_Greedy
Cl no.	Cl size	Gr set size
1	17	1
2	9	1
...
10	2	1
11	2	1

Time taken = 0.03 seconds

Number of markers: 50
No of cluster: 11
Total tagSNP size of Greedy solution: 11

The next result file contains the greedy and greedy-exhaustive tagSNP picking output along with the physical sizes of the precincts.

 File: Algorithm_Results
Cl no.	Cl size	Gr set size	GrEx set size	No.Sols		Physical size
1	17	1		1		17		0.008231
2	9	1		1		9		0.003566
...
10	2	1		1		2		0.001477
11	2	1		1		2		0.000028

Time taken = 0.040000 seconds

Number of markers: 50
No of cluster: 11
Number of clusters where greedy marker removal had to be performed before exhaustive search: 0
Total tagSNP size of Greedy solution: 11
Total tagSNP size of Greedy Exhaust solution: 11

The last example result file contains the greedy and greedy-exhaustive results along with the double cover sizes, instead of the physical sizes.

 File: Algorithm_Results_Double_Cover
Cl no.	Cl size	Gr set size	GrEx set size	No.Sols	Double Cover size
1	17	1		1		17	1
2	9	1		1		9	1
...
10	2	1		1		2	1
11	2	1		1		2	1

Time taken = 0.120000 seconds

Number of markers: 50
No of cluster: 11
Number of clusters where greedy marker removal had to be performed before exhaustive search: 0
Total tagSNP size of Greedy solution: 11
Total tagSNP size of Greedy Exhaust solution: 11

In order to view the complete result files in ASCII format, click on the following links: Result file 1, Result file 2, Result file 3.
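
For downstream scripting, a result file can be split into its per-cluster table and its trailing summary. The following Python sketch is an illustration under stated assumptions: the header is taken to be tab-separated (its column names contain spaces), data rows are taken to start with a digit, and summary lines are taken to have the form 'key: value'. The name `parse_result` is hypothetical, and the sample text contains only rows shown in the first example above.

```python
def parse_result(text):
    """Split a result file into (rows, summary).

    rows is a list of {column_name: value} dicts, one per cluster.
    summary maps each 'key: value' line to its value (as a string).
    """
    lines = text.splitlines()
    # Column names contain spaces, so split the header on tabs only.
    header = [h.strip() for h in lines[0].split("\t") if h.strip()]
    rows, summary = [], {}
    for line in lines[1:]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped[0].isdigit():
            values = stripped.split()  # tolerates repeated tabs
            if len(values) == len(header):
                rows.append(dict(zip(header, values)))
        elif ":" in stripped:
            key, _, value = stripped.rpartition(":")
            summary[key.strip()] = value.strip()
    return rows, summary


sample = (
    "Cl no.\tCl size\tGr set size\n"
    "1\t17\t1\n"
    "2\t9\t1\n"
    "\n"
    "Time taken = 0.03 seconds\n"
    "\n"
    "Number of markers: 50\n"
    "No of cluster: 11\n"
    "Total tagSNP size of Greedy solution: 11\n"
)
rows, summary = parse_result(sample)
print(rows[0]["Cl size"], summary["No of cluster"])
```

The 'Time taken' line uses '=' rather than ':' and is ignored by this sketch.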

Connection Information file: The Connection Information file lists the members of the different precincts. It gives 'precinct by precinct' information about the SNPs and their connected neighbors (for the given threshold). Part of an example file is shown below.

 File: Connection_Information
...

Precinct number 7
rs1528800:116.2 ::	

Precinct number 8
rs2216132:111.3 ::	rs6735432:116.1, 
rs6735432:116.1 ::	rs2216132:111.3, 

Precinct number 9
rs2540989:116.2 ::	

Precinct number 10
BI112308:0 ::	rs1206413:116.2, 
rs1206413:116.2 ::	BI112308:0, 

...

To see the complete connection information file, in ASCII format, follow the link: Connection Information file.
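
The layout shown above (a 'Precinct number N' line followed by 'SNP :: neighbor, neighbor, ' lines) can be read into a nested dictionary. This Python sketch is an illustration based only on that excerpt, and the name `parse_connections` is hypothetical.

```python
def parse_connections(text):
    """Return {precinct_number: {snp: [neighbors]}} from the file text."""
    precincts = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Precinct number"):
            current = int(stripped.split()[-1])
            precincts[current] = {}
        elif "::" in stripped and current is not None:
            snp, _, rest = stripped.partition("::")
            # Neighbor lists end with a trailing comma, so drop empty items.
            neighbors = [n.strip() for n in rest.split(",") if n.strip()]
            precincts[current][snp.strip()] = neighbors
    return precincts


# Precincts 7 and 8 from the excerpt above.
sample = (
    "Precinct number 7\n"
    "rs1528800:116.2 ::\t\n"
    "\n"
    "Precinct number 8\n"
    "rs2216132:111.3 ::\trs6735432:116.1, \n"
    "rs6735432:116.1 ::\trs2216132:111.3, \n"
)
precincts = parse_connections(sample)
for number, members in precincts.items():
    print(number, members)
```

A SNP with an empty neighbor list, such as the one in precinct 7, is the only member of its precinct in this excerpt.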

'Criterion tagSNP Set' file: This file contains one set of SNPs that tag all the SNPs in the LD file. This set is chosen based on a criterion, such as maximizing or minimizing the average LD value between tagSNPs, or minimizing the minor allele frequency of the tagSNPs. For a longer, more exhaustive discussion on criteria files, please refer to the manual on FESTA. All criteria files have the same format. One such criterion file is reproduced below.

 File: Criteria_1
Single cover SNPs selected by criteria 1

rs1206397:100.2
rs1528800:116.2
rs2216132:111.3
rs2540989:116.2
rs6545221:116.1
rs7349275:116.1
102620507:0
102892545:0
BI112308:0
BI112570:0
BI112835:0

There are 5 sample criteria files; to view them, use the following links: Criteria file 1, Criteria file 2, Criteria file 3, Criteria file 4, Criteria file 5.
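
Since all criteria files share the same format (a title line followed by one marker name per line), they can be read with a short helper. The Python sketch below is an illustration, and the name `read_criterion_set` is hypothetical.

```python
def read_criterion_set(text):
    """Return (title, markers) from the text of a criterion file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return "", []
    return lines[0], lines[1:]


# Contents of the sample file Criteria_1 shown above.
sample = """Single cover SNPs selected by criteria 1

rs1206397:100.2
rs1528800:116.2
rs2216132:111.3
rs2540989:116.2
rs6545221:116.1
rs7349275:116.1
102620507:0
102892545:0
BI112308:0
BI112570:0
BI112835:0
"""
title, markers = read_criterion_set(sample)
print(title)
print(len(markers))
```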