Input Files
FESTA uses four kinds of input files, viz. the linkage disequilibrium (LD) file, the map file, the frequency file and the include/exclude file. Only the LD file is required for the algorithm to run; the other three are optional and can be used to tailor the algorithm. A sample of each kind of file is given below to illustrate its format. The input files for FESTA are generated using FUGUE, which can produce the LD file along with the map and frequency files.
Linkage Disequilibrium (LD) file: This file contains the pairwise LD values between SNPs (markers) in the region. Each line of the LD file must therefore contain the names of two SNPs along with the LD value between them. If a pair is not present in the LD file, the two SNPs are assumed not to be in LD, i.e. the r² value between them is taken to be 0. The first few lines of a small sample file are given below. The '--cols' switch can be used to tell the program which columns of the input LD file contain the relevant information, viz. the two marker names and the LD value.
NOTE:
1. We use the r² value as the LD measure, but the user can use any other measure, such as D'.
2. The first line of the LD file must not contain any data; it should be a header line.
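The reading rules above (header line skipped, selectable columns, absent pairs treated as r² = 0) can be illustrated with a minimal sketch. The function names `read_ld_file` and `get_ld`, the `name_cols`/`value_col` parameters and the frozenset-keyed dictionary are all hypothetical choices for this example; they only play the role that the '--cols' switch plays and do not reproduce FESTA's own parser or switch syntax.

```python
def read_ld_file(lines, name_cols=(0, 1), value_col=2):
    """Parse LD-file lines into a dict keyed by unordered SNP pairs.

    Hypothetical helper: the default column indices are assumptions and only
    play the role that FESTA's '--cols' switch plays; they do not reproduce
    its syntax.
    """
    ld = {}
    rows = iter(lines)
    next(rows, None)  # the first line is a header and carries no data
    for line in rows:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        a, b = fields[name_cols[0]], fields[name_cols[1]]
        # frozenset makes (a, b) and (b, a) the same key
        ld[frozenset((a, b))] = float(fields[value_col])
    return ld


def get_ld(ld, a, b):
    """Look up the LD between two SNPs; a pair absent from the file counts as 0."""
    return ld.get(frozenset((a, b)), 0.0)


# Made-up marker names and values, for illustration only.
sample = [
    "marker1 marker2 r2",
    "rs1 rs2 0.85",
    "rs1 rs3 0.40",
]
ld = read_ld_file(sample)
print(get_ld(ld, "rs2", "rs1"))  # 0.85
print(get_ld(ld, "rs2", "rs3"))  # 0.0
```

Keying on an unordered pair means the file does not need to list both (a, b) and (b, a).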
Map (Physical Position) file: This file contains a map (the physical positions) of the SNPs described in the LD file. It may also contain SNPs that are not present in the LD file. Each line of the map file contains three whitespace-separated columns: (i) the chromosome number/name, (ii) the SNP name, and (iii) the position of the SNP in the region (in kb or in bases). Again, the first few lines of a sample map file are reproduced below.
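As a sketch of the three-column layout, the code below reads map lines and orders the SNPs that also appear in the LD file by position. The helper names are hypothetical, and two things are assumptions not stated above: the map file has no header line, and all SNPs lie on a single chromosome so that sorting by position alone is meaningful.

```python
def read_map_file(lines):
    """Parse map lines of the form '<chromosome> <snp_name> <position>'.

    Assumptions for this sketch: no header line, and the position is used
    as-is (kb or bases), with no unit conversion.
    """
    positions = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        chrom, snp, pos = fields[0], fields[1], float(fields[2])
        positions[snp] = (chrom, pos)
    return positions


def order_ld_snps(positions, ld_snps):
    """Keep only SNPs that also appear in the LD file, sorted by position.

    The map may list extra SNPs; those are dropped here. Sorting by position
    alone assumes all SNPs lie in one region of a single chromosome.
    """
    return sorted((s for s in positions if s in ld_snps), key=lambda s: positions[s][1])


# Made-up entries for illustration only; rs9 is absent from the LD file.
sample = ["22 rs3 15.2", "22 rs1 10.0", "22 rs9 11.0", "22 rs2 12.5"]
positions = read_map_file(sample)
print(order_ld_snps(positions, {"rs1", "rs2", "rs3"}))  # ['rs1', 'rs2', 'rs3']
```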
Frequency file: The frequency file contains the allele frequencies of all the SNPs. Its format is strict and is described in detail in the manual. As a quick reference, part of a sample frequency file is included below.
Include/Exclude files: FESTA can be asked to force some markers into, or keep some markers out of, the final tagSNP set. This is done with additional input files: the include file lists markers that must be included in the final tagSNP set, and the exclude file lists markers that must be excluded from it. A sample include file is shown below. Each line of the include file contains the name of one marker that must be included in the final tagSNP solution. The exclude file has an identical format.
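A minimal sketch of how one-name-per-line include/exclude lists could be read and applied before tag selection is shown below. The helper names are hypothetical, and the semantics are an assumption of this example: included markers are placed in the tagSNP set up front, excluded markers are never picked as tagSNPs (though they may still be tagged by others), and a marker listed in both files is treated as an error.

```python
def read_marker_list(lines):
    """Read an include or exclude file: one marker name per line."""
    return {line.strip() for line in lines if line.strip()}


def apply_include_exclude(candidates, include, exclude):
    """Split markers into forced tagSNPs and markers still eligible as tags.

    Assumed semantics for this sketch: included markers are placed in the
    tagSNP set up front, and excluded markers are never picked as tagSNPs
    (they can still be tagged by other markers).
    """
    conflict = include & exclude
    if conflict:
        raise ValueError(f"markers both included and excluded: {sorted(conflict)}")
    forced = set(include)
    eligible = set(candidates) - include - exclude
    return forced, eligible


# Made-up marker names for illustration only.
include = read_marker_list(["rs1\n", "", "rs5\n"])
exclude = read_marker_list(["rs4\n"])
forced, eligible = apply_include_exclude({"rs1", "rs2", "rs3", "rs4", "rs5"}, include, exclude)
print(sorted(forced), sorted(eligible))  # ['rs1', 'rs5'] ['rs2', 'rs3']
```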
Output Files
FESTA has one primary output file, the result file, which contains a summary of the operation and output of the algorithm. In addition to the result file, FESTA can be configured to produce two other kinds of files, viz. the Connection Information file and the 'Criterion tagSNP set' file; the latter contains the names of the markers in one possible solution, obtained by optimizing a criterion. This section describes the output files produced by FESTA.
Result file: The result file comes in different formats depending on how FESTA was configured. It may contain only the greedy results, or both the greedy and the greedy-exhaustive tagSNP picking results. In addition, it may contain the physical sizes of the precincts, the sizes of the double covers, etc. A summary of the results is included at the end of the file. Three sample result files are shown below, along with an explanation.
The first result file contains only the greedy results of the FESTA algorithm.
The second result file contains the greedy and greedy-exhaustive tagSNP picking output, along with the physical sizes of the precincts.
The last result file contains, in addition to the greedy and greedy-exhaustive results, the double cover results instead of the physical sizes.
To view the complete result files in ASCII format, follow these links: Result file 1, Result file 2, Result file 3.
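To give a feel for what "greedy tagSNP picking" means, here is a generic greedy set-cover heuristic written as a sketch. It is not FESTA's own procedure: the exact greedy and greedy-exhaustive steps are described in the manual and may differ (for example in tie-breaking or in working precinct by precinct). The assumptions here are that a SNP tags itself and every SNP with pairwise LD at or above the threshold, and that LD is stored in a hypothetical frozenset-keyed dictionary with absent pairs counting as 0.

```python
def greedy_tags(snps, ld, threshold):
    """Generic greedy set cover for tagSNP picking (illustrative sketch).

    A SNP is taken to tag itself and every SNP whose pairwise LD with it is
    >= threshold. `ld` maps frozenset({a, b}) -> LD value; absent pairs count
    as 0. Ties go to the SNP listed first in `snps`.
    """
    covers = {
        s: {s} | {t for t in snps if t != s and ld.get(frozenset((s, t)), 0.0) >= threshold}
        for s in snps
    }
    untagged = set(snps)
    tags = []
    while untagged:
        # max() returns the first SNP with the largest count, so list order breaks ties
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Made-up LD values for illustration only.
snps = ["rs1", "rs2", "rs3", "rs4"]
ld = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.20,
}
print(greedy_tags(snps, ld, threshold=0.8))  # ['rs2', 'rs4']
```

The loop always terminates because every untagged SNP at least tags itself, so each pass removes one or more SNPs from the untagged set.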
Connection Information file: The Connection Information file lists the members of the different precincts. It gives, precinct by precinct, the SNPs and their connected neighbors (for the given threshold). Part of an example file is shown below.
To see the complete Connection Information file in ASCII format, follow the link: Connection Information file.
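The "connected neighbors for the given threshold" idea can be sketched as a graph in which two SNPs are joined when their pairwise LD is at or above the threshold. This example assumes that a precinct corresponds to a connected component of that graph; please check that reading against the manual. The helper names and the frozenset-keyed LD dictionary are hypothetical, and components are found with a plain breadth-first search.

```python
from collections import deque


def neighbors(snps, ld, threshold):
    """Adjacency lists: two SNPs are connected when their pairwise LD >= threshold."""
    adj = {s: [] for s in snps}
    for pair, value in ld.items():
        if value >= threshold and len(pair) == 2:
            a, b = tuple(pair)
            if a in adj and b in adj:
                adj[a].append(b)
                adj[b].append(a)
    return adj


def precincts(adj):
    """Connected components of the LD graph, found by breadth-first search."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            s = queue.popleft()
            component.append(s)
            for t in adj[s]:
                if t not in seen:
                    seen.add(t)
                    queue.append(t)
        components.append(sorted(component))
    return components


# Made-up LD values for illustration only.
snps = ["rs1", "rs2", "rs3", "rs4"]
ld = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.20,
}
print(precincts(neighbors(snps, ld, threshold=0.8)))  # [['rs1', 'rs2', 'rs3'], ['rs4']]
```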
'Criterion tagSNP Set' file: This file contains one set of SNPs that tags all the SNPs in the LD file. The set is chosen based on a criterion, such as maximizing or minimizing the average LD value between tagSNPs, or minimizing the minor allele frequency of the tagSNPs. For a more detailed discussion of criteria files, please refer to the FESTA manual. All criteria files have the same format; one such criterion file is reproduced below.
There are 5 sample criteria files; to view them, use the following links: Criteria file 1, Criteria file 2, Criteria file 3, Criteria file 4, Criteria file 5.
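The average-LD criterion can be illustrated with a short sketch: discard candidate tag sets that fail to tag every SNP, then pick the remaining set with the lowest (or highest) mean pairwise LD among its tagSNPs. How FESTA enumerates candidate sets is not shown here; the candidate list, helper names, frozenset-keyed LD dictionary and the tagging rule (LD at or above the threshold) are assumptions of this example.

```python
from itertools import combinations


def average_tag_ld(tags, ld):
    """Mean pairwise LD among tagSNPs; pairs absent from `ld` count as 0."""
    pairs = list(combinations(tags, 2))
    if not pairs:
        return 0.0
    return sum(ld.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def tags_all(tags, snps, ld, threshold):
    """True if every SNP is a tag or has LD >= threshold with some tag."""
    tag_set = set(tags)
    return all(
        s in tag_set or any(ld.get(frozenset((s, t)), 0.0) >= threshold for t in tag_set)
        for s in snps
    )


def pick_by_criterion(candidate_sets, snps, ld, threshold, maximize=False):
    """Among valid tag sets, return the one with min (or max) average LD.

    Raises ValueError if no candidate tags every SNP.
    """
    valid = [c for c in candidate_sets if tags_all(c, snps, ld, threshold)]
    chooser = max if maximize else min
    return chooser(valid, key=lambda c: average_tag_ld(c, ld))


# Made-up LD values and candidate sets for illustration only.
snps = ["rs1", "rs2", "rs3", "rs4"]
ld = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.20,
}
candidates = [["rs2", "rs4"], ["rs1", "rs4"], ["rs1", "rs3", "rs4"]]
print(pick_by_criterion(candidates, snps, ld, 0.8))                 # ['rs2', 'rs4']
print(pick_by_criterion(candidates, snps, ld, 0.8, maximize=True))  # ['rs1', 'rs3', 'rs4']
```

Here ['rs1', 'rs4'] is filtered out because rs3 has no tag in sufficient LD with it.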