HPeak: An HMM-based algorithm for defining read-enriched regions from massive parallel sequencing data

Please check the following instructions to complete your installation.

Set up for HPeak v2.1


The software suite is written in Perl and C++. You may need to download a Perl distribution and compile the C++ source code to complete the installation. Windows users can download a Perl distribution from ActivePerl. For your convenience, we include pre-compiled executables for the C++ programs needed by HPeak. If for some reason you need to recompile them, use the following commands on a Linux/Unix system:

g++ -o chiphmm chiphmm.cpp

g++ -o hmmminus chiphmmminus.cpp

To install, extract the downloaded source package using any archive program. The package contains a directory called HPeak-2.1, which holds all Perl scripts and C++ source code plus two subdirectories, /data/ and /example. The first contains all working information files; the second contains sample data. You can either add the /HPeak-2.1/ path to the appropriate configuration file of your operating system, or give the full path in each command, e.g.,

perl ~/program/HPeak-2.1/HPeak.pl
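The two ways of locating HPeak.pl described above could look roughly like the following on a Linux/Unix shell. This is a minimal sketch, not an official setup script: the variable name HPEAK_HOME and the install location ~/program/HPeak-2.1 are assumptions for illustration, and the choice of configuration file (for example ~/.bashrc or ~/.profile) depends on your shell.

```shell
# Assumed install location (hypothetical); adjust to where you extracted the package.
HPEAK_HOME="$HOME/program/HPeak-2.1"

# Option 1: add the HPeak directory to PATH, typically from a shell
# configuration file. Perl's -S switch then searches PATH for the script:
#   perl -S HPeak.pl
export PATH="$HPEAK_HOME:$PATH"

# Option 2: leave PATH unchanged and give the full path in each command:
#   perl "$HPEAK_HOME/HPeak.pl"
```

The perl invocations are left as comments because the options HPeak.pl requires are not covered in this section.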

 

To get DNA sequence data using the -seq option for human data, download the human genome sequence files either from the UCSC genome site (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromFa.zip) or from the package provided on our site, and place the sequence files in the data/hg18/chromFa/ folder. For mouse data, download the mouse genome sequence files from the UCSC site (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/bigZips/chromFa.tar.gz) and extract them to the data/mm9/chromFa/ folder.
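As an illustration of the extraction step, the sketch below unpacks an already downloaded chromFa archive into the folder for a given genome build. The function name install_chromfa is hypothetical and not part of HPeak; it assumes that unzip and tar are installed and that the archive holds the chromosome .fa files at its top level. If your archive unpacks into a subdirectory, move the .fa files up into chromFa/ afterwards.

```shell
# Hypothetical helper (not part of HPeak): unpack a downloaded chromFa
# archive into <HPeak dir>/data/<build>/chromFa.
# Usage: install_chromfa <archive> <HPeak dir> <build: hg18|mm9>
install_chromfa() {
    archive=$1
    dest="$2/data/$3/chromFa"
    mkdir -p "$dest" || return 1
    case "$archive" in
        *.zip)          unzip -q -o "$archive" -d "$dest" ;;
        *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
        *)
            echo "unsupported archive type: $archive" >&2
            return 1
            ;;
    esac
}

# Example (assumes the archives were downloaded to the current directory):
#   install_chromfa chromFa.zip    "$HOME/program/HPeak-2.1" hg18
#   install_chromfa chromFa.tar.gz "$HOME/program/HPeak-2.1" mm9
```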

 

To get detailed genomic annotation information using the -ann option, download the additional information files from the UCSC site. For human data, we also provide a download on our site. The package on the HPeak website includes one refGene file (refFlat.out) and a set of phastCons score files, one per chromosome. Extract the phastCons score files (/data/phastCons/) to the /data/hg18/phastCons/ folder. To allow quick lookup of the conservation score at an individual base position, conservation scores for alignments of 16 vertebrate genomes with human were downloaded from the UCSC genome site (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/) and converted to ASCII files, with one character representing one phastCons score. This package can be downloaded here (395 MB). Move the file to the data/ folder and extract it there. For mouse data, download the phastCons files from the UCSC genome site (http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/vertebrate/) and convert the phastCons scores to ASCII.

In the examples shown in this document, we assume that the path has already been set up correctly (this is the default when you extract the HPeak package), so it is omitted from the commands. An example directory structure is as follows:

<DIR>    HPeak-2.1
<DIR>    HPeak-2.1\data
<DIR>    HPeak-2.1\data\hg18
<DIR>    HPeak-2.1\data\mm9
FILE     HPeak-2.1\data\hg18\refFlat.out
<DIR>    HPeak-2.1\data\hg18\chromFa\
<DIR>    HPeak-2.1\data\hg18\phastCons\
FILE     HPeak-2.1\data\mm9\refFlat.out
<DIR>    HPeak-2.1\data\mm9\chromFa\
<DIR>    HPeak-2.1\data\mm9\phastCons\
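To confirm that an installation matches the layout listed above, a small check along the following lines may help. It is only a sketch: the function name check_hpeak_layout is hypothetical and not part of HPeak, and it simply tests for the folders and files named in the listing on a Linux/Unix system.

```shell
# Hypothetical check (not part of HPeak): report which of the expected
# v2.1 data folders and files are missing under an install directory.
# Usage: check_hpeak_layout <HPeak dir>
# Returns non-zero if anything is missing.
check_hpeak_layout() {
    root=$1
    missing=0
    for build in hg18 mm9; do
        for d in chromFa phastCons; do
            if [ ! -d "$root/data/$build/$d" ]; then
                echo "missing directory: $root/data/$build/$d" >&2
                missing=1
            fi
        done
        if [ ! -f "$root/data/$build/refFlat.out" ]; then
            echo "missing file: $root/data/$build/refFlat.out" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_hpeak_layout "$HOME/program/HPeak-2.1" && echo "layout looks complete"
```

A reported gap only matters if you use the corresponding genome or option. For example, the mm9 folders are not needed when working only with human data.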

 


Set up for HPeak v2.0 or older versions


The software suite is written in Perl and C++. You may need to download a Perl distribution and compile the C++ source code to complete the installation. Windows users can download a Perl distribution from ActivePerl. For your convenience, we include pre-compiled executables for the C++ programs needed by HPeak. If for some reason you need to recompile them, use the following commands on a Linux/Unix system:

g++ -o chiphmm chiphmm.cpp

g++ -o hmmminus chiphmmminus.cpp

To install, extract the downloaded source package using any archive program. The package contains a directory called HPeak-1.0, which holds all Perl scripts and C++ source code plus two subdirectories, /data/ and /example. The first contains all working information files; the second contains sample data. You can either add the /HPeak-1.0/ path to the appropriate configuration file of your operating system, or give the full path in each command, e.g.,

perl ~/program/HPeak-1.0/HPeak.pl

 

To get DNA sequence data using the -seq option, download the human genome sequence files from the UCSC genome site (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/bigZips/chromFa.zip). Extract the sequence files to the data/chromFa/ folder.

 

To get detailed genomic annotation information using the -ann option, download the additional information files from the HPeak site. The package includes one refGene file (refFlat.out) and a set of phastCons score files, one per chromosome. Move refFlat.out to the data/ folder and the phastCons score files to the data/phastCons/ folder. To allow quick lookup of the conservation score at an individual base position, conservation scores for alignments of 16 vertebrate genomes with human were downloaded from the UCSC genome site (http://hgdownload.cse.ucsc.edu/goldenPath/hg18/phastCons17way/) and converted to ASCII files. This package can be downloaded here (395 MB). Move the file to the data/ folder and extract it there.

In the examples shown in this document, we assume that the path has already been set up correctly, so it is omitted from the commands. An example directory structure is as follows (HPeak-1.0 could be HPeak-1.1 or HPeak-2.0, depending on the version you chose):

<DIR>    HPeak-1.0
<DIR>    HPeak-1.0\data
FILE     HPeak-1.0\data\refFlat.out
<DIR>    HPeak-1.0\data\chromFa\
<DIR>    HPeak-1.0\data\phastCons\

 


[ HPeak | Steve Qin | Chinnaiyan Lab ]