popGenStat

The archive popGenStat.tar.gz contains the source code and executables (64-bit Linux) of the programs used to compute the population genetic statistics reported in Nabholz et al. 2014, "Transcriptome population genomics reveals severe bottleneck and domestication cost in the African rice (O. glaberrima)".

These programs use the Bio++ library and should therefore be used and modified under the terms of the CeCILL free software licence (GPL-compatible).

COMPILATION:

Once you have installed Bio++, you can compile the programs with the following commands:

g++ -g ./seq_stat_2pop_generic.cpp -o seq_stat_2pop -lbpp-popgen -lbpp-phyl -lbpp-seq -lbpp-core

g++ -g ./SNP_frequency_coding.cpp -o SNP_frequency_coding -lbpp-popgen -lbpp-phyl -lbpp-seq -lbpp-core

USAGE:

If you run either program without options, it prints a help message showing the command line with all its options, together with the name and a short description of each statistic computed.

./seq_stat_2pop computes various population genetics statistics.

./SNP_frequency_coding computes allele counts for both populations.
The counts are for the derived allele if the two outgroups are available; otherwise, they are for one allele chosen at random. A site frequency spectrum can be computed directly from the table returned by SNP_frequency_coding.
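
As an illustration of that last step, here is a minimal sketch that tallies a site frequency spectrum from a stream of allele counts. It assumes you have already extracted the relevant column of the SNP_frequency_coding table as one count per line; the column layout is not described in this README, so check the help message first. The function name sfs_from_counts is hypothetical and is not part of the distributed programs.

```shell
# Hypothetical helper: reads one allele count per line on stdin and prints
# "count<TAB>number_of_sites", sorted by count. Blank lines are skipped.
sfs_from_counts() {
    awk 'NF { n[$1]++ } END { for (c in n) print c "\t" n[c] }' | sort -n
}

# Toy input: five sites with allele counts 1, 1, 2, 3 and 1.
printf '1\n1\n2\n3\n1\n' | sfs_from_counts
```

On the toy input this reports three sites with count 1, one site with count 2 and one site with count 3. Frequency classes that are never observed are simply absent from the output rather than listed with a zero.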

OPTIONS:

-seq STR : a file listing the names of the alignments from which the statistics will be computed

-f STR : the format (phylip or fasta) of the alignments

-coding STR : a string indicating whether the alignments are non-coding or coding. In the latter case, the program assumes that the first position of the alignment is the first codon position. Only the standard genetic code is considered. This option is NOT used by SNP_frequency_coding, which only works on coding sequences.

-pop1 STR : prefix of the names of the sequences belonging to population 1

-pop2 STR : prefix of the names of the sequences belonging to population 2

-outgroup STR : prefix of the names of the sequences belonging to the outgroup

-o STR : the name of the output file in which the statistics will be stored.
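
Because -pop1, -pop2 and -outgroup identify sequences by the start of their names, it can be worth checking how many sequences each prefix matches before running the programs. The sketch below does this for a FASTA file, assuming header lines of the form ">name". The function count_prefix and the sequence names in the toy alignment are made up for illustration.

```shell
# Hypothetical helper: count the FASTA headers in file $1 whose sequence name
# starts with the literal prefix $2.
count_prefix() {
    awk -v p=">$2" 'index($0, p) == 1 { n++ } END { print n + 0 }' "$1"
}

# Toy alignment with made-up sequence names, written to a temporary file.
toy="$(mktemp)"
printf '>RC_1\nATGAAA\n>RC_2\nATGAAG\n>RS_1\nATGAAA\n>out_1\nATGAAC\n' > "$toy"
for prefix in RC RS out; do
    echo "$prefix: $(count_prefix "$toy" "$prefix")"
done
rm -f "$toy"
```

A count of 0 for any prefix suggests that the prefix does not match the naming scheme of the alignment.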

EXAMPLE:

Ten alignments are stored in the folder alignment_fasta. You can compute the population genetic statistics for these alignments with the following commands:

ls alignment_fasta/* >listSeq

./seq_stat_2pop -seq listSeq -f fasta -coding coding -pop1 RC -pop2 RS -outgroup out -o popGenStat.txt

./SNP_frequency_coding -seq listSeq -f fasta -pop1 RC -pop2 RS -outgroup out -o SNPFreq.txt
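
Since both programs read the alignments named in the list file, a quick sanity check of that list can catch a wrong path early. The following is a minimal sketch, assuming the list holds one path per line, as produced by the ls command above. The function check_list is hypothetical and is not part of the distributed programs; the demonstration uses temporary files with made-up names.

```shell
# Hypothetical helper: print every path listed in file $1 that is missing or
# empty, and return a non-zero status if there is at least one.
check_list() {
    missing=0
    while IFS= read -r aln || [ -n "$aln" ]; do
        [ -n "$aln" ] || continue          # skip blank lines
        if [ ! -s "$aln" ]; then
            echo "missing or empty: $aln"
            missing=1
        fi
    done < "$1"
    return "$missing"
}

# Demonstration on a temporary list in which the second file does not exist.
tmp="$(mktemp -d)"
printf '>RC_1\nATG\n' > "$tmp/aln1.fasta"
printf '%s\n' "$tmp/aln1.fasta" "$tmp/aln2.fasta" > "$tmp/listSeq"
check_list "$tmp/listSeq" || echo "fix the list before running the programs"
rm -rf "$tmp"
```

With the example data, running check_list listSeq right after the ls command would serve the same purpose.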