HAPMIXMAP

a program to model HapMap haplotypes using tag SNP genotype data

 

User options

The program requires a list of options to be specified by the user in a text file, the name of which is given as a single argument to the program. In this file, options should be specified as optionname = value, with one option per line. All white space is ignored. Option names are case-insensitive and underscore (_) characters are ignored, so the following are all equivalent:

fixedallelefreqs = 1

fixed_allele_freqs = 1

FixedAlleleFreqs = 1

Fixed_allele_Freqs = 1

FIXEDALLELEFREQS = 1
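
The equivalence of these spellings can be illustrated with a small Python sketch. This is not the program's own parser; the function names normalize_option_name and parse_option_line are hypothetical, and the sketch assumes each line contains exactly one "=" sign. White space inside values is left untouched here.

```python
def normalize_option_name(name: str) -> str:
    """Canonical form of an option name: no white space, no underscores, lowercase."""
    return "".join(ch for ch in name if not ch.isspace() and ch != "_").lower()


def parse_option_line(line: str):
    """Split 'optionname = value' into a (canonical name, value) pair."""
    name, _, value = line.partition("=")
    # Assumption: only surrounding white space is stripped from the value.
    return normalize_option_name(name), value.strip()


variants = [
    "fixedallelefreqs = 1",
    "fixed_allele_freqs = 1",
    "FixedAlleleFreqs = 1",
    "Fixed_allele_Freqs = 1",
    "FIXEDALLELEFREQS = 1",
]

# All five spellings collapse to the same (name, value) pair.
parsed = {parse_option_line(v) for v in variants}
print(parsed)
```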

An example options file, options.txt, is provided with the program and can be used as a template.

General options

Model specification

Data files

Prior Specification

Initial values

Output files

Tests and Diagnostics

Advanced Options


The following options used in ADMIXMAP also apply to HAPMIXMAP:

samples, burnin, every, numannealedruns, displaylevel, resultsdir, logfile, seed, priorallelefreqfile, fixedallelefreqs, locusfile, genotypesfile, outcomevarfile, coxoutcomevarfile, covariatesfile, targetindicator, outcomes, regressionpriorprecision, paramfile, regparamfile, ergodicaveragefile, thermo, allelicassociationscorefile, hwscorefile, residualallelicassocscorefile.

The following options share a name with an ADMIXMAP option but have a different function. Take care when using these:

allelicassociationscorefile, residualallelicassocscorefile.

A list of valid options is given in the following tables. Required arguments are in bold.

 

General Options

samples

Integer specifying the total number of iterations of the Markov chain, including burn-in. With strong priors and informative markers, a run of about 500 should suffice for inference. Otherwise, a run of at least 20 000 iterations may be necessary. See here for how to determine if the run has been long enough.

burnin

Integer specifying number of iterations for burn-in of the Markov chain, before posterior samples are output. A burn-in of at least 50 iterations is recommended for inference. For analyses requiring long runs, a burn-in of up to 500 may be required.

every

Integer specifying the "thinning" of samples from the posterior distribution that are written to the output files, after the burn-in period.  For example, if every=10, sampled values are written to the output files every 10 iterations.  We recommend using a value of 5 to keep down the size of the output files.  Sampling more frequently than this does not much improve the precision of results, because successive draws are not independent.  Thinning the output samples does not affect the calculation of ergodic averages or test statistics, which are based on all sampled values.  

Note that every must be no greater than (samples - burnin) / 10, or some output files may be empty.
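
The constraint on every can be checked before starting a run. The following Python sketch is only an illustration of the rule stated above; the function names max_every and check_thinning are hypothetical and are not part of the program.

```python
def max_every(samples: int, burnin: int) -> int:
    """Largest integer value of 'every' allowed by every <= (samples - burnin) / 10."""
    if burnin >= samples:
        raise ValueError("burnin must be smaller than samples")
    return (samples - burnin) // 10


def check_thinning(samples: int, burnin: int, every: int) -> bool:
    """True if 'every' satisfies the documented upper bound."""
    return 1 <= every <= max_every(samples, burnin)


# A long run with the recommended thinning of 5 satisfies the bound.
print(check_thinning(samples=20000, burnin=500, every=5))
# A short run with heavy thinning does not: (500 - 50) / 10 = 45 < 50.
print(check_thinning(samples=500, burnin=50, every=50))
```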

numannealedruns

If thermo=0, this specifies the number of "annealing" runs during burnin. This usually improves mixing.

If thermo=1, this specifies the number of "temperatures" at which to run in order to estimate the marginal likelihood by thermodynamic integration.

Default is 20.

displaylevel

0 = silent mode; Only start and finish times output to screen.

1 = quiet mode; Model specification, priors, test results and diagnostics written to screen.

2 = normal mode; more verbose information and an iteration counter output to screen.

>2 = monitor mode; global parameters also written to screen with frequency specified by every.

The parameters are those that appear in the paramfile, freqprecisionfile, regparamfile and loglikelihoodfile, in that order, namely: 

  1. mixtureprops precision (if fixedmixturepropsprecision = 0)
  2. observed mixtureprops precision (if fixedmixtureprops = 0)
  3. arrivals per unit distance shape parameter
  4. arrival rate distribution rate parameter
  5. expected number of arrivals per unit distance ( #3 / #4 )
  6. allele frequency precision observed mean (if fixedallelefreqs = 0)
  7. allele frequency precision observed variance (if fixedallelefreqs = 0)
  8. allele frequency precision rate parameter (if freqprecisionhiermodel = 1)
  9. regression parameters (if regression model)
  10. Energy ( after burnin )

resultsdir

Path of directory for output files. Default is 'results'.

logfile

Name of log file written by the program. Default is 'logfile.txt'.

seed

Integer, specifying a seed for the random number generator. Default is 1.

Model Specification

states

Integer specifying number of haplotype block states in the model.  This option is not required (and is ignored) if information about allele frequencies is supplied in priorallelefreqfile as the number of columns in this file defines the number of block states in the model.  

If this file is not specified, the parameters of the Dirichlet priors for allele or haplotype frequencies default to 1/n, where n is the number of alleles or haplotypes at each compound locus. 

fixedmixtureprops

1  -  Mixture proportions are held fixed at  1 / K, where K is specified by ‘states’.

0 (default)    -  Mixture proportions are sampled using a conjugate Dirichlet prior.

fixedmixturepropsprecision

1 (default)  -  Mixture proportion precision is fixed at its initial value, specified by ‘mixturepropsprecision’.

0 (valid only if fixedmixtureprops = 0)  -  Mixture proportion precision is sampled.

fixedallelefreqs

1  specifies that the allele frequencies are to remain fixed at their initial values.

0 (default)  otherwise

freqprecisionhiermodel

1 (default) specifies that a hierarchical model is to be used for the allele frequency precision. 

0 otherwise.

At each locus, the frequencies have a Dirichlet prior with mean m, with uniform prior, and precision h, with a Gamma prior.

With freqprecisionhiermodel = 1, the rate parameter of this Gamma prior itself has a Gamma prior, which allows a conjugate update.

Data Files

Details of file formats are under Input files

locusfile

Path to file containing information about each locus typed.

genotypesfile

Path to file containing genotypes for each individual typed.

ccgenotypesfile

Path to file containing genotypes for a case-control study.

priorallelefreqfile

Pathname of file containing parameters of the Dirichlet prior distributions for allele frequencies (or haplotype frequencies) at each compound locus in each block state.  Where allele frequencies have been estimated from a sample of unadmixed individuals, the prior distribution parameters for the corresponding subpopulation should be specified as the observed allele counts plus 0.5.  Where no allele frequency data are available, specify the prior parameters as 0.5 for each allele ("reference" prior). 
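
As an illustration of the rule above, the following Python sketch turns observed allele counts into Dirichlet prior parameters and computes the corresponding prior means. It does not read or write the priorallelefreqfile format; the function names and the example counts are hypothetical.

```python
def dirichlet_prior_from_counts(counts):
    """Prior parameters = observed allele counts plus 0.5."""
    return [c + 0.5 for c in counts]


def reference_prior(num_alleles):
    """'Reference' prior used when no allele frequency data are available."""
    return [0.5] * num_alleles


def prior_mean(params):
    """Mean of a Dirichlet distribution: each parameter divided by the total."""
    total = sum(params)
    return [p / total for p in params]


# Hypothetical diallelic locus with 12 and 8 observed copies of each allele.
params = dirichlet_prior_from_counts([12, 8])
print(params)                          # prior parameters
print(prior_mean(params))              # prior means of the allele frequencies
print(prior_mean(reference_prior(2)))  # no data: equal prior means
```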

If this option is specified, the allele frequencies are initialised at their prior means.

 

outcomevarfile

Path to file containing values of outcome variables.

coxoutcomevarfile

Path to file containing data for a Cox regression.

covariatesfile

Path to file containing covariates for a regression model.

targetindicator

Integer specifying the column in outcomevarfile that contains the first outcome variable to be modelled. This column number should be specified as an offset from column 1: thus to select the variable in column 1, specify targetindicator=0. The default is 0.

outcomes

Valid only with outcomevarfile.

Integer specifying the number of columns of the outcomevarfile to use, starting with targetindicator.
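
The way targetindicator and outcomes together pick columns can be sketched in Python as follows. This is an illustration of the offset rule only, not the program's own code; the function name selected_outcome_columns and the column names are hypothetical.

```python
def selected_outcome_columns(header, targetindicator, outcomes):
    """Columns used as outcome variables.

    targetindicator is an offset from column 1 (0 selects the first column);
    outcomes is the number of consecutive columns to use from there.
    """
    start = targetindicator
    stop = start + outcomes
    if start < 0 or outcomes < 1 or stop > len(header):
        raise ValueError("selection falls outside the outcome variable file")
    return header[start:stop]


# Hypothetical outcome variable file with three columns.
header = ["diabetes", "bmi", "age"]
print(selected_outcome_columns(header, targetindicator=0, outcomes=1))
print(selected_outcome_columns(header, targetindicator=1, outcomes=2))
```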

 

Prior Specification

arrivalrateprior

Vector of length 3 or 4, specifying the parameters of the prior on the arrival rates, l.

Each l is sampled from a Gamma(h*d, b) distribution, where d is the length of the relevant interval. 

If the vector has length 4, both h and b have Gamma priors; if the vector has length 3, only h has a Gamma prior and b is fixed at the third value.

arrivalrateprior = "a, b, c, d", specifies a Gamma(a, b) prior on h and a Gamma(c, d) prior on b.

arrivalrateprior = "a, b, c", specifies a Gamma(a, b) prior on h and b is fixed at c.

 

The default is "30, 0.1, 10, 1", specifying an expected arrival rate of 30 per unit distance, but this option should be set explicitly wherever possible.
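
The figure of 30 per unit distance quoted for the default can be reproduced with the Python sketch below. It assumes that Gamma(a, b) denotes shape a and rate b, so that the prior mean is a / b, and it reports the ratio of the prior mean of h to the prior mean (or fixed value) of b. The function names are hypothetical and the sketch is not taken from the program.

```python
def parse_arrivalrateprior(value: str):
    """Parse an option value such as "30, 0.1, 10, 1" into a list of floats."""
    parts = [float(x) for x in value.strip().strip('"').split(",")]
    if len(parts) not in (3, 4):
        raise ValueError("arrivalrateprior must have length 3 or 4")
    return parts


def prior_expected_rate(parts):
    """Ratio of the prior mean of h to the prior mean (or fixed value) of b."""
    mean_h = parts[0] / parts[1]      # assumed mean of Gamma(shape, rate)
    if len(parts) == 4:
        mean_b = parts[2] / parts[3]  # b has a Gamma(c, d) prior
    else:
        mean_b = parts[2]             # b is fixed at the third value
    return mean_h / mean_b


# Default prior: (30 / 0.1) / (10 / 1), approximately 30 per unit distance.
print(prior_expected_rate(parse_arrivalrateprior('"30, 0.1, 10, 1"')))
# Length-3 form with b fixed at 10 gives the same ratio.
print(prior_expected_rate(parse_arrivalrateprior('"30, 0.1, 10"')))
```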

mixturepropsprecisionprior

Vector of length 2 specifying the parameters of the Gamma prior on the mixture proportion precision parameter. For example, mixturepropsprecisionprior = "1, 1" specifies a reference (uninformative) Gamma(1,1) prior.

Valid only if fixedmixturepropsprecision = 0.

allelefreqprecisionprior

Vector of length 3, specifying the parameters of the prior on the allele frequency precision parameter. At each locus, the frequencies are sampled from a Dirichlet distribution with mean m and precision h. The means have a uniform prior.

If freqprecisionhiermodel=0, allelefreqprecisionprior = "a, b, c" specifies a Gamma(a, b) prior on h.

If freqprecisionhiermodel=1, allelefreqprecisionprior = "a, b, c" specifies a Gamma(a, b) distribution on h and a Gamma(b, c) prior on b.

The default is "0.2, 1, 1", which specifies a prior with most of its mass towards the extremes of 0 and 1. 

regressionpriorprecision

Prior precision (1 / variance) of regression parameters.

Initial Values

If any of these are not specified, the respective parameters default to their prior means.

initialarrivalratefile

Pathname of file containing initial values of the parameters of the Gamma prior, followed by the initial values of the arrival rates, l, tab- or space-separated. 

initialmixturepropsfile

Pathname of file containing initial values of the mixture proportions, tab- or space-separated. Should be listed by locus first, then by block state. Valid even if fixedmixtureprops = 1.

initialallelefreqfile

Pathname of file containing initial values of allele frequencies for each block state, at each locus. Valid even if fixedallelefreqs = 1. In particular, this means you can supply a priorallelefreqfile with fixedallelefreqs=1 and the initial values of the frequencies will be available as an initialallelefreqfile next time.

initialfreqpriorfile

Pathname of file containing initial values of allele frequency prior parameters, tab- or space-separated. Valid even if fixedallelefreqs = 1.

mixturepropsprecision

Specifies the initial value of the mixture proportion precision.

Output Files

Pathnames of output files. Details of file formats are under Output files.

paramfile

Average mixture proportions across loci, sample mean and variance of arrival rates and their prior parameters.

regparamfile

Regression parameters.

freqprecisionfile

Sample mean and variance of allele frequency precision parameters.

finalallelefreqfile

Final values of allele frequencies, in the right format for use as an initialallelefreqfile in subsequent runs. Defaults to "state-allelefreqs.txt".

finalfreqpriorfile

Final values of allele frequency prior parameters, in the right format for use as initialfreqpriorfile in subsequent runs. Defaults to "state-freqpriors.txt".

finalarrivalratefile

Final values of arrival rates, in the right format for use as initialarrivalratefile in subsequent runs. Defaults to "state-arrivalrates.txt".

finalmixturepropsfile

Final values of mixture proportions, in the right format for use as initialmixturepropsfile in subsequent runs. Defaults to "state-mixtureprops.txt".

arrivalrateposteriormeanfile

Posterior means of the arrival rates, l.

allelefreqprecisionposteriormeanfile

Posterior means of allele frequency precision parameter.

ergodicaveragefile

Ergodic averages of global parameters and of the mean and variance of the deviance.
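
Because the final-value files are written in the format of the corresponding initial-value files, a follow-up run can be started from where the previous one stopped. The Python sketch below builds the relevant option lines from the default file names listed above. It assumes that the final-value files are found in the results directory and that their paths can be passed as shown; neither is stated in this documentation, and the function name restart_options is hypothetical.

```python
# Pairing of each "final" output option with the "initial" input option
# that accepts the same file format.
FINAL_TO_INITIAL = {
    "finalallelefreqfile": "initialallelefreqfile",
    "finalfreqpriorfile": "initialfreqpriorfile",
    "finalarrivalratefile": "initialarrivalratefile",
    "finalmixturepropsfile": "initialmixturepropsfile",
}

# Default file names of the final-value files.
DEFAULT_FINAL_NAMES = {
    "finalallelefreqfile": "state-allelefreqs.txt",
    "finalfreqpriorfile": "state-freqpriors.txt",
    "finalarrivalratefile": "state-arrivalrates.txt",
    "finalmixturepropsfile": "state-mixtureprops.txt",
}


def restart_options(directory="results", final_names=None):
    """Option lines that feed one run's final values into the next run."""
    names = dict(DEFAULT_FINAL_NAMES)
    names.update(final_names or {})
    # Assumption: the final-value files are located in 'directory'.
    return [
        f"{initial} = {directory}/{names[final]}"
        for final, initial in FINAL_TO_INITIAL.items()
    ]


for line in restart_options():
    print(line)
```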

Tests and Diagnostics

The options below specify additional tests or output, but do not change the model itself.

thermo

1 - Use thermodynamic integration to compute marginal likelihood.

0 - default

allelicassociationscorefile

Name of output file containing score tests for association of the outcome variable with alleles at each simple locus.

This is not the same as the test in ADMIXMAP with the same name.

residualallelicassocscorefile

Name of output file containing score tests for residual allelic association between pairs of unlinked loci.

Unlike in ADMIXMAP, this test is evaluated for all individuals at all loci, even when untyped.

hwscoretestfile

Name of output file containing score tests for heterozygosity across loci, as a test for departure from Hardy-Weinberg equilibrium. These can be used to detect genotyping errors.  

mhscoretestfile

Name of output file containing Mantel-Haenszel test output.

 

Advanced Options

These are intended only for expert users.

arrivalratesamplerparams

Vector of length 5 specifying the initial stepsize, min stepsize, max stepsize, number of leapfrog steps and target accept rate in arrival rate sampler.

maskedindivs

maskedloci

Lists of indices of individuals whose genotypes have been masked (set to missing) and the loci at which they were masked. If both these options are specified, posterior predictive genotype probabilities are written to a file called PPGenotypeProbs.txt.

printbuildinfo

Print information about the build.

 

 

 
