
                                  frestml 



Function

   Restriction site maximum Likelihood method

Description

   Estimation of phylogenies by maximum likelihood using restriction
   sites data (not restriction fragments but presence/absence of
   individual sites). It employs the Jukes-Cantor symmetrical model of
   nucleotide change, which does not allow for differences of rate
   between transitions and transversions. This program is very slow.

Algorithm

   This program implements a maximum likelihood method for restriction
   sites data (not restriction fragment data). This program is one of the
   slowest programs in this package, and can be very tedious to run. It
   is possible to have the program search for the maximum likelihood
   tree. It will be more practical for some users (those that do not have
   fast machines) to use the U (User Tree) option, which takes less run
   time, optimizing branch lengths and computing likelihoods for
   particular tree topologies suggested by the user. The model used here
   is essentially identical to that used by Smouse and Li (1987) who give
   explicit expressions for computing the likelihood for three-species
   trees. It does not place prior probabilities on trees as they do. The
   present program extends their approach to multiple species by a
   technique which, while it does not give explicit expressions for
   likelihoods, does enable their computation and the iterative
   improvement of branch lengths. It also allows for multiple restriction
   enzymes. The algorithm has been described in a paper (Felsenstein,
   1992). Another relevant paper is that of DeBry and Slade (1985).

   The assumptions of the present model are:
    1. Each restriction site evolves independently.
    2. Different lineages evolve independently.
    3. Each site undergoes substitution at an expected rate which we
       specify.
    4. Substitutions consist of replacement of a nucleotide by one of the
       other three nucleotides, chosen at random.

   Note that if the existing base is, say, an A, the chance of it being
   replaced by a G is 1/3, and so is the chance that it is replaced by a
   T. This means that there can be no difference in the (expected) rate
   of transitions and transversions. Users who are upset at this might
   ponder the fact that a version allowing different rates of transitions
   and transversions would run an estimated 16 times slower. If it also
   allowed for unequal frequencies of the four bases, it would run about
   300,000 times slower! For the moment, until a better method is
   available, I guess I'll stick with this one!

   Subject to these assumptions, the program is an approximately correct
   maximum likelihood method.

Usage

   Here is a sample session with frestml


% frestml 
Restriction site maximum Likelihood method
Input file: restml.dat
Input tree file: 
Output file [restml.frestml]: 

numseqs: 1

Adding species:
   1. Alpha
   2. Beta
   3. Gamma
   4. Delta
   5. Epsilon

Output written to file "restml.frestml"

Tree also written onto file "restml.treefile"

Done.


   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-data]              discretestates File containing one or more sets of
                                  restriction data
  [-intreefile]        tree       (no help text) tree value
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers (* if not always prompted):
   -weights            properties Weights file
   -njumble            integer    Number of times to randomise
*  -seed               integer    Random number seed between 1 and 32767 (must
                                  be odd)
   -outgrno            integer    Species number to use as outgroup
   -[no]allsites       boolean    All sites detected
*  -lengths            boolean    Use lengths from user trees
   -sitelength         integer    Site length
*  -global             boolean    Global rearrangements
*  -[no]rough          boolean    Speedier but rougher analysis
   -[no]trout          toggle     Write out trees to tree file
*  -outtreefile        outfile    Tree file name
   -printdata          boolean    Print data at start of run
   -[no]progress       boolean    Print indications of progress of run
   -[no]treeprint      boolean    Print out tree

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory3        string     Output directory

   "-outtreefile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths


   Standard (Mandatory) qualifiers Allowed values Default
   [-data]
   (Parameter 1) File containing one or more sets of restriction data
   Discrete states file
   [-intreefile]
   (Parameter 2) (no help text) tree value Phylogenetic tree
   [-outfile]
   (Parameter 3) Output file name Output file <sequence>.frestml
   Additional (Optional) qualifiers Allowed values Default
   -weights Weights file Property value(s)
   -njumble Number of times to randomise Integer 0 or more 0
   -seed Random number seed between 1 and 32767 (must be odd) Integer
   from 1 to 32767 1
   -outgrno Species number to use as outgroup Integer 0 or more 0
   -[no]allsites All sites detected Boolean value Yes/No Yes
   -lengths Use lengths from user trees Boolean value Yes/No No
   -sitelength Site length Integer from 1 to 8 6
   -global Global rearrangements Boolean value Yes/No No
   -[no]rough Speedier but rougher analysis Boolean value Yes/No Yes
   -[no]trout Write out trees to tree file Toggle value Yes/No Yes
   -outtreefile Tree file name Output file
   -printdata Print data at start of run Boolean value Yes/No No
   -[no]progress Print indications of progress of run Boolean value
   Yes/No Yes
   -[no]treeprint Print out tree Boolean value Yes/No Yes
   Advanced (Unprompted) qualifiers Allowed values Default
   (none)

Input file format

   frestml input is fairly standard, with one addition. As usual the
   first line of the file gives the number of species and the number of
   sites, but there is also a third number, which is the number of
   different restriction enzymes that were used to detect the restriction
   sites. Thus a data set with 10 species and 35 different sites,
   representing digestion with 4 different enzymes, would have the first
   line of the data file look like this:


   10   35    4

   The first line of the data file will also contain a letter W following
   these numbers (and separated from them by a space) if the Weights
   option is being used. As with all programs using the weights option, a
   line or lines must then follow, before the data, with the weights for
   each site.

   The site data are in standard form. Each species starts with a species
   name whose maximum length is given by the constant "nmlngth" (whose
   value in the program as distributed is 10 characters). The name
   should, as usual, be padded out to that length with blanks if
   necessary. The sites data then follows, one character per site (any
   blanks will be skipped and ignored). Like the DNA and protein sequence
   data, the restriction sites data may be either in the "interleaved"
   form or the "sequential" form. Note that if you are analyzing
   restriction sites data with the programs DOLLOP or MIX or other
   discrete character programs, at the moment those programs do not use
   the "aligned" or "interleaved" data format. Therefore you may want to
   avoid that format when you have restriction sites data that you will
   want to feed into those programs.

   The presence of a site is indicated by a "+" and the absence by a "-".
   I have also allowed the use of "1" and "0" as synonyms for "+" and
   "-", for compatibility with MIX and DOLLOP which do not allow "+" and
   "-". If the presence of the site is unknown (for example, if the DNA
   containing it has been deleted so that one does not know whether it
   would have contained the site) then the state "?" can be used to
   indicate that the state of this site is unknown.

   User-defined trees may follow the data in the usual way. The trees
   must be unrooted, which means that at their base they must have a
   trifurcation.

  Input files for usage example

  File: restml.dat

   5   13   2
Alpha     ++-+-++--+++-
Beta      ++++--+--+++-
Gamma     -+--+-++-+-++
Delta     ++-+----++---
Epsilon   ++++----++---

Output file format

   frestml outputs a graph to the specified graphics device. outputs a
   report format file. The default format is ...

  Output files for usage example

  File: restml.frestml


Restriction site Maximum Likelihood method, version 3.6b


  Recognition sequences all 6 bases long

Sites absent from all species are assumed to have been omitted




  +----Gamma
  |
  |     +Epsilon
  |  +--3
  1--2  +Delta
  |  |
  |  +Beta
  |
  +Alpha


remember: this is an unrooted tree!

Ln Likelihood =   -40.34358


 Between        And            Length      Approx. Confidence Limits
 -------        ---            ------      ------- ---------- ------
   1          Gamma           0.10813     (  0.01154,     0.21901) **
   1             2            0.01156     (     zero,     0.04578)
   2             3            0.05885     (     zero,     0.12697) **
   3          Epsilon         0.00100     (     zero,     0.00617)
   3          Delta           0.01460     (     zero,     0.05036)
   2          Beta            0.00100     (     zero,    infinity)
   1          Alpha           0.01310     (     zero,     0.04806)

     *  = significantly positive, P < 0.05
     ** = significantly positive, P < 0.01


  File: restml.treefile

(Gamma:0.10813,((Epsilon:0.00100,Delta:0.01460):0.05885,
Beta:0.00100):0.01156,Alpha:0.01310);

Data files

   None

Notes

   None.

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   None.

See also

   Program name                         Description
   ednacomp     DNA compatibility algorithm
   ednadist     Nucleic acid sequence Distance Matrix program
   ednainvar    Nucleic acid sequence Invariants method
   ednaml       Phylogenies from nucleic acid Maximum Likelihood
   ednamlk      Phylogenies from nucleic acid Maximum Likelihood with clock
   ednapars     DNA parsimony algorithm
   ednapenny    Penny algorithm for DNA
   eprotdist    Protein distance algorithm
   eprotpars    Protein parsimony algorithm
   erestml      Restriction site Maximum Likelihood method
   eseqboot     Bootstrapped sequences algorithm
   fdiscboot    Bootstrapped discrete sites algorithm
   fdnacomp     DNA compatibility algorithm
   fdnadist     Nucleic acid sequence Distance Matrix program
   fdnainvar    Nucleic acid sequence Invariants method
   fdnaml       Estimates nucleotide phylogeny by maximum likelihood
   fdnamlk      Estimates nucleotide phylogeny by maximum likelihood
   fdnamove     Interactive DNA parsimony
   fdnapars     DNA parsimony algorithm
   fdnapenny    Penny algorithm for DNA
   fdolmove     Interactive Dollo or Polymorphism Parsimony
   ffreqboot    Bootstrapped genetic frequencies algorithm
   fproml       Protein phylogeny by maximum likelihood
   fpromlk      Protein phylogeny by maximum likelihood
   fprotdist    Protein distance algorithm
   fprotpars    Protein pasimony algorithm
   frestboot    Bootstrapped restriction sites algorithm
   frestdist    Distance matrix from restriction sites or fragments
   fseqboot     Bootstrapped sequences algorithm
   fseqbootall  Bootstrapped sequences algorithm

Author(s)

   This program is an EMBOSS conversion of a program written by Joe
   Felsenstein as part of his PHYLIP package.

   Although we take every care to ensure that the results of the EMBOSS
   version are identical to those from the original package, we recommend
   that you check your inputs give the same results in both versions
   before publication.

   Please report all bugs in the EMBOSS version to the EMBOSS bug team,
   not to the original author.

History

   Written (2004) - Joe Felsenstein, University of Washington.

   Converted (August 2004) to an EMBASSY program by the EMBOSS team.

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
