
                                  fclique 



Function

   Largest clique program

Description

   Finds the largest clique of mutually compatible characters, and the
   phylogeny which they recommend, for discrete character data with two
   states. The largest clique (or all cliques within a given size range
   of the largest one) are found by a very fast branch and bound search
   method. The method does not allow for missing data. For such cases the
   T (Threshold) option of PARS or MIX may be a useful alternative.
   Compatibility methods are particular useful when some characters are
   of poor quality and the rest of good quality, but when it is not known
   in advance which ones are which.

Algorithm

   This program uses the compatibility method for unrooted two-state
   characters to obtain the largest cliques of characters and the trees
   which they suggest. This approach originated in the work of Le Quesne
   (1969), though the algorithms were not precisely specified until the
   later work of Estabrook, Johnson, and McMorris (1976a, 1976b). These
   authors proved the theorem that a group of two-state characters which
   were pairwise compatible would be jointly compatible. This program
   uses an algorithm inspired by the Kent Fiala - George Estabrook
   program CLINCH, though closer in detail to the algorithm of Bron and
   Kerbosch (1973). I am indebted to Kent Fiala for pointing out that
   paper to me, and to David Penny for decribing to me his
   branch-and-bound approach to finding largest cliques, from which I
   have also borrowed. I am particularly grateful to Kent Fiala for
   catching a bug in versions 2.0 and 2.1 which resulted in those
   versions failing to find all of the cliques which they should. The
   program computes a compatibility matrix for the characters, then uses
   a recursive procedure to examine all possible cliques of characters.

   After one pass through all possible cliques, the program knows the
   size of the largest clique, and during a second pass it prints out the
   cliques of the right size. It also, along with each clique, prints out
   the tree suggested by that clique.

  ASSUMPTIONS

   Basically the following assumptions are made:
    1. Each character evolves independently.
    2. Different lineages evolve independently.
    3. The ancestral state is not known.
    4. Each character has a small chance of being one which evolves so
       rapidly, or is so thoroughly misinterpreted, that it provides no
       information on the tree.
    5. The probability of a single change in a character (other than in
       the high rate characters) is low but not as low as the probability
       of being one of these "bad" characters.
    6. The probability of two changes in a low-rate character is much
       less than the probability that it is a high-rate character.
    7. The true tree has segments which are not so unequal in length that
       two changes in a long are as easy to envisage as one change in a
       short segment.

   The assumptions of compatibility methods have been treated in several
   of my papers (1978b, 1979, 1981b, 1988b), especially the 1981 paper.
   For an opposing view arguing that the parsimony methods make no
   substantive assumptions such as these, see the papers by Farris (1983)
   and Sober (1983a, 1983b), but also read the exchange between
   Felsenstein and Sober (1986).

   A constant available for alteration at the beginning of the program is
   the form width, "FormWide", which you may want to change to make it as
   large as possible consistent with the page width available on your
   output device, so as to avoid the output of cliques and of trees
   getting wrapped around unnecessarily.

Usage

   Here is a sample session with fclique


% fclique 
Largest clique program
Input file: clique.dat
Output file [clique.fclique]: 


Output written to file "clique.fclique"

Tree written on file "clique.treefile"

Done.


   Go to the input files for this example
   Go to the output files for this example

Command line arguments

   Standard (Mandatory) qualifiers:
  [-infile]            discretestates (no help text) discretestates value
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers (* if not always prompted):
   -ancfile            properties Ancestral states file
   -factorfile         properties Multistate factors file
   -weights            properties Weights file
   -cliqmin            integer    Minimum clique size
   -outgrno            integer    Species number to use as outgroup
   -[no]trout          toggle     Write out trees to tree file
*  -outtreefile        outfile    Tree file name
   -printdata          boolean    Print data at start of run
   -[no]progress       boolean    Print indications of progress of run
   -[no]treeprint      boolean    Print out tree
   -printcomp          boolean    Print out compatibility matrix

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-outfile" associated qualifiers
   -odirectory2        string     Output directory

   "-outtreefile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths


   Standard (Mandatory) qualifiers Allowed values Default
   [-infile]
   (Parameter 1) (no help text) discretestates value Discrete states file

   [-outfile]
   (Parameter 2) Output file name Output file <sequence>.fclique
   Additional (Optional) qualifiers Allowed values Default
   -ancfile Ancestral states file Property value(s)
   -factorfile Multistate factors file Property value(s)
   -weights Weights file Property value(s)
   -cliqmin Minimum clique size Integer 0 or more 0
   -outgrno Species number to use as outgroup Integer 0 or more 0
   -[no]trout Write out trees to tree file Toggle value Yes/No Yes
   -outtreefile Tree file name Output file
   -printdata Print data at start of run Boolean value Yes/No No
   -[no]progress Print indications of progress of run Boolean value
   Yes/No Yes
   -[no]treeprint Print out tree Boolean value Yes/No Yes
   -printcomp Print out compatibility matrix Boolean value Yes/No No
   Advanced (Unprompted) qualifiers Allowed values Default
   (none)

Input file format

   Input to the algorithm is standard, but the "?", "P", and "B" states
   are not allowed. This is a serious limitation of this program. If you
   want to find large cliques in data that have "?" states, I recommend
   that you use fmix instead with the -Threshold option and the value of
   the threshold set to 2.0. The theory underlying this is given in my
   paper on character weighting (Felsenstein, 1981b).

   fclique reads discrete character data with 2 states.

  Input files for usage example

  File: clique.dat

     5    6
Alpha     110110
Beta      110000
Gamma     100110
Delta     001001
Epsilon   001110

Output file format

   fclique writes the cliques to the text output file and a tree to a
   separate output file

  Output files for usage example

  File: clique.fclique


Largest clique program, version 3.6b




Largest Cliques
------- -------


Characters: (  1  2  3  6)


  Tree and characters:

     2  1  3  6
     0  0  1  1

             +1-Delta
       +0--1-+
  +--0-+     +--Epsilon
  !    !
  !    +--------Gamma
  !
  +-------------Alpha
  !
  +-------------Beta

remember: this is an unrooted tree!


  File: clique.treefile

   (((Delta,Epsilon),Gamma),Alpha,Beta);

Data files

   None

Notes

   None.

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   None.

See also

   Program name                Description
   eclique      Largest clique program
   edollop      Dollo and polymorphism parsimony algorithm
   edolpenny    Penny algorithm Dollo or polymorphism
   efactor      Multistate to binary recoding program
   emix         Mixed parsimony algorithm
   epenny       Penny algorithm, branch-and-bound
   fdollop      Dollo and polymorphism parsimony algorithm
   fdolpenny    Penny algorithm Dollo or polymorphism
   ffactor      Multistate to binary recoding program
   fmix         Mixed parsimony algorithm
   fmove        Interactive mixed method parsimony
   fpars        Discrete character parsimony
   fpenny       Penny algorithm, branch-and-bound

Author(s)

   This program is an EMBOSS conversion of a program written by Joe
   Felsenstein as part of his PHYLIP package.

   Although we take every care to ensure that the results of the EMBOSS
   version are identical to those from the original package, we recommend
   that you check your inputs give the same results in both versions
   before publication.

   Please report all bugs in the EMBOSS version to the EMBOSS bug team,
   not to the original author.

History

   Written (2004) - Joe Felsenstein, University of Washington.

   Converted (August 2004) to an EMBASSY program by the EMBOSS team.

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.
