
                          SIGGENLIG documentation
                                      
   

CONTENTS

   1.0 SUMMARY 
   2.0 INPUTS & OUTPUTS 
   3.0 INPUT FILE FORMAT 
   4.0 OUTPUT FILE FORMAT 
   5.0 DATA FILES 
   6.0 USAGE 
   7.0 KNOWN BUGS & WARNINGS 
   8.0 NOTES 
   9.0 DESCRIPTION 
   10.0 ALGORITHM 
   11.0 RELATED APPLICATIONS 
   12.0 DIAGNOSTIC ERROR MESSAGES 
   13.0 AUTHORS 
   14.0 REFERENCES 

1.0 SUMMARY

   Generates ligand-binding signatures from a CON file (contacts file) of
   residue-ligand contacts.

2.0 INPUTS & OUTPUTS

   SIGGENLIG reads a CON file of residue-ligand contacts generated by
   using SITES and a directory of CCF files (clean coordinate files)
   containing a CCF file for each protein or domain in the CON file. One
   or more signature files, each containing a ligand-binding signature,
   are generated for each ligand-binding site in the CON file. The user
   specifies whether 1D (sequence) or 3D (structural) signatures are
   generated and whether they are 'full-length' (signature corresponds to
   entire ligand-binding site) or 'patch' (signature corresponds to part
   of ligand-binding site). For 3D signatures, the environment definition
   is specified and for patch signatures, a 'Minimum patch size' and
   'Maximum gap distance' are specified. A 'Window size' is specified for
   all signatures. The paths of the CCF files (input) and signature files
   (output) are specified by the user and the file extensions are
   specified in the ACD file. A log file is also written.

3.0 INPUT FILE FORMAT

   The format of the CON file (contacts file) is described in SITES
   documentation. The format of the CCF files is described in PDBPARSE
   documentation (proteins) and the DOMAINER documentation (domains).

  Input files for usage example

  File: ../sites-keep/SITES.con

XX   Residue-ligand contact data (for domains).
XX
TY   LIGAND
XX
EX   THRESH 1.0; IGNORE .; NMOD .; NCHA .;
XX
NE   11
XX
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 1; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  47362A43 CRC32;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   PHE 6
LI   THR 7
LI   LEU 44
LI   GLY 45
LI   ASP 46
XX
//
EN   [2]
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG 101;
XX
DE   2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
SI   SN 2; NS 2
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 65; NRES2 .
XX
S1   SEQUENCE    65 AA;   7396 MW;  0CFB92A3 CRC32;
     MKFAHLADIH LGYEQFHKPQ REEEFAEAFK NALEIAVQEN VDFILIAGDL FHSSRPSPGT
     LKKAI
XX
NC   SM .; LI 2
XX
LI   HIS 10
LI   ASP 49
XX


  [Part of this file has been deleted for brevity]

NC   SM .; LI 3
XX
LI   ASP 8
LI   HIS 10
LI   ASP 49
XX
//
EN   [10]
XX
ID   PDB 2hhb; DOM .; LIG PO4;
XX
DE   PHOSPHATE ION
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 D; ID2 .; NRES1 146; NRES2 .
XX
S1   SEQUENCE   146 AA;  15868 MW;  EC9744C9 CRC32;
     VHLTPEEKSA VTALWGKVNV DEVGGEALGR LLVVYPWTQR FFESFGDLST PDAVMGNPKV
     KAHGKKVLGA FSDGLAHLDN LKGTFATLSE LHCDKLHVDP ENFRLLGNVL VCVLAHHFGK
     EFTPPVQAAY QKVVAGVANA LAHKYH
XX
NC   SM .; LI 2
XX
LI   VAL 1
LI   LEU 81
XX
//
EN   [11]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG POP;
XX
DE   PYROPHOSPHATE 2-
XX
SI   SN 1; NS 1
XX
CN   MO .; CN1 1; CN2 .; ID1 A; ID2 .; NRES1 52; NRES2 .
XX
S1   SEQUENCE    52 AA;   5817 MW;  47362A43 CRC32;
     ADIEGFTSLA SQCTAQELVM TLNELFARFD KLAAENHCLR IKILGDCYYC VS
XX
NC   SM .; LI 6
XX
LI   ASP 2
LI   ILE 3
LI   GLU 4
LI   GLY 5
LI   PHE 6
LI   THR 7
XX
//
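
   The records in the example above can be read with a short script. The
   following is a minimal sketch, assuming only the record layout visible
   in the example (EN starts an entry, '//' ends it, and ID, SI and LI
   carry the fields of interest). The function name parse_con_sites is
   hypothetical and is not part of EMBOSS.

```python
def parse_con_sites(text):
    """Collect one dict per CON entry; entries start at EN and end at '//'."""
    sites, current = [], None
    for line in text.splitlines():
        code, _, rest = line.partition(" ")
        rest = rest.strip()
        if code == "EN":
            current = {"contacts": []}
        elif current is None:
            continue  # header records before the first entry
        elif code in ("ID", "SI"):
            # e.g. "PDB 1cs4; DOM d1cs4a_; LIG 101;" or "SN 1; NS 2"
            for field in rest.split(";"):
                if field.strip():
                    key, value = field.split()
                    current[key] = int(value) if code == "SI" else value
        elif code == "LI":
            # e.g. "ASP 2": residue name and sequence position
            name, position = rest.split()
            current["contacts"].append((name, int(position)))
        elif code == "//":
            sites.append(current)
            current = None
    return sites


sample = """\
EN   [1]
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
SI   SN 1; NS 2
XX
NC   SM .; LI 2
XX
LI   ASP 2
LI   PHE 6
XX
//
"""

sites = parse_con_sites(sample)
print(sites[0]["LIG"], sites[0]["contacts"])
```

   Records that the sketch does not recognise (XX, CN, S1, NC and the
   sequence lines) are skipped.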

4.0 OUTPUT FILE FORMAT

   The output file (Figure 1) uses the standard signature file format and
   is described in the SIGGEN documentation. For the ligand-binding
   signatures generated by SIGGENLIG, however, four additional lines of
   bibliographic information taken from the CON (input) file are written
   to a signature file. The records have the following meaning:
     * ID - identifier codes: (1) PDB: 4-character PDB identifier code.
       (2) DOM: 7-character domain identifier code from SCOP or CATH. (3)
       LIG: Ligand identifier (an abbreviation of its full name).
     * DE - full name of the ligand, see HETPARSE documentation.
     * IS - ligand-binding site information: (1) SN: Site number. This
       uniquely identifies a ligand-binding site in the CON file
       generated by running SITES. (2) NS: The total number of binding
       sites for a given ligand.
     * IP - patch information: (1) PN: Patch number. The sequential order
       (from the N-terminus) of the patch in the ligand-binding site. (2)
       NP: Number of patches. The total number of patches in this
       ligand-binding site. (3) MP: Minimum patch size. (4) MG: Maximum
       gap distance. MP and MG were specified in the generation of the
       signature (see 'Algorithm' below).

   In addition, where 3D signatures are generated, the following records
   have different meanings than for 1D signatures (e.g. those generated
   by using SIGGEN):
     * IN - Informative line about signature position. The number of
       different observed amino acid environments (rather than residue
       identities) is given after 'NRES'.
     * AA - The identifier of an environment (rather than residue) seen
       in this position and the frequency of its occurrence are delimited
       by ';'.
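
   The records described above can be collected with a short script. The
   following is a minimal sketch, assuming the layout seen in the usage
   example (two-letter record codes, ';'-delimited fields). The function
   name parse_signature is hypothetical. For 3D signatures the AA
   identifier would be an environment rather than a residue; the sketch
   stores it as a string in either case.

```python
def parse_signature(text):
    """Return (header, positions) read from the text of a signature file."""
    header, positions, current = {}, [], None
    for line in text.splitlines():
        code, rest = line[:2], line[2:].strip()
        if code in ("ID", "IS", "IP"):
            # e.g. "PDB 1cs4; DOM d1cs4a_; LIG 101;" or "PN 0; NP 0; MP 5; MG 2"
            for field in rest.split(";"):
                if field.strip():
                    key, value = field.split()
                    header[key] = value
        elif code == "NN":
            # "[1]" opens a new signature position
            current = {"number": int(rest.strip("[]")), "AA": {}, "GA": {}}
            positions.append(current)
        elif code == "IN" and current is not None:
            for field in rest.split(";"):
                key, value = field.split()
                current[key] = int(value)
        elif code in ("AA", "GA") and current is not None:
            # identifier (or gap length) and its frequency, delimited by ';'
            ident, freq = (part.strip() for part in rest.split(";"))
            current[code][ident] = int(freq)
    return header, positions


sample = """\
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
IS   SN 1; NS 2
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
//
"""

header, positions = parse_signature(sample)
print(header["LIG"], positions[0]["AA"], positions[0]["GA"])
```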

  Output files for usage example

  File: 101.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG 101;
XX
DE    2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
IS   SN 1; NS 2
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   6
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   3 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   36 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 1
XX
GA   0 ; 1
XX
NN   [6]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   0 ; 1
//

  File: 101.2.F.#.1ii7.d1ii7a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG 101;
XX
DE    2'-DEOXY-ADENOSINE 3'-MONOPHOSPHATE
XX
IS   SN 2; NS 2
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   9 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   38 ; 1
//

  File: FOK.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG FOK;
XX
DE    FORSKOLIN
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   1
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   43 ; 1
//

  File: HEM.1.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 1; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   17
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 1
XX
GA   41 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   1 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

  File: HEM.2.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 2; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   17
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   30 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   9 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   20 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

  File: HEM.3.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 3; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   19
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   M ; 1
XX
GA   31 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   6 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 1
XX
GA   2 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [16]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [17]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [18]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [19]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

  File: HEM.4.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG HEM;
XX
DE    PROTOPORPHYRIN IX CONTAINING FE
XX
IS   SN 4; NS 4
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   15
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   30 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   9 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   20 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX


  [Part of this file has been deleted for brevity]

XX
GA   0 ; 1
XX
NN   [10]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   3 ; 1
XX
NN   [11]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   1 ; 1
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 1
XX
GA   3 ; 1
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   2 ; 1
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   34 ; 1
//

  File: MG.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG MG;
XX
DE    MAGNESIUM ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   4
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   I ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   2 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   39 ; 1
//

  File: MN.1.F.#.1ii7.d1ii7a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1ii7; DOM d1ii7a_; LIG MN;
XX
DE    MANGANESE (II) ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   3
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   7 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 1
XX
GA   1 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   38 ; 1
//

  File: PO4.1.F.#.2hhb...sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 2hhb; DOM .; LIG PO4;
XX
DE    PHOSPHATE ION
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   V ; 1
XX
GA   0 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   L ; 1
XX
GA   79 ; 1
//

  File: POP.1.F.#.1cs4.d1cs4a_.sig

TY   LIGAND
XX
TS   1D
XX
ID   PDB 1cs4; DOM d1cs4a_; LIG POP;
XX
DE    PYROPHOSPHATE 2-
XX
IS   SN 1; NS 1
XX
IP   PN 0; NP 0; MP 5; MG 2
XX
NP   6
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 1
XX
GA   1 ; 1
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   I ; 1
XX
GA   0 ; 1
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   E ; 1
XX
GA   0 ; 1
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 1
XX
GA   0 ; 1
XX
NN   [5]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 1
XX
GA   0 ; 1
XX
NN   [6]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   T ; 1
XX
GA   0 ; 1
//

  File: siggenlig.log

5.0 DATA FILES

   SIGGENLIG does not use any data files.

6.0 USAGE

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-confile]           infile     This option specifies the name of the CON
                                  file (contact file) (input). A 'contact
                                  file' contains contact data for a protein or
                                  a domain from SCOP or CATH, in the CON
                                  format (EMBL-like). The contacts may be
                                  intra-chain residue-residue, inter-chain
                                  residue-residue or residue-ligand. The files
                                  are generated by using CONTACTS, INTERFACE
                                  and SITES.
   -ccfpdir            directory  This option specifies the location of
                                  protein CCF files (clean coordinate files)
                                  (input). A 'clean coordinate file' contains
                                  protein coordinate and derived data for a
                                  single PDB file ('protein clean coordinate
                                  file') or a single domain from SCOP or CATH
                                  ('domain clean coordinate file'), in CCF
                                  format (EMBL-like). The files, generated by
                                  using PDBPARSE (PDB files) or DOMAINER
                                  (domains), contain 'cleaned-up' data that is
                                  self-consistent and error-corrected.
                                  Records for residue solvent accessibility
                                  and secondary structure are added to the
                                  file by using PDBPLUS. Default: "./"
   -ccfddir            directory  This option specifies the location of
                                  domain CCF files (clean coordinate files)
                                  (input). A 'clean coordinate file' contains
                                  protein coordinate and derived data for a
                                  single PDB file ('protein clean coordinate
                                  file') or a single domain from SCOP or CATH
                                  ('domain clean coordinate file'), in CCF
                                  format (EMBL-like). The files, generated by
                                  using PDBPARSE (PDB files) or DOMAINER
                                  (domains), contain 'cleaned-up' data that is
                                  self-consistent and error-corrected.
                                  Records for residue solvent accessibility
                                  and secondary structure are added to the
                                  file by using PDBPLUS. Default: "./"
   -mode               menu       This option specifies the mode of signature
                                  generation. In 'Full-length signature mode'
                                  (mode 1) a single signature incorporating
                                  all residue positions that contact the
                                  ligand plus intervening gaps is generated.
                                  In 'Patch signature mode' (mode 2) one or
                                  more signatures corresponding to 'patches'
                                  of residue positions are generated. A patch
                                  is a set of residues that are
                                  near-neighbours in sequence and is described
                                  by two user-defined parameters: minimum
                                  patch size and maximum gap distance.
   -type               menu       This option specifies the type of signature
                                  generated. In '1D (sequence) signature'
                                  sequence-based signatures are generated. In
                                  '3D (structural) signature' structure-based
                                  signatures are generated.
*  -patchsize          integer    This option specifies the minimum patch
                                  size. This is the minimum number of contact
                                  positions that must be incorporated in a
                                  signature.
*  -gapdistance        integer    This option specifies the maximum gap
                                  distance. This is the maximum allowable gap
                                  (residues) between two residues in a patch.
                                  If two contact residues are further than
                                  this distance apart in sequence, they would
                                  not belong to the same patch.
*  -environment        menu       This option specifies the environment
                                  definition. See matgen3d documentation for
                                  description of definitions.
   -wsiz               integer    This option specifies the window size. When
                                  a signature is aligned to a protein
                                  sequence, the permissible gap between two
                                  signature positions is determined by the
                                  empirical gaps and the window size. The user
                                  is prompted for a window size that is used
                                  for every position in the signature. Likely
                                  this is not optimal. A future implementation
                                  will provide a range of methods for
                                  generating values of window size depending
                                  upon the alignment (window size is
                                  identified by the WSIZ record in the
                                  signature output file).
  [-sigoutdir]         outdir     This option specifies the location of
                                  signature files (output). A 'signature file'
                                  contains a sparse sequence signature
                                  suitable for use with the SIGSCAN and
                                  SIGSCANLIG programs. The files are generated
                                  by using SIGGEN and SIGGENLIG. Default:
                                  "./"
  [-logfile]           outfile    Name of siggenlig logfile

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-logfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths

  6.1 COMMAND-LINE ARGUMENTS

   Standard (Mandatory) qualifiers

   [-confile] (Parameter 1)
      This option specifies the name of the CON file (contact file)
      (input). A 'contact file' contains contact data for a protein or a
      domain from SCOP or CATH, in the CON format (EMBL-like). The
      contacts may be intra-chain residue-residue, inter-chain
      residue-residue or residue-ligand. The files are generated by using
      CONTACTS, INTERFACE and SITES.
      Allowed values: Input file
      Default:        Required

   -ccfpdir
      This option specifies the location of protein CCF files (clean
      coordinate files) (input). A 'clean coordinate file' contains
      protein coordinate and derived data for a single PDB file ('protein
      clean coordinate file') or a single domain from SCOP or CATH
      ('domain clean coordinate file'), in CCF format (EMBL-like). The
      files, generated by using PDBPARSE (PDB files) or DOMAINER
      (domains), contain 'cleaned-up' data that is self-consistent and
      error-corrected. Records for residue solvent accessibility and
      secondary structure are added to the file by using PDBPLUS.
      Allowed values: Directory
      Default:        "./"

   -ccfddir
      This option specifies the location of domain CCF files (clean
      coordinate files) (input). The file format and origin are as
      described for -ccfpdir.
      Allowed values: Directory
      Default:        "./"

   -mode
      This option specifies the mode of signature generation. In
      'Full-length signature mode' (mode 1) a single signature
      incorporating all residue positions that contact the ligand plus
      intervening gaps is generated. In 'Patch signature mode' (mode 2)
      one or more signatures corresponding to 'patches' of residue
      positions are generated. A patch is a set of residues that are
      near-neighbours in sequence and is described by two user-defined
      parameters: minimum patch size and maximum gap distance.
      Allowed values: 1 (Full-length signature mode)
                      2 (Patch signature mode)
      Default:        1

   -type
      This option specifies the type of signature generated. In '1D
      (sequence) signature' sequence-based signatures are generated. In
      '3D (structural) signature' structure-based signatures are
      generated.
      Allowed values: 1 (1D (sequence) signature)
                      2 (3D (structural) signature)
      Default:        1

   -patchsize
      This option specifies the minimum patch size. This is the minimum
      number of contact positions that must be incorporated in a
      signature.
      Allowed values: Integer 3 or more
      Default:        5

   -gapdistance
      This option specifies the maximum gap distance. This is the maximum
      allowable gap (residues) between two residues in a patch. If two
      contact residues are further than this distance apart in sequence,
      they would not belong to the same patch.
      Allowed values: Integer 0 or more
      Default:        2

   -environment
      This option specifies the environment definition. See matgen3d
      documentation for description of definitions.
      Allowed values: 1 (Env1), 2 (Env2), 3 (Env3), 4 (Env4), 5 (Env5),
                      6 (Env6), 7 (Env7), 8 (Env8), 9 (Env9), 10 (Env10),
                      11 (Env11), 12 (Env12), 13 (Env13), 14 (Env14),
                      15 (Env15), 16 (Env16)
      Default:        1

   -wsiz
      This option specifies the window size. When a signature is aligned
      to a protein sequence, the permissible gap between two signature
      positions is determined by the empirical gaps and the window size.
      The user is prompted for a window size that is used for every
      position in the signature. Likely this is not optimal. A future
      implementation will provide a range of methods for generating
      values of window size depending upon the alignment (window size is
      identified by the WSIZ record in the signature output file).
      Allowed values: Any integer value
      Default:        0

   [-sigoutdir] (Parameter 2)
      This option specifies the location of signature files (output). A
      'signature file' contains a sparse sequence signature suitable for
      use with the SIGSCAN and SIGSCANLIG programs. The files are
      generated by using SIGGEN and SIGGENLIG.
      Allowed values: Output directory
      Default:        "./"

   [-logfile] (Parameter 3)
      Name of siggenlig logfile
      Allowed values: Output file
      Default:        siggenlig.log

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)

  6.2 EXAMPLE SESSION

   An example of interactive use of SIGGENLIG is shown below.


% siggenlig 
Generate ligand-binding signatures from a CON file.
Name of CON file (contact files) (input): ../sites-keep/SITES.con
Location of protein CCF files (clean coordinate files) (input). [.]: ../pdbplus-keep
Location of domain CCF files (clean coordinate files) (input). [.]: ../domainer-keep
Available modes
         1 : Full-length signature mode
         2 : Patch signature mode
Select mode of operation. [1]: 1
Available types
         1 : 1D (sequence) signature
         2 : 3D (structural) signature
Select type of signature. [1]: 1
Window size [0]: 
Name of directory of signature files (output) [.]: 
Name of siggenlig logfile [siggenlig.log]: 
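
   The same run could in principle be scripted without prompts. The
   following is a minimal sketch that only assembles a command line from
   the qualifiers listed in section 6.0; it has not been checked against
   an installed SIGGENLIG. The helper name build_siggenlig_command and
   its keyword names are hypothetical, and it is an assumption that
   -patchsize and -gapdistance are needed only in patch mode and
   -environment only for 3D signatures.

```python
import shlex


def build_siggenlig_command(confile, ccfpdir, ccfddir, sigoutdir=".",
                            mode=1, sigtype=1, wsiz=0, patchsize=5,
                            gapdistance=2, environment=1,
                            logfile="siggenlig.log"):
    """Return the argument list for a prompt-free siggenlig run."""
    cmd = ["siggenlig",
           "-confile", confile,
           "-ccfpdir", ccfpdir,
           "-ccfddir", ccfddir,
           "-mode", str(mode),
           "-type", str(sigtype)]
    if mode == 2:
        # Assumption: patch parameters apply only in patch signature mode.
        cmd += ["-patchsize", str(patchsize),
                "-gapdistance", str(gapdistance)]
    if sigtype == 2:
        # Assumption: the environment definition applies only to 3D signatures.
        cmd += ["-environment", str(environment)]
    # -auto is the general qualifier that turns off prompts.
    cmd += ["-wsiz", str(wsiz), "-sigoutdir", sigoutdir,
            "-logfile", logfile, "-auto"]
    return cmd


command = build_siggenlig_command("../sites-keep/SITES.con",
                                  "../pdbplus-keep", "../domainer-keep")
patch_command = build_siggenlig_command("../sites-keep/SITES.con",
                                        "../pdbplus-keep", "../domainer-keep",
                                        mode=2, patchsize=3, gapdistance=2)
print(shlex.join(command))
```

   The list form can be passed to subprocess.run; shlex.join is used here
   only to print a copyable command line.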


7.0 KNOWN BUGS & WARNINGS

   Window size
   The user is prompted for a window size that is used for every position
   in the signature. Likely this is not optimal. A future implementation
   will provide a range of methods for generating values of window size
   depending upon the alignment (window size is identified by the WSIZ
   record in the signature output file).

8.0 NOTES

  8.1 GLOSSARY OF FILE TYPES

   Clean coordinate file (for domain)
      Format:      CCF format (EMBL-like).
      Description: Protein coordinate and derived data for a single
                   domain from SCOP or CATH. The data are 'cleaned-up':
                   self-consistent and error-corrected.
      Created by:  DOMAINER
      See also:    Records for residue solvent accessibility and
                   secondary structure are added to the file by using
                   PDBPLUS.

   Contact file (residue-ligand contacts)
      Format:      CON format (EMBL-like).
      Description: Residue-ligand contact data for a protein or a domain
                   from SCOP or CATH.
      Created by:  SITES
      See also:    N.A.

   Signature file
      Format:      SIG format
      Description: Contains a sparse sequence signature suitable for use
                   with the SIGSCAN program.
      Created by:  SIGGEN, LIBGEN
      See also:    The files are generated by using SIGGENLIG.

9.0 DESCRIPTION

   A protein or a single domain of a protein may contain one or more
   ligand-binding sites. SIGGENLIG provides an automated means to
   generate signatures of ligand-binding from a CON file (contacts file)
   of residue-ligand contacts. The signatures generated may be of two
   types, 1D or 3D. 1D signatures represent protein sites as residue
   identities whereas 3D signatures represent sites as residue
   environments in space.

10.0 ALGORITHM

   The user specifies whether 1D (sequence) or 3D (structural) signatures
   are generated and whether they are 'full-length' (signature
   corresponds to entire ligand-binding site) or 'patch' (signature
   corresponds to part of ligand-binding site). For 3D signatures, the
   environment definition is specified and for patch signatures, a
   'Minimum patch size' and 'Maximum gap distance' are specified. A
   'Window size' is specified for all signatures.
   Definition of full-length and patch signatures
   For each ligand-binding site represented in the CON file (input), one
   or more signatures are generated as follows: (1) a single
   'full-length' signature incorporating all residue positions that
   contact the ligand plus intervening gaps, or (2) one or more
   signatures corresponding to 'patches' of residue positions.
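
   The 'intervening gaps' of a full-length signature can be illustrated
   from the usage example. Comparing the contact positions for ligand 101
   in 1cs4 (2, 6, 7, 44, 45, 46) with the GA records of the matching
   signature (1, 3, 0, 36, 0, 0) suggests that each gap counts the
   residues lying between consecutive contact positions, starting from
   the N-terminus. This reading is inferred from the example rather than
   stated by the program documentation; the function name empirical_gaps
   is hypothetical.

```python
def empirical_gaps(contact_positions):
    """Gap before each contact position, counted in intervening residues."""
    gaps, previous = [], 0
    for position in sorted(contact_positions):
        gaps.append(position - previous - 1)
        previous = position
    return gaps


# Contact positions of ligand 101 in 1cs4, taken from the example CON file.
gaps_101 = empirical_gaps([2, 6, 7, 44, 45, 46])
print(gaps_101)
```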
   A patch is a set of residues that are near-neighbours in sequence and
   is described by two user-defined parameters as follows. (1) Minimum
   patch size; the minimum number of contact positions that must be
   incorporated for a patch to be defined. (2) Maximum gap distance. The
   maximum allowable gap (residues) between two residues in a patch. If
   two contact residues are further than this distance apart in sequence,
   they would not belong to the same patch.
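
   The patch definition can be sketched as follows. This is an
   illustration under stated assumptions, not the program's own code: the
   gap between two contact residues is taken to be the number of residues
   lying between them, and groups with fewer contact positions than the
   minimum patch size are discarded. The function name find_patches is
   hypothetical; the default values are the documented defaults of
   -patchsize and -gapdistance.

```python
def find_patches(contact_positions, min_patch_size=5, max_gap=2):
    """Group contact positions into patches of sequence near-neighbours."""
    patches, current = [], []
    for position in sorted(contact_positions):
        # Assumption: gap = number of residues between two contact positions.
        if current and position - current[-1] - 1 > max_gap:
            if len(current) >= min_patch_size:
                patches.append(current)
            current = []
        current.append(position)
    if len(current) >= min_patch_size:
        patches.append(current)
    return patches


# Contact positions of ligand POP in 1cs4, taken from the example CON file.
pop_patches = find_patches([2, 3, 4, 5, 6, 7])
print(pop_patches)
```

   With the default parameters the six contiguous contact positions form
   a single patch. A site such as 2, 6, 7, 44, 45, 46 yields no patch
   under the defaults, because no run of near-neighbours reaches five
   positions.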
   Environment definitions. 
   See MATGEN3D documentation for environment definitions.
   Naming of output files. The naming convention of the signature
   (output) files is as follows:
   Ligand identifier.Site number.F or P.Patch number-Total patches.PDB
   identifier.Domain identifier.
     * The ligand identifier is a code of up to 3 characters (e.g. 101,
       HEM or MG) that uniquely identifies the ligand.
     * The site number uniquely identifies a ligand-binding site in the
       CON file generated by running SITES.
     * F signifies a 'Full-length' and P a 'Patch' signature.
     * Patch number is the sequential order (from the N-terminus) of the
       patch in the ligand-binding site.
     * Total patches is the total number of patches for this
       ligand-binding site. The patch number and total number of patches
       are only specified for a 'patch'; for a 'full-length' signature
       this field is written as '#' (e.g. 101.1.F.#.1cs4.d1cs4a_.sig).
     * PDB identifier code.
     * Domain identifier code. Only specified if the binding site is
       mapped to SCOP.

   For example,
   101.1.P.1-1.1cs4.d1cs4a_.sig
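
   The naming convention can be sketched as a small helper. This is an
   illustration based on the file names in the usage example, where a
   full-length signature carries '#' in the patch field and a missing
   domain identifier appears as '.'. The function name
   signature_filename is hypothetical, and the 'sig' extension is the one
   seen in the example (the actual extension is specified in the ACD
   file).

```python
def signature_filename(ligand, site, pdb, domain=".", patch=None,
                       total_patches=None, extension="sig"):
    """Build a signature file name from its dot-separated fields."""
    if patch is None:
        # Full-length signature: no patch numbering, '#' as in the example.
        mode, patch_field = "F", "#"
    else:
        mode, patch_field = "P", f"{patch}-{total_patches}"
    fields = [ligand, str(site), mode, patch_field, pdb, domain, extension]
    return ".".join(fields)


full_name = signature_filename("101", 1, "1cs4", "d1cs4a_")
patch_name = signature_filename("101", 1, "1cs4", "d1cs4a_",
                                patch=1, total_patches=1)
no_domain_name = signature_filename("HEM", 1, "2hhb")
print(full_name, patch_name, no_domain_name)
```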

11.0 RELATED APPLICATIONS

See also

   Program name                       Description
   contactcount Count specific versus non-specific contacts
   contacts     Generate intra-chain CON files from CCF files
   domainalign  Generate alignments (DAF file) for nodes in a DCF file
   domainrep    Reorder DCF file to identify representative structures
   domainreso   Remove low resolution domains from a DCF file
   interface    Generate inter-chain CON files from CCF files
   libgen       Generate discriminating elements from alignments
   matgen3d     Generate a 3D-1D scoring matrix from CCF files
   psiphi       Phi and psi torsion angles from protein coordinates
   rocon        Generates a hits file from comparing two DHF files
   rocplot      Performs ROC analysis on hits files
   scorecmapdir Contact scores for cleaned protein chain contact files
   seqalign     Extend alignments (DAF file) with sequences (DHF file)
   seqfraggle   Removes fragment sequences from DHF files
   seqsearch    Generate PSI-BLAST hits (DHF file) from a DAF file
   seqsort      Remove ambiguous classified sequences from DHF files
   seqwords     Generates DHF files from keyword search of UniProt
   siggen       Generates a sparse protein signature from an alignment
   sigscan      Generate hits (DHF file) from a signature search
   sigscanlig   Search ligand-signature library & write hits (LHF file)

12.0 DIAGNOSTIC ERROR MESSAGES

   None.

13.0 AUTHORS

   Jon Ison (jison@hgmp.mrc.ac.uk)
   HGMP-RC, Genome Campus, Hinxton, Cambridge CB10 1SB, UK
   Waqas Awan (mblades@rfcgr.mrc.ac.uk)
   MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

14.0 REFERENCES

   Please cite the authors and EMBOSS.
   Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European
   Molecular Biology Open Software Suite" Trends in Genetics,
   16(6):276-277.
   
   See also http://emboss.sourceforge.net/

