
                           sigscan documentation
                                      
   

CONTENTS

   1.0 SUMMARY 
   2.0 INPUTS & OUTPUTS 
   3.0 INPUT FILE FORMAT 
   4.0 OUTPUT FILE FORMAT 
   5.0 DATA FILES 
   6.0 USAGE 
   7.0 KNOWN BUGS & WARNINGS 
   8.0 NOTES 
   9.0 DESCRIPTION 
   10.0 ALGORITHM 
   11.0 RELATED APPLICATIONS 
   12.0 DIAGNOSTIC ERROR MESSAGES 
   13.0 AUTHORS 
   14.0 REFERENCES 

1.0 SUMMARY

   Generates a DHF (domain hits file) of hits (sequences) from scanning a
   signature against a sequence database.

2.0 INPUTS & OUTPUTS

   SIGSCAN reads a signature from a protein signature file, scans the
   signature against a protein sequence database and generates a DHF file
   (domain hits file) of hits to database sequences and a DAF file
   (domain alignment file) of corresponding signature-sequence
   alignments. The names of the signature file, DHF file and DAF file are
   provided by the user. The user specifies a maximum number of
   high-scoring hits that will be generated.

3.0 INPUT FILE FORMAT

   The format of the signature file is described in SIGGEN documentation.
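
   As an illustration only, the record layout visible in the example
   signature file in this section (two-letter record codes, 'XX' spacer
   lines, a closing '//' and 'value ; number' pairs on AA and GA lines)
   can be read with a short Python sketch. The function name
   parse_signature and the dictionary keys are hypothetical and are not
   part of SIGSCAN or SIGGEN. The meaning of each record is defined in
   the SIGGEN documentation, so the sketch stores the values without
   interpreting them.

```python
def parse_signature(text):
    """Split signature-file text into header records and per-position records.

    Assumption: the layout is the one shown in the example file
    (NN starts a position; IN, AA and GA lines belong to it).
    """
    header, positions, current = {}, [], None
    for line in text.splitlines():
        line = line.rstrip()
        if not line or line.startswith("XX"):
            continue                      # blank line or spacer record
        if line.startswith("//"):
            break                         # end of the signature
        code, _, value = line.partition(" ")
        value = value.strip()
        if code == "NN":
            # "[3]" -> 3; a new signature position begins here
            current = {"index": int(value.strip("[]")), "residues": [], "gaps": []}
            positions.append(current)
        elif code == "IN" and current is not None:
            # "NRES 1 ; NGAP 1 ; WSIZ 0" -> {"NRES": 1, "NGAP": 1, "WSIZ": 0}
            current["info"] = {k: int(v) for k, v in
                               (item.split() for item in value.split(";"))}
        elif code == "AA" and current is not None:
            residue, number = (p.strip() for p in value.split(";"))
            current["residues"].append((residue, int(number)))
        elif code == "GA" and current is not None:
            gap, number = (p.strip() for p in value.split(";"))
            current["gaps"].append((int(gap), int(number)))
        else:
            header[code] = value          # TY, CL, FO, SF, FA, SI, NP, ...
    return header, positions


sample = """TY   SCOP
XX
SI   54894
XX
NP   2
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 2
XX
GA   12 ; 2
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   1 ; 2
//
"""
header, positions = parse_signature(sample)
print(header["SI"], len(positions), positions[0]["residues"], positions[0]["gaps"])
```

   The sample string is a shortened two-position copy of the records in
   the example file. A real signature file can be passed in the same way
   after reading it into a string.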

  Input files for usage example

  File: ../siggen-keep/54894.sig

TY   SCOP
XX
TS   1D
XX
CL   Alpha and beta proteins (a+b)
XX
FO   Ferredoxin-like
XX
SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
XX
SI   54894
XX
NP   15
XX
NN   [1]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   H ; 2
XX
GA   12 ; 2
XX
NN   [2]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   1 ; 2
XX
NN   [3]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   26 ; 2
XX
NN   [4]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   F ; 2
XX
GA   16 ; 2
XX
NN   [5]
XX


  [Part of this file has been deleted for brevity]

XX
GA   4 ; 2
XX
NN   [10]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   D ; 2
XX
GA   2 ; 2
XX
NN   [11]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   N ; 2
XX
GA   0 ; 2
XX
NN   [12]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   Y ; 2
XX
GA   0 ; 2
XX
NN   [13]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   G ; 2
XX
GA   3 ; 2
XX
NN   [14]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   3 ; 2
XX
NN   [15]
XX
IN   NRES 1 ; NGAP 1 ; WSIZ 0
XX
AA   P ; 2
XX
GA   2 ; 2
//

  File: swsmall

> Q9WVI4
DDVTMLFSDIVGFTAICAQCTPMQVISMLNELYTRFDHQCGFLDIYKVETIGDAYCVASG
LHRKSLCHAKPIALMALKMMELSEEVLTPDGRPIQMRIGIHSGSVLAGVVGVRMPRYCLF
GNNVTLASKFESGSHPRRINISPTTYQLL
> Q9ERL9
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLH
RESDTHAVQIALMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGN
NVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> Q9DGG6
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAG
CPEPRADHAYCCIEMGLGMIKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVW
SNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKVTERVGQSAVADQLKGLKTYL
I
> Q99396
KELADPVTLIFTDIESSTAQWATQPELMPDAVATHHSMVRSLIENYDCYEVKTVGDSFMI
ACKSPFAAVQLAQELQLRFLRLDWGTTVFDEFYREFEERHAEEGDGKYKPPTARLDPEVY
RQLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGQTANTAARTESVGNGGQVLMTCETYHS
LSTAERSQFDVTPLGGVPLRGVSEPVEVYQLN
> Q99280
NDSAPKEPTGPVTLIFTDIESSTALWAAHPDLMPDAVATHHRLIRSLITRYECYEVKTVG
DSFMIASKSPFAAVQLAQELQLRFLRLDWETNALDESYREFEEQRAEGECEYTPPTAHMD
PEVYSRLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGRTSNMAARTESVANGGQVLMTHA
AYMSLSGEDRNQLDVTTLGATVLRGVPEPVRMYQLN
> Q99279
NNNRAPKEPTDPVTLIFTDIESSTALWAAHPDLMPDAVAAHHRMVRSLIGRYKCYEVKTV
GDSFMIASKSPFAAVQLAQELQLCFLHHDWGTNALDDSYREFEEQRAEGECEYTPPTAHM
DPEVYSRLWNGLRVRVGIHTGLCDIIRHDEVTKGYDYYGRTPNMAARTESVANGGQVLMT
HAAYMSLSAEDRKQIDVTALGDVALRGVSDPVKMYQLN
> Q91WF3
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDTQQDSERSCSHLGTMVEFAVALGSKLGVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEETARAL
> Q91WF3
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRVATGVDINMRVGVHSGSVLCGVIG
LQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q8VHH7
NNFMLRIGMNKGGVLAGVIGARKPHYDIWGNTVNVASRMESTGVMGNIQVVEET
> Q8VHH7
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKIL
GDCYYCICGLPDYREDHAVCSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLG
QKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCLKGEFDVEPGDGGSRCDYLDEKG
IETYLI
> Q8NFM4
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTY
MAATGLNATSGQDAQQDAERSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGP
VVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVTEET
> Q8NFM4
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKIL
GDCYYCVSGLPLSLPDHAINCVRMGLDMCRAIRKLRAATGVDINMRVGVHSGSVLCGVIG


  [Part of this file has been deleted for brevity]

> Q83IL8
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
EQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> Q7P144
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTE
EQANELALFAPKATVNVIDNFEVVKKHKLTLP
> Q7MZ14
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTE
QQANQLAMYAPNATVNCIENYEVVKKLPINLP
> Q7MX57
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEE
EELNRIALIAPNVRLNIIRDYEVVEKRQVEVP
> Q7MHF0
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINE
EQASKLALYAPHATVNQIEDYQVVKKLALELP
> Q58801
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKE
DVDKISLISPDVTINIIRNGKVVEKLKPQIP
> P96175
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITK
SQANQLALLAPNATVNIIENFKVTDKHSLALP
> P96111
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLP
DRYLSKKEIKKLSAISPNTTVNIIKNSTVVEKYRIKLP
> P77919
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFL
SEEEVNKIALVAPNATVNIIRDYKVVEKFKVEVP
> P74766
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEIS
DTEANLITLIAPTATINIVREYEVVKKTKLEVP
> P57451
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSD
EQINQLAIYAPHATVNYINEYNLVRKVFPTLP
> P19936
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTE
QQANQLAMYAPKATVNRIDNYEVVRKLTLSLP
> P08421
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTE
EQVNQLALYAPQATVNRIDNYDVVGKSRPSLP
> P00478
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSE
DQVDQLALYAPQATVNRIDNYEVVGKSRPSLP
> O58452
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFL
SEEEVNKIALVAPTATVNIIRNYKVVEKFKVEVP
> O30129
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIR
DEELNKIALISPNATINLIRDYEIERKFKVSPP
> O26938
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKP
SEVDQIALIAPRATINIVRDYKIVEKAKVRL

4.0 OUTPUT FILE FORMAT

   DHF file (domain hits file)
   The format of the DHF file (domain hits file) of hit sequences
   generated by SIGSCAN (Figure 1) is described fully in SEQSEARCH
   documentation and is only summarised here. The file contains two lines
   per hit. The first line is a description of the hit in 17 text tokens
   delimited by '^'. The second line contains the protein sequence. The
   first 4 tokens refer to the hit (sequence) itself:
     * (i) Accession number.
     * (ii) Database code.
     * (iii - iv) Start and end positions of the hit relative to the full
       length sequence.

   The next 9 tokens refer to the domain family, superfamily etc for
   which the signature was derived and are as follows:
     * (v) Type of domain (one of 'SCOP' or 'CATH').
     * (vi) SCOP or CATH domain identifier.
     * (vii) SCOP or CATH node unique identifier, e.g. SCOP Family Sunid.
     * (viii) Domain class. Textual description of the 'Class' (SCOP and
       CATH domains).
     * (ix) Domain architecture. Textual description of the
       'Architecture' (CATH only).
     * (x) Domain topology. Textual description of the 'Topology' (CATH
       only).
     * (xi) Domain fold. Textual description of the 'Fold' (SCOP domains
       only).
     * (xii) Domain superfamily. Textual description of the 'Superfamily'
       (SCOP and CATH domains).
     * (xiii) Domain family. Textual description of the 'Family' (SCOP
       only).

   The next 4 tokens refer to the hit, specifically to information about
   the search result, as follows:
     * (xiv) Model type. The type of model that was used to generate the
       hit. For DHF files generated by SIGSCAN a value of SPARSE
       (sparse protein signature) is given. Several other values are
       possible; see the SEQSEARCH documentation.
     * (xv) SC - Score of hit from search algorithm (not written by
       SIGSCAN).
     * (xvi) P-value of hit (not written by SIGSCAN).
     * (xvii) E-value of hit (not written by SIGSCAN).
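
   The token layout (i) to (xvii) listed above can be sketched in Python
   as follows. This is a minimal illustration, not part of SIGSCAN: the
   function name parse_dhf_header and the field names are hypothetical.
   It assumes that the description is a single unwrapped line (the line
   breaks inside descriptions in the example listing are taken to be
   display wrapping) and that a '.' token marks a value that was not
   set.

```python
# Hypothetical field names, one per token (i)-(xvii) in the order listed.
FIELDS = (
    "accession", "database_code", "start", "end",
    "domain_type", "domain_id", "node_id", "domain_class",
    "architecture", "topology", "fold", "superfamily", "family",
    "model_type", "score", "p_value", "e_value",
)


def parse_dhf_header(line):
    """Turn one '^'-delimited DHF description line into a dictionary."""
    tokens = line.strip().lstrip(">").strip().split("^")
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} tokens, found {len(tokens)}")
    # Assumption: '.' is a placeholder for an unset value.
    record = {name: (None if tok == "." else tok)
              for name, tok in zip(FIELDS, tokens)}
    # Convert the numeric tokens, leaving unset values as None.
    for name, cast in (("start", int), ("end", int),
                       ("score", float), ("p_value", float), ("e_value", float)):
        if record[name] is not None:
            record[name] = cast(record[name])
    return record


line = ("> P00478^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^"
        "Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, "
        "N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, "
        "N-terminal domain^SPARSE^4.67^0.000e+00^0.000e+00")
hit = parse_dhf_header(line)
print(hit["accession"], hit["start"], hit["end"], hit["model_type"])
```

   Raising an error on an unexpected token count is a design choice of
   the sketch: it makes wrapped or truncated description lines fail
   loudly rather than silently shifting fields.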

   DAF file (domain alignment file)
   The format of the DAF file (domain alignment file, Figure 2) generated
   by SIGSCAN is described fully in DOMAINALIGN documentation and is only
   summarised here.
   It conforms to EMBOSS "simple" multiple sequence alignment format and
   includes domain classification records (in comment lines beginning
   with '#') for the node for which the signature was generated. The
   classification records are TY (domain type, either SCOP or CATH), CL
   (class), FO (fold), SF (superfamily) and FA (family). For CATH
   domains, AR (architecture) and TP (topology) may also be given. A
   unique identifier for the node is given after SI.
   There are multiple blocks that contain the accession numbers,
   positions and aligned sequences. An accession number is given for each
   hit. The positions are the start and end residue positions of the
   appropriate section of sequence. The sequence uses '-' as a gap
   character. A 'SIGNATURE' line is given as a markup line underneath the
   sequence (signature positions are marked with a '*').
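
   The block structure described above can be read back with a short
   Python sketch. It is an illustration under stated assumptions, not
   the EMBOSS parser: the function name parse_daf is hypothetical,
   columns are assumed to be separated by whitespace, a '.' in the
   sequence column is assumed to be padding, and each accession is
   assumed to appear for one hit only (repeated accessions would be
   merged).

```python
def parse_daf(text):
    """Return {accession: (aligned_sequence, [0-based marked columns])}."""
    hits = {}      # accession -> [sequence so far, markup so far]
    last = None    # accession of the most recent sequence line
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                         # classification/comment lines
        tokens = line.split()
        if tokens[0] == "SIGNATURE":
            # Markup line: "SIGNATURE - <markup>", belongs to the line above.
            if last is not None and len(tokens) >= 3 and tokens[2] != ".":
                hits[last][1] += tokens[2]
            continue
        if len(tokens) < 3:
            continue                         # not a recognisable sequence line
        accession, chunk = tokens[0], tokens[2]
        entry = hits.setdefault(accession, ["", ""])
        if chunk != ".":                     # '.' is treated as empty padding
            entry[0] += chunk
        last = accession
    return {acc: (seq, [i for i, c in enumerate(markup) if c == "*"])
            for acc, (seq, markup) in hits.items()}


sample = """# TY   SCOP
# XX
A1        1      ACDEF 5
SIGNATURE -      -*--*
A1        6      GH    10
SIGNATURE -      *-
A1        11     .     15
SIGNATURE -      .
# XX
B2        1      AC-EF 5
SIGNATURE -      *----
"""
alignment = parse_daf(sample)
for accession, (sequence, marked) in alignment.items():
    print(accession, sequence, marked)
```

   The sample uses invented accessions (A1, B2) and short sequences so
   that the marked columns are easy to check by eye.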

  Output files for usage example

  File: SIGSCAN.dhf

> P00478^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.67^0.000e+00
^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> Q83IL8^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.40^0.000e+00
^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> Q8Z130^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.07^0.000e+00
^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> P08421^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^4.07^0.000e+00
^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q8K9H8^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.67^0.000e+00
^0.000e+00
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIE
KYNLVGKIFPSLP
> Q9HKM3^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^3.20^0.000e+00
^0.000e+00
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISII
KNYEISEKFQVELP
> Q9HHN3^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.87^0.000e+00
^0.000e+00
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIV
RDYEVDEKRRVDRP
> P57451^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.80^0.000e+00
^0.000e+00
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSDEQINQLAIYAPHATVNYIN
EYNLVRKVFPTLP
> Q8ZB38^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.73^0.000e+00
^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRID
NYEVVKKLTLSLP
> P19936^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.73^0.000e+00
^0.000e+00
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRID
NYEVVRKLTLSLP
> Q97B28^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.60^0.000e+00
^0.000e+00
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISII
KNYEISEKFKVELP
> Q87LF7^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.53^0.000e+00
^0.000e+00
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIE
NYEVVKKLALELP
> Q7MZ14^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.47^0.000e+00
^0.000e+00
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIE
NYEVVKKLPINLP
> Q8ZTG2^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.33^0.000e+00
^0.000e+00
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINII
RNFAVVKKFKVTPP
> Q9KP65^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.33^0.000e+00
^0.000e+00
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIE
NYEVVKKLALQLP
> O30129^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.27^0.000e+00
^0.000e+00
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIRDEELNKIALISPNATINLI
RDYEIERKFKVSPP
> Q8D1W6^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.27^0.000e+00
^0.000e+00
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIK
NYIVIKKQKLKLP
> Q7MHF0^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.20^0.000e+00
^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q8DCF7^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.20^0.000e+00
^0.000e+00
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q9UX07^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^2.07^0.000e+00
^0.000e+00
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINII
RDYVVTEKRHLEVP
> Q9K1K9^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00
^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
NFKVVQKRHLNLP
> O58452^.^12^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00
^0.000e+00
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IRNYKVVEKFKVEVP
> P74766^.^11^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.93^0.000e+00
^0.000e+00
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIV
REYEVVKKTKLEVP
> Q7P144^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.80^0.000e+00
^0.000e+00
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVID
NFEVVKKHKLTLP
> Q9JWY6^.^10^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^1.60^0.000e+00
^0.000e+00
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
HFKVVQKRHLNLP


  [Part of this file has been deleted for brevity]

VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGRLHACEV
ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALKIHLSSETKAVL
EEFGGFELEL
> Q891I9^.^9^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+00^
0.000e+00
ITSIKDGIVIDHIKSGYGIKIFNYLNLKNVEYSVALIMNVFSSKLGKKDIIKIANKEIDIDFTVLGLIDPTITINIIED
EKIKEKLNLELP
> P18293^.^38^119^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+0
0^0.000e+00
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGQLHAREV
ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALRIHLSSETKAVL
EEFDGFELEL
> O02740^.^32^113^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+0
0^0.000e+00
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGMRHAAEIANMSLD
ILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSHSTVTILRTLGEG
YEVE
> P51841^.^32^113^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+0
0^0.000e+00
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGSRHAAEIANMSLD
ILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSLSTVTILQNLSEG
YEVE
> Q8NFM4^.^49^130^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+0
0^0.000e+00
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDAQQDAE
RSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVT
EET
> P46197^.^38^119^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.07^0.000e+0
0^0.000e+00
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAIIDNFDVYKVETIGDAYMVVSGLPGRNGQRHAPEI
ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGQALKIHVSSTTKDAL
DELGCFQLEL
> P19686^.^65^146^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VQAKKFNEVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIA
LMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDC
PG
> P19687^.^66^147^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
AVQAKRFGNVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDRQCGELDVYKVETIGDAYCVAGGLHKESDTHAVQI
ALMALKMMELSHEVVSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKD
CPG
> O60503^.^60^141^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIK
AIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKV
IERLGQSVVADQLKGLKTYLI
> P97490^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKW
GHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEET
YLIL
> P51830^.^60^141^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEQTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIK
AIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGRV
IERLGQSVVADQLKGLKTYLI
> Q02108^.^65^146^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VQAKKFSNVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHKESDTHAVQIA
LMALKMMELSDEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDC
PG
> Q9DGG6^.^62^143^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGM
IKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDG
KVTERVGQSAVADQLKGLKTYLI
> Q9ERL9^.^57^138^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIALMALKMME
LSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> P30803^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+00^
0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRVLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHI
KALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQV
L
> P40145^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKW
GHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEET
YLIL
> O19179^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMALDIL
SAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILHALDEGFQ
TEV
> P51840^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMSLDIL
SAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRIL
> P40146^.^75^156^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
DAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEIIADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCEDKW
GHLCALADFSLALTESIQEINKHSFNNFELRIGISHGSVVAGVIGAKKPQYDIWGKTVNLASRMDSTGVSGRIQVPEET
YLIL
> P52785^.^30^111^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGAHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMSLDIL
SAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILRSLDQGFQ
ME
> P98999^.^62^143^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-lik
e^Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate
 carbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^0.00^0.000e+0
0^0.000e+00
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRPDHAYCCIEMGLGM
IEAIDQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEKTARYLD
> P30804^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00
^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQAGRSHI
TALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQV
L
> Q01341^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00
^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHI
TALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQV
L
> Q03343^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00
^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHI
TALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQV
L
> O95622^.^0^81^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^SPARSE^-0.07^0.000e+00
^0.000e+00
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHI
KALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQV
L

  File: SIGSCAN.aln

# DE   Results of signature search
# XX
# TY   SCOP
# XX
# CL   Alpha and beta proteins (a+b)
# XX
# FO   Ferredoxin-like
# XX
# SF   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# FA   Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain
# XX
# SI   54894
# XX
P00478    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P00478    54     ENTFLSEDQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*
P00478    107    .                                                     159
SIGNATURE -      .
P00478    160    .                                                     212
SIGNATURE -      .
P00478    213    .                                                     265
SIGNATURE -      .
# XX
Q83IL8    1      VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q83IL8    54     ENTFLSEEQVDQLALYAPQATVNRIDNYEVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*
Q83IL8    107    .                                                     159
SIGNATURE -      .
Q83IL8    160    .                                                     212
SIGNATURE -      .
Q83IL8    213    .                                                     265
SIGNATURE -      .
# XX
Q8Z130    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
Q8Z130    54     ENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106
SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*
Q8Z130    107    .                                                     159
SIGNATURE -      .
Q8Z130    160    .                                                     212
SIGNATURE -      .
Q8Z130    213    .                                                     265
SIGNATURE -      .
# XX
P08421    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53
SIGNATURE -      ----------*-*--------------------------*-------------
P08421    54     ENTFLTEEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106


  [Part of this file has been deleted for brevity]

SIGNATURE -      ---------*-*--------------------------*--------------
P98999    107    CGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEKTARYLD          159
SIGNATURE -      --*---*--*----*-*----*--***---*---*--*------
P98999    160    .                                                     212
SIGNATURE -      .
P98999    213    .                                                     265
SIGNATURE -      .
# XX
P30804    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
P30804    54     KTIGSTYMAASGLNASTYDQAGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
P30804    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
P30804    160    .                                                     212
SIGNATURE -      .
P30804    213    .                                                     265
SIGNATURE -      .
# XX
Q01341    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
Q01341    54     KTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
Q01341    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
Q01341    160    .                                                     212
SIGNATURE -      .
Q01341    213    .                                                     265
SIGNATURE -      .
# XX
Q03343    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
Q03343    54     KTIGSTYMAASGLNASTYDQVGRSHITALADYAMRLMEQMKHINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
Q03343    107    KIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQVL 159
SIGNATURE -      -----------------------------------------------------
Q03343    160    .                                                     212
SIGNATURE -      .
Q03343    213    .                                                     265
SIGNATURE -      .
# XX
O95622    1      VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKI 53
SIGNATURE -      *-*--------------------------*----------------*---*--
O95622    54     KTIGSTYMAASGLNDSTYDKVGKTHIKALADFAMKLMDQMKYINEHSFNNFQM 106
SIGNATURE -      *----*-*----*--***---*---*--*------------------------
O95622    107    KIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQVL 159
SIGNATURE -      -----------------------------------------------------
O95622    160    .                                                     212
SIGNATURE -      .
O95622    213    .                                                     265
SIGNATURE -      .
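
   The alignment listing above can be summarised per hit by counting the
   '*' marks on the SIGNATURE rows. The following is a minimal sketch,
   not part of SIGSCAN: the function name is hypothetical, and it assumes
   that every hit is written as alternating sequence and SIGNATURE rows
   exactly as shown in the listing, with '.' used on padding rows.

```python
from collections import defaultdict


def count_signature_marks(lines):
    """Return {accession: number of '*' marks} for an alignment listing."""
    counts = defaultdict(int)
    current = None  # accession of the most recent sequence row
    for line in lines:
        fields = line.split()
        if not fields or fields[0] == "#":
            continue  # blank lines and '# ...' annotation lines
        if fields[0] == "SIGNATURE":
            # Third field is the mark string ('.' on padding rows).
            if current is not None and len(fields) >= 3:
                counts[current] += fields[2].count("*")
        elif len(fields) == 4 and fields[1].isdigit():
            # Sequence row: accession, start, residues, end.
            current = fields[0]
    return dict(counts)


# Rows copied from the Q8Z130 block of the example output.
sample = [
    "Q8Z130    1      VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKI 53",
    "SIGNATURE -      ----------*-*--------------------------*-------------",
    "Q8Z130    54     ENTFLTDEQVNQLALYAPQATVNRIDNYDVVGKSRPSLP               106",
    "SIGNATURE -      ---*---*--*----*-*----*--***---*---*--*",
    "Q8Z130    107    .                                                     159",
    "SIGNATURE -      .",
    "# XX",
]
print(count_signature_marks(sample))  # {'Q8Z130': 15}
```

   For this hit the count equals the NP value (15) given in the example
   signature file, i.e. every signature position was placed.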

5.0 DATA FILES

   SIGSCAN requires a residue substitution matrix.

6.0 USAGE

   Standard (Mandatory) qualifiers:
  [-siginfile]         infile     This option specifies the name of the
                                  signature file (input). A 'signature file'
                                  contains a sparse sequence signature
                                  suitable for use with the SIGSCAN and
                                  SIGSCANLIG programs. The files are generated
                                  by using SIGGEN and SIGGENLIG.
  [-dbsequence]        seqall     This option specifies the name of the
                                  database to search.
   -sub                matrixf    This option specifies the residue
                                  substitution matrix.
   -gapo               float      This option specifies the gap insertion
                                  penalty. The gap insertion penalty is the
                                  score taken away when a gap is created. The
                                  best value depends on the choice of
                                  comparison matrix. The default value assumes
                                  you are using the EBLOSUM62 matrix for
                                  protein sequences, and the EDNAMAT matrix
                                  for nucleotide sequences.
   -gape               float      This option specifies the gap extension
                                  penalty. The gap extension penalty is added
                                  to the standard gap penalty for each base or
                                  residue in the gap. This is how long gaps
                                  are penalized. Usually you will expect a few
                                  long gaps rather than many short gaps, so
                                  the gap extension penalty should be lower
                                  than the gap penalty.
   -nterm              menu       This option specifies the N-terminal
                                  matching option. This determines how the
                                  first signature position is aligned to a
                                  sequence from the database.
   -nhits              integer    This option specifies the maximum number of
                                  hits to output.
  [-hitsfile]          outfile    This option specifies the name of the DHF
                                  file (domain hits file) (output). A 'domain
                                  hits file' contains database hits
                                  (sequences) with domain classification
                                  information, in the DHF format (FASTA-like).
                                  The hits are relatives of a SCOP or CATH
                                  family (or other node in the structural
                                  hierarchies) and are found from a search of
                                  a sequence database, in this case by using
                                  SIGSCAN. Files of hits retrieved by PSIBLAST
                                  are generated by using SEQSEARCH; files of
                                  hits retrieved by various types of HMM and
                                  profile are generated by using LIBSCAN.
  [-alignfile]         outfile    This option specifies the name of the SAF
                                  file (signature alignment file) (output). A
                                  'signature alignment file' contains one or
                                  more signature-sequence alignments. The
                                  file is in DAF format (CLUSTAL-like) and is
                                  annotated with bibliographic information,
                                  either the domain family classification (for
                                  SIGSCAN output) or ligand classification
                                  (for SIGSCANLIG output). The files generated
                                  by SIGSCAN will contain a
                                  signature-sequence alignment for a single
                                  signature against a library of one or more
                                  sequences. The files generated by using
                                  SIGSCANLIG will contain a signature-sequence
                                  alignment for a single query sequence
                                  against a library of one or more signatures.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-dbsequence" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-hitsfile" associated qualifiers
   -odirectory3        string     Output directory

   "-alignfile" associated qualifiers
   -odirectory4        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths
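
   The -gapo and -gape descriptions above amount to simple arithmetic. The
   sketch below takes them literally (opening penalty plus one extension
   penalty per residue in the gap) and uses the default values 10 and 0.5.
   The function name is hypothetical, and whether SIGSCAN charges the
   extension for the first residue of a gap is an assumption made here,
   not something this document states.

```python
def gap_cost(length, gapo=10.0, gape=0.5):
    """Penalty for one gap of `length` residues: opening plus per-residue extension."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gapo + gape * length


one_long = gap_cost(4)        # 10.0 + 0.5 * 4 = 12.0
four_short = 4 * gap_cost(1)  # 4 * (10.0 + 0.5) = 42.0
print(one_long, four_short)
```

   Under these assumptions a single gap of four residues costs far less
   than four separate one-residue gaps, which is why the extension
   penalty is normally set lower than the insertion penalty.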

  6.1 COMMAND LINE ARGUMENTS

   Standard (Mandatory) qualifiers

   [-siginfile] (Parameter 1)
      Description:    This option specifies the name of the signature
                      file (input). A 'signature file' contains a sparse
                      sequence signature suitable for use with the
                      SIGSCAN and SIGSCANLIG programs. The files are
                      generated by using SIGGEN and SIGGENLIG.
      Allowed values: Input file
      Default:        Required

   [-dbsequence] (Parameter 2)
      Description:    This option specifies the name of the database to
                      search.
      Allowed values: Readable sequence(s)
      Default:        Required

   -sub
      Description:    This option specifies the residue substitution
                      matrix.
      Allowed values: Comparison matrix file in EMBOSS data path
      Default:        EBLOSUM62

   -gapo
      Description:    This option specifies the gap insertion penalty.
                      The gap insertion penalty is the score taken away
                      when a gap is created. The best value depends on
                      the choice of comparison matrix. The default value
                      assumes you are using the EBLOSUM62 matrix for
                      protein sequences, and the EDNAMAT matrix for
                      nucleotide sequences.
      Allowed values: Floating point number from 1.0 to 100.0
      Default:        10.0 for any sequence

   -gape
      Description:    This option specifies the gap extension penalty.
                      The gap extension penalty is added to the standard
                      gap penalty for each base or residue in the gap.
                      This is how long gaps are penalized. Usually you
                      will expect a few long gaps rather than many short
                      gaps, so the gap extension penalty should be lower
                      than the gap penalty.
      Allowed values: Floating point number from 0.0 to 10.0
      Default:        0.5 for any sequence

   -nterm
      Description:    This option specifies the N-terminal matching
                      option. This determines how the first signature
                      position is aligned to a sequence from the
                      database.
      Allowed values: 1 (Align anywhere and allow only complete
                        signature-sequence fit)
                      2 (Align anywhere and allow partial
                        signature-sequence fit)
                      3 (Use empirical gaps only)
      Default:        1

   -nhits
      Description:    This option specifies the maximum number of hits to
                      output.
      Allowed values: Any integer value
      Default:        100

   [-hitsfile] (Parameter 3)
      Description:    This option specifies the name of the DHF file
                      (domain hits file) (output). A 'domain hits file'
                      contains database hits (sequences) with domain
                      classification information, in the DHF format
                      (FASTA-like). The hits are relatives of a SCOP or
                      CATH family (or other node in the structural
                      hierarchies) and are found from a search of a
                      sequence database, in this case by using SIGSCAN.
                      Files of hits retrieved by PSIBLAST are generated
                      by using SEQSEARCH; files of hits retrieved by
                      various types of HMM and profile are generated by
                      using LIBSCAN.
      Allowed values: Output file
      Default:        SIGSCAN.dhf

   [-alignfile] (Parameter 4)
      Description:    This option specifies the name of the SAF file
                      (signature alignment file) (output). A 'signature
                      alignment file' contains one or more
                      signature-sequence alignments. The file is in DAF
                      format (CLUSTAL-like) and is annotated with
                      bibliographic information, either the domain family
                      classification (for SIGSCAN output) or ligand
                      classification (for SIGSCANLIG output). The files
                      generated by SIGSCAN will contain a
                      signature-sequence alignment for a single signature
                      against a library of one or more sequences. The
                      files generated by using SIGSCANLIG will contain a
                      signature-sequence alignment for a single query
                      sequence against a library of one or more
                      signatures.
      Allowed values: Output file
      Default:        SIGSCAN.aln

   Additional (Optional) qualifiers
      (none)

   Advanced (Unprompted) qualifiers
      (none)
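
   The allowed values listed in this section can be checked before a run
   is started. The following is a minimal sketch with a hypothetical
   function name; it encodes only the ranges and menu choices documented
   here and does not call SIGSCAN itself.

```python
NTERM_OPTIONS = {
    1: "Align anywhere and allow only complete signature-sequence fit",
    2: "Align anywhere and allow partial signature-sequence fit",
    3: "Use empirical gaps only",
}


def check_options(gapo=10.0, gape=0.5, nterm=1, nhits=100):
    """Return a list of problems; an empty list means the values are in range."""
    problems = []
    if not 1.0 <= gapo <= 100.0:
        problems.append("-gapo must be between 1.0 and 100.0")
    if not 0.0 <= gape <= 10.0:
        problems.append("-gape must be between 0.0 and 10.0")
    if nterm not in NTERM_OPTIONS:
        problems.append("-nterm must be 1, 2 or 3")
    if not isinstance(nhits, int):
        problems.append("-nhits must be an integer")
    return problems


print(check_options())                   # defaults are all in range
print(check_options(gapo=0.5, nterm=4))  # two problems reported
```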

  6.2 EXAMPLE SESSION

   An example of interactive use of sigscan is shown below.


% sigscan 
Generate hits (DHF file) from a signature search.
Name of signature file (input): ../siggen-keep/54894.sig
Name of database to search.: swsmall
Residue substitution matrix [EBLOSUM62]: 
Gap insertion penalty [10]: 
Gap extension penalty [0.5]: 
N-terminal matching options
         1 : Align anywhere and allow only complete signature-sequence fit
         2 : Align anywhere and allow partial signature-sequence fit
         3 : Use empirical gaps only
Select number [1]: 
Max. number of hits to output [100]: 
Name of DHF file (domain hits file) (output) [SIGSCAN.dhf]: 
Name of SAF file (signature alignment file) for signature-sequence alignments (
output) help: 
"This option specifies the name of the SAF (signature alignment file) (output).
A 'signature alignment file' contains one or more signnature-sequence alignment
s. The file is in DAF format (CLUSTAL-like) and is annotated with bibliographic
 information, either the domain family classification (for SIGSCAN output) or l
igand classification (for SIGSCANLIG output). The files generated by SIGSCAN wi
ll contain a signature-sequence alignment for a single signature against a libr
ary of one or more sequences. The files generated by using SIGSCANLIG will cont
ain a signature-sequence alignment for a single query sequence against a librar
y of one or more signatures. [SIGSCAN.aln]: 


Signature file read ok
Signature compiled ok
Signature aligned to db ok
Hits file written ok
Alignments file written ok

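
   The same search can be run without prompts by giving every value as a
   qualifier and adding -auto. The sketch below only builds the argument
   list; the helper name is hypothetical, and the qualifiers used are the
   ones listed in section 6.0.

```python
def sigscan_command(siginfile, dbsequence, hitsfile, alignfile, **options):
    """Build a non-interactive sigscan argument list (does not run it)."""
    allowed = ("sub", "gapo", "gape", "nterm", "nhits")
    unknown = set(options) - set(allowed)
    if unknown:
        raise ValueError("unknown qualifier(s): " + ", ".join(sorted(unknown)))
    cmd = ["sigscan", "-siginfile", siginfile, "-dbsequence", dbsequence]
    for name in allowed:
        # Only pass qualifiers the caller set; others keep their defaults.
        if name in options:
            cmd += ["-" + name, str(options[name])]
    cmd += ["-hitsfile", hitsfile, "-alignfile", alignfile, "-auto"]
    return cmd


cmd = sigscan_command(
    "../siggen-keep/54894.sig", "swsmall", "SIGSCAN.dhf", "SIGSCAN.aln",
    nhits=50,
)
print(" ".join(cmd))
```

   The resulting list can be passed to subprocess.run(cmd, check=True) on
   a machine where sigscan and the swsmall test database are installed.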

7.0 KNOWN BUGS & WARNINGS

   None.

8.0 NOTES

   SIGSCAN does not generate p-values or E-values. DHF files of hits for
   which p-values or E-values are calculated may be generated by using
   LIBSCAN. LIBSCAN provides searches for sparse protein signatures as
   well as various types of hidden Markov models and other profiles.
   If a signature file is written by hand, it is essential that the gap
   data are listed in order of increasing gap size (see the SIGGEN
   documentation).
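
   The ordering requirement for hand-written signature files can be
   checked mechanically. The following is a minimal sketch with a
   hypothetical function name. It assumes that each signature position
   starts with an NN record and that the first number on each GA record
   is the gap size, as the example signature file suggests; the SIGGEN
   documentation remains the authority on the format. The sample records
   are made up for illustration.

```python
def unordered_gap_positions(lines):
    """Return signature positions whose GA records are not in increasing gap size."""
    bad = []
    position = None  # label of the current NN record, e.g. "2"
    previous = None  # gap size on the previous GA record of this position
    for line in lines:
        code, _, rest = line.strip().partition(" ")
        if code == "NN":
            position = rest.strip().strip("[]")
            previous = None
        elif code == "GA":
            size = int(rest.split(";")[0])
            if previous is not None and size <= previous and position not in bad:
                bad.append(position)
            previous = size
    return bad


# Hypothetical records: position 1 is ordered, position 2 is not.
sample = [
    "NN   [1]", "XX", "GA   1 ; 2", "XX", "GA   3 ; 1", "XX",
    "NN   [2]", "XX", "GA   5 ; 1", "XX", "GA   2 ; 4", "XX",
]
print(unordered_gap_positions(sample))  # ['2']
```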

  8.1 GLOSSARY OF FILE TYPES

   Domain hits file
      Format:      DHF format (FASTA-like).
      Description: Database hits (sequences) with domain classification
                   information. The hits are relatives of a SCOP or CATH
                   family (or other node in the structural hierarchies)
                   and are found from a search of a discriminating
                   element (e.g. a protein signature, hidden Markov
                   model, simple frequency matrix, Gribskov profile or
                   Henikoff profile) against a sequence database.
      Created by:  SEQSEARCH (hits retrieved by PSIBLAST). SIGSCAN (hits
                   retrieved by sparse protein signature). LIBSCAN (hits
                   retrieved by various types of HMM and profile).
      See also:    N.A.

   Domain alignment file
      Format:      DAF format (CLUSTAL-like).
      Description: Sequence alignment of domains belonging to the same
                   SCOP or CATH family (or other node in the structural
                   hierarchies). The file is annotated with domain family
                   classification information.
      Created by:  DOMAINALIGN (structure-based sequence alignment of
                   domains of known structure).
      See also:    DOMAINALIGN alignments can be extended with sequence
                   relatives (of unknown structure) to the family in
                   question by using SEQALIGN.

   Hits file
      Format:      Text file of classified hits.
      Description: A list of hits (e.g. from a prediction method) that
                   are classified and rank-ordered on the basis of score,
                   p-value, E-value etc.
      Created by:  ROCON and LIBSCAN (hits from searches of a
                   discriminating element (hidden Markov model, profile
                   or signature) against a sequence database).
      See also:    ROCPLOT is run on the files to perform Receiver
                   Operating Characteristic (ROC) analysis on the hits.

   Signature file
      Format:      SIG format.
      Description: Contains a sparse sequence signature suitable for use
                   with the SIGSCAN program.
      Created by:  SIGGEN, SIGGENLIG, LIBGEN.
      See also:    The files are generated by using SIGGEN.


9.0 DESCRIPTION

   See Blades et al., Ison et al. and Daniel et al. for a description of
   protein signatures and their application.

10.0 ALGORITHM

   The algorithm is based on the approach first described in Daniel et
   al. (1999), which was applied to the definition of protein families
   (Ison et al., 2000) and later to automatically generated signatures
   (Blades et al., 2005).

11.0 RELATED APPLICATIONS

See also

   Program name                       Description
   contactcount Count specific versus non-specific contacts
   contacts     Generate intra-chain CON files from CCF files
   domainalign  Generate alignments (DAF file) for nodes in a DCF file
   domainrep    Reorder DCF file to identify representative structures
   domainreso   Remove low resolution domains from a DCF file
   interface    Generate inter-chain CON files from CCF files
   libgen       Generate discriminating elements from alignments
   matgen3d     Generate a 3D-1D scoring matrix from CCF files
   psiphi       Phi and psi torsion angles from protein coordinates
   rocon        Generates a hits file from comparing two DHF files
   rocplot      Performs ROC analysis on hits files
   scorecmapdir Contact scores for cleaned protein chain contact files
   seqalign     Extend alignments (DAF file) with sequences (DHF file)
   seqfraggle   Removes fragment sequences from DHF files
   seqsearch    Generate PSI-BLAST hits (DHF file) from a DAF file
   seqsort      Remove ambiguous classified sequences from DHF files
   seqwords     Generates DHF files from keyword search of UniProt
   siggen       Generates a sparse protein signature from an alignment
   siggenlig    Generate ligand-binding signatures from a CON file
   sigscanlig   Search ligand-signature library & write hits (LHF file)

12.0 DIAGNOSTIC ERROR MESSAGES

   None.

13.0 AUTHORS

   Jon Ison (jison@rfcgr.mrc.ac.uk)
   MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

14.0 REFERENCES

   Please cite the authors and EMBOSS.

   Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European
   Molecular Biology Open Software Suite" Trends in Genetics, 16:276-277.

   See also http://emboss.sourceforge.net/

   Blades MJ, Ison JC, Ranasinghe R and Findlay JBC (2005) "Automatic
   generation and evaluation of sparse protein signatures for families of
   protein structural domains" Protein Science (accepted).

   Ison JC, Bleasby AJ, Blades MJ, Daniel SC, Parish JH and Findlay JBC
   (2000) "A key residues approach to the definition of protein families
   and analysis of sparse family signatures" PROTEINS: Structure,
   Function & Genetics, 40:330-341.

   Daniel SC, Parish JH, Ison JC, Blades MJ and Findlay JBC (1999)
   "Alignment of a sparse protein signature with protein sequences:
   application to fold prediction for three small globulins" FEBS
   Letters, 459:349-352.

  14.1 Other useful references
