
                            SEQNR documentation
                                      
   

CONTENTS

   1.0 SUMMARY 
   2.0 INPUTS & OUTPUTS 
   3.0 INPUT FILE FORMAT 
   4.0 OUTPUT FILE FORMAT 
   5.0 DATA FILES 
   6.0 USAGE 
   7.0 KNOWN BUGS & WARNINGS 
   8.0 NOTES 
   9.0 DESCRIPTION 
   10.0 ALGORITHM 
   11.0 RELATED APPLICATIONS 
   12.0 DIAGNOSTIC ERROR MESSAGES 
   13.0 AUTHORS 
   14.0 REFERENCES 

1.0 SUMMARY

   Removes redundancy from DHF files

2.0 INPUTS & OUTPUTS

   SEQNR removes redundancy from DHF files (domain hits files) or other
   files of sequences. A directory of DHF files (all sequences) is read
   and a directory of new DHF files (non-redundant sequences) plus
   (optionally) a second directory of DHF files (redundant sequences) is
   written. Optionally, up to two further directories of filter sequences
   may be read: these are considered in the redundancy calculation but
   never appear in the output files. Typically, one of the further
   directories contains DHF files each with a single sequence and the
   other DAF files (domain alignment files) each containing a sequence
   alignment, but any sequence(s) may be given. Each filter directory
   must contain a file for each file in the main input directory, and
   the files must have the same base name. For example, sequences from
   "family.dhf" and "family.daf" are considered for the input DHF file
   "family.hits".
   Redundancy is removed at either (i) a single user-defined threshold
   of sequence similarity or (ii) a user-defined range of sequence
   similarity (lower and upper thresholds). Files of sequences in any
   supported format may be read and written (not just DHF or DAF files).
   A log file is also written.
   The paths of all files (input and output) are specified by the user.
   The file extensions are set in the ACD file. The name of the log file
   is set by the user.
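
   The base-name rule for filter directories can be illustrated with a
   short sketch. This is not SEQNR's own code: the function name
   match_filter_files and its arguments are hypothetical, and it only
   assumes that "base name" means the file name without its extension,
   as in the "family" example above.

```python
from pathlib import PurePath


def match_filter_files(main_files, filter_files):
    """Pair each main input file with the filter file sharing its base name.

    Hypothetical helper for illustration only. Both arguments are lists of
    file names; the base name is taken to be the name without its extension.
    """
    # Index the filter files by base name, e.g. "family.dhf" -> "family".
    by_stem = {PurePath(name).stem: name for name in filter_files}
    matched = {}
    for name in main_files:
        stem = PurePath(name).stem
        if stem not in by_stem:
            # Every main input file needs a counterpart in the filter directory.
            raise KeyError(f"no filter file with base name {stem!r}")
        matched[name] = by_stem[stem]
    return matched
```

   With this sketch, the input file "family.hits" is paired with the
   filter file "family.dhf", and a main input file with no counterpart
   in the filter directory is reported as an error.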

3.0 INPUT FILE FORMAT

   The format of the domain hits file is described in the SEQSEARCH
   documentation.
   The format of the domain alignment file is described in the
   DOMAINALIGN documentation.
   If other sequences or sequence sets (aligned or unaligned) are used as
   input, all of the common file formats are supported.

4.0 OUTPUT FILE FORMAT

   The format of the domain hits file is described in the SEQSEARCH
   documentation.
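
   As the listings in this section show, each DHF record starts with a
   header line beginning with ">" whose fields are separated by "^",
   followed by the sequence. The sketch below splits one such header
   into its fields. It is a reading aid, not part of SEQNR: the function
   name split_dhf_header is hypothetical, the header is assumed to be
   unwrapped (it is folded across several lines in this document), and
   the meaning of each field should be taken from the SEQSEARCH
   documentation. Treating the third and fourth fields as start and end
   positions is an assumption based on the sample records.

```python
def split_dhf_header(header):
    """Split one unwrapped DHF header line into its '^'-separated fields."""
    if not header.startswith(">"):
        raise ValueError("a DHF header line is expected to start with '>'")
    # Drop the leading '>' and trim whitespace around each field.
    return [field.strip() for field in header[1:].split("^")]


# First record of hitsnr/54894.dhf, with the display line breaks removed.
header = (
    "> Q9YBD5^.^1^95^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^"
    "Ferredoxin-like^Aspartate carbamoyltransferase, Regulatory-chain, "
    "N-terminal domain^Aspartate carbamoyltransferase, Regulatory-chain, "
    "N-terminal domain^.^55.30^0.000e+00^2.000e-11"
)
fields = split_dhf_header(header)

# Assumption: fields 3 and 4 are the start and end of the hit.
start, end = int(fields[2]), int(fields[3])
hit_length = end - start + 1
```

   For this record the computed length matches the number of residues
   in the sequence that follows the header in the listing.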

  Output files for usage example

  File: seqnr.log

//
/ebi/services/idata/pmr/hgmp/test/qa/seqfraggle-keep/55074.dhf
//
/ebi/services/idata/pmr/hgmp/test/qa/seqfraggle-keep/54894.dhf

  Directory: hitsnr

   This directory contains output files, for example 54894.dhf and
   55074.dhf.

  File: hitsnr/54894.dhf

> Q9YBD5^.^1^95^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^55.30^0.000e+00^2.00
0e-11
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q97FS4^.^1^90^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^42.60^0.000e+00^1.00
0e-07
INSIKNGIVIDHIKAGHGIKIYNYLKLGEAEFPTALIMNAISKKNKAKDIIKIENVMDLDLAVLGFLDPNITVNIIEDE
KIRQKIQLKLP
> Q7MX57^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^72.70^0.000e+00^1.00
0e-16
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEEEELNRIALIAPNVRLNIIR
DYEVVEKRQVEVP
> P96111^.^1^98^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^42.20^0.000e+00^1.00
0e-07
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLPDRYLSKKEIKKLSAISPNT
TVNIIKNSTVVEKYRIKLP

  File: hitsnr/55074.dhf

> Q08462^.^1^167^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^49.70^0.000e+00^3.000e-09
DCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEIIADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAVPSQEHSQE
PERQYMHIGTMVEFAFALVGKLDAINKHSFNDFKLRVGINHGPVIAGVIGAQKPQYDIWGNTVNVASRMDSTGVLDKIQ
VTEETSLVL
> Q03101^.^1^149^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^70.90^0.000e+00^1.000e-15
NNACVFFLDIAGFTRFSSIHSPEQVIQVLIKIFNSMDLLCAKHGIEKIKTIGDAYMATCGIFPKCDDIRHNTYKMLGFA
MDVLEFIPKEMSFHLGLQVRVGIHCGPVISGVISGYAKPHFDVWGDTVNVASRMESTGIAGQIHVSDRVY
> Q02153^.^1^165^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^75.90^0.000e+00^4.000e-17
HKRPVPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLYTRFDTLTDSRKNPFVYKVETVGDKYMTVSGLP
EPCIHHARSICHLALDMMEIAGQVQVDGESVQITIGIHTGEVVTGVIGQRMPRYCLFGNTVNLTSRTETTGEKGKINVS
EYTYRCL
> P46197^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.40^0.000e+00^1.000e-19
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAIIDNFDVYKVETIGDAYMVVSGLPGRNGQRHAPEI
ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGQALKIHVSSTTKDAL
DELGCFQLEL
> P40137^.^1^139^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^50.90^0.000e+00^1.000e-09
VTLLFADIRDFTSLSERLRPEQVVTLLNEYYGRMVEVVFRHGGTLDKFIGDALMVYFGAPIADPAHARRGVQCALDMVQ
ELETVNALRSARGEPCLRIGVGVHTGPAVLGNIGSATRRLEYTAIGDTVNLASRIESLTK
> P23466^.^1^154^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^55.90^0.000e+00^4.000e-11
PTGNVAIVFTDIKNSTFLWELFPDAMRAAIKTHNDIMRRQLRIYGGYEVKTEGDAFMVAFPTPTSALVWCLSVQLKLLE
AEWPEEITSIQDGCLITDNSGTKVYLGLSVRMGVHWGCPVPEIDLVTQRMDYLGPVVNKAARVSGVADGGQITLS
> O30820^.^1^149^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^80.20^0.000e+00^2.000e-18
DEASVLFADIVGFTERASSTAPADLVRFLDRLYSAFDELVDQHGLEKIKVSGDSYMVVSGVPRPRPDHTQALADFALDM
TNVAAQLKDPRGNPVPLRVGLATGPVVAGVVGSRRFFYDVWGDAVNVASRMESTDSVGQIQVPDEVYERL

  Directory: hitsred

   This directory contains output files, for example 54894.dhf and
   55074.dhf.

  File: hitsred/54894.dhf

> Q9UX07^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^59.20^0.000e+00^1.00
0e-12
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINII
RDYVVTEKRHLEVP
> Q9KP65^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^119.00^0.000e+00^9.0
00e-31
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIE
NYEVVKKLALQLP
> Q9K1K9^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^91.90^0.000e+00^2.00
0e-22
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
NFKVVQKRHLNLP
> Q9JWY6^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^90.40^0.000e+00^5.00
0e-22
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
HFKVVQKRHLNLP
> Q9HKM3^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^71.90^0.000e+00^2.00
0e-16
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISII
KNYEISEKFQVELP
> Q9HHN3^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^70.40^0.000e+00^4.00
0e-16
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIV
RDYEVDEKRRVDRP
> Q97B28^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^71.90^0.000e+00^2.00
0e-16
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISII
KNYEISEKFKVELP
> Q970X3^.^1^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^71.10^0.000e+00^3.00
0e-16
VSKIKNGTVIDHIPAGRALAVLRILKIAEGYRIALVMNVESKKMGKKDIVKIENKEVDEKEANLITLIAPTATINIIRD
YEVVEKKKLKIP
> Q8ZTG2^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^58.00^0.000e+00^3.00
0e-12
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINII
RNFAVVKKFKVTPP
> Q8ZB38^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^143.00^0.000e+00^3.0
00e-38
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRID
NYEVVKKLTLSLP
> Q8Z130^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^167.00^0.000e+00^0.0
00e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q8U374^.^1^94^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^82.70^0.000e+00^1.00
0e-19
VSAIKEGTVIDHIPAGKGLKVIQILGLGELKNGGAVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IREYKVVEKFKVEIP
> Q8TVB1^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^65.40^0.000e+00^2.00
0e-14
VKRIEMGTVLDHLPPGTAPQIMRILDIDPTETTLLVAINVESSKMGRKDILKIEGKILSEEEANKVALVAPNATVNIVR
DYSVAEKFQVKPP
> Q8THL3^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^66.50^0.000e+00^7.00
0e-15
IQAIENGTVIDHITAGQALNVLRILRISSAFRATVSFVMNAPGARGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFEVVQKNKVVLP
> Q8PXK6^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^60.70^0.000e+00^4.00
0e-13
VQAIESGTVIDHIKSGQALNVLRILGISSAFRATISFVMNAPGAGGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFVVVQKNNVVLP
> Q8K9H8^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^135.00^0.000e+00^1.0
00e-35
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIE
KYNLVGKIFPSLP
> Q8DCF7^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^116.00^0.000e+00^5.0
00e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q8D1W6^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^113.00^0.000e+00^5.0
00e-29
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIK
NYIVIKKQKLKLP
> Q8A9S4^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^57.30^0.000e+00^4.00
0e-12
VAALKNGTVIDHIPSEKLFTVVQLLGVEQMKCNITIGFNLDSKKLGKKGIIKIADKFFCDEEINRISVVAPYVKLNIIR
DYEVVEKKEVRMP
> Q891I9^.^1^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^46.10^0.000e+00^1.00
0e-08
ITSIKDGIVIDHIKSGYGIKIFNYLNLKNVEYSVALIMNVFSSKLGKKDIIKIANKEIDIDFTVLGLIDPTITINIIED
EKIKEKLNLELP
> Q87LF7^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^121.00^0.000e+00^2.0
00e-31
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIE
NYEVVKKLALELP
> Q83IL8^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^174.00^0.000e+00^0.0
00e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> Q7P144^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^117.00^0.000e+00^3.0
00e-30
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVID
NFEVVKKHKLTLP
> Q7MZ14^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^139.00^0.000e+00^6.0
00e-37
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIE
NYEVVKKLPINLP
> Q7MHF0^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^116.00^0.000e+00^5.0
00e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q58801^.^1^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^53.40^0.000e+00^6.00
0e-11
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKEDVDKISLISPDVTINIIRN
GKVVEKLKPQIP
> P96175^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^98.10^0.000e+00^2.00
0e-24
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIE
NFKVTDKHSLALP
> P77919^.^1^94^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^83.80^0.000e+00^4.00
0e-20
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFLSEEEVNKIALVAPNATVNI
IRDYKVVEKFKVEVP
> P74766^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^67.30^0.000e+00^4.00
0e-15
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIV
REYEVVKKTKLEVP
> P57451^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^131.00^0.000e+00^2.0
00e-34
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSDEQINQLAIYAPHATVNYIN
EYNLVRKVFPTLP
> P19936^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^147.00^0.000e+00^0.0
00e+00
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRID
NYEVVRKLTLSLP
> P08421^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^168.00^0.000e+00^0.0
00e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> P00478^.^1^92^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^175.00^0.000e+00^0.0
00e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> O58452^.^1^94^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^85.40^0.000e+00^2.00
0e-20
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IRNYKVVEKFKVEVP
> O30129^.^1^93^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^70.40^0.000e+00^5.00
0e-16
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIRDEELNKIALISPNATINLI
RDYEIERKFKVSPP
> O26938^.^1^91^SCOP^.^54894^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like^
Aspartate carbamoyltransferase, Regulatory-chain, N-terminal domain^Aspartate c
arbamoyltransferase, Regulatory-chain, N-terminal domain^.^73.80^0.000e+00^4.00
0e-17
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKPSEVDQIALIAPRATINIVR
DYKIVEKAKVRL

  File: hitsred/55074.dhf

> Q9WVI4^.^1^149^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.40^0.000e+00^1.000e-19
DDVTMLFSDIVGFTAICAQCTPMQVISMLNELYTRFDHQCGFLDIYKVETIGDAYCVASGLHRKSLCHAKPIALMALKM
MELSEEVLTPDGRPIQMRIGIHSGSVLAGVVGVRMPRYCLFGNNVTLASKFESGSHPRRINISPTTYQLL
> Q9ERL9^.^1^152^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^72.10^0.000e+00^5.000e-16
VTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIALMALKMME
LSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDCPG
> Q9DGG6^.^1^181^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^142.00^0.000e+00^4.000e-37
EQVSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEDTKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGM
IKAIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDG
KVTERVGQSAVADQLKGLKTYLI
> Q99396^.^1^212^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^192.00^0.000e+00^0.000e+00
KELADPVTLIFTDIESSTAQWATQPELMPDAVATHHSMVRSLIENYDCYEVKTVGDSFMIACKSPFAAVQLAQELQLRF
LRLDWGTTVFDEFYREFEERHAEEGDGKYKPPTARLDPEVYRQLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGQTANT
AARTESVGNGGQVLMTCETYHSLSTAERSQFDVTPLGGVPLRGVSEPVEVYQLN
> Q99280^.^1^216^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^218.00^0.000e+00^0.000e+00
NDSAPKEPTGPVTLIFTDIESSTALWAAHPDLMPDAVATHHRLIRSLITRYECYEVKTVGDSFMIASKSPFAAVQLAQE
LQLRFLRLDWETNALDESYREFEEQRAEGECEYTPPTAHMDPEVYSRLWNGLRVRVGIHTGLCDIRYDEVTKGYDYYGR
TSNMAARTESVANGGQVLMTHAAYMSLSGEDRNQLDVTTLGATVLRGVPEPVRMYQLN
> Q99279^.^1^218^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^258.00^0.000e+00^0.000e+00
NNNRAPKEPTDPVTLIFTDIESSTALWAAHPDLMPDAVAAHHRMVRSLIGRYKCYEVKTVGDSFMIASKSPFAAVQLAQ
ELQLCFLHHDWGTNALDDSYREFEEQRAEGECEYTPPTAHMDPEVYSRLWNGLRVRVGIHTGLCDIIRHDEVTKGYDYY
GRTPNMAARTESVANGGQVLMTHAAYMSLSAEDRKQIDVTALGDVALRGVSDPVKMYQLN
> Q91WF3^.^1^165^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^54.30^0.000e+00^1.000e-10
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDTQQDSE
RSCSHLGTMVEFAVALGSKLGVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVT
EETARAL
> Q91WF3^.^1^158^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^160.00^0.000e+00^0.000e+00
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKILGDCYYCVSGLPLSLPDHAI
NCVRMGLDMCRAIRKLRVATGVDINMRVGVHSGSVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q8VHH7^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^178.00^0.000e+00^0.000e+00
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKILGDCYYCICGLPDYREDHAV
CSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLGQKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCL
KGEFDVEPGDGGSRCDYLDEKGIETYLI
> Q8NFM4^.^1^161^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^54.70^0.000e+00^9.000e-11
VCVLFASVPDFKEFYSESNINHEGLECLRLLNEIIADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATSGQDAQQDAE
RSCSHLGTMVEFAVALGSKLDVINKHSFNNFRLRVGLNHGPVVAGVIGAQKPQYDIWGNTVNVASRMESTGVLGKIQVT
EET
> Q8NFM4^.^1^158^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^160.00^0.000e+00^0.000e+00
FHSLYVKRHQGVSVLYADIVGFTRLASECSPKELVLMLNELFGKFDQIAKEHECMRIKILGDCYYCVSGLPLSLPDHAI
NCVRMGLDMCRAIRKLRAATGVDINMRVGVHSGSVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHITGATLALL
> Q29450^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^179.00^0.000e+00^0.000e+00
FHNLYVKRHQNVSILYADIVGFTRLASDCSPKELVVVLNELFGKFDQIAKANECMRIKILGDCYYCVSGLPVSLPNHAR
NCVKMGLDMCEAIKQVREATGVDISMRVGIHSGNVLCGVIGLRKWQYDVWSHDVSLANRMEAAGVPGRVHITEATLKHL
DKAYEVEDGHGQQRDPYLKEMNIRTYLV
> Q27675^.^1^217^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^167.00^0.000e+00^0.000e+00
NNDAAPKDGDEPVTLLFTDIESSTALWAALPQLMSDAIAAHHRVIRQLVKKYGCYEVKTIGDSFMIACRSAHSAVSLAC
EIQTKLLKHDWGTEALDRAYREFELARVDTLDDYEPPTARLSEEEYAALWCGLRVRVGIHTGLTDIRYDEVTKGYDYYG
DTSNMAARTEAVANGGQVVATEAAWWALSNDERAGIAHTAMGPQGLRGVPFAVEMFQLN
> Q26896^.^1^216^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^195.00^0.000e+00^0.000e+00
NDSAPKEFTDPVTLIFTDIESSTALWAAHPGMMADAVATHHRLIRSLIALYGAYEVKTVGDSFMIACRSAFAAVELARD
LQLTLVHHDWGTVAIDESYRKFEEERAVEDSDYAPPTARLDSAVYCKLWNGLRVRAGIHTGLCDIAHDEVTKGYDYYGR
TPNLAARTESAANGGQVLVTGATYYSLSVAERARLDATPIGPVPLRGVPEPVEMYQLN
> Q26721^.^1^206^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^223.00^0.000e+00^0.000e+00
PVTLIFTDIESSTALWAAHPEVMPDAVATHHRLIRTLISKYECYEVKTVGDSFMIASKSPFAAVQLAQELQLCFLHHDW
GTNAIDESYQQFEQQRAEDDSDYTPPTARLDPKVYSRLWNGLRVRVGIHTGLCDIRRDEVTKGYDYYGRTSNMAARTES
VANGGQVLMTHAAYMSLSAEERQQIDVTALGDVPLRGVPKPVEMYRLN
> Q25263^.^1^217^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^166.00^0.000e+00^0.000e+00
NNDAAPKDGDEPVTLLFTDIESSTALWAALPQLMSDAIAAHHRVIRQLVKKYGCYEVKTIGDSFMIACRSAHSAVSLAC
EIQTKLLKHDWGTEALDRAYREFELARVDTLDDYEPPTARLSEEEYAALWCGLRVRVGIHTGLTDIRYDEVTRGYDYYG
DTSNMAARTEAVANGGQVVATEAAWWALSNDERAGIAHTAMGPQGLRGVPFAVEMFQLN
> Q09435^.^1^161^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^79.80^0.000e+00^2.000e-18
DSVTVFFSDVVKFTILASKCSPFQTVNLLNDLYSNFDTIIEQHGVYKVESIGDGYLCVSGLPTRNGYAHIKQIVDMSLK
FMEYCKSFNIPHLPRENVELRIGVNSGPCVAGVVGLSMPRYCLFGDTVNTASRMESNGKPSLIHLTNDAHSLLTTHYPN
QYE
> Q08828^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^217.00^0.000e+00^0.000e+00
FHKIYIQRHDNVSILFADIVGFTGLASQCTAQELVKLLNELFGKFDELATENHCRRIKILGDCYYCVSGLTQPKTDHAH
CCVEMGLDMIDTITSVAEATEVDLNMRVGLHTGRVLCGVLGLRKWQYDVWSNDVTLANVMEAAGLPGKVHITKTTLACL
NGDYEVEPGYGHERNSFLKTHNIETFFI
> Q08462^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^180.00^0.000e+00^0.000e+00
FHNLYVKRHTNVSILYADIVGFTRLASDCSPGELVHMLNELFGKFDQIAKENECMRIKILGDCYYCVSGLPISLPNHAK
NCVKMGLDMCEAIKKVRDATGVDINMRVGVHSGNVLCGVIGLQKWQYDVWSHDVTLANHMEAGGVPGRVHISSVTLEHL
NGAYKVEEGDGDIRDPYLKQHLVKTYFV
> Q07553^.^1^152^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^82.50^0.000e+00^4.000e-19
DCVTILFSDIVGFTELCTTSTPFEVVEMLNDWYTCCDSIISNYDVYKVETIGDAYMVVSGLPLQNGSRHAGEIASLALH
LLETVGNLKIRHKPTETVQLRIGVHSGPCAAGVVGQKMPRYCLFGDTVNTASRMESTGDSMRIHISEATYQLL
> Q07093^.^1^158^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^65.50^0.000e+00^5.000e-14
VTILFSDIVGFTSICSRATPFMVISMLEGLYKDFDEFCDFFDVYKVETIGDAYCVASGLHRASIYDAHRCLDGLKMIDA
CSKHITHDGEQIKMRIGLHTGTVLAGVVGRKMPRYCLFGHSVTIANKFESGSEALKINVSPTTKDWLTKHEGFEFELQP
> Q04400^.^1^189^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^299.00^0.000e+00^0.000e+00
MMFHKIYIQKHDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENHCLRIKILGDCYYCVSGLPEARADH
AHCCVEMGMDMIEAISSVREVTGVNVNMRVGIHSGRVHCGVLGLRKWQFDVWSNDVTLANHMEAGGKAGRIHITKATLN
YLNGDYEVEPGCGGERNAYLKEHSIETFLIL
> Q04400^.^1^159^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^55.10^0.000e+00^7.000e-11
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKAGKTHI
KALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQV
L
> Q03343^.^1^189^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^285.00^0.000e+00^0.000e+00
MMFHKIYIQKHDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENHCLRIKILGDCYYCVSGLPEARADH
AHCCVEMGVDMIEAISLVREVTGVNVNMRVGIHSGRVHCGVLGLRKWQFDVWSNDVTLANHMEAGGRAGRIHITRATLQ
YLNGDYEVEPGRGGERNGYLKEQCIETFLIL
> Q03343^.^1^159^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^55.90^0.000e+00^3.000e-11
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHI
TALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQV
L


  [Part of this file has been deleted for brevity]

DCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEIIADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAIPSQEHAQE
PERQYMHIGTMVEFAYALVGKLDAINKHSFNDFKLRVGINHGPVIAGVIGAQKPQYDIWGNTVNVASRMDSTGVLDKIQ
VTEET
> P26338^.^1^216^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^233.00^0.000e+00^0.000e+00
NNLAPKELTDPVTLIFTDIESSTALWAAHPELMPDAVATHHRLIRSLIGRYGCYEVKTVGDSFMIASKSPFAAVQLAQE
LQLCFLHHDWGTNAIDESYQQLEQQRAEEDAKYTPPTARLDLKVYSRLWNGLRVRVGIHTGLCDIRRDEVTKGYDYYGR
TSNMAARTESVGNGGQVLMTTAAYMSLSAEEREQIDVTALGDVPLRGVAKPVEMYQLN
> P25092^.^1^150^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^71.70^0.000e+00^6.000e-16
VTIYFSDIVGFTTICKYSTPMEVVDMLNDIYKSFDHIVDHHDVYKVETIGDAYMVASGLPKRNGNRHAIDIAKMALEIL
SFMGTFELEHLPGLPIWIRIGVHSGPCAAGVVGIKMPRYCLFGDTVNTASRMESTGLPLRIHVSGSTIAIL
> P23897^.^1^150^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^71.70^0.000e+00^8.000e-16
VTIYFSDIVGFTTICKYSTPMEVVDMLNDIYKSFDQIVDHHDVYKVETIGDAYVVASGLPMRNGNRHAVDISKMALDIL
SFMGTFELEHLPGLPVWIRIGVHSGPCAAGVVGIKMPRYCLFGDTVNTASRMESTGLPLRIHMSSSTIAIL
> P22717^.^1^147^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^63.20^0.000e+00^3.000e-13
TILFSDVVTFTNICAACEPIQIVNMLNSMYSKFDRLTSVHDVYKVETIGDAYMVVGGVPVPVESHAQRVANFALGMRIS
AKEVMNPVTGEPIQIRVGIHTGPVLAGVVGDKMPRYCLFGDTVNTASRMESHGLPSKVHLSPTAHRAL
> P21932^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^178.00^0.000e+00^0.000e+00
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKILGDCYYCICGLPDYREDHAV
CSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLGQKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCL
KGEFDVEPGDGGSRCDYLDEKGIETYLI
> P20595^.^1^165^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^75.90^0.000e+00^4.000e-17
HKRPVPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLYTRFDTLTDSRKNPFVYKVETVGDKYMTVSGLP
EPCIHHARSICHLALDMMEIAGQVQVDGESVQITIGIHTGEVVTGVIGQRMPRYCLFGNTVNLTSRTETTGEKGKINVS
EYTYRCL
> P20594^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.40^0.000e+00^1.000e-19
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAIIDNFDVYKVETIGDAYMVVSGLPGRNGQRHAPEI
ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGQALKIHVSSTTKDAL
DELGCFQLEL
> P19754^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^217.00^0.000e+00^0.000e+00
FHKIYIQRHDNVSILFADIVGFTGLASQCTAQELVKLLNELFGKFDELATENHCRRIKILGDCYYCVSGLTQPKTDHAH
CCVEMGLDMIDTITSVAEATEVDLNMRVGLHTGRVLCGVLGLRKWQYDVWSNDVTLANVMEAAGLPGKVHITKTTLACL
NGDYEVEPGHGHERNSFLKTHNIETFFI
> P19687^.^1^161^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^77.10^0.000e+00^1.000e-17
AVQAKRFGNVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDRQCGELDVYKVETIGDAYCVAGGLHKESDTHAVQI
ALMALKMMELSHEVVSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKD
CPG
> P19686^.^1^160^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^72.50^0.000e+00^4.000e-16
VQAKKFNEVTMLFSDIVGFTAICSQCSPLQVITMLNALYTRFDQQCGELDVYKVETIGDAYCVAGGLHRESDTHAVQIA
LMALKMMELSNEVMSPHGEPIKMRIGLHSGSVFAGVVGVKMPRYCLFGNNVTLANKFESCSVPRKINVSPTTYRLLKDC
PG
> P18910^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.40^0.000e+00^1.000e-19
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGQLHAREV
ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALKIHLSSETKAVL
EEFDGFELEL
> P18293^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^85.90^0.000e+00^4.000e-20
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGQLHAREV
ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALRIHLSSETKAVL
EEFDGFELEL
> P16068^.^1^165^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^75.90^0.000e+00^4.000e-17
HKRPVPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLYTRFDTLTDSRKNPFVYKVETVGDKYMTVSGLP
EPCIHHARSICHLALDMMEIAGQVQVDGESVQITIGIHTGEVVTGVIGQRMPRYCLFGNTVNLTSRTETTGEKGKINVS
EYTYRCL
> P16067^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.40^0.000e+00^1.000e-19
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAIIDNFDVYKVETIGDAYMVVSGLPGRNGQRHAPEI
ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGQALKIHVSSTTKDAL
DELGCFQLEL
> P16066^.^1^168^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^84.80^0.000e+00^7.000e-20
VQAEAFDSVTIYFSDIVGFTALSAESTPMQVVTLLNDLYTCFDAVIDNFDVYKVETIGDAYMVVSGLPVRNGRLHACEV
ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVVGLKMPRYCLFGDTVNTASRMESNGEALKIHLSSETKAVL
EEFGGFELEL
> P16065^.^1^143^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^82.10^0.000e+00^5.000e-19
VSIFFSDIVGFTALSAASTPIQVVNLLNDLYTLFDAIISNYDVYKVETIGDAYMLVSGLPLRNGDRHAGQIASTAHHLL
ESVKGFIVPHKPEVFLKLRIGIHSGSCVAGVVGLTMPRYCLFGDTVNTASRMESNGLALRIHVS
> O95622^.^1^189^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^301.00^0.000e+00^0.000e+00
MMFHKIYIQKHDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENHCLRIKILGDCYYCVSGLPEARADH
AHCCVEMGMDMIEAISLVREVTGVNVNMRVGIHSGRVHCGVLGLRKWQFDVWSNDVTLANHMEAGGKAGRIHITKATLN
YLNGDYEVEPGCGGERNAYLKEHSIETFLIL
> O95622^.^1^159^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^55.10^0.000e+00^7.000e-11
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVGKTHI
KALADFAMKLMDQMKYINEHSFNNFQMKIGLNIGPVVAGVIGARKPQYDIWGNTVNVASRMDSTGVPDRIQVTTDMYQV
L
> O75343^.^1^147^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^68.20^0.000e+00^8.000e-15
TILFSDVVTFTNICTACEPIQIVNVLNSMYSKFDRLTSVHAVYKVETIGDAYMVVGGVPVPIGNHAQRVANFALGMRIS
AKEVTNPVTGEPIQLRVGIHTGPVLADVVGDKMPRYCLFGDTVNTASRMESHGLPNKVHLSPTAYRAL
> O60503^.^1^179^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^143.00^0.000e+00^2.000e-37
VSILFADIVGFTKMSANKSAHALVGLLNDLFGRFDRLCEETKCEKISTLGDCYYCVAGCPEPRADHAYCCIEMGLGMIK
AIEQFCQEKKEMVNMRVGVHTGTVLCGILGMRRFKFDVWSNDVNLANLMEQLGVAGKVHISEATAKYLDDRYEMEDGKV
IERLGQSVVADQLKGLKTYLI
> O60266^.^1^186^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^179.00^0.000e+00^0.000e+00
FNTMYMYRHENVSILFADIVGFTQLSSACSAQELVKLLNELFARFDKLAAKYHQLRIKILGDCYYCICGLPDYREDHAV
CSILMGLAMVEAISYVREKTKTGVDMRVGVHTGTVLGGVLGQKRWQYDVWSTDVTVANKMEAGGIPGRVHISQSTMDCL
KGEFDVEPGDGGSRCDYLEEKGIETYLI
> O43306^.^1^189^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^287.00^0.000e+00^0.000e+00
MMFHKIYIQKHDNVSILFADIEGFTSLASQCTAQELVMTLNELFARFDKLAAENHCLRIKILGDCYYCVSGLPEARADH
AHCCVEMGVDMIEAISLVREVTGVNVNMRVGIHSGRVHCGVLGLRKWQFDVWSNDVTLANHMEAGGRAGRIHITRATLQ
YLNGDYEVEPGRGGERNAYLKEQHIETFLIL
> O43306^.^1^159^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^55.90^0.000e+00^3.000e-11
VAVMFASIANFSEFYVELEANNEGVECLRLLNEIIADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVGRSHI
TALADYAMRLMEQMKHINEHSFNNFQMKIGLNMGPVVAGVIGARKPQYDIWGNTVNVSSRMDSTGVPDRIQVTTDLYQV
L
> O19179^.^1^161^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^82.50^0.000e+00^4.000e-19
VTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPQRNGQRHAAEIANMALDIL
SAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVNMSTVRILHALDEGFQ
TEV
> O02740^.^1^162^SCOP^.^55074^Alpha and beta proteins (a+b)^.^.^Ferredoxin-like
^Adenylyl and guanylyl cyclase catalytic domain^Adenylyl and guanylyl cyclase c
atalytic domain^.^85.20^0.000e+00^7.000e-20
DLVTLYFSDIVGFTTISAMSEPIEVVDLLNDLYTLFDAIIGSHDVYKVETIGDAYMVASGLPKRNGMRHAAEIANMSLD
ILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSHSTVTILRTLGEG
YEVE

5.0 DATA FILES

   SEQNR requires a residue substitution matrix.

6.0 USAGE

  6.1 COMMAND LINE ARGUMENTS

   Standard (Mandatory) qualifiers (* if not always prompted):
  [-dhfinpath]         dirlist    This option specifies the location of DHF
                                  files (domain hits files) (input). A 'domain
                                  hits file' contains database hits
                                  (sequences) with domain classification
                                  information, in the DHF format (FASTA or
                                  EMBL-like). The hits are relatives to a SCOP
                                  or CATH family and are found from a search
                                  of a sequence database. Files containing
                                  hits retrieved by PSIBLAST are generated by
                                  using SEQSEARCH.
   -[no]dosing         toggle     This option specifies whether to use singlet
                                  sequences (e.g. DHF files) to filter input.
                                  Optionally, up to two further directories
                                  of sequences may be read: these are
                                  considered in the redundancy calculation but
                                  never appear in the output files.
*  -singletsdir        directory  This option specifies the location of
                                  singlet filter sequences (e.g. DHF files)
                                  (input). A 'domain hits file' contains
                                  database hits (sequences) with domain
                                  classification information, in the DHF
                                  format (FASTA or EMBL-like). The hits are
                                  relatives to a SCOP or CATH family and are
                                  found from a search of a sequence database.
                                  Files containing hits retrieved by PSIBLAST
                                  are generated by using SEQSEARCH.
   -[no]dosets         toggle     This option specifies whether to use sets of
                                  sequences (e.g. DHF files) to filter input.
                                  Optionally, up to two further directories
                                  of sequences may be read: these are
                                  considered in the redundancy calculation but
                                  never appear in the output files.
*  -insetsdir          directory  This option specifies the location of sets
                                  of filter sequences (e.g. DAF files) (input).
                                  A 'domain alignment file' contains a sequence
                                  alignment of domains belonging to the same
                                  SCOP or CATH family. The file is in clustal
                                  format annotated with domain family
                                  classification information. The files
                                  generated by using SCOPALIGN will contain a
                                  structure-based sequence alignment of
                                  domains of known structure only. Such
                                  alignments can be extended with sequence
                                  relatives (of unknown structure) by using
                                  SEQALIGN.
   -mode               menu       This option specifies whether to remove
                                  redundancy at a single threshold % sequence
                                  similarity or remove redundancy outside a
                                  range of acceptable threshold % similarity.
                                  All permutations of pair-wise sequence
                                  alignments are calculated for each set of
                                  input sequences in turn using the EMBOSS
                                  implementation of the Needleman and Wunsch
                                  global alignment algorithm. Redundant
                                  sequences are removed in one of two modes as
                                  follows: (i) If a pair of proteins achieve
                                  greater than a threshold percentage sequence
                                  similarity (specified by the user) the
                                  shortest sequence is discarded. (ii) If a
                                  pair of proteins have a percentage sequence
                                  similarity that lies outside an acceptable
                                  range (specified by the user) the shortest
                                  sequence is discarded.
*  -thresh             float      This option specifies the % sequence
                                  identity redundancy threshold. The %
                                  sequence identity redundancy threshold
                                  determines the redundancy calculation. If a
                                  pair of proteins achieve greater than this
                                  threshold the shortest sequence is
                                  discarded.
*  -threshlow          float      This option specifies the % sequence
                                  identity redundancy threshold (lower limit).
                                  The % sequence identity redundancy
                                  threshold determines the redundancy
                                  calculation. If a pair of proteins have a
                                  percentage sequence similarity that lies
                                  outside an acceptable range the shortest
                                  sequence is discarded.
*  -threshup           float      This option specifies the % sequence
                                  identity redundancy threshold (upper limit).
                                  The % sequence identity redundancy
                                  threshold determines the redundancy
                                  calculation. If a pair of proteins have a
                                  percentage sequence similarity that lies
                                  outside an acceptable range the shortest
                                  sequence is discarded.
  [-dhfoutdir]         outdir     This option specifies the location of DHF
                                  files (domain hits files) of non-redundant
                                  sequences (output). A 'domain hits file'
                                  contains database hits (sequences) with
                                  domain classification information, in the
                                  DHF format (FASTA or EMBL-like). The hits
                                  are relatives to a SCOP or CATH family and
                                  are found from a search of a sequence
                                  database. Files containing hits retrieved by
                                  PSIBLAST are generated by using SEQSEARCH.
   -dored              toggle     This option specifies whether to retain
                                  redundant sequences. If this option is set a
                                  DHF file (domain hits file) of redundant
                                  sequences is written.
*  -redoutdir          outdir     This option specifies the location of DHF
                                  files (domain hits files) of redundant
                                  sequences (output). A 'domain hits file'
                                  contains database hits (sequences) with
                                  domain classification information, in the
                                  DHF format (FASTA or EMBL-like). The hits
                                  are relatives to a SCOP or CATH family and
                                  are found from a search of a sequence
                                  database. Files containing hits retrieved by
                                  PSIBLAST are generated by using SEQSEARCH.
   -logfile            outfile    This option specifies the name of SEQNR log
                                  file (output). The log file contains
                                  messages about any errors arising while
                                  SEQNR ran.

   Additional (Optional) qualifiers:
   -matrix             matrixf    This option specifies the residue
                                  substitution matrix that is used for
                                  sequence comparison.
   -gapopen            float      This option specifies the gap insertion
                                  penalty. The gap insertion penalty is the
                                  score taken away when a gap is created. The
                                  best value depends on the choice of
                                  comparison matrix. The default value assumes
                                  you are using the EBLOSUM62 matrix for
                                  protein sequences, and the EDNAFULL matrix
                                  for nucleotide sequences.
   -gapextend          float      This option specifies the gap extension
                                  penalty. The gap extension penalty is added
                                  to the standard gap penalty for each base
                                  or residue in the gap. This is how long gaps
                                  are penalized. Usually you will expect a
                                  few long gaps rather than many short gaps,
                                  so the gap extension penalty should be lower
                                  than the gap penalty.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-logfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths


   Allowed values and defaults for each qualifier are summarised below.
   The qualifier descriptions are as given in the list above.

   Standard (Mandatory) qualifiers
   Qualifier       Allowed values                                Default
   [-dhfinpath]    Directory with files (Parameter 1)            ./
   -[no]dosing     Toggle value Yes/No                           Yes
   -singletsdir    Directory                                     ./
   -[no]dosets     Toggle value Yes/No                           Yes
   -insetsdir      Directory                                     ./
   -mode           1 (Remove redundancy at a single threshold    1
                   % sequence similarity)
                   2 (Remove redundancy outside a range of
                   acceptable threshold % similarity)
   -thresh         Any numeric value                             95.0
   -threshlow      Any numeric value                             30.0
   -threshup       Any numeric value                             90.0
   [-dhfoutdir]    Output directory (Parameter 2)                ./
   -dored          Toggle value Yes/No                           No
   -redoutdir      Output directory                              ./
   -logfile        Output file                                   seqnr.log

   Additional (Optional) qualifiers
   Qualifier       Allowed values                                Default
   -matrix         Comparison matrix file in EMBOSS data path    EBLOSUM62
   -gapopen        Floating point number from 1.0 to 100.0       10.0 for any
                                                                 sequence
   -gapextend      Floating point number from 0.0 to 10.0        0.5 for any
                                                                 sequence

   Advanced (Unprompted) qualifiers
   (none)

  6.2 EXAMPLE SESSION

   An example of interactive use of SEQNR is shown below.


% seqnr 
Removes redundancy from DHF files.
Location of DHF files (domain hits files) (input). [./]: ../seqfraggle-keep
Use singlet sequences (e.g. DHF files) to filter input. [Y]: N
Use sets of sequences (e.g. DHF files) to filter input. [Y]: Y
Location of sets of filter sequences (e.g. DAF files) (input). [./]: ../domainalign-keep/daf
Redundancy removal options
         1 : Remove redundancy at a single threshold % sequence similarity
         2 : Remove redundancy outside a range of acceptable threshold % similarity
Select number. [1]: 1
The % sequence identity redundancy threshold. [95.0]: 70
Location of DHF files (domain hits files) of non-redundant sequences (output). [./]: hitsnr
Retain redundant sequences. [N]: Y
Location of DHF files (domain hits files) of redundant sequences (output). [./]: hitsred
Name seqnr log file (output). [seqnr.log]: 

Processing /ebi/services/idata/pmr/hgmp/test/qa/seqfraggle-keep/55074.dhf
Processing /ebi/services/idata/pmr/hgmp/test/qa/seqfraggle-keep/54894.dhf
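
   The same run can probably be made without prompts by giving every
   answer on the command line. The sketch below uses Python only to
   assemble and print such a command; it does not run SEQNR. The
   qualifier names come from section 6.1 and the values from the session
   above. It assumes the usual EMBOSS conventions that a toggle is
   switched off with a "no" prefix and that "-auto" suppresses prompting.

```python
import shlex

# Answers from the interactive session, expressed as qualifiers.
argv = [
    "seqnr",
    "-dhfinpath", "../seqfraggle-keep",
    "-nodosing",                       # "Use singlet sequences ...": N
    "-dosets",                         # "Use sets of sequences ...": Y
    "-insetsdir", "../domainalign-keep/daf",
    "-mode", "1",                      # single threshold
    "-thresh", "70",
    "-dhfoutdir", "hitsnr",
    "-dored",                          # "Retain redundant sequences": Y
    "-redoutdir", "hitsred",
    "-logfile", "seqnr.log",
    "-auto",                           # turn off prompts
]

command = shlex.join(argv)
print(command)

# To execute it, an EMBOSS installation providing seqnr would be needed:
# import subprocess; subprocess.run(argv, check=True)
```

   Keeping the arguments in a list avoids shell quoting problems if a
   directory name contains spaces.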


7.0 KNOWN BUGS & WARNINGS

   None.

8.0 NOTES

   None.

  8.1 GLOSSARY OF FILE TYPES

   FILE TYPE: Domain hits file
      FORMAT:      DHF format (FASTA-like).
      DESCRIPTION: Database hits (sequences) with domain classification
                   information. The hits are relatives to a SCOP or CATH
                   family (or other node in the structural hierarchies)
                   and are found from a search of a discriminating
                   element (e.g. a protein signature, hidden Markov
                   model, simple frequency matrix, Gribskov profile or
                   Henikoff profile) against a sequence database.
      CREATED BY:  SEQSEARCH (hits retrieved by PSIBLAST). SIGSCAN (hits
                   retrieved by sparse protein signature). LIBSCAN (hits
                   retrieved by various types of HMM and profile).
      SEE ALSO:    N.A.

   FILE TYPE: Domain alignment file
      FORMAT:      DAF format (CLUSTAL-like format with domain
                   classification information).
      DESCRIPTION: Contains a sequence alignment of domains belonging to
                   the same SCOP or CATH family. The file is annotated
                   with domain family classification information.
      CREATED BY:  DOMAINALIGN (structure-based sequence alignment of
                   domains of known structure).
      SEE ALSO:    DOMAINALIGN alignments can be extended with sequence
                   relatives (of unknown structure) to the family in
                   question by using SEQALIGN.


9.0 DESCRIPTION

   Redundancy in a database or other collection of sequences occurs when
   two or more similar sequences are present. The inclusion of very
   similar sequences in certain analyses will introduce undesirable bias.
   For example, a family may possess 100 sequences in the sequence
   database, but 90 of these might be essentially the same sequence, e.g.
   very close relatives or mutations of a single sequence. Although 100
   sequences are known, the family contains only 11 sequences that are
   essentially unique: the 10 distinct sequences plus one representative
   of the 90 near-duplicates. For many applications it is desirable or
   even essential to remove redundant sequences from a set in order to
   produce a smaller set that is representative of the whole. SEQNR
   removes redundancy from an input file of sequences, either at a single
   threshold of sequence similarity (e.g. 40%) or using a threshold
   range of sequence similarity (e.g. 40% - 70%).
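
   The two modes described for the -mode qualifier in section 6.1 can be
   written as a single test on the similarity of a pair. The function
   below is an illustrative sketch, not SEQNR source code: the function
   name is hypothetical, the default values are those listed for -thresh,
   -threshlow and -threshup, and treating the range limits themselves as
   acceptable is an assumption.

```python
def is_redundant_pair(similarity, mode, thresh=95.0, threshlow=30.0, threshup=90.0):
    """Return True if this % similarity means one sequence of the pair is discarded."""
    if mode == 1:
        # Mode 1: discard when similarity is greater than a single threshold.
        return similarity > thresh
    if mode == 2:
        # Mode 2: discard when similarity lies outside the acceptable range.
        # Assumption: the limits themselves count as inside the range.
        return similarity < threshlow or similarity > threshup
    raise ValueError("mode must be 1 or 2")


if __name__ == "__main__":
    for value in (20.0, 50.0, 96.0):
        print(value, is_redundant_pair(value, 1), is_redundant_pair(value, 2))
```

   In either mode it is the shorter sequence of a flagged pair that is
   discarded.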

10.0 ALGORITHM

   Redundancy is calculated for each DHF file in the input directory in
   turn. The procedure is as follows:

   1. Create a list of sequences from the main input directory.
   2. Add sequences from both filter directories (if specified) to the
      list but mark them (they are considered in the redundancy
      calculation but never appear in the output files).
   3. Identify redundant domains.
   4. Write non-redundant domains to the main output directory.
   5. If specified, write redundant domains to the output directory for
      redundant sequences.
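
   The steps above can be sketched for single-threshold mode as follows.
   This is a simplified illustration under stated assumptions, not the
   SEQNR implementation. SEQNR scores pairs with Needleman and Wunsch
   global alignments; here a toy ungapped identity function stands in so
   that the example is self-contained. The names remove_redundancy and
   percent_identity are hypothetical, and it is assumed that a filter
   sequence is never the one discarded.

```python
from itertools import combinations


def percent_identity(a, b):
    """Toy stand-in for an alignment score: ungapped positional identity (%)."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / max(len(a), len(b))


def remove_redundancy(inputs, filters=(), thresh=95.0, similarity=percent_identity):
    """Split `inputs` into (non_redundant, redundant) lists of (name, seq).

    `filters` take part in the comparisons but are never written out.
    Assumption: when an input sequence is too similar to a filter
    sequence, the input sequence is the one removed.
    """
    # Steps 1 and 2: one list, with filter sequences marked.
    entries = [(name, seq, False) for name, seq in inputs]
    entries += [(name, seq, True) for name, seq in filters]
    discarded = set()

    # Step 3: compare every pair that is still in play.
    for i, j in combinations(range(len(entries)), 2):
        if i in discarded or j in discarded:
            continue
        (_, seq_i, filt_i), (_, seq_j, filt_j) = entries[i], entries[j]
        if filt_i and filt_j:
            continue  # nothing to decide between two filter sequences
        if similarity(seq_i, seq_j) <= thresh:
            continue
        if filt_i:
            loser = j
        elif filt_j:
            loser = i
        else:
            # Discard the shorter sequence; on a tie the later one goes.
            loser = i if len(seq_i) < len(seq_j) else j
        discarded.add(loser)

    # Steps 4 and 5: filter sequences appear in neither output.
    keep = [(n, s) for k, (n, s, f) in enumerate(entries)
            if not f and k not in discarded]
    drop = [(n, s) for k, (n, s, f) in enumerate(entries)
            if not f and k in discarded]
    return keep, drop


if __name__ == "__main__":
    hits = [("a", "ACDEFGHIKL"), ("b", "ACDEFGHIK"), ("c", "WWWWWWWWWW")]
    known = [("f", "WWWWWWWWWW")]
    keep, drop = remove_redundancy(hits, known, thresh=80.0)
    print("non-redundant:", [name for name, _ in keep])
    print("redundant:", [name for name, _ in drop])
```

   Because sequences are discarded as the pairs are visited, the result
   of a greedy scheme like this can depend on the order of the input.
   Whether SEQNR behaves the same way is not stated in this document.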

11.0 RELATED APPLICATIONS

See also

    Program name                        Description
   aaindexextract Extract data from AAINDEX
   allversusall   Sequence similarity data from all-versus-all comparison
   cathparse      Generates DCF file from raw CATH files
   cutgextract    Extract data from CUTG
   domainer       Generates domain CCF files from protein CCF files
   domainnr       Removes redundant domains from a DCF file
   domainseqs     Adds sequence records to a DCF file
   domainsse      Add secondary structure records to a DCF file
   hetparse       Converts heterogen group dictionary to EMBL-like format
   pdbparse       Parses PDB files and writes protein CCF files
   pdbplus        Add accessibility & secondary structure to a CCF file
   pdbtosp        Convert swissprot:PDB codes file to EMBL-like format
   printsextract  Extract data from PRINTS
   prosextract    Build the PROSITE motif database for use by patmatmotifs
   rebaseextract  Extract data from REBASE
   scopparse      Generate DCF file from raw SCOP files
   sites          Generate residue-ligand CON files from CCF files
   ssematch       Search a DCF file for secondary structure matches
   tfextract      Extract data from TRANSFAC

12.0 DIAGNOSTIC ERROR MESSAGES

   The log file might contain the following messages:

   embHitlistReadFasta call failed in seqnr
      By default SEQNR expects a domain hits file in the main input
      directory and the first (singlets) filter directory. This message
      is given if some other type of sequence file is given. It is not
      really an error, just a warning.

   Empty input file my.file
      This is given if the main input file 'my.file' did not contain any
      sequences. No output file is generated for this file.

   Empty singlets filter file my.file
      This is given if the singlets filter file 'my.file' did not contain
      any sequences. An output file is still generated for this file.

   Empty sets filter file my.file
      This is given if the sets filter file 'my.file' did not contain any
      sequences. An output file is still generated for this file.

   ajDmxScopalgRead call failed in seqnr my.file
      By default SEQNR expects a domain alignment file in the second
      (sets) filter directory. This message is given if some other type
      of sequence sets file (called 'my.file') is given. It is not really
      an error, just a warning.
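
   When many families are processed it can help to sort log lines by how
   much attention they need. The sketch below is a hypothetical helper,
   not part of SEQNR. It assumes each message occupies one line of the
   log and begins with the text quoted above; the category names are
   invented for illustration.

```python
FORMAT_WARNINGS = (
    "embHitlistReadFasta call failed in seqnr",
    "ajDmxScopalgRead call failed in seqnr",
)
EMPTY_FILTERS = (
    "Empty singlets filter file",
    "Empty sets filter file",
)


def classify_log_line(line):
    """Map one log line to 'no-output', 'empty-filter', 'format-warning' or 'other'."""
    text = line.strip()
    if text.startswith("Empty input file"):
        return "no-output"        # no output file was generated
    if text.startswith(EMPTY_FILTERS):
        return "empty-filter"     # output was still generated
    if text.startswith(FORMAT_WARNINGS):
        return "format-warning"   # a warning rather than an error
    return "other"


if __name__ == "__main__":
    sample = [
        "Empty input file my.file",
        "Empty sets filter file my.file",
        "ajDmxScopalgRead call failed in seqnr my.file",
    ]
    for entry in sample:
        print(classify_log_line(entry), "<-", entry)
```

   Only the "no-output" category means that an expected output file will
   be missing.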

13.0 AUTHORS

   Jon Ison (jison@rfcgr.mrc.ac.uk)
   MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK
   based on an original program written by
   Ranjeeva Ranasinghe (rranasin@rfcgr.mrc.ac.uk)

14.0 REFERENCES

   Please cite the authors and EMBOSS.
   Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European
   Molecular Biology Open Software Suite" Trends in Genetics,
   16(6):276-277.
   
   See also http://emboss.sourceforge.net/

  14.1 Other useful references
