
                            ROCON documentation
                                      
   

CONTENTS

   1.0 SUMMARY 
   2.0 INPUTS & OUTPUTS 
   3.0 INPUT FILE FORMAT 
   4.0 OUTPUT FILE FORMAT 
   5.0 DATA FILES 
   6.0 USAGE 
   7.0 KNOWN BUGS & WARNINGS 
   8.0 NOTES 
   9.0 DESCRIPTION 
   10.0 ALGORITHM 
   11.0 RELATED APPLICATIONS 
   12.0 DIAGNOSTIC ERROR MESSAGES 
   13.0 AUTHORS 
   14.0 REFERENCES 

1.0 SUMMARY

   Generates a hits file from comparing two DHF files. ROCON reads a DHF
   file (domain hits file) of hits (sequences of unknown structural
   classification) and a domain families file (validation sequences of
   known classification), and writes a "hits file" in which the hits are
   classified and rank-ordered on the basis of score.

2.0 INPUTS & OUTPUTS

   ROCON reads a DHF file (domain hits file) of hits generated for a
   single node from a classification hierarchy, e.g. SCOP family. These
   sequences are putatively related to the node in question but are, in
   fact, of unknown classification. ROCON also reads a domain families
   file (in DHF format), containing "validation" sequences (of known
   classification). These sequences are used to classify the input hits.
   A "hits file" (suitable for input into the ROCPLOT application) is
   written, which contains the input hits, classified and rank-ordered on
   the basis of score.

3.0 INPUT FILE FORMAT

   The format of the DHF is described in SEQSEARCH documentation. See
   also the example of the DHF file for hit sequences (Figure 1) and
   validation sequences (Figure 2) below.
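
   As a reading aid for the example files, the sketch below splits one
   DHF header line on the '^' delimiter. It is not the EMBOSS parser: the
   field positions (accession first, start and end in the third and
   fourth fields, and score, p-value and E-value as the last three
   fields) and the names DhfHit and parse_dhf_header are assumptions
   inferred from the examples shown here. The SEQSEARCH documentation
   remains the authoritative description. Header lines that are wrapped
   in this document must be joined into one string first.

```python
from typing import NamedTuple


class DhfHit(NamedTuple):
    """Subset of a DHF header; field meanings are assumed, not official."""
    accession: str
    start: int
    end: int
    score: float
    pvalue: float
    evalue: float


def parse_dhf_header(header: str) -> DhfHit:
    """Parse one DHF header line (already joined if it was wrapped)."""
    # Drop the leading '>' and spaces, then split on the '^' delimiter.
    fields = header.lstrip("> ").strip().split("^")
    # Assumed positions: 0 = accession, 2 = start, 3 = end,
    # last three = score, p-value, E-value.
    return DhfHit(
        accession=fields[0],
        start=int(fields[2]),
        end=int(fields[3]),
        score=float(fields[-3]),
        pvalue=float(fields[-2]),
        evalue=float(fields[-1]),
    )


example = ("> Q9YBD5^.^11^105^SCOP^.^54894^Class 1^.^.^Fold 1^"
           "Superfamily 1^Family 1^SPARSE^61.50^0.000e+00^4.000e-10")
hit = parse_dhf_header(example)
```

   Indexing the numeric fields from the end of the line keeps the sketch
   working for both header variants shown in this section, which differ
   in the number of fields before the classification text.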

  Input files for usage example

  File: rocon/rocon.dhf

> Q9YBD5^.^11^105^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARSE
^61.50^0.000e+00^4.000e-10
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9YBD5^.^95^135^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARSE
^11.50^0.000e+00^4.000e-5
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9YBD5^.^181^235^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARS
E^161.50^0.000e+00^4.000e-5
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> O26938^.^11^101^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLA
ST^81.90^0.000e+00^3.000e-16
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKPSEVDQIALIAPRATINIVR
DYKIVEKAKVRL
> Q8Z130^.^8^99^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST
^181.00^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q7MX57^.^8^99^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST
^80.80^0.000e+00^7.000e-16
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEEEELNRIALIAPNVRLNIIR
DYEVVEKRQVEVP
> Q8TVB1^.^7^98^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST
^72.70^0.000e+00^2.000e-13
VKRIEMGTVLDHLPPGTAPQIMRILDIDPTETTLLVAINVESSKMGRKDILKIEGKILSEEEANKVALVAPNATVNIVR
DYSVAEKFQVKPP
> P96175^.^8^99^SCOP^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST
^107.00^0.000e+00^7.000e-24
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIE
NFKVTDKHSLALP

  File: rocon.valid

> Q9YBD5^.^11^105^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^6
1.50^0.000e+00^4.000e-10
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9UX07^.^12^104^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^6
5.80^0.000e+00^2.000e-11
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINII
RDYVVTEKRHLEVP
> Q9KP65^.^9^100^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^12
8.00^0.000e+00^3.000e-30
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIE
NYEVVKKLALQLP
> Q9K1K9^.^8^99^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^101
.00^0.000e+00^5.000e-22
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
NFKVVQKRHLNLP
> Q9JWY6^.^8^99^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^98.
90^0.000e+00^2.000e-21
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
HFKVVQKRHLNLP
> Q9HKM3^.^7^99^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^79.
60^0.000e+00^2.000e-15
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISII
KNYEISEKFQVELP
> Q9HHN3^.^9^101^SCOP^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^78
.50^0.000e+00^4.000e-15
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIV
RDYEVDEKRRVDRP
> Q97FS4^.^4^93^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^49.
20^0.000e+00^2.000e-06
INSIKNGIVIDHIKAGHGIKIYNYLKLGEAEFPTALIMNAISKKNKAKDIIKIENVMDLDLAVLGFLDPNITVNIIEDE
KIRQKIQLKLP
> Q97B28^.^8^100^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^79
.20^0.000e+00^2.000e-15
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISII
KNYEISEKFKVELP
> Q970X3^.^11^101^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^7
8.50^0.000e+00^3.000e-15
VSKIKNGTVIDHIPAGRALAVLRILKIAEGYRIALVMNVESKKMGKKDIVKIENKEVDEKEANLITLIAPTATINIIRD
YEVVEKKKLKIP
> Q8ZTG2^.^7^99^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^66.
10^0.000e+00^2.000e-11
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINII
RNFAVVKKFKVTPP
> Q8ZB38^.^9^100^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^15
6.00^0.000e+00^1.000e-38
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRID
NYEVVKKLTLSLP
> Q8Z130^.^8^99^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^181
.00^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q8U374^.^6^99^SCOP^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^92.
00^0.000e+00^3.000e-19
VSAIKEGTVIDHIPAGKGLKVIQILGLGELKNGGAVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IREYKVVEKFKVEIP
> Q8TVB1^.^7^98^SCOP^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^72.
70^0.000e+00^2.000e-13
VKRIEMGTVLDHLPPGTAPQIMRILDIDPTETTLLVAINVESSKMGRKDILKIEGKILSEEEANKVALVAPNATVNIVR
DYSVAEKFQVKPP
> Q8THL3^.^9^100^SCOP^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^69
.20^0.000e+00^2.000e-12
IQAIENGTVIDHITAGQALNVLRILRISSAFRATVSFVMNAPGARGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFEVVQKNKVVLP
> Q8PXK6^.^9^100^SCOP^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^62
.70^0.000e+00^2.000e-10
VQAIESGTVIDHIKSGQALNVLRILGISSAFRATISFVMNAPGAGGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFVVVQKNNVVLP
> Q8K9H8^.^8^99^SCOP^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^146
.00^0.000e+00^1.000e-35
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIE
KYNLVGKIFPSLP
> Q8DCF7^.^9^100^SCOP^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^12
7.00^0.000e+00^9.000e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q8D1W6^.^9^100^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^12
3.00^0.000e+00^1.000e-28
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIK
NYIVIKKQKLKLP
> Q8A9S4^.^10^101^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^6
3.80^0.000e+00^9.000e-11
VAALKNGTVIDHIPSEKLFTVVQLLGVEQMKCNITIGFNLDSKKLGKKGIIKIADKFFCDEEINRISVVAPYVKLNIIR
DYEVVEKKEVRMP
> Q891I9^.^4^94^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^52.
30^0.000e+00^2.000e-07
ITSIKDGIVIDHIKSGYGIKIFNYLNLKNVEYSVALIMNVFSSKLGKKDIIKIANKEIDIDFTVLGLIDPTITINIIED
EKIKEKLNLELP
> Q87LF7^.^9^100^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^13
0.00^0.000e+00^7.000e-31
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIE
NYEVVKKLALELP
> Q83IL8^.^8^99^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^189
.00^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> Q7P144^.^7^98^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^128
.00^0.000e+00^3.000e-30
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVID
NFEVVKKHKLTLP
> Q7MZ14^.^9^100^SCOP^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^15
0.00^0.000e+00^6.000e-37
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIE
NYEVVKKLPINLP
> Q7MX57^.^8^99^SCOP^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^80.
80^0.000e+00^7.000e-16
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEEEELNRIALIAPNVRLNIIR
DYEVVEKRQVEVP
> Q7MHF0^.^9^100^SCOP^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^12
7.00^0.000e+00^8.000e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q58801^.^9^99^SCOP^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^61.
50^0.000e+00^5.000e-10
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKEDVDKISLISPDVTINIIRN
GKVVEKLKPQIP
> P96175^.^8^99^SCOP^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^107
.00^0.000e+00^7.000e-24
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIE
NFKVTDKHSLALP
> P96111^.^375^472^SCOP^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^
47.30^0.000e+00^9.000e-06
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLPDRYLSKKEIKKLSAISPNT
TVNIIKNSTVVEKYRIKLP
> P77919^.^6^99^SCOP^.^6^Class 1^.^.^Fold 2^Superfamily 1^Family 2^PSIBLAST^93.
50^0.000e+00^1.000e-19
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFLSEEEVNKIALVAPNATVNI
IRDYKVVEKFKVEVP
> P74766^.^12^104^SCOP^.^6^Class 1^.^.^Fold 2^Superfamily 1^Family 2^PSIBLAST^7
4.20^0.000e+00^7.000e-14
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIV
REYEVVKKTKLEVP
> P57451^.^8^99^SCOP^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^143
.00^0.000e+00^1.000e-34
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSDEQINQLAIYAPHATVNYIN
EYNLVRKVFPTLP
> P19936^.^8^99^SCOP^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^159
.00^0.000e+00^1.000e-39
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRID
NYEVVRKLTLSLP
> P08421^.^8^99^SCOP^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^183
.00^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> P00478^.^8^99^SCOP^.^8^Class 1^.^.^Fold 2^Superfamily 2^Family 2^PSIBLAST^191
.00^0.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> O58452^.^6^99^SCOP^.^8^Class 1^.^.^Fold 2^Superfamily 2^Family 2^PSIBLAST^94.
30^0.000e+00^6.000e-20
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IRNYKVVEKFKVEVP
> O30129^.^6^98^SCOP^.^9^Class 2^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^79.
60^0.000e+00^2.000e-15
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIRDEELNKIALISPNATINLI
RDYEIERKFKVSPP
> O26938^.^11^101^SCOP^.^10^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^
81.90^0.000e+00^3.000e-16
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKPSEVDQIALIAPRATINIVR
DYKIVEKAKVRL

   <--
   Figure 1 Excerpt of DHF file (hit sequences) 

> Q9YBD5^.^11^105^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARSE^61.5
0^0.000e+00^4.000e-10
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9YBD5^.^95^135^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARSE^11.5
0^0.000e+00^4.000e-5
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9YBD5^.^181^235^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^SPARSE^161
.50^0.000e+00^4.000e-5
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> O26938^.^11^101^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^81
.90^0.000e+00^3.000e-16
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKPSEVDQIALIAPRATINIVR
DYKIVEKAKVRL
> Q8Z130^.^8^99^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^181.
00^0.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q7MX57^.^8^99^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^80.8
0^0.000e+00^7.000e-16
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEEEELNRIALIAPNVRLNIIR
DYEVVEKRQVEVP
> Q8TVB1^.^7^98^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^72.7
0^0.000e+00^2.000e-13
VKRIEMGTVLDHLPPGTAPQIMRILDIDPTETTLLVAINVESSKMGRKDILKIEGKILSEEEANKVALVAPNATVNIVR
DYSVAEKFQVKPP
> P96175^.^8^99^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^107.
00^0.000e+00^7.000e-24
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIE
NFKVTDKHSLALP

   Figure 2 Excerpt of domain families file (validation sequences) 

> Q9YBD5^.^11^105^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^61.50^
0.000e+00^4.000e-10
VRKIRSGVVIDHIPPGRAFTMLKALGLLPPRGYRWRIAVVINAESSKLGRKDILKIEGYKPRQRDLEVLGIIAPGATFN
VIEDYKVVEKVKLKLP
> Q9UX07^.^12^104^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^65.80^
0.000e+00^2.000e-11
VSKIRNGTVIDHIPAGRALAVLRILGIRGSEGYRVALVMNVESKKIGRKDIVKIEDRVIDEKEASLITLIAPSATINII
RDYVVTEKRHLEVP
> Q9KP65^.^9^100^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^128.00^
0.000e+00^3.000e-30
VEAIKNGTVIDHIPAKVGIKVLKLFDMHNSAQRVTIGLNLPSSALGSKDLLKIENVFISEAQANKLALYAPHATVNQIE
NYEVVKKLALQLP
> Q9K1K9^.^8^99^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^101.00^0
.000e+00^5.000e-22
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
NFKVVQKRHLNLP
> Q9JWY6^.^8^99^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^98.90^0.
000e+00^2.000e-21
VEAIEKGTVIDHIPAGRGLTILRQFKLLHYGNAVTVGFNLPSKTQGSKDIIKIKGVCLDDKAADRLALFAPEAVVNTID
HFKVVQKRHLNLP
> Q9HKM3^.^7^99^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^79.60^0.
000e+00^2.000e-15
ISKIRDGTVIDHVPSGKGIRVIGVLGVHEDVNYTVSLAIHVPSNKMGFKDVIKIENRFLDRNELDMISLIAPNATISII
KNYEISEKFQVELP
> Q9HHN3^.^9^101^.^1^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^78.50^0
.000e+00^4.000e-15
VSKIQAGTVIDHIPAGQALQVLQILGTNGASDDQITVGMNVTSERHHRKDIVKIEGRELSQDEVDVLSLIAPDATINIV
RDYEVDEKRRVDRP
> Q97FS4^.^4^93^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^49.20^0.
000e+00^2.000e-06
INSIKNGIVIDHIKAGHGIKIYNYLKLGEAEFPTALIMNAISKKNKAKDIIKIENVMDLDLAVLGFLDPNITVNIIEDE
KIRQKIQLKLP
> Q97B28^.^8^100^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^79.20^0
.000e+00^2.000e-15
ISKIKDGTVIDHIPSGKALRVLSILGIRDDVDYTVSVGMHVPSSKMEYKDVIKIENRSLDKNELDMISLTAPNATISII
KNYEISEKFKVELP
> Q970X3^.^11^101^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^78.50^
0.000e+00^3.000e-15
VSKIKNGTVIDHIPAGRALAVLRILKIAEGYRIALVMNVESKKMGKKDIVKIENKEVDEKEANLITLIAPTATINIIRD
YEVVEKKKLKIP
> Q8ZTG2^.^7^99^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^66.10^0.
000e+00^2.000e-11
VSKIENGTVIDHIPAGRALTVLRILGISGKEGLRVALVMNVESKKLGKKDIVKIEGRELTPEEVNIISAVAPTATINII
RNFAVVKKFKVTPP
> Q8ZB38^.^9^100^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^156.00^
0.000e+00^1.000e-38
VEAIKCGTVIDHIPAQIGFKLLSLFKLTATDQRITIGLNLPSKRSGRKDLIKIENTFLTEQQANQLAMYAPDATVNRID
NYEVVKKLTLSLP
> Q8Z130^.^8^99^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^181.00^0
.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTDEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> Q8U374^.^6^99^.^2^Class 1^.^.^Fold 1^Superfamily 1^Family 2^PSIBLAST^92.00^0.
000e+00^3.000e-19
VSAIKEGTVIDHIPAGKGLKVIQILGLGELKNGGAVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IREYKVVEKFKVEIP
> Q8TVB1^.^7^98^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^72.70^0.
000e+00^2.000e-13
VKRIEMGTVLDHLPPGTAPQIMRILDIDPTETTLLVAINVESSKMGRKDILKIEGKILSEEEANKVALVAPNATVNIVR
DYSVAEKFQVKPP
> Q8THL3^.^9^100^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^69.20^0
.000e+00^2.000e-12
IQAIENGTVIDHITAGQALNVLRILRISSAFRATVSFVMNAPGARGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFEVVQKNKVVLP
> Q8PXK6^.^9^100^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^62.70^0
.000e+00^2.000e-10
VQAIESGTVIDHIKSGQALNVLRILGISSAFRATISFVMNAPGAGGKKDVVKIEGKELSVEELNRIALISPKATINIIR
DFVVVQKNNVVLP
> Q8K9H8^.^8^99^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^146.00^0
.000e+00^1.000e-35
VEAIKSGSVIDHIPAHIGFKLLSLFRFTETEKRITIGLNLPSQKLDKKDIIKIENTFLSDDQINQLAIYAPCATVNYIE
KYNLVGKIFPSLP
> Q8DCF7^.^9^100^.^3^Class 1^.^.^Fold 1^Superfamily 2^Family 1^PSIBLAST^127.00^
0.000e+00^9.000e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q8D1W6^.^9^100^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^123.00^
0.000e+00^1.000e-28
VEAIFGGTVIDHIPAQVGLKLLSLFKWLHTKERITMGLNLPSNQQKKKDLIKLENVLLNEDQANQLSIYAPLATVNQIK
NYIVIKKQKLKLP
> Q8A9S4^.^10^101^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^63.80^
0.000e+00^9.000e-11
VAALKNGTVIDHIPSEKLFTVVQLLGVEQMKCNITIGFNLDSKKLGKKGIIKIADKFFCDEEINRISVVAPYVKLNIIR
DYEVVEKKEVRMP
> Q891I9^.^4^94^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^52.30^0.
000e+00^2.000e-07
ITSIKDGIVIDHIKSGYGIKIFNYLNLKNVEYSVALIMNVFSSKLGKKDIIKIANKEIDIDFTVLGLIDPTITINIIED
EKIKEKLNLELP
> Q87LF7^.^9^100^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^130.00^
0.000e+00^7.000e-31
VEAIKNGTVIDHIPAQIGIKVLKLFDMHNSSQRVTIGLNLPSSALGHKDLLKIENVFINEEQASKLALYAPHATVNQIE
NYEVVKKLALELP
> Q83IL8^.^8^99^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^189.00^0
.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEEQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> Q7P144^.^7^98^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^128.00^0
.000e+00^3.000e-30
VEALKQGTVIDHIPAGEGVKILRLFKLTETGERVTVGLNLVSRHMGSKDLIKVENVALTEEQANELALFAPKATVNVID
NFEVVKKHKLTLP
> Q7MZ14^.^9^100^.^4^Class 1^.^.^Fold 1^Superfamily 2^Family 2^PSIBLAST^150.00^
0.000e+00^6.000e-37
VEAIRCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSNRLGKKDLIKIENTFLTEQQANQLAMYAPNATVNCIE
NYEVVKKLPINLP
> Q7MX57^.^8^99^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^80.80^0.
000e+00^7.000e-16
VAAIRNGIVIDHIPPTKLFKVATLLQLDDLDKRITIGNNLRSRSHGSKGVIKIEDKTFEEEELNRIALIAPNVRLNIIR
DYEVVEKRQVEVP
> Q7MHF0^.^9^100^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^127.00^
0.000e+00^8.000e-30
VEAIKNGTVIDHIPAQVGIKVLKLFDMHNSSQRVTIGLNLPSSALGNKDLLKIENVFINEEQASKLALYAPHATVNQIE
DYQVVKKLALELP
> Q58801^.^9^99^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^61.50^0.
000e+00^5.000e-10
VKKITNGTVIDHIDAGKALMVFKVLNVPKETSVMIAINVPSKKKGKKDILKIEGIELKKEDVDKISLISPDVTINIIRN
GKVVEKLKPQIP
> P96175^.^8^99^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^107.00^0
.000e+00^7.000e-24
VEAICNGYVIDHIPSGQGVKILRLFSLTDTKQRVTVGFNLPSHDGTTKDLIKVENTEITKSQANQLALLAPNATVNIIE
NFKVTDKHSLALP
> P96111^.^375^472^.^5^Class 1^.^.^Fold 2^Superfamily 1^Family 1^PSIBLAST^47.30
^0.000e+00^9.000e-06
GIKPIENGTVIDHIAKGKTPEEIYSTILKIRKILRLYDVDSADGIFRSSDGSFKGYISLPDRYLSKKEIKKLSAISPNT
TVNIIKNSTVVEKYRIKLP
> P77919^.^6^99^.^6^Class 1^.^.^Fold 2^Superfamily 1^Family 2^PSIBLAST^93.50^0.
000e+00^1.000e-19
VSAIKEGTVIDHIPAGKGLKVIEILKLGKLTNGGAVLLAMNVPSKKLGRKDIVKVEGRFLSEEEVNKIALVAPNATVNI
IRDYKVVEKFKVEVP
> P74766^.^12^104^.^6^Class 1^.^.^Fold 2^Superfamily 1^Family 2^PSIBLAST^74.20^
0.000e+00^7.000e-14
VSKIKNGTVIDHIPAGRAFAVLNVLGIKGHEGFRIALVINVDSKKMGKKDIVKIEDKEISDTEANLITLIAPTATINIV
REYEVVKKTKLEVP
> P57451^.^8^99^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^143.00^0
.000e+00^1.000e-34
VEAIKSGSVIDHIPEYIGFKLLSLFRFTETEKRITIGLNLPSKKLGRKDIIKIENTFLSDEQINQLAIYAPHATVNYIN
EYNLVRKVFPTLP
> P19936^.^8^99^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^159.00^0
.000e+00^1.000e-39
VEAIKCGTVIDHIPAQIGFKLLTLFKLTATDQRITIGLNLPSNELGRKDLIKIENTFLTEQQANQLAMYAPKATVNRID
NYEVVRKLTLSLP
> P08421^.^8^99^.^7^Class 1^.^.^Fold 2^Superfamily 2^Family 1^PSIBLAST^183.00^0
.000e+00^0.000e+00
VEAIKCGTVIDHIPAQVGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLTEEQVNQLALYAPQATVNRID
NYDVVGKSRPSLP
> P00478^.^8^99^.^8^Class 1^.^.^Fold 2^Superfamily 2^Family 2^PSIBLAST^191.00^0
.000e+00^0.000e+00
VEAIKRGTVIDHIPAQIGFKLLSLFKLTETDQRITIGLNLPSGEMGRKDLIKIENTFLSEDQVDQLALYAPQATVNRID
NYEVVGKSRPSLP
> O58452^.^6^99^.^8^Class 1^.^.^Fold 2^Superfamily 2^Family 2^PSIBLAST^94.30^0.
000e+00^6.000e-20
VSAIKEGTVIDHIPAGKGLKVIEILGLSKLSNGGSVLLAMNVPSKKLGRKDIVKVEGKFLSEEEVNKIALVAPTATVNI
IRNYKVVEKFKVEVP
> O30129^.^6^98^.^9^Class 2^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^79.60^0.
000e+00^2.000e-15
VSKIKEGTVIDHINAGKALLVLKILKIQPGTDLTVSMAMNVPSSKMGKKDIVKVEGMFIRDEELNKIALISPNATINLI
RDYEIERKFKVSPP
> O26938^.^11^101^.^54894^Class 1^.^.^Fold 1^Superfamily 1^Family 1^PSIBLAST^81
.90^0.000e+00^3.000e-16
VKPIKNGTVIDHITANRSLNVLNILGLPDGRSKVTVAMNMDSSQLGSKDIVKIENRELKPSEVDQIALIAPRATINIVR
DYKIVEKAKVRL

   -->

4.0 OUTPUT FILE FORMAT

   The format of the hits file is described in ROCPLOT documentation. See
   also Figure 3.

  Output files for usage example

  File: rocon.hits

> RELATED 8 ; ROC 2
CROSS        Q8Z130    8     99
UNKNOWN      Q9YBD5    181   235
FALSE        P96175    8     99
TRUE         O26938    11    101
FALSE        Q7MX57    8     99
CROSS        Q8TVB1    7     98
TRUE         Q9YBD5    11    105
TRUE         Q9YBD5    95    135
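
   The example output can be read back with a few lines of Python. This
   is a minimal sketch, assuming every data line holds four
   whitespace-separated columns (classification label, accession, start,
   end) as in the example above. The first line is kept as an opaque
   string because its fields are defined in the ROCPLOT documentation,
   not here. The name parse_hits_file is hypothetical.

```python
from collections import Counter

EXAMPLE = """\
> RELATED 8 ; ROC 2
CROSS        Q8Z130    8     99
UNKNOWN      Q9YBD5    181   235
FALSE        P96175    8     99
TRUE         O26938    11    101
FALSE        Q7MX57    8     99
CROSS        Q8TVB1    7     98
TRUE         Q9YBD5    11    105
TRUE         Q9YBD5    95    135
"""


def parse_hits_file(text):
    """Return (header_line, [(label, accession, start, end), ...])."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0]  # kept verbatim; not interpreted in this sketch
    hits = []
    for ln in lines[1:]:
        # Assumed layout: four whitespace-separated columns per hit.
        label, accession, start, end = ln.split()
        hits.append((label, accession, int(start), int(end)))
    return header, hits


header, hits = parse_hits_file(EXAMPLE)
# Tally how many hits received each classification label.
counts = Counter(label for label, _, _, _ in hits)
```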

5.0 DATA FILES

   None.

6.0 USAGE

   Standard (Mandatory) qualifiers:
  [-hitsinfile]        infile     This option specifies the location of the
                                  DHF file (domain hits file) (input). A
                                  'domain hits file' contains database hits
                                  (sequences) with domain classification
                                  information, in the DHF format (FASTA or
                                  EMBL-like). The hits are relatives of a SCOP
                                  or CATH family and are found from a search
                                  of a sequence database. Files of hits
                                  retrieved by PSIBLAST are generated by using
                                  SEQSEARCH, hits retrieved by a sparse
                                  protein signature by using SIGSCAN, and hits
                                  retrieved by various types of HMM and
                                  profile by using LIBSCAN.
  [-validinfile]       infile     This option specifies the name of the domain
                                  families file (input). A 'domain families
                                  file' contains sequence relatives (hits) for
                                  each of a number of different SCOP or CATH
                                  families found from searching a sequence
                                  database, e.g. by using SEQSEARCH
                                  (psiblast). The file contains the collated
                                  search results for the individual families;
                                  only those hits of unambiguous family
                                  assignment are included. Hits of ambiguous
                                  family assignment are assigned as relatives
                                  to a SCOP or CATH superfamily or fold
                                  instead and are collated into a 'domain
                                  ambiguities file'. The domain families and
                                  ambiguities files are generated by using
                                  SEQSORT and use the same format as a DHF
                                  file (domain hits file).
   -thresh             integer    This option specifies the overlap threshold
                                  for hits. This is the minimum length
                                  (residues) of overlap required for two hits
                                  with the same accession number to be counted
                                  as the same hit. The accession number of the
                                  hit, and the start and end point
                                  respectively of the hit relative to full
                                  length sequence are provided in the lists of
                                  hits in the DHF input file. The overlap is
                                  determined from the start and end points of
                                  the hit. For example two hits with the start
                                  and end points of 1-100 and 91-190
                                  respectively are considered to be the same
                                  hit if they have the same accession numbers
                                  and the overlap threshold is 10 or less.
   -mode               menu       This option specifies the classification
                                  scheme to use. See ROCON on-line
                                  documentation for more information.
  [-hitsoutfile]       outfile    This option specifies the name of the hits
                                  file (output). A 'hits file' contains a list
                                  of hits (e.g. from a prediction method)
                                  that are classified and rank-ordered on the
                                  basis of score, p-value, E-value etc. The
                                  files generated by using SIGSCAN and LIBSCAN
                                  will contain the results of a search of a
                                  discriminating element (e.g. hidden Markov
                                  model, profile or signature) against a
                                  sequence database. The ROCPLOT application
                                  is run on the files to perform Receiver
                                  Operator Characteristic (ROC) analysis on
                                  the hits.

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-hitsoutfile" associated qualifiers
   -odirectory3        string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write standard output
   -filter             boolean    Read standard input, write standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report deaths


  6.1 COMMAND LINE ARGUMENTS

   Standard (Mandatory) qualifiers

   [-hitsinfile] (Parameter 1)
      Description:    This option specifies the location of the DHF file
                      (domain hits file) (input). A 'domain hits file'
                      contains database hits (sequences) with domain
                      classification information, in the DHF format (FASTA
                      or EMBL-like). The hits are relatives of a SCOP or
                      CATH family and are found from a search of a
                      sequence database. Files of hits retrieved by
                      PSIBLAST are generated by using SEQSEARCH, hits
                      retrieved by a sparse protein signature by using
                      SIGSCAN, and hits retrieved by various types of HMM
                      and profile by using LIBSCAN.
      Allowed values: Input file
      Default:        Required

   [-validinfile] (Parameter 2)
      Description:    This option specifies the name of the domain
                      families file (input). A 'domain families file'
                      contains sequence relatives (hits) for each of a
                      number of different SCOP or CATH families found from
                      searching a sequence database, e.g. by using
                      SEQSEARCH (psiblast). The file contains the collated
                      search results for the individual families; only
                      those hits of unambiguous family assignment are
                      included. Hits of ambiguous family assignment are
                      assigned as relatives to a SCOP or CATH superfamily
                      or fold instead and are collated into a 'domain
                      ambiguities file'. The domain families and
                      ambiguities files are generated by using SEQSORT and
                      use the same format as a DHF file (domain hits
                      file).
      Allowed values: Input file
      Default:        Required

   -thresh
      Description:    This option specifies the overlap threshold for
                      hits. This is the minimum length (residues) of
                      overlap required for two hits with the same
                      accession number to be counted as the same hit. The
                      accession number of the hit, and the start and end
                      point respectively of the hit relative to full
                      length sequence are provided in the lists of hits in
                      the DHF input file. The overlap is determined from
                      the start and end points of the hit. For example two
                      hits with the start and end points of 1-100 and
                      91-190 respectively are considered to be the same
                      hit if they have the same accession numbers and the
                      overlap threshold is 10 or less.
      Allowed values: Any integer value
      Default:        10

   -mode
      Description:    This option specifies the classification scheme to
                      use. See ROCON on-line documentation for more
                      information.
      Allowed values: 1 (Family classification scheme)
                      2 (Not yet available)
      Default:        1

   [-hitsoutfile] (Parameter 3)
      Description:    This option specifies the name of the hits file
                      (output). A 'hits file' contains a list of hits
                      (e.g. from a prediction method) that are classified
                      and rank-ordered on the basis of score, p-value,
                      E-value etc. The files generated by using SIGSCAN
                      and LIBSCAN will contain the results of a search of
                      a discriminating element (e.g. hidden Markov model,
                      profile or signature) against a sequence database.
                      The ROCPLOT application is run on the files to
                      perform Receiver Operator Characteristic (ROC)
                      analysis on the hits.
      Allowed values: Output file
      Default:        hits.dhf

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
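
   The -thresh rule can be illustrated with a short sketch. It assumes
   that start and end positions are inclusive residue numbers, which is
   consistent with the 1-100 / 91-190 example giving an overlap of 10.
   The function names are hypothetical and this is not ROCON's own code.

```python
def overlap_length(start_a, end_a, start_b, end_b):
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def same_hit(acc_a, start_a, end_a, acc_b, start_b, end_b, thresh=10):
    """True if both hits share an accession and overlap by >= thresh."""
    if acc_a != acc_b:
        return False
    return overlap_length(start_a, end_a, start_b, end_b) >= thresh


# The example from the -thresh description: 1-100 and 91-190 share
# residues 91..100, i.e. 10 residues.
shared = overlap_length(1, 100, 91, 190)
merged_at_10 = same_hit("Q9YBD5", 1, 100, "Q9YBD5", 91, 190, thresh=10)
merged_at_11 = same_hit("Q9YBD5", 1, 100, "Q9YBD5", 91, 190, thresh=11)
```

   With the default threshold of 10 the two ranges count as the same hit.
   Raising the threshold to 11 keeps them separate.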

  6.2 EXAMPLE SESSION

   An example of interactive use of ROCON is shown below.


% rocon 
Generates a hits file from comparing two DHF files.
Name of DHF file (domain hits file) (input).: rocon/rocon.dhf
Name of domain families file (input).: rocon.valid
Overlap threshold for hits. [10]: 10
Classification scheme to use
         1 : Family classification scheme
         2 : (Not yet available)
Select number. [1]: 1
Name of hits files (output). [hits.dhf]: rocon.hits
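
   The same run can presumably be made without prompts by passing every
   value as a qualifier. The sketch below only builds and prints the
   command line; it does not run rocon. It uses the qualifiers listed in
   section 6.0 together with the general -auto qualifier, and the helper
   name rocon_command is hypothetical.

```python
import shlex


def rocon_command(hits_in, valid_in, hits_out, thresh=10, mode=1):
    """Build an argument list for a non-interactive rocon run."""
    return [
        "rocon",
        "-hitsinfile", hits_in,
        "-validinfile", valid_in,
        "-thresh", str(thresh),
        "-mode", str(mode),
        "-hitsoutfile", hits_out,
        "-auto",  # general qualifier: turn off prompts
    ]


argv = rocon_command("rocon/rocon.dhf", "rocon.valid", "rocon.hits")
command_line = shlex.join(argv)
print(command_line)
```

   Building the command as a list rather than a single string means it
   can be handed to subprocess.run without shell quoting problems.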


7.0 KNOWN BUGS & WARNINGS

   None.

8.0 NOTES

   None.

  8.1 GLOSSARY OF FILE TYPES

   FILE TYPE:    Domain hits file
   FORMAT:       DHF format (FASTA-like).
   DESCRIPTION:  Database hits (sequences) with domain classification
                 information. The hits are relatives to a SCOP or CATH
                 family (or other node in the structural hierarchies) and
                 are found from a search of a sequence database.
   CREATED BY:   SEQSEARCH (hits retrieved by PSIBLAST)
   SEE ALSO:     N.A.

   FILE TYPE:    Hits file
   FORMAT:       Text file of classified hits
   DESCRIPTION:  A list of hits (e.g. from a prediction method) that are
                 classified and rank-ordered on the basis of score,
                 p-value, E-value etc.
   CREATED BY:   SIGSCAN and LIBSCAN (hits from searches of a
                 discriminating element (hidden Markov model, profile or
                 signature) against a sequence database).
   SEE ALSO:     ROCPLOT is run on the files to perform Receiver Operator
                 Characteristic (ROC) analysis on the hits.

   FILE TYPE:    Domain families & ambiguities file
   FORMAT:       DHF format (FASTA-like).
   DESCRIPTION:  Contains sequence relatives (hits) for each of a number
                 of different SCOP or CATH families found from PSIBLAST
                 searches of a sequence database. The file contains the
                 collated search results for the individual families;
                 only those hits of unambiguous family assignment are
                 included. Hits of ambiguous family assignment are
                 assigned as relatives to a SCOP or CATH superfamily or
                 fold instead and are collated into a 'domain ambiguities
                 file'.
   CREATED BY:   SEQSORT
   SEE ALSO:     N.A.


9.0 DESCRIPTION

   Discriminating elements such as hidden Markov models (HMM), sparse
   protein signatures and profiles can be generated for a set of proteins
   with related sequence, structural or functional properties. These
   discriminators are characteristic of the property considered and can
   be used diagnostically, for instance, by screening a database of
   uncharacterised sequences.
   Such screens can be performed by using the LIBSCAN and SIGSCAN
   applications, which generate a DHF file (domain hits file) of database
   hits (sequences). The hits are relatives to a SCOP or CATH family (or
   other node in the structural hierarchies) and are found from a search
   of a sequence database. The DHF file includes domain classification
   information of the family in question.
   When assessing the performance of a predictive method, a "gold
   standard" of truth is required. This is a set of examples that are
   known to be related to the discriminating element, and, ideally, a
   further set that is known to be definitely not related. For example,
   assessing how well a protein family HMM detects true members of that
   family requires, at least, a list of the known family members. If a
   method works well for the "gold standard", we can infer that it will
   work well generally. Increasingly, use is made of databases such as
   SCOP, in
   which sequence, structural and functional relationships are
   classified.
   Such a "gold standard" can be generated for SCOP families by using the
   DOMAINATRIX package and particularly the SEQSORT application. SEQSORT
   generates a "domain families file" containing sequence relatives
   (hits) for each of a number of different SCOP or CATH families found
   from PSIBLAST searches of a sequence database. The file contains the
   collated search results for the individual families; only those hits of
   unambiguous family assignment are included.
   A powerful way to assess diagnostic performance is to use Receiver
   Operator Characteristic (ROC) curves, which display graphically the
   sensitivity and specificity of a method. ROC analysis is implemented
   in the ROCPLOT application. ROCPLOT requires a "hits file" containing
   a list of classified hits that are rank-ordered on the basis of score.
   The ROCON application was developed to take as input a DHF file of
   hits, and a domain families file of validation sequences (the gold
   standard) and generate a hits file for use with ROCPLOT.

10.0 ALGORITHM

   The domain families file uses the same format as a DHF file (domain
   hits file). Thus ROCON takes two DHF files as input: one containing
   hits (sequences of unknown classification) to be classified, and the
   domain families file containing sequences (of known classification)
   that are used to make the classification. A DHF file includes 6
   tokens (shown in capitals in the example below) that collectively
   describe the classification of a sequence: domain class (SCOP and
   CATH domains), domain architecture (CATH only), domain topology (CATH
   only), domain fold (SCOP domains only), domain superfamily and domain
   family (SCOP only).

> Q9WVI4^.^513^667^SCOP^.^55074^CLASS^ARCHITECTURE^TOPOLOGY^FOLD^SUPERFAMILY^FA
MILY^PSIBLAST^113.00^0.000e+00^2.000e-25
RKFDDVTMLFSDIVGFTAICAQCTPMQVISMLNELYTRFDHQCGFLDIYKVETIGDAYCVASGLHRKSLCHAKPIALMA
LKMMELSEEVLTPDGRPIQMRIGIHSG
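
   As a rough illustration of where the six classification tokens sit in
   a DHF header line, the Python sketch below splits the example header
   on the '^' delimiter. The field positions are read off the example
   record only. The field names (accession, start, end, score and so on)
   and the helper parse_dhf_header are assumptions made for illustration
   and are not taken from the ROCON source.

```python
from typing import NamedTuple


class DhfHeader(NamedTuple):
    """Selected fields of a DHF header line (field names are illustrative)."""
    accession: str
    start: int
    end: int
    klass: str
    architecture: str
    topology: str
    fold: str
    superfamily: str
    family: str
    score: float


def parse_dhf_header(line: str) -> DhfHeader:
    """Split a '>'-prefixed DHF header on '^' and pick fields by position.

    Positions are inferred from the example record in this document,
    which has 17 '^'-separated fields; this layout is an assumption.
    """
    fields = [f.strip() for f in line.lstrip(">").strip().split("^")]
    if len(fields) != 17:
        raise ValueError(f"expected 17 fields, found {len(fields)}")
    return DhfHeader(
        accession=fields[0],
        start=int(fields[2]),
        end=int(fields[3]),
        klass=fields[7],
        architecture=fields[8],
        topology=fields[9],
        fold=fields[10],
        superfamily=fields[11],
        family=fields[12],
        score=float(fields[14]),
    )


# The header is wrapped over two lines in the listing; join it before parsing.
header = ("> Q9WVI4^.^513^667^SCOP^.^55074^CLASS^ARCHITECTURE^TOPOLOGY^FOLD"
          "^SUPERFAMILY^FAMILY^PSIBLAST^113.00^0.000e+00^2.000e-25")
record = parse_dhf_header(header)
print(record.accession, record.start, record.end, record.family, record.score)
```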

   The values of these tokens (CLASS through to FAMILY but, in the
   current implementation of the classification scheme, excluding
   ARCHITECTURE and TOPOLOGY) determine the classification given to
   each hit in the ROCON output file, as follows.
   If a hit does not overlap significantly with any validation sequence
   then the hit is classified as UNKNOWN. A hit and a validation sequence
   are defined as overlapping if they have identical accession numbers
   and share a common region of at least a user-defined number of
   residues. The overlap is determined from the start and end points
   (relative to the full-length sequences) of the hit and validation
   sequences. For example, a hit and a validation sequence with the same
   accession number and with start and end points of 1-100 and 91-190
   respectively share 10 residues (91-100), so they are defined as
   overlapping if the overlap threshold is 10 or less.
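
   The overlap test can be sketched in Python as follows. This is a
   minimal illustration, not the ROCON implementation: the function names
   overlap_length and overlaps are hypothetical, and start and end points
   are assumed to be inclusive residue positions, which is consistent
   with the 1-100 / 91-190 example giving an overlap of 10.

```python
def overlap_length(start_a: int, end_a: int, start_b: int, end_b: int) -> int:
    """Number of residues shared by two inclusive ranges (0 if disjoint)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def overlaps(acc_a: str, start_a: int, end_a: int,
             acc_b: str, start_b: int, end_b: int,
             threshold: int = 10) -> bool:
    """True if both sequences have the same accession number and share
    at least `threshold` residues (assumed reading of the text)."""
    if acc_a != acc_b:
        return False
    return overlap_length(start_a, end_a, start_b, end_b) >= threshold


# Worked example from the text: 1-100 against 91-190, same accession.
print(overlap_length(1, 100, 91, 190))
print(overlaps("Q9YBD5", 1, 100, "Q9YBD5", 91, 190, threshold=10))
print(overlaps("Q9YBD5", 1, 100, "Q9YBD5", 91, 190, threshold=11))
```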
   If a hit does overlap significantly with a validation sequence it is
   defined as one of TRUE, CROSS or FALSE depending on the value of the
   tokens (CLASS through to FAMILY) as per the table below.

   CLASS         FOLD          SUPERFAMILY   FAMILY        CLASSIFICATION
   Not available Not available Not available Not available UNKNOWN
   Different     Different     Different     Different     FALSE
   Same          Different     Different     Different     FALSE
   Same          Same          Different     Different     CROSS
   Same          Same          Same          Different     CROSS
   Same          Same          Same          Same          TRUE

   To put this in the context of a real example, imagine an input DHF
   file containing hits derived from searching a sequence database with a
   novel type of profile specific to a SCOP family. In this case, the
   full SCOP classification (Class, Fold etc.) of each hit is putatively
   assigned. To validate the novel method, a validation file of manually
   curated sequences of known classification is used. A TRUE hit is one
   that overlaps with a validation sequence belonging to the same Family
   (and by implication Superfamily, Fold and Class) as the hit. A CROSS
   hit overlaps with a validation sequence of the same Fold as the hit
   but a different Family, and a FALSE hit overlaps with a validation
   sequence of a different Fold from the hit.
   The hits are rank-ordered on the basis of score before they are
   written to the hits (output) file.
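
   The final ordering step might look like the Python sketch below. It
   assumes that a higher score means a better hit, so that hits are
   written in descending order of score; the text does not state the
   direction. The function name rank_hits and the sample tuples are
   hypothetical values chosen for illustration, not ROCON output.

```python
from typing import List, Tuple

# (accession, score, classification); illustrative values only.
Hit = Tuple[str, float, str]


def rank_hits(hits: List[Hit]) -> List[Hit]:
    """Return hits sorted by score, best first (assumed: higher is better).

    sorted() is stable, so hits with equal scores keep their input order.
    """
    return sorted(hits, key=lambda h: h[1], reverse=True)


hits = [
    ("HIT_A", 11.5, "CROSS"),
    ("HIT_B", 61.5, "TRUE"),
    ("HIT_C", 30.0, "UNKNOWN"),
]
for accession, score, label in rank_hits(hits):
    print(f"{accession}\t{score:.2f}\t{label}")
```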

11.0 RELATED APPLICATIONS

See also

   Program name Description
   contactcount Count specific versus non-specific contacts
   contacts     Generate intra-chain CON files from CCF files
   domainalign  Generate alignments (DAF file) for nodes in a DCF file
   domainrep    Reorder DCF file to identify representative structures
   domainreso   Remove low resolution domains from a DCF file
   interface    Generate inter-chain CON files from CCF files
   libgen       Generate discriminating elements from alignments
   matgen3d     Generate a 3D-1D scoring matrix from CCF files
   psiphi       Phi and psi torsion angles from protein coordinates
   rocplot      Performs ROC analysis on hits files
   scorecmapdir Contact scores for cleaned protein chain contact files
   seqalign     Extend alignments (DAF file) with sequences (DHF file)
   seqfraggle   Removes fragment sequences from DHF files
   seqsearch    Generate PSI-BLAST hits (DHF file) from a DAF file
   seqsort      Remove ambiguous classified sequences from DHF files
   seqwords     Generates DHF files from keyword search of UniProt
   siggen       Generates a sparse protein signature from an alignment
   siggenlig    Generate ligand-binding signatures from a CON file
   sigscan      Generate hits (DHF file) from a signature search
   sigscanlig   Search ligand-signature library & write hits (LHF file)

12.0 DIAGNOSTIC ERROR MESSAGES

   None.

13.0 AUTHORS

   Jon Ison (jison@rfcgr.mrc.ac.uk)
   MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

14.0 REFERENCES

   Please cite the authors and EMBOSS.
   Rice P, Longden I and Bleasby A (2000) "EMBOSS - The European
   Molecular Biology Open Software Suite" Trends in Genetics,
   16(6):276-277.
   
   See also http://emboss.sourceforge.net/

  14.1 Other useful references
