
                                megamerger 



Function

   Merge two large overlapping nucleic acid sequences

Description

   megamerger takes two overlapping sequences and merges them into one
   sequence. It could thus be regarded as the opposite of what splitter
   does.

   The sequences can be very long. The program first finds all matches
   between sequence words of a fixed size (20 by default). It then reduces
   these to the minimum set of non-overlapping matches: the matches are
   sorted by size, largest first, and for each match in turn any smaller
   match that overlaps it is removed. The result is the set of the longest
   ungapped alignments between the two sequences that do not overlap with
   each other. If the two sequences are identical in their region of
   overlap then there will be one region of match and no mismatches.
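
   The matching and reduction steps described above can be sketched as
   follows. This is a minimal illustration in Python, not the program's
   own code: the function names are hypothetical, positions are 0-based,
   and a smaller match that overlaps a larger one is simply discarded,
   which is an assumption about how overlaps are resolved.

```python
from collections import defaultdict


def word_matches(a, b, wordsize=20):
    """Return maximal ungapped matches as (length, start_a, start_b)."""
    # Index every word of sequence a by its text.
    index = defaultdict(list)
    for i in range(len(a) - wordsize + 1):
        index[a[i:i + wordsize]].append(i)

    found = set()
    for j in range(len(b) - wordsize + 1):
        for i in index.get(b[j:j + wordsize], ()):
            # Extend the word hit in both directions along its diagonal.
            si, sj = i, j
            while si > 0 and sj > 0 and a[si - 1] == b[sj - 1]:
                si -= 1
                sj -= 1
            ei, ej = i + wordsize, j + wordsize
            while ei < len(a) and ej < len(b) and a[ei] == b[ej]:
                ei += 1
                ej += 1
            found.add((ei - si, si, sj))
    return sorted(found, reverse=True)


def overlaps(m, n):
    """True if two matches share positions in either sequence."""
    (len_m, a_m, b_m), (len_n, a_n, b_n) = m, n
    in_a = a_m < a_n + len_n and a_n < a_m + len_m
    in_b = b_m < b_n + len_n and b_n < b_m + len_m
    return in_a or in_b


def reduce_matches(matches):
    """Keep the longest matches, dropping smaller ones that overlap them."""
    kept = []
    for m in sorted(matches, reverse=True):  # largest first
        if not any(overlaps(m, k) for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m[1])  # order along sequence a
```

   With identical overlapping ends, such as "TTTTACGTGCAAGG" and
   "ACGTGCAAGGCCCC" at a word size of 5, the sketch returns a single
   match covering the whole overlap.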

   It should be possible to merge sequences that are megabytes long.
   Compare this with the program merger, which does a more accurate
   alignment of more divergent sequences using the Needleman and Wunsch
   algorithm but which uses much more memory.

   The sequences should ideally be identical in their region of overlap.
   If there are any mismatches between the two sequences then megamerger
   will still attempt to create a merged sequence, but you should check
   that the result is what you require.

   A report of the actions of megamerger is written out. Any action that
   requires a choice between regions of the two sequences where they
   have a mismatch is marked with the word WARNING!. The sequence in
   these regions is written out in uppercase. All other regions of the
   output sequence are written in lowercase.

   Where there is a mismatch, the sequence that is chosen to supply the
   region of the mismatch in the final merged sequence is the sequence
   whose mismatch region is furthest from the start or end of that
   sequence.
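
   The selection rule can be illustrated with a short Python sketch. The
   function names are hypothetical, positions are 1-based and inclusive
   as in the report, and measuring the distance to the nearer end as
   well as preferring the first sequence on a tie are assumptions; the
   program's exact measure is not documented here.

```python
def distance_from_ends(start, end, length):
    """Distance of a 1-based inclusive region from the nearer sequence end."""
    return min(start - 1, length - end)


def choose_source(region_a, len_a, region_b, len_b, prefer=False):
    """Return "a" or "b": the sequence that supplies the mismatch region."""
    if prefer:
        # Mirrors the -prefer qualifier: always use the first sequence.
        return "a"
    dist_a = distance_from_ends(region_a[0], region_a[1], len_a)
    dist_b = distance_from_ends(region_b[0], region_b[1], len_b)
    # Assumed tie-break: keep the first sequence when distances are equal.
    return "a" if dist_a >= dist_b else "b"
```

   For the first mismatch in the usage example below (AP000504 847-847,
   AF129756 6882-6882), the distances are 846 and 6881, so the sketch
   picks the second sequence, in agreement with the report.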

Usage

   Here is a sample session with megamerger


% megamerger tembl:ap000504 tembl:af129756 
Merge two large overlapping nucleic acid sequences
Word size [20]: 
Output sequence [ap000504.merged]: 
Output file [ap000504.megamerger]: report


Command line arguments

   Standard (Mandatory) qualifiers:
  [-asequence]         sequence   Sequence USA
  [-bsequence]         sequence   Sequence USA
   -wordsize           integer    Word size
  [-outseq]            seqout     Output sequence USA
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers:
   -prefer             boolean    When a mismatch between the two sequences is
                                  discovered, one or other of the two
                                  sequences must be used to create the merged
                                  sequence over this mismatch region. The
                                  default action is to create the merged
                                  sequence using the sequence where the
                                  mismatch is closest to that sequence's
                                  centre. If this option is used, then the
                                  first sequence (seqa) will always be used in
                                  preference to the other sequence when there
                                  is a mismatch.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1             integer    Start of the sequence to be used
   -send1               integer    End of the sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2             integer    Start of the sequence to be used
   -send2               integer    End of the sequence to be used
   -sreverse2           boolean    Reverse (if DNA)
   -sask2               boolean    Ask for begin/end/reverse
   -snucleotide2        boolean    Sequence is nucleotide
   -sprotein2           boolean    Sequence is protein
   -slower2             boolean    Make lower case
   -supper2             boolean    Make upper case
   -sformat2            string     Input sequence format
   -sdbname2            string     Database name
   -sid2                string     Entryname
   -ufo2                string     UFO features
   -fformat2            string     Features format
   -fopenfile2          string     Features file name

   "-outseq" associated qualifiers
   -osformat3           string     Output seq format
   -osextension3        string     File name extension
   -osname3             string     Base file name
   -osdirectory3        string     Output directory
   -osdbname3           string     Database name to add
   -ossingle3           boolean    Separate file for each entry
   -oufo3               string     UFO features
   -offormat3           string     Features format
   -ofname3             string     Features file name
   -ofdirectory3        string     Output directory

   "-outfile" associated qualifiers
   -odirectory4         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths


   Standard (Mandatory) qualifiers
   Qualifier                   Description          Allowed values        Default
   [-asequence] (Parameter 1)  Sequence USA         Readable sequence     Required
   [-bsequence] (Parameter 2)  Sequence USA         Readable sequence     Required
   -wordsize                   Word size            Integer 2 or more     20
   [-outseq] (Parameter 3)     Output sequence USA  Writeable sequence    <sequence>.format
   [-outfile] (Parameter 4)    Output file name     Output file           <sequence>.megamerger

   Additional (Optional) qualifiers
   Qualifier                   Description          Allowed values        Default
   -prefer                     (see text)           Boolean value Yes/No  No

      -prefer: When a mismatch between the two sequences is discovered,
      one or other of the two sequences must be used to create the merged
      sequence over this mismatch region. The default action is to create
      the merged sequence using the sequence where the mismatch is closest
      to that sequence's centre. If this option is used, then the first
      sequence (seqa) will always be used in preference to the other
      sequence when there is a mismatch.

   Advanced (Unprompted) qualifiers
   (none)
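
   For use from a script, the qualifiers listed above can be supplied on
   the command line so that no prompts are needed. The following Python
   sketch only builds the argument list; the helper name is hypothetical,
   and it assumes that megamerger is installed and on the PATH when the
   command is actually run.

```python
def megamerger_command(aseq, bseq, outseq, outfile, wordsize=20, prefer=False):
    """Build an argument list for a non-interactive megamerger run."""
    cmd = [
        "megamerger",
        "-asequence", aseq,
        "-bsequence", bseq,
        "-wordsize", str(wordsize),
        "-outseq", outseq,
        "-outfile", outfile,
        "-auto",  # turn off prompts
    ]
    if prefer:
        # Always use the first sequence over mismatch regions.
        cmd.append("-prefer")
    return cmd


# Example (requires an EMBOSS installation):
#   import subprocess
#   subprocess.run(megamerger_command("tembl:ap000504", "tembl:af129756",
#                                     "ap000504.merged", "report"), check=True)
```

   Passing the list to subprocess.run, as in the comment, avoids shell
   quoting problems with USAs that contain colons or other punctuation.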

Input file format

   megamerger reads any two Sequence USAs.

  Input files for usage example

   'tembl:ap000504' is a sequence entry in the example nucleic acid
   database 'tembl'

  Database entry: tembl:ap000504

ID   AP000504   standard; DNA; HUM; 100000 BP.
XX
AC   AP000504; BA000025;
XX
SV   AP000504.1
XX
DT   28-SEP-1999 (Rel. 61, Created)
DT   22-AUG-2001 (Rel. 68, Last updated, Version 3)
XX
DE   Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section
DE   3/20.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-100000
RA   Hirakawa M., Yamaguchi H., Imai K., Shimada J.;
RT   ;
RL   Submitted (21-SEP-1999) to the EMBL/GenBank/DDBJ databases.
RL   Mika Hirakawa, Japan Science and Technology Corporation (JST), Advanced
RL   Databases Department; 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-0081, Japan
RL   (E-mail:mika@tokyo.jst.go.jp, URL:http://www-alis.tokyo.jst.go.jp/,
RL   Tel:81-3-5214-8491, Fax:81-3-5214-8470)
XX
RN   [2]
RA   Shiina S., Tamiya G., Oka A., Inoko H.;
RT   "Homo sapiens 2,229,817bp genomic DNA of 6p21.3 HLA class I region";
RL   Unpublished.
XX
DR   SWISS-PROT; O00299; CLI1_HUMAN.
DR   SWISS-PROT; O43196; MSH5_HUMAN.
DR   SWISS-PROT; O95445; APOM_HUMAN.
DR   SWISS-PROT; O95865; DDH2_HUMAN.
DR   SWISS-PROT; O95867; NG24_HUMAN.
DR   SWISS-PROT; P13862; KC2B_HUMAN.
XX
CC   This sequence is conducted by Tokai University as a JST sequencing
CC   Team.
CC   Principal Investigator: Hidetoshi Inoko Ph.D
CC   Phone:+81-463-93-1121, Fax:+81-463-94-8884,
CC   The sequence is submitted by Human Genome Sequencing in ALIS
CC   project of JST
CC   Japan Science and Technology Corporation (JST)
CC   5-3, Yonbancyo, Chiyoda-ku, Tokyo, 102-0081 Japan
CC   For further infomation about this sequences, please visit our
CC   sequence archive Web site (http://www-alis.tokyo.jst.go.jp/HGS/top.


  [Part of this file has been deleted for brevity]

     gggtggatca tgaggtcaag agatcgagac tatcctggct aacatgatga aaccccgtct     97080
     ctactaaaaa tacaaaaaat tagctgggca tggtggcggg cacctgtagt cccagctact     97140
     cgggaggctg agtcaggaga atggtgtgaa cccaggagac ggagcttgca gtgagctgag     97200
     gtcgcaccac tgcactccag cctgggtgat agagcgagac tctgtctcaa aaaaaaaaaa     97260
     aaaaaaaaaa aaaacaaaaa ttagccgggt gtggtggcag gcaacttaat cccagctact     97320
     tgggaggcag aggcaggaga atcgtttgaa cctgggaggc ggaggttgaa gagaatagaa     97380
     gctctgctgg tccagagaag gattgggcca gggctctggg agaccaggga gaaagagggc     97440
     acatgtggtc cctgttgact gtgagggtgg gaatctgagg aaggctttgg ctcattgccc     97500
     cttgggtttg tccacagcca tccttcccct gcggagtatg tcgaggtgct ccaggagcta     97560
     cagcggctgg agagtcgcct ccagcccttc ttgcagcgct actacgaggt tctgggtgct     97620
     gctgccacca cggactacaa taacaatgtg agccctttga tggccctgcc ctttctcctc     97680
     agccccagta ctcccaaaac agaacaggct gaaatacaga taactctttc cctccctgga     97740
     aaaacattgc aacagggcca ggtgcagtgg ctcacgcctg taatcccagc actttgggag     97800
     gccaaggtgg gcggatcatc tgagatcggg agtttgagac cagcctggcc aacatggtgc     97860
     aaccccatct ctactgaaaa tataaacatt agctggatgt agtggtgcac acctgtaatc     97920
     ccagctactc aggaggctga ggcaggagaa tcgctagaac tcgggaggag ggggttgcag     97980
     tgagccgaga ttgcactact gcactctagc ctgggtgaca gagcgagact gtctcaaaaa     98040
     acaaaacaaa acaaaaaaac acacattgca acaaaacaat ttctctctaa acctgtaagt     98100
     gattttgtcc tcccttacag agaaggtgat aatctttgct gtaagcactg tcctcgtatc     98160
     gtaccccttg tgcccctgaa tgaatttaga aaatgtaaag tacaggagat cagtatatga     98220
     tgacttactg attcatagta gtgttttaat aggatgttcc ttatgtgaat aagatataat     98280
     ttatttgcaa agatttggtc tacatgtaaa cttccaagga tataactgaa agttttggag     98340
     gacatggtat tctcagtagg cattattgct tttattagtg agatggactc cagcttgata     98400
     ttttctgcct ttttgtgttt ggctggttgt gcgcagcacg agggccggga ggaggatcag     98460
     cggttgatca acttggtagg ggagagcctg cgactgctgg gcaacacctt tgttgcactg     98520
     tctgacctgc gctgcaatct ggcctgcacg cccccacgac acctgcatgt ggtccggcct     98580
     atgtctcact acaccacccc catggtgctc cagcaggcag ccattcccat acaggtgggt     98640
     tagggggagt ctggcctgag ggagagtgag gggtgttgat agagtgaccc agggtagcta     98700
     ctgggcctga aggaggttag gaaaggagga gactggaaac atggtgatga aggctggaga     98760
     tactttagag gtttatcatg aggttttctt ggttaggctc ttgtattttt ctcacatctg     98820
     cctgtccatc tgtctttttc agatcaatgt gggaaccact gtgaccatga caggaaatgg     98880
     gactcggccc cccccaactc ccaatgcaga ggcacctccc cctggtcctg ggcaggcctc     98940
     atccgtggct ccgtcttcta ccaatgtcga gtcctcagct gagggggctc ccccgccagg     99000
     tccagctccc ccgccagcca ccagccaccc gagggtcatc cggatttccc accagagtgt     99060
     ggaacccgtg gtcatgatgc acatgaacat tcaaggtgag aatagttgct ggcgagaaga     99120
     gcaggatcag catgatgagg gaggttcatg ctgaggtgtg agggaacagg gtggggaagg     99180
     gagaggcaca tgctggtggt ggtagcctgg ggaccagagc agaagcttaa gtagacagat     99240
     gtggggggtg tgggggttgg tttgtctttg gaggtgtgtt tgtgtggtga agggagtacc     99300
     tctccctgtt tagatggagg gaaaggcagg ctttctgatt gggggattat gggcctgaag     99360
     tatgcctgat ctcagaagga tatagttagg ccttggccct acctacctca gggccactgt     99420
     ctctgtctcc ctgcccagat tctggcacac agcctggtgg tgttccgagt gctcccactg     99480
     gccccctggg accccctggt catggccaaa ccctgggtaa gagtgagggc atcagggcag     99540
     gctgagctct gggtagagaa agggaagggc tgagtgggtg ggttgaaggg gtccaggttc     99600
     aaggttacat cagacccgcc ccccaggctc caccctcatc cagctgccct ccctgccccc     99660
     tgagttcatg cacgccgtcg cccaccagat cactcatcag gccatggtgg cagctgttgc     99720
     ctccgcggcc gcaggtaatg acctggaagg ggaggcttgg gaggtagggc acagtccatg     99780
     gtggcagctg gctggcaagg gcctggccct cagccctctt cggtctgtct cttctgccac     99840
     ccacaggaca gcaggtgcca ggcttcccaa cagctccaac ccgggtggtg attgcccggc     99900
     ccactcctcc acaggctcgg ccttcccatc ctggagggcc cccagtctct gggacactgg     99960
     tgagcaaggg tcggggagtt ctagtgcgta acagtctagg                          100000
//

  Database entry: tembl:af129756

ID   AF129756   standard; DNA; HUM; 184666 BP.
XX
AC   AF129756;
XX
SV   AF129756.1
XX
DT   12-MAR-1999 (Rel. 59, Created)
DT   29-OCT-1999 (Rel. 61, Last updated, Version 2)
XX
DE   Homo sapiens MSH55 gene, partial cds; and CLIC1, DDAH, G6b, G6c, G5b, G6d,
DE   G6e, G6f, BAT5, G5b, CSK2B, BAT4, G4, Apo M, BAT3, BAT2, AIF-1, 1C7, LST-1,
DE   LTB, TNF, and LTA genes, complete cds.
XX
KW   .
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Primates; Catarrhini; Hominidae; Homo.
XX
RN   [1]
RP   1-184666
RA   Rowen L., Madan A., Qin S., Shaffer T., James R., Ratcliffe A., Abbasi N.,
RA   Dickhoff R., Loretz C., Madan A., Dors M., Young J., Lasky S., Hood L.;
RT   "Sequence of the human major histocompatibility complex class III region";
RL   Unpublished.
XX
RN   [2]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (22-FEB-1999) to the EMBL/GenBank/DDBJ databases.
RL   Department of Molecular Biotechnology, Box 357730 University of Washington,
RL   Seattle, WA 98195, USA
XX
RN   [3]
RP   1-184666
RA   Rowen L.;
RT   ;
RL   Submitted (28-OCT-1999) to the EMBL/GenBank/DDBJ databases.
RL   Multimegabase Sequencing Center, University of Washington, PO Box 357730,
RL   Seattle, WA 98195, USA
XX
DR   EPD; EP11158; HS_TNFA.
DR   EPD; EP11159; HS_TNFB.
DR   SPTREMBL; O00452; O00452.
DR   SPTREMBL; O14931; O14931.
DR   SPTREMBL; O95866; O95866.
DR   SPTREMBL; O95868; O95868.
DR   SPTREMBL; O95869; O95869.
DR   SPTREMBL; O95870; O95870.


  [Part of this file has been deleted for brevity]

     aaaccagttt accaccactc ctaacactaa acttaaatct gactctaaat gtaagtccaa    181740
     tctgagccac aagcctaaag ttgaacttta tcctgcttta tgaattattc atccattcct    181800
     ccatttagtg agtatctgcg tgcctaacac atgctgggca ttgtcctaag gcaggaggga    181860
     catggaggca aagggatcag agaaggtacc agcacctgtg gagcttgtat tccagtgagg    181920
     ccagacggaa aagaaagaaa ctgaagaaga aattggtact atgagaaaat aagacaggct    181980
     gatgttgtaa gagtggcagg gagctacttt taaatacagt agtcagcaaa atcctctttg    182040
     agtgtttggg tggcactgga gctgagaccc aaatgacaaa aaatagtgac caggtaaaag    182100
     tttgggagca aagcatttca ggtaaaggga gcagctactg caaaggctgg aaggcggaac    182160
     caagctgggg gtgttgacga caaacagaag gccagtgtgg ctggagcaga gagagagact    182220
     gggaggcggg tgggagatga ggtcagagag gagggcaggg gccaggtcat gcagggccat    182280
     gcaagaaggg taaagcctct agatttcatc cagccacagg aagcctttaa aggtcgtcag    182340
     agtgtgtggt gcgtgcgtgt gtgtgtgtgt gtgtgtgtgt gttgcagggg agagaggggg    182400
     agggagagag agagagagag agagaagagg gaggtgagca gaggtgattg gatttttttt    182460
     tcttttgaca tggtgtcttg ctctgtggcc taggctggag tgcagtggca ccatcatagc    182520
     ccactgcaac ctcaaaacca tgggctcaag tcatccttcc acctcagctt cccaagtatc    182580
     taggactaca ggtgtgtgcc actgtgcctg gctaatttta aaaaatattt taaaattttt    182640
     gttgagacag ggtctatgct gctcaggctg gtctcgaact cctggtttca agtgatctgc    182700
     ccatcttggc ctcccaaagt ttttttttgt tagtttgaga ggcggtttcg ctcgttgccc    182760
     aggctggagt gcaatgactg atctcatctc actgcaacct ctgcctcctg ggttcaagcg    182820
     attctcctgc ttcagcctcc caagtagctg ggattacagg tgcatgccac cattcccggc    182880
     taattttttg tatttagtag agatggggtt tcaccatgtt agtcaggctg atctcaaact    182940
     cctgacctca ggtgatccgc ctgcctcagc ctcccaaagt tttgggatta caggtgtgag    183000
     ccaccatgct gggccagcct cccaaagttt tgggattaca ggcatgagtc accacactgg    183060
     ccctggattt tttttctttc ttttttttgg agacggagtc tcactctgtt gcccaggctg    183120
     gagtgcaatg gcgtaatctc agctcactgc aacctctgct gcccgggttc aaacgattct    183180
     cctgtcttag cctcctgagt agctgggatt ataggtgcat gccaccatgc ctggctaatt    183240
     tttgtacttt tagtagagaa agtacaccat cttggccagg ctggtctcga actcctgacc    183300
     tcaggtgatc cacttgcgtc ggcctcccaa agtgctggga ttacaggcgt gagacaccgc    183360
     acccagcctt tttttttttt tttcttttaa gacagaatcg ctctgtcacc caggctggag    183420
     tgcagtggca caatctcggc tcactgcaac ctctgcctcc caggtttaag caatccacct    183480
     atgtcagtct cccaagtagc tgggattata ggtgcatgtc accatgcctg gctaattttt    183540
     gtacttttag tatagaaagt acaccatgtt ggccaggctg gtcttgaact cctgacctca    183600
     agtgatccgc ctgcctcagc ctcccgaagt gctggaatta cagacatgtg ccactgcacc    183660
     cggcctggtt ttttttttct aagagatgga gtctcacttt tctgcccagg ttggagtgca    183720
     atggcaccat catagctcac tgcagccttc aactcttggc ctcaggcaat ccttgcacct    183780
     tagcctcgca gtgttgggat tacaggcatg agccactgag ccttgcctgg actttttttt    183840
     ttttttgaga tggcgtctcg ctctgttgcc caggttggag tgctacggca tgatcttggc    183900
     tcactgcaac ttccacctcc caggttcaag cgattctctt gcctcggccc cccgagtagc    183960
     tgggattaca ggcatgcgcc accgtgcctg gctaattttg gtatttttag tagagatagg    184020
     gtttcatcat gttgggcagg ctggtcttga actcctgacc tcgtgatcca cccacctcgg    184080
     cctcccaaag tgctgggatt ataggcatag ccaacgcgcc cagcctggac ttgtttttaa    184140
     aagatcactg tggctcctgt gtttaggctg gctggtagga gacaggtggc agtggcattg    184200
     atggtgaaga gaaaatagtg gcagccatgg agatggagag aagtagacaa gtttgggata    184260
     tattatacat tccaggggta gaaacaacag gactagatga tggattgatg ggtgggagat    184320
     gtagatactg ggagagaagc aggattctga tggatggaaa aactaaaaaa ttctattttg    184380
     ggtgtggtaa gtctaagtct attagacatg caagtagaga tgtcactggg cagatacaca    184440
     tctggatttc aggggcaagg tccaagctag agaaagaaac ctgggcatgg tcagcatgag    184500
     gatggtgttt aaagccatgg aacttatctt gtgcatccct ataagacccc tttgaggcac    184560
     ttgtttcccc tcacaatgga tgcagtgcat cttccattct gaattccaga ggcaacaacc    184620
     tcctgctcct agaagctaaa ctctccagac ttagtcttct gaattc                   184666
//

Output file format

  Output files for usage example

  File: report

# Report of megamerger of: AP000504 and AF129756

AP000504 overlap starts at 1
AF129756 overlap starts at 6036

Using AF129756 1-6035 as the initial sequence

Matching region AP000504 1-846 : AF129756 6036-6881
Length of match: 846

WARNING!
Mismatch region found:
Mismatch AP000504 847-847
Mismatch AF129756 6882-6882
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 848-1794 : AF129756 6883-7829
Length of match: 947

WARNING!
Mismatch region found:
Mismatch AP000504 1795-1795
Mismatch AF129756 7830-7830
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 1796-2272 : AF129756 7831-8307
Length of match: 477

WARNING!
Mismatch region found:
Mismatch AP000504 2273-2273
Mismatch AF129756 8307
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 2274-2465 : AF129756 8308-8499
Length of match: 192

WARNING!
Mismatch region found:
Mismatch AP000504 2466-2466
Mismatch AF129756 8500-8500
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 2467-2654 : AF129756 8501-8688
Length of match: 188

WARNING!
Mismatch region found:
Mismatch AP000504 2655-2658
Mismatch AF129756 8688


  [Part of this file has been deleted for brevity]


WARNING!
Mismatch region found:
Mismatch AP000504 95451-95451
Mismatch AF129756 101481-101481
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 95452-96649 : AF129756 101482-102679
Length of match: 1198

WARNING!
Mismatch region found:
Mismatch AP000504 96650-96650
Mismatch AF129756 102680-102680
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 96651-97272 : AF129756 102681-103302
Length of match: 622

WARNING!
Mismatch region found:
Mismatch AP000504 97273-97274
Mismatch AF129756 103302
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 97275-97715 : AF129756 103303-103743
Length of match: 441

WARNING!
Mismatch region found:
Mismatch AP000504 97716-97716
Mismatch AF129756 103744-103744
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 97717-97826 : AF129756 103745-103854
Length of match: 110

WARNING!
Mismatch region found:
Mismatch AP000504 97827-97827
Mismatch AF129756 103855-103855
Mismatch is closer to the ends of AP000504, so use AF129756 in the merged sequence

Matching region AP000504 97828-100000 : AF129756 103856-106028
Length of match: 2173

AP000504 overlap ends at 100000
AF129756 overlap ends at 106028

Using AF129756 106029-184666 as the final sequence

  File: ap000504.merged

>AP000504 AP000504.1 Homo sapiens genomic DNA, chromosome 6p21.3, HLA Class I region, section 3/20.
gaattctctccctcccatctgtggctgagattaaagatctgcacctgaagcactgaagaa
tgtgtgggtaattaaattaccctgccgattcctggagatgctgattacctggagatgacc
tcagagattatcctaaattaactcctacaagacacacattgcagcgggaggtgaggtagg
ggaaggattgtgcacctgaggctagcaaaggtttccactctgtttagagatgatgtcacc
agtgcgtttacatttgcgttgtgttcacacattgagtgctactatgtacaagaccatgtg
tcagacactgaggtaacaggtatagctactagatagagttcatagctatggaggcaagca
gattcactgactgctaattctaacatgatgtgacagtgcaactagaaaaacataaacaag
cactatgtgagcacaaagaaggtgcacatcaactccttacaggtacctgtaaaagccaaa
gggtaacagttggattgcaccttgaagaggatgcacttttttttttttttaagacagaat
ctcactctgttgcccaggctggagtgcagtggggcaatctgggctcacttcaacctctac
ctcccgagttcaagcaattctcctgcctcagcctcctgagtagctggtactagaggcatg
cgccatgatgctgggctaatttttgtattttcggtagacgtgaagtttcaccaagttggc
caggctggtcttgaactcctgacctcaaatcatccacccacctcagcctcccaaaatgct
gagactacaggcgtgagccaccgcgcctgacctggatgtaagattttgataggtacagaa
caaggaaaagactttccaggccgggcacagtggcttatgcctgtaatcccagcactttgg
gaggccgaggtgggcagatcacgaggtcagcagttcaagaccagcgtggccaacatggtg
aaatcccatctctactaaaaatacaaaaattagccaggcatggtggtgggtccctgtaat
cccagctactccggaggctgaggcaggagaattgcttgaacctgggaggtggaggttgca
gtgagcaaagaccgcgccactgcactccagcctgggtgacagagggagactccgtctcaa
aaaaaaaaaaaaaagactttccagaaggagcagcataaacacaggcatgacatgtttcca
taatggcaagtggccctaaatgactagaatataaggtagatccagtaggaaaggacttag
aaggggctttggaaggtgagtctggaaattaaaactggggtaaacgtgatggaccctgaa
catcattatactgcttaagatgctaatcttaatcctgaaggtaatgggaaaacctcctaa
ggtttatgttattttctttctacttaggctatttaaaaagtggagtgacggccaggcgca
gtgactcatgcctgtaatcccagcactttgggaggccgaggtgggcggatcaccaggagt
tcgagaccagcctgaccaacatggtgaaaccccgcctctactgaaaatacaaaaattagc
caggtgtggtggtgggcgcctgtaatcccagctacttgggaggctgaggcagaagaattg
cttgaacccgggaagtggaggttgcagtgagcagagatcgtgccattgtactccagcctg
ggcaacaagagcgaaactcagtcacaaaaaaaaaaaaaaaaaaaaaggagtgacatgctt
agatctctgttttggaatgacaggttttttgtttctagcatcaatccaaggttcatggct
tgagaaggtgtactgccagcaatgccattaaccagcaaagggaatgcaggaagaggaaca
gatctggtgggcatcagtttggatgctctgagtttgagctgcctgtgaaaactgcaggtg
gtgatatgcaattaacattcacatacggagttcaaaactagagacacaaatttgagagtc
atcacagaaatgtgaagtgtgttttctataactaaagataaccatgctaacatagccatg
tgttacattagcattttttttttttttgagacggagtctcactctgttgcccaggctgaa
gtgcagtgcacaatcttggctcactgcaacctccacctcctgggttcaagcgattctcct
gccttagtctcctgagtagctggaattacaggcacctaccaacacgcttggctaattttt
gcattttagtagagatggggctttaccatgttggccagctggtctcaaactcctgacctc
aagtgattcacccaccttggccccccaaagtgctgggattacaggtgtgagccactgtgc
ccggccttacattttgtgttttttcctgctgcttgtatgtgtgcaagtctgtgtatcatc
aatgggtatatgtgtacctgcgctgacaacaaaaaatgagatgcatatcagctactacac
aaagctgttataaggatgaaatgcagttagccagtgctcagtaaagggcagttgctttac
tactactaggtggggtggtgtatgtgagaatctgtatactgccattagtaggctttagta
tgtagtgtgcatatggaattcatgcattagtgtgtagtatgtgtgggacccactcacctg
agcagcttctctccccacttacagtggcatctgttgaggattcctgtgagggataaggca
gggagtgaacttgttacaaggcagggacagggaatggaatgtgtttatgtgtctaagctg
aggcatccaggtcagaggtgctggttgttgaggaagctggcctgggagggcacaaaggca
gccaaagctggtgcctggccacaaatatgagctgggattaccgtacatggagatggggga
agggatggacactcacagggacacttagccagaaaaatacacaaagcagacctagttaaa


  [Part of this file has been deleted for brevity]

accccctaaataaaacttctcctctaccccaacccaaccctgtttctagggctaatcttg
aaaccagtttaccaccactcctaacactaaacttaaatctgactctaaatgtaagtccaa
tctgagccacaagcctaaagttgaactttatcctgctttatgaattattcatccattcct
ccatttagtgagtatctgcgtgcctaacacatgctgggcattgtcctaaggcaggaggga
catggaggcaaagggatcagagaaggtaccagcacctgtggagcttgtattccagtgagg
ccagacggaaaagaaagaaactgaagaagaaattggtactatgagaaaataagacaggct
gatgttgtaagagtggcagggagctacttttaaatacagtagtcagcaaaatcctctttg
agtgtttgggtggcactggagctgagacccaaatgacaaaaaatagtgaccaggtaaaag
tttgggagcaaagcatttcaggtaaagggagcagctactgcaaaggctggaaggcggaac
caagctgggggtgttgacgacaaacagaaggccagtgtggctggagcagagagagagact
gggaggcgggtgggagatgaggtcagagaggagggcaggggccaggtcatgcagggccat
gcaagaagggtaaagcctctagatttcatccagccacaggaagcctttaaaggtcgtcag
agtgtgtggtgcgtgcgtgtgtgtgtgtgtgtgtgtgtgtgttgcaggggagagaggggg
agggagagagagagagagagagagaagagggaggtgagcagaggtgattggatttttttt
tcttttgacatggtgtcttgctctgtggcctaggctggagtgcagtggcaccatcatagc
ccactgcaacctcaaaaccatgggctcaagtcatccttccacctcagcttcccaagtatc
taggactacaggtgtgtgccactgtgcctggctaattttaaaaaatattttaaaattttt
gttgagacagggtctatgctgctcaggctggtctcgaactcctggtttcaagtgatctgc
ccatcttggcctcccaaagtttttttttgttagtttgagaggcggtttcgctcgttgccc
aggctggagtgcaatgactgatctcatctcactgcaacctctgcctcctgggttcaagcg
attctcctgcttcagcctcccaagtagctgggattacaggtgcatgccaccattcccggc
taattttttgtatttagtagagatggggtttcaccatgttagtcaggctgatctcaaact
cctgacctcaggtgatccgcctgcctcagcctcccaaagttttgggattacaggtgtgag
ccaccatgctgggccagcctcccaaagttttgggattacaggcatgagtcaccacactgg
ccctggattttttttctttcttttttttggagacggagtctcactctgttgcccaggctg
gagtgcaatggcgtaatctcagctcactgcaacctctgctgcccgggttcaaacgattct
cctgtcttagcctcctgagtagctgggattataggtgcatgccaccatgcctggctaatt
tttgtacttttagtagagaaagtacaccatcttggccaggctggtctcgaactcctgacc
tcaggtgatccacttgcgtcggcctcccaaagtgctgggattacaggcgtgagacaccgc
acccagcctttttttttttttttcttttaagacagaatcgctctgtcacccaggctggag
tgcagtggcacaatctcggctcactgcaacctctgcctcccaggtttaagcaatccacct
atgtcagtctcccaagtagctgggattataggtgcatgtcaccatgcctggctaattttt
gtacttttagtatagaaagtacaccatgttggccaggctggtcttgaactcctgacctca
agtgatccgcctgcctcagcctcccgaagtgctggaattacagacatgtgccactgcacc
cggcctggttttttttttctaagagatggagtctcacttttctgcccaggttggagtgca
atggcaccatcatagctcactgcagccttcaactcttggcctcaggcaatccttgcacct
tagcctcgcagtgttgggattacaggcatgagccactgagccttgcctggactttttttt
ttttttgagatggcgtctcgctctgttgcccaggttggagtgctacggcatgatcttggc
tcactgcaacttccacctcccaggttcaagcgattctcttgcctcggccccccgagtagc
tgggattacaggcatgcgccaccgtgcctggctaattttggtatttttagtagagatagg
gtttcatcatgttgggcaggctggtcttgaactcctgacctcgtgatccacccacctcgg
cctcccaaagtgctgggattataggcatagccaacgcgcccagcctggacttgtttttaa
aagatcactgtggctcctgtgtttaggctggctggtaggagacaggtggcagtggcattg
atggtgaagagaaaatagtggcagccatggagatggagagaagtagacaagtttgggata
tattatacattccaggggtagaaacaacaggactagatgatggattgatgggtgggagat
gtagatactgggagagaagcaggattctgatggatggaaaaactaaaaaattctattttg
ggtgtggtaagtctaagtctattagacatgcaagtagagatgtcactgggcagatacaca
tctggatttcaggggcaaggtccaagctagagaaagaaacctgggcatggtcagcatgag
gatggtgtttaaagccatggaacttatcttgtgcatccctataagacccctttgaggcac
ttgtttcccctcacaatggatgcagtgcatcttccattctgaattccagaggcaacaacc
tcctgctcctagaagctaaactctccagacttagtcttctgaattc

   A merged sequence is written out.

   Where there has been a mismatch between the two sequences, the merged
   sequence is written out in uppercase and the sequence whose mismatch
   region is furthest from the edges of the sequence is used in the
   merged sequence.
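
   Because mismatch regions are the only uppercase parts of the merged
   sequence, they can be located with a simple scan. The following Python
   sketch uses hypothetical helper names and assumes the merged sequence
   was written in FASTA format, as in the example above.

```python
import re


def read_fasta_body(lines):
    """Join the sequence lines of a single FASTA record, skipping the header."""
    return "".join(line.strip() for line in lines if not line.startswith(">"))


def uppercase_regions(sequence):
    """Return 1-based inclusive (start, end) spans of uppercase runs."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"[A-Z]+", sequence)]
```

   For example, uppercase_regions("acgtACgtaT") gives [(5, 6), (10, 10)].
   Each span can then be checked against the WARNING! entries in the
   report.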

   The name and description of the first input sequence are used for the
   name and description of the output sequence.

   A report of the merger is written out.
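
   The report is plain text and can be summarised by a script. The
   following Python sketch is based only on the line formats visible in
   the example report above; the function name is hypothetical, and the
   meaning of a mismatch line that carries a single position rather than
   a range is not interpreted (its end is stored as None).

```python
import re

MATCH_RE = re.compile(r"^Matching region (\S+) (\d+)-(\d+) : (\S+) (\d+)-(\d+)$")
MISMATCH_RE = re.compile(r"^Mismatch (\S+) (\d+)(?:-(\d+))?$")


def parse_report(text):
    """Return (matches, mismatches) extracted from a megamerger report."""
    matches, mismatches = [], []
    for raw in text.splitlines():
        line = raw.strip()
        m = MATCH_RE.match(line)
        if m:
            # One matching region: coordinates in each of the two sequences.
            matches.append(((int(m.group(2)), int(m.group(3))),
                            (int(m.group(5)), int(m.group(6)))))
            continue
        m = MISMATCH_RE.match(line)
        if m:
            # "Mismatch <name> <start>[-<end>]"; other "Mismatch ..." lines
            # (headings and explanations) do not fit this pattern.
            end = int(m.group(3)) if m.group(3) else None
            mismatches.append((m.group(1), int(m.group(2)), end))
    return matches, mismatches
```

   Counting the entries in the second list gives a quick measure of how
   much of the merged sequence needs to be checked by hand.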

Data files

   None.

Notes

   If you run out of memory, use a larger word size (-wordsize).

References

   None.

Warnings

   None.

Diagnostic Error Messages

   None.

Exit status

   It always exits with status 0.

Known bugs

   None.

See also

   Program name                 Description
   cons         Creates a consensus from multiple alignments
   merger       Merge two overlapping nucleic acid sequences

   Compare this with the program merger, which does a more accurate
   alignment of more divergent sequences using the Needleman and Wunsch
   algorithm but which uses much more memory.

   A graphical dotplot of the matches used in this merge can be displayed
   using the program dotpath.

Author(s)

   Gary Williams (gwilliam  rfcgr.mrc.ac.uk)
   MRC Rosalind Franklin Centre for Genomics Research Wellcome Trust
   Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

   Written Aug 2000 by Gary Williams.

Target users

   This program is intended to be used by everyone and everything, from
   naive users to embedded scripts.

Comments

   None
