Ortholuge 0.8 Installation Instructions
----------------------------------------

OVERVIEW

Ortholuge is a high-throughput pipeline that generates precise ortholog
predictions between two species on a genome-wide scale. The program analyzes
the phylogenetic distances for genes or proteins from 2 ingroup species (the
species being compared) relative to genes from an outgroup species. It
identifies which genes are most likely to be ssd-orthologs ("Supporting
Species Divergence" orthologs), and so most likely to have the same function.
It also identifies probable paralogs in a list of putative orthologs.

Users must supply three FASTA files of gene or protein sequences from the
species' genomes. In addition, they may supply an optional "triples" file
listing putative orthologs (from the 2 ingroups and 1 outgroup). If no
triples file is supplied, putative orthologs are initially determined by a
reciprocal-best-BLAST analysis. Ortholuge then aligns the sequences using
MUSCLE or ClustalW, edits the alignments using previously determined
criteria, calculates phylogenetic distance ratios using PHYLIP (with EMBOSS),
and plots the phylogenetic distance ratios in user-friendly graphical
outputs. The resulting Ortholuge ratios and graphs can be used flexibly to
identify the orthologs most likely to have the same function and to eliminate
probable paralogs from a dataset of orthologs.

Note that this process is not meant to be interactive. It is a
high-throughput pipeline suitable for genome-scale analysis, hence there is
no user interface.

For more information, see http://www.pathogenomics.ca/ortholuge

-----------------------------------------------------------------------------
PREREQUISITES

1. Though Ortholuge should work with any Unix distribution, it has only been
   tested with Linux. Currently, Ortholuge has been specifically tested
   with: RH 9.0, SuSE 9.0.

   Links to each of the following additional requirements are provided on
   the Ortholuge website at http://www.pathogenomics.ca/ortholuge

2. Perl version 5.005_03 or higher (5.6.x or higher recommended)
   The latest version of Perl can be obtained from http://www.cpan.org

3. The Bioperl library version 1.4 or higher
   While Ortholuge should work with Bioperl 1.2, version 1.4 or above is
   highly recommended.
   Bioperl can be obtained from www.bioperl.org

4. A working installation of standalone NCBI BLAST
   BLAST can be obtained from ftp://ftp.ncbi.nih.gov/blast/executables/

5a. MUSCLE multiple sequence alignment tool, version 3.52 or higher.
    MUSCLE can be obtained from http://phylogenomics.berkeley.edu/muscle/
    The MUSCLE 3.6 source code and Linux binaries are mirrored on the
    Ortholuge website: http://www.pathogenomics.ca/ortholuge/download.html.
    This version has been tested for compatibility with the Ortholuge 0.8
    package and we recommend you use it.

    - OR -

5b. ClustalW multiple sequence alignment tool
    ClustalW can be obtained from
    ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/

6. The Phylip package of the EMBASSY Program Suites from EMBOSS.
   Phylip can be obtained from ftp://emboss.open-bio.org/pub/EMBOSS
   (tested with v3.57c; for other versions please see the Phylip notice
   below).
   The EMBOSS 3.0.0 package and PHYLIP 3.6b EMBASSY application are mirrored
   on the Ortholuge website:
   http://www.pathogenomics.ca/ortholuge/download.html.
   These versions have been tested for compatibility with the Ortholuge 0.8
   package and we recommend you use them.

7. The Perl SVG 2.33 package, which can be found at
   http://search.cpan.org/~ronan/SVG-2.33/

8. ImageMagick image conversion suite (only needed if you plan to output
   graphs in PNG format).
   ImageMagick can be found at http://www.imagemagick.org/

-----------------------------------------------------------------------------
INSTALLATION

If all of the above prerequisites are in your default execution path,
installation should be as easy as unpacking this archive and running
ortholuge.pl.

Ortholuge consists of 5 scripts:

    ortholuge.pl
    ortholuge-RBB.pl
    ortholuge-align.pl
    ortholuge-hist.pl
    ortholuge-scatter.pl

To work properly, Ortholuge also needs programs such as blastall and muscle
to be in your $PATH. Most likely this has already been done by your system
administrator.

-----------------------------------------------------------------------------
USAGE

To function properly, these scripts must all be in your default path, or you
must run ortholuge from the directory that contains them, or you must set
the --bindir argument to point to that directory.

It is recommended that you create an .ortholugerc file in your home
directory containing certain defaults, such as the bindir. This will save
you from setting commonly used arguments each time you run ortholuge.

Arguments can be passed to ortholuge in three ways:

- ~/.ortholugerc file
- A configuration file specified with the -c option
- Command line arguments

Arguments are applied in that order, so later sources override earlier
ones. For example, a command line argument will overrule directives in a
config file.

Execute: "ortholuge.pl -h" to see a list of allowed arguments and switches
Open: config.sample to see the options allowed in your .ortholugerc file or
      a config file

Once you have created a config file or determined which arguments you wish
to use, simply run ortholuge with your three input FASTA files. Depending on
your processor speed, execution could take from a few hours to over a day.

eg.
    ortholuge.pl -c myconfig.txt --workdir /tmp/myorth --ingroup1 critter1.faa \
        --ingroup2 critter2.faa --outgroup anothercritter.faa

-----------------------------------------------------------------------------
REPLOTTING GRAPHS

Once you have run ortholuge and created a histogram and scatter plot, you
may wish to rerun ortholuge to adjust the scales, shading and cutoffs for
these graphs. Ortholuge allows this to be done quickly without rerunning the
entire pipeline.

On completion of the initial ortholuge run, a file named config.txt is
created in your work directory. Using this file as the input configuration
file causes ortholuge to reuse all the settings and computed files from your
initial run while skipping the computation steps. You may then use command
line arguments to adjust the graphing settings.

eg.
    ortholuge.pl -c /tmp/ortholuge/config.txt --hist-scale 1.6 --hist-grey \
        --scatter-c1 0.6 --scatter-c2 1.2 --scatter-shade --scatter-format png

-----------------------------------------------------------------------------
USING YOUR OWN TRIPLETS FILE

Some users may want to use their own file of putative orthologs. Ortholuge
allows this by bypassing the RBB (reciprocal-best-BLAST) step.

First, set up your work directory in the proper format: it must contain the
three FASTA files of the organisms.

Your triplets file must then be laid out as a tab-delimited file containing
the identifiers in the order:

    ingroup1	ingroup2	outgroup

The identifiers must be the same as in the FASTA header up to the first
space, which is the field identified by Bioperl using $seq->id
(ie. >identifier some-text-not-in-the-identifier)

You must then create an Ortholuge config file as detailed in the
documentation; please see config.sample for a sample config file.
Include the following options in the config file:

    [global]
    skip-setup=1
    infile1=[ingroup1 FASTA file]
    infile2=[ingroup2 FASTA file]
    outfile=[outgroup FASTA file]

    [rbb]
    skip-all=1
    outfile=[your triplet file]

Assuming all other options previously detailed in the config file are
correct, Ortholuge should now begin running from the multiple sequence
alignment step.

-----------------------------------------------------------------------------
PHYLIP NOTICE

Ortholuge was originally designed using Phylip 3.57c. At the time of
writing, Phylip 3.6b is the current stable release. Unfortunately, as part
of this version change, the filenames and arguments of eprotdist and
ednadist changed. For version 3.6b please edit the file ortholuge-align.pl,
search for eprotdist, comment out the lines for version 3.57c, and uncomment
the lines for version 3.6b.

eg.
    if($seqtype eq 'protein') {
        # For Phylip 3.57c
        # $phylip ||= 'eprotdist';
        # $phylipOptions = '-method Kimura -tranrate 2.0';

        # For Phylip 3.6b
        $phylip ||= 'fprotdist';
        $phylipOptions = '-model Kimura -ttratio 2.0';
    }
    if($seqtype eq 'dna') {
        # For Phylip 3.57c
        # $phylip ||= 'ednadist';
        # $phylipOptions = '-method Kimura -ttratio 2.0 -matrix S';

        # For Phylip 3.6b
        $phylip ||= 'fdnadist';
        $phylipOptions = '-model Kimura -ttratio 2.0';
    }

-----------------------------------------------------------------------------
TROUBLESHOOTING

Q: I receive an error from BLAST similar to:
   [blastall] WARNING: [000.000] gi|28867250: Unable to open BLOSUM62

A: Your copy of BLAST is not properly installed. You must create a .ncbirc
   file in your home directory that points to the directory that BLAST is
   installed in. Please see NCBI's website for more details.

-----------------------------------------------------------------------------
QUESTIONS? PROBLEMS? COMMENTS?

Email ortholuge-mail@sfu.ca (contact person: Matthew Laird) and we'd be
happy to help.
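
-----------------------------------------------------------------------------
EXAMPLE: CHECKING A TRIPLETS FILE

The identifier rule described under USING YOUR OWN TRIPLETS FILE can be
checked before starting a long run. The following is a minimal sketch and is
not part of the Ortholuge distribution: check_triplets is a hypothetical
function name, and the sketch assumes a POSIX shell with awk, and that an
identifier ends at the first space or tab of a FASTA header line.

```shell
# check_triplets TRIPLETS INGROUP1 INGROUP2 OUTGROUP
# Hypothetical helper, not shipped with Ortholuge. Prints one message per
# problem and returns non-zero if any problem was found.
check_triplets() {
    awk -F '\t' '
        # The three FASTA files are read first: remember each identifier
        # (header text between ">" and the first space or tab) per file.
        FILENAME != ARGV[4] {
            if (substr($0, 1, 1) == ">") {
                id = substr($0, 2)
                sub(/[ \t].*$/, "", id)
                seen[FILENAME, id] = 1
            }
            next
        }
        # The triplets file is read last. Blank lines are ignored.
        NF == 0 { next }
        # Every row must have exactly three tab-delimited fields.
        NF != 3 {
            printf "line %d: expected 3 tab-delimited fields, found %d\n", FNR, NF
            bad++
            next
        }
        # Column i must name a sequence in the i-th FASTA file
        # (order: ingroup1, ingroup2, outgroup).
        {
            for (i = 1; i <= 3; i++) {
                if (!((ARGV[i], $i) in seen)) {
                    printf "line %d: \"%s\" not found in %s\n", FNR, $i, ARGV[i]
                    bad++
                }
            }
        }
        END { exit (bad > 0) }
    ' "$2" "$3" "$4" "$1"
}
```

eg. (mytriplets.txt is a placeholder file name)
    check_triplets mytriplets.txt critter1.faa critter2.faa anothercritter.faa

The function prints nothing and returns 0 when every identifier is found in
the FASTA file for its column.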