Ortholuge 0.8 Installation Instructions
----------------------------------------

OVERVIEW

Ortholuge is a high-throughput pipeline that can generate precise ortholog 
predictions between two species on a genome-wide scale. 

The program analyzes the phylogenetic distances for genes or proteins 
from 2 ingroup species (species being compared) relative to genes from an 
outgroup species, and identifies which genes are most likely to be 
ssd-orthologs ("Supporting Species Divergence" orthologs) and so may be 
most likely to have the same function. It also identifies probable paralogs
in a list of putative orthologs. Users must supply three FASTA files of
gene or protein sequences, one per species. In addition, they may supply an
optional "triples" file of putative orthologs (each entry naming one
sequence from each of the 2 ingroups and the 1 outgroup).  If no triples
file is supplied, putative 
orthologs are initially determined based on a reciprocal-best-blast 
analysis. Ortholuge then aligns the sequences using MUSCLE or ClustalW, 
edits the alignments using previously determined criteria, calculates
phylogenetic distance ratios using PHYLIP (with EMBOSS), and plots phylogenetic 
distance ratios in user-friendly graphical outputs. The resulting 
Ortholuge ratios and graphs can be flexibly used to identify orthologs 
most likely to have the same function and eliminate probable paralogs 
from a dataset of orthologs. Note that this process is not meant to 
be interactive: it is a high-throughput pipeline suitable for genome-scale 
analysis, so there is no user interface.

For more information, see http://www.pathogenomics.ca/ortholuge

-----------------------------------------------------------------------------
PREREQUISITES

1. Though Ortholuge should work with any Unix distribution, it has only
been tested with Linux, specifically RH 9.0 and SuSE 9.0.

    Links to each of the additional requirements below are provided on
    the Ortholuge website at 
    http://www.pathogenomics.ca/ortholuge


2. Perl version 5.005_03 or higher (5.6.x or higher recommended)
The latest version of Perl can be obtained from http://www.cpan.org

3. The Bioperl library version 1.4 or higher
While Ortholuge should work with Bioperl 1.2, version 1.4 or above is
highly recommended.  Bioperl can be obtained from www.bioperl.org

4. A working installation of standalone NCBI BLAST
BLAST can be obtained from ftp://ftp.ncbi.nih.gov/blast/executables/

5a. MUSCLE multiple sequence alignment tool, version 3.52 or higher.
MUSCLE can be obtained from http://phylogenomics.berkeley.edu/muscle/ 
The MUSCLE 3.6 source code and linux binaries are mirrored on the
Ortholuge website: 
http://www.pathogenomics.ca/ortholuge/download.html.  This version has
been tested for compatibility with the Ortholuge 0.8 package and we
recommend you use it.
   
- OR -

5b. ClustalW multiple sequence alignment tool
ClustalW can be obtained from ftp://ftp.ebi.ac.uk/pub/software/unix/clustalw/

6. The Phylip package of the EMBASSY Program Suites from EMBOSS.
Phylip can be obtained from ftp://emboss.open-bio.org/pub/EMBOSS
(tested with v3.57c; for other versions please see the Phylip notice below).
The EMBOSS 3.0.0 package and PHYLIP 3.6b EMBASSY application are 
mirrored on the Ortholuge website: 
http://www.pathogenomics.ca/ortholuge/download.html.  These programs have 
been tested for compatibility with the Ortholuge 0.8 package and we 
recommend you use these versions. 

7. The Perl SVG 2.33 package, which can be found at 
http://search.cpan.org/~ronan/SVG-2.33/

8. ImageMagick image conversion suite (only needed if you plan to output
graphs in PNG format).
ImageMagick can be found at http://www.imagemagick.org/

-----------------------------------------------------------------------------
INSTALLATION

If all the above prerequisites are in your default execution path,
installation should be as easy as unpacking this archive and running
ortholuge.pl.

Ortholuge consists of 5 scripts:
ortholuge.pl
ortholuge-RBB.pl
ortholuge-align.pl
ortholuge-hist.pl
ortholuge-scatter.pl

To work properly, Ortholuge also needs programs such as blastall and muscle
to be in your $PATH.  Most likely this has already been done by your system
administrator.
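
One way to confirm that the required programs are on your PATH before a long
run is a small shell check like the sketch below.  It is not part of
Ortholuge.  The function name check_tools is hypothetical, and the
executable names other than blastall and muscle (perl, and convert for
ImageMagick) are assumptions about a typical installation; replace muscle
with your ClustalW executable if you chose option 5b.

```shell
# Report, for each program name given, whether it can be found in $PATH.
# Returns 0 if every program was found, 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "MISSING: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Program names here are assumptions; adjust them to match your setup.
check_tools perl blastall muscle convert \
    || echo "Add the missing programs to your PATH before running ortholuge.pl"
```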

-----------------------------------------------------------------------------
USAGE

To function properly, these scripts must either all be in your default path,
or you must run ortholuge from the directory these scripts are in, or you
must set the --bindir argument to point to the directory that contains them.

It is recommended that you create an .ortholugerc file in your home directory
containing certain defaults, such as the bindir.  This saves you from
setting commonly used arguments each time you run ortholuge.

Arguments can be passed to ortholuge in three ways:
- ~/.ortholugerc file
- A configuration file specified with the -c option
- Command line arguments

Arguments are applied in that order, with later sources overriding earlier
ones; for example, a command line argument overrides a directive in a
config file.

Execute: "ortholuge.pl -h" to see a list of allowed arguments and switches
Open: config.sample to see the options allowed in your .ortholugerc file
or a config file

Once you have created a config file or determined which arguments you wish
to use, simply run ortholuge with your three input FASTA files.  Depending
on your processor speed, execution could take from a few hours to over a day.

eg.
ortholuge.pl -c myconfig.txt --workdir /tmp/myorth --ingroup1 critter1.faa \
   --ingroup2 critter2.faa --outgroup anothercritter.faa
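
Because a run can take many hours, you may prefer to start it in the
background and keep its output in a log file.  The following is a minimal
sketch using standard shell tools only; run_in_background and the log file
name ortholuge.log are hypothetical and not part of Ortholuge.

```shell
# Start a command in the background, immune to hangups, with all of its
# output (stdout and stderr) captured in a log file.
# $1 = log file, remaining arguments = the command to run
run_in_background() {
    log=$1
    shift
    nohup "$@" > "$log" 2>&1 &
    echo "started PID $!, logging to $log"
}

# Example, reusing the command shown above:
# run_in_background ortholuge.log ortholuge.pl -c myconfig.txt \
#    --workdir /tmp/myorth --ingroup1 critter1.faa \
#    --ingroup2 critter2.faa --outgroup anothercritter.faa
```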

-----------------------------------------------------------------------------
REPLOTTING GRAPHS

Once you have initially run ortholuge and created a histogram and scatter
plot you may wish to rerun ortholuge to adjust the scales, shading and
cutoffs for these graphs.  Ortholuge allows this to be done quickly without
rerunning the entire pipeline.

Upon completion of the initial ortholuge run, a file named config.txt
will be created in your work directory. Using this as the input configuration
file for regeneration of the graphs will cause ortholuge to use all the
previous settings and computed files from your initial run while skipping the
computation steps.  You may then use the command line arguments to adjust the
graphing settings.

eg.
ortholuge.pl -c /tmp/ortholuge/config.txt --hist-scale 1.6 --hist-grey \
   --scatter-c1 0.6 --scatter-c2 1.2 --scatter-shade --scatter-format png

-----------------------------------------------------------------------------
USING YOUR OWN TRIPLETS FILE

Some users may want to use their own file of putative orthologs.  Ortholuge
allows this by bypassing the RBB (reciprocal-best-blast) step.

First, set up your work directory in the proper format: it must contain the
three FASTA files of the organisms.

Your triplets file must then be laid out as a tab-delimited file containing
the identifiers in the order:
ingroup1 ingroup2 outgroup

The identifiers must be the same as in the FASTA header up to the first
space, that is, the field identified by Bioperl using $seq->id
(ie. >identifier some-text-not-in-the-identifier)
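
Before starting a long run, it can be worth checking that every identifier
in the triplets file really does appear in the matching FASTA file.  The
following is a rough sketch using awk and is not part of Ortholuge.  The
function names missing_ids and check_triples are hypothetical, and the
sketch assumes that the FASTA files are non-empty and that an identifier
ends at the first space or tab of its header line.

```shell
# List triplets-file entries whose identifier in column $1 is absent from
# the FASTA file $2.  $3 is the tab-delimited triplets file.
missing_ids() {
    awk -F'\t' -v col="$1" '
        NR == FNR {                       # first file: the FASTA file
            if (substr($0, 1, 1) == ">") {
                id = substr($0, 2)
                sub(/[ \t].*$/, "", id)   # keep text up to first whitespace
                seen[id] = 1
            }
            next
        }
        NF != 3 {                         # malformed line: report it once
            if (col == 1) printf "line %d: expected 3 fields, found %d\n", FNR, NF
            next
        }
        !($col in seen) {
            printf "line %d: column %d id %s not in FASTA file\n", FNR, col, $col
        }
    ' "$2" "$3"
}

# $1 = triplets file, $2 = ingroup1, $3 = ingroup2, $4 = outgroup FASTA file
check_triples() {
    problems=$(
        missing_ids 1 "$2" "$1"
        missing_ids 2 "$3" "$1"
        missing_ids 3 "$4" "$1"
    )
    if [ -n "$problems" ]; then
        printf '%s\n' "$problems"
        return 1
    fi
    echo "all identifiers found"
}

# Example (file names are placeholders):
# check_triples mytriples.txt critter1.faa critter2.faa anothercritter.faa
```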

You must then create an Ortholuge config file as detailed in the
documentation; please see config.sample for a sample config file.  Include
the following options in the config file:
[global]
skip-setup=1
infile1=[ingroup1 FASTA file]
infile2=[ingroup2 FASTA file]
outfile=[outgroup FASTA file]

[rbb]
skip-all=1
outfile=[your triplet file]

Assuming all other options previously detailed in the config file are correct,
Ortholuge should now begin running from the multiple sequence alignment step.

-----------------------------------------------------------------------------
PHYLIP NOTICE

Ortholuge was originally designed using Phylip 3.57c.  As of the writing of
this document, Phylip 3.6b is the current stable release.  Unfortunately, as
part of this version change, the filenames and arguments of eprotdist and
ednadist changed.

For version 3.6b please edit the file ortholuge-align.pl, search for 
eprotdist, comment out the lines for version 3.57c, and uncomment the lines
for version 3.6b.

eg.
if($seqtype eq 'protein') {
    # For Phylip 3.57c
#    $phylip ||= 'eprotdist';
#    $phylipOptions = '-method Kimura -tranrate 2.0';

    # For Phylip 3.6b
    $phylip ||= 'fprotdist';
    $phylipOptions = '-model Kimura -ttratio 2.0';
}
if($seqtype eq 'dna') {
    # For Phylip 3.57c
#    $phylip ||= 'ednadist';
#    $phylipOptions = '-method Kimura -ttratio 2.0 -matrix S';

    # For Phylip 3.6b
    $phylip ||= 'fdnadist';
    $phylipOptions = '-model Kimura -ttratio 2.0';
}
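
If you are unsure which set of lines applies to your installation, checking
which program names are on your PATH can help.  This is a minimal sketch,
assuming (as the snippet above suggests) that the 3.57c applications are
installed as eprotdist/ednadist and the 3.6b ones as fprotdist/fdnadist;
first_available is a hypothetical helper, not part of Ortholuge.

```shell
# Print the first of the given program names that is found in $PATH.
# Returns 1 (and prints nothing) if none of them is available.
first_available() {
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog"
            return 0
        fi
    done
    return 1
}

# fprotdist -> use the "Phylip 3.6b" lines; eprotdist -> use the "3.57c" lines
first_available fprotdist eprotdist \
    || echo "no PHYLIP protein distance program found in PATH"
```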


-----------------------------------------------------------------------------
TROUBLESHOOTING

Q: I receive an error from BLAST similar to:
[blastall] WARNING:  [000.000]  gi|28867250: Unable to open BLOSUM62

A: Your copy of BLAST is not properly installed.  You must create
   a .ncbirc file in your home directory that points to the
   directory that BLAST is installed in.  Please see NCBI's website for
   more details.
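
   For the legacy NCBI toolkit, .ncbirc is a short INI-style file whose
   [NCBI] section has a Data entry naming the directory that holds BLOSUM62
   and the other matrices (usually the data subdirectory of the BLAST
   installation).  The sketch below creates one from the shell.  The helper
   write_ncbirc is hypothetical and the path in the example is only a
   placeholder, so substitute the location of your own installation.

```shell
# Write a minimal .ncbirc that points legacy BLAST at its data directory.
# $1 = directory holding BLOSUM62 and the other matrices
# $2 = file to write (defaults to $HOME/.ncbirc)
# An existing file is left untouched and the function returns 1.
write_ncbirc() {
    data_dir=$1
    target=${2:-$HOME/.ncbirc}
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting" >&2
        return 1
    fi
    printf '[NCBI]\nData=%s\n' "$data_dir" > "$target"
}

# Example (placeholder path):
# write_ncbirc /usr/local/blast/data
```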

-----------------------------------------------------------------------------
QUESTIONS? PROBLEMS? COMMENTS?

Email ortholuge-mail@sfu.ca (contact person: Matthew Laird) and we'd be happy to help.