Description of the genome query for
Construct a phylogenetic distance matrix
The user can choose between two methods for constructing the Phylip-format distance matrix.
- The first method calculates, for each pair of genomes, the mean normalized BLASTP distance over ORFs that match better than the chosen BLASTP cutoff e-value.
  - For each ORF in the smaller genome of the pair, the reciprocal best match is found in the larger genome. The ratio of this best match's score to the score of the smaller-genome ORF's self-match is added to a running sum. Once the sum is complete, it is divided by the number of matches between the two genomes to give an estimate of similarity. The distance is then 1.0 minus this estimate. As described in our papers, one may filter out phylogenetically discordant sequences (Clarke et al. 2002. J. Bacteriol. 184, 2072-2080), select genes by functional category (Charlebois et al. 2004. In: Organelles, Genomes and Eukaryote Phylogeny: An Evolutionary Synthesis in the Age of Genomics, CRC Press, pp. 189-206), or weight contributions by concordance or prevalence (Gophna et al. 2005. J. Bacteriol. 187, 1305-1316).
  - Bootstrapping, available only for this method, is analogous to the bootstrap used for sequence alignments. The first matrix output uses every ORF shared between a pair of genomes; each subsequent matrix resamples these shared ORFs, with replacement, and recomputes the mean distance.
  - Jackknifing, also available only for this method, produces a series of distance matrices, each computed from a random sample (without replacement) of the ORFs shared between two genomes. The proportion of shared ORFs sampled is specified by the user and defaults to 50%.
- The second method estimates the distance from the proportion of matching ORFs between two genomes.
  - For each pair of genomes, the number of ORFs in the first genome that match something in the second genome better than the chosen BLASTP cutoff e-value is counted, then divided by the total number of ORFs in the first genome. The distance is 1.0 minus this proportion.
  - Note that this distance is asymmetric: because the proportion is taken relative to the first genome, the distance between A and B is generally not the same as that between B and A.
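The arithmetic of the two methods can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code used by the server: all function names are hypothetical, the input to the first method is assumed to be a list of best-match/self-match ratios already computed for the ORFs shared by one pair of genomes, and a pair with no shared ORFs is assumed to receive the maximal distance of 1.0.

```python
import random


def mean_normalized_distance(ratios):
    """First method: 1.0 minus the mean best-match/self-match ratio."""
    if not ratios:
        return 1.0  # assumption: no shared ORFs gives the maximal distance
    return 1.0 - sum(ratios) / len(ratios)


def bootstrap_distances(ratios, replicates, rng):
    """Full-data distance first, then one distance per bootstrap replicate."""
    distances = [mean_normalized_distance(ratios)]
    for _ in range(replicates):
        # Resample the shared ORFs with replacement, keeping the same count.
        sample = [rng.choice(ratios) for _ in ratios]
        distances.append(mean_normalized_distance(sample))
    return distances


def jackknife_distances(ratios, replicates, rng, proportion=0.5):
    """One distance per jackknife replicate, sampling without replacement."""
    # Assumption: the sample size is rounded and is at least 1 when possible.
    k = min(len(ratios), max(1, round(len(ratios) * proportion)))
    return [
        mean_normalized_distance(rng.sample(ratios, k))
        for _ in range(replicates)
    ]


def proportion_distance(n_matching, n_orfs_first):
    """Second method: 1.0 minus the matching proportion of the first genome."""
    return 1.0 - n_matching / n_orfs_first


# Hypothetical ratios for four ORFs shared by one pair of genomes.
ratios = [1.0, 0.5, 0.75, 0.25]
print(mean_normalized_distance(ratios))  # 0.375

rng = random.Random(0)  # seeded only so that the sketch is repeatable
print(bootstrap_distances(ratios, 3, rng))
print(jackknife_distances(ratios, 3, rng))

# Asymmetry of the second method: 25 ORFs match in both directions, but
# genome A has 100 ORFs and genome B has 50.
print(proportion_distance(25, 100))  # 0.75 (A against B)
print(proportion_distance(25, 50))   # 0.5  (B against A)
```

In a full run, each of these values would fill one cell of a distance matrix, and each bootstrap or jackknife replicate would yield one complete matrix.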
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
