Description of the genome query for: Gene distance matrix
The user is prompted to enter a protein PID number (gi number). Type it in, or copy and paste it from the results of a previous query.
Three options are available:
1.0 - ((A vs. B)/(A vs. A) + (B vs. A)/(B vs. B)) / 2,
for each of the n(n - 1)/2 pairs of ORFs. Note that although every ORF matches the specified ORF by definition, the ORFs do not necessarily all match one another. The user-specified cutoff is ignored when cross-matching the ORFs; an implicit cutoff of 1.0e-5 is applied instead. Where two ORFs do not match each other better than 1.0e-5, a distance of 1.0 is returned.
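As a rough illustration of the formula above, the following Python sketch fills in a symmetric matrix for all n(n - 1)/2 pairs. It is not the server's code. It assumes that "A vs. B" denotes a similarity score (for example a BLASTP score) of query A against subject B, which this page does not state explicitly. The names pair_distance, distance_matrix and the score dictionary are hypothetical, and treating a pair as unmatched when either direction is missing is likewise an assumption.

```python
from itertools import combinations


def pair_distance(score, a, b):
    """Distance between ORFs a and b.

    score maps (query, subject) -> similarity score (assumed meaning of
    "A vs. B"). A missing entry is taken to mean the pair did not match
    better than the implicit 1.0e-5 cutoff (assumption).
    """
    ab = score.get((a, b))
    ba = score.get((b, a))
    if ab is None or ba is None:
        return 1.0  # no match better than the cutoff
    # 1.0 - ((A vs. B)/(A vs. A) + (B vs. A)/(B vs. B)) / 2
    return 1.0 - (ab / score[(a, a)] + ba / score[(b, b)]) / 2


def distance_matrix(ids, score):
    """Symmetric matrix of distances, with 0.0 on the diagonal."""
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):  # the n(n - 1)/2 pairs
        d = pair_distance(score, ids[i], ids[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix
```

With made-up scores A vs. A = 100, B vs. B = 200, A vs. B = 50 and B vs. A = 60, the two normalized scores are 0.5 and 0.3, so the distance is 1.0 - 0.4 = 0.6.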
The labels within the distance matrix are the gi numbers of the proteins. Although this isn't pretty, it is necessary. We could not use genome names as labels, since the third option (gene families within a genome) would generate a matrix in which every entry bears the same label. Consequently, after the distance matrix, we provide a description of what each gi number represents and which genome it comes from. If you have any suggestions for improving the labels, let us know.
Example:
If one specifies 7190047 as the gi number and 1.0e-10 as the BLASTP cutoff e-value, and asks for reciprocal best matches, one gets (as of Dec. 5, 2002) a distance matrix of 40 presumed RecB orthologs, all from Bacteria. This matrix may then be used as input to a treeing algorithm of choice.
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
