Description of the genome query: Gene distance matrix

The user is prompted to enter a protein identifier (PID), i.e. the protein's gi number. Type it in, or copy and paste it from the results of a previous query.

Three options are available:

Best match in other genomes collects the specified ORF and, from each of the database's other genomes, its best BLASTP hit with an e-value better than the chosen cutoff. It then computes the normalized BLASTP-based distance
1.0 - ((A vs. B)/(A vs. A) + (B vs. A)/(B vs. B)) / 2,
where A vs. B is the BLASTP score of ORF A as query against ORF B, for each of the n(n - 1)/2 pairs among the n ORFs collected. Note that although every ORF matches the specified ORF by definition, they do not necessarily all match one another. The specified cutoff is ignored when cross-matching the ORFs; an implicit cutoff of 1.0e-5 is used instead. Where two ORFs do not match each other better than 1.0e-5, a distance of 1.0 is returned.
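The distance formula can be sketched as follows. This is a minimal illustration, not the service's code: the `score` dictionary, keyed by (query, subject) pairs, and the function names `normalized_distance` and `distance_matrix` are hypothetical, and the scores in the example are made up.

```python
from itertools import combinations

# Hypothetical input format (not the service's own): score[(a, b)] holds the
# BLASTP score of ORF a (query) against ORF b (subject). A missing pair means
# the two ORFs did not match better than the implicit 1.0e-5 cutoff.


def normalized_distance(score, a, b):
    """1.0 - ((A vs. B)/(A vs. A) + (B vs. A)/(B vs. B)) / 2"""
    ab = score.get((a, b), 0.0)  # no match is scored as 0
    ba = score.get((b, a), 0.0)
    return 1.0 - (ab / score[(a, a)] + ba / score[(b, b)]) / 2.0


def distance_matrix(score, orfs):
    """Symmetric n x n matrix; only the n(n - 1)/2 distinct pairs are computed."""
    n = len(orfs)
    d = [[0.0] * n for _ in range(n)]  # diagonal stays 0.0 (self-distance)
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = normalized_distance(score, orfs[i], orfs[j])
    return d


# Made-up scores for three ORFs; C matches neither A nor B.
score = {("A", "A"): 200.0, ("B", "B"): 100.0, ("C", "C"): 50.0,
         ("A", "B"): 80.0, ("B", "A"): 60.0}
print(distance_matrix(score, ["A", "B", "C"]))
# → [[0.0, 0.5, 1.0], [0.5, 0.0, 1.0], [1.0, 1.0, 0.0]]
```

Scoring a missing match as 0 makes a pair that matches in neither direction come out at exactly 1.0, as described above. Dividing each score by the query's self-score and averaging both directions keeps the matrix symmetric even though A vs. B and B vs. A generally differ.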

Reciprocal best match in other genomes is like the previous option, but the specified protein must in turn be the target's best match in the specified protein's genome. We currently do not handle ties, so the set of ORFs returned may be less complete than it should be, and the distance matrix may therefore be smaller than expected.
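A reciprocal best-match filter might look like the sketch below. It assumes a hypothetical `hits` table of (subject, genome, score, e-value) tuples and hypothetical function names; it mirrors the tie limitation described above by taking whichever top-scoring hit `max()` sees first, so a target whose best hit back is a tie with another protein can be dropped.

```python
# Hypothetical input format: hits[q] is a list of
# (subject, subject_genome, score, evalue) tuples for query protein q.


def best_hit_in_genome(hits, query, genome, cutoff):
    """Highest-scoring subject of `query` in `genome` with evalue < cutoff, or None.

    Ties are not resolved: max() keeps the first top-scoring hit it encounters.
    """
    candidates = [(subject, score) for subject, g, score, evalue in hits.get(query, [])
                  if g == genome and evalue < cutoff]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[1])[0]


def reciprocal_best_matches(hits, query, query_genome, other_genomes, cutoff):
    """Targets whose own best hit back in query_genome is the query itself."""
    result = []
    for genome in other_genomes:
        target = best_hit_in_genome(hits, query, genome, cutoff)
        back = best_hit_in_genome(hits, target, query_genome, cutoff) if target else None
        if back == query:
            result.append(target)
    return result


# Made-up hits: y1's best match back in G0 is q2, not q, so y1 is excluded.
hits = {
    "q":  [("x1", "G1", 300, 1e-50), ("x2", "G1", 250, 1e-40), ("y1", "G2", 100, 1e-20)],
    "x1": [("q", "G0", 290, 1e-50)],
    "y1": [("q2", "G0", 120, 1e-25), ("q", "G0", 110, 1e-22)],
}
print(reciprocal_best_matches(hits, "q", "G0", ["G1", "G2"], 1e-10))  # → ['x1']
```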

Gene family within a genome considers the specified ORF itself plus any and all matches better than the chosen cutoff e-value within the same genome.
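The selection rule for this option can be sketched as a simple filter. The `hits` layout and the `gene_family` name are hypothetical, and the hit data in the example is invented for illustration.

```python
# Hypothetical input format: hits[q] is a list of
# (subject, subject_genome, score, evalue) tuples for query protein q.


def gene_family(hits, query, genome, cutoff):
    """The query itself plus every hit in its own genome with evalue < cutoff."""
    family = [query]  # the specified ORF is always included
    for subject, subject_genome, _score, evalue in hits.get(query, []):
        if subject_genome == genome and evalue < cutoff and subject not in family:
            family.append(subject)
    return family


# Made-up hits: p2 misses the cutoff and x1 lies in another genome.
hits = {"q": [("q", "G0", 500, 0.0), ("p1", "G0", 200, 1e-30),
              ("p2", "G0", 80, 1e-6), ("x1", "G1", 300, 1e-50)]}
print(gene_family(hits, "q", "G0", 1e-10))  # → ['q', 'p1']
```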

The labels within the distance matrix are the gi numbers of the proteins. Although this isn't pretty, it is necessary: we could not use genome names as labels, since the third option (gene family within a genome) would produce a matrix in which every entry bore the same label. Consequently, after the distance matrix we list what each gi number represents and which genome it comes from. If you have any suggestions for improving the labels, let us know.

Example:

If one specifies 7190047 as the gi number and 1.0e-10 as the BLASTP cutoff e-value, and asks for reciprocal best matches, one gets (as of Dec. 5, 2002) a distance matrix of 40 presumed RecB orthologs, all from Bacteria. This matrix may then be used as input to a tree-building algorithm of choice.
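Many tree-building programs, such as those in the PHYLIP package, read distances in a square matrix layout: a line giving the number of entries, then one row per entry whose first 10 characters are the name. The sketch below assumes that layout; `to_phylip` is a hypothetical helper, and the second gi number and the distances are made up.

```python
def to_phylip(labels, matrix):
    """Square PHYLIP-style distance matrix: a count line, then one row per label.

    The first 10 characters of each row are taken as the name, so labels are
    padded (or, if longer, truncated) to 10 characters.
    """
    lines = [f"{len(labels):5d}"]
    for label, row in zip(labels, matrix):
        name = str(label)[:10].ljust(10)
        lines.append(name + " " + " ".join(f"{x:.4f}" for x in row))
    return "\n".join(lines) + "\n"


# Made-up two-protein matrix labelled with gi numbers (123456 is hypothetical).
print(to_phylip([7190047, 123456], [[0.0, 0.5], [0.5, 0.0]]), end="")
```

Keeping the gi numbers as row names means the legend printed after the matrix can still be used to identify each leaf of the resulting tree.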

Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
