Description of the genome query for
Lineage-specific and Species-specific ORFs
A) Lineage-specific ORFs:
For each ORF in the chosen genome, or for each ORF in the "results from my last query,"
For each of the ORF's matches to the database that is better than or equal
to the specified BLASTP cutoff e-value, a check is made to see if the hit
is to a sequence from a species outside the chosen lineage.
If none of the hits are to sequences from species outside the lineage, and if this ORF has at least two hits to the database, the ORF is reported. This final constraint (at least two hits) prevents the reporting of single-copy species-specific ORFs.
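The lineage-specific test described above can be sketched in a few lines of Python. This is only an illustration, not the database's actual implementation: the function name, the `hits_by_orf` dictionary (ORF name mapped to a list of `(species, e-value)` pairs), and the `lineage_species` set are all hypothetical. It also assumes that the "at least two hits" rule counts only hits that pass the e-value cutoff, which the description does not state explicitly.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation of one BLASTP hit:
# (species of the matched sequence, e-value of the match)
Hit = Tuple[str, float]


def lineage_specific_orfs(
    hits_by_orf: Dict[str, List[Hit]],
    lineage_species: Set[str],
    cutoff: float = 1.0e-5,
) -> List[str]:
    """Return the ORFs whose qualifying hits all fall inside the lineage."""
    reported = []
    for orf, hits in hits_by_orf.items():
        # Keep only hits better than or equal to the cutoff e-value.
        good = [(species, e) for species, e in hits if e <= cutoff]
        # Assumption: the two-hit minimum applies to qualifying hits only.
        if len(good) < 2:
            continue
        # Report the ORF only if no qualifying hit lies outside the lineage.
        if all(species in lineage_species for species, _ in good):
            reported.append(orf)
    return reported
```

Under these assumptions, an ORF that matches only itself has a single qualifying hit and is skipped, which is how the two-hit rule keeps single-copy species-specific ORFs out of the result.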
B) Species-specific ORFs:
For each ORF in the chosen genome, or for each ORF in the "results from my last query,"
For each of the ORF's matches to the database that is better than or equal to the specified BLASTP cutoff e-value, a check is made to see if the hit is to a sequence from a species other than the chosen genome's species.
If none of the hits are to sequences from species other than the genome's own, the ORF is reported.
C) Non-species-specific ORFs:
This query is exactly the same as that for "Species-specific ORFs" except that the ORFs reported are those that do match a sequence from another species.
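Because queries B and C apply the same test and differ only in which outcome they report, one pass over the ORFs can produce both lists. The sketch below is a hypothetical illustration, not the database's code: `split_by_species`, `hits_by_orf` (ORF name mapped to `(species, e-value)` pairs), and `own_species` are assumed names. It treats an ORF with no qualifying hits at all as species-specific, which follows from a literal reading of "none of the hits are to sequences from species other than the genome's own."

```python
from typing import Dict, List, Tuple

# Hypothetical representation of one BLASTP hit:
# (species of the matched sequence, e-value of the match)
Hit = Tuple[str, float]


def split_by_species(
    hits_by_orf: Dict[str, List[Hit]],
    own_species: str,
    cutoff: float = 1.0e-5,
) -> Tuple[List[str], List[str]]:
    """Return (species_specific, non_species_specific) ORF lists."""
    specific: List[str] = []
    shared: List[str] = []
    for orf, hits in hits_by_orf.items():
        # Does any hit at or below the cutoff come from another species?
        has_foreign_hit = any(
            e <= cutoff and species != own_species for species, e in hits
        )
        if has_foreign_hit:
            shared.append(orf)      # reported by query C
        else:
            specific.append(orf)    # reported by query B
    return specific, shared
```

Every ORF lands in exactly one of the two lists, so the results of queries B and C together cover the whole input set.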
Examples:
- Say that we are interested in finding the bacterial-specific ORFs from the Mycoplasma genitalium genome. These are genes that are present only in Bacteria (not in Archaea or Eukaryota). By selecting a BLASTP cutoff e-value of 1.0e-5, we are asking the database to exclude any ORF that has even a weak match (e-value of 1.0e-5 or better) outside the Bacteria. We select Mycoplasma genitalium from the pop-up menu, and select taxonomic level 1. Clicking the Submit button (on May 12, 2002), we get 187 of M. genitalium's 480 ORFs. These include a number of annotated genes, as well as numerous unknowns. Since some of these ORFs may actually be specific to more restricted sub-bacterial lineages, we can try selecting different taxonomic levels: Low G+C Firmicutes (level 2): 107; Bacillus/Clostridium group (level 3): 107; and Mollicutes (level 4): 93. Thus 93 ORFs are restricted to the Mollicutes, 14 are found in other Bacillus/Clostridium group members, none are found in other Low G+C Firmicutes, and 80 are found in other Bacteria.
- In the first example above, we found 187 ORFs that were bacterial-specific in Mycoplasma genitalium. If we are interested in knowing how many of M. genitalium's ORFs are specific to M. genitalium itself, we can try the "Species-specific" option. Again selecting a BLASTP cutoff e-value of 1.0e-5, we get (on May 12, 2002) four ORFs, all "hypothetical". These ORFs may or may not be real genes: a match to the databases is a good sign that an ORF is real, whereas the absence of a match might indicate either that the ORF is truly a species-specific gene or that it is an annotation artifact (a "statistical ORF").
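The breakdown at the end of the first example (93, 14, none, 80) comes from subtracting each level's count from the count at the next broader level. A minimal sketch of that arithmetic, using the counts quoted in the example; the function and variable names are hypothetical:

```python
from typing import List, Tuple

# Lineage-specific ORF counts from the first example, broadest level first.
LEVEL_COUNTS: List[Tuple[str, int]] = [
    ("Bacteria", 187),                    # level 1
    ("Low G+C Firmicutes", 107),          # level 2
    ("Bacillus/Clostridium group", 107),  # level 3
    ("Mollicutes", 93),                   # level 4
]


def breakdown(counts: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """ORFs whose broadest distribution is each level, broadest first."""
    result = []
    # ORFs specific to a level but not to the next narrower level must
    # have a match somewhere else within the broader level.
    for (name, n), (_, n_narrower) in zip(counts, counts[1:]):
        result.append((name, n - n_narrower))
    # The narrowest level keeps its full count.
    result.append(counts[-1])
    return result


for name, n in breakdown(LEVEL_COUNTS):
    print(f"{name}: {n}")
```

The four values sum to 187, the level 1 total, so every bacterial-specific ORF is assigned to exactly one level.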
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
