Description of the genome query for
Mean values
For each ORF in the chosen genome, the value of the chosen character is added to a running sum if it falls between the minimum and maximum values entered by the user or, depending on the selector chosen, if it falls outside this range. The defaults for the minimum and maximum are -1e7 and +1e7, respectively. The sum is then divided by the number of ORFs that contributed to it, giving the mean. The standard deviation associated with this mean is also reported to the user.
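The calculation described above can be sketched in a few lines of Python. This is only an illustration, not the server's actual code: the function name `mean_of_character`, the `inside` flag standing in for the selector, the inclusive treatment of the range limits, and the use of the sample standard deviation are all assumptions.

```python
from statistics import mean, stdev

def mean_of_character(values, minimum=-1e7, maximum=1e7, inside=True):
    """Return (mean, standard deviation, number of ORFs used).

    values  -- one value of the chosen character per ORF
    inside  -- True keeps values within [minimum, maximum],
               False keeps values outside that range (assumed selector)
    """
    selected = [v for v in values if (minimum <= v <= maximum) == inside]
    if not selected:
        return None, None, 0
    # Sample standard deviation is an assumption; it needs at least two values.
    sd = stdev(selected) if len(selected) > 1 else 0.0
    return mean(selected), sd, len(selected)

# Hypothetical ORF lengths in base pairs.
lengths = [900, 1200, 1500, 3000]
print(mean_of_character(lengths, 1000, 2000))                # ORFs inside the range
print(mean_of_character(lengths, 1000, 2000, inside=False))  # ORFs outside the range
```

With the default limits every ORF is used, so the result is the plain genome-wide mean.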
The characters available:
- ORF length, in base pairs including the stop codon.
- ORF replicon number, e.g. if you want ORFs from a specific plasmid, or a sequential group of plasmids. See here for replicon numbering information.
- ORF product molecular weight, in Daltons.
- ORF mol% G+C.
- ORF mol% G+C, first codon position.
- ORF mol% G+C, second codon position.
- ORF mol% G+C, third codon position.
- ORF mol% A+G.
- ORF product isoelectric point (pI), predicted from the sequence.
- ORF product proportion of hydrophobic amino acids, these being Ala, Ile, Leu, Met, Phe, Pro, Trp, Val.
- ORF product percent net charge, computed as %(Arg+Lys) - %(Asp+Glu).
- ORF product mean amino acid composition.
- ORF mean codon usage.
- ORF mean oligonucleotide composition.
- number of archaeal genomes reciprocally best matched.
- number of bacterial genomes reciprocally best matched.
- number of eukaryal genomes reciprocally best matched.
- how well the ORF product matches outside of taxonomic level 1 (e.g., Bacteria).
- how well the ORF product matches outside of taxonomic level 2 (e.g., Proteobacteria).
- how well the ORF product matches outside of taxonomic level 3 (e.g., gamma subdivision).
- how well the ORF product matches outside of taxonomic level 4 (e.g., Enterobacteriaceae).
- evolutionary scope.
- PDS analysis at 1.0e-5.
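Several of the sequence-derived characters in this list are simple to compute from an ORF and its product. The sketch below illustrates three of them under stated assumptions: the function names are hypothetical, the G+C calculation uses every base it is given (including the stop codon), and the hydrophobic proportion is returned as a fraction rather than a percentage. The server's own conventions may differ.

```python
# One-letter codes for Ala, Ile, Leu, Met, Phe, Pro, Trp, Val.
HYDROPHOBIC = set("AILMFPWV")

def gc_percent(seq, position=None):
    """mol% G+C of a nucleotide sequence.

    position -- None for all bases, or 1, 2, 3 for a single codon position.
    """
    seq = seq.upper()
    bases = seq if position is None else seq[position - 1::3]
    if not bases:
        return 0.0
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)

def hydrophobic_proportion(protein):
    """Fraction of residues that are in the hydrophobic set."""
    protein = protein.upper()
    return sum(aa in HYDROPHOBIC for aa in protein) / len(protein)

def percent_net_charge(protein):
    """%(Arg+Lys) - %(Asp+Glu) over the whole product."""
    protein = protein.upper()
    positive = sum(aa in "RK" for aa in protein)
    negative = sum(aa in "DE" for aa in protein)
    return 100.0 * (positive - negative) / len(protein)

# Toy ORF and product, for illustration only.
print(gc_percent("ATGGCCTAA"), gc_percent("ATGGCCTAA", position=3))
print(hydrophobic_proportion("MKKE"), percent_net_charge("MKKE"))
```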
A few of these characteristics require further explanation:
- (xviii - xxi): How well the ORF product matches outside of the genome's own lineage, at the specified taxonomic depth.
- For instance, choosing taxonomy level 3 for Escherichia coli K12 would report the mean, over all ORFs, of each ORF's best match outside the gamma subdivision of the Proteobacteria.
- (xxii): Evolutionary scope.
- Using the genomic distance matrix computed at a BLASTP cutoff e-value of 1.0e-5,
we compute the evolutionary scope of an ORF's matches. For example, if an ORF from
Mycoplasma genitalium matches one or more ORFs from Mycoplasma pneumoniae,
Mycoplasma penetrans and Ureaplasma urealyticum, then that ORF's
evolutionary scope is the mean distance between M. genitalium and these other
three species.
Since a genome-specific ORF yields an evolutionary scope of 0.0, you may wish to set the minimum value for this parameter to a very small number greater than 0.0.
- (xxiii): PDS analysis.
- We determine how well a gene's relationships to its orthologs in other
species correlate with the genome's overall relationships to orthologous
genes from other species. A low value
indicates a "phylogenetically discordant sequence", or "PDS". A high value
indicates that the gene's relationships track the genome's relationships
well. Details of this analysis are available in:
Clarke et al. (2002) J. Bacteriol. 184, 2072-2080.
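The evolutionary scope (xxii) described above is a mean over pairwise genome distances, which can be sketched as follows. The function name, the dictionary layout of the distance matrix, and the distance values are all hypothetical; only the averaging rule and the 0.0 result for a genome-specific ORF come from the description.

```python
def evolutionary_scope(query_genome, matched_genomes, distance):
    """Mean genomic distance from the query genome to every other genome
    in which the ORF has at least one match.

    distance -- dict keyed by frozenset({genome_a, genome_b})
    """
    others = {g for g in matched_genomes if g != query_genome}
    if not others:
        return 0.0  # genome-specific ORF: no matches elsewhere
    total = sum(distance[frozenset((query_genome, g))] for g in others)
    return total / len(others)

# Made-up distances, for illustration only.
distance = {
    frozenset(("M. genitalium", "M. pneumoniae")): 0.2,
    frozenset(("M. genitalium", "M. penetrans")): 0.4,
    frozenset(("M. genitalium", "U. urealyticum")): 0.6,
}
matches = ["M. pneumoniae", "M. penetrans", "U. urealyticum"]
print(evolutionary_scope("M. genitalium", matches, distance))
print(evolutionary_scope("M. genitalium", [], distance))
```

Setting the query's minimum slightly above 0.0, as suggested above, excludes the ORFs for which the second call's result applies.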
Examples:
- The most useful application of this query is to extract a subset of ORFs from the genome for subsequent analysis using another query. For instance, to analyse ORFs from Escherichia coli K12 whose products have a predicted isoelectric point between 5.2 and 5.4, we would select Escherichia coli K12 from the genomic pop-up menu, select "ORF product isoelectric point" from the character pop-up menu, and type 5.2 and 5.4 into the minimum and maximum text boxes, respectively.
- Some means are inherently interesting on their own: the mean codon usage, for instance. Selecting Mycoplasma pneumoniae from the genomic pop-up menu and "ORF mean codon usage" from the character pop-up menu (and leaving the minimum and maximum text-entry boxes blank), one retrieves a codon-usage table for the entire set of M. pneumoniae ORFs. We could also examine codon usage separately for high G+C genes and low G+C genes: run this query once to retrieve the ORFs in the desired G+C range, then use those results as input to the codon-usage query.
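The two-step workflow in the last example, selecting ORFs by G+C range and then tabulating codon usage for that subset, might look like this offline. It is a minimal sketch that assumes the ORFs are available as plain nucleotide strings; the function names are hypothetical and the toy sequences are not real M. pneumoniae ORFs.

```python
from collections import Counter

def gc_percent(seq):
    """mol% G+C over all bases of the sequence."""
    seq = seq.upper()
    return 100.0 * sum(b in "GC" for b in seq) / len(seq)

def codon_usage(orfs):
    """Count every in-frame codon across a collection of ORFs."""
    counts = Counter()
    for seq in orfs:
        seq = seq.upper()
        usable = len(seq) - len(seq) % 3  # ignore any trailing partial codon
        for i in range(0, usable, 3):
            counts[seq[i:i + 3]] += 1
    return counts

def codon_usage_in_gc_range(orfs, minimum, maximum):
    """Step 1: keep ORFs whose G+C lies in the range. Step 2: tabulate codons."""
    subset = [seq for seq in orfs if minimum <= gc_percent(seq) <= maximum]
    return codon_usage(subset)

# Toy sequences, for illustration only.
orfs = ["ATGAAATAA", "ATGGCCGGCTGA"]
print(codon_usage_in_gc_range(orfs, 0, 40))    # low G+C genes
print(codon_usage_in_gc_range(orfs, 60, 100))  # high G+C genes
```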
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
