A Complete List of Available Analyses, Queries & Utilities
Below is a list of the genomic analyses, queries and utilities available through NGI's bioinformatics web service. Each entry is described briefly here; a more detailed description is available by clicking the "more information..." link on the relevant query page.
Lineage-specific and species-specific ORFs.
For any of the sequenced genomes, we identify all of the protein-coding genes that are specific to the genome's own lineage, at the specified depth and with the specified stringency.
Species-specific ORFs are sequences which cannot be found in any other species at the specified stringency,
but may be found in other genomes of the same species.
Comparative inventories of sequenced genomes.
We provide lists of ORFs which are shared, or not shared, between genomes
at a specified stringency.
Multigenomic comparisons.
View a table showing the number of homologs of a genome's ORFs found in each
of the sequenced genomes.
ORFs found in a set of genomes.
Count and find ORFs that are present or not present in various subsets of a defined
set of genomes.
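Counting ORFs by their presence or absence across a set of genomes can be sketched roughly as below. This is only an illustration, not the service's own code: the function name `presence_patterns` and the input layout (a mapping from ORF identifier to the set of genomes holding a homolog) are assumptions.

```python
from collections import Counter

def presence_patterns(presence, genomes):
    """Count ORFs by presence/absence pattern across `genomes`.

    presence: hypothetical dict, ORF id -> set of genome names with a homolog.
    genomes:  ordered list of genome names; defines the pattern columns.
    """
    counts = Counter()
    for found_in in presence.values():
        # One boolean per genome, e.g. (True, False) = present in 1st only.
        pattern = tuple(g in found_in for g in genomes)
        counts[pattern] += 1
    return counts
```

Each key of the returned counter is one subset of the defined genome set, so the ORFs belonging to any particular subset can be counted or listed from the same table.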
ORFs shared exclusively with one or more lineages.
Find ORFs that occur in the selected genome and in zero or more other
lineages, at the chosen NCBI taxonomic level. You can specify an upper cutoff for inclusion,
as well as a lower cutoff for exclusion. For example, one can find the ORFs
from Thermotoga maritima which are present in one other phylum of Bacteria
at a BLASTP cutoff of 1.0e-20, but not present in any other phylum of Bacteria
at a BLASTP cutoff of 1.0e-10. Furthermore, you can specify whether to ignore
or to exclude matches in outer lineages (in the example above, matches outside
the Bacteria).
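The two-cutoff logic in the example above can be sketched as follows. This is a minimal sketch under stated assumptions, not the service's implementation: the function name `shared_exclusively` and its input (the best BLASTP E-value per lineage for a single ORF) are hypothetical, and matches in outer lineages are assumed to have been filtered out beforehand.

```python
def shared_exclusively(best_evalue_by_lineage, n_lineages=1,
                       include_cutoff=1e-20, exclude_cutoff=1e-10):
    """Return True if the ORF is present in exactly `n_lineages` other
    lineages at `include_cutoff` and absent from all remaining lineages
    at the looser `exclude_cutoff`.

    best_evalue_by_lineage: hypothetical dict, lineage -> best E-value.
    """
    present = {lin for lin, e in best_evalue_by_lineage.items()
               if e <= include_cutoff}
    # Lineages that fail inclusion but still hit at the exclusion cutoff
    # fall in the "grey zone" and disqualify the ORF.
    grey_zone = {lin for lin, e in best_evalue_by_lineage.items()
                 if lin not in present and e <= exclude_cutoff}
    return len(present) == n_lineages and not grey_zone
```

Using two different cutoffs leaves a gap between "clearly present" and "clearly absent", so weak matches do not get counted as absences.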
Gene families.
For any of the sequenced genomes, lists are generated grouping ORFs based on BLASTP similarity.
The search is iterative, since members of a family may not all "connect" to
one another above the chosen level of stringency, the relationship only being
apparent through intermediates.
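Grouping ORFs that connect only through intermediates amounts to finding connected components in a graph of pairwise hits. The sketch below illustrates that idea; it is not the service's actual procedure, and the function name `gene_families` and the input (pairs of ORFs whose BLASTP match already passes the chosen stringency) are assumptions.

```python
from collections import defaultdict

def gene_families(pairs):
    """Group ORFs into families by following chains of pairwise hits.

    pairs: iterable of (orf_a, orf_b) hits that pass the chosen cutoff.
    Returns a list of families, each a sorted list of ORF ids.
    """
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, families = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        family, frontier = set(), [start]
        # Repeatedly add the neighbours of every newly added member.
        while frontier:
            orf = frontier.pop()
            if orf in family:
                continue
            family.add(orf)
            frontier.extend(neighbours[orf] - family)
        seen |= family
        families.append(sorted(family))
    return families
```

For example, `gene_families([("a", "b"), ("b", "c")])` places `a` and `c` in the same family even though they never match each other directly.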
Gene clusters.
For any pair of sequenced genomes, we provide lists of ORFs which occur near
each other in both genomes.
Genetic mosaicism.
We provide lists of a genome's ORFs whose best match outside the genome's
own lineage is to another, specified, lineage. For instance, which of the
ORFs from Methanococcus jannaschii have their best match outside the Archaea
in the Proteobacteria?
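The test applied to each ORF can be sketched as below. The function name `is_mosaic`, the use of bit scores to rank matches, and the representation of each hit's lineage as a tuple of taxon names are all assumptions made for illustration.

```python
def is_mosaic(hits, own_lineage, target_lineage):
    """Return True if the ORF's best match outside `own_lineage`
    falls within `target_lineage`.

    hits: hypothetical list of (lineage_path, bit_score), where
          lineage_path is a tuple such as ("Bacteria", "Proteobacteria").
    """
    outside = [(score, path) for path, score in hits
               if own_lineage not in path]
    if not outside:
        return False  # no match outside the genome's own lineage
    _, best_path = max(outside, key=lambda item: item[0])
    return target_lineage in best_path
```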
Select ORFs by functional category.
The user selects a genome, or the results from his or her last query, and
selects a functional category or a group of related functional categories.
The query then returns each of the ORFs satisfying the request.
Sorted lists of protein characteristics.
A genome's proteins can be sorted by a chosen characteristic (e.g., ORF length, MW, position,
mol%G+C, mol%A+G, pI). You may also choose to display only the top,
middle, or bottom n% of the ORFs sorted according to that characteristic.
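Selecting the top, middle, or bottom n% of a sorted list might look like the sketch below. The function name `slice_by_rank`, the convention that "top" means the largest values, and the rounding of n% to a whole number of ORFs are assumptions, not documented behaviour of the service.

```python
def slice_by_rank(values, percent, where="top"):
    """Return ORF ids in the top, middle, or bottom `percent` of the list.

    values: hypothetical dict, ORF id -> numeric characteristic (e.g. length).
    """
    ordered = sorted(values, key=values.get, reverse=True)  # largest first
    k = max(1, round(len(ordered) * percent / 100))
    if where == "top":
        return ordered[:k]
    if where == "bottom":
        return ordered[-k:]
    if where == "middle":
        start = (len(ordered) - k) // 2
        return ordered[start:start + k]
    raise ValueError("where must be 'top', 'middle' or 'bottom'")
```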
Protein characteristics plotted against each other.
These plots can be generated for proteins within a genome, or for homologous
proteins between genomes.
You can examine, for instance, the relationship between a protein's length
and the strength of its match to something in the database, or you can view
pI vs. hydrophobicity, etc.
Between genomes, are homologous proteins smaller in one than the other? More
acidic?
Plot of cumulative strand bias by position.
We plot, for the chosen replicon of the chosen genome, the cumulative sum
of one of the ratios (A-C)/(A+C), (A-G)/(A+G), (A-T)/(A+T) or (G-C)/(G+C), or the
cumulative sum of locus orientation (rightward = 1; leftward = -1), along
the replicon. For instance, if you plot (G-C)/(G+C) for Escherichia coli,
you will clearly see the location of the origin of replication (the graph's
minimum) and the location of the terminus of replication (the graph's maximum).
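A cumulative skew curve of this kind can be computed as in the sketch below. It is an illustration only: the function name `cumulative_skew` and the choice to evaluate the ratio over fixed-size, non-overlapping windows are assumptions, since the unit over which the service sums the ratio is not stated here.

```python
def cumulative_skew(seq, x="G", y="C", window=1000):
    """Cumulative sum of (x - y) / (x + y), one value per window.

    seq:    nucleotide sequence of the replicon (string).
    x, y:   the two bases compared, e.g. "G" and "C" for (G-C)/(G+C).
    window: assumed window size in bases.
    """
    seq = seq.upper()
    total, curve = 0.0, []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        nx, ny = chunk.count(x), chunk.count(y)
        if nx + ny:  # skip windows containing neither base
            total += (nx - ny) / (nx + ny)
        curve.append(total)
    return curve
```

Plotting the returned list against window position gives the curve; its minimum and maximum are the points of interest described above.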
Genomic dot plots.
Available for inspection are BLASTP-based dot plots for each of the sequenced
microbial genomes. Within a genome, a dot plot illustrates duplicated sequences:
repeated sequences & gene families. Between genomes, dot plots reveal
levels of conservation of sequence and of gene order.
Organism phylogenies.
Using the mean BLASTP similarities of orthologous ORFs (at the specified level
of stringency), we construct a Phylip-format distance matrix with which you
can construct a phylogenetic tree.
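Turning mean similarities into a Phylip-format distance matrix could look roughly like this. The sketch makes several assumptions that are not taken from the service: similarities are normalized to the range 0 to 1, distance is taken as 1 minus the mean similarity, and the output uses the classic square layout with names padded to 10 characters. The function name `phylip_from_similarities` is hypothetical.

```python
from statistics import mean

def phylip_from_similarities(names, sims):
    """Build a square Phylip distance matrix as a string.

    names: list of genome names (truncated/padded to 10 characters).
    sims:  hypothetical dict, (i, j) with i < j -> list of normalized
           similarities (0..1) of the orthologous ORFs of genomes i and j.
    """
    n = len(names)
    lines = [f"{n:5d}"]  # first line: number of taxa
    for i in range(n):
        row = []
        for j in range(n):
            if i == j:
                d = 0.0
            else:
                # Assumed conversion: distance = 1 - mean similarity.
                d = 1.0 - mean(sims[(min(i, j), max(i, j))])
            row.append(f"{d:.4f}")
        lines.append(f"{names[i][:10]:<10}" + " ".join(row))
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file and passed to a distance-based tree program that reads Phylip matrices.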
Triplet-controlled four-taxon tree analysis.
For any pair of sequenced genomes, we identify the protein-coding genes
suspected of having been involved in lateral gene transfer since the two
genomes diverged from one another. This query can also be used to collect information
on the distribution of topologies observed in four-taxon tree analysis.
Mean values.
Here, we calculate the mean value of a characteristic from a collection of
ORFs. You may, for instance, find ORFs which are lineage-specific,
then have this analysis determine their mean length, hydrophobicity, molecular
weight, etc. This query may also be used to extract the subset of ORFs whose value for the characteristic
falls within a specified range.
Intergenic distances.
Display the ORFs in map order, with orientation and distance between genes.
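A listing of this kind can be derived from ORF coordinates as sketched below. The function name `intergenic_distances` and the input layout (name, start, end, strand, with 1-based inclusive coordinates on a linear map) are assumptions for illustration.

```python
def intergenic_distances(orfs):
    """List neighbouring ORFs in map order with the gap between them.

    orfs: hypothetical list of (name, start, end, strand) tuples using
          1-based inclusive coordinates; strand is "+" or "-".
    Returns (left_name, left_strand, right_name, right_strand, gap) rows;
    a negative gap means the two ORFs overlap.
    """
    ordered = sorted(orfs, key=lambda orf: orf[1])  # sort by start position
    rows = []
    for left, right in zip(ordered, ordered[1:]):
        gap = right[1] - left[2] - 1
        rows.append((left[0], left[3], right[0], right[3], gap))
    return rows
```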
Logical NOT of the results from the last query.
This utility applies a logical NOT to the results of your last query or analysis.
Random Loci.
Find random clusters of loci from the selected genome.
List Best Matches.
List the best BLASTP matches for each ORF in a selected genome.
Competitive Matching.
Find genes that match a gene from a member of one set of genomes better than they match any gene from a
second set of genomes.
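The comparison can be sketched as below, assuming that the best match score of each gene against each of the two genome sets has already been computed. The function name `competitive_matches`, the use of bit scores, and the treatment of a missing match as a score of zero are assumptions.

```python
def competitive_matches(scores_a, scores_b):
    """Return genes whose best match in set A beats their best match in set B.

    scores_a, scores_b: hypothetical dicts, gene id -> best bit score
    against any genome in the first or second set respectively.
    A gene with no match in set B is treated as scoring 0 there.
    """
    return sorted(gene for gene, score in scores_a.items()
                  if score > scores_b.get(gene, 0.0))
```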
Retrieve Sequences.
Retrieve the sequences related to a sequence specified by its gi number.
Gene Distance Matrix.
For an ORF specified by its gi number, produce a normalized BLASTP-based distance
matrix of its homologs.
Similarity Histogram.
Provides a histogram of normalized BLASTP-based similarities of ORFs shared
by two genomes.
ORFs Closer than Expected.
Identifies possible cases of lateral gene transfer by finding ORFs whose sequences
are much closer to ORFs from one genome of a pair than the ORFs of that pair are to each
other.
Find Ubiquitous Genes.
Find genes from a selected genome or a selected group of genomes that are ubiquitously
distributed in a specified group of taxa.
List Named Ubiquitous Genes.
Find genes whose consensus gene name is found ubiquitously in a specified group of taxa.
Find Cross-Phylum Ubiquitous Genes.
Find genes that are present in most members of each prokaryotic phylum.
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
