Bioinformatics Recent News
June 4, 2007
The EMU server was down for most of May after a bug was found in the processing of pseudogenes. The bug assigned some properties of a pseudogene (particularly length information) to the following gene in the source GenBank file. The most obvious consequence was the assignment of G+C content values >100% to approximately 100 genes, scattered across about 70 genomes.
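As an illustration only (this is not EMU's code), the following Python sketch shows how a length inherited from the wrong feature can push a G+C percentage above 100%. The names `gc_percent` and `is_plausible_gc`, the example sequence, and the assumption that G+C content is computed as the G and C count divided by an annotated length are all hypothetical.

```python
def gc_percent(sequence: str, annotated_length: int) -> float:
    """Return G+C content as a percentage of the annotated gene length."""
    if annotated_length <= 0:
        raise ValueError("annotated length must be positive")
    seq = sequence.upper()
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / annotated_length


def is_plausible_gc(value: float) -> bool:
    """A G+C percentage outside 0-100 signals inconsistent input data."""
    return 0.0 <= value <= 100.0


# Hypothetical 10-base gene containing 8 G/C bases.
gene_sequence = "GGCCGGCCAT"

# Correct case: the length belongs to the gene itself.
correct = gc_percent(gene_sequence, len(gene_sequence))

# Faulty case: a length of 6 inherited from a preceding, shorter pseudogene.
faulty = gc_percent(gene_sequence, 6)

print(correct, is_plausible_gc(correct))
print(faulty, is_plausible_gc(faulty))
```

A range check of this kind is cheap to run over every gene after loading, and it would flag the symptom described above even when the underlying cause is elsewhere.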
Changes have been made such that source files for EMU are now processed via BioPerl and loaded into a relational database before being formatted for EMU. This change resolves the majority of problems with the reading of .gbk files. There are still some oddities in a handful of .gbk files that violate the assumptions made by BioPerl about the GenBank format, but these are detected and handled early in the process.
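The general shape of a "parse, check early, then load into a relational database" pipeline can be sketched as follows. This is a minimal illustration in Python with the standard `sqlite3` module, not the BioPerl-based process described above; the record layout, the `gene` table, and the `validate` and `load` functions are all hypothetical.

```python
import sqlite3


def validate(record):
    """Return a list of problems found in one parsed gene record."""
    problems = []
    if not record.get("locus_tag"):
        problems.append("missing locus_tag")
    start, end = record.get("start"), record.get("end")
    if start is None or end is None or start < 1 or end < start:
        problems.append("bad coordinates")
    return problems


def load(records, conn):
    """Insert valid records into the database; return the rejected ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gene ("
        "locus_tag TEXT PRIMARY KEY, start_pos INTEGER, "
        "end_pos INTEGER, is_pseudo INTEGER)"
    )
    rejected = []
    for rec in records:
        problems = validate(rec)
        if problems:
            # Oddities are caught here, before anything is formatted downstream.
            rejected.append((rec, problems))
            continue
        conn.execute(
            "INSERT INTO gene VALUES (?, ?, ?, ?)",
            (rec["locus_tag"], rec["start"], rec["end"],
             int(rec.get("pseudo", False))),
        )
    conn.commit()
    return rejected


# Hypothetical parsed records; the third has inverted coordinates.
records = [
    {"locus_tag": "g0001", "start": 1, "end": 900},
    {"locus_tag": "g0002", "start": 950, "end": 1200, "pseudo": True},
    {"locus_tag": "g0003", "start": 2000, "end": 1500},
]

conn = sqlite3.connect(":memory:")
rejected = load(records, conn)
loaded = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
```

Keeping the pseudogene flag as its own column means each feature carries its own properties, so nothing needs to be carried over from one record to the next.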
EMU is now up and running with 493 genomes; the full list can be viewed here. The 'extra' databases have been removed: of the three, Caenorhabditis elegans and Drosophila melanogaster have been added to the database as full genomes, while the set of viruses has been dropped for the time being. The viruses will ultimately be offered as part of an expanded service after hardware migration has been carried out, but for now they are not available for analysis.
Finally, a note on strain names: the NCBI Genome Project ID is now included internally as a unique ID to avoid confusion among strains of the same species. Genomes that have no specific strain designation will have '*xxxx' in place of their strain name, where 'xxxx' is the NCBI Genome Project ID. This is in no way an official part of the naming convention for that genome, but exists to provide continuity when the strain name is added by NCBI.
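The placeholder rule can be expressed in a few lines. The sketch below is illustrative only; the function name `display_strain` and the example project IDs are hypothetical and do not correspond to real NCBI entries.

```python
def display_strain(strain_name, project_id):
    """Return the strain name, or '*<project id>' when no strain is designated."""
    if strain_name and strain_name.strip():
        return strain_name.strip()
    # No strain designation: fall back to the unofficial '*xxxx' placeholder.
    return f"*{project_id}"


# Hypothetical examples: one genome with a strain name, two without.
named = display_strain("K12", 1001)
unnamed = display_strain(None, 1002)
blank = display_strain("   ", 1003)
```

Because the project ID is stored separately as the unique key, the displayed label can change when NCBI later assigns a strain name without breaking the link to the genome.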
September 5, 2006
EMU up and running again
The EMU server was inoperative for longer than expected due to connectivity problems, but these have been resolved and the server is up and running again.
August 15, 2006
Server downtime
The EMU server will be unable to process queries for a 48-hour period between approximately 3 PM, Monday 14 August and 3 PM, Wednesday 16 August (all times Australian Eastern Standard Time). Another update will be performed when the server upgrade is complete.
May 19, 2006
Even more genomes!
EMU has been updated with 16 new genomes, bringing the total to 352. This update includes a number of fungal genomes, and a couple of older fungal genomes that were out of date (notably Schizosaccharomyces pombe and Eremothecium gossypii) have been updated as well.
Here are the new genomes, with bacteria and then fungi in alphabetical order:
- Acidobacterium sp. Ellin345
- Baumannia cicadellinicola Homalodisca coagulata
- Deinococcus geothermalis DSM 11300
- Lawsonia intracellularis PHE/MN1-00
- Psychrobacter cryohalolentis K5
- Ralstonia metallidurans CH34
- Streptococcus pyogenes MGAS10270
- Streptococcus pyogenes MGAS10750
- Streptococcus pyogenes MGAS2096
- Streptococcus pyogenes MGAS9429
- Aspergillus fumigatus Af293
- Candida glabrata CBS138
- Cryptococcus neoformans JEC21
- Debaryomyces hansenii CBS767
- Kluyveromyces lactis NRRL_Y-1140
- Yarrowia lipolytica CLIB99
May 9, 2006
EMU up and running with 336 genomes
The update and migration of EMU (the new, public version of NGIBWS) is complete!
Please note, though, that we are trying to fix a problem with the display of query results in any browser that is not Internet Explorer. Running a query in e.g. Firefox will still return a result, but the page will be displayed as html source rather than being interpreted by the browser. We're working on it, but it's turning out to be a thorny problem.
The following genomes have been added to the database, bringing the total number up to 336:
- Burkholderia xenovorans LB400
- Chromohalobacter salexigens DSM 3043
- Escherichia coli UTI89
- Lactobacillus salivarius UCC118
- Methanococcoides burtonii DSM 6242
- Methylobacillus flagellatus KT
- Nitrobacter hamburgensis X14
- Polaromonas sp. JS666
- Rhodopseudomonas palustris BisB18
- Rhodopseudomonas palustris BisB5
- Rickettsia bellii RML369-C
- Shewanella denitrificans OS217
- Syntrophus aciditrophicus SB
April 26, 2006
Progress in migration of NGIBWS:
The port of NGIBWS to emu.imb.uq.edu.au is nearly complete. Service from this address should begin within the next two weeks.
The database is currently being updated with the following genomes:
- Anaeromyxobacter dehalogenans 2CP-C
- Anaplasma phagocytophilum HZ
- Chlamydophila felis Fe_C-56
- Desulfitobacterium hafniense Y51
- Escherichia coli W3110
- Ehrlichia chaffeensis Arkansas
- Erythrobacter litoralis HTCC2594
- Frankia CcI3
- Jannaschia CCS1
- Francisella tularensis holarctica
- Methanosphaera stadtmanae DSM 3091
- Methanospirillum hungatei JF-1
- Neorickettsia sennetsu Miyayama
- Novosphingobium aromaticivorans DSM 12444
- Phytoplasma asteris AYWB (Aster yellows witches-broom phytoplasma AYWB)
- Rhizobium etli CFN_42
- Rhodoferax ferrireducens DSM 15236
- Rhodopseudomonas palustris HaA2
- Saccharophagus degradans 2-40
- Sodalis glossinidius morsitans
- Staphylococcus aureus NCTC 8325
- Staphylococcus aureus USA 300
- Synechococcus JA-3-3Ab (Cyanobacteria bacterium Yellowstone A-Prime)
- Synechococcus JA-3-3B (Cyanobacteria bacterium Yellowstone B-Prime)
Another update, involving at least fifteen newer genomes, will be carried out in the next two weeks as well.
Changes to the EMU interface:
A few cosmetic changes have been made to the Web service, reflecting the 15 000 km move from North America to Australia. In particular, note the change of Contact Details for website feedback.
The Available Genomes page now contains links to each of the genomes at NCBI. This will hopefully be helpful in identifying whether, e.g., Ambiguosum nomenarbitrarium str. ACK1 is the same as Paraambiguosum sp. conflatii.
January 4, 2006
299 genomes are currently included in our database.
The last update occurred on January 4, 2006, adding the recently completed genomes for:
- Burkholderia thailandensis E264
- Hahella chejuensis KCTC 2396
- Moorella thermoacetica ATCC 39073
- Mycoplasma capricolum ATCC 27343
- Rhodospirillum rubrum ATCC 11170
- Salinibacter ruber DSM 13855
Important notes reflecting recent changes to the NGIBWS code base:
We have modified the method for computing consensus gene names; a consensus name is the most common annotated name found amongst a gene's reciprocal best matches. We now weight each match's contribution by its strength, which appears to improve accuracy to some degree, especially where paralogs evolve at different rates. A more detailed description of our auto-annotation system will be prepared soon; revisit this web site for an eventual link to that page.
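The difference between a plain majority vote and a weighted vote can be sketched as follows. This is an illustration under stated assumptions, not the NGIBWS implementation: "strength" is taken to be a single numeric score per match, and the function names, gene names, and scores are hypothetical.

```python
from collections import Counter, defaultdict


def majority_name(matches):
    """Unweighted vote: the most frequently annotated name."""
    counts = Counter(name for name, _ in matches if name)
    if not counts:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(counts, key=lambda n: (-counts[n], n))


def consensus_name(matches):
    """Weighted vote: each match contributes its strength to its name."""
    totals = defaultdict(float)
    for name, strength in matches:
        if name:
            totals[name] += strength
    if not totals:
        return None
    return min(totals, key=lambda n: (-totals[n], n))


# Hypothetical (annotated name, match strength) pairs for one gene.
matches = [
    ("recA", 0.9),
    ("recA", 0.8),
    ("radA", 0.3),
    ("radA", 0.3),
    ("radA", 0.3),
]

unweighted = majority_name(matches)
weighted = consensus_name(matches)
```

In this example three weak matches outvote two strong ones under the plain majority rule, while the weighted rule favours the name carried by the strong matches. That is the kind of case where fast-evolving paralogs could otherwise dominate the vote.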
Due to the overwhelming number of publicly available genomes, it was necessary to revise the NGIBWS database to reside largely on disk rather than in the server's main memory. Consequently, a few queries will take significantly longer to run, though most should complete almost as quickly as before. As always, for queries that do take a long time to complete, please be patient and please ensure that your browser is not set to time out.
The revised system also necessitated a slight modification to the method of computing reciprocal best matches. We feel that the new method is as good as, and probably better than, the previous method. You may notice a slight difference in the results of a query, compared to previously, especially where equidistant paralogs exist.
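For readers unfamiliar with the idea, reciprocal best matches between two genomes can be sketched as below. This is a generic illustration, not the method used by NGIBWS, whose details are not described here. The score tables, gene identifiers, and the choice to keep every tied top-scoring target are assumptions made for the example.

```python
def best_hits(scores):
    """Map each query gene to the set of targets sharing its top score."""
    best = {}
    for query, targets in scores.items():
        if not targets:
            continue
        top = max(targets.values())
        best[query] = {t for t, s in targets.items() if s == top}
    return best


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return pairs (a, b) where each gene is a best hit of the other."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs = set()
    for a, targets in best_ab.items():
        for b in targets:
            if a in best_ba.get(b, set()):
                pairs.add((a, b))
    return pairs


# Hypothetical similarity scores; b1 and b2 are equidistant paralogs of a1.
a_vs_b = {"a1": {"b1": 200, "b2": 200}, "a2": {"b1": 50}}
b_vs_a = {"b1": {"a1": 200, "a2": 50}, "b2": {"a1": 200}}

pairs = reciprocal_best_matches(a_vs_b, b_vs_a)
```

Here both paralogs are retained because they score identically against `a1`. A method that instead keeps only one of the tied targets would return a single pair, which is why results can shift slightly when tie handling changes.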
Finally, we corrected a bug affecting the gene family query for a few methanogenic Archaea. If you ever see an unexpected result and believe that there is a bug in the system, please report it to us; you will be doing everyone a favour.
NGIBWS has moved!
The NGIBWS code base will not be modified (with, e.g., new queries or query options) until after it is transferred to its new home. Web pages at NeuroGadgets Inc. will redirect you to the new web site when it becomes available, but will no longer serve NGIBWS directly. We expect the move to occur within the first half of 2006.
This page was last modified on June 4, 2007.
Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
