Bioinformatics Recent News

June 4, 2007

The EMU server was down for most of the month of May after a bug was found in the processing of pseudogenes. The effect of this bug was to assign some properties of the pseudogene (particularly length information) to the following gene in the source GenBank file. The most obvious consequence of this was the assignment of G+C content values greater than 100% to approximately 100 genes, scattered across about 70 genomes.
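As an illustration only, a sanity check along the following lines would flag genes affected by this kind of bug. This is a minimal sketch in Python, not the EMU code: the function names and the (gene ID, sequence, recorded length) tuple layout are assumptions made for the example. It assumes that a gene's G+C count is taken from its own sequence while its length comes from a separately recorded value, so a too-short recorded length pushes the percentage above 100.

```python
def gc_percent(gc_count: int, gene_length: int) -> float:
    """Return G+C content as a percentage of the recorded gene length."""
    if gene_length <= 0:
        raise ValueError("gene length must be positive")
    return 100.0 * gc_count / gene_length


def find_impossible_gc(genes):
    """Return the IDs of genes whose G+C content falls outside 0-100%.

    `genes` is an iterable of (gene_id, sequence, recorded_length) tuples;
    this layout is hypothetical and chosen only for the example.
    """
    flagged = []
    for gene_id, sequence, recorded_length in genes:
        # Count G and C bases in the gene's own sequence.
        gc_count = sum(1 for base in sequence.upper() if base in "GC")
        if not 0.0 <= gc_percent(gc_count, recorded_length) <= 100.0:
            flagged.append(gene_id)
    return flagged


if __name__ == "__main__":
    example = [
        ("geneA", "ATGCGC", 6),      # length matches the sequence
        ("geneB", "GCGCGCGCGC", 4),  # length wrongly inherited from a shorter feature
    ]
    print(find_impossible_gc(example))
```

A check like this catches only the most visible symptom; a wrongly assigned length that is too long would produce a plausible-looking but incorrect value and would pass unnoticed.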

Changes have been made such that source files for EMU are now processed via BioPerl and loaded into a relational database before being formatted for EMU. This change resolves the majority of problems with the reading of .gbk files. There are still some oddities in a handful of .gbk files that violate the assumptions made by BioPerl about the GenBank format, but these are detected and handled early in the process.

EMU is now up and running with 493 genomes; the full list can be viewed here. The 'extra' databases have been removed: of the three, Caenorhabditis elegans and Drosophila melanogaster have been added to the database as full genomes, while the set of viruses has been dropped for the time being. The viruses will ultimately be offered as part of an expanded service after hardware migration has been carried out, but for now they are not available for analysis.

Finally, a note on strain names: the NCBI Genome Project ID is now included internally as a unique ID to avoid confusion among strains of the same species. Genomes that have no specific strain designation will have '*xxxx' in place of their strain name, where 'xxxx' is the NCBI Genome Project ID. This is in no way an official part of the naming convention for that genome, but exists to provide continuity when the strain name is added by NCBI.
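The placeholder rule can be sketched as follows. This is an illustrative Python function under stated assumptions, not the code used by EMU: the name strain_label and its arguments are hypothetical, and the strain names and project IDs in the usage lines are made-up values.

```python
from typing import Optional


def strain_label(strain: Optional[str], project_id: int) -> str:
    """Return the strain name to display for a genome.

    If the genome has a strain designation, use it unchanged.
    Otherwise fall back to '*' followed by the Genome Project ID.
    """
    if strain is not None and strain.strip():
        return strain.strip()
    return f"*{project_id}"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(strain_label("ACK1", 1001))  # strain designation present
    print(strain_label(None, 1002))    # no designation: placeholder is used
```

Because the placeholder is derived from the project ID rather than stored as a name, it can be replaced by the real strain name later without changing the genome's internal unique ID.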

September 5, 2006


EMU up and running again

The EMU server was inoperative for longer than expected due to connectivity problems, but these have been resolved and the server is up and running again.

August 15, 2006


Server downtime

The EMU server will be unable to process queries for a 48-hour period between approximately 3 PM, Monday 14 August and 3 PM, Wednesday 16 August (all times Australian Eastern Standard). Another update will be performed when the server upgrade is complete.

May 19, 2006


Even more genomes!

EMU has been updated with 16 new genomes, bringing the total to 352. Included in this update are a number of fungal genomes. A couple of older fungal genomes that were out of date (notably Schizosaccharomyces pombe and Eremothecium gossypii) have been updated as well.

Here are the new genomes, with bacteria and then fungi in alphabetical order:

May 9, 2006


EMU up and running with 336 genomes

The update and migration of EMU (the new, public version of NGIBWS) is complete!

Please note, though, that we are trying to fix a problem with the display of query results in any browser other than Internet Explorer. Running a query in, e.g., Firefox will still return a result, but the page will be displayed as HTML source rather than being interpreted by the browser. We're working on it, but it's turning out to be a thorny problem.

The following genomes have been added to the database, bringing the total number up to 336:

April 26, 2006


Progress in migration of NGIBWS:

The port of NGIBWS to emu.imb.uq.edu.au is nearly complete. Service from this address should begin within the next two weeks.

The database is currently being updated with the following genomes:

Another update, involving at least fifteen newer genomes, will be carried out in the next two weeks as well.


Changes to the EMU interface:

A few cosmetic changes have been made to the Web service, reflecting the 15 000 km move from North America to Australia. In particular, note the change of Contact Details for website feedback.

The Available Genomes page now contains links to each of the genomes at NCBI. This will hopefully be helpful in identifying whether, e.g., Ambiguosum nomenarbitrarium str. ACK1 is the same as Paraambiguosum sp. conflatii.

January 4, 2006

299 genomes are currently included in our database.

The last update occurred on January 4, 2006, adding the recently completed genomes for:


Important notes reflecting recent changes to the NGIBWS code base:

We have modified the method for computing consensus gene names, that is, the most common annotated name found amongst a gene's reciprocal best matches. We now weight contributions by strength, which appears to improve accuracy to some degree, especially where paralogs evolve at different rates. A more detailed description of our auto-annotation system will be prepared soon; revisit this web site for an eventual link to that page.
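To illustrate the difference between a simple majority and a weighted consensus, here is a minimal Python sketch. It is not the NGIBWS implementation: the function name, the use of a single numeric "strength" per match, the alphabetical tie-break, and the gene names and scores in the example are all assumptions made for illustration.

```python
from collections import defaultdict


def consensus_name(matches):
    """Pick a consensus gene name from (annotated_name, strength) pairs.

    Each reciprocal best match contributes its strength to the name it
    carries; the name with the largest total wins. Ties are broken
    alphabetically so that the result is deterministic.
    Returns None if there are no matches.
    """
    totals = defaultdict(float)
    for name, strength in matches:
        totals[name] += strength
    if not totals:
        return None
    return min(totals, key=lambda name: (-totals[name], name))


if __name__ == "__main__":
    # Hypothetical matches: one strong match and two weak ones.
    matches = [("dnaK", 0.9), ("hsp70", 0.3), ("hsp70", 0.3)]
    print(consensus_name(matches))
```

With an unweighted count the name carried by the two weak matches would win; with weighting, the single strong match outweighs them.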


Due to the overwhelming number of publicly available genomes, it was necessary to revise the NGIBWS database to reside largely on disk rather than in the server's main memory. Consequently, a few queries will take significantly longer to run, though most should complete almost as quickly as before. As always, for queries that do take a long time to complete, please be patient and please ensure that your browser is not set to time out.

The revised system also necessitated a slight modification to the method of computing reciprocal best matches. We feel that the new method is as good as, and probably better than, the previous method. You may notice a slight difference in the results of a query, compared to previously, especially where equidistant paralogs exist.
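For readers unfamiliar with the term, the general idea of reciprocal best matches can be sketched as below. This is a generic Python illustration and does not reproduce either the previous or the revised NGIBWS method, whose details are not given here; the function names, the score dictionaries, and the gene identifiers are hypothetical.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` maps (query, subject) pairs to a similarity score.
    On an exact tie the first pair encountered is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs in which each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


if __name__ == "__main__":
    # Hypothetical scores between genes of genome A and genome B.
    a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 70, ("a3", "b1"): 60}
    b_vs_a = {("b1", "a1"): 88, ("b2", "a2"): 72, ("b2", "a1"): 40}
    print(reciprocal_best_matches(a_vs_b, b_vs_a))
```

Note how ties are handled in best_hits: when two candidates score equally, the outcome depends on an arbitrary rule. That is one way in which equidistant paralogs can lead to slightly different results between two otherwise equivalent methods.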

Finally, we corrected a bug affecting the gene family query for a few methanogenic Archaea. If you ever see an unexpected result and believe that there is a bug in the system, please report it to us; you will be doing everyone a favour.

NGIBWS has moved!
The NGIBWS code base will not be modified (with, e.g., new queries or query options) until after it is transferred to its new home. Web pages at NeuroGadgets Inc. will redirect you to the new web site when it becomes available, but will no longer serve NGIBWS directly. We expect the move to occur within the first half of 2006.


This page was last modified on June 4, 2007.

Copyright ©1998-2005 NeuroGadgets Inc. ©2006 University of Queensland
