Abstract

MOTIVATION: Advances in sequencing technologies have enabled researchers to sequence whole genomes rapidly and cheaply. However, despite improvements in genome assembly, structural genome annotation (i.e. the identification of protein-coding genes) remains challenging, particularly for eukaryotic genomes. Annotation typically requires combining several approaches (ab initio prediction, transcriptomics, and homology search), which may give substantially different results. Deciding which gene models to retain in a consensus is far from trivial, and automated approaches tend to lag behind laborious manual curation efforts in accuracy.

RESULTS: We present OMAnnotator, a novel approach to building a consensus annotation. OMAnnotator repurposes the OMA algorithm, originally designed to elucidate evolutionary relationships among genes across species, to integrate predictions from different annotation sources into a consensus annotation, using evolutionary information as a tie-breaker. During benchmarking on the Drosophila melanogaster reference, OMAnnotator's consensus improved upon both its source annotations and two state-of-the-art pipelines used as annotation combiners with the same inputs. When applied to three recently published genomes, OMAnnotator gave substantial improvements in two cases, and mixed results in the third, which had already benefitted from extensive expert curation. These results underline the method's effectiveness and robustness for combining the outputs of disagreeing annotation tools, strengthening the toolkit for eukaryotic genome annotation.

AVAILABILITY AND IMPLEMENTATION: OMAnnotator is available on GitHub (https://github.com/DessimozLab/OMAnnotator).