About GNPAnnot

GNPAnnot is a green genomics project that aims to develop a community system for structural and functional annotation, supported by comparative genomics and dedicated to plant, insect and fungal genomes. The system allows both automatic prediction and manual curation of genomic objects.

Crops are the major source of food. Up to 30% of crop yield is lost to pests and pathogens. In the era of high-throughput technology, the number of partially or completely sequenced plant and bioaggressor genomes is increasing exponentially. Gene annotation is a crucial step in nucleic acid sequence analysis: it must define gene structure, protein function and homology relationships. Although eukaryotic species vary, genome annotation is generally a hard task for several reasons: large genomes, polyploidy, a high proportion of non-coding DNA and repeats, and mosaic gene structure (introns / exons). Gene-finding software is improving but cannot replace manual annotation. Systems have been developed to help with genomic data curation, but none of them satisfies all of the following requirements:

  • manual structural, functional and comparative annotation of eukaryotic genes
  • database, analysis pipeline and annotator Web interfaces
  • genericity, modularity, portability, sustainability and compatibility.

CIRAD, INRA and Bioversity are used to working together, and they propose to pool their efforts to develop and use a Community Annotation System (CAS) that allows users to browse, compare and annotate genomes. The work is divided into five work packages:

  • database and flow management
  • annotator interface implementation
  • interoperability with other systems
  • sequence exploitation and platform release
  • manual annotation and platform validation.

The core of the GNPAnnot CAS is based on GMOD components (the Chado database, the GBrowse genome browser and the Apollo editor) and on Chado-compatible components such as Artemis. The system should also make it possible to browse comparative genomics results, build queries and export the results in various formats. It should support annotation reconciliation, history, integrity, consistency and updates, as well as the management of public and private projects.

This CAS is applied to well-identified sequencing projects that study genome structure and evolution and/or search for genes of agronomic interest. Four CAS instances are released at three sites:

  1. monocots (CIRAD, Joint Research Unit Plant Development and Genetic Improvement, Montpellier, France),
  2. insects (INRA, Joint Research Unit Biology of Organisms and Populations for Plant Protection, Rennes, France),
  3. fungi (INRA, Joint Research Unit BIOlogie et GEstion des Risques en agriculture - Champignon Pathogènes des Plantes, Versailles, France) and
  4. wheat / grapevine (INRA, Research Unit in Genomics and Bioinformatics, Versailles).