High numbers do not equate with high quality
Over the past 10 years, the number of complete plastid genome sequences available on NCBI GenBank has skyrocketed, especially for flowering plants. In December 2009, approximately 120 complete plastid genomes of flowering plants were present on GenBank; by the end of November 2019, there were 5,838. That is an increase of 4,765% in 10 years (or roughly a quinquaginta-nupling of the flowering plant plastid genomes available since 2009 [yes, I had Latin in school]).
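For readers who want to check the arithmetic, here is a minimal sketch in Python. The two counts are the ones quoted above; the function and variable names are just illustrative choices of mine.

```python
def percent_increase(old, new):
    """Relative growth from old to new, expressed in percent."""
    return (new - old) / old * 100


# Counts of complete flowering plant plastid genomes quoted in the text.
genomes_2009 = 120
genomes_2019 = 5838

increase = percent_increase(genomes_2009, genomes_2019)
fold_change = genomes_2019 / genomes_2009

print(f"Increase: {increase:,.0f}%")
print(f"Fold change: {fold_change:.2f}x")
```

Note that a 4,765% increase corresponds to a fold change of just under 49, which is why "roughly fifty-fold" is a fair summary.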
However, it appears that the annotation quality of many of the genomes submitted to GenBank over the past decade has remained less than stellar throughout this increase. The inverted repeat (IR) regions are integral parts of virtually any flowering plant plastome, and their presence in the sequence annotations is typically a good proxy for annotation quality. In December 2009, approximately 95 (or approx. 80%) of the available plastid genomes of flowering plants did not contain unambiguous annotations for the IRs. By the end of November 2019, that number had increased to 3,190, even though the share had dropped to approx. 55%.
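To make the idea of "IR annotations as a quality proxy" a bit more concrete, here is a rough sketch of how one might test a GenBank flat file for IR annotations in plain Python. Everything in it is an assumption on my part rather than the exact procedure behind the numbers above: the toy record and its coordinates are made up, the function names are hypothetical, and the matching rule (a `repeat_region` or `misc_feature` entry carrying `/rpt_type=inverted` or a note mentioning "inverted repeat") is only one plausible heuristic.

```python
import re

# Toy record with made-up coordinates, for illustration only.
SAMPLE = """\
LOCUS       TOY_PLASTOME          155000 bp    DNA     circular PLN
FEATURES             Location/Qualifiers
     source          1..155000
                     /organelle="plastid:chloroplast"
     repeat_region   85001..111000
                     /note="inverted repeat B"
                     /rpt_type=inverted
     repeat_region   129001..155000
                     /note="inverted repeat A"
                     /rpt_type=inverted
ORIGIN
//
"""

# A feature line starts with exactly five spaces, then the key and the location.
FEATURE_KEY = re.compile(r"^ {5}(\S+)\s+(\S+)")


def parse_features(flatfile_text):
    """Collect the feature table entries of one flat-file record (simplified)."""
    features = []
    in_table = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section (e.g. ORIGIN) ends the table
        match = FEATURE_KEY.match(line)
        if match:
            features.append(
                {"key": match.group(1), "location": match.group(2), "qualifiers": ""}
            )
        elif features:
            # Qualifier (or wrapped location) line: attach to the current feature.
            features[-1]["qualifiers"] += line.strip() + " "
    return features


def inverted_repeats(features):
    """Return the locations of features that look like IR annotations (heuristic)."""
    hits = []
    for feature in features:
        text = feature["qualifiers"].lower()
        looks_like_ir = "rpt_type=inverted" in text or "inverted repeat" in text
        if feature["key"] in ("repeat_region", "misc_feature") and looks_like_ir:
            hits.append(feature["location"])
    return hits


ir_locations = inverted_repeats(parse_features(SAMPLE))
print(ir_locations)
```

A record would pass this simple check if two such locations are found. A real survey would also have to handle wrapped or compound locations and the many ways submitters spell out their notes, which this sketch deliberately ignores.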
In short, despite a dramatic increase in complete plastid genomes of flowering plants available on GenBank, more than half of those genomes still lack appropriate sequence annotations for the IR regions. This is quite concerning, not least because IR annotations are very simple to include in a genome file (see figure). Clearly, improved strategies for annotating (or correcting) complete plastid genomes are necessary.