Ramblings on an important topic
A series of software tools exist that allow users to conduct submissions of DNA sequences to NCBI GenBank, but file conversion represents a recurring challenge for those submissions. Similar to DNA sequence submissions to ENA, GenBank provides a wide range of options to upload annotated DNA sequences in a custom format, including an interactive submission portal, a web-based tool (BankIt) and two stand-alone software solutions (Sequin, tbl2asn), the latter of which operates on a command-line basis.
However, for several of these tools (e.g., BankIt, Sequin, tbl2asn), the user must provide tab-delimited data tables (“five-column tables”) in order to provide information regarding gene annotations. Five-column tables display a complex syntax and, thus, require considerable bioinformatic knowledge to generate. The methodological gap between GenBank-formatted flatfiles and Sequin-formatted submission files is partially bridged by the perl-script gbf2tbl.pl, which takes GenBank-formatted flatfiles as input, extracts the DNA sequence and saves it in FASTA-format, as well as extracts gene annotations saves them as five-column tables.
Upon conversion to five-column tables via gbf2tbl.pl, the data can be read by tbl2asn to create Sequin files for direct submission. However, gbf2tbl.pl has considerable limitations, not least in user-friendliness and ability to communicate problems to the user, which is why Lehwark and Greiner (2018) developed a web-based online tool to directly convert GenBank-formated flatfiles into Sequin-formatted submission files.