Alumnus of Freie Universität Berlin – Michael Grünstäudl, PhD

Successful habilitation in botany and bioinformatics

On the issue of file formats during DNA sequence submissions to GenBank

Ramblings on an important topic

Several software tools allow users to submit DNA sequences to NCBI GenBank, but file conversion remains a recurring challenge for such submissions. As with DNA sequence submissions to ENA, GenBank provides a wide range of options to upload annotated DNA sequences in a custom format, including an interactive submission portal, a web-based tool (BankIt) and two stand-alone software solutions (Sequin, tbl2asn), the latter of which operates from the command line.

However, for several of these tools (e.g., BankIt, Sequin, tbl2asn), the user must provide tab-delimited data tables (“five-column tables”) to supply information on gene annotations. Five-column tables have a complex syntax and thus require considerable bioinformatic knowledge to generate. The methodological gap between GenBank-formatted flatfiles and Sequin-formatted submission files is partially bridged by the Perl script gbf2tbl.pl, which takes GenBank-formatted flatfiles as input, extracts the DNA sequence and saves it in FASTA format, and extracts the gene annotations and saves them as five-column tables.
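To give a sense of what the five-column syntax looks like, here is a minimal Python sketch that writes a feature table for a single sequence. It is not how gbf2tbl.pl works internally; the `Feature` class, the function `to_five_column_table`, the sequence ID `example_seq1` and the coordinates are all made up for illustration. The layout follows the general scheme of the format: a `>Feature` header line, a line with start, stop and feature key for each feature, and qualifier lines indented by three tabs.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """One annotated feature (hypothetical helper type for this sketch)."""
    key: str          # feature key, e.g. "gene" or "CDS"
    intervals: list   # (start, stop) pairs, 1-based; start > stop means minus strand
    qualifiers: dict = field(default_factory=dict)


def to_five_column_table(seq_id, features):
    """Format a list of features as a five-column feature table string."""
    lines = [f">Feature {seq_id}"]
    for feat in features:
        for i, (start, stop) in enumerate(feat.intervals):
            if i == 0:
                # Columns 1-3: start, stop, feature key
                lines.append(f"{start}\t{stop}\t{feat.key}")
            else:
                # Further intervals of the same feature carry no key
                lines.append(f"{start}\t{stop}")
        for qual, value in feat.qualifiers.items():
            # Columns 4-5: qualifier key and value; columns 1-3 stay empty
            lines.append(f"\t\t\t{qual}\t{value}")
    return "\n".join(lines) + "\n"


# Illustrative data only: sequence ID and coordinates are invented
features = [
    Feature("gene", [(1, 1062)], {"gene": "psbA"}),
    Feature("CDS", [(1, 1062)],
            {"product": "photosystem II protein D1", "gene": "psbA"}),
]
table = to_five_column_table("example_seq1", features)
print(table)
```

Even this toy case shows why the format is easy to get wrong by hand: the meaning of each line depends on which tab-delimited columns are left empty.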

Upon conversion to five-column tables via gbf2tbl.pl, the data can be read by tbl2asn to create Sequin files for direct submission. However, gbf2tbl.pl has considerable limitations, not least in its user-friendliness and its ability to communicate problems to the user, which is why Lehwark and Greiner (2018) developed a web-based tool to directly convert GenBank-formatted flatfiles into Sequin-formatted submission files.

This post was published on Tuesday, 20 August 2019 at 17:14 by Michael Grünstäudl and filed under bioinformatics, writing.
