{"id":784,"date":"2019-08-20T17:14:14","date_gmt":"2019-08-20T15:14:14","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/?p=784"},"modified":"2019-08-20T17:15:34","modified_gmt":"2019-08-20T15:15:34","slug":"dna-sequence-submissions-to-genbank","status":"publish","type":"post","link":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/2019\/08\/20\/dna-sequence-submissions-to-genbank\/","title":{"rendered":"On the issue of file formats during DNA sequence submissions to GenBank"},"content":{"rendered":"<p align=\"left\"><strong>Ramblings on an important topic<\/strong><\/p>\n<p align=\"left\">Several software tools allow users to submit DNA sequences to <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/genbank\" target=\"_blank\" rel=\"noopener\">NCBI GenBank<\/a>, but file conversion represents a recurring challenge for those submissions. As with DNA sequence submissions to <a href=\"https:\/\/www.ebi.ac.uk\/ena\" target=\"_blank\" rel=\"noopener\">ENA<\/a>, GenBank provides a wide range of options to upload annotated DNA sequences in a custom format, including an <a href=\"https:\/\/submit.ncbi.nlm.nih.gov\/\" target=\"_blank\" rel=\"noopener\">interactive submission portal<\/a>, a web-based tool (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/WebSub\/?tool=genbank\" target=\"_blank\" rel=\"noopener\">BankIt<\/a>) and two stand-alone software solutions (<a href=\"https:\/\/www.ncbi.nlm.nih.gov\/Sequin\/\" target=\"_blank\" rel=\"noopener\">Sequin<\/a>, <a href=\"https:\/\/www.ncbi.nlm.nih.gov\/genbank\/tbl2asn2\/\" target=\"_blank\" rel=\"noopener\">tbl2asn<\/a>), the latter of which runs on the command line.<\/p>\n<p align=\"left\">However, for several of these tools (e.g., BankIt, Sequin, tbl2asn), the user must supply <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/Sequin\/table.html\" target=\"_blank\" rel=\"noopener\">tab-delimited data tables<\/a> (\u201cfive-column tables\u201d) in order to provide information regarding gene 
annotations. Five-column tables have a complex syntax and, thus, require considerable bioinformatic knowledge to generate. The methodological gap between GenBank-formatted flatfiles and Sequin-formatted submission files is partially bridged by the Perl script <a href=\"ftp:\/\/ftp.ncbi.nlm.nih.gov\/toolbox\/ncbi_tools\/converters\/scripts\/gbf2tbl.pl\" target=\"_blank\" rel=\"noopener\">gbf2tbl.pl<\/a>, which takes GenBank-formatted flatfiles as input, extracts the DNA sequence and saves it in FASTA format, and extracts the gene annotations and saves them as five-column tables.<\/p>\n<p align=\"left\">Upon conversion to five-column tables via gbf2tbl.pl, the data can be read by tbl2asn to create Sequin files for direct submission. However, gbf2tbl.pl has considerable limitations, not least in user-friendliness and ability to communicate problems to the user, which is why <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0888754318301897?via%3Dihub\" target=\"_blank\" rel=\"noopener\">Lehwark and Greiner (2018)<\/a> developed a web-based tool to directly convert GenBank-formatted flatfiles into Sequin-formatted submission files.<\/p>\n<p align=\"left\">\n","protected":false},"excerpt":{"rendered":"<p>Ramblings on an important topic A series of software tools exist that allow users to conduct submissions of DNA sequences to NCBI GenBank, but file conversion represents a recurring challenge for those submissions. 
Similar to DNA sequence submissions to ENA, GenBank provides a wide range of options to upload annotated DNA sequences in a custom [&hellip;]<\/p>\n","protected":false},"author":2306,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[57598,84691],"tags":[],"class_list":["post-784","post","type-post","status-publish","format-standard","hentry","category-bioinformatics","category-writing"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/784","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/users\/2306"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/comments?post=784"}],"version-history":[{"count":3,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/784\/revisions"}],"predecessor-version":[{"id":787,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/784\/revisions\/787"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/media?parent=784"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/categories?post=784"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/tags?post=784"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}