{"id":597,"date":"2018-04-24T20:46:55","date_gmt":"2018-04-24T18:46:55","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/?p=597"},"modified":"2018-06-21T12:22:16","modified_gmt":"2018-06-21T10:22:16","slug":"batch-download-of-dna-sequences-from-ncbi","status":"publish","type":"post","link":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/2018\/04\/24\/batch-download-of-dna-sequences-from-ncbi\/","title":{"rendered":"Few-liner: Batch download of DNA sequences from NCBI"},"content":{"rendered":"<p><strong>The wonders of Entrez<\/strong><\/p>\n<p>Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were stored in the file <code>input.txt<\/code>.<\/p>\n<div style=\"background-color: #ffebdb\">\n<pre><code class=\"bash\">$ cat input.txt\r\n  Liriope_muscari_USACult,JX080424\r\n  Dracaena_adamii_IVORYCOAST,JX080436\r\n  ...\r\n<\/code><\/pre>\n<\/div>\n<p>Here is how I did it:<\/p>\n<div style=\"background-color: #ffebdb\">\n<pre><code class=\"bash\">$ INF=input.txt\r\n$ for line in $(cat $INF); do\r\n    SEQNAME=$(echo \"$line\" | awk -F',' '{print $1}')\r\n    ACCNUM=$(echo \"$line\" | awk -F',' '{print $2}')\r\n    FULLNAM=$(echo \"&gt;${SEQNAME}_${ACCNUM}\")\r\n    SEQ=$(esearch -db nucleotide -query \"$ACCNUM\" | efetch -format fasta | tail -n +2)\r\n    echo -e \"$FULLNAM\\n$SEQ\" &gt;&gt; out.txt\r\n  done\r\n<\/code><\/pre>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The wonders of Entrez Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were stored in the file input.txt. 
$ cat input.txt Liriope_muscari_USACult,JX080424 Dracaena_adamii_IVORYCOAST,JX080436 &#8230; Here is how I did it: $ INF=input.txt $ for line in $(cat $INF); do SEQNAME=$(echo &#8220;$line&#8221; [&hellip;]<\/p>\n","protected":false},"author":2306,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[57598,57600],"tags":[],"class_list":["post-597","post","type-post","status-publish","format-standard","hentry","category-bioinformatics","category-one-liners"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/597","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/users\/2306"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/comments?post=597"}],"version-history":[{"count":5,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/597\/revisions"}],"predecessor-version":[{"id":637,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/597\/revisions\/637"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/media?parent=597"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/categories?post=597"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/tags?post=597"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}