The wonders of Entrez
Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The sequences in question were listed in the file input.txt, one per line, as a name followed by an accession number:
$ cat input.txt
Liriope_muscari_USACult,JX080424
Dracaena_adamii_IVORYCOAST,JX080436
...
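Each line pairs a sequence name with an NCBI Nucleotide accession number, separated by a comma. Before contacting NCBI, it can help to preview the FASTA headers this format produces. The sketch below assumes a plain two-field file with no header row and no commas inside names; `make_headers` is a hypothetical helper name, not part of any tool:

```shell
# make_headers (hypothetical helper): turn NAME,ACCESSION lines read from
# stdin into FASTA headers of the form >NAME_ACCESSION.
# Lines that do not have exactly two comma-separated fields (e.g. blank
# lines) are skipped.
make_headers() { awk -F',' 'NF == 2 { print ">" $1 "_" $2 }'; }

# Example with two lines in the input.txt format (prints two headers)
printf '%s\n' 'Liriope_muscari_USACult,JX080424' \
              'Dracaena_adamii_IVORYCOAST,JX080436' | make_headers
```

To preview the real file, run `make_headers < input.txt`.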
Here is how I did it, using esearch and efetch from NCBI's Entrez Direct (EDirect) command-line tools:
$ INF=input.txt
$ while IFS=',' read -r SEQNAME ACCNUM <&3; do
    # read splits each whole line on the comma; reading from fd 3 keeps
    # the EDirect tools (which may read stdin) from consuming input.txt
    FULLNAM=">${SEQNAME}_${ACCNUM}"
    # Fetch the record as FASTA and drop NCBI's own header line
    SEQ=$(esearch -db nucleotide -query "$ACCNUM" | efetch -format fasta | tail -n +2)
    printf '%s\n%s\n' "$FULLNAM" "$SEQ" >> out.txt
  done 3< "$INF"
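Because records are appended to out.txt, a failed query (for example, a network hiccup) silently leaves a header with no sequence under it. A minimal check, assuming out.txt contains only FASTA records (a header line followed by zero or more sequence lines); `empty_records` is a hypothetical helper name:

```shell
# empty_records (hypothetical helper): print the header of every FASTA
# record whose sequence lines are empty or missing. Reads the files given
# as arguments, or stdin if there are none.
empty_records() {
  awk '
    /^>/ {
      # Starting a new record: report the previous one if it had no bases
      if (hdr != "" && len == 0) print hdr
      hdr = $0; len = 0; next
    }
    { len += length($0) }   # count sequence characters in this record
    END { if (hdr != "" && len == 0) print hdr }
  ' "$@"
}

# Example: the second record has a header but no sequence
printf '%s\n' '>Liriope_muscari_USACult_JX080424' 'ACGTACGT' \
              '>Dracaena_adamii_IVORYCOAST_JX080436' '' | empty_records
```

Run it as `empty_records out.txt`; any header it prints is worth fetching again. Comparing `grep -c '^>' out.txt` with `wc -l < input.txt` is a further check that every input line produced a record.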