Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

BlastNg Away – Part 1

A post without spello in the title

Today, I attempted to BLAST entire chloroplast genomes against NCBI’s nucleotide database (nt) using the BLASTn command-line tool. Since typical plastomes are between 150,000 and 160,000 bp long, remote BLASTn searches take approximately 20 minutes on average.

time blastn -db nt -query myinputseq.fasta -remote -out results.txt

real 21m25.189s
user 0m0.070s
sys 0m0.010s

Can we speed up such searches by splitting the input query sequence into smaller, equally sized pieces and blasting each piece separately?

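# Splitting the input query sequence into ten smaller, equally sized pieces
# (assumes a two-line FASTA file: one header line, then the whole sequence on one line)

INF=myinputseq.fasta
CHUNK=$(( $(tail -n1 "$INF" | wc -c) / 10 ))   # bytes per piece = sequence length / 10
split -d -b "$CHUNK" "$INF" prt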
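Splitting the whole file by bytes has two side effects: only prt00 keeps the FASTA header (BLAST+ also accepts bare sequence, so the other pieces still run), and because the chunk size is computed from the sequence line alone, the leftover bytes end up in a small eleventh piece. The sketch below is a hypothetical helper, `split_fasta` (not part of BLAST+ or coreutils), that instead cuts only the sequence into ten pieces of nearly equal length and gives each piece its own header. It assumes a single-record FASTA file; the sequence may be wrapped over several lines.

```shell
#!/usr/bin/env bash
# split_fasta: hypothetical helper (not part of BLAST+ or coreutils).
# Cuts the sequence of a single-record FASTA file into N pieces of nearly
# equal length and writes each piece, with its own header, to PREFIX00,
# PREFIX01, ... (the same naming scheme as `split -d`).
split_fasta() {
    local inf=$1 n=${2:-10} prefix=${3:-prt}
    local seq len step start k
    # Drop the header and join all sequence lines (handles wrapped FASTA)
    seq=$(grep -v '^>' "$inf" | tr -d ' \r\n')
    len=${#seq}
    # Ceiling division, so that n pieces always cover the whole sequence
    step=$(( (len + n - 1) / n ))
    for (( k = 0; k < n; k++ )); do
        start=$(( k * step ))
        if (( start >= len )); then break; fi   # only for very short sequences
        # Header records the piece number and its 1-based start position
        printf '>piece_%02d_start_%d\n%s\n' "$k" "$(( start + 1 ))" \
            "${seq:start:step}" > "$(printf '%s%02d' "$prefix" "$k")"
    done
}
```

Calling `split_fasta myinputseq.fasta 10 prt` produces prt00 to prt09, which the BLAST loop can pick up unchanged. The start position stored in each header also makes it easier to map hits back to plastome coordinates.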

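# Blasting each piece against NCBI’s nucleotide database

for i in prt[0-9][0-9]; do    # only the split pieces, not earlier *.result files
    echo "$i" >> time.txt     # label each timing with its piece name
    # `time` reports on stderr, so capture stderr of the whole group
    { time blastn -db nt -query "$i" -remote -outfmt '7 length pident sscinames' -max_target_seqs 10 -out "$i.result"; } 2>> time.txt
    rm "$i"
done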


real 0m15.161s
user 0m0.060s
sys 0m0.010s


real 0m28.901s
user 0m0.060s
sys 0m0.013s


real 0m27.328s
user 0m0.057s
sys 0m0.013s


real 0m14.909s
user 0m0.067s
sys 0m0.010s


real 0m13.689s
user 0m0.043s
sys 0m0.023s
etc.
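To compare the pieces with the single 21-minute search, their wall-clock times can be added up. This is a minimal sketch, assuming the `time` reports were saved to a file (time.txt here) in bash's default format; bash's `time` keyword writes to stderr, so capturing it requires something like `{ time blastn ...; } 2>> time.txt`. The helper name `sum_real` is hypothetical.

```shell
#!/usr/bin/env bash
# sum_real: hypothetical helper that totals the "real" (wall-clock) lines
# produced by bash's `time` keyword, e.g. "real 0m15.161s" or "real 21m25.189s".
sum_real() {
    awk '$1 == "real" {
             split($2, t, "m")      # t[1] = minutes, t[2] = seconds with trailing "s"
             sub(/s$/, "", t[2])    # strip the trailing "s"
             total += t[1] * 60 + t[2]
             n++
         }
         END { printf "%d pieces, %.3f s in total\n", n, total }' "${1:-time.txt}"
}
```

Running `sum_real time.txt` prints the number of timed pieces and their combined time. For reference, the five timings printed above add up to 99.988 s, compared with 1,285.189 s (21m25.189s) for the full plastome searched in one go; the pieces hidden behind "etc." would add to that total.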


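Yes, we can! Each of the pieces shown above finished in between about 14 and 29 seconds, compared with more than 21 minutes for the full plastome. (However, I bet that the above scenario was a lucky coincidence and that the speed-up gained by shrinking the query is not always that pronounced.)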
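Because each piece writes its own tabular report (prt00.result, prt01.result, ...), the hits have to be pooled before they can be compared with the single full-length search. The sketch below is a hypothetical helper, `merge_results` (not part of BLAST+), assuming the reports were written with `-outfmt '7 length pident sscinames'` as in the loop above. Keep in mind that an alignment spanning the boundary between two pieces can only appear as two shorter, separate hits.

```shell
#!/usr/bin/env bash
# merge_results: hypothetical helper (not part of BLAST+) that pools the
# per-piece tabular reports into one TSV file.
# With -outfmt 7, lines starting with '#' are comments; all other lines are
# hits with the requested columns (length, pident, sscinames), tab-separated.
merge_results() {
    local out=${1:-merged_hits.tsv} f
    printf 'piece\tlength\tpident\tsscinames\n' > "$out"
    for f in prt*.result; do
        [ -e "$f" ] || continue        # no matching files: skip the literal pattern
        # Prefix every hit line with the piece name (file name minus ".result")
        awk -v p="${f%.result}" '!/^#/ && NF { print p "\t" $0 }' "$f" >> "$out"
    done
}
```

Calling `merge_results` in the directory holding the reports writes merged_hits.tsv, with one row per hit and the originating piece in the first column.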


This post was published by Michael Grünstäudl on Thursday, 3 September 2015, at 18:02 and filed under bioinformatics.
