Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

Identifying oligonucleotide primers via the commandline

Priming for success

It has been several years since I last developed a pair of customized oligonucleotide primers for DNA sequencing. At that time I tended to operate software via GUIs, not knowing that an entire toolkit of commandline tools exists, which can get the job done faster and more efficiently. Today I have an opportunity to explore this toolkit.

Assume that you have (a) a tab-delimited database of oligonucleotide primers (e.g., primer_database.tsv), and (b) a FASTA-formatted DNA alignment of a target organism (e.g., target_alignment.fas). Your primer database contains hundreds of primers, each occupying one line and followed (on the same line) by information such as the target gene (e.g., matK) and the strand orientation (e.g., R for reverse). How can you identify those primers within your database that will likely bind to one or more sequences of the target organism?

First, you extract your primers of interest via grep and awk.

grep matK primer_database.tsv | grep $'\tF\t' \
| awk -F '\t' '{print $2, $3}' > matK_forw.tsv
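
Note that grep matches "matK" anywhere on a line, including in primer names or comments. If the column positions are known, the same selection can be made per column in awk. The following is only a sketch: the column layout (id, name, sequence, gene, orientation, note), the function name filter_forward, and the three mock entries are assumptions invented for illustration and are not taken from a real primer database.

```bash
# Assumed (hypothetical) column layout of the database:
# id <TAB> name <TAB> sequence <TAB> gene <TAB> orientation <TAB> note
demo_dir=$(mktemp -d)
{
  printf '1\tdemoA_F\tACGTACGTACGTACGTAC\tmatK\tF\tmock\n'
  printf '2\tdemoA_R\tTTGGCCAATTGGCCAATT\tmatK\tR\tmock\n'
  printf '3\tdemoB_F\tGGGTTTAAACCCGGGTTT\trbcL\tF\tmock\n'
} > "$demo_dir/primer_database.tsv"

# Keep rows whose gene column is exactly matK and whose orientation is F,
# then print primer name and sequence (separated by a space).
filter_forward() {
  awk -F '\t' '$4 == "matK" && $5 == "F" {print $2, $3}' "$1"
}

filter_forward "$demo_dir/primer_database.tsv"
```

Adjust the column numbers ($4, $5, $2, $3) to whatever layout your own database uses.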

Second, you loop through your primers of interest and match them to the target alignment via fqgrep, allowing a specified number of mismatches.

for i in $(awk '{print $2}' matK_forw.tsv); do fqgrep -m 1 \
-p "$i" -r target_alignment.fas >> forward_oneMismatch.txt; done
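
The same loop can be repeated for the reverse primers of the database. Reverse primers are typically written 5' to 3' on the opposite strand, so if the matching tool only searches the strand as given in the alignment (I have not verified how fqgrep behaves here), they may first need to be reverse-complemented. A minimal sketch, assuming unambiguous bases only (A, C, G, T); the function name revcomp is my own:

```bash
# Reverse-complement a DNA sequence: reverse the string,
# then swap each base for its complement.
# IUPAC ambiguity codes (e.g., R, Y, N) are NOT handled in this sketch.
revcomp() {
  printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

revcomp AACG
```

The output of revcomp can then be passed to the -p option in place of the original primer sequence.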

Third, you reduce the resulting list of matches so that extraneous information is filtered out and all matches are combined into a single text file (e.g., combined.txt).

for i in *Mismatch.txt; do echo "$i" >> combined.txt;
awk -F '\t' '{print $1, $3, $7, $8, $9}' "$i" >> combined.txt;
echo >> combined.txt; done

You now have a list of primers that match the target sequences and can evaluate other factors relevant to the design of oligonucleotide primers, such as amplicon length, melting and annealing temperatures, and hairpin and dimer formation.
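
As a first, rough check of melting temperature, the Wallace rule (Tm = 2 °C per A or T plus 4 °C per G or C) can be computed directly on the commandline. This is only a sketch: the rule is a coarse approximation intended for short oligonucleotides, dedicated primer design software uses more accurate models, and the function name wallace_tm is my own.

```bash
# Estimate the melting temperature (in degrees Celsius) of a short primer
# via the Wallace rule: Tm = 2 * (A + T) + 4 * (G + C).
# Characters other than A, C, G, T are ignored in this sketch.
wallace_tm() {
  printf '%s\n' "$1" | awk '{
    s = toupper($0)
    at = gsub(/[AT]/, "", s)   # gsub returns the number of substitutions made
    gc = gsub(/[GC]/, "", s)
    print 2 * at + 4 * gc
  }'
}

wallace_tm ACGTACGTACGTACGTAC
```

For the 18-nt mock sequence above (9 A/T, 9 G/C) this gives 54.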

P.S. Special thanks go to my colleague Nadja K. for reminding me of relevant issues involved in primer design.


This entry was posted on Wednesday, 17 June 2015 at 13:37 by Michael Grünstäudl and filed under bioinformatics, lab work.
