Alumnus of Freie Universität Berlin – Michael Grünstäudl, PhD

Successful habilitation in botany and bioinformatics

Correcting tRNA annotations

How to call the anticodon?

Over the past few days I have been correcting genomic annotations using custom bash and Python code. One of the more interesting exercises has been the homogenization of the “product” tags for transfer RNAs, which provide information about the respective anticodon sequences.

In most of the databases familiar to me, anticodons are indicated sometimes by their DNA sequence (e.g., transfer RNA-Ile (GAT)) and sometimes by their RNA sequence (e.g., transfer RNA-Ile (GAU)). I have yet to see a rule as to which version ought to be used.
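To make the two notations concrete, here is a minimal sketch that converts a bare anticodon between its RNA and DNA spelling. The function names anticodon_to_dna and anticodon_to_rna are purely illustrative and do not come from any database or tool; the sketch also assumes uppercase input.

```shell
# Hypothetical helpers: the names are illustrative only.
# Assumption: the anticodon is passed as a bare uppercase string.
anticodon_to_dna() { printf '%s\n' "$1" | tr 'U' 'T'; }
anticodon_to_rna() { printf '%s\n' "$1" | tr 'T' 'U'; }

anticodon_to_dna GAU   # prints GAT
anticodon_to_rna GAT   # prints GAU
```

Applying tr to a whole annotation line would also alter any other U or T on that line, which is why the substitution has to be restricted to the anticodon itself.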

In order to homogenize the spelling, I wrote a few lines of bash code. Interestingly, this coding problem is not a sed one-liner, but requires a somewhat intricate awk command (please see this Stack Overflow discussion for details).

Here is the solution I eventually adopted:

echo -e "CompleteAssembly maker gene 1859 4482 . - . Name=trnK-UUU" > tmp

awk -v kw='trn' -v pos=5 'p=index($0, kw) {n=p+length(kw)+1; s=substr($0, n, pos); gsub(/U/, "T", s); $0=substr($0, 1, n-1) s substr($0, n+pos)} 1' tmp

CompleteAssembly maker gene 1859 4482 . - . Name=trnK-TTT
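With kw='trn' and pos=5, the command above takes the five characters that begin at the hyphen of trnK-UUU and replaces every U among them with T. A slightly more targeted variant could be sketched as follows. It is only a sketch under stated assumptions: the function name rna_anticodon_to_dna is hypothetical, gene names are assumed to follow the pattern "trn", one or more letters, a hyphen, and a three-base uppercase anticodon, and only the first such name on each line is converted.

```shell
# Hypothetical helper: the name is illustrative only.
# Assumption: gene names look like trnK-UUU or trnfM-CAU.
rna_anticodon_to_dna() {
    awk '{
        # Find "trn" + letters + "-" + three RNA bases (first match per line).
        if (match($0, /trn[A-Za-z]+-[ACGU][ACGU][ACGU]/)) {
            start = RSTART + RLENGTH - 3        # first base of the anticodon
            anticodon = substr($0, start, 3)
            gsub(/U/, "T", anticodon)           # RNA spelling to DNA spelling
            $0 = substr($0, 1, start - 1) anticodon substr($0, start + 3)
        }
        print                                   # non-matching lines pass through
    }' "$@"
}

printf '%s\n' "CompleteAssembly maker gene 1859 4482 . - . Name=trnK-UUU" | rna_anticodon_to_dna
# prints: CompleteAssembly maker gene 1859 4482 . - . Name=trnK-TTT
```

Because the function forwards its arguments to awk, it can read from a pipe as shown or be called with a file name, e.g. rna_anticodon_to_dna tmp.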

 

This post was published on Tuesday, 29 September 2015 at 18:37 by Michael Grünstäudl and filed under bioinformatics.
