Michael Grünstäudl (Gruenstaeudl), PhD

Postdoctoral Researcher at the Freie Universität Berlin

CompoundLocations in Biopython

Compound but not complex

The Biopython manual informs the alert reader that ‘join’ locations in EMBL/GenBank files can be handled by CompoundLocation objects. CompoundLocation is a dedicated class in Biopython and is very straightforward to use.

Assume, for example, the following DNA sequence:

>>> from Bio.Seq import Seq
>>> s = Seq("AAATGAAATCAATAAAA")

This example sequence contains three exons (each 3 bp long), which are flanked by 2-bp spacers with the sequence “AA”. Taken together (i.e., ‘joined’), the exons translate to the following protein consisting of two amino acids: “M I *” (where the asterisk indicates a stop codon). How can I extract the exons from the above sequence?

First, you set up three FeatureLocation objects:

>>> from Bio.SeqFeature import FeatureLocation, CompoundLocation
>>> f1 = FeatureLocation(2, 5)
>>> f2 = FeatureLocation(7, 10)
>>> f3 = FeatureLocation(12, 15)
>>> f1
FeatureLocation(ExactPosition(2), ExactPosition(5))
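
The coordinates used above follow Python slice conventions (0-based start, end not included), so they can be checked with plain string slicing. The following is a minimal sketch that assumes the sequence consists of the three exons separated and flanked by “AA” spacers, including a trailing spacer (an assumption based on the description above); all variable names are illustrative only.

```python
# Assumed layout: spacer, exon, spacer, exon, spacer, exon, spacer
spacer = "AA"
exons = ["ATG", "ATC", "TAA"]
dna = spacer + spacer.join(exons) + spacer

# Derive the (start, end) pair of each exon: 0-based start, exclusive end
coords = []
pos = len(spacer)
for exon in exons:
    coords.append((pos, pos + len(exon)))
    pos += len(exon) + len(spacer)

# Slice out each exon and concatenate the pieces, which is what 'join' means
joined = "".join(dna[start:end] for start, end in coords)

print(coords)
print(joined)
```

The derived coordinate pairs are the same numbers that are passed to FeatureLocation.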

Second, you convert the FeatureLocation objects to a CompoundLocation object:

>>> f = CompoundLocation([f1,f2,f3])
>>> f
CompoundLocation([FeatureLocation(ExactPosition(2), ExactPosition(5)), FeatureLocation(ExactPosition(7), ExactPosition(10)), FeatureLocation(ExactPosition(12), ExactPosition(15))], 'join')
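
As an alternative to calling the constructor, FeatureLocation objects can be combined with the + operator. A minimal sketch, assuming a Biopython version in which FeatureLocation can be imported from Bio.SeqFeature; the name `combined` is illustrative only.

```python
from Bio.SeqFeature import FeatureLocation, CompoundLocation

f1 = FeatureLocation(2, 5)
f2 = FeatureLocation(7, 10)
f3 = FeatureLocation(12, 15)

# Adding locations to each other builds a CompoundLocation
combined = f1 + f2 + f3

print(type(combined).__name__)
print(combined.operator)   # how the parts are combined
print(len(combined))       # total number of positions covered by all parts

# The individual locations remain accessible through .parts
for part in combined.parts:
    print(int(part.start), int(part.end))
```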

Third, you extract the exons from the sequence via the CompoundLocation object:

>>> s2 = f.extract(s)
>>> s2
Seq('ATGATCTAA', Alphabet())

Finally, you translate the extracted DNA sequence:

>>> s2.translate()
Seq('MI*', HasStopCodon(ExtendedIUPACProtein(), '*'))
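
The four steps can also be wrapped in one small script. This is only a sketch: the helper name `join_and_translate` is hypothetical, and the input sequence is the one implied by the exon and spacer description above. Converting the results with str() keeps the printed output independent of how a given Biopython version displays Seq objects.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, CompoundLocation


def join_and_translate(dna, exon_coords):
    """Extract the exons at exon_coords from dna, join them, and translate.

    exon_coords is a list of (start, end) pairs, 0-based with exclusive end.
    CompoundLocation needs at least two parts.
    """
    parts = [FeatureLocation(start, end) for start, end in exon_coords]
    location = CompoundLocation(parts)  # the operator defaults to 'join'
    cds = location.extract(dna)
    return cds, cds.translate()


# Assumed example sequence: three exons flanked by "AA" spacers
dna = Seq("AAATGAAATCAATAAAA")
cds, protein = join_and_translate(dna, [(2, 5), (7, 10), (12, 15)])

print(str(cds))      # ATGATCTAA
print(str(protein))  # MI*
```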


This post was published on Monday, 13 June 2016 at 14:09 by Michael Grünstäudl and filed under bioinformatics.
