{"id":351,"date":"2016-06-13T14:09:49","date_gmt":"2016-06-13T12:09:49","guid":{"rendered":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/?p=351"},"modified":"2016-06-13T14:37:12","modified_gmt":"2016-06-13T12:37:12","slug":"compoundlocations-in-biopython","status":"publish","type":"post","link":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/2016\/06\/13\/compoundlocations-in-biopython\/","title":{"rendered":"CompoundLocations in Biopython"},"content":{"rendered":"<p><strong>Compound but not complex<\/strong><\/p>\n<p>The Biopython manual informs the alert reader that &#8216;join&#8217; locations of EMBL\/GenBank files can be handled by <em>CompoundLocation<\/em> objects. This is a special class of location objects in Biopython and is very straightforward to use.<\/p>\n<p>Assume, for example, the following DNA sequence:<\/p>\n<div style=\"background-color: #ffebdb\">\n<p style=\"padding-left: 30px\"><code>&gt;&gt;&gt; from Bio.Seq import Seq<br \/>\n&gt;&gt;&gt; s = Seq(\"AAATGAAATCAATAAAA\")<br \/>\n&gt;&gt;&gt; s<br \/>\nSeq('AAATGAAATCAATAAAA', Alphabet())<br \/>\n<\/code><\/p>\n<\/div>\n<p>This example sequence contains three exons (each 3 bp long), which are flanked by 2-bp spacers that have the sequence &#8220;AA&#8221;. Taken together (i.e., &#8216;joined&#8217;), they translate to the following protein consisting of two amino acids: &#8220;M I *&#8221; (where the asterisk indicates a stop codon). 
How can I extract the exons from the above sequence?<\/p>\n<p>First, you set up three <em>FeatureLocation<\/em> objects:<\/p>\n<div style=\"background-color: #ffebdb\">\n<p style=\"padding-left: 30px\"><code>&gt;&gt;&gt; from Bio.SeqFeature import FeatureLocation, CompoundLocation<br \/>\n&gt;&gt;&gt; f1 = FeatureLocation(2,5)<br \/>\n&gt;&gt;&gt; f2 = FeatureLocation(7,10)<br \/>\n&gt;&gt;&gt; f3 = FeatureLocation(12,15)<br \/>\n&gt;&gt;&gt; f1<br \/>\nFeatureLocation(ExactPosition(2), ExactPosition(5))<br \/>\n<\/code><\/p>\n<\/div>\n<p>Second, you combine the <em>FeatureLocation<\/em> objects into a <em>CompoundLocation<\/em> object:<\/p>\n<div style=\"background-color: #ffebdb\">\n<p style=\"padding-left: 30px\"><code>&gt;&gt;&gt; f = CompoundLocation([f1,f2,f3])<br \/>\n&gt;&gt;&gt; f<br \/>\nCompoundLocation([FeatureLocation(ExactPosition(2), ExactPosition(5)), FeatureLocation(ExactPosition(7), ExactPosition(10)), FeatureLocation(ExactPosition(12), ExactPosition(15))], 'join')<br \/>\n<\/code><\/p>\n<\/div>\n<p>Third, you extract the exons from the sequence via the <em>CompoundLocation<\/em> object:<\/p>\n<div style=\"background-color: #ffebdb\">\n<p style=\"padding-left: 30px\"><code>&gt;&gt;&gt; s2 = f.extract(s)<br \/>\n&gt;&gt;&gt; s2<br \/>\nSeq('ATGATCTAA', Alphabet())<br \/>\n<\/code><\/p>\n<\/div>\n<p>Finally, you translate the extracted DNA sequence:<\/p>\n<div style=\"background-color: #ffebdb\">\n<p style=\"padding-left: 30px\"><code>&gt;&gt;&gt; s2.translate()<br \/>\nSeq('MI*', HasStopCodon(ExtendedIUPACProtein(), '*'))<br \/>\n<\/code><\/p>\n<\/div>\n<p>QED.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Compound but not complex The Biopython manual informs the alert reader that &#8216;join&#8217; locations of EMBL\/GenBank files can be handled by CompoundLocation objects. This is a special class of location objects in Biopython and is very straightforward to use. 
Assume, for example, the following DNA sequence: &gt;&gt;&gt; from Bio.Seq import Seq &gt;&gt;&gt; s = [&hellip;]<\/p>\n","protected":false},"author":2306,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[57598],"tags":[],"class_list":["post-351","post","type-post","status-publish","format-standard","hentry","category-bioinformatics"],"_links":{"self":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/351","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/users\/2306"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/comments?post=351"}],"version-history":[{"count":9,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/351\/revisions"}],"predecessor-version":[{"id":360,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/posts\/351\/revisions\/360"}],"wp:attachment":[{"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/media?parent=351"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/categories?post=351"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.fu-berlin.de\/gruenstaeudl\/wp-json\/wp\/v2\/tags?post=351"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}