23 September 2018 by Michael Grünstäudl
Talking about novel bioinformatic tools for DNA sequence submissions
This Thursday, I gave a talk at the 24th International Symposium on Biodiversity and Evolutionary Biology of the German Botanical Society. I introduced the participants to some of my newly developed tools for streamlining and automating the submission of plant DNA barcoding sequences to public sequence repositories. The conference was a wonderful example of how small conferences can both meet high scientific standards and be an enjoyable reprieve for the participants. Lots of interesting talks and a great social programme amid the gorgeous Carinthian scenery!
Talk at the DBG Sektionstagung 2018 in Klagenfurt, Carinthia
Category: General | 0 comments »
21 June 2018 by Michael Grünstäudl
In science, standardization and repeatability are a must.
Together with two other scientists, I just published a new paper on bioinformatic workflows for generating complete plastid genome sequences in the context of plastid phylogenomics of the water-lily clade. We demonstrate that standardization and repeatability are essential elements of modern plant phylogenomics, and we show how both can be achieved efficiently during plastid genome assembly, annotation and alignment.
Category: bioinformatics, scientific papers | 0 comments »
15 June 2018 by Michael Grünstäudl
Quick, split it!
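There are one-liners that never get old. Here is another one of them. It splits a multi-sequence FASTA file at every header line and then removes any empty files found below the current directory.
$ csplit multisequence.fasta '/>/' '{*}' &&
find . -size 0 -print0 | xargs -0 rm --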
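As a variation on the one-liner above, here is a minimal sketch that avoids the clean-up step altogether. It assumes GNU coreutils `csplit`, whose `-z` option suppresses empty output files; the scratch directory, the toy input and the `record_` prefix are made up for illustration.

```shell
# Build a toy multi-sequence FASTA file in a scratch directory
workdir=$(mktemp -d)
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n>seq3\nTTAA\n' > "$workdir/multisequence.fasta"

# -s: do not print byte counts; -z: do not keep empty output files;
# -f: prefix for the output file names (GNU csplit assumed)
csplit -s -z -f "$workdir/record_" "$workdir/multisequence.fasta" '/^>/' '{*}'

# Sanity check: one output file per FASTA header
n_records=$(grep -c '^>' "$workdir/multisequence.fasta")
n_files=$(ls "$workdir"/record_* | wc -l)
echo "records: $n_records, files: $n_files"
```

Comparing the number of headers with the number of output files is a cheap way to confirm that no record was merged or lost.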
Category: General | 0 comments »
11 June 2018 by Michael Grünstäudl
Teaching in spring 2018
Category: General | 0 comments »
7 May 2018 by Michael Grünstäudl
Quick, de-interleave it!
There are one-liners that never get old. This is one of them.
$ perl -MBio::SeqIO -e \
'my $seqin = Bio::SeqIO->new(-fh => \*STDIN, -format => "fasta");
while (my $seq = $seqin->next_seq)
{ print ">", $seq->id, "\n", $seq->seq, "\n"; }' \
< interleaved.fasta > deinterleaved.fasta
Update 28-Jan-2019:
Over time, I came to find working with BioPerl uncomfortable, because a clean installation is not well supported on Linux. I have therefore come to rely more and more on the following awk-based method, which assumes that the line breaks of the input file are LF (and not CRLF):
$ INF=interleaved.fasta
awk '/^>/ {printf("\n%s\n",$0);next; } \
{ printf("%s",$0);} END {printf("\n");}' \
< $INF | tail -n +2 \
> ${INF%.fasta*}_deint.fasta
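The awk approach above assumes LF line breaks. As a minimal sketch of how CRLF input could be handled as well, the carriage returns can be stripped with `tr` before awk sees the data. The toy input and file names are made up for illustration, and the awk program is a slightly rearranged variant that does not need the `tail` step.

```shell
# Toy interleaved FASTA file with CRLF line breaks
workdir=$(mktemp -d)
printf '>seq1 demo\r\nACGT\r\nTTGA\r\n>seq2\r\nGG\r\nCC\r\n' > "$workdir/interleaved.fasta"

# Strip carriage returns, then join the sequence lines of each record.
# A newline is printed before every header except the first one.
tr -d '\r' < "$workdir/interleaved.fasta" \
  | awk '/^>/ { if (NR > 1) printf("\n"); print; next }
         { printf("%s", $0) }
         END { printf("\n") }' \
  > "$workdir/interleaved_deint.fasta"

cat "$workdir/interleaved_deint.fasta"
```

For this toy input, the output has four lines: each header is followed by a single line holding the full sequence.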
Category: bioinformatics, one-liners | 0 comments »
24 April 2018 by Michael Grünstäudl
The wonders of entrez
Today I found myself in need of a script to download dozens of DNA sequences submitted to NCBI Nucleotide. The names and accession numbers of the sequences in question were stored in the file input.txt.
$ cat input.txt
Liriope_muscari_USACult,JX080424
Dracaena_adamii_IVORYCOAST,JX080436
...
Here is how I did it:
$ INF=input.txt
$ for line in $(cat "$INF"); do
    # Split each line into sequence name and accession number
    SEQNAME=$(echo "$line" | awk -F',' '{print $1}')
    ACCNUM=$(echo "$line" | awk -F',' '{print $2}')
    FULLNAM=">${SEQNAME}_${ACCNUM}"
    # Fetch the record and drop its original FASTA header line
    SEQ=$(esearch -db nucleotide -query "$ACCNUM" | efetch -format fasta | tail -n +2)
    printf '%s\n%s\n' "$FULLNAM" "$SEQ" >> out.txt
  done
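The parsing step of the loop above can also be tried out offline before any request is sent to NCBI. The following is only a sketch under assumptions: it reads the comma-separated lines with `read`, skips blank or incomplete lines, and prints the FASTA headers that would be written. The scratch directory, the `headers.txt` file and the variable names are made up for illustration.

```shell
# Toy input with one blank line to show the skipping logic
workdir=$(mktemp -d)
printf 'Liriope_muscari_USACult,JX080424\n\nDracaena_adamii_IVORYCOAST,JX080436\n' \
  > "$workdir/input.txt"

# Dry run: build the new FASTA headers without contacting NCBI
while IFS=',' read -r seqname accnum; do
  # Skip blank lines and lines without an accession number
  if [ -z "$seqname" ] || [ -z "$accnum" ]; then
    continue
  fi
  printf '>%s_%s\n' "$seqname" "$accnum"
done < "$workdir/input.txt" > "$workdir/headers.txt"

cat "$workdir/headers.txt"
```

If the headers look right, the same name and accession pairs can be fed to the download loop.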
Category: bioinformatics, one-liners | 0 comments »
5 April 2018 by Michael Grünstäudl
An improved few-liner to keep the data compressed
If you wish to recursively loop through a folder and its nested subfolders and automatically gzip all files larger than 1 GB, the following few-liner is for you:
for file in $(LANG=C find . -size +1G -type f -print); do
  # gzip appends the suffix .gz, so skip files that already carry it
  if [[ ! $file == *.gz ]]; then
    gzip "$file"
  fi
done
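The loop above splits the output of `find` on whitespace, so it can stumble over file names that contain spaces. A minimal sketch of one possible alternative lets `find` do the filtering and call `gzip` itself. The demo uses a made-up scratch directory and a small `+1k` threshold so that it runs quickly; for real data the threshold would be `+1G`.

```shell
# Scratch directory with a nested folder and a file name containing spaces
workdir=$(mktemp -d)
mkdir -p "$workdir/sub dir"
head -c 4096 /dev/zero > "$workdir/sub dir/big file.txt"   # above the threshold
head -c 4096 /dev/zero > "$workdir/already.gz"             # dummy file, skipped because of its .gz name
printf 'tiny\n' > "$workdir/small.txt"                     # below the threshold

# Demo threshold; replace with +1G for the use case described above
threshold="+1k"

# Compress every regular file above the threshold that does not end in .gz
find "$workdir" -type f -size "$threshold" ! -name '*.gz' -exec gzip {} +

find "$workdir" -type f | sort
```

Because `find` hands the file names to `gzip` directly, no word splitting takes place and names with spaces are handled correctly.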
Category: bioinformatics, one-liners | 0 comments »
30 March 2018 by Michael Grünstäudl
A short one-liner to keep the data compressed
One of the bash one-liners that I use after every successful project, yet never remember when needed, is for the simple task of looping through your folders, tar-zipping them and then removing the original folders.
for i in */; do
  tar czf "${i%/}.tar.gz" "$i" && rm -r "$i"
done
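Since the original folders are deleted right after archiving, it can be reassuring to check that each archive is readable first. The following is a minimal sketch of that idea: it lists the archive contents with `tar tzf` and removes the folder only if that succeeds. The scratch directory and folder names are made up for illustration.

```shell
# Scratch directory with two toy project folders (one with a space in its name)
workdir=$(mktemp -d)
mkdir -p "$workdir/project_a" "$workdir/project b"
echo "data" > "$workdir/project_a/file.txt"
echo "data" > "$workdir/project b/file.txt"

# Run in a subshell so the caller's working directory is left unchanged
(
  cd "$workdir" || exit 1
  for d in */; do
    name=${d%/}
    # Archive, verify that the archive can be listed, then remove the folder
    tar czf "$name.tar.gz" "$name" \
      && tar tzf "$name.tar.gz" > /dev/null \
      && rm -r "$name"
  done
)

ls "$workdir"
```

If either `tar` call fails, the `&&` chain stops and the original folder stays in place.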
Category: bioinformatics, one-liners | 0 comments »
25 March 2018 by Michael Grünstäudl
Galanthus nivalis (Amaryllidaceae) in Berlin in March 2018
Category: audiovisual | 0 comments »
14 February 2018 by Michael Grünstäudl
Talking about efficient data partitioning strategies
Today, I led a workshop at the 19th annual meeting of the Gesellschaft für Biologische Systematik (GfBS). I introduced the participants to computational strategies for automating the selection of data partitioning schemes and nucleotide substitution models for phylogenomic datasets.
Workshop at the GfBS 2018
Category: General | 0 comments »