A comparison of three programming languages for a full-fledged next-generation sequencing tool | BMC Bioinformatics
The sequence alignment/map format (SAM/BAM) is the de facto standard in the bioinformatics community for storing mapped sequencing data. There exists a large body of work on tools for processing SAM/BAM files for analysis [1–15]. The SAMtools, Picard, and Genome Analysis Toolkit (GATK) software packages developed by the Broad and Sanger institutes are considered to be reference implementations for many operations on SAM/BAM files, examples of which include sorting reads, marking polymerase chain reaction (PCR) and optical duplicates, recalibrating base quality scores, indel realignment, and various filtering options, which typically precede variant calling. Many alternative software packages [4–10, 12, 14, 15] focus on optimizing the computations of these operations, either by providing alternative algorithms, or by using parallelization, distribution, or other optimization techniques specific to their implementation language, which is often C, C++, or Java.
We have developed elPrep [8, 16], an open-source, multi-threaded framework for