GENALICE MAP, the Next-Generation Sequencing (NGS) secondary data analysis solution, has been upgraded to version 2.3.0. This release contains a set of revolutionary improvements that further enhance the solution.
GENALICE MAP V2.3.0 produces up to 25% fewer false negative and up to 50% fewer false positive variants than V2.2.0 in benchmark studies.
Advanced Population Calling module
In October 2015, GENALICE launched its Population Calling module by demonstrating that it can call variants on 800 mapped NGS data samples in less than one hour using 100 compute nodes on Amazon Web Services. With GENALICE MAP V2.3.0, the Population Calling module has matured into a stable tool. Its hallmarks are high-speed data processing, incremental addition of single samples to a population, linear scalability, and improved variant detection accuracy due to Consensus Based Call Enhancement. The called variants are stored in a GENALICE Variant Map (GVM), which enables fast and easy data management.
Embedded quality control statistics (all-in-one GAR file)
The GENALICE development team has added a high-performance quality control tool to the MAP features. The quality control feature reports sequence read mapping quality statistics in the GENALICE Aligned Reads (GAR) file. These statistics are instantly available and include coverage depth, estimated template insert size, binned base call qualities, mapping results, and PCR duplicate counts. Owing to its design, its performance is unparalleled compared with other (open source) tools that produce similar quality statistics, such as the Picard suite.
Variant detection with automatic coverage depth recognition
GENALICE MAP V2.3.0 supports automatic coverage depth recognition during variant detection. The software reads coverage depth information from the GAR file prior to variant calling. It automatically determines the average coverage depth of the sample and derives a minimum-maximum coverage depth range from this average. This strategy filters out genome positions with extremely low or high coverage depth, which are prone to false variant calls. Automatic coverage depth recognition has clear benefits over the ‘fixed parameter’ method used in earlier versions of GENALICE MAP. One advantage for users is that filter conditions are optimized automatically for data samples with distinct coverage depths. In addition, filtering driven by automatic coverage depth recognition improves the accuracy of variant detection.
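The idea of deriving a coverage depth range from the sample average can be illustrated with a minimal Python sketch. It is not GENALICE's implementation: the function names and the factors 0.25 and 2.0 are assumptions chosen only for illustration, and the depth values are toy data.

```python
from statistics import mean


def coverage_range(depths, low_factor=0.25, high_factor=2.0):
    """Derive a (min, max) depth range from the average depth.

    low_factor and high_factor are hypothetical values, not
    parameters documented for GENALICE MAP.
    """
    avg = mean(depths)
    return avg * low_factor, avg * high_factor


def positions_in_range(depths, low_factor=0.25, high_factor=2.0):
    """Return the indices of positions whose depth lies inside the range."""
    lo, hi = coverage_range(depths, low_factor, high_factor)
    return [i for i, d in enumerate(depths) if lo <= d <= hi]


# Toy per-position depths: index 2 is extremely low, index 4 extremely high.
depths = [30, 32, 2, 29, 150, 31, 28]
kept = positions_in_range(depths)
print(kept)
```

Because the range is computed from each sample's own average, the same factors adapt to a 30x sample and a 50x sample without changing any fixed thresholds.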
Precise repeat area handling
GENALICE has improved the mapping and variant calling of INDELs. As a consequence, the number of false variant calls is strongly reduced in benchmark tests where variants detected by GENALICE MAP are compared to ‘truth’ variants (e.g. NIST/GIAB NA12878 ‘truth’ variants). In addition, this improvement is indirectly beneficial for the detection of SNPs, as the benchmark assays also show fewer false SNP calls.
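The comparison against a ‘truth’ set described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the benchmark procedure GENALICE used: variants are represented as hypothetical (chromosome, position, ref, alt) tuples and matched exactly, whereas real benchmarks also have to reconcile different representations of the same variant. The variants shown are invented toy data.

```python
def benchmark(called, truth):
    """Count true positives, false positives and false negatives
    by exact matching of variant tuples."""
    called, truth = set(called), set(truth)
    return {
        "tp": len(called & truth),  # called and in the truth set
        "fp": len(called - truth),  # called but not in the truth set
        "fn": len(truth - called),  # in the truth set but not called
    }


def relative_reduction(old, new):
    """Fraction by which a count dropped between two versions."""
    return (old - new) / old


# Toy truth set and two toy call sets (illustrative only).
truth = [("chr1", 100, "A", "G"), ("chr1", 200, "CT", "C"), ("chr2", 50, "G", "T")]
calls_old = [("chr1", 100, "A", "G"), ("chr1", 300, "T", "C"), ("chr2", 75, "A", "AT")]
calls_new = [("chr1", 100, "A", "G"), ("chr1", 200, "CT", "C"), ("chr2", 75, "A", "AT")]

old, new = benchmark(calls_old, truth), benchmark(calls_new, truth)
print(old, new, relative_reduction(old["fp"], new["fp"]))
```

A statement such as "50% fewer false positives" corresponds to `relative_reduction` returning 0.5 for the false positive counts of the two versions.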
Simplified reference index creation, HTTP and AWS S3 file system support
The latest version of GENALICE MAP features a further revised and simplified reference index creation. As of MAP 2.3.0, it is possible to generate a reference index directly in RAM, which can be beneficial when storage space is too costly or too limited to hold the comprehensive GENALICE Reference Index with its large storage footprint. Furthermore, MAP 2.3.0 allows files to be streamed directly from an HTTP server to the mapper, which makes data streams easier to configure and simplifies deployment in cloud environments that use storage such as AWS S3.
Higher speed and quality gains
MAP V2.3.0 shows significant performance and accuracy improvements over V2.2.0. Below is a quick comparison of the GENALICE MAP V2.2.0 and V2.3.0 performance on NA12878 from Illumina’s Platinum Genomes with 50x Median Coverage Depth.