NGS Data Analysis
No compromises. No choices. This was the challenging starting point when designing the new algorithms and turnkey configurations for our ultra-fast Next-Generation Sequencing (NGS) data processing and analysis software solution: GENALICE MAP.
MAP transforms NGS workflows into ultra-fast, simple and cost-effective processes. Our groundbreaking solution is also suitable for large-scale projects, such as Whole Genome Sequencing (WGS), Whole Transcriptome Sequencing (WTS) or Whole Exome Sequencing (WES).
MAP runs on commodity Intel Xeon E5 processors and is delivered on a turnkey (virtual) appliance, the GENALICE VAULT series, which also contains real-time monitoring, workflow management software and an embedded Oracle database.
MAP also runs as an application on Amazon (AWS).
GENALICE MAP has been thoroughly validated using various (benchmark) data sets in close collaboration with the world’s leading university medical centers and genomic organizations.
The performance data in head-to-head comparisons against some of the most widely-used alternative NGS data analysis workflows will be described below.
The hardware configurations, input data and genome reference files used in these comparison studies were kept exactly the same for both workflows.
- GENALICE LINK is a unique and open correlation platform integrating molecular data and diagnostic data from different sources. LINK is an enabler of accelerated translational molecular research and biomarker discovery studies in particular.
- GENALICE CHECK is a Clinical Decision Support (CDS) software solution that supports treating physicians in their daily clinical decision-making process.
More about GENALICE MAP
Release notes: GENALICE MAP (version 2.2.0)
The performance of GENALICE MAP is driven by its proprietary newly designed algorithms, which are optimally geared towards modern hardware architecture.
As a result, MAP can perform short read mapping and variant calling on NGS data more than 150 times faster than widely used conventional workflows. The exact speed gains in comparison to BWA-MEM and three different variant callers are displayed in the graph on the right.
On a standard dual-processor Intel Xeon E5-2620 system, the alignment of a complete human genome with 37x coverage takes around 25 minutes and variant calling takes just 5 minutes.
In head-to-head comparisons on the Platinum Genome NA12878 between GENALICE MAP and BWA-MEM combined with three different variant callers (GATK HC, Platypus, and VarScan), MAP outperformed the latter two on F1 score. The F1 score covers all variants (SNPs and INDELs) and is the harmonic mean of sensitivity and precision.
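For readers who want to reproduce this kind of metric, the sketch below shows how an F1 score is conventionally derived from variant counts. The function name and the example counts are illustrative assumptions and are not taken from the GENALICE validation data.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score: the harmonic mean of precision and sensitivity.

    Counts refer to variant calls compared against a truth set
    (hypothetical inputs, not GENALICE benchmark results).
    """
    called = true_positives + false_positives   # all variants reported by the caller
    truth = true_positives + false_negatives    # all variants present in the truth set
    if called == 0 or truth == 0:
        return 0.0
    precision = true_positives / called         # fraction of calls that are correct
    sensitivity = true_positives / truth        # fraction of true variants that were found
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


if __name__ == "__main__":
    # Made-up example: 90 correct calls, 10 false calls, 10 missed variants.
    print(round(f1_score(90, 10, 10), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a workflow cannot reach a high F1 score by trading away precision for sensitivity or the other way around.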
An even more significant quality improvement can be seen on INDEL detection. GENALICE MAP can align reads with “infinite” gap sizes at high speed due to its architecture, algorithms, and comprehensive reference index. This enables it to detect longer INDELs with higher efficiency than other NGS workflows. MAP detects more long deletions than any of the other three NGS workflows.
GENALICE MAP allows researchers to generate a small footprint GAR (GENALICE Aligned Reads) file to replace BAM/SAM files produced in conventional workflows.
The GAR file generated after alignment of a complete human genome at 37x coverage is only 5GB, a twenty-fold storage reduction. The file is ideal for ultra-fast variant calling afterwards and can be converted, if needed, to a conventional SAM/BAM file in near real time.
This significant file size reduction is achieved through a smart combination of reference backed encoding, base quality binning, and best quality reference scoring.
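The GAR format itself is proprietary, so the following is only a generic sketch of two of the ideas named above: storing a read as its differences from the reference, and binning base qualities into a few representative values. The bin boundaries, function names, and the substitution-only simplification (no INDELs) are all assumptions made for illustration and do not describe the actual GAR encoding.

```python
from bisect import bisect_left

# Hypothetical bin edges and representative values (illustrative only).
BIN_UPPER_BOUNDS = [1, 9, 19, 24, 29, 34, 39]
BIN_VALUES = [0, 6, 15, 22, 27, 33, 37, 40]


def bin_quality(q: int) -> int:
    """Map a Phred quality score to the representative value of its bin."""
    if q < 0:
        raise ValueError("quality scores must be non-negative")
    return BIN_VALUES[bisect_left(BIN_UPPER_BOUNDS, q)]


def encode_against_reference(read: str, reference: str, position: int):
    """Store a read as (position, length, mismatches) relative to the reference.

    Only substitutions are handled; INDELs are out of scope for this sketch.
    """
    window = reference[position:position + len(read)]
    if len(window) != len(read):
        raise ValueError("read extends past the end of the reference")
    mismatches = [(i, base) for i, (base, ref) in enumerate(zip(read, window)) if base != ref]
    return position, len(read), mismatches


def decode(reference: str, position: int, length: int, mismatches) -> str:
    """Rebuild the original read from the reference and the stored mismatches."""
    bases = list(reference[position:position + length])
    for offset, base in mismatches:
        bases[offset] = base
    return "".join(bases)


if __name__ == "__main__":
    ref = "ACGTACGTAC"
    encoded = encode_against_reference("GTAGGT", ref, 2)
    print(encoded)                 # one mismatch stored instead of six bases
    print(decode(ref, *encoded))   # round-trips to the original read
    print([bin_quality(q) for q in (2, 20, 38, 41)])
```

A read that matches the reference closely needs little more than its position and length, and binned qualities compress far better than the full range of scores, which is the general reason such techniques shrink alignment files.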
Head to head comparison
Using the popular benchmark tool Genome Comparison & Analytic Testing (GCAT) from bioplanet.com, we performed two head-to-head comparisons with widely used conventional workflows for NGS read alignment and variant calling. The GCAT report offers an objective comparison of quality between GENALICE MAP and other widely used NGS data preprocessing pipelines.
The variant calls in these reports have been analyzed across a series of metrics including comparisons to the Genome in a Bottle (GIAB) call set, consistency with genotyping array data, and concordance with other variant callers.
GENALICE MAP detects SNPs and INDELs with high sensitivity and precision, as measured against the public benchmark high-confidence SNP and INDEL call set (NIST v2.18) from the GIAB consortium. In both analyses, on exome datasets with 30x and 150x coverage, MAP outperformed the different conventional workflows.
A significant difference can be observed in the 30x set, indicating a much higher sensitivity in areas with low coverage, while precision remains high. In other words, GENALICE MAP not only offers a high-speed, low-cost NGS data analysis solution, it also delivers improved outcome data.
Ease of use
GENALICE MAP has a short and simple workflow, consisting of only two processing steps: alignment and variant calling. Variant calling includes duplicate marking, INDEL realignment, genotyping, and filtering in a single processing step.
Conversely, the BWA-MEM/GATK workflow is lengthy, complex, and multi-staged, involving several software packages. The aligned reads are stored in the BAM format, which has a large storage footprint (~100GB).
In addition, aligned reads must be prepared, which involves sorting, indexing, and duplicate marking, prior to variant calling. Each preparation step generates another BAM file. Moreover, variant calling in GATK requires two independent processing stages: genotyping and filtering.
GENALICE is committed to improving the complete NGS data analysis workflow for as many different analysis jobs as possible.
Our Production Box consists of two key components, which are required before any type of downstream analysis: mapping and calling.
The solution is built on general purpose hardware and also equipped with a GAR toolbox for conversion to SAM/BAM files, a GAR-API, real-time monitoring software, and 5TB of local storage.
This turnkey appliance, the GENALICE VAULT series, is available in different configurations for on-site and cloud (AWS) use. Modular expansion of GENALICE MAP is in progress, to make it a complete NGS data analysis suite.
The following modules are already available or under development:
- Population Calling (PC)
- RNA-Seq data alignment and quantification
- Structural Variation analysis (SV)
- FASTA generation and injection (FASTA)
- Cohort Builder (CB)
- Somatic Calling, tumor/normal subtraction (SC)
GENALICE MAP generates major gains in cost-effectiveness. Its substantial processing speed increase allows users to perform NGS data processing and analysis jobs on significantly smaller hardware configurations and in far less time.
MAP allows users to produce a small footprint GAR file to replace the BAM file. Consequently, significant storage cost reductions can be achieved. In addition, this file size reduction allows organizations to replace the existing physical shipment of hard disks containing BAM files with secure network transfer.
This adds up to significant cost savings and reduced project risk. Lastly, better accuracy results in more cost-effective drug development, improved diagnosis and better treatment.
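To make the storage argument concrete, the sketch below scales the per-genome file sizes quoted on this page (roughly 100GB for a BAM file and 5GB for a GAR file at 37x coverage) to a whole project. The cohort size of 1,000 genomes and the function name are hypothetical, and actual file sizes will vary with the data.

```python
def storage_summary(n_genomes: int, bam_gb: float = 100.0, gar_gb: float = 5.0) -> dict:
    """Compare total storage for BAM and GAR files across a cohort.

    Default per-genome sizes are the approximate figures quoted on this page;
    the cohort size passed in is the caller's own assumption.
    """
    bam_total = n_genomes * bam_gb
    gar_total = n_genomes * gar_gb
    return {
        "bam_total_gb": bam_total,
        "gar_total_gb": gar_total,
        "saved_gb": bam_total - gar_total,
        "reduction_factor": bam_gb / gar_gb,
    }


if __name__ == "__main__":
    # Hypothetical project of 1,000 whole genomes.
    for key, value in storage_summary(1000).items():
        print(f"{key}: {value:,.0f}")
```

At that scale the difference is between roughly 100TB and 5TB, which is what makes network transfer a practical alternative to shipping disks.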
Downloads and videos of GENALICE MAP
Product Supporting Materials:
- GENALICE MAP product brochure
- GENALICE MAP cost-effectiveness flyer: A4 version or US letter version
- GENALICE MAP product specification sheet: A4 version or US letter version
- GENALICE MAP validation sheet: A4 version or US letter version
- GENALICE MAP deployment options: A4 version or US letter version
- Infographic: GENALICE MAP in a nutshell
- Infographic: Population Calling module
- GENALICE MAP white papers:
- An Introduction to a New High Performance Read Alignment Solution
- Insertion and Deletion Detection Goes at Great Length
- GENALICE MAP on Intel a Perfect Match
- Variant Calling in a Matter of Minutes
- Taking Definitive Care of the NGS Big Data Deluge
- NA12878 Platinum Genome Analysis Report
- Efficient and Accurate Population Genomics
- GENALICE MAP GCAT benchmark report
- Unofficial world record setting event
- PAG 2016 workshop: Bringing plant genomics to the next level
- ASHG 2015 poster presentation: Efficient and accurate population genomics (Bas Tolhuis, Linda Baarspul & Hans Karten)
- PAG 2014 poster presentation: Ultra-fast, accurate and cost-effective NGS read alignment validated for complex plant genomes (Jos Lunenberg, Bas Tolhuis & Hans Karten)
- HiTSeq 2013 poster presentation: Ultra-fast, accurate and cost effective NGS read alignment with significant storage footprint reduction (Bas Tolhuis, Jos Lunenberg & Hans Karten)
- NBIC 2013 poster presentation: Ultra-fast, accurate and cost effective NGS read alignment with significant storage footprint reduction (Rick Karten, Bas Tolhuis & Hans Karten)
- ECCB 2012 poster presentation: A new ultra-fast and comprehensive NGS read aligner with high precision (Jos Lunenberg, Bas Tolhuis & Hans Karten)
*For more information on the validations and product availability in your country, please contact us.