Genalice 2.5 marks another important milestone in performance, quality and functionality. All modules in the suite work well with WGS, WES or panel data and are reference independent. NGS input data from Illumina devices have been validated.
Our newest release brings:
- Increased mapping performance, with the time to write a GAR file (WGS, 5 GB) reduced by more than 80%;
- Improved INDEL calling;
- Embedded trio-analysis in the population caller;
- Improved somatic calling;
- Improved CNV calling;
- Support for GRCh38 with ‘alt’ sections enabling users to find new variants in this richer and more accurate reference.
With this release, the Genalice Map production suite consists of the following modules:
- Map is a simple one-step operation to align short NGS reads from FASTQ, BAM or GAR to a reference. The reference can range from very small to extremely large. We support many sections in a FASTA, which allows you to map against, for example, an assembly output or a library with many sections. Mapping a human WGS (30x) takes about 25 minutes on a single node with a commodity Intel E5 processor.
- Variant Calling provides a robust and repeatable method for single sample variant calling. It is a monolithic tool that takes a GAR file as input and produces a VCF. The implementation is scalable. On a small Intel node this takes 5 minutes. On a quad E7, it takes a bit over 1 minute. Note that Genalice variant calling uses a pure observation approach. This means that there is no bias towards known mutations, making the tool extremely useful in cases where the Caucasian reference is not the norm. For clinical labs, we support profiles that can be set up to enhance the sensitivity to variants that are of special interest.
The combination of Map and Variant Calling allows the full process from FASTQ to VCF to complete in less than 30 minutes.
- Population Calling is a unique tool implementing the well-known concept of using intrinsic population observations to improve variant calling. This inContext calling, or Consensus based Call Enhancement, improves both sensitivity and precision. The implementation scales linearly from a few to many samples and from one to many nodes. The storage system is designed and tested to run on eventually consistent file systems such as S3 from Amazon. The Population Caller has embedded trio support, resulting in consistent Mendelian error detection to assist in finding mutations related to rare diseases. Population Calling takes about 6 minutes per sample. This replaces the time it takes to run Variant Calling. Just as with Variant Calling, Population Calling is done on the GAR file. The variants for a population (cohort) are stored in a GVM (Genalice Variant Map). This is a repository that can be used to manage the samples and to search for patterns in the genetic profile. The use of GVM allows you to play with the data. You can take different groups of samples out of a large GVM into a separate GVM to further enhance the quality of the calls in, for example, a phenotype-related cohort. The speed and ease of use free up time to focus on the meaning of the data instead of on handling the data (Population Calling Module Infographic). The Population Caller has been built to deal with very large cohorts that can be stored and analyzed quickly in the GVM. Population management scales linearly with expanded compute and storage requirements due to the unique format and parallel processing design.
- Somatic Calling takes two samples from the same subject and does an in-depth comparison to find variants that differ between the samples. In general this technique is used for tumor/normal analysis when looking for somatic mutations. The Genalice Somatic Caller uses two GAR files as input. This provides a detailed background for high quality variant detection. The difficulty with Somatic Calling is that tumor samples are impure. Genalice implements dynamic tumor purity detection to find the correct balance between sensitivity and precision. This auto-tuning facility allows a wide range of signals in the tumor to be detected correctly. A comparison against the merged output of four different somatic callers, each covering part of the signal spectrum, shows that the processing time of 6 minutes per sample does not reduce the quality or consistency of the result (Somatic Calling Infographic).
- Copy Number Variation (CNV) uses the coverage of two samples, or of a sample and a group coverage (GCO – Genalice Coverage), and computes a normalized differential between the two inputs. The sample input is obtained from the GAR file, which has a separate section with coverage data. This facility allows CNV to be completed in seconds instead of the hours or days needed with other tools. The group coverage can be compiled from a set of GAR files; the average and standard deviation of the group are used to make a call of gain, loss, normal or skip for a segment of the genome (Copy Number Variation Infographic).
- GAR toolkit combines a convert tool that can convert the GAR file to BAM, SAM or FASTQ with an API (Java and C) that allows tools to access the GAR file content as if it were a BAM file. The aligned reads, the quality and the CIGAR strings are presented as BAM records through the interface. Position access is provided, such that tools like IGV can browse through the GAR file without noticing the difference in content. The only differences are the ease of use and the speed of accessing the data.
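The group-based CNV call described above can be illustrated with a minimal sketch. This is not the Genalice implementation: the function name `call_segment`, the z-score rule and both thresholds are assumptions chosen for illustration, and the coverage values are assumed to be already normalized for sequencing depth. It only shows how an average and a standard deviation over a group can lead to a gain, loss, normal or skip call for one segment.

```python
from statistics import mean, stdev


def call_segment(sample_cov, group_covs, z_threshold=2.0, min_group_mean=5.0):
    """Classify one genome segment as 'gain', 'loss', 'normal' or 'skip'.

    Hypothetical illustration only; thresholds are assumed values.
    sample_cov  -- depth-normalized coverage of the sample in this segment
    group_covs  -- depth-normalized coverage of each group member (>= 2 values)
    """
    mu = mean(group_covs)
    sigma = stdev(group_covs)  # sample standard deviation of the group

    # Without a usable baseline (very low or constant group coverage)
    # no reliable call can be made for this segment.
    if mu < min_group_mean or sigma == 0:
        return "skip"

    # Differential between sample and group, normalized by the group spread.
    z = (sample_cov - mu) / sigma
    if z >= z_threshold:
        return "gain"
    if z <= -z_threshold:
        return "loss"
    return "normal"


group = [30, 31, 29, 30, 32]
print(call_segment(45, group))
print(call_segment(15, group))
print(call_segment(30, group))
```

With this rule, a sample is only called as gain or loss when its coverage lies well outside the spread observed in the group, and segments without a usable group baseline are skipped rather than guessed.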
Somatic Calling is the art of finding the differences between two samples. In a tumor/normal context this is complicated by the fact that the tumor sample is not ‘pure’. This means that standard Variant Calling (including haplotype-calling) is defeated by having a mix of DNA profiles in the same ‘sample’.
To deal with this, we developed a dynamic purity detection method. It detects and separates tumor signal from normal signal and noise. This gives a clear signal covering a wide signal range. In addition, it allows for proper LOH (Loss Of Heterozygosity) detection in case of HET-to-HOM promotion and total loss of signal.
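To illustrate why an impure tumor sample complicates calling, the sketch below computes the allele fraction one would expect to observe for a variant in a mixture of tumor and normal cells. It uses the standard mixture model for allele fractions, not the Genalice purity detection method, and the function name `expected_vaf` and its parameters are hypothetical.

```python
def expected_vaf(purity, tumor_variant_copies=1, tumor_copy_number=2,
                 normal_variant_copies=0, normal_copy_number=2):
    """Expected variant allele fraction in a tumor/normal cell mixture.

    Illustrative mixture model only (an assumption, not Genalice's method).
    purity -- fraction of cells in the sample that are tumor cells (0..1)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must be between 0 and 1")

    # Alleles carrying the variant, contributed by tumor and normal cells.
    variant = (purity * tumor_variant_copies
               + (1.0 - purity) * normal_variant_copies)
    # All alleles at this position, contributed by tumor and normal cells.
    total = (purity * tumor_copy_number
             + (1.0 - purity) * normal_copy_number)
    return variant / total


# Heterozygous somatic variant: a pure sample versus a 40% pure sample.
print(expected_vaf(1.0))
print(expected_vaf(0.4))

# Germline heterozygous site that became homozygous in the tumor
# (copy-neutral LOH), observed in a 60% pure sample.
print(expected_vaf(0.6, tumor_variant_copies=2, normal_variant_copies=1))
```

Under this model a heterozygous somatic variant in a 40% pure sample is expected at an allele fraction of about 0.2 rather than 0.5, which is why a caller tuned for germline heterozygous and homozygous fractions struggles with such samples.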
The Somatic Caller uses two GAR files as input. The GAR files provide detailed input and the complete context to be able to make a proper call.
The Caller operates at a speed similar to that of the single sample Variant Caller. It takes about 6 minutes per 30x WGS sample to produce somatic calls.
The Somatic Caller output is a VCF file containing the somatic calls. The output can also be stored directly in the GVM. This allows you to collect a series of somatic outputs (cohort), and apply Consensus based Call Enhancement to further improve the signal. This also provides a repository in which genetic profiles can be detected.
Because of the speed, the quality and the adaptive nature of the caller, it can be used in different environments: in the clinic to support high-speed diagnosis, in research to support genetic profile mining, and in clinical studies to support fast and efficient validation.