Current Next-Generation Sequencing (NGS) platforms can analyze tens of thousands of individual genomes per year, enabling large-scale population genetics. Population Calling, also referred to by others as ‘Joint Genotyping’ or ‘Joint Calling’, is a variant calling approach in which cohorts of samples are examined in a single variant detection run. The aggregate information from all samples is used to improve the sensitivity and precision with which DNA changes are detected in each individual sample. In other words, the results for each sample are enhanced using information from the other samples in the population. Call set enrichment can be seen on three levels.
With Joint Genotyping, calling a large number of samples simultaneously is computationally extremely expensive. With the GENALICE MAP Population Calling module, processing takes only about 6 minutes per WGS sample. This is more than two orders of magnitude faster than GATK’s Joint Genotyping (6 minutes vs. 34 hours for one WGS sample on a single node).
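The "more than two orders of magnitude" claim can be checked directly from the two figures quoted above. The short sketch below uses only those figures (6 minutes and 34 hours per WGS sample on a single node); the variable names are illustrative.

```python
import math

# Figures quoted in the text, per WGS sample on a single node.
map_minutes = 6    # GENALICE MAP Population Calling
gatk_hours = 34    # GATK Joint Genotyping

# Convert to a common unit before comparing.
gatk_minutes = gatk_hours * 60

# Ratio of processing times and its base-10 logarithm.
speedup = gatk_minutes / map_minutes
orders_of_magnitude = math.log10(speedup)

print(f"speedup: {speedup:.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

A ratio of 340 lies between 100 and 1000, which is consistent with the statement in the text.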
The GENALICE MAP population caller is fully scalable: individual samples or batches of samples can be added to the cohort incrementally. Different batches can be delivered at separate time points without the need to re-process previously delivered batches. Data retrieval from the cohort variant call set is fast.
This fast data retrieval is a result of the chosen storage format, the GENALICE Variant Map (GVM), a multi-dimensional, extensible variant storage format. The GVM is a fully searchable data structure, and the storage footprint per sample is reduced approximately 50-fold. Search results are reported in the standard VCF format.
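To make the idea of an incrementally extensible, searchable cohort call set more concrete, here is a minimal toy sketch. It is not the GVM format or any GENALICE API: the class `ToyCohortStore`, its methods, and the in-memory layout are hypothetical and chosen only for illustration. It shows batches being appended without touching earlier data, and a region query that returns VCF-style tab-separated rows. It does not model the statistical re-evaluation of calls that population calling performs.

```python
from collections import defaultdict


class ToyCohortStore:
    """Hypothetical in-memory cohort store; an illustration only, not GVM."""

    def __init__(self):
        # (chrom, pos, ref, alt) -> {sample_id: genotype}
        self._calls = defaultdict(dict)
        self.samples = []  # sample IDs in order of arrival

    def add_batch(self, batch):
        """Add a batch: {sample_id: [(chrom, pos, ref, alt, genotype), ...]}.

        Existing samples are never rewritten; new samples are appended.
        """
        for sample_id, calls in batch.items():
            if sample_id in self.samples:
                raise ValueError(f"sample already in cohort: {sample_id}")
            self.samples.append(sample_id)
            for chrom, pos, ref, alt, genotype in calls:
                self._calls[(chrom, pos, ref, alt)][sample_id] = genotype

    def query(self, chrom, start, end):
        """Return VCF-style rows for variants on chrom with start <= pos <= end."""
        rows = []
        for key in sorted(self._calls):
            c, pos, ref, alt = key
            if c == chrom and start <= pos <= end:
                by_sample = self._calls[key]
                # "./." marks samples with no recorded call at this site.
                genotypes = [by_sample.get(s, "./.") for s in self.samples]
                fields = [c, str(pos), ".", ref, alt, ".", ".", ".", "GT"]
                rows.append("\t".join(fields + genotypes))
        return rows


store = ToyCohortStore()
# The first batch is delivered.
store.add_batch({"S1": [("chr1", 100, "A", "G", "0/1")]})
# A later batch is added without re-processing the first one.
store.add_batch({"S2": [("chr1", 100, "A", "G", "1/1"),
                        ("chr1", 250, "C", "T", "0/1")]})

rows = store.query("chr1", 1, 200)
for row in rows:
    print(row)
```

In this toy, a dictionary keyed by variant keeps the example short. A production format would use an indexed, compressed layout to achieve the retrieval speed and footprint described above.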
Applications for Population Calling
Population Calling provides a strong context for new biomarker discoveries and for high-confidence diagnosis driven by molecular profiles.