For more than 50 years, copy-number variations (CNVs), which are simply duplications and deletions of genomic regions, have been recognised as important evolutionary mechanisms. Only a small number of CNVs in humans, however, are advantageous for adaptation.
By contrast, rare CNVs (rCNVs), meaning duplications and deletions that are infrequent in the human population, can greatly raise the risk of disease. These rCNVs have long been linked to Mendelian and complex disorders. A subset of disease-associated rCNVs, known as genomic disorders (GDs), has also received considerable attention in the literature.
Dosage-sensitive (DS) driver genes within rCNVs have historically proved difficult to identify. Genome-wide maps of DS segments and genes are also still lacking, and no widely adopted framework exists for evaluating haploinsufficiency and triplosensitivity for every human gene.
Furthermore, it is largely unknown whether the reciprocal GD phenotypes linked to duplications and deletions are caused by a single bidirectionally DS gene or by two or more distinct haploinsufficient (HI), triplosensitive (TS), or other genes. Detailed maps of bidirectional dosage sensitivity across disorders are also urgently needed for clinical interpretation and the study of human disease.
Concerning the study
The goal of the current work was to evaluate the characteristics of haploinsufficiency (deletion intolerance) and triplosensitivity (duplication intolerance) across the whole human genome. The team assembled a genome-wide catalogue of rCNV associations for 54 disease phenotypes by harmonising and meta-analysing rCNVs from 950,278 individuals.
Additionally, they predicted the probabilities of haploinsufficiency (pHaplo) and triplosensitivity (pTriplo) for all autosomal protein-coding genes using 145 genome annotations along with these rCNVs.
More specifically, the researchers gathered rCNVs detected by microarrays from 17 sources, ranging from diagnostic laboratories to national biobanks. They systematically tested for rCNV associations with each phenotype, taking advantage of the large sample size and building on decades of influential research on CNVs in disease.
Using exome-wide rCNV association testing, the researchers then sought to identify individual genes that carried coding rCNVs more frequently in patients than in controls. They postulated that a catalogue of dosage sensitivity metrics for every gene, even if imperfect, would be a useful tool for clinical genetics and genomics research. To computationally predict pHaplo and pTriplo for 18,641 autosomal protein-coding genes, the scientists developed a two-step method.
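The case-versus-control comparison described above can be illustrated with a minimal Python sketch. It assumes a simple 2x2 table of carriers and non-carriers for one gene and uses a one-sided Fisher's exact test plus a continuity-corrected odds ratio. The function names and counts are hypothetical, and the article does not state which test statistic the authors used.

```python
from math import comb


def fisher_exact_greater(case_cnv, case_total, ctrl_cnv, ctrl_total):
    """One-sided Fisher's exact test for carrier enrichment in cases.

    Returns the hypergeometric probability of seeing at least `case_cnv`
    carriers among cases, given the table margins.
    """
    carriers = case_cnv + ctrl_cnv
    total = case_total + ctrl_total
    denom = comb(total, case_total)
    upper = min(carriers, case_total)
    numer = sum(
        comb(carriers, k) * comb(total - carriers, case_total - k)
        for k in range(case_cnv, upper + 1)
    )
    return numer / denom


def corrected_odds_ratio(case_cnv, case_total, ctrl_cnv, ctrl_total):
    """Odds ratio with a 0.5 continuity correction (avoids division by zero)."""
    a = case_cnv + 0.5
    b = (case_total - case_cnv) + 0.5
    c = ctrl_cnv + 0.5
    d = (ctrl_total - ctrl_cnv) + 0.5
    return (a * d) / (b * c)


# Hypothetical counts for a single gene: 8 of 10 cases and 2 of 10 controls
# carry a coding rCNV.
p_value = fisher_exact_greater(8, 10, 2, 10)
odds_ratio = corrected_odds_ratio(8, 10, 2, 10)
```

In a real analysis such a test would be repeated for every gene and phenotype, with a multiple-testing threshold applied to the resulting p-values.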
Results and discussion
To assess the influence of rCNVs on 54 human disorders, the researchers created a genome-wide catalogue of standardised rCNV association statistics by meta-analysing a large number of biomedical datasets. This collection includes a consensus catalogue of 178 DS genomic segments linked to human disease, with a high-confidence subset of 88 DS genomic segments reaching strict genome-wide significance.
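As a rough illustration of how association statistics from several cohorts can be pooled, the following Python sketch applies a fixed-effect, inverse-variance weighted meta-analysis to per-cohort log odds ratios. This is one common approach and is shown only as an assumption; the article does not specify the meta-analysis model the authors used, and the input values are invented.

```python
from math import erfc, sqrt


def inverse_variance_meta(log_odds_ratios, std_errors):
    """Fixed-effect meta-analysis of per-cohort log odds ratios.

    Each cohort is weighted by 1 / SE^2, so more precise estimates
    contribute more to the pooled effect.
    """
    weights = [1.0 / se**2 for se in std_errors]
    weight_sum = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_odds_ratios)) / weight_sum
    pooled_se = sqrt(1.0 / weight_sum)
    z = pooled / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2.0))
    return pooled, pooled_se, p_value


# Hypothetical log odds ratios and standard errors from two cohorts.
pooled, pooled_se, p_value = inverse_variance_meta([1.0, 2.0], [1.0, 1.0])
```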
Based on enrichments of constrained disease genes and non-uniform concentrations of damaging de novo mutations (DNMs) within rCNV segments, the researchers also showed that a sizable fraction of these segments probably contains at least one DS driver gene.
The increased density of constrained genes that the team found in pleiotropic rCNV segments was consistent with a basic architecture of roughly one causal gene per phenotype per segment. This also agreed with evidence from a few well-studied GDs, such as 22q11.2 GD deletions, in which heart and kidney abnormalities have been linked to T-box transcription factor 1 (TBX1) and CT10 regulator of kinase-like proto-oncogene, adaptor protein (CRKL), respectively.
Given known cis-regulatory effects, gene-gene interactions, and variable penetrance or expressivity attributed to polygenic background and secondary variants, the overall genetic architecture of most rCNVs was expected to be more complex.
Assuming a range of genetic architectures and effect sizes, the authors adapted fine-mapping methods from genome-wide association studies (GWASs) to statistically prioritise individual genes within large rCNVs. By integrating short-variant datasets, the team identified patterns suggesting that rCNVs and short variants frequently converge on the same causal genes at disease-associated loci. The CNV direction-specific enrichments of protein-truncating variants (PTVs) and missense DNMs suggest that this convergence may point to the underlying mechanism.
The team’s final step was to predict the dosage sensitivity of each autosomal protein-coding gene using the study’s data. The triplosensitivity scores in particular may provide a novel perspective for the analysis of rare duplications and even some disease-related missense variants, for which loss-of-function (LoF) and gain-of-function effects are difficult to distinguish in silico.
By harmonising and meta-analysing rCNVs from nearly 1,000,000 individuals, the authors compiled a genome-wide catalogue of dosage sensitivity across 54 disorders. This approach identified 163 DS segments associated with at least one disorder. These segments were typically gene dense and commonly contained dominant DS driver genes, which the team prioritised using statistical fine-mapping.
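The idea of statistical fine-mapping can be sketched in Python under a deliberately simple assumption: exactly one causal gene per segment, with equal prior weight on every gene. Per-gene log Bayes factors are converted to posterior inclusion probabilities (PIPs), and a credible set is built from the top-ranked genes. The gene labels, Bayes factors, and function names are hypothetical and are not taken from the study.

```python
from math import exp, log


def posterior_inclusion_probs(log_bayes_factors):
    """Convert per-gene log Bayes factors to PIPs.

    Assumes a single causal gene per segment and a uniform prior.
    Subtracting the maximum keeps the exponentials numerically stable.
    """
    top = max(log_bayes_factors.values())
    weights = {g: exp(v - top) for g, v in log_bayes_factors.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def credible_set(pips, coverage=0.95):
    """Smallest set of top-ranked genes whose PIPs sum to at least `coverage`."""
    chosen, cumulative = [], 0.0
    for gene, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append(gene)
        cumulative += pip
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical segment with three genes; GENE_1 has the strongest evidence.
pips = posterior_inclusion_probs(
    {"GENE_1": log(8.0), "GENE_2": log(1.0), "GENE_3": log(1.0)}
)
top_genes = credible_set(pips, coverage=0.75)
```

Real fine-mapping methods allow several causal genes and informative priors; the single-causal-gene form is used here only because it keeps the arithmetic transparent.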
To estimate dosage sensitivity probabilities (pHaplo and pTriplo) for all autosomal genes, the researchers developed an ensemble machine-learning framework. This model identified 2,987 HI and 1,559 TS genes, including 648 genes that were uniquely TS.
Notably, the researchers released the full set of data and maps from the study as a free resource. They expect the dosage sensitivity findings to be broadly useful for clinical genetics and the study of human disease.
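To make the idea of an ensemble score and a gene-level call concrete, the sketch below averages probabilities from several hypothetical models and then labels a gene as HI, TS, or both using cutoffs. The averaging rule, the cutoff values of 0.9, and the function names are all illustrative assumptions; the article gives neither the authors' combination rule nor their thresholds.

```python
from statistics import mean


def ensemble_probability(model_probs):
    """Combine per-model probabilities by simple averaging (an assumption)."""
    return mean(model_probs)


def classify_gene(p_haplo, p_triplo, hi_cutoff=0.9, ts_cutoff=0.9):
    """Label a gene from its pHaplo and pTriplo scores.

    The default cutoffs are placeholders, not the study's thresholds.
    """
    labels = []
    if p_haplo >= hi_cutoff:
        labels.append("HI")
    if p_triplo >= ts_cutoff:
        labels.append("TS")
    return labels or ["not DS"]


# Hypothetical outputs from three models for one gene.
p_haplo = ensemble_probability([0.9, 0.95, 1.0])
p_triplo = ensemble_probability([0.1, 0.2, 0.3])
labels = classify_gene(p_haplo, p_triplo)
```

Under this sketch, a gene that passes only the pTriplo cutoff would be labelled TS alone.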
- Collins, R. et al. (2022) “A cross-disorder dosage sensitivity map of the human genome”, Cell, 185(16), pp. 3041-3055.e25. doi: 10.1016/j.cell.2022.06.036. https://www.sciencedirect.com/science/article/pii/S0092867422007887