A compilation of tri-allelic SNPs from 1000 Genomes and use of the most polymorphic loci for a large-scale human identification panel
Forensic Science International-Genetics
DeCS: genotype | high-throughput nucleotide sequencing | gene frequency | heterozygote | humans | pedigree | forensic genetics
MeSH: Humans | Pedigree | High-Throughput Nucleotide Sequencing | Heterozygote | Forensic Genetics | Genotype | Gene Frequency
In a directed search of 1000 Genomes Phase III variation data, 271,934 tri-allelic single nucleotide polymorphisms (SNPs) were identified amongst the genotypes of 2,504 individuals from 26 populations. The majority of tri-allelic SNPs have three nucleotide substitution-based alleles at the same position, while a much smaller proportion, which we did not compile, have a nucleotide insertion/deletion plus substitution alleles. SNPs with three alleles have higher discrimination power than binary loci but keep the same characteristic of optimum amplification of the fragmented DNA found in highly degraded forensic samples. Although most of the tri-allelic SNPs identified had one or two alleles at low frequencies, often single observations, we present a full compilation of the genome positions, rs-numbers and genotypes of all tri-allelic SNPs detected by the 1000 Genomes project from the more detailed analyses it applied to Phase III sequence data. A total of 8,705 tri-allelic SNPs had overall heterozygosities (averaged across all 1000 Genomes populations) higher than the binary SNP maximum value of 0.5. Of these, 1,637 displayed the highest average heterozygosity values of 0.6-0.666. The most informative tri-allelic SNPs we identified were used to construct a large-scale human identification panel for massively parallel sequencing (MPS), designed for the identification of missing persons. The large-scale MPS identification panel comprised 1,241 autosomal tri-allelic SNPs and 29 X tri-allelic SNPs (plus 46 microhaplotypes adapted for genotyping from reduced length sequences). Allele frequency estimates are detailed for African, European, South Asian and East Asian population groups plus the Peruvian population sampled by 1000 Genomes for the 1,270 tri-allelic SNPs of the final MPS panel. We describe the selection criteria, kinship simulation experiments and genomic analyses used to select the tri-allelic SNP components of the panel.
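The heterozygosity thresholds quoted above (0.5 for binary SNPs, up to 0.666 for tri-allelic SNPs) follow from the standard expected heterozygosity formula, H = 1 - Σ p_i², which is maximised when all alleles are equally frequent. The Python sketch below illustrates this arithmetic only. It assumes Hardy-Weinberg expected heterozygosity and a simple unweighted mean across populations, which may differ from the exact calculation the authors used; the function names are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one population.

    Assumption: allele frequencies sum to 1 and Hardy-Weinberg
    proportions hold; this is not taken from the paper's methods.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def mean_heterozygosity(per_population_freqs):
    """Unweighted mean of expected heterozygosity across populations."""
    values = [expected_heterozygosity(f) for f in per_population_freqs]
    return sum(values) / len(values)


# Binary SNP with equal allele frequencies reaches the two-allele maximum.
binary_max = expected_heterozygosity([0.5, 0.5])

# Tri-allelic SNP with equal allele frequencies reaches the three-allele maximum.
tri_max = expected_heterozygosity([1 / 3, 1 / 3, 1 / 3])

print(f"binary maximum: {binary_max:.3f}")
print(f"tri-allelic maximum: {tri_max:.3f}")
```

Under these assumptions, a tri-allelic SNP exceeds the binary ceiling of 0.5 only when its third allele is reasonably common in most populations, which is consistent with the small fraction of loci (8,705 of 271,934) reported above.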
Approximately 5% of the tri-allelic SNPs selected for the large-scale MPS identification panel gave three-genotype patterns in single individual samples or discordant genotypes for genomic control DNAs. A likely explanation for some of these unreliably genotyped loci is that they map to multiple sites in the genome. This highlights the need for caution and detailed scrutiny of multiple-allele variant data when designing future forensic SNP panels, as such patterns can arise from common structural variation in the genome, such as segmental duplications.
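A diploid individual should carry at most two alleles at a single-copy locus, so a single-source sample showing substantial reads for all three alleles is one signal that a SNP may map to more than one genomic site. The Python sketch below shows one simple way to flag such loci from per-allele read counts. The function names and the 10% minimum read fraction are illustrative assumptions, not values or methods from the study.

```python
def supported_alleles(read_counts, min_fraction=0.10):
    """Return alleles whose share of total reads meets min_fraction.

    read_counts: dict mapping allele (e.g. "A") to read count.
    min_fraction: hypothetical noise threshold, chosen for illustration.
    """
    total = sum(read_counts.values())
    if total == 0:
        return []
    return sorted(a for a, n in read_counts.items() if n / total >= min_fraction)


def has_three_allele_pattern(read_counts, min_fraction=0.10):
    """True if three or more alleles are supported in one individual."""
    return len(supported_alleles(read_counts, min_fraction)) >= 3


# Single-source sample with substantial reads for all three alleles: flagged.
suspect = {"A": 40, "C": 35, "T": 30}

# Ordinary heterozygote with one low-level noise read: not flagged.
normal = {"A": 50, "C": 48, "T": 1}

print(has_three_allele_pattern(suspect))
print(has_three_allele_pattern(normal))
```

A flag from a check like this is only a prompt for closer inspection, since mixtures and sequencing artefacts can produce similar patterns.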