Phasing is the process of inferring haplotypes from genotype data. Efficient algorithms and associated software for accurate phasing in pedigrees are needed, especially for populations lacking reference panels of sequenced individuals. We present a novel method for phasing genotypes from whole-genome sequence data in pedigrees, called PULSAR (Phasing Using Lineage Specific Alleles/Rare variants). The method is based on the property that alleles specific to a single founding chromosome within a pedigree are highly informative for identifying haplotypes that are shared identical by descent. Simulation studies are used to assess the performance of PULSAR with various pedigree sizes and structures, and the effect of genotyping errors and the presence of nonsequenced individuals is investigated. In pedigrees with complete sequencing and realistic genotyping error rates, PULSAR correctly phases >99.9% of heterozygous genotypes, excluding sites at which all individuals are heterozygous, and does so with a switch error rate frequently below 10⁻⁴. PULSAR is highly accurate, capable of genotype error correction and imputation, and computationally competitive with alternative phasing software applicable to pedigrees. Our method has the significant advantage of not requiring reference panels that are essential for other population-based phasing algorithms. A software implementation of PULSAR is freely available.
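
The property the method relies on can be illustrated with a toy sketch. This is not PULSAR's implementation; it only shows, under simplifying assumptions, how an allele seen on a single founder chromosome marks the descendants who inherited that chromosome segment. The function names, the genotype encoding (alternate-allele counts 0, 1, 2 per site) and the example pedigree are all hypothetical, and the sketch ignores genotyping error, missing data and de novo mutation.

```python
from typing import Dict, List

# Hypothetical encoding: individual -> alternate-allele count (0, 1 or 2) per site.
Genotypes = Dict[str, List[int]]


def candidate_lineage_specific_sites(
    genotypes: Genotypes, founders: List[str]
) -> Dict[int, str]:
    """Map each site index to the one founder carrying a single alt-allele copy.

    A site qualifies when the alt-allele counts across all founders sum to 1,
    i.e. exactly one founder is heterozygous and every other founder is
    homozygous for the reference allele.
    """
    n_sites = len(next(iter(genotypes.values())))
    sites: Dict[int, str] = {}
    for site in range(n_sites):
        counts = [genotypes[f][site] for f in founders]
        if sum(counts) == 1:
            sites[site] = founders[counts.index(1)]
    return sites


def carriers(genotypes: Genotypes, founders: List[str], site: int) -> List[str]:
    """Non-founders carrying the alt allele at `site`.

    Under the toy assumptions, these individuals share the founder's
    haplotype identical by descent around that site.
    """
    return sorted(
        ind
        for ind, g in genotypes.items()
        if ind not in founders and g[site] > 0
    )


# Hypothetical nuclear family with four variant sites.
founders = ["father", "mother"]
genotypes = {
    "father": [1, 0, 1, 2],
    "mother": [0, 1, 1, 0],
    "child1": [1, 0, 1, 1],
    "child2": [0, 1, 2, 1],
}

informative = candidate_lineage_specific_sites(genotypes, founders)
for site, founder in informative.items():
    print(site, founder, carriers(genotypes, founders, site))
```

In this example only the first two sites are informative: at the third both founders are heterozygous, and at the fourth the father is homozygous for the alternate allele, so neither allele can be traced to a single founder chromosome.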

