As the sequencing of new genomes increases the demand for accurately aligning gene sequences to the genome of a related species, spaced seeds have emerged as a promising means of increasing alignment sensitivity. We extend the existing {0, 1} match-mismatch models for sensitivity evaluation to take into account the compositional structure of coding sequences, and ultimately produce seeds better suited to this particular application. Designing seeds for alignment programs, however, requires balancing sensitivity against specificity. We assess the effects of seed variations on both sensitivity and specificity in an extended model that incorporates transitions and differentiates among the three codon positions, and show that spaced seeds with transitions offer a better sensitivity-specificity tradeoff. Furthermore, we propose a theoretical formulation for rigorously assessing seed specificity, starting from Bernoulli and Markov models of the mRNA and genomic sequences. Within this framework, we perform the first comprehensive analysis of seeds, intended to serve as a blueprint for selecting sensitive and specific seeds for practical applications. Our analyses show that specificity is relatively constant for seeds of a given weight, while sensitivity varies widely, with the highest values attained by seeds allowing a small number (2-6) of transitions. A strategy for designing seeds, therefore, is to first select the weight of the seed by identifying the desired sensitivity-specificity tradeoff, then choose the most sensitive seed(s) within that weight group. We illustrate our methods with the alignment of chicken coding sequences against the human genome assembly version HG17.
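To make the notion of a spaced seed with transitions concrete, the following is a minimal sketch and not the authors' implementation. It assumes a three-symbol seed alphabet that does not appear in the abstract: `#` requires an exact match, `@` accepts a match or a transition (A<->G, C<->T), and `_` is a don't-care position. The function names and the half-weight convention for `@` are likewise assumptions made for illustration.

```python
# Transitions swap bases within the purines (A, G) or the pyrimidines (C, T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def seed_weight(seed: str) -> float:
    """Weight of a seed; counting '@' as 0.5 is an assumed convention."""
    return seed.count("#") + 0.5 * seed.count("@")


def seed_hits_at(seed: str, a: str, b: str, start: int) -> bool:
    """Check whether the seed hits the ungapped alignment of a and b at start."""
    for k, symbol in enumerate(seed):
        x, y = a[start + k], b[start + k]
        if symbol == "_" or x == y:
            continue  # don't-care position, or an exact match
        if symbol == "@" and (x, y) in TRANSITIONS:
            continue  # transition tolerated at this position
        return False  # mismatch at '#', or transversion at '@'
    return True


def find_hits(seed: str, a: str, b: str) -> list[int]:
    """Return all start positions where the seed hits two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned and of equal length")
    last_start = len(a) - len(seed)
    return [i for i in range(last_start + 1) if seed_hits_at(seed, a, b, i)]


if __name__ == "__main__":
    mrna, genome = "ACGTAC", "ATCTAC"
    print(find_hits("#@_#", mrna, genome))  # seed tolerating one transition
    print(find_hits("##_#", mrna, genome))  # same shape without a transition
```

In this toy pair the second position differs by a C/T transition, so the seed with `@` hits at the first offset while the seed requiring an exact match there does not. This is the kind of sensitivity gain that transition positions are meant to provide; estimating sensitivity and specificity under the Bernoulli and Markov models described above would require a separate probabilistic computation.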

