BACKGROUND: Protein remote homology detection and fold recognition are central problems in computational biology. Supervised learning algorithms based on support vector machines (SVMs) are currently among the most effective methods for solving these problems. These methods are primarily used to solve binary classification problems and have not been extensively applied to the more general multiclass remote homology prediction and fold recognition problems.

RESULTS: We present a comprehensive evaluation of a number of methods for building SVM-based multiclass classification schemes in the context of the SCOP protein classification. These methods include schemes that directly build an SVM-based multiclass model, schemes that employ a second-level learning approach to combine the predictions generated by a set of binary SVM-based classifiers, and schemes that build and combine binary classifiers for various levels of the SCOP hierarchy beyond those defining the target classes.

CONCLUSION: Analyzing the performance achieved by the different approaches on four different datasets, we show that most of the proposed multiclass SVM-based classification approaches are quite effective in solving the remote homology prediction and fold recognition problems. Schemes that use predictions from binary models constructed for ancestral categories within the SCOP hierarchy tend not only to lead to lower error rates but also to reduce the number of errors in which a superfamily is assigned to an entirely different fold or a fold is predicted as being from a different SCOP class. Our results also show that the limited size of the training data makes it hard to learn complex second-level models, and that models of moderate complexity lead to consistently better results.

