BACKGROUND & AIMS:
The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. This computer-intensive approach to phylogenetics demands validation studies and sound measures of performance. To date, little practical guidance has been available on when and why the parameters of a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology for computing maximum likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter nonidentifiability (NI) and to distinguish it from two other estimability problems: parameters that are structurally identifiable but not estimable in a given data set (INE), and parameters that are identifiable and estimable, but only weakly so (WE). Applying the DC theorem requires only well-known and widely used Bayesian computational techniques, so practitioners can use existing Bayesian phylogenetics software to diagnose nonidentifiability. Theoreticians and practitioners alike now have a powerful yet simple tool for detecting nonidentifiability in complex modeling scenarios where closed-form expressions are difficult to obtain from a probabilistic analysis. Furthermore, we show how DC can be used to examine and eliminate the influence of the priors, which is particularly useful when prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing the identifiability of discrete parameters, such as the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
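The diagnostic idea behind DC can be illustrated with a toy example that is not taken from the paper. As a minimal sketch, assume a hypothetical model y_i ~ N(a + b, 1) with independent N(0, 1) priors on a and b: only the sum a + b is identifiable, while a and b individually are not. Raising the likelihood to the power K (that is, cloning the data K times) leaves the posterior Gaussian, so its covariance can be computed exactly without MCMC. The function names and the choice of model are assumptions made purely for illustration.

```python
import numpy as np


def cloned_posterior_cov(n_obs, n_clones, prior_var=1.0, noise_var=1.0):
    """Exact posterior covariance of (a, b) for the toy model
    y_i ~ N(a + b, noise_var), with independent N(0, prior_var) priors,
    when the n_obs observations are cloned n_clones times.

    For this Gaussian model the covariance does not depend on the
    observed values, only on how many (cloned) observations there are.
    """
    # Cloning K times multiplies the likelihood's precision contribution by K.
    m = n_clones * n_obs / noise_var
    prior_precision = np.eye(2) / prior_var
    # Each observation informs only the sum a + b, hence the all-ones matrix.
    likelihood_precision = m * np.ones((2, 2))
    return np.linalg.inv(prior_precision + likelihood_precision)


def dc_variance_profile(n_obs, clone_counts):
    """Return (K, Var(a), Var(a + b)) for each number of clones K."""
    rows = []
    for k in clone_counts:
        cov = cloned_posterior_cov(n_obs, k)
        var_a = cov[0, 0]    # nonidentifiable component
        var_sum = cov.sum()  # Var(a + b) = sum of all covariance entries
        rows.append((k, var_a, var_sum))
    return rows


if __name__ == "__main__":
    print("K    Var(a)    Var(a+b)")
    for k, var_a, var_sum in dc_variance_profile(10, [1, 2, 4, 8, 16, 32]):
        print(f"{k:<4d} {var_a:.4f}    {var_sum:.5f}")
```

In this sketch the posterior variance of the identifiable quantity a + b shrinks roughly in proportion to 1/K, whereas the variance of a alone levels off near half the prior variance no matter how many clones are used. That contrast, variance that fails to vanish as K grows, is the kind of signal DC relies on to flag nonidentifiability; in a real phylogenetic analysis the posterior would be sampled with MCMC software rather than computed in closed form.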