BACKGROUND & AIMS:
BACKGROUND: The number of completely sequenced plastid genomes is growing rapidly, and this array of sequences presents new opportunities for comparative analysis. Comparative studies often benefit from sampling across wide phylogenetic spans and, within angiosperms, from including representatives of basally diverging lineages such as the genomes reported here: Nuphar advena (from a basal-most lineage) and Ranunculus macranthus (a basal eudicot). We report these two new plastid genome sequences and make comparisons (within angiosperms, seed plants, or all photosynthetic lineages) to evaluate features such as the status of ycf15 and ycf68 as protein-coding genes, the distribution of simple sequence repeats (SSRs) and longer dispersed repeats (SDRs), and patterns of nucleotide composition.
RESULTS: The Nuphar [GenBank:NC_008788] and Ranunculus [GenBank:NC_008796] plastid genomes share characteristics of gene content and organization with many other chloroplast genomes. Like other plastid genomes, these genomes are A+T-rich, except for rRNA and tRNA genes. Detailed comparisons of Nuphar with Nymphaea, another member of Nymphaeaceae, show that more than two-thirds of these genomes exhibit at least 95% sequence identity and that most SSRs are shared. In broader comparisons, SSRs vary among genomes in abundance and length, and most contain repeat motifs based on A and T nucleotides.
CONCLUSION: SSR and SDR abundance varies by genome and, for SSRs, is proportional to genome size. Long SDRs are rare in the genomes assessed. SSRs occur less frequently than predicted and, although the majority of the repeat motifs do include A and T nucleotides, the A+T bias in SSRs is less than that predicted from the underlying genomic nucleotide composition. In codon usage, third positions show an A+T bias; however, variation in codon usage does not correlate with differences in A+T-richness. Thus, although plastome nucleotide composition shows "A+T richness", an A+T bias is not apparent upon more in-depth analysis, at least in these aspects. The pattern of evolution in the sequences identified as ycf15 and ycf68 is not consistent with their being protein-coding genes. In fact, these regions show no evidence of sequence conservation beyond what is normal for non-coding regions of the inverted repeat (IR).
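The comparison between the A+T share of SSRs and the overall A+T content of a genome can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the restriction to mononucleotide (homopolymer) repeats, the 8-base length threshold, and the toy sequence are all assumptions chosen for illustration.

```python
import re


def at_fraction(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        return 0.0
    return (seq.count("A") + seq.count("T")) / total


def mono_ssrs(seq: str, min_len: int = 8) -> list[tuple[int, str, int]]:
    """Return (start, base, length) for each homopolymer run of >= min_len bases.

    Only mononucleotide repeats are handled; min_len is a hypothetical
    threshold, not one taken from the abstract.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


def ssr_at_share(ssrs: list[tuple[int, str, int]]) -> float:
    """Share of detected mononucleotide SSRs whose repeated base is A or T."""
    if not ssrs:
        return 0.0
    return sum(base in "AT" for _, base, _ in ssrs) / len(ssrs)


# Toy sequence (made up, not plastid data): runs of 9 A, 8 T, and 8 G.
toy = "GC" + "A" * 9 + "GC" + "T" * 8 + "C" + "G" * 8 + "CATG"

ssrs = mono_ssrs(toy)
print("SSRs found:", ssrs)
print("A+T share of SSRs:", round(ssr_at_share(ssrs), 3))
print("A+T fraction of sequence:", round(at_fraction(toy), 3))
```

On a real plastome, the same two numbers could be computed from the sequence string of a GenBank record such as NC_008788. How the predicted SSR frequency is derived from nucleotide composition is not stated in the abstract, so it is left out of this sketch.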