BACKGROUND & AIMS:
BACKGROUND: Large-scale cancer epidemiology cohorts (CECs) have successfully collected, analyzed, and shared patient-reported data for years. CECs increasingly need to make their data more findable, accessible, interoperable, and reusable (FAIR). How CECs should approach this transformation is unclear.
METHODS: The California Teachers Study (CTS) is an observational CEC of 133,477 participants followed since 1995-1996. In 2014, we began updating our data storage, management, analysis, and sharing strategy. With the San Diego Supercomputer Center, we deployed a new infrastructure based on a data warehouse to integrate and manage data and a secure and shared workspace with documentation, software, and analytic tools that facilitate collaboration and accelerate analyses.
RESULTS: Our new CTS infrastructure includes a data warehouse and data marts, which are focused subsets from the data warehouse designed for efficiency. The secure CTS workspace utilizes a remote desktop service that operates within a Health Insurance Portability and Accountability Act (HIPAA)- and Federal Information Security Management Act (FISMA)-compliant platform. Our infrastructure offers broad access to CTS data, includes statistical analysis and data visualization software and tools, flexibly manages other key data activities (e.g., cleaning, updates, and data sharing), and will continue to evolve to advance FAIR principles.
CONCLUSIONS: Our scalable infrastructure provides the security, authorization, data model, metadata, and analytic tools needed to manage, share, and analyze CTS data in ways that are consistent with the NCI's Cancer Research Data Commons Framework.
IMPACT: The CTS's implementation of new infrastructure in an ongoing CEC demonstrates how population sciences can explore and embrace new cloud-based and analytics infrastructure to accelerate cancer research and translation. See all articles in this CEBP Focus section, "Modernizing Population Science."
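The RESULTS section describes data marts as focused subsets of the data warehouse. The following is a minimal sketch of that idea only, not the CTS implementation: it uses an in-memory SQLite database, and every table name, column name, and data value (`participant`, `questionnaire_response`, `mart_smoking`, the three synthetic participants) is a hypothetical assumption introduced for illustration.

```python
import sqlite3


def build_warehouse(conn: sqlite3.Connection) -> None:
    """Create a toy 'warehouse' with two hypothetical tables and synthetic rows."""
    conn.executescript(
        """
        CREATE TABLE participant (
            participant_id  INTEGER PRIMARY KEY,
            enrollment_year INTEGER
        );
        CREATE TABLE questionnaire_response (
            participant_id INTEGER REFERENCES participant(participant_id),
            item           TEXT,
            value          TEXT
        );
        """
    )
    # Synthetic data, invented for this example only.
    conn.executemany(
        "INSERT INTO participant VALUES (?, ?)",
        [(1, 1995), (2, 1996), (3, 1995)],
    )
    conn.executemany(
        "INSERT INTO questionnaire_response VALUES (?, ?, ?)",
        [
            (1, "smoking_status", "never"),
            (1, "height_cm", "165"),
            (2, "smoking_status", "former"),
            (3, "height_cm", "170"),
        ],
    )


def build_smoking_mart(conn: sqlite3.Connection) -> None:
    """Materialize a focused subset (a 'data mart') from the warehouse tables."""
    conn.execute(
        """
        CREATE TABLE mart_smoking AS
        SELECT p.participant_id,
               p.enrollment_year,
               r.value AS smoking_status
        FROM participant AS p
        JOIN questionnaire_response AS r
          ON r.participant_id = p.participant_id
        WHERE r.item = 'smoking_status'
        """
    )


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
build_smoking_mart(conn)

# Analysts query the small, purpose-built mart instead of the full warehouse.
rows = conn.execute(
    "SELECT participant_id, enrollment_year, smoking_status "
    "FROM mart_smoking ORDER BY participant_id"
).fetchall()
print(rows)
```

Materializing the mart as its own table, rather than querying the warehouse directly, is one common way to keep analysis queries narrow and fast; a view would be an alternative when freshness matters more than speed.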