
Chuang Ma et al., Advanced Science, 2025

Published: 2025-05-27

Paper title: deepTFBS: Improving within- and Cross-Species Prediction of Transcription Factor Binding Using Deep Multi-Task and Transfer Learning

Authors: Jingjing Zhai#, Yuzhou Zhang#, Chujun Zhang, Xiaotong Yin, Minggui Song, Chenglong Tang, Pengjun Ding, Zenglin Li, Chuang Ma*

Abstract:

The precise prediction of transcription factor binding sites (TFBSs) is crucial in understanding gene regulation. In this study, deepTFBS, a comprehensive deep learning (DL) framework that builds a robust DNA language model of TF binding grammar for accurately predicting TFBSs within and across plant species is presented. Taking advantage of multi-task DL and transfer learning, deepTFBS is capable of leveraging the knowledge learned from large-scale TF binding profiles to enhance the prediction of TFBSs under small-sample training and cross-species prediction tasks. When tested using available information on 359 Arabidopsis TFs, deepTFBS outperformed previously described prediction strategies, including position weight matrix, deepSEA and DanQ, with a 244.49%, 49.15%, and 23.32% improvement of the area under the precision-recall curve (PRAUC), respectively. Further cross-species prediction of TFBS in wheat showed that deepTFBS yielded a significant PRAUC improvement of 30.6% over these three baseline models. deepTFBS can also utilize information from gene conservation and binding motifs, enabling efficient TFBS prediction in species where experimental data availability is limited. A case study, focusing on the WUSCHEL (WUS) transcription factor, illustrated the potential use of deepTFBS in cross-species applications, in our example between Arabidopsis and wheat. deepTFBS is publicly available at https://github.com/cma2015/deepTFBS.

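Intelligent mining of omics big data with deep learning is one of the major research topics in bioinformatics today. To address the problem of accurately predicting transcription factor-DNA binding, this study proposes a new deep learning framework, deepTFBS, which predicts transcription factor binding sites (TFBSs) by combining large-scale transcription factor binding site data with multi-task learning and transfer learning. deepTFBS effectively extracts and transfers the patterns contained in large-scale transcription factor binding data, and it performs well on challenging tasks such as small-sample training and cross-species prediction. In an evaluation on 359 Arabidopsis transcription factors, deepTFBS clearly outperformed the traditional position weight matrix (PWM) method and the deep learning models deepSEA and DanQ, improving the area under the precision-recall curve (PRAUC) by 244.49%, 49.15%, and 23.32%, respectively. In the cross-species TFBS prediction task in wheat, deepTFBS improved PRAUC by 30.6% over the baseline models. Using the transcription factor WUSCHEL (WUS) as a case study, experimental validation further confirmed the potential of deepTFBS for cross-species applications. Compared with large pre-trained models such as AgroNT and PDLLMs, deepTFBS is more lightweight and faster, which makes it better suited to large-scale genomic analysis for a specific task such as TFBS prediction. The model is publicly available on GitHub (https://github.com/cma2015/deepTFBS) for researchers to use and extend.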

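The cross-species prediction capability of deepTFBS gives crops that lack experimental data (such as maize and wheat) a new tool for regulatory network analysis, screening of regulatory variants, and de novo design of regulatory elements. Because it is open source and works across species, it is particularly well suited to crop research with limited resources, and it is expected to accelerate precision crop breeding and functional genomics.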
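For readers unfamiliar with the PRAUC metric behind the reported percentages, the sketch below shows one common way to estimate the area under the precision-recall curve (average precision) and to express a gain as a relative improvement. It is an illustration only, not the deepTFBS evaluation code: the function names `average_precision` and `relative_improvement` are hypothetical, the labels and scores are made-up toy values, and the paper may compute PRAUC with a different estimator.

```python
def average_precision(labels, scores):
    """Estimate PRAUC as average precision.

    labels: 0/1 ground truth (1 = bound site), scores: predicted scores.
    Tied scores keep their input order (sorted() is stable).
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank predictions from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += hits / rank
    return precision_sum / total_pos


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Toy data (assumed for illustration, not taken from the paper).
toy_labels = [1, 0, 1, 0, 0, 1]
model_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
baseline_scores = [0.5, 0.9, 0.4, 0.8, 0.3, 0.2]

model_ap = average_precision(toy_labels, model_scores)
baseline_ap = average_precision(toy_labels, baseline_scores)
print(f"model PRAUC estimate:    {model_ap:.3f}")
print(f"baseline PRAUC estimate: {baseline_ap:.3f}")
print(f"relative improvement:    {relative_improvement(model_ap, baseline_ap):.1f}%")
```

A relative improvement above 100%, such as the 244.49% reported against the PWM baseline, simply means the new PRAUC is more than twice the baseline value.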

Original article:

https://advanced.onlinelibrary.wiley.com/doi/10.1002/advs.202503135