2026, No. 04, Vol. 10, pp. 49-54+59
GTransFusion: A Fusion Method Combining Transformer-Based Multimodal Representation Learning and Graph Structure Alignment
DOI: 10.19850/j.cnki.2096-4706.2026.04.009
Release date: 2026-02-25
Publication date: 2026-02-25
Abstract:

With the emergence of multi-source medical data such as high-throughput genome sequencing and high-resolution digital pathology images, multimodal biological modeling has become key to artificial intelligence-assisted pathological diagnosis. This study proposes a new multimodal representation learning method, GTransFusion, which jointly analyzes pathology whole slide images and omics data to improve diagnostic accuracy across multiple cancer types. A Transformer-based joint representation learning module maps the data from the different modalities into a unified sequence representation, explicitly models modality type encodings, and uses the self-attention mechanism to weight the modalities dynamically. In parallel, the method constructs a cross-modal feature alignment graph and applies a graph neural network to capture inter-modality associations and shared information; the result feeds back into the Transformer representation learning to achieve cross-modal feature alignment and relationship modeling. Experiments on multiple tumor datasets show that the proposed method significantly outperforms the comparison methods on survival prediction metrics, which verifies the effectiveness of multimodal joint representation and graph structure alignment.
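The two ideas in the abstract (modality type encoding with self-attention, and a cross-modal alignment graph) can be illustrated with a small pure-Python sketch. It is not the authors' implementation. Every name in it (`add_type_encoding`, `cross_modal_edges`, the `"wsi"`/`"omics"` labels, the 0.5 similarity threshold) is a hypothetical choice made for illustration. The attention uses identity query/key/value projections, the graph step is a plain mean over neighbors rather than a learned graph neural network, and the feedback from the graph into the Transformer is not modeled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def add_type_encoding(tokens, modality_ids, type_embeddings):
    """Add a per-modality type vector to every token (assumed additive, like segment embeddings)."""
    return [[t + e for t, e in zip(tok, type_embeddings[m])]
            for tok, m in zip(tokens, modality_ids)]

def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification)."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = [softmax([dot(q, k) / scale for k in tokens]) for q in tokens]
    outputs = [[sum(w * v[j] for w, v in zip(row, tokens)) for j in range(dim)]
               for row in weights]
    return outputs, weights

def modality_attention_mass(weights, modality_ids):
    """Average attention each modality receives: one possible reading of 'dynamic weighting'."""
    mass = {}
    for row in weights:
        for w, m in zip(row, modality_ids):
            mass[m] = mass.get(m, 0.0) + w / len(weights)
    return mass

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)

def cross_modal_edges(tokens, modality_ids, threshold=0.5):
    """Link tokens from different modalities whose cosine similarity exceeds the threshold."""
    n = len(tokens)
    return [(i, j) for i in range(n) for j in range(n)
            if modality_ids[i] != modality_ids[j]
            and cosine(tokens[i], tokens[j]) > threshold]

def aggregate(tokens, edges):
    """One round of mean-neighbor message passing, mixed 50/50 with the token itself."""
    out = []
    for i, tok in enumerate(tokens):
        nbrs = [tokens[j] for (src, j) in edges if src == i]
        if not nbrs:
            out.append(list(tok))
            continue
        mean = [sum(col) / len(nbrs) for col in zip(*nbrs)]
        out.append([0.5 * t + 0.5 * m for t, m in zip(tok, mean)])
    return out

# Toy input: two image-patch tokens and one omics token, already in a shared 4-d space.
tokens = [[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0], [0.8, 0.0, 0.2, 0.0]]
modality_ids = ["wsi", "wsi", "omics"]
type_embeddings = {"wsi": [0.0, 0.0, 0.0, 0.1], "omics": [0.0, 0.0, 0.0, -0.1]}

x = add_type_encoding(tokens, modality_ids, type_embeddings)
fused, attn = self_attention(x)
mass = modality_attention_mass(attn, modality_ids)
edges = cross_modal_edges(fused, modality_ids)
aligned = aggregate(fused, edges)
```

In this toy setup, `mass` reports how much attention each modality receives on average, and `edges` contains only links between image and omics tokens, so the aggregation step mixes information strictly across modalities.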


Basic information:

CLC number: R318; TP18; TP391.41

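Citation:

[1] ZHANG Xian, PANG Hui, LIU Jiajun. GTransFusion: A Fusion Method Combining Transformer-Based Multimodal Representation Learning and Graph Structure Alignment [J]. Modern Information Technology, 2026, 10(04): 49-54+59. DOI: 10.19850/j.cnki.2096-4706.2026.04.009.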
