Modern Information Technology

2026, Vol. 10, No. 04, pp. 49-54+59

GTransFusion: A Fusion Method Combining Transformer-Based Multimodal Representation Learning with Graph Structure Alignment

ZHANG Xian, PANG Hui, LIU Jiajun

1. School of Information Engineering, Hebei University of Architecture

DOI: 10.19850/j.cnki.2096-4706.2026.04.009

Published online: 2026-02-25

Publication date: 2026-02-25

Abstract:

With the emergence of multi-source medical data such as high-throughput genome sequencing and high-resolution digital pathology images, multimodal biological modeling has become key to artificial-intelligence-assisted pathological diagnosis. This study proposes GTransFusion, a new multimodal representation learning method that jointly analyzes pathology whole-slide images (WSIs) and omics data to improve diagnostic accuracy across multiple cancer types. A Transformer-based joint representation learning module maps the data of every modality into a unified token sequence, explicitly encodes the modality type of each token, and uses self-attention to weight the modalities dynamically. In parallel, the method constructs a cross-modal feature alignment graph and applies a Graph Neural Network (GNN) to capture associations and shared information across modalities; the graph output is fed back into the Transformer representation learning to achieve cross-modal feature alignment and relationship modeling. Experiments on multiple tumor datasets show that the proposed method significantly outperforms the compared methods on survival prediction metrics, confirming the effectiveness of multimodal joint representation and graph structure alignment.

Keywords: multimodal fusion; Transformer; heterogeneous graph; joint representation learning
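The abstract names two mechanisms (a unified token sequence with modality-type encoding and self-attention, plus a cross-modal graph whose output is fed back into the token representations) but gives no implementation details. The following is a minimal pure-Python sketch of how such a pipeline could look, under stated assumptions. All function names (`build_sequence`, `self_attention`, `modality_weights`, `graph_align`), the toy 3-dimensional features, the fixed modality-type vectors, the identity Q/K/V projections, the cosine-similarity threshold, and the mean-aggregation message passing are hypothetical choices for illustration, not the authors' GTransFusion implementation; a real model would use learned projections, multiple heads and layers, and a trained GNN (for example, a graph attention network).

```python
import math

# Illustrative sketch only: every function name, dimension, embedding value
# and threshold here is a hypothetical assumption, not GTransFusion's code.

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def build_sequence(wsi_tokens, omics_tokens, type_emb):
    """Concatenate both modalities into one sequence and add a modality-type
    embedding to every token (fixed here; it would be learned in practice)."""
    seq, types = [], []
    for name, tokens in (("wsi", wsi_tokens), ("omics", omics_tokens)):
        for tok in tokens:
            seq.append(add(tok, type_emb[name]))
            types.append(name)
    return seq, types

def self_attention(seq):
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections (a simplification of a real Transformer layer)."""
    d = len(seq[0])
    weights = [softmax([dot(q, k) / math.sqrt(d) for k in seq]) for q in seq]
    out = [[sum(w * v[j] for w, v in zip(row, seq)) for j in range(d)]
           for row in weights]
    return out, weights

def modality_weights(weights, types):
    """Average attention mass received by the tokens of each modality; one
    way to read a per-sample modality weighting off the attention matrix."""
    totals = {}
    for row in weights:
        for w, t in zip(row, types):
            totals[t] = totals.get(t, 0.0) + w
    return {t: v / len(weights) for t, v in totals.items()}

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def graph_align(seq, types, threshold=0.5):
    """Link tokens from different modalities whose cosine similarity exceeds
    `threshold`, then add each node's neighbour mean back as a residual
    (one round of mean-aggregation message passing)."""
    d = len(seq[0])
    out = []
    for i, x in enumerate(seq):
        nbrs = [y for j, y in enumerate(seq)
                if types[j] != types[i] and cosine(x, y) > threshold]
        if not nbrs:
            out.append(list(x))  # isolated node: representation unchanged
            continue
        mean = [sum(y[k] for y in nbrs) / len(nbrs) for k in range(d)]
        out.append(add(x, mean))
    return out

if __name__ == "__main__":
    type_emb = {"wsi": [0.1, 0.0, 0.0], "omics": [0.0, 0.0, 0.1]}
    wsi = [[1.0, 0.2, 0.0], [0.9, 0.1, 0.1]]  # toy WSI patch features
    omics = [[0.8, 0.3, 0.0]]                 # toy omics feature token
    seq, types = build_sequence(wsi, omics, type_emb)
    hidden, attn = self_attention(seq)
    print("modality weights:", modality_weights(attn, types))
    print("aligned tokens:", graph_align(hidden, types))
```

Restricting graph edges to token pairs from different modalities keeps the graph focused on inter-modal association, which is what the abstract says the graph is meant to capture; adding the neighbour mean back as a residual is one simple way to "feed back" graph information into the sequence representation before further Transformer layers.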

Basic information:

CLC numbers: R318; TP18; TP391.41

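Citation:

ZHANG Xian, PANG Hui, LIU Jiajun. GTransFusion: A Fusion Method Combining Transformer-Based Multimodal Representation Learning with Graph Structure Alignment[J]. Modern Information Technology, 2026, 10(04): 49-54+59. DOI: 10.19850/j.cnki.2096-4706.2026.04.009.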
