About 614,000 results
Open links in new tab
  1. Multimodal learning - Wikipedia

    Multimodal learning is a type of deep learning that integrates and processes multiple types of data, referred to as modalities, such as text, audio, images, or video.

  2. Multimodal Deep Learning: Definition, Examples, Applications

    Multimodal Deep Learning is a machine learning subfield that aims to train AI models to process and find relationships between different types of data (modalities)—typically, images, video, …

  3. Dual-extraction modeling: A multi-modal deep-learning …

    Sep 9, 2024 · In this study, we present a dual-extraction modeling (DEM) architecture that incorporates a multi-head self-attention mechanism and a fully connected feedforward neural …

  4. In this work, we propose a novel application of deep networks to learn features over multiple modalities. We present a series of tasks for multimodal learning and show how to train deep …

  5. [2301.04856] Multimodal Deep Learning - arXiv.org

    Jan 12, 2023 · This book is the result of a seminar in which we reviewed multimodal approaches and attempted to create a solid overview of the field, starting with the current state-of-the-art …

  6. Multi-modal data clustering using deep learning: A systematic …

    Nov 28, 2024 · Multi-modal clustering represents a formidable challenge in the domain of unsupervised learning. The objective of multi-modal clustering is to categorize data collected …

  7. Multi-model Selection and Computation Resource Allocation …

    6 days ago · However, most existing CSS algorithms primarily focus on increasing the cooperative detection accuracy, while neglecting the computational complexity or sensing latency. …

  8. Defying Multi-Model Forgetting in One-Shot Neural Architecture …

    Feb 11, 2025 · We have theoretically and experimentally proved the effectiveness of the proposed paradigm in overcoming the multi-model forgetting. Besides, we apply the proposed paradigm …

  9. Robust image classification with multi-modal large language models

    In this second phase, Multi-Shield uses the CLIP model as a zero-shot vision-language classifier [17]. It compares the visual representation of the input image with class description prompts to …

  10. Investigation of Multiple Hybrid Deep Learning Models for …

    May 2, 2025 · To optimize the model performance in terms of accuracy and model complexity, the hyperparameters of each algorithm are selected using the Bayesian optimization algorithm. …

Refresh