About 149,000 results
  1. Architecture and Working of Transformers in Deep Learning

    Feb 27, 2025 · The encoder consists of multiple layers, and each layer is composed of two main sub-layers: Self-Attention Mechanism: This sub-layer allows the encoder to weigh the importance of different parts of the input sequence differently, capturing dependencies regardless of their distance within the sequence.
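The self-attention mechanism described in this snippet can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not code from the article; the sequence length, model width, and random weights are all assumptions chosen for the example.

```python
import numpy as np

def softmax(x):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise importance, any distance apart
    weights = softmax(scores)                # each row sums to 1
    return weights @ V                       # every position mixes the whole sequence

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(5, d))                  # 5 tokens, d_model = 8 (illustrative)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (5, 8)
```

Because every position attends to every other position in one step, distant tokens influence each other just as directly as adjacent ones.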

  2. Transformer (deep learning architecture) - Wikipedia

    Its architecture consists of two parts. The encoder is an LSTM that takes in a sequence of tokens and turns it into a vector. The decoder is another LSTM that converts the vector into a sequence of tokens. Similarly, another 130M-parameter model used gated recurrent units (GRU) instead of …

  3. How Transformers Work: A Detailed Exploration of Transformer

    Jan 9, 2024 · Transformers are a current state-of-the-art NLP architecture and are considered the evolution of the encoder-decoder architecture. However, while the earlier encoder-decoder architecture relies mainly on Recurrent Neural Networks (RNNs) to extract sequential information, Transformers completely lack this recurrence. So, how do they do it?

  4. 11.7. The Transformer Architecture — Dive into Deep Learning 1.

    At a high level, the Transformer encoder is a stack of multiple identical layers, where each layer has two sublayers (either is denoted as \(\textrm{sublayer}\)). The first is a multi-head self-attention pooling and the second is a positionwise feed-forward network.
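The two-sublayer structure this snippet describes can be sketched as follows. This is a simplified NumPy illustration under stated assumptions: a single projection-free attention head stands in for multi-head attention, the FFN uses ReLU, and the residual-then-layer-norm wrapping follows the original post-norm design; all dimensions and weights are made up for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalize each position's feature vector to zero mean, unit variance.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def encoder_layer(X, W1, b1, W2, b2):
    """One encoder layer: sublayer 1 is (single-head, projection-free)
    self-attention, sublayer 2 a positionwise feed-forward network; each
    sublayer is wrapped in a residual connection plus layer normalization."""
    attn = softmax(X @ X.T / np.sqrt(X.shape[-1])) @ X   # self-attention pooling
    X = layer_norm(X + attn)                             # sublayer 1 + residual
    ffn = np.maximum(0.0, X @ W1 + b1) @ W2 + b2         # same FFN at every position
    return layer_norm(X + ffn)                           # sublayer 2 + residual

rng = np.random.default_rng(0)
d, hidden = 8, 32
X = rng.normal(size=(5, d))
W1, b1 = rng.normal(size=(d, hidden)), np.zeros(hidden)
W2, b2 = rng.normal(size=(hidden, d)), np.zeros(d)
Y = encoder_layer(X, W1, b1, W2, b2)
print(Y.shape)  # (5, 8): output keeps the input's sequence length and width
```

Because each layer maps a (seq_len, d_model) array to another of the same shape, identical layers can be stacked freely.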

  5. A Deep Dive into Transformers Architecture - Medium

    Dec 3, 2024 · At its core, the Transformer architecture consists of a stack of encoder layers and decoder layers. To avoid confusion, we will refer to individual layers as Encoder or Decoder and use...

  6. Comparing Different Layers in a Transformer Architecture

    On the flip side lies the decoder section, which serves a distinct purpose in the Transformer architecture. While it also comprises self-attention and feedforward components, the decoder integrates an additional layer—the encoder-decoder attention mechanism.
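The encoder-decoder attention this snippet mentions differs from self-attention only in where the inputs come from: queries are computed from the decoder's states, while keys and values come from the encoder's output. A minimal NumPy sketch, with all shapes and weights invented for illustration:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def cross_attention(dec, enc, Wq, Wk, Wv):
    """Encoder-decoder attention: queries from the decoder,
    keys and values from the encoder's output."""
    Q, K, V = dec @ Wq, enc @ Wk, enc @ Wv
    w = softmax(Q @ K.T / np.sqrt(K.shape[-1]))  # shape (dec_len, enc_len)
    return w @ V                                 # each target position mixes source states

rng = np.random.default_rng(0)
d = 8
enc = rng.normal(size=(6, d))   # encoder output: 6 source tokens
dec = rng.normal(size=(4, d))   # decoder states: 4 target tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
ctx = cross_attention(dec, enc, Wq, Wk, Wv)
print(ctx.shape)  # (4, 8): one context vector per decoder position
```

Note the attention matrix is rectangular: source and target sequences may have different lengths, so the decoder can attend over the whole input regardless of output length.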

  7. Understanding Transformer Architecture: The Backbone of …

    2 days ago · 1. Encoder: Understanding the Input. The encoder is responsible for taking the input sequence (e.g., a sentence) and transforming it into a rich, contextualized representation. It typically consists of a stack of identical layers. Each encoder layer has two main sub-layers: Multi-Head Self-Attention Mechanism: This is the heart of the Transformer.
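The "multi-head" part of the mechanism named above amounts to splitting the model dimension into independent subspaces, one per head. A small illustrative sketch (the token count, width, and head count are assumptions, not values from the article):

```python
import numpy as np

def split_heads(X, n_heads):
    """Reshape (seq_len, d_model) into (n_heads, seq_len, d_head),
    so each head attends within its own d_model / n_heads subspace."""
    seq, d = X.shape
    return X.reshape(seq, n_heads, d // n_heads).transpose(1, 0, 2)

X = np.arange(24.0).reshape(3, 8)   # 3 tokens, d_model = 8
H = split_heads(X, n_heads=2)
print(H.shape)  # (2, 3, 4): 2 heads, each seeing a 4-dim slice of every token
```

After attention runs per head, the heads are transposed back and concatenated to restore the original (seq_len, d_model) shape.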

  8. A Gentle Introduction to Attention and Transformer Models

    Mar 29, 2025 · The transformer architecture is a type of neural network that is designed to process sequential data, such as text. A signature of transformer models is the use of attention mechanisms to process the input sequence. The transformer architecture is composed of an encoder and a decoder. Each is a stack of identical layers.

  9. The Transformer Model - MachineLearningMastery.com

    Jan 6, 2023 · In a nutshell, the task of the encoder, on the left half of the Transformer architecture, is to map an input sequence to a sequence of continuous representations, which is then fed into a decoder.

  10. Transformer Architecture with Examples

    Mar 15, 2025 · The Transformer, introduced in "Attention is All You Need" (Vaswani et al., 2017), consists of an encoder and a decoder, both built from stacked layers. It’s designed for sequence-to-sequence tasks (e.g., translation), but I’ll describe the general architecture, noting dimensions at …