News

The original transformer was designed as a sequence-to-sequence (seq2seq) model for machine translation, although seq2seq models are not limited to translation tasks.
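As a minimal sketch of what "seq2seq" means architecturally, the snippet below uses PyTorch's built-in `nn.Transformer` (an encoder-decoder stack, as in the original design) with toy dimensions chosen here for illustration; it is not a full translation pipeline, and the embedding/tokenization steps are omitted.

```python
import torch
import torch.nn as nn

# Toy encoder-decoder transformer; all sizes below are arbitrary
# illustrative choices, not the original paper's hyperparameters.
model = nn.Transformer(
    d_model=32, nhead=4,
    num_encoder_layers=2, num_decoder_layers=2,
    dim_feedforward=64, batch_first=True,
)

src = torch.randn(1, 10, 32)  # source sequence: (batch, src_len, d_model)
tgt = torch.randn(1, 7, 32)   # target sequence: (batch, tgt_len, d_model)

# The encoder reads the source; the decoder produces one output
# vector per target position, attending to the encoder's output.
out = model(src, tgt)
print(out.shape)  # torch.Size([1, 7, 32])
```

The key property is that the output length follows the target sequence, not the source: the encoder and decoder handle sequences of different lengths, which is what makes the architecture suitable for translation and other seq2seq tasks.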