Attention Is All You Need (2017): The seminal paper by Vaswani et al. that introduced the transformer architecture, replacing traditional recurrent networks with the self-attention mechanism.
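The core idea of the paper, self-attention, can be sketched in a few lines of NumPy. This is a minimal single-head illustration under simplified assumptions (no multi-head split, masking, or learned biases); all variable names here are illustrative, not from the paper's reference implementation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: input sequence of shape (seq_len, d_model)
    w_q, w_k, w_v: projection matrices of shape (d_model, d_model)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v          # project inputs to queries, keys, values
    d_k = k.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)              # pairwise similarities, scaled by sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys: rows sum to 1
    return weights @ v                           # each output is a weighted sum of values

# Toy example with random weights (illustrative only)
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Unlike a recurrent network, every position attends to every other position in one step, which is what makes the computation parallelizable across the sequence.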

[2311.17633] Introduction to Transformers: an NLP Perspective

For a broader introduction to the field, these resources are also highly recommended:

: A practically oriented survey of open-access tools and real-world implementations, focused on settings where text is the primary modality.

: A 2023 review that demystifies the transformer by breaking the architecture down into its core components for beginners.

An essential paper for anyone starting out is "Introduction to Transformers: an NLP Perspective" by Tong Xiao and Jingbo Zhu: a comprehensive 119-page guide that bridges the gap between basic concepts and recent advanced techniques.