Input Configuration
Attention Heads
Learning Modes
Token Information
Attention Heatmap - Head 1
How to read: Rows represent query tokens, columns represent key tokens. Darker colors indicate stronger attention weights.
Hover over a cell to see how strongly its query token (row) attends to its key token (column).
Attention Analysis & Insights
Head 1 - Top Attention Patterns
💡 Key Insights
Understanding Attention Mechanisms
✨ What is Attention?
Attention mechanisms allow neural networks to focus on the most relevant parts of the input when processing each element. Instead of treating all inputs equally, attention assigns weights based on relevance.
🔍 How It Works
Each token produces three vectors: a Query (what it's looking for), a Key (what it offers), and a Value (the information it carries). Attention weights are computed by taking dot products between queries and keys, then normalizing the scores with a softmax so they sum to 1.
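The query/key/value comparison above can be sketched in a few lines of NumPy. This is a minimal illustration, not the visualizer's actual code: the projection matrices `Wq`, `Wk`, `Wv` and the toy 3-token input are made up for the example.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compare queries with keys, softmax-normalize, then mix values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n_q, n_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
    return weights @ V, weights

# Toy example: 3 tokens with 4-dimensional embeddings (hypothetical values).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 4))
Wq, Wk, Wv = (rng.normal(size=(4, 4)) for _ in range(3))
out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(w.sum(axis=-1))  # each row of the attention matrix sums to 1
```

The `weights` matrix returned here is exactly what the heatmap above visualizes: row *i*, column *j* is how much query token *i* attends to key token *j*.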
🎯 Self vs Cross Attention
- Self-Attention: Tokens attend to other tokens in the same sequence
- Cross-Attention: Tokens attend to tokens from a different sequence
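The only difference between the two modes is where the queries, keys, and values come from, which a short sketch makes concrete. The `src`/`tgt` sequences and their sizes here are hypothetical stand-ins for, say, encoder and decoder states.

```python
import numpy as np

def attend(Q, K, V):
    # softmax(Q K^T / sqrt(d)) V -- standard scaled dot-product attention
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

rng = np.random.default_rng(1)
src = rng.normal(size=(5, 8))  # e.g. encoder states: 5 tokens
tgt = rng.normal(size=(3, 8))  # e.g. decoder states: 3 tokens

self_out = attend(tgt, tgt, tgt)   # self-attention: Q, K, V from the same sequence
cross_out = attend(tgt, src, src)  # cross-attention: Q from tgt, K and V from src
print(self_out.shape, cross_out.shape)  # (3, 8) (3, 8)
```

Note the output length always matches the number of queries; cross-attention just lets those queries look at a different sequence's keys and values.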
🧠 Multi-Head Attention
Multiple attention heads run in parallel, each learning to focus on different types of relationships. This allows the model to capture various linguistic patterns simultaneously.
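A rough sketch of the parallel-heads idea: split the embedding into per-head slices, run attention independently in each, and concatenate the results. Real implementations use learned per-head projection matrices rather than raw slicing; this simplification (and the 6-token, 16-dimensional input) is only for illustration.

```python
import numpy as np

def multi_head_attention(X, n_heads=4):
    """Run attention independently on n_heads slices of X, then concatenate."""
    n, d = X.shape
    assert d % n_heads == 0, "embedding dim must divide evenly across heads"
    d_h = d // n_heads
    heads = []
    for h in range(n_heads):
        sl = X[:, h * d_h:(h + 1) * d_h]        # this head's slice: (n, d_h)
        s = sl @ sl.T / np.sqrt(d_h)            # per-head attention scores
        w = np.exp(s - s.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)      # softmax per query row
        heads.append(w @ sl)                    # this head's output: (n, d_h)
    return np.concatenate(heads, axis=-1)       # back to (n, d)

X = np.random.default_rng(2).normal(size=(6, 16))
print(multi_head_attention(X).shape)  # (6, 16)
```

Because each head computes its own attention matrix, switching heads in the visualizer above shows a genuinely different weighting of the same tokens.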
📚 Try These Experiments
Switch between different heads to see how they focus on different relationships in the same sentence.
Try different sentence lengths and observe how attention patterns change with more context.
Hover over individual tokens to highlight the words they attend to most strongly.
Use the step-by-step mode to understand the mathematical computation behind attention.