Attention Mechanism Explorer

Interactive demonstration of attention weights and patterns


Token Information

  • The — pos 0
  • cat — pos 1
  • sat — pos 2
  • on — pos 3
  • the — pos 4
  • mat — pos 5

Attention Heatmap - Head 1

6 × 6 matrix

How to read: Rows represent query tokens, columns represent key tokens. Darker colors indicate stronger attention weights.

Hover over cells to see which tokens this query token is attending to most strongly.

Attention Analysis & Insights

  • Tokens: 6
  • Attention Heads: 8
  • Max Weight: 0.993
  • Avg Weight: 0.861

Head 1 - Top Attention Patterns

  • cat → sat: 0.993
  • sat → cat: 0.993
  • the → mat: 0.992
  • mat → the: 0.992
  • sat → on: 0.992

💡 Key Insights

Multi-Head Benefits: Different attention heads can specialize in different types of relationships (syntactic, semantic, positional).
Attention Patterns: Self-attention allows each token to look at all other tokens, enabling long-range dependencies.
Computational Efficiency: Attention can be computed in parallel for all positions, unlike recurrent mechanisms.

Understanding Attention Mechanisms

✨ What is Attention?

Attention mechanisms allow neural networks to focus on the most relevant parts of the input when processing each element. Instead of treating all inputs equally, attention assigns weights based on relevance.

🔍 How It Works

Each token produces three vectors: a Query (what it's looking for), a Key (what it offers), and a Value (the information it carries). Attention weights are computed by taking the dot product of each query with every key, scaling, and normalizing with a softmax so that each query's weights sum to 1.
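A minimal NumPy sketch of this computation. The projection matrices here are random stand-ins for learned weights, and the 6-token input mirrors the demo sentence; names like `self_attention` are illustrative, not part of the demo's code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings.
    Returns the attended output and the (seq_len, seq_len) weight matrix.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv       # project into query/key/value spaces
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # compare every query with every key
    weights = softmax(scores, axis=-1)     # each row is one query's distribution
    return weights @ V, weights            # weighted mix of values + the heatmap

rng = np.random.default_rng(0)
d_model = 8
X = rng.normal(size=(6, d_model))          # 6 tokens: "The cat sat on the mat"
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = self_attention(X, Wq, Wk, Wv)
print(weights.shape)                       # the 6 × 6 matrix shown in the heatmap
```

The `weights` array is exactly what the heatmap visualizes: row *i* shows how strongly token *i* (as a query) attends to every token (as keys).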

🎯 Self vs Cross Attention

  • Self-Attention: Tokens attend to other tokens in the same sequence
  • Cross-Attention: Tokens attend to tokens from a different sequence

🧠 Multi-Head Attention

Multiple attention heads run in parallel, each learning to focus on different types of relationships. This allows the model to capture various linguistic patterns simultaneously.
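The standard multi-head formulation runs several smaller attention computations and concatenates their outputs. A hedged sketch, again with random matrices in place of learned parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, n_heads, rng):
    """Illustrative multi-head self-attention with random projections."""
    seq_len, d_model = X.shape
    d_head = d_model // n_heads            # each head works in a smaller subspace
    head_outputs = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)
        head_outputs.append(weights @ V)   # each head: (seq_len, d_head)
    concat = np.concatenate(head_outputs, axis=-1)  # back to (seq_len, d_model)
    Wo = rng.normal(size=(d_model, d_model))        # final output projection
    return concat @ Wo

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 64))               # 6 tokens, d_model = 64
out = multi_head_attention(X, n_heads=8, rng=rng)
print(out.shape)                           # (6, 64)
```

Because each head has its own projections, each learns its own notion of "relevance" during training, which is why switching heads in the explorer shows different patterns for the same sentence.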

📚 Try These Experiments

1. Compare Attention Heads:

Switch between different heads to see how they focus on different relationships in the same sentence.

2. Sentence Length Effect:

Try different sentence lengths and observe how attention patterns change with more context.

3. Token Relationships:

Hover over tokens to see which words they attend to most strongly.

4. Mathematical Foundation:

Use the step-by-step mode to understand the mathematical computation behind attention.
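The step-by-step computation can also be traced by hand with tiny numbers. This sketch follows one query against three keys; the vectors are made up for illustration:

```python
import numpy as np

# One query compared against three keys, with hand-checkable numbers.
q = np.array([1.0, 0.0])                   # query vector (d_k = 2)
K = np.array([[1.0, 0.0],                  # key 0 — matches the query exactly
              [0.0, 1.0],                  # key 1 — orthogonal to the query
              [1.0, 1.0]])                 # key 2 — partial match
V = np.array([10.0, 20.0, 30.0])           # one scalar value per key

scores = K @ q                             # dot products: [1, 0, 1]
scaled = scores / np.sqrt(q.shape[0])      # divide by sqrt(d_k) = sqrt(2)
weights = np.exp(scaled) / np.exp(scaled).sum()  # softmax over the keys
output = weights @ V                       # weighted average of the values
print(weights.round(3))                    # keys 0 and 2 share most of the weight
print(round(output, 2))                    # 20.0
```

Keys 0 and 2 get equal weight (their scores tie), so the output lands at the weighted average of all three values, exactly 20.0 here; this is the same score → scale → softmax → weighted-sum pipeline the step-by-step mode walks through.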