AI/ML Glossary

Key terms and concepts explained in plain language

Attention

A mechanism that lets models focus on specific parts of the input when making predictions. Think of it like highlighting the most important words in a sentence to understand its meaning.

Backpropagation

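The algorithm neural networks use to work out how much each weight contributed to the error. It applies the chain rule backwards through the network, layer by layer, producing the gradients that an optimizer such as gradient descent then uses to adjust the weights.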
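As a rough illustration, here is a minimal sketch of backpropagation on a made-up two-weight network, y = w2 * tanh(w1 * x), with a squared-error loss. The network shape, the function names, and the numbers are assumptions chosen for the example, not a standard API.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # network output
    return h, y

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return (y - target) ** 2

def backward(w1, w2, x, target):
    """Apply the chain rule from the loss back to each weight."""
    h, y = forward(w1, w2, x)
    dloss_dy = 2.0 * (y - target)            # derivative of (y - target)^2
    grad_w2 = dloss_dy * h                   # output-layer weight
    dloss_dh = dloss_dy * w2                 # error passed back to the hidden unit
    grad_w1 = dloss_dh * (1.0 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
    return grad_w1, grad_w2

grad_w1, grad_w2 = backward(w1=0.5, w2=-0.3, x=1.2, target=0.7)
```

The two gradients say how the loss would change if each weight were nudged, which is exactly what an optimizer needs to update them.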

BERT

Bidirectional Encoder Representations from Transformers. A model that reads text in both directions at once, making it great at understanding context and meaning.

BPE

Byte Pair Encoding, a popular tokenization method that breaks text into subword units. Starting from individual characters or bytes, it repeatedly finds the most frequent adjacent pair of symbols and merges it into a single new token.
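A single merge step can be sketched as follows. The tiny word-frequency table and the function names are made up for illustration; real tokenizers repeat this step thousands of times and handle many details this sketch leaves out.

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of the pair with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged

# Hypothetical corpus: each word split into characters, with its count.
corpus = {tuple("hug"): 10, tuple("pug"): 5, tuple("pun"): 12,
          tuple("bun"): 4, tuple("hugs"): 5}

best = most_frequent_pair(corpus)   # ('u', 'g'), seen 20 times
corpus = merge_pair(corpus, best)   # "hug" is now ('h', 'ug')
```

Running the two functions in a loop grows the vocabulary one merged token at a time.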

Chain-of-Thought

A prompting technique where you ask the model to show its reasoning step-by-step. This often leads to better answers on complex problems.

CNN

Convolutional Neural Network, a type of model designed for grid-like data such as images. It uses layers that slide small filters across the image to detect patterns like edges and shapes.
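To make the scanning idea concrete, here is a minimal sketch of one filter sliding over a tiny image. The image, the filter values, and the function name are assumptions for illustration. As in most deep learning libraries, the filter is applied without flipping it.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and a stride of 1."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows) for j in range(k_cols))
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]

# A dark left half and a bright right half: one vertical edge in the middle.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 1]]  # responds where brightness increases from left to right

feature_map = conv2d(image, kernel)
# [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

The output is nonzero only where the edge sits. In a real CNN the filter values are learned rather than written by hand.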

Decoder

The part of a transformer that generates output, one token at a time. In encoder-decoder models it draws on the encoder's representation of the input; decoder-only models such as GPT work from the preceding tokens alone.

Embedding

A way to represent words or tokens as vectors of numbers. Similar words end up with similar vectors, capturing meaning in mathematical form.
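One common way to compare embeddings is cosine similarity. The sketch below uses made-up three-dimensional vectors; real embeddings are learned and usually have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return a value near 1 when two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked so that "cat" and "dog" are close.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
# cat_dog is much higher than cat_car
```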

Encoder

The part of a transformer that reads and processes the input. It converts text into a rich representation that captures meaning and relationships.

Epoch

One complete pass through the entire training dataset. Training a model usually takes many epochs, and the model typically improves with each one until the gains level off or it starts to overfit.

Fine-tuning

Taking a pre-trained model and training it a bit more on specific data for your task. Like teaching a chef who knows cooking basics to specialize in Italian cuisine.

GPT

Generative Pre-trained Transformer, a family of models trained to predict the next token. GPT models are decoder-only and excel at generating coherent text.

Gradient Descent

The optimization algorithm that adjusts model weights to reduce the loss. It repeatedly steps in the direction opposite the gradient, which is the downhill direction, using small steps to avoid overshooting.
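Here is a minimal sketch on a one-weight problem, minimizing (w - 3)^2. The function, the learning rate, and the step count are assumptions picked to keep the example small.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly move the weight a small step against the gradient."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# The loss (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
# w_min ends up very close to 3
```

A learning rate that is too large would make the weight bounce past the minimum instead of settling into it, which is why the steps are kept small.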

Hallucination

When a language model confidently generates information that sounds plausible but is actually incorrect or made up. A key challenge in making AI reliable.

KV-Cache

Key-Value Cache, a technique that stores the key and value vectors already computed for earlier tokens to speed up text generation. Instead of recomputing them at every step, the model computes keys and values only for the newest token and reuses the rest.
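The idea can be sketched with a toy single-head attention. To stay short, the sketch assumes each token vector serves directly as its own query, key, and value, with no learned projections; the class and function names are made up.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query over all stored keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dim = len(values[0])
    return [sum(e / total * v[d] for e, v in zip(exps, values)) for d in range(dim)]

class KVCache:
    """Keeps keys and values from earlier steps so they are never recomputed."""
    def __init__(self):
        self.keys, self.values = [], []

    def step(self, query, key, value):
        self.keys.append(key)      # only the newest token's key is added
        self.values.append(value)  # only the newest token's value is added
        return attend(query, self.keys, self.values)

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
cache = KVCache()
for token in tokens:
    last_output = cache.step(token, token, token)
```

The output for the last token matches what a full recomputation over all three tokens would give, but each step only had to process one new token.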

LLM

Large Language Model, a neural network trained on massive amounts of text. These models can understand and generate human-like text across many tasks.

LoRA

Low-Rank Adaptation, an efficient fine-tuning method that adds small trainable matrices to a frozen model. It achieves good results while updating far fewer parameters.
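A rough sketch of the arithmetic: the frozen weight matrix W stays untouched and the model uses W + B x A, where B and A are the two small trainable matrices. The helper names and the `scale` parameter are assumptions for illustration, not any library's API.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_weight(frozen, lora_b, lora_a, scale=1.0):
    """Effective weight: frozen matrix plus the scaled low-rank update."""
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(frozen, delta)]

def trainable_params(d_out, d_in, rank):
    """B has d_out x rank entries and A has rank x d_in entries."""
    return rank * (d_out + d_in)

frozen = [[1.0, 2.0], [3.0, 4.0]]  # 2 x 2 frozen weight
lora_b = [[0.0], [0.0]]            # 2 x 1, starts at zero
lora_a = [[0.5, -0.5]]             # 1 x 2

start_weight = lora_weight(frozen, lora_b, lora_a)  # equals frozen at the start
full_count = 4096 * 4096                            # 16777216
lora_count = trainable_params(4096, 4096, rank=8)   # 65536
```

For a 4096 x 4096 layer at rank 8, the adapter trains 65,536 numbers instead of about 16.8 million.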

Loss Function

A mathematical measure of how wrong the model's predictions are. Training aims to minimize this loss, gradually improving the model's accuracy.
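Two widely used loss functions can be sketched in a few lines: mean squared error for numeric predictions and cross-entropy for predicted class probabilities. The sample numbers are made up.

```python
import math

def mean_squared_error(predictions, targets):
    """Average of the squared differences between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

def cross_entropy(probabilities, true_index):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(probabilities[true_index])

mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])  # one miss of size 2
good_guess = cross_entropy([0.7, 0.2, 0.1], true_index=0)   # confident and right
bad_guess = cross_entropy([0.1, 0.2, 0.7], true_index=0)    # confident and wrong
```

In both cases a lower number means a better prediction, and the confidently wrong guess is penalized much more heavily than the right one.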

MoE

Mixture of Experts, an architecture that uses multiple specialized sub-models. A gating mechanism decides which experts to activate for each input, improving efficiency.
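The routing step can be sketched like this, assuming top-2 routing with softmax weights over the chosen experts. In a real model the gate scores come from a learned router and the experts are neural network layers; here both are hard-coded stand-ins.

```python
import math

def moe_forward(x, experts, gate_scores, top_k=2):
    """Run only the top_k highest-scoring experts and blend their outputs."""
    chosen = sorted(range(len(experts)),
                    key=lambda i: gate_scores[i], reverse=True)[:top_k]
    exps = [math.exp(gate_scores[i]) for i in chosen]
    total = sum(exps)
    output = sum((e / total) * experts[i](x) for e, i in zip(exps, chosen))
    return output, chosen

# Hypothetical experts: simple functions standing in for sub-networks.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
gate_scores = [0.1, 2.0, 1.0, -1.0]  # assumed router output for this input

output, chosen = moe_forward(3.0, experts, gate_scores)
# chosen is [1, 2]; experts 0 and 3 are never run
```

The efficiency gain comes from skipping the unchosen experts entirely, so only part of the model does work for any given input.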

Perceptron

The simplest type of artificial neuron, the building block of neural networks. It takes inputs, multiplies each by a weight, adds them up along with a bias, and outputs 1 or 0 depending on whether the sum crosses a threshold.
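A minimal sketch, assuming the classic step-function output and hand-picked weights that make the perceptron behave like a logical AND:

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, then a hard threshold at zero."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hand-picked values: the sum only exceeds zero when both inputs are 1.
and_weights = [1.0, 1.0]
and_bias = -1.5

results = {pair: perceptron(pair, and_weights, and_bias)
           for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]}
# {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
```

In practice the weights and bias are learned from examples rather than chosen by hand.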

Pre-training

The initial phase where a model learns from massive datasets, picking up general patterns and knowledge. This foundation makes fine-tuning for specific tasks much easier.

Quantization

Reducing the precision of model weights, for example from 32-bit or 16-bit floating point to 8-bit integers or lower. This makes models smaller and often faster, usually with only a small loss in accuracy.
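One simple scheme, sketched below, is symmetric 8-bit quantization with a single scale factor for the whole list of weights. This is an assumption made for the example; production methods often use per-channel or per-group scales and other refinements.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate floats from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.03, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
worst_error = max(abs(w - r) for w, r in zip(weights, restored))
# worst_error is at most half of one quantization step (scale / 2)
```

Each weight now needs one byte instead of four, at the cost of a small rounding error.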

RAG

Retrieval-Augmented Generation, a technique that gives models access to external knowledge. It retrieves relevant documents and includes them in the prompt for more accurate answers.

RLHF

Reinforcement Learning from Human Feedback, a training method that uses human preferences to guide the model. It helps align AI behavior with what humans actually want.

RoPE

Rotary Position Embedding, a way to encode position information that works well for long sequences. It rotates embeddings in a way that naturally captures relative positions.
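The rotation can be sketched as follows, assuming the formulation that pairs up adjacent dimensions and uses a base of 10000. Some implementations pair dimensions differently, and the function names here are made up.

```python
import math

def rope(vector, position, base=10000.0):
    """Rotate each pair of dimensions by an angle that grows with position."""
    dim = len(vector)  # assumed to be even
    rotated = []
    for i in range(0, dim, 2):
        angle = position * base ** (-i / dim)  # later pairs rotate more slowly
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        x, y = vector[i], vector[i + 1]
        rotated += [x * cos_a - y * sin_a, x * sin_a + y * cos_a]
    return rotated

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [1.0, 2.0, 3.0, 4.0]
key = [0.5, -1.0, 2.0, 0.0]

near_start = dot(rope(query, 5), rope(key, 3))     # positions 5 and 3
further_on = dot(rope(query, 12), rope(key, 10))   # positions 12 and 10
# Both pairs are 2 positions apart, so the two scores match.
```

Because the attention score depends only on the distance between the two positions, relative position is built in without any extra bookkeeping.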

Self-Attention

The core mechanism in transformers that lets each token look at all other tokens in the sequence. It learns which parts of the input are most relevant to each other.
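A stripped-down sketch of the computation is shown below. It assumes each token vector is used directly as query, key, and value; real transformers first pass the tokens through learned projection matrices, and the toy vectors are made up.

```python
import math

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Return the attention weights and the blended output for every token."""
    dim = len(tokens[0])
    weights, outputs = [], []
    for query in tokens:
        # Score this token against every token in the sequence, itself included.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * tok[d] for w, tok in zip(row, tokens))
                        for d in range(dim)])
    return weights, outputs

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, outputs = self_attention(tokens)
# Each row of weights sums to 1, and token 0 attends more to token 1 than to token 2.
```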

Tokenization

Breaking text into smaller units called tokens. This is the first step in processing language, turning strings into pieces the model can work with.

Transformer

The architecture behind modern language models, based on attention mechanisms. It processes all positions of a sequence in parallel during training, which makes it both powerful and well suited to modern hardware.

Vector Database

A specialized database for storing and searching embeddings. It can quickly find the most similar vectors, making it perfect for semantic search and RAG systems.
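The core operation, nearest-neighbor search, can be sketched with a brute-force scan. The class name and the two-dimensional vectors are made up; real vector databases use approximate indexes so they do not have to compare the query against every stored vector.

```python
import math

class TinyVectorStore:
    """Keeps (label, vector) pairs and returns the closest ones to a query."""
    def __init__(self):
        self.items = []

    def add(self, label, vector):
        self.items.append((label, vector))

    def search(self, query, k=2):
        # Rank every stored vector by Euclidean distance to the query.
        ranked = sorted(self.items, key=lambda item: math.dist(query, item[1]))
        return [label for label, _ in ranked[:k]]

store = TinyVectorStore()
store.add("cat", [0.9, 0.8])
store.add("dog", [0.8, 0.9])
store.add("car", [0.1, 0.2])

nearest = store.search([0.9, 0.7], k=2)
# ['cat', 'dog']
```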