RLHF Training Pipeline
Supervised Fine-Tuning (SFT)
The pre-trained model is fine-tuned on curated datasets of instructions paired with high-quality human-written responses. This teaches basic instruction-following behavior.
Purpose
Teach the model to follow instructions and respond helpfully
Key Points
- Uses supervised learning with labeled examples
- Trains on instruction-response pairs
- Establishes basic helpful behavior
- Foundation for further RLHF training (a minimal training sketch follows this list)
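Below is a minimal, self-contained sketch of the SFT step in PyTorch. The tiny GRU language model, hand-written token IDs, and hyperparameters are all placeholders (a real pipeline starts from a pre-trained transformer and a subword tokenizer); the point is the objective: next-token cross-entropy computed only over the response tokens, with prompt positions masked out, a common SFT convention.

```python
import torch
import torch.nn as nn

# Toy pre-tokenized (instruction, response) pairs. A real pipeline would use
# a subword tokenizer and a pre-trained transformer; both are stubbed here.
VOCAB_SIZE = 100
pairs = [
    (torch.tensor([5, 17, 42]), torch.tensor([8, 23, 11, 2])),  # hypothetical token IDs
    (torch.tensor([7, 31]), torch.tensor([14, 9, 2])),
]

class TinyLM(nn.Module):
    """Stand-in for a pre-trained causal language model."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, ids):                    # ids: (batch, seq)
        h, _ = self.rnn(self.embed(ids))
        return self.head(h)                    # logits: (batch, seq, vocab)

model = TinyLM(VOCAB_SIZE)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=-100)  # -100 marks masked positions

for epoch in range(3):
    for prompt, response in pairs:
        ids = torch.cat([prompt, response]).unsqueeze(0)   # (1, seq)
        # Next-token prediction: targets are the input shifted left by one.
        targets = ids[:, 1:].clone()
        # Mask prompt positions so the loss covers response tokens only:
        # the model learns to produce the answer, not to echo the instruction.
        targets[:, : len(prompt) - 1] = -100
        logits = model(ids[:, :-1])
        loss = loss_fn(logits.reshape(-1, VOCAB_SIZE), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```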
Example Prompt
An easy instruction-following case: "How do I bake a chocolate cake?"
Why RLHF Matters
Alignment Problem
RLHF helps align model behavior with human values and preferences, steering models toward responses that are helpful, harmless, and honest.
Scalable Oversight
Reward models enable scalable evaluation of AI outputs without requiring human review of every response.
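A reward model of this kind is typically trained on pairwise human preference data: for a given prompt, labelers choose which of two candidate responses is better, and the model learns to score the preferred one higher. Here is a minimal sketch under those assumptions; the tiny mean-pooling scorer and random token IDs are placeholders (real reward models usually reuse the SFT transformer with a scalar value head).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Placeholder scorer; a real reward model would reuse the SFT
    transformer with a scalar value head instead of mean-pooled embeddings."""
    def __init__(self, vocab_size: int = 100, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.score = nn.Linear(dim, 1)

    def forward(self, ids):                      # ids: (batch, seq)
        pooled = self.embed(ids).mean(dim=1)     # crude mean-pool over tokens
        return self.score(pooled).squeeze(-1)    # one scalar reward per sequence

rm = TinyRewardModel()
optimizer = torch.optim.AdamW(rm.parameters(), lr=1e-3)

# Each example pairs two candidate responses to the same prompt, where a
# human labeler preferred the first (token IDs here are random placeholders).
chosen = torch.randint(1, 100, (4, 12))
rejected = torch.randint(1, 100, (4, 12))

# Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)),
# which pushes the preferred response's score above the rejected one's.
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```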
Iterative Improvement
The process creates a feedback loop for continuous improvement based on human preferences and values.
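To make the feedback loop concrete, here is a hedged, toy-scale sketch of the RL step: the policy samples responses, a placeholder reward function scores them, and a REINFORCE-style update raises the probability of high-scoring responses while a KL-style penalty keeps the policy close to the frozen SFT reference. Production systems typically use PPO rather than plain REINFORCE, but the loop has the same shape.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM, RESP_LEN = 100, 64, 8

class TinyPolicy(nn.Module):
    """Stand-in for the SFT-initialized policy being tuned with RL."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.rnn = nn.GRU(DIM, DIM, batch_first=True)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, ids):                    # ids: (batch, seq)
        h, _ = self.rnn(self.embed(ids))
        return self.head(h)                    # logits: (batch, seq, vocab)

policy = TinyPolicy()
reference = TinyPolicy()                       # frozen copy of the SFT model
reference.load_state_dict(policy.state_dict())
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

def toy_reward(ids):
    # Placeholder for a trained reward model; it just rewards low token IDs.
    return -ids.float().mean(dim=1) / VOCAB

prompt = torch.randint(1, VOCAB, (2, 5))       # hypothetical prompt token IDs

for step in range(10):
    # 1. Sample a response token by token, tracking log-probabilities.
    ids, logps = prompt, []
    for _ in range(RESP_LEN):
        dist = torch.distributions.Categorical(logits=policy(ids)[:, -1])
        tok = dist.sample()
        logps.append(dist.log_prob(tok))
        ids = torch.cat([ids, tok.unsqueeze(1)], dim=1)
    logps = torch.stack(logps, dim=1)          # (batch, RESP_LEN)

    # 2. KL-style penalty: keep the policy near the frozen SFT reference.
    with torch.no_grad():
        ref_logits = reference(ids[:, :-1])[:, prompt.size(1) - 1 :]
        ref_logps = ref_logits.log_softmax(-1).gather(
            -1, ids[:, prompt.size(1):].unsqueeze(-1)).squeeze(-1)
    score = toy_reward(ids) - 0.1 * (logps - ref_logps).sum(dim=1)

    # 3. REINFORCE update: raise log-probability of high-scoring responses.
    loss = -(score.detach() * logps.sum(dim=1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```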