Classical Machine Learning
Regression, trees, SVM, clustering — how algorithms think
Prerequisites
- math-intuition
Before deep learning took over the headlines, "regular" machine learning was already doing amazing things. Credit card fraud detection, email spam filters, product recommendations: all were built with algorithms that are, honestly, pretty intuitive once you see them. These aren't relics of the past either. For lots of real-world problems, especially ones built on tabular data, a well-tuned tree-based model can still beat a billion-parameter neural network.
The trick is knowing which tool to reach for. Neural networks are like power tools — incredibly capable, but sometimes you just need a screwdriver. Classical ML gives you a whole toolbox of screwdrivers, each designed for a different kind of problem. And the best part? You can usually understand why these algorithms made a particular decision, which matters a lot when you're explaining to your boss why the model rejected someone's loan application.
In this lesson, we'll walk through four algorithms that every ML practitioner should know. You'll see how each one "thinks" by watching the decision boundary it draws, and you'll start building intuition for when to use which.
Teaching machines to decide
Machine learning problems generally fall into two big buckets. Classification is about sorting things into groups: is this email spam or not? Is this tumor benign or malignant? Will this customer churn? Regression is about predicting a number: what will this house sell for? How many units will we sell next quarter? What's the temperature going to be tomorrow?
There's also a third category called clustering, which is a bit different because you don't have labels to learn from. Instead of "here are examples with the right answers, learn the pattern," it's "here's a bunch of data, find the natural groups." Customer segmentation is a classic example: you don't know ahead of time which customer types you have, so you want the algorithm to discover them.
The ML pipeline
The pipeline is the same regardless of which algorithm you pick: feed in data, let the algorithm learn a pattern, then use the trained model to make predictions on new data it hasn't seen before. The algorithms differ in how they find patterns — some draw lines, some build trees, some memorize examples. Each approach has trade-offs, and picking the right one is half the skill of being an ML practitioner.
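To make the pipeline concrete, here is a minimal sketch in plain Python. The "algorithm" is a nearest-centroid classifier, picked only because it fits in a few lines; the toy data, the 30/10 split, and the function names `fit_centroids` and `predict` are all assumptions made for illustration, not part of the lesson's visualizer.

```python
import math
import random

def fit_centroids(X, y):
    """Training step: learn one centroid (mean point) per class label."""
    sums, counts = {}, {}
    for point, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, point):
    """Prediction step: pick the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical toy data: two well-separated blobs of 2D points.
rng = random.Random(0)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(20)]
X += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(20)]
y = ["a"] * 20 + ["b"] * 20

# 1. Feed in data: hold out 10 points the model never sees during training.
indices = list(range(len(X)))
rng.shuffle(indices)
test_idx, train_idx = indices[:10], indices[10:]

# 2. Let the algorithm learn a pattern from the training points only.
model = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])

# 3. Use the trained model on data it hasn't seen before.
predictions = [predict(model, X[i]) for i in test_idx]
accuracy = sum(p == y[i] for p, i in zip(predictions, test_idx)) / len(test_idx)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swapping in a different algorithm would change only `fit_centroids` and `predict`; the split, train, predict skeleton stays the same.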
Four algorithms you should know
Let's look at four fundamental algorithms. They're not just historically important — they're still used in production systems everywhere. Each one approaches the problem of "learning from data" in a completely different way.
Linear Regression / Classification
The simplest idea: draw the best line. For regression, you're fitting a line through data points to predict continuous values (like house prices). For classification, you're drawing a line that separates two classes.
Linear models are fast to train, easy to interpret, and surprisingly effective. They're the first thing data scientists try because they set a strong baseline. The catch? They assume the relationship is linear — if the real pattern is a curve, a straight line won't capture it.
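Here is a rough sketch of "draw the best line" for regression, using the standard least-squares formulas for a single feature. The data is made up for illustration: one set that really is linear and one that follows a curve, so you can see the catch described above show up as a larger error.

```python
def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """Average squared gap between the line's predictions and the true values."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
linear_ys = [2.0 * x + 1.0 for x in xs]  # hypothetical data that really is a line
curved_ys = [x ** 2 for x in xs]         # hypothetical data that follows a curve

linear_fit = fit_line(xs, linear_ys)
curved_fit = fit_line(xs, curved_ys)
linear_error = mean_squared_error(xs, linear_ys, *linear_fit)
curved_error = mean_squared_error(xs, curved_ys, *curved_fit)

print("linear data:", linear_fit, "error:", linear_error)
print("curved data:", curved_fit, "error:", curved_error)
```

On the linear data the fitted line recovers the slope and intercept that generated it. On the curved data the best straight line still leaves a noticeable error, because no line can follow the bend.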
Now try it yourself
Below is an interactive visualizer with 30 data points from two classes. Toggle between the four algorithms to see how each one draws its decision boundary differently. For KNN, click anywhere on the plot to place a new point and watch the algorithm classify it by majority vote. For K-Means, try adjusting the number of clusters to see how the same data gets partitioned in different ways.
Algorithm Visualizer
Draws the best straight line to separate the two classes. Simple, fast, and easy to interpret.
Key insight: A linear classifier draws one straight line. It works great when the two classes can be split by a line (linearly separable data), but fails on more complex patterns. Still used everywhere because it is fast and interpretable.
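The key insight can be checked with a small experiment. The sketch below uses the perceptron rule, one of several ways to fit a linear classifier and not necessarily what the visualizer uses. Both datasets are hypothetical: one can be split by a line, the other is the XOR pattern, which no single line can separate.

```python
def train_perceptron(points, labels, epochs=1000):
    """Perceptron rule: nudge the line whenever a point is misclassified.

    Labels must be +1 or -1. Returns the weights [w_x, w_y, bias].
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in zip(points, labels):
            score = w[0] * x + w[1] * y + w[2]
            if label * score <= 0:  # wrong side of the line, or exactly on it
                w[0] += label * x
                w[1] += label * y
                w[2] += label
                mistakes += 1
        if mistakes == 0:  # every point is on the correct side: done
            break
    return w

def accuracy(w, points, labels):
    """Fraction of points that fall on the correct side of the line."""
    correct = 0
    for (x, y), label in zip(points, labels):
        predicted = 1 if w[0] * x + w[1] * y + w[2] > 0 else -1
        correct += predicted == label
    return correct / len(points)

# Two clumps that a straight line can split.
separable_points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
separable_labels = [-1, -1, -1, 1, 1, 1]

# XOR pattern: opposite corners share a class.
xor_points = [(0, 0), (1, 1), (0, 1), (1, 0)]
xor_labels = [-1, -1, 1, 1]

separable_acc = accuracy(train_perceptron(separable_points, separable_labels),
                         separable_points, separable_labels)
xor_acc = accuracy(train_perceptron(xor_points, xor_labels), xor_points, xor_labels)
print("separable:", separable_acc, "xor:", xor_acc)
```

The separable set ends up perfectly classified. On XOR the loop runs out of epochs without ever getting all four points right, since one straight line can separate at most three of them.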
Key Takeaways
- Classification sorts things into groups (spam/not-spam), regression predicts numbers (house price), and clustering finds natural groupings without labels.
- Linear models draw a straight line to separate classes or predict values. They are fast, interpretable, and a great baseline, but they cannot capture non-linear patterns.
- Decision trees ask a series of yes/no questions to make predictions. They are easy to interpret and explain, but a single tree tends to overfit. Random forests reduce this by averaging the predictions of many trees, each trained on a different random sample of the data.
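- K-Nearest Neighbors stores all training data and classifies new points by majority vote among the K closest examples. There is no training step beyond storing the data, but prediction gets slow on large datasets because every new point has to be compared against the stored examples.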
- K-Means clustering is unsupervised: it discovers groups in data without labels by iteratively assigning points to the nearest centroid and updating centroids.
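The K-Nearest Neighbors takeaway is short enough to sketch in full. This is a minimal version, assuming Euclidean distance and 2D points; the sample points, the colour labels, and the name `knn_predict` are invented for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just keeping the data around; all the work happens here.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labeled points: a red clump and a blue clump.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
labels = ["red", "red", "red", "blue", "blue", "blue"]

near_red = knn_predict(points, labels, (2, 2))
near_blue = knn_predict(points, labels, (5, 5))
print(near_red, near_blue)
```

Notice that every call sorts the whole training set by distance. That is where the slow prediction on large datasets comes from; practical implementations usually add an index structure to avoid the full scan.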
Common Misconceptions
- "Classical ML is outdated now that we have deep learning." -- Not at all. For tabular/structured data (spreadsheets, databases), gradient-boosted trees (XGBoost, LightGBM) still often match or outperform neural networks. Deep learning shines on images, text, and audio, but classical ML remains a strong default for structured data.
- "More complex models always give better results." -- A complex model on a small dataset will memorize noise instead of learning patterns. With only 100 training examples, a simple linear model will usually beat a deep neural network. The best model depends on your data size, data type, and interpretability needs.
- "You always need labeled data to do machine learning." -- Clustering algorithms like K-Means are unsupervised: they find patterns without any labels at all. There is a whole spectrum, from supervised (labeled data) through semi-supervised (some labels) to unsupervised (no labels).
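To see learning without labels in action, here is a small K-Means sketch. It follows the usual assign-then-update loop, with two simplifications that are assumptions of this example: the first `k` points serve as starting centroids (real implementations typically use random or smarter initialization), and the data is a handful of made-up 2D points.

```python
import math

def k_means(points, k, iterations=20):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = list(points[:k])  # naive start: the first k points (assumption)
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # empty cluster: leave it in place
        if new_centroids == centroids:  # nothing moved, so we have converged
            break
        centroids = new_centroids
    return centroids, assignments

# No labels anywhere: just points.
points = [(1, 1), (6, 6), (1, 2), (2, 1), (6, 7), (7, 6)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

The algorithm groups the points into two clumps without ever being told which point belongs where. You do still have to choose `k` yourself, which is why it is worth trying several values and comparing the resulting partitions.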