Logistic regression
MNIST. Binary classification: 0 vs 1
Implementation in pure PHP
This is one of the most famous teaching cases in machine learning. It is simple to state, yet it clearly shows how the model works – and where it starts to break down. Here we make an important turn in the book: we move from tabular data to images. Case goal: learn to classify digit images as "0" or "1" with logistic regression and understand the limitations of a linear model.
Pure PHP implementation uses a toy MNIST-like dataset with flattened 28x28 images and the standard logistic regression training loop.
Example of code: Class LogisticRegression
<?php
namespace app\classes;
class LogisticRegression {
private array $weights;
private float $bias;
private float $learningRate;
public function __construct(int $numFeatures, float $learningRate = 0.1) {
// Start with zeroed parameters for every feature.
$this->learningRate = $learningRate;
$this->weights = array_fill(0, $numFeatures, 0.0);
$this->bias = 0.0;
}
// Convert a linear score into a probability in the range [0, 1].
private function sigmoid(float $z): float {
return 1.0 / (1.0 + exp(-$z));
}
// Compute the weighted sum of the input features.
private function dot(array $a, array $b): float {
$sum = 0.0;
foreach ($a as $i => $v) {
$sum += $v * $b[$i];
}
return $sum;
}
// Return the probability that the sample belongs to class 1.
public function predictProb(array $x): float {
return $this->sigmoid($this->dot($this->weights, $x) + $this->bias);
}
// Convert the probability into a binary class prediction.
public function predict(array $x): int {
return $this->predictProb($x) >= 0.5 ? 1 : 0;
}
// Train the model with simple gradient descent.
public function train(array $X, array $y, int $epochs = 5): void {
foreach (range(1, $epochs) as $epoch) {
foreach ($X as $i => $x) {
// Compare the prediction to the expected label.
$p = $this->predictProb($x);
$error = $p - $y[$i];
// Update each weight using the feature value and error.
foreach ($this->weights as $j => $w) {
$this->weights[$j] -= $this->learningRate * $error * $x[$j];
}
// Update the bias term separately.
$this->bias -= $this->learningRate * $error;
}
// echo "Epoch $epoch done\n";
}
}
// Calculate score a set of predictions
public function score(array $X, array $y): float {
$correct = 0;
foreach ($X as $i => $x) {
if ($this->predict($x) === $y[$i]) {
$correct++;
}
}
return count($X) > 0 ? ($correct / count($X)) : 0.0;
}
}
Example of code: Class MnistLoader
<?php
namespace app\classes;
use Exception;
/**
* MnistLoader - Utility class for loading and preprocessing MNIST digit dataset
*
* This class provides methods to load MNIST data from CSV files with various
* preprocessing options including normalization, digit filtering, and label formatting.
* It supports both array-based loading (for custom implementations) and
* iterable loading (for Rubix ML compatibility).
*/
class MnistLoader {
/**
* Load MNIST data as arrays for custom ML implementations
*/
public static function load(string $file, string $directory = '', bool $categoricalLabels = false, bool $normalize = true, array $digits = [0, 1]): array {
$features = []; // 2D array: each element is an array of 784 pixel values
$labels = []; // 1D array: each element is the corresponding digit label
$handle = self::openFile($file, $directory);
// Process each row in the CSV file
// Each row contains: [label, pixel1, pixel2, ..., pixel784]
while (($row = fgetcsv($handle)) !== false) {
if ($processed = self::processRow($row, $digits, $normalize, $categoricalLabels)) {
$features[] = $processed[0];
$labels[] = $processed[1];
}
}
// Return features and labels in the format expected by callers.
return [$features, $labels];
}
/**
* Load MNIST data as an iterator for Rubix ML compatibility
*/
public static function loadIterable(string $file, string $directory = '', bool $categoricalLabels = false, bool $normalize = true, array $digits = [0, 1]): iterable {
$handle = self::openFile($file, $directory);
// Read the CSV file row by row and keep only valid samples.
while (($row = fgetcsv($handle)) !== false) {
if ($processed = self::processRow($row, $digits, $normalize, $categoricalLabels)) {
yield array_merge($processed[0], [$processed[1]]);
}
}
}
private static function openFile(string $file, string $directory) {
$handle = @fopen($directory . $file, 'r');
if ($handle === false) {
throw new Exception('Dataset file not found: ' . $directory . $file);
}
return $handle;
}
private static function processRow(array &$row, array &$digits, bool $normalize, bool $categoricalLabels): ?array {
if ($row === [] || $row[0] === null || $row[0] === '') {
return null;
}
$label = (int)$row[0];
if (!in_array($label, $digits)) {
return null;
}
$pixels = array_slice($row, 1);
if ($normalize) {
$pixels = array_map(static fn ($v): float => ((float)$v) / 255.0, $pixels);
}
$formattedLabel = $categoricalLabels ? ($label === 1 ? 'one' : 'zero') : $label;
return [$pixels, $formattedLabel];
}
}
MNIST case with RubixML
In real projects, people rarely implement logistic regression from scratch. It is much more convenient to use a machine learning library. The same 0-vs-1 MNIST case with RubixML becomes shorter and closer to production-style code:
Example of code
<?php
use app\classes\MnistLoader;
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
// Build the training and test datasets from the filtered CSV rows.
$trainRows = MnistLoader::loadIterable('train.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);
$testRows = MnistLoader::loadIterable('test.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);
$dataset = Labeled::fromIterator($trainRows);
$testDataset = Labeled::fromIterator($testRows);
$model = new LogisticRegression(epochs: 5);
$model->train($dataset);