Logistic regression

MNIST. Binary classification: 0 vs 1


MNIST case with RubixML

In real projects, people rarely implement logistic regression from scratch; a machine learning library is far more convenient. With RubixML, the same 0-vs-1 MNIST task takes less code and looks closer to production code:

<?php

// Composer autoloader (assumes RubixML was installed with Composer).
require __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;

// Stream an MNIST CSV file (label first, then pixel values) and keep only digits 0 and 1.
function mnistRows(string $file): iterable
{
    foreach (new CSV($file, false) as $row) {
        if (!isset($row[0])) {
            continue;
        }

        $label = (int) $row[0];

        if ($label !== 0 && $label !== 1) {
            continue;
        }

        // Pixels become floats; the label goes last as a string class name,
        // because RubixML classifiers expect categorical (string) labels.
        $pixels = array_map(static fn ($value): float => (float) $value, array_slice($row, 1));

        yield array_merge($pixels, [$label === 1 ? 'one' : 'zero']);
    }
}

try {
    $trainRows = mnistRows('train.csv');
    $testRows = mnistRows('test.csv');

    // Labeled::fromIterator() takes the last column of each row as the label.
    $dataset = Labeled::fromIterator($trainRows);
    $testDataset = Labeled::fromIterator($testRows);
} catch (Exception $e) {
    echo '<div class="alert alert-danger" role="alert">' . htmlspecialchars($e->getMessage(), ENT_QUOTES, 'UTF-8') . '</div>';
    exit;
}

echo 'Train samples handled: ' . number_format($dataset->numSamples()) . PHP_EOL;
echo 'Test samples handled: ' . number_format($testDataset->numSamples()) . PHP_EOL . PHP_EOL;

// Train for 5 epochs (named argument, PHP 8+).
$model = new LogisticRegression(epochs: 5);
$model->train($dataset);

echo 'Number of epochs: ' . $model->params()['epochs'] . PHP_EOL . PHP_EOL;

$correct = 0;

// Predict each test sample and compare it with its true label.
foreach ($testDataset->samples() as $i => $x) {
    $prediction = $model->predict(new Unlabeled([$x]))[0];

    if ($prediction === $testDataset->labels()[$i]) {
        $correct++;
    }
}

$accuracy = $correct / $testDataset->numSamples();
echo 'Accuracy: ' . round($accuracy * 100, 2) . '%';
Sample digit 0: probability of digit 0 = 1, predicted digit 0
Sample digit 1: probability of digit 1 = 0.978, predicted digit 1

Result:
Train samples handled: 12,666
Test samples handled: 2,116

Number of epochs: 5

Accuracy: 99.95%
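To make the library call less of a black box, here is a minimal from-scratch sketch of binary logistic regression in plain PHP. It computes p = sigmoid(w·x + b) and applies full-batch gradient descent on the log loss, where the gradient is the mean of (p - y)·x. It assumes hypothetical 4-pixel rows in the same "label first, then pixels" layout and scales pixels to [0, 1], which the RubixML example above does not do. The function names (parseRows, train, predictProba, accuracy) and the toy rows are illustrative only. RubixML's LogisticRegression has its own batch size and optimizer settings, so this is the underlying idea, not its implementation.

```php
<?php

declare(strict_types=1);

// Hypothetical from-scratch sketch; not RubixML's implementation.

function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Turn "label,pixel,pixel,..." rows into features scaled to [0, 1] and
// binary targets (1 for digit 1, 0 for digit 0); other digits are skipped.
function parseRows(array $lines): array
{
    $x = [];
    $y = [];
    foreach ($lines as $line) {
        $cells = explode(',', $line);
        $label = (int) $cells[0];
        if ($label !== 0 && $label !== 1) {
            continue;
        }
        $x[] = array_map(static fn (string $v): float => (float) $v / 255.0, array_slice($cells, 1));
        $y[] = $label;
    }
    return [$x, $y];
}

// Probability that a sample is digit 1: sigmoid(w . x + b).
function predictProba(array $w, float $b, array $xi): float
{
    $z = $b;
    foreach ($xi as $j => $v) {
        $z += $w[$j] * $v;
    }
    return sigmoid($z);
}

// Full-batch gradient descent on the log loss: w -= lr * mean((p - y) * x).
function train(array $x, array $y, int $epochs, float $lr): array
{
    $n = count($x);
    $d = count($x[0]);
    $w = array_fill(0, $d, 0.0);
    $b = 0.0;
    for ($epoch = 0; $epoch < $epochs; $epoch++) {
        $gradW = array_fill(0, $d, 0.0);
        $gradB = 0.0;
        foreach ($x as $i => $xi) {
            $error = predictProba($w, $b, $xi) - $y[$i];
            foreach ($xi as $j => $v) {
                $gradW[$j] += $error * $v;
            }
            $gradB += $error;
        }
        foreach ($gradW as $j => $g) {
            $w[$j] -= $lr * $g / $n;
        }
        $b -= $lr * $gradB / $n;
    }
    return [$w, $b];
}

// Share of samples whose thresholded prediction (p >= 0.5) matches the label.
function accuracy(array $w, float $b, array $x, array $y): float
{
    $correct = 0;
    foreach ($x as $i => $xi) {
        $predicted = predictProba($w, $b, $xi) >= 0.5 ? 1 : 0;
        if ($predicted === $y[$i]) {
            $correct++;
        }
    }
    return $correct / count($x);
}

// Hypothetical 4-pixel "images": zeros are bright at the edges, ones in the middle.
$rows = [
    '0,255,0,0,255',
    '0,200,10,0,230',
    '1,0,255,240,0',
    '1,10,220,255,5',
    '7,0,0,0,0', // neither 0 nor 1, so this row is skipped
];

[$x, $y] = parseRows($rows);
[$w, $b] = train($x, $y, 200, 1.0);

echo 'Samples: ' . count($x) . PHP_EOL;
echo 'Accuracy: ' . round(accuracy($w, $b, $x, $y) * 100, 2) . '%' . PHP_EOL;
```

On these toy rows the script prints "Samples: 4" and "Accuracy: 100%". The same loop structure works for 784-pixel MNIST rows, but everything runs in plain PHP loops, which is part of why a library is the practical choice.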