MNIST: decision tree – how the model "sees" pixels

Implementation in RubixML

In this case, we will train a decision tree classifier on MNIST pixel vectors and see how a tree-based model predicts a digit from raw pixels. The goal is not to maximize accuracy, but to connect the idea of splitting the feature space with a very concrete representation: a 28×28 image as 784 numeric features.

Example of code:


<?php

use app\classes\MnistLoader;
use Rubix\ML\Classifiers\ClassificationTree;
use Rubix\ML\Datasets\Labeled;

try {
    $trainRows = MnistLoader::loadIterable('train.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);
    $testRows = MnistLoader::loadIterable('test.csv', categoricalLabels: true, normalize: true, digits: [0, 1]);

    $dataset = Labeled::fromIterator($trainRows);
    $testDataset = Labeled::fromIterator($testRows);
} catch (Exception $e) {
    echo '<div class="alert alert-danger" role="alert">' . htmlspecialchars($e->getMessage(), ENT_QUOTES, 'UTF-8') . '</div>';
    exit;
}

$model = new ClassificationTree(
    maxHeight: 10,
    maxLeafSize: 5
);

$model->train($dataset);