MNIST: decision tree – how the model "sees" pixels

Implementation in RubixML

In this case, we will train a decision tree classifier on MNIST pixel vectors and see how a tree-based model predicts a digit from raw pixels. The goal is not to maximize accuracy, but to connect the idea of splitting the feature space with a very concrete representation: a 28×28 image as 784 numeric features.

Example of code:

 
<?php

use app\classes\MnistLoader;
use 
Rubix\ML\Classifiers\ClassificationTree;
use 
Rubix\ML\Datasets\Labeled;

try {
    
$trainRows MnistLoader::loadIterable('train.csv'categoricalLabelstruenormalizetruedigits: [01]);
    
$testRows MnistLoader::loadIterable('test.csv'categoricalLabelstruenormalizetruedigits: [01]);

    
$dataset Labeled::fromIterator($trainRows);
    
$testDataset Labeled::fromIterator($testRows);
} catch (
Exception $e) {
    echo 
'<div class="alert alert-danger" role="alert">' htmlspecialchars($e->getMessage(), ENT_QUOTES'UTF-8') . '</div>';
    exit;
}

$model = new ClassificationTree(
    
maxHeight10,
    
maxLeafSize5
);

$model->train($dataset);