Case 2. Classification using RubixML

Implementation with RubixML

Below is the runnable code: we build a Labeled dataset, train a ClassificationTree, and predict the class for a new sample.

Example of use


                
<?php

use Rubix\ML\Classifiers\ClassificationTree;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Make the example deterministic between runs.
// RubixML may use randomness (e.g. tie-breaking) during training.
mt_srand(42);
srand(42);

// Our tiny training dataset
// Each sample is a 2D numeric vector: [visits, time]
$samples = [
    [5, 10],
    [7, 15],
    [1, 2],
    [2, 3],
    [6, 8],
    [3, 4],
    [4, 12],
    [6, 3],
];

// Class labels for each row in $samples
$labels = ['active', 'active', 'passive', 'passive', 'active', 'passive', 'active', 'passive'];

// Wrap the arrays into a RubixML Labeled dataset.
$dataset = new Labeled($samples, $labels);

// Create a decision tree classifier
$estimator = new ClassificationTree(
    maxHeight: 5,
    maxLeafSize: 2
);

// Fit the model
$estimator->train($dataset);

// Build an Unlabeled dataset for inference
$sample = [4, 6];

$dataset = new Unlabeled([
    $sample,
]);

// Predict returns an array of labels (one label per row)
$prediction = $estimator->predict($dataset);

// Print the predicted label for the first (and only) sample
echo $prediction[0];

Graph:

Decision Tree

Result: Memory: 0.464 Mb Time running: 0.013 sec.


                active

RubixML returns the predicted class label for the input vector. Conceptually it is the same split-based approach with a quality criterion (entropy / information gain), but the tree building and split selection are handled internally by the library.