Case 2. Classification using RubixML

Implementation with RubixML

Below is the runnable code: we build a Labeled dataset, train a ClassificationTree, and predict the class for a new sample.

 
<?php

use Rubix\ML\Classifiers\ClassificationTree;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Make the example deterministic between runs.
// RubixML may use randomness (e.g. tie-breaking) during training.
mt_srand(42);
srand(42);

// Our tiny training dataset
// Each sample is a 2D numeric vector: [visits, time]
$samples = [
    [5, 10],
    [7, 15],
    [1, 2],
    [2, 3],
    [6, 8],
    [3, 4],
    [4, 12],
    [6, 3],
];

// Class labels for each row in $samples
$labels = ['active', 'active', 'passive', 'passive', 'active', 'passive', 'active', 'passive'];

// Wrap the arrays into a RubixML Labeled dataset.
$dataset = new Labeled($samples, $labels);

// Create a decision tree classifier
$estimator = new ClassificationTree(
    maxHeight: 5,
    maxLeafSize: 2
);

// Fit the model
$estimator->train($dataset);

// Build an Unlabeled dataset for inference
$sample = [4, 6];

$dataset = new Unlabeled([
    $sample,
]);

// Predict returns an array of labels (one label per row)
$prediction = $estimator->predict($dataset);

// Print the predicted label for the first (and only) sample
echo $prediction[0];

Graph: Decision Tree

Result: active
Memory: 0.465 MB. Running time: 0.014 sec.

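RubixML returns the predicted class label for the input vector. Conceptually this is the same split-based approach: each node picks the split that most improves a quality criterion. The criterion itself differs, though. RubixML's documentation describes ClassificationTree as minimizing Gini impurity rather than entropy / information gain, and the library handles tree building and split selection internally.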
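To make the idea of a split quality criterion concrete, here is a minimal sketch in plain PHP that scores two candidate splits by entropy / information gain. It is an illustration only, not RubixML's internal code. The helper names entropy() and informationGain(), the thresholds 6 and 4, and the reading of the training rows as [visits, time] pairs are assumptions made for this example.

```php
<?php

// Shannon entropy (in bits) of a list of class labels.
function entropy(array $labels): float
{
    $n = count($labels);
    if ($n === 0) {
        return 0.0;
    }

    $h = 0.0;
    foreach (array_count_values($labels) as $count) {
        $p = $count / $n;
        $h -= $p * log($p, 2);
    }

    return $h;
}

// Information gain of splitting on feature $column at $threshold:
// parent entropy minus the size-weighted entropy of the two children.
// Hypothetical helper for illustration, not part of RubixML.
function informationGain(array $samples, array $labels, int $column, float $threshold): float
{
    $left = [];
    $right = [];

    foreach ($samples as $i => $sample) {
        if ($sample[$column] < $threshold) {
            $left[] = $labels[$i];
        } else {
            $right[] = $labels[$i];
        }
    }

    $n = count($labels);
    $weighted = (count($left) / $n) * entropy($left)
        + (count($right) / $n) * entropy($right);

    return entropy($labels) - $weighted;
}

// Training rows, assumed to be [visits, time] pairs.
$samples = [[5, 10], [7, 15], [1, 2], [2, 3], [6, 8], [3, 4], [4, 12], [6, 3]];
$labels = ['active', 'active', 'passive', 'passive', 'active', 'passive', 'active', 'passive'];

// Candidate split on time (column 1) versus visits (column 0).
$gainTime = informationGain($samples, $labels, 1, 6.0);
$gainVisits = informationGain($samples, $labels, 0, 4.0);

printf("time < 6:   %.3f bits\n", $gainTime);
printf("visits < 4: %.3f bits\n", $gainVisits);
```

On these rows the split time < 6 separates the two classes perfectly (gain of 1.000 bit), while visits < 4 gains only about 0.549 bits, so an information-gain tree would split on time first. That is consistent with the sample [4, 6] being predicted as active.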