Case 3. When a tree is convenient in a real product

Implementation with RubixML

In credit scoring, not only accuracy matters, but also transparency. A decision must be understandable: explainable to a risk manager, a customer, and sometimes a regulator. In this case we train a decision tree that outputs "approve" or "reject" based on numeric features, and then add a simple human-readable explanation that can be shown to users.

Example of code:

 
<?php

use Rubix\ML\Classifiers\ClassificationTree;
use 
Rubix\ML\Datasets\Labeled;

mt_srand(42);
srand(42);

$samples = [
    
// income, loan, dti, history, late, cards, age, job, home
    
[5000100000.25023551],
    [
3000150000.62342820],
    [
8000200000.2580345101],
    [
250050000.51212310],
    [
6000120000.36123871],
    [
4000180000.73453030],
    [
700090000.189024191],
    [
3200110000.552132920],
    [
5200160000.424223661],
    [
2800140000.652342620],
    [
9000250000.33100448121],
    [
420080000.384013340],
    [
3500220000.753553130],
    [
6500150000.287123981],
    [
270060000.491212410],
    [
5600130000.316123771],
    [
3800170000.683443030],
    [
480090000.225012761],
    [
7500140000.624434491],
    [
290070000.286022520],
    [
6200100000.582343451],
];

$labels = [
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
    
'approve',
    
'reject',
];

$dataset = new Labeled($samples$labels);

$tree = new ClassificationTree(
    
maxHeight10,
    
maxLeafSize2
);

$tree->train($dataset);