Logistic regression

Case 7. Medical screening

Medicine is a special domain in which machine learning has to be applied with great care: it is not enough to predict a class; the result must also be interpreted correctly.

In such scenarios, logistic regression is often used for screening – a preliminary risk assessment, not a diagnosis.

Case Goal
Estimate the probability of having a disease based on basic patient indicators.
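To make "estimate the probability" concrete: logistic regression computes a weighted sum of the indicators, z = b + w1*x1 + ... + wn*xn, and passes it through the logistic (sigmoid) function p = 1 / (1 + e^(-z)), which always returns a value between 0 and 1. The minimal sketch below shows only this scoring step in plain PHP. The weights, bias and patient values are made-up assumptions for illustration, not parameters learned by Rubix ML, and the indicators are assumed to be already standardized.

```php
<?php

declare(strict_types=1);

// Logistic (sigmoid) function: maps any real number into the interval (0, 1).
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Probability of the positive class for one sample: bias + weighted sum, then sigmoid.
// $features and $weights must have the same length and the same key order.
function riskProbability(array $features, array $weights, float $bias): float
{
    $z = $bias;
    foreach ($features as $i => $value) {
        $z += $weights[$i] * $value;
    }

    return sigmoid($z);
}

// HYPOTHETICAL weights and bias for 4 standardized indicators
// (age, systolic pressure, BMI, cholesterol). Not learned from data.
$weights = [0.8, 0.6, 0.4, 0.5];
$bias = -0.2;

// A hypothetical patient whose indicators are already z-scored.
$patient = [0.5, 1.0, 0.2, 0.9];

printf("Estimated risk: %.3f%s", riskProbability($patient, $weights, $bias), PHP_EOL);
```

A positive weight raises the estimated risk as that indicator grows; the sigmoid only squashes the score into a probability, so the ranking of patients is decided by z.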

The model should:
1) Assess risk
2) Help identify patients who need additional examination
3) Work as a decision-support tool, not a replacement for a doctor
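For the second goal, a screening workflow usually turns the predicted probability into a follow-up decision with a threshold that is deliberately lower than 0.5, accepting more false alarms in exchange for fewer missed cases. A minimal sketch in plain PHP: the function name `flagForFollowUp`, the 0.3 threshold and the patient probabilities are hypothetical, and the input only mimics the shape returned by Rubix ML's `proba()` (one class => probability map per sample).

```php
<?php

declare(strict_types=1);

/**
 * HYPOTHETICAL helper: returns the indices of patients whose probability
 * for the positive class is at or above the threshold.
 *
 * @param array<int, array<string, float>> $probabilities one [class => probability] map per patient
 * @return list<int>
 */
function flagForFollowUp(array $probabilities, float $threshold, string $positiveClass = 'high_risk'): array
{
    $flagged = [];

    foreach ($probabilities as $index => $classProbabilities) {
        // A missing class is treated as probability 0.
        if (($classProbabilities[$positiveClass] ?? 0.0) >= $threshold) {
            $flagged[] = $index;
        }
    }

    return $flagged;
}

// Hypothetical probabilities for four patients (same shape as proba() output).
$probabilities = [
    ['low_risk' => 0.90, 'high_risk' => 0.10],
    ['low_risk' => 0.60, 'high_risk' => 0.40],
    ['low_risk' => 0.55, 'high_risk' => 0.45],
    ['low_risk' => 0.20, 'high_risk' => 0.80],
];

// A 0.5 cut-off flags only the last patient; the lower screening
// threshold (0.3, an assumption) also sends the borderline cases for examination.
print_r(flagForFollowUp($probabilities, 0.5));
print_r(flagForFollowUp($probabilities, 0.3));
```

The threshold itself is a clinical choice rather than a property of the model; it is typically set by checking sensitivity and specificity on held-out data.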

Example of use

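<?php

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Load Composer's autoloader (assumes rubix/ml was installed with Composer).
require __DIR__ . '/vendor/autoload.php';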

// Each sample is a patient described by 4 numeric indicators (toy example).
// Example interpretation (you can rename/adjust for your domain):
// - feature #1: age
// - feature #2: systolic blood pressure
// - feature #3: BMI
// - feature #4: cholesterol (or another lab measurement)
$samples = [
    [30, 120, 22, 90],
    [60, 150, 30, 140],
    [45, 140, 28, 130],
    [25, 110, 20, 85],
    [50, 135, 26, 120],
    [55, 145, 29, 135],
    [35, 125, 24, 100],
    [40, 130, 27, 110],
    [65, 160, 32, 150],
    [28, 115, 21, 95],
];

// Rubix classifiers require categorical (discrete) labels, so we use strings.
// Here the model is used for screening: low_risk/high_risk, not a diagnosis.
$labels = [
    'low_risk',
    'high_risk',
    'high_risk',
    'low_risk',
    'high_risk',
    'high_risk',
    'low_risk',
    'low_risk',
    'high_risk',
    'low_risk',
];

$dataset = new Labeled($samples, $labels);

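// Train the logistic regression classifier on labeled patient examples.
// Default hyperparameters are used here. The raw indicators have very different
// scales, so for real data consider standardizing them first (for example with
// Rubix ML's ZScaleStandardizer inside a Pipeline).
$model = new LogisticRegression();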
$model->train($dataset);

// A new patient to score.
$patient = new Unlabeled([[50, 145, 27, 135]]);

// Predicted class label.
echo 'Predicted label (low_risk or high_risk):' . PHP_EOL;
print_r($model->predict($patient));

// Class probabilities are useful for threshold-based decisions.
echo PHP_EOL . 'Class probabilities:' . PHP_EOL;
print_r($model->proba($patient));

Result:

Predicted label (low_risk or high_risk):
Array
(
    [0] => high_risk
)

Class probabilities:
Array
(
    [0] => Array
        (
            [low_risk] => 0
            [high_risk] => 1
        )

)
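The probabilities of exactly 0 and 1 in the output above should not be read as certainty. In this toy dataset the two classes are perfectly separable (every low_risk patient is 40 or younger, every high_risk patient is 45 or older), and the raw indicators sit on very different scales; both most likely contribute to an overconfident fit. Standardization does not remove the separability, but it puts the indicators on a comparable scale and is a common first preprocessing step. Rubix ML ships a `ZScaleStandardizer` transformer for this; the plain-PHP sketch below only illustrates the idea (population standard deviation assumed) on the same 10 samples and is not Rubix ML's implementation. The function name `standardize` is hypothetical.

```php
<?php

declare(strict_types=1);

/**
 * HYPOTHETICAL helper: z-score standardization. For each column, subtract the
 * mean and divide by the population standard deviation. Returns the scaled
 * samples plus the statistics, which must be reused to scale new patients.
 *
 * @param list<list<float|int>> $samples
 * @return array{0: list<list<float>>, 1: list<float>, 2: list<float>}
 */
function standardize(array $samples): array
{
    $n = count($samples);
    $columns = count($samples[0]);
    $means = array_fill(0, $columns, 0.0);
    $stdDevs = array_fill(0, $columns, 0.0);

    for ($j = 0; $j < $columns; $j++) {
        $column = array_column($samples, $j);
        $means[$j] = (float) array_sum($column) / $n;

        $variance = 0.0;
        foreach ($column as $value) {
            $variance += ($value - $means[$j]) ** 2;
        }
        // Guard against constant columns (zero variance) to avoid division by zero.
        $stdDevs[$j] = sqrt($variance / $n) ?: 1.0;
    }

    $scaled = [];
    foreach ($samples as $sample) {
        $row = [];
        foreach ($sample as $j => $value) {
            $row[] = ($value - $means[$j]) / $stdDevs[$j];
        }
        $scaled[] = $row;
    }

    return [$scaled, $means, $stdDevs];
}

// Same 10 training samples as in the Rubix ML example.
$samples = [
    [30, 120, 22, 90], [60, 150, 30, 140], [45, 140, 28, 130], [25, 110, 20, 85],
    [50, 135, 26, 120], [55, 145, 29, 135], [35, 125, 24, 100], [40, 130, 27, 110],
    [65, 160, 32, 150], [28, 115, 21, 95],
];

[$scaled, $means, $stdDevs] = standardize($samples);

// A new patient must be scaled with the TRAINING means and deviations.
$patient = [50, 145, 27, 135];
$scaledPatient = array_map(
    fn (int $j): float => ($patient[$j] - $means[$j]) / $stdDevs[$j],
    array_keys($patient)
);

print_r($scaledPatient);
```

Whichever tool does the scaling, the statistics should come from the training data only and be reused for every new patient; wrapping the classifier and the transformer in a Rubix ML `Pipeline` is one way to keep those two steps together.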