Logistic regression
Case 7. Medical screening
Medical tasks are a special domain where machine learning is used very carefully. It is important not only to predict a class, but to interpret the result correctly.
In such scenarios, logistic regression is often used for screening — a preliminary risk assessment, not a diagnosis.
Case Goal
Estimate the probability of having a disease based on basic patient indicators.
The model should:
1) Assess risk
2) Help identify patients who need additional examination
3) Work as a decision-support tool, not a replacement for a doctor
Example of use
<?php
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// Each sample is a patient described by 4 numeric indicators (toy example).
// Example interpretation (you can rename/adjust for your domain):
// - feature #1: age
// - feature #2: systolic blood pressure
// - feature #3: BMI
// - feature #4: cholesterol (or another lab measurement)
$samples = [
[30, 120, 22, 90],
[60, 150, 30, 140],
[45, 140, 28, 130],
[25, 110, 20, 85],
[50, 135, 26, 120],
[55, 145, 29, 135],
[35, 125, 24, 100],
[40, 130, 27, 110],
[65, 160, 32, 150],
[28, 115, 21, 95],
];
// Rubix classifiers require categorical (discrete) labels, so we use strings.
// Here the model is used for screening: low_risk/high_risk, not a diagnosis.
$labels = [
'low_risk',
'high_risk',
'high_risk',
'low_risk',
'high_risk',
'high_risk',
'low_risk',
'low_risk',
'high_risk',
'low_risk',
];
$dataset = new Labeled($samples, $labels);
// Train the logistic regression classifier on labeled patient examples.
$model = new LogisticRegression();
$model->train($dataset);
// A new patient to score.
$patient = new Unlabeled([[50, 145, 27, 135]]);
// Predicted class label.
echo 'Predicted label (low_risk or high_risk):' . PHP_EOL;
print_r($model->predict($patient));
// Class probabilities are useful for threshold-based decisions.
echo PHP_EOL . 'Class probabilities:' . PHP_EOL;
print_r($model->proba($patient));
Result:
Memory: 1.191 Mb
Time running: 0.025 sec.
Predicted label (low_risk or high_risk):
Array
(
[0] => high_risk
)
Class probabilities:
Array
(
[0] => Array
(
[low_risk] => 6.3793414994961E-13
[high_risk] => 0.99999999999936
)
)