Logistic regression
Case 6. Fraud vs normal transaction
Fraud detection is a task where mistakes cost money, sometimes a lot. Unlike the previous cases, it is not enough to predict the class; you also need to understand the consequences of each kind of error. Missing a fraudulent transaction and blocking a legitimate one have very different costs. Logistic regression is often used in such tasks as a baseline risk-scoring tool.
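To make the cost asymmetry concrete, here is a minimal sketch in plain PHP. The cost figures and the helper names (`expectedCosts`, `costOptimalThreshold`) are illustrative assumptions, not values from this case. If p is the estimated probability of fraud, allowing the transaction has an expected cost of p times the cost of a missed fraud, and blocking it has an expected cost of (1 - p) times the cost of a false block. Blocking becomes the cheaper action once p exceeds costFalseBlock / (costFalseBlock + costMissedFraud).

```php
<?php
// Hypothetical costs, chosen only for illustration.
const COST_MISSED_FRAUD = 500.0; // loss when a fraudulent transaction is allowed
const COST_FALSE_BLOCK = 20.0;   // loss when a legitimate transaction is blocked

/**
 * Expected cost of each action for a transaction with fraud probability $p.
 */
function expectedCosts(float $p): array
{
    return [
        'allow' => $p * COST_MISSED_FRAUD,
        'block' => (1.0 - $p) * COST_FALSE_BLOCK,
    ];
}

/**
 * Probability above which blocking has the lower expected cost.
 */
function costOptimalThreshold(float $costMissedFraud, float $costFalseBlock): float
{
    return $costFalseBlock / ($costFalseBlock + $costMissedFraud);
}

$costs = expectedCosts(0.10);
$threshold = costOptimalThreshold(COST_MISSED_FRAUD, COST_FALSE_BLOCK);

printf("Expected cost if allowed: %.2f\n", $costs['allow']);
printf("Expected cost if blocked: %.2f\n", $costs['block']);
printf("Cost-optimal threshold:   %.4f\n", $threshold);
```

With these made-up costs, a 10% fraud probability already makes blocking the cheaper action (18 versus 50), and the break-even threshold is about 0.038. Real costs would have to come from the business, not from the model.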
Case Goal
Determine whether a transaction is fraudulent and estimate the probability of fraud.
The model should:
1) Assess transaction risk
2) Return a fraud probability, not just a class label
3) Allow you to control system sensitivity via a threshold
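How the threshold in point 3 changes system behaviour can be sketched without any library. The scored transactions below are invented for illustration (the probabilities do not come from a trained model), and `decisionsAt` is a hypothetical helper, not part of Rubix ML.

```php
<?php
// Made-up (probability of fraud, true label) pairs, for illustration only.
$scored = [
    [0.05, 'normal'],
    [0.20, 'normal'],
    [0.40, 'fraud'],
    [0.65, 'normal'],
    [0.80, 'fraud'],
    [0.95, 'fraud'],
];

/**
 * Count the outcomes of blocking every transaction whose score is >= $threshold.
 */
function decisionsAt(array $scored, float $threshold): array
{
    $counts = ['blocked_fraud' => 0, 'blocked_normal' => 0, 'missed_fraud' => 0];

    foreach ($scored as [$probability, $label]) {
        $blocked = $probability >= $threshold;

        if ($blocked && $label === 'fraud') {
            $counts['blocked_fraud']++;
        } elseif ($blocked) {
            $counts['blocked_normal']++;
        } elseif ($label === 'fraud') {
            $counts['missed_fraud']++;
        }
    }

    return $counts;
}

foreach ([0.3, 0.5, 0.7] as $threshold) {
    $c = decisionsAt($scored, $threshold);
    printf(
        "threshold %.1f: fraud blocked=%d, legitimate blocked=%d, fraud missed=%d\n",
        $threshold,
        $c['blocked_fraud'],
        $c['blocked_normal'],
        $c['missed_fraud']
    );
}
```

On this toy data, raising the threshold from 0.3 to 0.7 removes the one false block but lets one fraudulent transaction through, which is the trade-off the threshold is meant to control.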
Example of use
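<?php
// Load Composer's autoloader (assumes the default vendor/ location; adjust the path to your project).
require_once __DIR__ . '/vendor/autoload.php';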
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
// Each sample is a transaction described by 3 numeric features (toy example).
// Example interpretation:
// - feature #1: transaction amount
// - feature #2: is international (0/1)
// - feature #3: number of past transactions in a short window
$samples = [
    [50, 1, 2],
    [5000, 0, 15],
    [200, 1, 1],
    [7000, 0, 20],
];
// Rubix classifiers require categorical labels.
$labels = ['normal', 'fraud', 'normal', 'fraud'];
$dataset = new Labeled($samples, $labels);
// Train a baseline logistic regression model to estimate fraud risk.
$model = new LogisticRegression();
$model->train($dataset);
// A new transaction that we want to score.
$transaction = new Unlabeled([[3000, 0, 10]]);
$prediction = $model->predict($transaction);
echo 'Predicted label (normal or fraud): ' . PHP_EOL;
print_r($prediction);
// Get class probabilities to make threshold-based decisions.
$probas = $model->proba($transaction);
$probabilityOfFraud = $probas[0]['fraud'] ?? null;
echo PHP_EOL . 'Probability of fraud (class=fraud): ';
print_r($probabilityOfFraud);
echo PHP_EOL;
// Adjusting the threshold lets you control sensitivity:
// higher threshold -> fewer blocks (but more fraud may pass), lower -> more blocks (but more false positives).
$threshold = 0.7;
$fraud = $probabilityOfFraud !== null && $probabilityOfFraud >= $threshold;
echo 'Threshold: ' . $threshold . PHP_EOL;
echo 'Decision: ' . ($fraud ? 'BLOCK' : 'ALLOW') . PHP_EOL;
Result:
Memory: 1.19 Mb
Time running: 0.02 sec.
Predicted label (normal or fraud):
Array
(
    [0] => normal
)
Probability of fraud (class=fraud): 0
Threshold: 0.7
Decision: ALLOW
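The output reports a fraud probability of exactly 0. The source does not explain why. One possible explanation, offered here only as an assumption, is that unscaled features (amounts in the thousands) push the model's linear score far from zero, so the logistic function saturates. The sketch below shows that saturation in plain PHP; `sigmoid` is a helper written for this illustration, not a Rubix ML call.

```php
<?php
/**
 * Logistic (sigmoid) function: maps a linear score z to a value between 0 and 1.
 */
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

// Scores near zero give intermediate probabilities...
printf("sigmoid(0)  = %.4f\n", sigmoid(0.0));
printf("sigmoid(-5) = %.4f\n", sigmoid(-5.0));

// ...while a very negative score collapses to exactly 0 in floating point,
// because exp(800) overflows to INF and 1 / INF is 0.
var_dump(sigmoid(-800.0) === 0.0);
```

If saturation is indeed what happens here, a probability of 0 or 1 tells you little about relative risk. Standardizing the features before training is a common remedy, though that is not confirmed for this case.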