Logistic regression

Case 6. Fraud vs normal transaction

Fraud detection is a task where mistakes cost money, sometimes a lot. Unlike the previous cases, it is important here not only to predict the class but also to understand the consequences of an error: missing a fraudulent transaction and blocking a legitimate one have very different costs. Logistic regression is often used in such tasks as a baseline risk-scoring tool.
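
The cost asymmetry can be made concrete with a small expected-cost calculation. The sketch below is illustrative only: the cost figures and the function names `expectedCosts` and `breakEvenThreshold` are assumptions introduced here, not part of the case. It compares the expected cost of allowing versus blocking a transaction with a given fraud probability, and derives the probability above which blocking becomes the cheaper action.

```php
<?php

declare(strict_types=1);

/**
 * Expected cost of each action for a transaction with fraud probability $p.
 * Allowing costs money only if the transaction is fraudulent;
 * blocking costs money only if it is legitimate.
 *
 * @return array{allow: float, block: float}
 */
function expectedCosts(float $p, float $costMissedFraud, float $costBlockedLegit): array
{
    return [
        'allow' => $p * $costMissedFraud,
        'block' => (1.0 - $p) * $costBlockedLegit,
    ];
}

/** Fraud probability at which allowing and blocking have the same expected cost. */
function breakEvenThreshold(float $costMissedFraud, float $costBlockedLegit): float
{
    return $costBlockedLegit / ($costBlockedLegit + $costMissedFraud);
}

// Hypothetical costs: a missed fraud loses 500, a wrongly blocked payment costs 5.
$costMissedFraud = 500.0;
$costBlockedLegit = 5.0;

$costs = expectedCosts(0.05, $costMissedFraud, $costBlockedLegit);
$threshold = breakEvenThreshold($costMissedFraud, $costBlockedLegit);

printf("Expected cost if allowed: %.2f\n", $costs['allow']);
printf("Expected cost if blocked: %.2f\n", $costs['block']);
printf("Break-even threshold: %.4f\n", $threshold);
```

With these made-up costs, blocking is already cheaper at a 5% fraud probability (25.00 versus 4.75), and the break-even threshold is roughly 0.01. Real costs would have to come from the business, and they would shift the threshold accordingly.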

Case Goal
Determine whether a transaction is fraudulent and estimate the probability of fraud.

The model should:
1) Assess transaction risk
2) Provide a usable probability
3) Allow you to control system sensitivity via a threshold
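
These three requirements map onto the standard form of logistic regression, sketched below. The symbols are notation introduced here for illustration: x is the feature vector of a transaction, w and b are the learned weights and bias, and \tau is the decision threshold chosen by whoever operates the system.

```latex
\[
\begin{aligned}
z &= w^\top x + b && \text{(risk score)} \\
P(\text{fraud} \mid x) &= \sigma(z) = \frac{1}{1 + e^{-z}} && \text{(probability)} \\
\hat{y} &=
  \begin{cases}
    \text{fraud}  & \text{if } P(\text{fraud} \mid x) \ge \tau \\
    \text{normal} & \text{otherwise}
  \end{cases} && \text{(thresholded decision)}
\end{aligned}
\]
```

Lowering \tau makes the system more sensitive (more transactions are flagged); raising it makes the system more permissive.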

 
<?php

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Each sample is a transaction described by 3 numeric features (toy example).
// Example interpretation:
// - feature #1: transaction amount
// - feature #2: is international (0/1)
// - feature #3: number of past transactions in a short window
$samples = [
    [50, 1, 2],
    [5000, 0, 15],
    [200, 1, 1],
    [7000, 0, 20],
];

// Rubix classifiers require categorical labels.
$labels = ['normal', 'fraud', 'normal', 'fraud'];
$dataset = new Labeled($samples, $labels);

// Train a baseline logistic regression model to estimate fraud risk.
$model = new LogisticRegression();
$model->train($dataset);

// A new transaction that we want to score.
$transaction = new Unlabeled([[3000, 0, 10]]);
$prediction = $model->predict($transaction);

echo 'Predicted label (normal or fraud): ' . PHP_EOL;
print_r($prediction);

// Get class probabilities to make threshold-based decisions.
$probas = $model->proba($transaction);
$probabilityOfFraud = $probas[0]['fraud'] ?? null;

echo PHP_EOL . 'Probability of fraud (class=fraud): ';
print_r($probabilityOfFraud);
echo PHP_EOL;

// Adjusting the threshold lets you control sensitivity:
// higher threshold -> fewer blocks (but more fraud may pass), lower -> more blocks (but more false positives).
$threshold = 0.7;
$fraud = $probabilityOfFraud !== null && $probabilityOfFraud >= $threshold;

echo 'Threshold: ' . $threshold . PHP_EOL;
echo 'Decision: ' . ($fraud ? 'BLOCK' : 'ALLOW') . PHP_EOL;

Result:
Memory: 1.19 MB, running time: 0.02 sec.
Predicted label (normal or fraud): 
Array
(
    [0] => normal
)

Probability of fraud (class=fraud): 0
Threshold: 0.7
Decision: ALLOW
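
The effect of the threshold is easier to see on more than one transaction. The following sketch uses made-up fraud probabilities and labels (they are not output of the model above) and a hypothetical helper, `countErrors`, to count for several thresholds how many legitimate transactions would be blocked and how many fraudulent ones would pass.

```php
<?php

declare(strict_types=1);

/**
 * Count both error types at a given threshold.
 * A transaction is blocked when its fraud probability is >= $threshold.
 *
 * @param array<int, array{p: float, label: string}> $scored
 * @return array{blockedLegit: int, missedFraud: int}
 */
function countErrors(array $scored, float $threshold): array
{
    $blockedLegit = 0;
    $missedFraud = 0;

    foreach ($scored as $row) {
        $blocked = $row['p'] >= $threshold;

        if ($blocked && $row['label'] === 'normal') {
            $blockedLegit++; // false positive: legitimate payment blocked
        } elseif (!$blocked && $row['label'] === 'fraud') {
            $missedFraud++;  // false negative: fraud allowed through
        }
    }

    return ['blockedLegit' => $blockedLegit, 'missedFraud' => $missedFraud];
}

// Made-up scores for illustration only.
$scored = [
    ['p' => 0.05, 'label' => 'normal'],
    ['p' => 0.20, 'label' => 'normal'],
    ['p' => 0.45, 'label' => 'normal'],
    ['p' => 0.35, 'label' => 'fraud'],
    ['p' => 0.65, 'label' => 'fraud'],
    ['p' => 0.90, 'label' => 'fraud'],
];

foreach ([0.3, 0.5, 0.7] as $threshold) {
    $errors = countErrors($scored, $threshold);
    printf(
        "threshold %.1f: blocked legitimate = %d, missed fraud = %d\n",
        $threshold,
        $errors['blockedLegit'],
        $errors['missedFraud']
    );
}
```

On this toy data a threshold of 0.3 blocks one legitimate transaction and misses no fraud, 0.5 blocks no legitimate transactions and misses one fraud, and 0.7 blocks no legitimate transactions but lets two fraudulent ones through.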