Logistic regression

Case 6. Fraud vs normal transaction

Fraud detection is a task where mistakes cost money, sometimes a lot. Unlike the previous cases, it is not enough to predict the class; you also need to understand the consequences of each kind of error. Missing a fraudulent transaction (a false negative) and blocking a legitimate one (a false positive) have very different costs. Logistic regression is often used in such tasks as a baseline risk-scoring tool.
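One way to make the cost asymmetry concrete is to derive a decision threshold from the two error costs. The sketch below assumes the model's output p is a reasonably calibrated fraud probability and that each error type has a fixed cost; the symbols C_FN, C_FP and t*, and the numbers in the worked example, are illustrative assumptions and do not come from a real system.

```latex
% p      : predicted probability that the transaction is fraudulent
% C_{FN} : assumed cost of letting a fraudulent transaction through
% C_{FP} : assumed cost of blocking a legitimate transaction
\begin{align*}
\text{expected cost of allowing} &= p \, C_{FN} \\
\text{expected cost of blocking} &= (1 - p) \, C_{FP} \\
\text{block when } p \, C_{FN} > (1 - p) \, C_{FP}
  \;&\Longleftrightarrow\; p > \frac{C_{FP}}{C_{FP} + C_{FN}} = t^{*}
\end{align*}
% Worked example with assumed costs C_{FP} = 5 and C_{FN} = 200:
%   t^{*} = 5 / (5 + 200) \approx 0.024
```

Under these assumptions, the more expensive a missed fraud is relative to a false block, the lower the threshold at which blocking becomes worthwhile.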

Case Goal
Determine whether a transaction is fraudulent and estimate the probability of fraud.

The model should:
1) Assess the risk of each transaction
2) Output a fraud probability that can be used directly in decisions
3) Let you control the system's sensitivity via a decision threshold
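The third requirement can be illustrated without any model at all. The following is a minimal sketch in plain PHP: the transaction IDs and probabilities are made up, and `blockedAt()` is a hypothetical helper introduced only for this illustration. It shows how the same scores lead to different numbers of blocked transactions as the threshold moves.

```php
<?php

// Hypothetical fraud probabilities, keyed by a made-up transaction ID.
// In practice these would come from a trained model's probability output.
$scores = [
    'tx-1' => 0.05,
    'tx-2' => 0.45,
    'tx-3' => 0.65,
    'tx-4' => 0.92,
];

/**
 * Return the IDs of transactions whose fraud probability
 * is greater than or equal to the given threshold.
 */
function blockedAt(array $scores, float $threshold): array
{
    // array_filter() keeps the original keys, so array_keys() yields the IDs.
    return array_keys(array_filter(
        $scores,
        fn (float $p): bool => $p >= $threshold
    ));
}

// Sweep a few thresholds to see how sensitivity changes.
foreach ([0.3, 0.5, 0.7, 0.9] as $threshold) {
    $blocked = blockedAt($scores, $threshold);
    printf(
        "threshold %.1f -> blocked %d: %s\n",
        $threshold,
        count($blocked),
        implode(', ', $blocked)
    );
}
```

A lower threshold blocks more transactions, catching more fraud at the price of more false positives; a higher one does the opposite. The right value depends on the error costs of the specific system.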

Example of use


                
<?php

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;

// Assumption: the library was installed with Composer and the autoloader
// sits next to this script; adjust the path for your project layout.
require_once __DIR__ . '/vendor/autoload.php';

// Each sample is a transaction described by 3 numeric features (toy example).
// Example interpretation:
// - feature #1: transaction amount
// - feature #2: is international (0/1)
// - feature #3: number of past transactions in a short window
$samples = [
    [50, 1, 2],
    [5000, 0, 15],
    [200, 1, 1],
    [7000, 0, 20],
];

// Rubix classifiers require categorical labels.
$labels = ['normal', 'fraud', 'normal', 'fraud'];
$dataset = new Labeled($samples, $labels);

// Train a baseline logistic regression model to estimate fraud risk.
$model = new LogisticRegression();
$model->train($dataset);

// A new transaction that we want to score.
$transaction = new Unlabeled([[3000, 0, 10]]);
$prediction = $model->predict($transaction);

echo 'Predicted label (normal or fraud): ' . PHP_EOL;
print_r($prediction);

// Get class probabilities to make threshold-based decisions.
$probas = $model->proba($transaction);
$probabilityOfFraud = $probas[0]['fraud'] ?? null;

echo PHP_EOL . 'Probability of fraud (class=fraud): ';
print_r($probabilityOfFraud);
echo PHP_EOL;

// Adjusting the threshold lets you control sensitivity:
// higher threshold -> fewer blocks (but more fraud may pass), lower -> more blocks (but more false positives).
$threshold = 0.7;
$fraud = $probabilityOfFraud !== null && $probabilityOfFraud >= $threshold;

echo 'Threshold: ' . $threshold . PHP_EOL;
echo 'Decision: ' . ($fraud ? 'BLOCK' : 'ALLOW') . PHP_EOL;

Result

Memory: 1.191 MB, running time: 0.019 sec.

Predicted label (normal or fraud): 
Array
(
    [0] => normal
)

Probability of fraud (class=fraud): 0
Threshold: 0.7
Decision: ALLOW