Logistic regression
Case 6. Fraud vs normal transaction
Fraud detection is a task where mistakes cost money, sometimes a lot. Unlike in the previous cases, predicting the class is not enough here: the consequences of each kind of error matter as well. Missing a fraudulent transaction and blocking a legitimate one have very different costs. Logistic regression is often used in such tasks as a baseline risk-scoring tool.
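The cost asymmetry can be made concrete with a little arithmetic. The sketch below is an illustration only: the two cost figures and the function names `expectedCosts` and `costOptimalThreshold` are hypothetical, and it assumes a fixed cost per missed fraud and per wrongly blocked transaction. Under that assumption, blocking is cheaper in expectation whenever p × C_miss > (1 − p) × C_block, that is, when p > C_block / (C_block + C_miss).

```php
<?php

// Hypothetical costs, chosen only for illustration.
const COST_MISSED_FRAUD = 500.0; // loss when a fraudulent transaction is let through
const COST_FALSE_BLOCK  = 20.0;  // loss when a legitimate transaction is blocked

/**
 * Expected cost of each action for a transaction with fraud probability $p.
 */
function expectedCosts(float $p): array
{
    return [
        'allow' => $p * COST_MISSED_FRAUD,        // paid only if it really was fraud
        'block' => (1.0 - $p) * COST_FALSE_BLOCK, // paid only if it was legitimate
    ];
}

/**
 * Fraud probability above which blocking is cheaper than allowing.
 */
function costOptimalThreshold(): float
{
    return COST_FALSE_BLOCK / (COST_FALSE_BLOCK + COST_MISSED_FRAUD);
}

$costs  = expectedCosts(0.10);
$action = $costs['block'] < $costs['allow'] ? 'block' : 'allow';

printf("allow: %.2f, block: %.2f, action: %s\n", $costs['allow'], $costs['block'], $action);
printf("cost-optimal threshold: %.4f\n", costOptimalThreshold());
```

With these made-up costs, even a 10% fraud probability is enough to justify blocking, because a missed fraud is 25 times more expensive than a false alarm. This is why a default threshold of 0.5 is rarely the right choice for fraud.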
Case Goal
Determine whether a transaction is fraudulent and estimate the probability of fraud.
The model should:
1) Assess transaction risk
2) Provide a probability that can be used directly as a risk score
3) Allow you to control system sensitivity via a threshold
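The three requirements above can be sketched in plain PHP without any library. This is a minimal illustration, not the library's implementation: the linear scores are invented, and `sigmoid` and `decide` are hypothetical helper names. A logistic model turns a linear score into a probability with the sigmoid function, and the threshold then decides how many transactions get flagged.

```php
<?php

/**
 * Logistic (sigmoid) function: maps any real-valued score into (0, 1).
 */
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

/**
 * Turn a fraud probability into a decision for a given threshold.
 */
function decide(float $probability, float $threshold): string
{
    return $probability >= $threshold ? 'fraud' : 'normal';
}

// Hypothetical linear scores (weights * features + bias) for three transactions.
$scores = ['tx1' => -3.0, 'tx2' => 0.0, 'tx3' => 2.0];

// The same probabilities lead to different decisions as the threshold moves.
foreach ([0.3, 0.5, 0.9] as $threshold) {
    foreach ($scores as $id => $z) {
        $p = sigmoid($z);
        printf("threshold %.1f  %s  p=%.3f  %s\n", $threshold, $id, $p, decide($p, $threshold));
    }
}
```

Lowering the threshold makes the system more sensitive (more fraud caught, more false alarms); raising it makes the system more conservative. The model itself does not change, only the decision rule applied to its output.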
Example of use:
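<?php
// Load Composer's autoloader. The path is an assumption: it presumes Rubix ML
// was installed with `composer require rubix/ml` next to this script.
require_once __DIR__ . '/vendor/autoload.php';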
use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;
// Each sample is a transaction described by 3 numeric features (toy example).
// Example interpretation:
// - feature #1: transaction amount
// - feature #2: is international (0/1)
// - feature #3: number of past transactions in a short window
$samples = [
    [50, 1, 2],
    [5000, 0, 15],
    [200, 1, 1],
    [7000, 0, 20],
];
// Rubix classifiers require categorical labels.
$labels = ['normal', 'fraud', 'normal', 'fraud'];
$dataset = new Labeled($samples, $labels);
// Train a baseline logistic regression model to estimate fraud risk.
$model = new LogisticRegression();
$model->train($dataset);