Logistic regression

Case 6. Fraud vs normal transaction

Fraud detection is a task where mistakes cost money, sometimes a lot of it. Unlike the previous cases, here it is important not only to predict the class but also to understand the consequences of an error: missing a fraudulent transaction and blocking a legitimate one have very different costs. Logistic regression is often used in such tasks as a baseline risk-scoring tool.
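One way to make "very different costs" concrete: if blocking a legitimate transaction costs C_FP and missing a fraud costs C_FN, then for a transaction with fraud probability p, blocking is cheaper in expectation when p * C_FN > (1 - p) * C_FP, that is, when p > C_FP / (C_FP + C_FN). Below is a minimal PHP sketch of that arithmetic. The helpers `expectedCosts()` and `breakEvenThreshold()` are hypothetical (not part of Rubix ML), and the cost figures are made up for illustration.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: expected cost of each action for a transaction
 * with fraud probability $p.
 * $costMissedFraud: loss if a fraudulent transaction is allowed (false negative).
 * $costFalseBlock:  loss if a legitimate transaction is blocked (false positive).
 */
function expectedCosts(float $p, float $costMissedFraud, float $costFalseBlock): array
{
    return [
        'allow' => $p * $costMissedFraud,        // we lose money only if it was fraud
        'block' => (1.0 - $p) * $costFalseBlock, // we lose money only if it was legitimate
    ];
}

/**
 * Hypothetical helper: probability above which blocking is cheaper.
 * p * C_FN > (1 - p) * C_FP  <=>  p > C_FP / (C_FP + C_FN)
 */
function breakEvenThreshold(float $costMissedFraud, float $costFalseBlock): float
{
    return $costFalseBlock / ($costFalseBlock + $costMissedFraud);
}

// Illustrative costs only: a missed fraud loses 490, a false block costs 10.
$threshold = breakEvenThreshold(490.0, 10.0);
echo $threshold, PHP_EOL; // 0.02

// At p = 0.05: allow costs 0.05 * 490 = 24.5, block costs 0.95 * 10 = 9.5,
// so blocking is already the cheaper action.
$costs = expectedCosts(0.05, 490.0, 10.0);
```

With these assumed costs the break-even point is far below 0.5, which is why a default 0.5 cut-off is rarely the right choice when errors are this asymmetric.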

Case Goal
Determine whether a transaction is fraudulent and estimate the probability of fraud.

The model should:
1) Assess transaction risk
2) Provide a usable probability
3) Allow you to control system sensitivity via a threshold
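The "usable probability" in goal 2 comes from how logistic regression scores a sample: it computes a weighted sum of the features, z = b + w1*x1 + w2*x2 + w3*x3, and passes it through the logistic (sigmoid) function 1 / (1 + e^(-z)), which maps any score into (0, 1). The threshold from goal 3 is then applied to that probability. A minimal PHP sketch, assuming hand-picked weights and a hypothetical `fraudProbability()` helper; in a real model the weights are learned during training.

```php
<?php

declare(strict_types=1);

/** Logistic (sigmoid) function: maps any real-valued score into (0, 1). */
function sigmoid(float $z): float
{
    return 1.0 / (1.0 + exp(-$z));
}

/**
 * Hypothetical helper: fraud probability as sigmoid(bias + sum of weight_i * feature_i).
 * The weights passed in below are illustrative, not learned.
 */
function fraudProbability(array $features, array $weights, float $bias): float
{
    $z = $bias;
    foreach ($features as $i => $value) {
        $z += $weights[$i] * $value;
    }

    return sigmoid($z);
}

// Illustrative weights for [amount, is_international, recent_tx_count].
$weights = [0.001, 0.8, 0.15];
$bias = -4.0;

$p = fraudProbability([3000, 1, 6], $weights, $bias);
// z = -4 + 3.0 + 0.8 + 0.9 = 0.7, so p = sigmoid(0.7)
printf("fraud probability: %.2f\n", $p); // prints: fraud probability: 0.67
```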

Example of use:

 
<?php

use Rubix\ML\Classifiers\LogisticRegression;
use Rubix\ML\Datasets\Labeled;

// Each sample is a transaction described by 3 numeric features (toy example).
// Example interpretation:
// - feature #1: transaction amount
// - feature #2: is international (0/1)
// - feature #3: number of past transactions in a short window
$samples = [
    [50, 1, 2],
    [5000, 0, 15],
    [200, 1, 1],
    [7000, 0, 20],
];

// Rubix classifiers require categorical labels.
$labels = ['normal', 'fraud', 'normal', 'fraud'];
$dataset = new Labeled($samples, $labels);

// Train a baseline logistic regression model to estimate fraud risk.
$model = new LogisticRegression();
$model->train($dataset);
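After training, Rubix ML's `proba()` method returns one array per sample, keyed by class label (here `'normal'` and `'fraud'`), while `predict()` returns a single label per sample, which for a binary model effectively means a fixed 0.5 cut-off. To control sensitivity yourself, read the `'fraud'` probability and apply your own threshold. The sketch below is plain PHP so it runs without the library: the probabilities are made-up values in the shape `proba()` returns, and `flagByThreshold()` is a hypothetical helper.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: turn per-sample class probabilities
 * (one ['label' => probability] array per sample) into decisions.
 */
function flagByThreshold(array $probabilities, float $threshold, string $positive = 'fraud'): array
{
    $decisions = [];
    foreach ($probabilities as $dist) {
        $decisions[] = $dist[$positive] >= $threshold ? 'block' : 'allow';
    }

    return $decisions;
}

// In real code this would come from something like:
//   $probabilities = $model->proba(new Unlabeled($newSamples));
// The numbers below are made up to show the structure.
$probabilities = [
    ['normal' => 0.97, 'fraud' => 0.03],
    ['normal' => 0.60, 'fraud' => 0.40],
    ['normal' => 0.35, 'fraud' => 0.65],
    ['normal' => 0.10, 'fraud' => 0.90],
];

// Lower threshold = more sensitive system: more transactions get blocked.
foreach ([0.3, 0.5, 0.8] as $threshold) {
    $decisions = flagByThreshold($probabilities, $threshold);
    $blocked = count(array_keys($decisions, 'block', true));
    echo "threshold {$threshold}: {$blocked} blocked", PHP_EOL;
}
// threshold 0.3: 3 blocked
// threshold 0.5: 2 blocked
// threshold 0.8: 1 blocked
```

Keeping the threshold outside the model means sensitivity can be tuned (for example, toward a cost-based break-even point) without retraining.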