Linear regression as a basic model
Case 1. Apartment valuation based on parameters
Implementation with RubixML
Now we will solve the same problem the way it is usually done in real projects: with a linear regression model fitted by the least squares method. The library solves the optimization problem itself and finds the weights analytically.
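To make "finding the weights analytically" more concrete, here is a minimal sketch of closed-form least squares for a single feature (area only). It is an illustration under simplifying assumptions, not RubixML's implementation: the helper name fitSimpleLeastSquares is hypothetical, and the library handles several features at once and adds a small L2 penalty.

```php
<?php
// Illustrative sketch: ordinary least squares for ONE feature (area -> price),
// solved in closed form. fitSimpleLeastSquares() is a hypothetical helper,
// not part of RubixML.

/**
 * Fit y ≈ slope * x + intercept by minimizing the sum of squared errors.
 * Assumes the x values are not all identical (otherwise $varX would be 0).
 *
 * @param array<int, int|float> $x feature values
 * @param array<int, int|float> $y target values
 * @return array{0: float, 1: float} [slope, intercept]
 */
function fitSimpleLeastSquares(array $x, array $y): array
{
    $n = count($x);
    $meanX = array_sum($x) / $n;
    $meanY = array_sum($y) / $n;

    $covXY = 0.0; // sum of (x - meanX) * (y - meanY)
    $varX = 0.0;  // sum of (x - meanX)^2
    foreach ($x as $i => $xi) {
        $covXY += ($xi - $meanX) * ($y[$i] - $meanY);
        $varX += ($xi - $meanX) ** 2;
    }

    $slope = $covXY / $varX;
    $intercept = $meanY - $slope * $meanX;

    return [$slope, $intercept];
}

// Area and price columns taken from the case data
$areas = [50, 70, 40];
$prices = [66_000, 95_000, 45_000];

[$slope, $intercept] = fitSimpleLeastSquares($areas, $prices);

echo 'Slope: ' . number_format($slope, 2, '.', '') . PHP_EOL;
echo 'Intercept: ' . number_format($intercept, 2, '.', '') . PHP_EOL;
```

No iterations or learning rate are involved: the weights come directly from sums over the data. With several features the same idea is expressed with matrices, which is the part the library takes care of.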
Example of use
<?php
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Regressors\Ridge;
// Data: [area, floor, distance to city center, building age]
$samples = [
    [50, 3, 5, 10],
    [70, 10, 3, 5],
    [40, 2, 8, 30],
];
$targets = [
    66_000,
    95_000,
    45_000,
];
// Create labeled dataset
$dataset = new Labeled($samples, $targets);
// Create linear regression model (Ridge)
// With a very small L2 penalty (alpha = 1e-6), Ridge is almost equivalent to ordinary least squares regression
$regression = new Ridge(1e-6);
// Train the model
$regression->train($dataset);
// Make a prediction for a new apartment
// Same feature order as in training: [area, floor, distance to city center, building age]
$newApartment = [60, 5, 4, 12];
// Ridge::predict expects a Dataset and returns an array of predictions
$dataset = new Unlabeled([$newApartment]);
$predictions = $regression->predict($dataset);
$predictedPrice = $predictions[0];
echo 'Apartment valuation: $' . number_format($predictedPrice) . PHP_EOL . PHP_EOL;
$weights = $regression->coefficients();
$bias = $regression->bias();
echo 'Weights: ' . implode(', ', array_map(fn ($weight) => number_format($weight, 2, '.', ''), $weights)) . PHP_EOL;
echo 'Bias: ' . number_format($bias, 2, '.', '') . PHP_EOL;
Result:
Memory: 0.993 Mb
Time running: 0.01 sec.
Apartment valuation: $78,037
Weights: 1192.98, 401.07, -132.48, -413.58
Bias: 9945.90
The block above shows the output of the script: the predicted price, together with the learned weights and bias, for an apartment with the following features (in the same order as the training data):
- Area: 60 m²
- Floor: 5
- Distance to city center: 4
- Building age: 12
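The prediction can be checked by hand, since a linear model is just a weighted sum of the features plus the bias. The sketch below copies the rounded weights and bias from the printed output rather than reading them from the trained model, so the result may differ from the library's value by a small rounding error.

```php
<?php
// Values copied from the printed output above (rounded to two decimals)
$weights = [1192.98, 401.07, -132.48, -413.58];
$bias = 9945.90;

// [area, floor, distance to city center, building age]
$newApartment = [60, 5, 4, 12];

// prediction = bias + sum(weight_i * feature_i)
$prediction = $bias;
foreach ($weights as $i => $weight) {
    $prediction += $weight * $newApartment[$i];
}

echo 'Manual prediction: $' . number_format($prediction) . PHP_EOL;
```

The sum comes to about 78,037.17, which matches the valuation of $78,037 reported by the model.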