Linear Regression with PHP
Multiple Linear Regression with Rubix
Multiple linear regression models one dependent variable as a linear function of two or more independent variables, for example predicting house prices from factors such as size, number of rooms, and location. The equation for multiple linear regression is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$, where $\beta_0$ is the intercept and $\beta_1, \dots, \beta_n$ are the coefficients of the independent variables $x_1, \dots, x_n$.
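To make the equation concrete, here is a minimal sketch that evaluates it for one house. The coefficient values are made up for illustration and are not fitted to the dataset, and `predictLinear` is a hypothetical helper, not part of Rubix ML.

```php
<?php
declare(strict_types=1);

/**
 * Evaluate y = b0 + b1*x1 + b2*x2 + ... + bn*xn for one sample.
 *
 * @param float   $intercept    beta_0
 * @param float[] $coefficients beta_1 .. beta_n
 * @param array   $features     x_1 .. x_n, in the same order as $coefficients
 */
function predictLinear(float $intercept, array $coefficients, array $features): float
{
    if (count($coefficients) !== count($features)) {
        throw new InvalidArgumentException('Need exactly one coefficient per feature.');
    }
    $y = $intercept;
    foreach ($coefficients as $i => $beta) {
        $y += $beta * $features[$i]; // add beta_i * x_i
    }
    return $y;
}

// Hypothetical coefficients for (rooms, size, distance); NOT fitted to the dataset
$beta0 = 50000.0;
$betas = [20000.0, 200.0, -10000.0];

// A house with 3 rooms, size 1200 and distance 4
echo predictLinear($beta0, $betas, [3, 1200, 4]), "\n"; // prints 310000
```

Each coefficient is the change in the predicted price for a one-unit change in its feature while the other features stay fixed.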
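Dataset
The example below reads this data from data/houses2.csv. The first three columns (rooms, size, distance) are the independent variables and the last column, price, is the dependent variable.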
rooms,size,distance,price
1,500,8,157250
2,750,9,198500
2,700,7,207825
2,850,6,251175
2,900,5,276250
3,1150,6,327250
3,1200,4,374850
3,1350,5,368750
3,1250,5,382500
3,1300,3,425000
4,1500,4,442750
4,1600,3,493000
4,1650,2,531250
4,1700,3,535500
4,1800,3,545000
5,1900,2,612000
5,2000,1,657750
5,2050,2,663000
5,2150,2,678250
5,2100,1,701500
6,2300,2,742750
6,2400,1,786250
6,2450,2,807500
6,2500,1,850000
6,2550,1,862500
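The example below uses Rubix ML's `Ridge` regressor. To show what a ridge model computes on this dataset, here is a minimal plain-PHP sketch (no Rubix dependency) of the textbook closed-form solution $(X^\top X + \lambda D)\,\beta = X^\top y$, where $X$ holds the samples with a leading column of ones, $y$ holds the prices, and $D$ is the identity matrix with a zero in the intercept position. The helper names `solveLinearSystem` and `fitRidge` are hypothetical, and leaving the intercept unpenalized is an assumption of this sketch; Rubix's internal implementation may differ in details, so its numbers need not match exactly.

```php
<?php
declare(strict_types=1);

/** Solve the square system A x = b by Gaussian elimination with partial pivoting. */
function solveLinearSystem(array $a, array $b): array
{
    $n = count($b);
    for ($col = 0; $col < $n; $col++) {
        // Choose the row with the largest absolute pivot for numerical stability
        $pivot = $col;
        for ($row = $col + 1; $row < $n; $row++) {
            if (abs($a[$row][$col]) > abs($a[$pivot][$col])) {
                $pivot = $row;
            }
        }
        if (abs($a[$pivot][$col]) < 1e-12) {
            throw new RuntimeException('Matrix is singular or nearly singular.');
        }
        [$a[$col], $a[$pivot]] = [$a[$pivot], $a[$col]];
        [$b[$col], $b[$pivot]] = [$b[$pivot], $b[$col]];
        // Zero out the entries below the pivot
        for ($row = $col + 1; $row < $n; $row++) {
            $factor = $a[$row][$col] / $a[$col][$col];
            for ($k = $col; $k < $n; $k++) {
                $a[$row][$k] -= $factor * $a[$col][$k];
            }
            $b[$row] -= $factor * $b[$col];
        }
    }
    // Back substitution on the upper-triangular system
    $x = array_fill(0, $n, 0.0);
    for ($row = $n - 1; $row >= 0; $row--) {
        $sum = $b[$row];
        for ($k = $row + 1; $k < $n; $k++) {
            $sum -= $a[$row][$k] * $x[$k];
        }
        $x[$row] = $sum / $a[$row][$row];
    }
    return $x;
}

/** Hypothetical ridge fit; returns [beta_0, beta_1, ..., beta_n]. */
function fitRidge(array $samples, array $labels, float $lambda): array
{
    $p = count($samples[0]) + 1; // one extra column for the intercept
    $xtx = array_fill(0, $p, array_fill(0, $p, 0.0));
    $xty = array_fill(0, $p, 0.0);
    foreach ($samples as $i => $sample) {
        $row = array_merge([1.0], array_map('floatval', $sample));
        for ($j = 0; $j < $p; $j++) {
            $xty[$j] += $row[$j] * $labels[$i];       // accumulate X^T y
            for ($k = 0; $k < $p; $k++) {
                $xtx[$j][$k] += $row[$j] * $row[$k];  // accumulate X^T X
            }
        }
    }
    for ($j = 1; $j < $p; $j++) {
        $xtx[$j][$j] += $lambda; // start at 1 so the intercept is not penalized
    }
    return solveLinearSystem($xtx, $xty);
}

// The dataset above: [rooms, size, distance] => price
$samples = [
    [1, 500, 8], [2, 750, 9], [2, 700, 7], [2, 850, 6], [2, 900, 5],
    [3, 1150, 6], [3, 1200, 4], [3, 1350, 5], [3, 1250, 5], [3, 1300, 3],
    [4, 1500, 4], [4, 1600, 3], [4, 1650, 2], [4, 1700, 3], [4, 1800, 3],
    [5, 1900, 2], [5, 2000, 1], [5, 2050, 2], [5, 2150, 2], [5, 2100, 1],
    [6, 2300, 2], [6, 2400, 1], [6, 2450, 2], [6, 2500, 1], [6, 2550, 1],
];
$labels = [
    157250, 198500, 207825, 251175, 276250, 327250, 374850, 368750, 382500,
    425000, 442750, 493000, 531250, 535500, 545000, 612000, 657750, 663000,
    678250, 701500, 742750, 786250, 807500, 850000, 862500,
];

$beta = fitRidge($samples, $labels, 1e-3);
printf("intercept=%.2f rooms=%.2f size=%.2f distance=%.2f\n", ...$beta);
// Predict the first new house from the Rubix example: 4 rooms, size 1800, distance 3
printf("House 1: \$%.2f\n", $beta[0] + $beta[1] * 4 + $beta[2] * 1800 + $beta[3] * 3);
```

Gaussian elimination is adequate for a 4×4 system like this one; with many features or badly scaled inputs, standardizing the features first or relying on a numerical library is more robust.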
Example of use:
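<?php
// Load Composer's autoloader (assumes Rubix ML was installed with Composer)
require __DIR__ . '/vendor/autoload.php';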
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Regressors\Ridge;
use Rubix\ML\CrossValidation\Metrics\MeanAbsoluteError;
use Rubix\ML\CrossValidation\Metrics\MeanSquaredError;
use Rubix\ML\Transformers\MissingDataImputer;
use Rubix\ML\Transformers\NumericStringConverter;
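// Load the raw data from CSV; `true` tells the extractor that the first row is a header.
// Labeled::fromIterator() treats the last column (price) as the label.
$dataset = Labeled::fromIterator(new CSV(__DIR__ . '/data/houses2.csv', true));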
// For PHP 8.2:
// the CSV extractor returns every value as a string, so cast samples and labels to float
$samples = array_map(fn($sample) => array_map('floatval', $sample), $dataset->samples());
$labels = array_map('floatval', $dataset->labels());
// Rebuild the dataset from the float values
$dataset = new Labeled($samples, $labels);
// For PHP 8.3 (use instead of the block above):
// convert numeric strings to their integer or float types and fill in missing values
//$dataset->apply(new NumericStringConverter())
//    ->apply(new MissingDataImputer())
//    ->transformLabels('intval');
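// Create and train a Ridge regression model.
// The constructor argument is the L2 penalty: larger values shrink the coefficients
// more to help prevent overfitting; 1e-3 is a very light penalty.
$estimator = new Ridge(1e-3);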
$estimator->train($dataset);
// Create new samples for prediction
// Important: each sample must be its own array within the main array,
// with the features in the same order as the CSV columns: rooms, size, distance
$newSamples = [
    [4, 1800, 3], // First house
    [2, 1200, 8], // Second house
];
// Create Unlabeled dataset for prediction
$newDataset = new Unlabeled($newSamples);
// Make predictions
$predictions = $estimator->predict($newDataset);
// Print predictions
echo "Predictions for new houses:\n";
echo "--------------------------\n";
foreach ($predictions as $index => $prediction) {
    echo sprintf(
        "House %d: \$%s\n",
        $index + 1,
        number_format($prediction, 2)
    );
}
// Actual prices of the two new houses (same order as $newSamples), used for the error metrics
$actualValues = [450000, 280000];
echo "\n\nMetrics:";
echo "\n-------";
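// Rubix ML metrics follow a "higher is better" convention, so error metrics
// such as MSE and MAE are returned as negative numbers; abs() restores the magnitude.
$mseMetric = new MeanSquaredError();
$mse = abs($mseMetric->score($predictions, $actualValues));
// MSE is in squared dollars, so it gets no "$" sign; RMSE is back in dollars
echo "\nMean Squared Error: " . number_format($mse, 2);
echo "\nRoot Mean Squared Error: \$" . number_format(sqrt($mse), 2);
$maeMetric = new MeanAbsoluteError();
$mae = abs($maeMetric->score($predictions, $actualValues));
echo "\nMean Absolute Error: \$" . number_format($mae, 2) . "\n";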
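The three metrics can also be computed by hand: $\text{MSE} = \frac{1}{n}\sum_i (y_i - \hat{y}_i)^2$, $\text{RMSE} = \sqrt{\text{MSE}}$, and $\text{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$. The minimal sketch below uses made-up predictions so that it runs without Rubix ML; `meanSquaredError` and `meanAbsoluteError` are hypothetical helper names. Computed this way the errors are positive, whereas Rubix ML reports them as negative scores, which is why the example above wraps its scores in `abs()`.

```php
<?php
declare(strict_types=1);

/** Mean squared error: average of squared residuals (units: dollars squared). */
function meanSquaredError(array $predictions, array $actual): float
{
    $n = count($actual);
    if ($n === 0 || $n !== count($predictions)) {
        throw new InvalidArgumentException('Arrays must be non-empty and of equal length.');
    }
    $sum = 0.0;
    foreach ($actual as $i => $y) {
        $sum += ($y - $predictions[$i]) ** 2;
    }
    return $sum / $n;
}

/** Mean absolute error: average of absolute residuals (units: dollars). */
function meanAbsoluteError(array $predictions, array $actual): float
{
    $n = count($actual);
    if ($n === 0 || $n !== count($predictions)) {
        throw new InvalidArgumentException('Arrays must be non-empty and of equal length.');
    }
    $sum = 0.0;
    foreach ($actual as $i => $y) {
        $sum += abs($y - $predictions[$i]);
    }
    return $sum / $n;
}

// Hypothetical predictions for the two houses, for illustration only
$predictions = [440000.0, 300000.0];
$actual = [450000, 280000];

$mse = meanSquaredError($predictions, $actual);
printf("MSE:  %.2f\n", $mse);                                     // MSE:  250000000.00
printf("RMSE: %.2f\n", sqrt($mse));                               // RMSE: 15811.39
printf("MAE:  %.2f\n", meanAbsoluteError($predictions, $actual)); // MAE:  15000.00
```

RMSE and MAE are in the same units as the price, which makes them easier to read than MSE; RMSE weights the larger of the two errors more heavily, so it comes out above MAE here.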