Linear Regression with PHP

Multiple Linear Regression with Rubix

Multiple linear regression models the relationship between one dependent variable and two or more independent variables. For example, a house price can be predicted from factors like size, number of rooms, and location. The equation for multiple linear regression is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
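Applied to the house data used in this section (columns rooms, size, distance, and price), the general equation can be written out as follows. This is only a restatement of the formula with the column names substituted for $x_1, x_2, x_3$; the coefficient values are whatever the model learns from the data.

$$
\text{price} = \beta_0 + \beta_1 \cdot \text{rooms} + \beta_2 \cdot \text{size} + \beta_3 \cdot \text{distance}
$$

Here $\beta_0$ is the intercept, and each remaining $\beta_i$ is the change in the predicted price for a one-unit increase in that feature while the other features are held fixed.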

rooms,size,distance,price
1,500,8,157250
2,750,9,198500
2,700,7,207825
2,850,6,251175
2,900,5,276250
3,1150,6,327250
3,1200,4,374850
3,1350,5,368750
3,1250,5,382500
3,1300,3,425000
4,1500,4,442750
4,1600,3,493000
4,1650,2,531250
4,1700,3,535500
4,1800,3,545000
5,1900,2,612000
5,2000,1,657750
5,2050,2,663000
5,2150,2,678250
5,2100,1,701500
6,2300,2,742750
6,2400,1,786250
6,2450,2,807500
6,2500,1,850000
6,2550,1,862500

Example of use:

 
<?php

use Rubix\ML\Datasets\Labeled;
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Extractors\CSV;
use Rubix\ML\Regressors\Ridge;
use Rubix\ML\CrossValidation\Metrics\MeanAbsoluteError;
use Rubix\ML\CrossValidation\Metrics\MeanSquaredError;
use Rubix\ML\Transformers\MissingDataImputer;
use Rubix\ML\Transformers\NumericStringConverter;

// Load the raw data from CSV (second argument: the file has a header row)
$dataset = Labeled::fromIterator(new CSV(dirname(__FILE__) . '/data/houses2.csv', true));

// For PHP 8.2
// Convert samples and labels to float
$samples = array_map(fn($sample) => array_map('floatval', $sample), $dataset->samples());
$labels = array_map('floatval', $dataset->labels());
// Create new dataset with float values
$dataset = new Labeled($samples, $labels);

// For PHP 8.3
// Convert samples and labels to their equivalent integer and floating point types
//$dataset->apply(new NumericStringConverter())
//    ->apply(new MissingDataImputer())
//    ->transformLabels('intval');

// Create and train Ridge regression model
// The constructor argument is the L2 penalty (regularization strength);
// larger values shrink the coefficients more to limit overfitting
$estimator = new Ridge(1e-3);
$estimator->train($dataset);

// Create new samples for prediction
// Important: Each sample must be its own array within the main array
$newSamples = [
    [4, 1800, 3],  // First house: rooms, size, distance
    [2, 1200, 8]   // Second house: rooms, size, distance
];

// Create Unlabeled dataset for prediction
$newDataset = new Unlabeled($newSamples);

// Make predictions
$predictions = $estimator->predict($newDataset);

// Print predictions
echo "Predictions for new houses:\n";
echo "--------------------------\n";
foreach ($predictions as $index => $prediction) {
    echo sprintf(
        "House %d: $%s\n",
        $index + 1,
        number_format($prediction, 2)
    );
}

// Calculate error metrics for actual values
$actualValues = [450000, 280000];

echo "\n\nMetrics:";
echo "\n-------";
$mseMetric = new MeanSquaredError();
$score = $mseMetric->score($predictions, $actualValues);
echo "\nMean Squared Error: $" . number_format(abs($score), 2);
echo "\nRoot Mean Squared Error: $" . number_format(sqrt(abs($score)), 2);

$maeMetric = new MeanAbsoluteError();
$score = $maeMetric->score($predictions, $actualValues);
echo "\nMean Absolute Error: $" . number_format(abs($score), 2);
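
To make the three reported metrics concrete, here is a minimal sketch in plain PHP (no Rubix classes) of how they can be computed by hand. The prediction values are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the output of the model above. The actual values are the ones used in the example.

```php
<?php

// Hypothetical predictions, used only to illustrate the arithmetic;
// they are NOT the output of the Ridge model in the example.
$predictions = [440000.0, 290000.0];
$actualValues = [450000.0, 280000.0];

// Per-sample errors: prediction minus actual value
$errors = array_map(fn($p, $a) => $p - $a, $predictions, $actualValues);
$n = count($errors);

// Mean squared error: average of the squared errors
$mse = array_sum(array_map(fn($e) => $e ** 2, $errors)) / $n;

// Root mean squared error: back in the units of the label
$rmse = sqrt($mse);

// Mean absolute error: average of the absolute errors
$mae = array_sum(array_map('abs', $errors)) / $n;

echo "Mean Squared Error: " . number_format($mse, 2) . "\n";
echo "Root Mean Squared Error: " . number_format($rmse, 2) . "\n";
echo "Mean Absolute Error: " . number_format($mae, 2) . "\n";
```

As far as I know, the Rubix error metrics return negated values so that a higher score always means a better model, which would explain the abs() calls in the example. Note also that the mean squared error is in squared units of the label, so the root mean squared error and the mean absolute error are the figures that can be read directly as dollar amounts.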