A simple TF–IDF example in PHP


Below is a minimal TF–IDF implementation in pure PHP. We take three short documents (about a cat and a dog), build a vocabulary, compute TF and IDF, and then produce TF–IDF weights for each term in each document.

 
<?php

// Source documents for TF-IDF calculation.
$documents = [
    'The cat eats fish',
    'The cat loves fish',
    'The dog eats canned meat',
];

// code.php is expected to define $vocab and $tfidfVectors from $documents.
include 'code.php';

echo 'Vocabulary: ' . implode(', ', $vocab) . PHP_EOL . PHP_EOL;

foreach ($tfidfVectors as $i => $vector) {
    echo 'Document ' . ($i + 1) . ':' . PHP_EOL;

    foreach ($vector as $word => $value) {
        echo "  $word => " . round($value, 3) . PHP_EOL;
    }

    // Print a blank line between documents, but not after the last one.
    if ($i < 2) {
        echo PHP_EOL;
    }
}
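
The script above includes code.php, whose contents are not shown here. As a rough sketch only, the following is one way such a file could produce $vocab and $tfidfVectors. The function name computeTfIdf is hypothetical, and the choices made (whitespace tokenization, no lowercasing, TF as count divided by document length, IDF as the natural log of N / df) are assumptions chosen to be consistent with the printed result, not the original implementation.

```php
<?php

// Hypothetical helper, not the original code.php.
function computeTfIdf(array $documents): array
{
    // Split each document on whitespace. Case is kept as-is (assumption).
    $tokenized = [];
    foreach ($documents as $i => $doc) {
        $tokenized[$i] = preg_split('/\s+/', trim($doc), -1, PREG_SPLIT_NO_EMPTY);
    }

    // Document frequency: in how many documents each term occurs.
    // Keys appear in first-seen order, which also fixes the vocabulary order.
    $df = [];
    foreach ($tokenized as $tokens) {
        foreach (array_unique($tokens) as $term) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }
    $vocab = array_keys($df);

    // IDF = ln(N / df), with no smoothing (assumption).
    $n = count($documents);
    $idf = [];
    foreach ($df as $term => $count) {
        $idf[$term] = log($n / $count);
    }

    // TF = occurrences in the document / document length; weight = TF * IDF.
    $vectors = [];
    foreach ($tokenized as $i => $tokens) {
        $vectors[$i] = [];
        $length = count($tokens);
        foreach (array_count_values($tokens) as $term => $count) {
            $vectors[$i][$term] = ($count / $length) * $idf[$term];
        }
    }

    return [$vocab, $vectors];
}

[$vocab, $tfidfVectors] = computeTfIdf([
    'The cat eats fish',
    'The cat loves fish',
    'The dog eats canned meat',
]);
```

Because "The" occurs in every document, its IDF is ln(3/3) = 0, so its weight is 0 in every vector. A real pipeline would often lowercase tokens and use a smoothed IDF, but neither is assumed here.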

Documents:

The cat eats fish
The cat loves fish
The dog eats canned meat
Result:
Vocabulary: The, cat, eats, fish, loves, dog, canned, meat

Document 1:
  The => 0
  cat => 0.101
  eats => 0.101
  fish => 0.101

Document 2:
  The => 0
  cat => 0.101
  loves => 0.275
  fish => 0.101

Document 3:
  The => 0
  dog => 0.22
  eats => 0.081
  canned => 0.22
  meat => 0.22
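
The page does not state which TF and IDF variants it uses. The printed weights are consistent with relative term frequency and an unsmoothed natural-log IDF, so under that assumption a few of the values can be checked by hand, with N = 3 documents:

```latex
\mathrm{tf}(t,d) = \frac{\text{occurrences of } t \text{ in } d}{\text{number of words in } d},
\qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)},
\qquad
w(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)

w(\text{The}, d) = \mathrm{tf}(\text{The}, d)\cdot\ln\tfrac{3}{3} = 0 \quad \text{for every document } d

w(\text{cat}, d_1) = \tfrac{1}{4}\cdot\ln\tfrac{3}{2} \approx 0.25 \cdot 0.405 \approx 0.101

w(\text{loves}, d_2) = \tfrac{1}{4}\cdot\ln 3 \approx 0.25 \cdot 1.099 \approx 0.275

w(\text{eats}, d_3) = \tfrac{1}{5}\cdot\ln\tfrac{3}{2} \approx 0.2 \cdot 0.405 \approx 0.081
```

The word "eats" gets a lower weight in Document 3 than in Document 1 only because Document 3 has five words rather than four; its IDF is the same in both.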