A simple TF–IDF example in PHP
Below is a minimal TF–IDF implementation in pure PHP. It takes three short documents (about a cat and a dog), builds a vocabulary, computes the term frequency (TF) and inverse document frequency (IDF) of each word, and then produces a TF–IDF weight for each term in each document.
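The usage example includes a file named code.php whose contents are not reproduced here. The following is only a sketch of what such a file might contain, not the original. It assumes whitespace tokenization with no lowercasing, TF as the word count divided by the document length, and IDF as the natural logarithm of N/df with no smoothing. These assumptions were chosen because they are consistent with the results listed further down. The function name computeTfidf is hypothetical.

```php
<?php
// Hypothetical sketch of code.php; the original file is not shown.
// Assumptions: whitespace tokenization, case preserved,
// TF = count / document length, IDF = ln(N / df), no smoothing.

function computeTfidf(array $documents): array
{
    // Split every document into words.
    $tokenized = [];
    foreach ($documents as $i => $doc) {
        $tokenized[$i] = preg_split('/\s+/', trim($doc), -1, PREG_SPLIT_NO_EMPTY);
    }

    // Document frequency: in how many documents each word occurs.
    // Keys are added in order of first appearance, which gives the vocabulary order.
    $docFreq = [];
    foreach ($tokenized as $words) {
        foreach (array_unique($words) as $word) {
            $docFreq[$word] = ($docFreq[$word] ?? 0) + 1;
        }
    }
    $vocab = array_keys($docFreq);

    // Inverse document frequency; log() is the natural logarithm in PHP.
    $docCount = count($documents);
    $idf = [];
    foreach ($docFreq as $word => $df) {
        $idf[$word] = log($docCount / $df);
    }

    // Sparse TF-IDF vector per document: only words present in that document.
    $tfidfVectors = [];
    foreach ($tokenized as $i => $words) {
        $total = count($words);
        $vector = [];
        foreach (array_count_values($words) as $word => $count) {
            $vector[$word] = ($count / $total) * $idf[$word];
        }
        $tfidfVectors[$i] = $vector;
    }

    return [$vocab, $tfidfVectors];
}

// When included after $documents is defined, expose the two variables
// that the usage example reads.
if (isset($documents)) {
    [$vocab, $tfidfVectors] = computeTfidf($documents);
}
```

Because a word that occurs in every document gets an IDF of ln(1) = 0, common words such as "The" receive a weight of zero under these assumptions.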
Example of use
<?php
// Source documents for TF-IDF calculation.
$documents = [
    'The cat eats fish',
    'The cat loves fish',
    'The dog eats canned meat',
];

// code.php reads $documents and defines $vocab and $tfidfVectors.
include 'code.php';

echo 'Vocabulary: ' . implode(', ', $vocab) . PHP_EOL . PHP_EOL;

$lastIndex = count($tfidfVectors) - 1;
foreach ($tfidfVectors as $i => $vector) {
    echo 'Document ' . ($i + 1) . ':' . PHP_EOL;
    foreach ($vector as $word => $value) {
        echo " $word => " . round($value, 3) . PHP_EOL;
    }
    // Print a blank line between documents, but not after the last one.
    if ($i < $lastIndex) {
        echo PHP_EOL;
    }
}
Documents:
The cat eats fish
The cat loves fish
The dog eats canned meat
Result:
Vocabulary: The, cat, eats, fish, loves, dog, canned, meat

Document 1:
 The => 0
 cat => 0.101
 eats => 0.101
 fish => 0.101

Document 2:
 The => 0
 cat => 0.101
 loves => 0.275
 fish => 0.101

Document 3:
 The => 0
 dog => 0.22
 eats => 0.081
 canned => 0.22
 meat => 0.22
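The weights above can be checked by hand. The definitions below are an assumption, since the included script is not shown: relative term frequency and an unsmoothed natural-log IDF, where n_{t,d} is the count of term t in document d, |d| is the number of words in d, N = 3 is the number of documents, and df(t) is the number of documents that contain t. They reproduce every value in the listing.

```latex
\mathrm{tf}(t,d) = \frac{n_{t,d}}{|d|}, \qquad
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}, \qquad
\text{tf-idf}(t,d) = \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)

% "loves" in document 2: appears once among 4 words, in 1 of 3 documents
\tfrac{1}{4}\,\ln\tfrac{3}{1} \approx 0.25 \times 1.0986 \approx 0.275

% "cat" in document 1: appears once among 4 words, in 2 of 3 documents
\tfrac{1}{4}\,\ln\tfrac{3}{2} \approx 0.25 \times 0.4055 \approx 0.101

% "eats" in document 3: appears once among 5 words, in 2 of 3 documents
\tfrac{1}{5}\,\ln\tfrac{3}{2} \approx 0.2 \times 0.4055 \approx 0.081

% "The" appears in all 3 documents, so its weight is zero everywhere
\ln\tfrac{3}{3} = 0
```

The same word can score differently across documents: "eats" gets 0.101 in document 1 but 0.081 in document 3, because document 3 is one word longer and so has a lower term frequency.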