A simple Bag of Words example in PHP
Below is a minimal Bag of Words implementation in pure PHP. We take three short documents, tokenize them, build a shared vocabulary, and convert each document into a word-frequency vector.
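The three steps named above (tokenize, build a shared vocabulary, count word frequencies) can be sketched as follows. This is only an illustration of what a file such as code.php might contain, not the original implementation. The function names tokenize, buildVocabulary, and vectorize are hypothetical, and the sketch assumes whitespace tokenization, case-sensitive matching, and a vocabulary kept in first-seen order.

```php
<?php
// Hypothetical sketch: function names and tokenization rules are assumptions.

/** Split a document into tokens on whitespace; case is preserved. */
function tokenize(string $text): array
{
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Collect unique tokens across all documents, in first-seen order. */
function buildVocabulary(array $tokenizedDocs): array
{
    $vocab = [];
    foreach ($tokenizedDocs as $tokens) {
        foreach ($tokens as $token) {
            if (!in_array($token, $vocab, true)) {
                $vocab[] = $token;
            }
        }
    }
    return $vocab;
}

/** Count how often each vocabulary word occurs in one tokenized document. */
function vectorize(array $tokens, array $vocab): array
{
    // Start with a zero count for every vocabulary word.
    $vector = array_fill_keys($vocab, 0);
    foreach ($tokens as $token) {
        if (array_key_exists($token, $vector)) {
            $vector[$token]++;
        }
    }
    return $vector;
}

// Use $documents from the including script if it is set, else a default set.
$documents = $documents ?? [
    'Cat eats fish',
    'Cat likes fish',
    'Dog eats meat',
];

$tokenized = array_map('tokenize', $documents);
$vocab = buildVocabulary($tokenized);

// One associative array (word => count) per document.
$bowVectors = [];
foreach ($tokenized as $tokens) {
    $bowVectors[] = vectorize($tokens, $vocab);
}
```

Because the sketch does not lowercase tokens, 'Cat' and 'cat' would count as two different words. Adding a strtolower call inside tokenize would merge them, at the cost of changing how the vocabulary is displayed.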
Example of use
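<?php
// Three short documents to vectorize.
$documents = [
    'Cat eats fish',
    'Cat likes fish',
    'Dog eats meat',
];

// code.php reads $documents and is expected to define $vocab and $bowVectors.
include 'code.php';

echo 'Vocabulary: ' . implode(', ', $vocab) . PHP_EOL . PHP_EOL;

$lastIndex = count($bowVectors) - 1;
foreach ($bowVectors as $i => $vector) {
    echo 'Document ' . ($i + 1) . ':' . PHP_EOL;
    foreach ($vector as $word => $count) {
        echo " $word => $count" . PHP_EOL;
    }
    // Print a blank line between documents, but not after the last one.
    if ($i < $lastIndex) {
        echo PHP_EOL;
    }
}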
Documents:
Cat eats fish
Cat likes fish
Dog eats meat
Result:
Memory: 0.006 MB
Running time: < 0.001 sec.
Vocabulary: Cat, eats, fish, likes, Dog, meat

Document 1:
 Cat => 1
 eats => 1
 fish => 1
 likes => 0
 Dog => 0
 meat => 0

Document 2:
 Cat => 1
 eats => 0
 fish => 1
 likes => 1
 Dog => 0
 meat => 0

Document 3:
 Cat => 0
 eats => 1
 fish => 0
 likes => 0
 Dog => 1
 meat => 1