A simple Bag of Words example in PHP
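Below is a minimal Bag of Words implementation in pure PHP. We take three short documents, tokenize them, build a shared vocabulary, and convert each document into a vector of word counts (term frequencies). Because every vector is built from the same vocabulary, all vectors have the same keys in the same order, so they can be compared position by position.
The full script: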
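<?php
// Sample input: three short documents (illustrative; substitute your own).
$documents = [
    'the cat sat on the mat',
    'the dog sat on the log',
    'the cat chased the dog',
];

// Lowercase the text and split on any run of non-alphanumeric characters,
// dropping the empty tokens that leading/trailing spaces or punctuation produce.
function tokenize(string $text): array {
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

$tokenized = array_map('tokenize', $documents);

// Build the shared vocabulary: every distinct token across all documents,
// in first-seen order. Using the words as array keys removes duplicates.
$vocab = [];
foreach ($tokenized as $doc) {
    foreach ($doc as $word) {
        $vocab[$word] = true;
    }
}
$vocab = array_keys($vocab);

// Count how often each vocabulary word occurs in one tokenized document.
function bagOfWords(array $doc, array $vocab): array {
    $vector = array_fill_keys($vocab, 0);
    foreach ($doc as $word) {
        if (isset($vector[$word])) { // skip words outside the vocabulary
            $vector[$word]++;
        }
    }
    return $vector;
}

// One word-count vector (keyed by word) per document.
$bowVectors = [];
foreach ($tokenized as $doc) {
    $bowVectors[] = bagOfWords($doc, $vocab);
}

print_r($bowVectors);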
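The script above leaves each vector as an associative array keyed by word. If you need a plain numeric matrix instead, for example to pass to other numeric code, the same steps can be packaged into reusable functions. This is a sketch, not part of the original: the function names `bowTokenize()`, `bowVector()`, and `bowMatrix()` are hypothetical, tokenization assumes lowercase ASCII words separated by anything non-alphanumeric, and words in a new document that are missing from the vocabulary are simply ignored.

```php
<?php
// Hypothetical helpers: they repackage the Bag of Words steps for reuse.

// Lowercase and split on runs of non-alphanumeric characters.
function bowTokenize(string $text): array {
    return preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Count vocabulary words in one document and return a dense list of counts
// in the same order as $vocab. Out-of-vocabulary words are ignored.
function bowVector(array $tokens, array $vocab): array {
    $counts = array_fill_keys($vocab, 0);
    foreach ($tokens as $word) {
        if (isset($counts[$word])) {
            $counts[$word]++;
        }
    }
    return array_values($counts);
}

// Build the vocabulary (first-seen order) and one count row per document.
function bowMatrix(array $documents): array {
    $tokenized = array_map('bowTokenize', $documents);
    $vocab = [];
    foreach ($tokenized as $tokens) {
        foreach ($tokens as $word) {
            $vocab[$word] = true;
        }
    }
    $vocab = array_keys($vocab);

    $matrix = [];
    foreach ($tokenized as $tokens) {
        $matrix[] = bowVector($tokens, $vocab);
    }
    return ['vocab' => $vocab, 'matrix' => $matrix];
}

$result = bowMatrix(['The cat sat.', 'The cat saw the dog!']);
echo implode(' ', $result['vocab']), "\n"; // the cat sat saw dog
foreach ($result['matrix'] as $row) {
    echo implode(' ', $row), "\n";
}
// 1 1 1 0 0
// 2 1 0 1 1

// A new document is encoded against the same vocabulary;
// "a" and "bird" are not in it, so they are dropped.
echo implode(' ', bowVector(bowTokenize('the dog saw a bird'), $result['vocab'])), "\n";
// 1 0 0 1 1
```

Returning dense rows with `array_values` drops the word keys, so the meaning of each column lives only in `$vocab`. The keyed arrays from the script above are easier to inspect by eye; the dense rows are what most numeric code expects.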