A simple Bag of Words example in PHP

Below is a minimal Bag of Words implementation in pure PHP. We take three short documents, tokenize them, build a shared vocabulary, and convert each document into a word-frequency vector.
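The script below gets its vocabulary and vectors from `code.php`, whose contents are not reproduced here. The following is one possible sketch of those steps, not the author's file. It assumes tokens are separated by whitespace, matching is case-sensitive, and vocabulary words keep the order in which they first appear. These assumptions are consistent with the output shown later, where `Cat` and `Dog` keep their capitals and the vocabulary follows reading order. The function names `tokenize`, `buildVocabulary`, and `vectorize` are hypothetical.

```php
<?php
// Hypothetical sketch of what code.php might compute; not the original file.
// Assumptions: whitespace tokenization, case-sensitive matching,
// vocabulary kept in first-seen order.

/** Split a document into tokens on runs of whitespace (hypothetical helper). */
function tokenize(string $text): array
{
    return preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY);
}

/** Collect every distinct token across all documents, in first-seen order (hypothetical helper). */
function buildVocabulary(array $tokenizedDocs): array
{
    $vocab = [];
    foreach ($tokenizedDocs as $tokens) {
        foreach ($tokens as $token) {
            if (!in_array($token, $vocab, true)) {
                $vocab[] = $token;
            }
        }
    }
    return $vocab;
}

/** Count each vocabulary word in one document; words that never occur stay at 0 (hypothetical helper). */
function vectorize(array $tokens, array $vocab): array
{
    $vector = array_fill_keys($vocab, 0);
    foreach ($tokens as $token) {
        if (array_key_exists($token, $vector)) {
            $vector[$token]++;
        }
    }
    return $vector;
}

$documents = [
    'Cat eats fish',
    'Cat likes fish',
    'Dog eats meat',
];

$tokenizedDocs = array_map('tokenize', $documents);
$vocab = buildVocabulary($tokenizedDocs);
$bowVectors = array_map(
    fn(array $tokens): array => vectorize($tokens, $vocab),
    $tokenizedDocs
);

echo implode(', ', $vocab), PHP_EOL; // prints: Cat, eats, fish, likes, Dog, meat
```

The vocabulary is stored as a plain list, and each vector starts from `array_fill_keys($vocab, 0)`. As a result, every vector has the same keys in the same order, which is what makes the documents comparable slot by slot.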

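<?php

$documents = [
    'Cat eats fish',
    'Cat likes fish',
    'Dog eats meat',
];

// code.php (not shown) must define $vocab, the list of distinct words,
// and $bowVectors, one word => count array per document.
include 'code.php';

echo 'Vocabulary: ' . implode(', ', $vocab) . PHP_EOL . PHP_EOL;

foreach ($bowVectors as $i => $vector) {
    echo 'Document ' . ($i + 1) . ':' . PHP_EOL;

    foreach ($vector as $word => $count) {
        echo "  $word => $count" . PHP_EOL;
    }

    // Blank line between documents, but not after the last one.
    if ($i < count($bowVectors) - 1) {
        echo PHP_EOL;
    }
}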

Documents:

Cat eats fish
Cat likes fish
Dog eats meat

Result:

Vocabulary: Cat, eats, fish, likes, Dog, meat

Document 1:
  Cat => 1
  eats => 1
  fish => 1
  likes => 0
  Dog => 0
  meat => 0

Document 2:
  Cat => 1
  eats => 0
  fish => 1
  likes => 1
  Dog => 0
  meat => 0

Document 3:
  Cat => 0
  eats => 1
  fish => 0
  likes => 0
  Dog => 1
  meat => 1
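Each vector above has one slot per vocabulary word, always in vocabulary order. Dropping the word keys therefore leaves a plain numeric document-term matrix. The sketch below shows this and replaces explicit loops with PHP's `array_unique`, `array_count_values`, and `array_replace`. It assumes tokens are separated by single spaces, as in these three documents, and is an illustration rather than the original implementation.

```php
<?php
// Alternative sketch using PHP array built-ins instead of explicit loops.
// Assumption: tokens are separated by single spaces (true for these documents);
// matching is case-sensitive and the vocabulary keeps first-seen order.

$documents = [
    'Cat eats fish',
    'Cat likes fish',
    'Dog eats meat',
];

// All tokens from all documents, then the distinct ones in first-seen order.
$allTokens = explode(' ', implode(' ', $documents));
$vocab = array_values(array_unique($allTokens));

// Start every vector at zero for each word, then overwrite with the actual counts.
// array_replace keeps the key order of $zeroVector, so all vectors line up.
$zeroVector = array_fill_keys($vocab, 0);
$bowVectors = array_map(
    fn(string $doc): array => array_replace($zeroVector, array_count_values(explode(' ', $doc))),
    $documents
);

// Dropping the word keys leaves numeric rows in vocabulary order.
$matrix = array_map('array_values', $bowVectors);

foreach ($matrix as $i => $row) {
    echo 'Document ' . ($i + 1) . ': [' . implode(', ', $row) . ']' . PHP_EOL;
}
```

For the three documents this prints `Document 1: [1, 1, 1, 0, 0, 0]`, `Document 2: [1, 0, 1, 1, 0, 0]`, and `Document 3: [0, 1, 0, 0, 1, 1]`, the same counts as the keyed listing above.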