Case 3: Comparing texts
Implementation in pure PHP
The example below is runnable pure PHP. It tokenizes each text, builds term-frequency (TF) vectors, computes the cosine similarity between the query and each document, and ranks the documents by score.
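The helper functions live in code.php, which is not reproduced on this page. The following is a minimal sketch of what that file might contain, not the original implementation. It assumes ASCII-only lowercasing, splitting on any non-alphanumeric character, raw term counts, and no stop-word removal. The function names textToVector, alignVectors, and cosineSimilarity match the calls in the usage example; tokenize is a hypothetical helper added for illustration.

```php
<?php
// Hypothetical sketch of code.php; the original file is not shown in this section.

/** Lowercase the text and split it into word tokens (ASCII letters and digits only). */
function tokenize(string $text): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    return $tokens === false ? [] : $tokens;
}

/** Build a term-frequency map: token => raw count. */
function textToVector(string $text): array
{
    return array_count_values(tokenize($text));
}

/** Project two sparse term maps onto the same ordered list of terms. */
function alignVectors(array $a, array $b): array
{
    // The array union keeps every key of $a, then adds keys found only in $b.
    $terms = array_keys($a + $b);

    $alignedA = [];
    $alignedB = [];
    foreach ($terms as $term) {
        $alignedA[] = $a[$term] ?? 0;
        $alignedB[] = $b[$term] ?? 0;
    }

    return [$alignedA, $alignedB];
}

/** Cosine of the angle between two equal-length numeric vectors. */
function cosineSimilarity(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value ** 2;
        $normB += $b[$i] ** 2;
    }

    // An empty text yields a zero vector; treat its similarity as 0 instead of dividing by zero.
    if ($normA == 0.0 || $normB == 0.0) {
        return 0.0;
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}
```

Saved as code.php next to the script, this sketch should be enough to run the usage example. Under the assumptions above it gives scores of roughly 0.5477, 0.2981, and 0 for the three sample documents.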
Example of use
<?php
require_once __DIR__ . '/code.php';

echo 'Implementation in pure PHP' . PHP_EOL;
echo '----------' . PHP_EOL;

// Small corpus to search.
$documents = [
    'Machine learning is used to analyze data, which is crucial for the outcome.',
    'Deep learning is a branch of machine learning.',
    'Cats and dogs are popular pets.',
];

// Convert the query to a term-frequency vector once, outside the loop.
$query = 'machine learning algorithms';
$queryVector = textToVector($query);

$scores = [];
foreach ($documents as $document) {
    $docVector = textToVector($document);

    // Put both vectors on the same set of terms before comparing them.
    [$alignedQuery, $alignedDoc] = alignVectors($queryVector, $docVector);

    $scores[] = [
        'text' => $document,
        'score' => cosineSimilarity($alignedQuery, $alignedDoc),
    ];
}

// Sort by score, highest first.
usort($scores, fn($a, $b) => $b['score'] <=> $a['score']);

print_r($scores);
Result:
Memory: 0.012 Mb
Time running: 0.001 sec.
Implementation in pure PHP
----------
Array
(
    [0] => Array
        (
            [text] => Deep learning is a branch of machine learning.
            [score] => 0.54772255750517
        )

    [1] => Array
        (
            [text] => Machine learning is used to analyze data, which is crucial for the outcome.
            [score] => 0.29814239699997
        )

    [2] => Array
        (
            [text] => Cats and dogs are popular pets.
            [score] => 0
        )

)
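The top score can be checked by hand. The calculation assumes that the vectors hold raw term counts and that no stop words are removed; code.php is not shown here, so this is an assumption, but it is consistent with the printed values. The query has three distinct terms, each with count 1. The highest-ranked document contains "learning" twice and six other terms once each, and it shares "machine" and "learning" with the query:

```latex
\[
\begin{aligned}
\cos(q, d) &= \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert} \\
           &= \frac{1 \cdot 1 + 1 \cdot 2}{\sqrt{1^2 + 1^2 + 1^2} \, \sqrt{2^2 + 6 \cdot 1^2}} \\
           &= \frac{3}{\sqrt{3} \, \sqrt{10}} = \frac{3}{\sqrt{30}} \approx 0.5477
\end{aligned}
\]
```

The same reasoning gives 2 / (sqrt(3) * sqrt(15)) = 2 / sqrt(45) ≈ 0.2981 for the second-ranked document, which is longer and shares only two single-count terms with the query. The third document shares no terms with the query, so its dot product and its score are 0.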
Takeaway: even without libraries, you can build a clear baseline for text search and relevance ranking.