Case 3: Comparing texts

Implementation in pure PHP

Below is a runnable pure PHP example: tokenization, term frequencies (TF), cosine similarity, and ranking documents by score.

Example of use


                
<?php

require_once __DIR__ . '/code.php';

echo 'Implementation in pure PHP' . PHP_EOL;
echo '----------' . PHP_EOL;

$documents = [
    'Machine learning is used to analyze data, which is crucial for the outcome.',
    'Deep learning is a branch of machine learning.',
    'Cats and dogs are popular pets.',
];

$query = 'machine learning algorithms';

$queryVector = textToVector($query);
$scores = [];

foreach ($documents as $document) {
    $docVector = textToVector($document);
    [$alignedQuery, $alignedDoc] = alignVectors($queryVector, $docVector);

    $scores[] = [
        'text' => $document,
        'score' => cosineSimilarity($alignedQuery, $alignedDoc),
    ];
}

usort($scores, fn ($a, $b) => $b['score'] <=> $a['score']);

print_r($scores);

Result: Memory: 0.01 Mb Time running: 0.001 sec.


                Implementation in pure PHP
----------
Array
(
    [0] => Array
        (
            [text] => Deep learning is a branch of machine learning.
            [score] => 0.54772255750517
        )

    [1] => Array
        (
            [text] => Machine learning is used to analyze data, which is crucial for the outcome.
            [score] => 0.29814239699997
        )

    [2] => Array
        (
            [text] => Cats and dogs are popular pets.
            [score] => 0
        )

)

Takeaway: even without libraries, you can build a clear baseline for text search and relevance ranking.