Case 3: Comparing texts

Implementation in RubixML

Below is a runnable RubixML example: TextNormalizer + WordCountVectorizer + TfIdfTransformer with cosine-based comparison.

 
<?php

use Rubix\ML\Datasets\Unlabeled;
use 
Rubix\ML\Transformers\TextNormalizer;
use 
Rubix\ML\Transformers\WordCountVectorizer;
use 
Rubix\ML\Transformers\TfIdfTransformer;
use 
Rubix\ML\Kernels\Distance\Cosine;

echo 
'Реализация на Rubix ML' PHP_EOL;
echo 
'----------' PHP_EOL;

$documents = [
    
'Machine learning is used to analyze data, which is crucial for the outcome.',
    
'Deep learning is a branch of machine learning.',
    
'Cats and dogs are popular pets.'
];

$query = ['machine learning algorithms'];

$dataset = new Unlabeled(array_merge($documents$query));

$dataset->apply(new TextNormalizer());
$dataset->apply(new WordCountVectorizer());

$tfidf = new TfIdfTransformer();
$tfidf->fit($dataset);
$dataset->apply($tfidf);

$vectors $dataset->samples();
$queryVector array_pop($vectors);

$kernel = new Cosine();
$scores = [];

foreach (
$vectors as $i => $vector) {
    
$similarity 1.0 $kernel->compute($queryVector$vector);

    
$scores[] = [
        
'text' => $documents[$i],
        
'score' => $similarity,
    ];
}

usort($scores, fn($a$b) => $b['score'] <=> $a['score']);

print_r($scores);
Result: Memory: 0.277 Mb Time running: 0.009 sec.
Реализация на Rubix ML
----------
Array
(
    [0] => Array
        (
            [text] => Deep learning is a branch of machine learning.
            [score] => 0.3515956885757
        )

    [1] => Array
        (
            [text] => Machine learning is used to analyze data, which is crucial for the outcome.
            [score] => 0.17245333859615
        )

    [2] => Array
        (
            [text] => Cats and dogs are popular pets.
            [score] => 0
        )

)

Takeaway: RubixML streamlines a practical text pipeline and provides a stronger foundation for search and text matching tasks.