Case 3: Comparing texts
Implementation in RubixML
Below is a runnable RubixML example: TextNormalizer + WordCountVectorizer + TfIdfTransformer with cosine-based comparison.
Example of use
<?php
use Rubix\ML\Datasets\Unlabeled;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Kernels\Distance\Cosine;
echo 'Реализация на Rubix ML' . PHP_EOL;
echo '----------' . PHP_EOL;
$documents = [
'Machine learning is used to analyze data, which is crucial for the outcome.',
'Deep learning is a branch of machine learning.',
'Cats and dogs are popular pets.'
];
$query = ['machine learning algorithms'];
$dataset = new Unlabeled(array_merge($documents, $query));
$dataset->apply(new TextNormalizer());
$dataset->apply(new WordCountVectorizer());
$tfidf = new TfIdfTransformer();
$tfidf->fit($dataset);
$dataset->apply($tfidf);
$vectors = $dataset->samples();
$queryVector = array_pop($vectors);
$kernel = new Cosine();
$scores = [];
foreach ($vectors as $i => $vector) {
$similarity = 1.0 - $kernel->compute($queryVector, $vector);
$scores[] = [
'text' => $documents[$i],
'score' => $similarity,
];
}
usort($scores, fn($a, $b) => $b['score'] <=> $a['score']);
print_r($scores);
Result:
Memory: 0.277 Mb
Time running: 0.009 sec.
Реализация на Rubix ML
----------
Array
(
[0] => Array
(
[text] => Deep learning is a branch of machine learning.
[score] => 0.3515956885757
)
[1] => Array
(
[text] => Machine learning is used to analyze data, which is crucial for the outcome.
[score] => 0.17245333859615
)
[2] => Array
(
[text] => Cats and dogs are popular pets.
[score] => 0
)
)
Takeaway: RubixML streamlines a practical text pipeline and provides a stronger foundation for search and text matching tasks.