Big Data Techniques in PHP

Dataset Generator

Generators provide a memory-efficient way to iterate over large datasets by yielding values one at a time.

Example of class DatasetGenerator:

 
<?php

class DatasetGenerator {
    private 
$bufferSize;

    public function 
__construct($bufferSize 8192) {
        
$this->bufferSize $bufferSize;
    }

    public function 
readLargeFile($filename) {
        
$handle fopen($filename'r');

        while (!
feof($handle)) {
            
$buffer fread($handle$this->bufferSize);
            
$lines explode("\n"$buffer);

            foreach (
$lines as $line) {
                if (
trim($line) !== '') {
                    yield 
json_decode($linetrue);
                }
            }
        }

        
fclose($handle);
    }

    public function 
processBatchedData($filename$batchSize 100) {
        
$batch = [];

        foreach (
$this->readLargeFile($filename) as $item) {
            
$batch[] = $item;

            if (
count($batch) >= $batchSize) {
                yield 
$batch;
                
$batch = [];
            }
        }

        if (!empty(
$batch)) {
            yield 
$batch;
        }
    }
}