Big Data Techniques in PHP
Chunked Processing
Chunked processing is crucial when dealing with datasets that are too large to fit in memory. Instead of loading everything at once, the technique reads and processes the data in smaller, manageable pieces, so peak memory usage is bounded by the chunk size rather than the dataset size.
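In its simplest form, the pattern is a read-batch-release loop. The sketch below is illustrative only (processCsvInChunks is a hypothetical helper, not the ChunkedProcessor class used in the example that follows): it streams a CSV file and hands a callback one fixed-size batch at a time, so memory stays flat no matter how large the file is.

<?php
// Illustrative sketch, not the article's ChunkedProcessor:
// stream a file in fixed-size chunks so memory stays flat.
function processCsvInChunks(string $path, int $chunkSize, callable $callback): void
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $chunk = [];
    while (($row = fgetcsv($handle)) !== false) {
        $chunk[] = $row;
        if (count($chunk) >= $chunkSize) {
            $callback($chunk); // handle this batch...
            $chunk = [];       // ...then release it before reading more
        }
    }
    if ($chunk !== []) {
        $callback($chunk); // flush the final partial chunk
    }
    fclose($handle);
}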
Usage example
<?php

$logHandler = new LogHandler();
$processor = new ChunkedProcessor(1000, '512M', $logHandler);

$result = $processor->processLargeDataset(__DIR__ . '/large_data.json', function ($chunk) {
    // Custom processing logic for each chunk
    foreach ($chunk as $data) {
        // Simulate processing (e.g., database insert, API call, etc.)
        echo "Processing: " . json_encode($data) . "\n";
    }
});

echo "\n\n";
print_r($result);
Result:
Memory: 0.002 Mb
Time running: < 0.001 sec.
Processing: {"id":1,"name":"John Doe","email":"john.doe@example.com","status":"active"}
Processing: {"id":2,"name":"Jane Smith","email":"jane.smith@example.com","status":"inactive"}
Processing: {"id":3,"name":"Alice Johnson","email":"alice.johnson@example.com","status":"active"}
Processing: {"id":4,"name":"Bob Brown","email":"bob.brown@example.com","status":"pending"}
Processing: {"id":5,"name":"Charlie White","email":"charlie.white@example.com","status":"active"}
Processing: {"id":6,"name":"Diana Green","email":"diana.green@example.com","status":"suspended"}
Processing: {"id":7,"name":"Emily Black","email":"emily.black@example.com","status":"active"}
Processing: {"id":8,"name":"Frank Harris","email":"frank.harris@example.com","status":"inactive"}
Processing: {"id":9,"name":"Grace Lee","email":"grace.lee@example.com","status":"pending"}
Processing: {"id":10,"name":"Henry Walker","email":"henry.walker@example.com","status":"suspended"}
INFO: Processed 10 rows | Memory: 6.00 MB | Speed: 432402.47 rows/sec
Array
(
[total_processed] => 10
[total_failed] => 0
[memory_peak_mb] => 6
[time_taken_sec] => 3.7193298339844E-5
)
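
The ChunkedProcessor class itself is not shown in this example. The sketch below is a hypothetical reconstruction, consistent with the constructor arguments (chunk size, memory limit, log handler) and the result array above; it assumes the input file holds one JSON object per line and that LogHandler exposes an info(string $message) method. The real class may differ.

<?php
// Hypothetical reconstruction of ChunkedProcessor (illustrative only).
class ChunkedProcessor
{
    public function __construct(
        private int $chunkSize,
        private string $memoryLimit,
        private LogHandler $logHandler // assumed to provide info(string $message)
    ) {}

    public function processLargeDataset(string $path, callable $callback): array
    {
        ini_set('memory_limit', $this->memoryLimit); // enforce the configured cap
        $start = microtime(true);
        $processed = 0;
        $failed = 0;

        foreach ($this->chunks($path) as $chunk) {
            try {
                $callback($chunk);
                $processed += count($chunk);
            } catch (Throwable $e) {
                $failed += count($chunk); // count the whole chunk as failed
            }
        }

        $elapsed = microtime(true) - $start;
        $peakMb  = memory_get_peak_usage(true) / 1048576;
        $speed   = $elapsed > 0 ? $processed / $elapsed : 0;
        $this->logHandler->info(sprintf(
            'Processed %d rows | Memory: %.2f MB | Speed: %.2f rows/sec',
            $processed, $peakMb, $speed
        ));

        return [
            'total_processed' => $processed,
            'total_failed'    => $failed,
            'memory_peak_mb'  => round($peakMb, 2),
            'time_taken_sec'  => $elapsed,
        ];
    }

    // Stream the file so only one chunk is held in memory at a time
    // (assumes one JSON object per line).
    private function chunks(string $path): Generator
    {
        $handle = fopen($path, 'r');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $path");
        }
        $chunk = [];
        while (($line = fgets($handle)) !== false) {
            $row = json_decode($line, true);
            if ($row !== null) { // skip malformed lines
                $chunk[] = $row;
            }
            if (count($chunk) >= $this->chunkSize) {
                yield $chunk;
                $chunk = [];
            }
        }
        if ($chunk !== []) {
            yield $chunk; // flush the final partial chunk
        }
        fclose($handle);
    }
}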