GP: Predictor of Data Compression Saving

In this research we used Genetic Programming (GP) to evolve programs that predict the compression ratio that compression algorithms will achieve on a given file. Each evolved program has multiple components. One component analyses statistical features extracted from the file’s byte frequency distribution to produce a compression ratio prediction. A second component does the same using statistical features extracted from the file’s raw ASCII representation. A further evolved component acts as a decision tree that determines the overall output (the compression ratio estimate) returned by an individual. The decision tree bases its result on a series of comparisons among statistical features extracted from the file and the outputs of the two prediction components. It can either select the output of one of the other components or integrate them into an evolved mathematical formula. Experiments with the proposed approach show that GP is able to accurately estimate the compression ratio of unseen files without the need to run every compression algorithm in question.
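
The three-component structure described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the feature choices (entropy, number of distinct bytes, mean, standard deviation), the function names, the thresholds, and the convention that the ratio means compressed size divided by original size. The two predictors and the decision rule are hand-written placeholders standing in for the programs that GP would evolve; they are not the evolved programs from this research.

```python
import math
from collections import Counter


def byte_frequency_features(data: bytes) -> dict:
    """Features of the byte frequency distribution (illustrative choice)."""
    if not data:
        return {"entropy": 0.0, "distinct": 0, "max_freq": 0.0}
    counts = Counter(data)
    n = len(data)
    probs = [c / n for c in counts.values()]
    # Shannon entropy in bits per byte, between 0 and 8.
    entropy = sum(p * math.log2(1 / p) for p in probs)
    return {"entropy": entropy, "distinct": len(counts), "max_freq": max(probs)}


def raw_byte_features(data: bytes) -> dict:
    """Features of the raw byte values themselves (illustrative choice)."""
    if not data:
        return {"mean": 0.0, "std": 0.0}
    n = len(data)
    mean = sum(data) / n
    variance = sum((b - mean) ** 2 for b in data) / n
    return {"mean": mean, "std": math.sqrt(variance)}


def predict_from_frequencies(f: dict) -> float:
    """Placeholder for the first evolved predictor (assumed formula)."""
    return f["entropy"] / 8.0


def predict_from_raw(r: dict) -> float:
    """Placeholder for the second evolved predictor (assumed formula)."""
    return min(1.0, r["std"] / 128.0)


def decide(f: dict, r: dict, p1: float, p2: float) -> float:
    """Placeholder for the evolved decision tree (assumed thresholds).

    It compares features and predictor outputs, then either selects one
    prediction or integrates both into a formula.
    """
    if f["distinct"] <= 2:
        return p1                  # select the frequency-based prediction
    if abs(p1 - p2) < 0.1:
        return p1                  # predictors agree, select one of them
    return 0.5 * (p1 + p2)         # otherwise integrate both outputs


def estimate_ratio(data: bytes) -> float:
    """Estimate compressed size / original size without compressing."""
    f = byte_frequency_features(data)
    r = raw_byte_features(data)
    return decide(f, r, predict_from_frequencies(f), predict_from_raw(r))
```

Calling `estimate_ratio(open(path, "rb").read())` on a file returns a value between 0 and 1 without invoking any compressor, which is the property the approach relies on. In the actual system the bodies of the two predictors and of the decision tree would be discovered by evolution rather than written by hand.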

 

Datasets:

Here you can download the files used to test the performance of the system, so that its results can be compared with those of other methods.

· Test set

· Training set

Paper:

Genetic programming as a predictor of Data Compression saving,

GP– Predict the data compression ratio 

Computing and Electronic Engineering

Ahmed Kattan