SE250:lab-5:tlou006


LAB 5

Q1

Testing with

  int sample_size = 1000;
  int n_keys = 1000;
  int table_size = 1000;

and running rt_add_buzhash gave the following output:

Testing Buzhash low on 1000 samples


Entropy = 7.843786 bits per byte.

Optimum compression would reduce the size of this 1000 byte file by 1 percent.

Chi square distribution for the 1000 samples is 214.46, and randomly would exceed this value 95.00 percent of the times.

Arithmetic mean value of the data bytes is 128.0860 (127.5 = random).

Monte Carlo value for Pi is 3.132530120 (error 0.29 percent).

Serial correlation coefficient is -0.017268 (totally uncorrelated = 0.0).

Buzhash low 1000/1000: llps = 6, expecting 5.51384

Not sure what these results mean yet.
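
For reference, this is roughly the idea behind the Buzhash technique itself (a sketch of the general algorithm; the lab's actual rt_add_buzhash code may differ): each input byte indexes a table of random 32-bit values, and the running hash is rotated one bit between bytes so that byte order matters.

  #include <stddef.h>
  #include <stdint.h>

  /* Sketch of the general Buzhash technique (the lab's rt_add_buzhash
   * may differ): each byte indexes a table of random 32-bit values,
   * and the hash is rotated one bit per byte so position matters. */
  uint32_t buzhash(const char *key, size_t len, const uint32_t rand_table[256]) {
      uint32_t h = 0;
      for (size_t i = 0; i < len; i++) {
          h = (h << 1) | (h >> 31);            /* rotate left by one bit */
          h ^= rand_table[(unsigned char)key[i]];
      }
      return h;
  }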


After increasing the sample size, first to 100000 and then to 10000000, I observed the following:

Entropy got closer to 8 bits per byte.

The percentage by which optimum compression would reduce the file size decreased to 0.

Chi square distribution decreased.

Arithmetic mean value of the data bytes got closer to 127.5.

Monte Carlo value got closer to Pi.

Serial correlation coefficient got closer to 0.

llps value got closer to the expected value.

All the results suggest that increasing the sample size increases the apparent "randomness".
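
As a reference point for the entropy figure: it is presumably the Shannon entropy of the sampled byte frequencies, as reported by ent-style tools. A minimal sketch, assuming a buffer of sampled hash bytes (compile with -lm):

  #include <math.h>
  #include <stddef.h>

  /* Shannon entropy in bits per byte: count byte frequencies and sum
   * -p*log2(p). A truly random stream approaches 8.0 as n grows,
   * which matches the behaviour observed above. */
  double entropy_bits_per_byte(const unsigned char *buf, size_t n) {
      size_t count[256] = {0};
      for (size_t i = 0; i < n; i++)
          count[buf[i]]++;
      double h = 0.0;
      for (int b = 0; b < 256; b++) {
          if (count[b] > 0) {
              double p = (double)count[b] / (double)n;
              h -= p * log2(p);
          }
      }
      return h;
  }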



Running rt_add_buzhashn with sample size 100000 and low-entropy input:

Entropy = 7.998236 bits per byte.

Optimum compression would reduce the size of this 100000 byte file by 0 percent.

Chi square distribution for the 100000 samples is 244.84, and randomly would exceed this value 50.00 percent of the times.

Arithmetic mean value of the data bytes is 127.4936 (127.5 = random).

Monte Carlo value for Pi is 3.137635506 (error 0.13 percent).

Serial correlation coefficient is -0.003092 (totally uncorrelated = 0.0).

Buzhash low 1000/1000: llps = 999 (!!!!!!), expecting 5.51384

llps = 999 suggests that a lot of the values are bunched up in one place: nearly all of the 1000 keys hashed to the same slot.
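
Assuming llps means the longest list per slot (the longest chain once the hashed keys are placed into the table, which is what the expected value of 5.51384 for 1000 keys in 1000 slots suggests), a quick sketch of how it could be computed:

  #include <stdlib.h>

  /* Sketch, assuming llps = longest list per slot: count how many
   * keys land in each slot and return the worst case. llps = 999
   * means 999 of the 1000 keys collided into a single slot. */
  size_t llps(const unsigned *hashes, size_t n_keys, size_t table_size) {
      size_t *chain = calloc(table_size, sizeof *chain);
      size_t worst = 0;
      for (size_t i = 0; i < n_keys; i++) {
          size_t slot = hashes[i] % table_size;
          if (++chain[slot] > worst)
              worst = chain[slot];
      }
      free(chain);
      return worst;
  }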


Running rt_add_hash_CRC with sample size 100000 and low-entropy input:

Entropy = 5.574705 bits per byte.

Optimum compression would reduce the size of this 100000 byte file by 30 percent.

Chi square distribution for the 100000 samples is 1398897.03, and randomly would exceed this value 0.01 percent of the times.

Arithmetic mean value of the data bytes is 95.7235 (127.5 = random).

Monte Carlo value for Pi is 3.747989920 (error 19.30 percent).

Serial correlation coefficient is -0.075371 (totally uncorrelated = 0.0).

Buzhash low 1000/1000: llps = 13, expecting 5.51384
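
For context, hash_CRC is presumably a CRC-based string hash. A table-driven CRC-32 sketch of the general idea (not necessarily the lab library's implementation):

  #include <stddef.h>
  #include <stdint.h>

  /* Sketch of a table-driven CRC-32 string hash (an assumption about
   * what hash_CRC refers to; the lab's version may differ). */
  uint32_t crc32_hash(const char *key, size_t len) {
      static uint32_t table[256];
      static int ready = 0;
      if (!ready) {                        /* build the lookup table once */
          for (uint32_t i = 0; i < 256; i++) {
              uint32_t c = i;
              for (int k = 0; k < 8; k++)
                  c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
              table[i] = c;
          }
          ready = 1;
      }
      uint32_t crc = 0xFFFFFFFFu;
      for (size_t i = 0; i < len; i++)
          crc = table[(crc ^ (unsigned char)key[i]) & 0xFFu] ^ (crc >> 8);
      return crc ^ 0xFFFFFFFFu;
  }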

Running rt_add_base256 with sample size 100000 and low-entropy input:

Entropy = 0.00000 (!!!) bits per byte.

Optimum compression would reduce the size of this 100000 byte file by 100 percent.

Chi square distribution for the 100000 samples is 25500000.00, and randomly would exceed this value 0.01 percent of the times.

Arithmetic mean value of the data bytes is 97.0000 (127.5 = random).

Monte Carlo value for Pi is 4.0000000 (error 27.32 percent).

Serial correlation coefficient is undefined (totally uncorrelated = 0.0).

Buzhash low 1000/1000: llps = 1000 (!!!!), expecting 5.51384

base256 and the other hash functions produced many unexpected results on the low-entropy input.

An entropy of 0 bits per byte means every sampled byte was identical (the mean of 97.0000 is the byte value of 'a'), so base256 appears to have hashed all the low-entropy keys to essentially the same value. Maybe the sample size is too large? Either way, the results suggest buzhash performs well even with large sample sizes.
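
A plausible base-256 hash (an assumption about what rt_add_base256 tests) simply reads the string as a number in base 256. On repetitive low-entropy keys, every hash is built from the same bytes, which would explain both the zero entropy and the llps of 1000:

  #include <stddef.h>

  /* Sketch of a base-256 hash (an assumption about rt_add_base256's
   * hash function): the string is interpreted as a base-256 number.
   * Repetitive low-entropy keys yield repetitive, colliding values. */
  unsigned base256(const char *key, size_t len) {
      unsigned h = 0;
      for (size_t i = 0; i < len; i++)
          h = (h << 8) + (unsigned char)key[i];   /* h = h*256 + byte */
      return h;
  }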