SE250:lab-5:hals016

Task 1

For this task I chose the following values: int sample_size = 200; int n_keys = 10000; int table_size = 100;

A sample size of 200 seems fair for a scenario such as a small company assigning IDs to its employees.


BuzHash Low

Output for Buzhash Low

Testing Buzhash low on 200 samples
Entropy = 6.961838 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 12 percent.

Chi square distribution for 200 samples is 271.04, and randomly
would exceed this value 25.00 percent of the times.

Arithmetic mean value of data bytes is 129.8200 (127.5 = random).
Monte Carlo value for Pi is 3.030303030 (error 3.54 percent).
Serial correlation coefficient is -0.140593 (totally uncorrelated = 0.0).

Buzhash low 10000/100: llps = 134, expecting 125.959

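The randomness of Buzhash low is acceptable on these results, though not ideal. Random data would exceed the chi square value 25% of the time, which is further from the ideal 50% than I would like, but ent's documentation only starts to flag percentages outside roughly 10% to 90%. The serial correlation coefficient of -0.14 is also some way from 0.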
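
To make the chi square figure less of a black box, here is a minimal sketch of how such a statistic can be computed for a byte sample. It assumes the observed byte counts are compared against a uniform distribution over all 256 byte values; the function name byte_chi_square is my own and is not taken from the lab code or from ent.

```c
#include <assert.h>
#include <stddef.h>

/* Chi square statistic of a byte sample against a uniform distribution
   over the 256 possible byte values.
   Assumption: this mirrors the kind of test ent reports; the name and
   signature are hypothetical. */
double byte_chi_square(const unsigned char *data, size_t n)
{
    size_t counts[256] = {0};
    double expected, chi = 0.0;
    size_t i;

    assert(data != NULL && n > 0);

    /* Count how often each byte value occurs. */
    for (i = 0; i < n; i++)
        counts[data[i]]++;

    /* With 200 samples each value is expected 200/256 = 0.78125 times. */
    expected = (double)n / 256.0;
    for (i = 0; i < 256; i++) {
        double diff = (double)counts[i] - expected;
        chi += diff * diff / expected;
    }
    return chi;
}
```

Under this assumption, a 200-byte sample in which every byte is 0 gives exactly 51000, which is the value reported for high_rand further down, so the sketch appears to agree with the tool on at least that case.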


BuzHash Typical

Output for Buzhash Typical

Testing Buzhash typical on 200 samples
Entropy = 6.987435 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 12 percent.

Chi square distribution for 200 samples is 240.32, and randomly
would exceed this value 50.00 percent of the times.

Arithmetic mean value of data bytes is 128.0400 (127.5 = random).
Monte Carlo value for Pi is 3.272727273 (error 4.17 percent).
Serial correlation coefficient is -0.005251 (totally uncorrelated = 0.0).

Buzhash typical 10000/100: llps = 127, expecting 125.959

Compared with Buzhash low, I think these results show more randomness. The chi square percentage is now a very good 50%, the arithmetic mean is closer to 127.5 (the value for random data), and the serial correlation coefficient is a lot closer to 0.
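
The serial correlation coefficient measures how much each byte depends on the one before it. The sketch below is one common way to compute a lag-1 coefficient, treating the sample as circular so the last byte is paired with the first. The function name serial_correlation is hypothetical, and ent's exact formula may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Lag-1 serial correlation of a byte sample, treated as circular.
   Returns 1 and stores the coefficient in *out, or returns 0 when the
   coefficient is undefined (all bytes equal).
   Assumption: a generic textbook formula, not code taken from ent. */
int serial_correlation(const unsigned char *data, size_t n, double *out)
{
    double sum = 0.0, sum_sq = 0.0, sum_adj = 0.0;
    double num, den;
    size_t i;

    assert(data != NULL && out != NULL && n > 1);

    for (i = 0; i < n; i++) {
        double x = data[i];
        double next = data[(i + 1) % n];  /* wrap the last byte to the first */
        sum += x;
        sum_sq += x * x;
        sum_adj += x * next;
    }

    den = (double)n * sum_sq - sum * sum;
    if (den == 0.0)
        return 0;                         /* every byte has the same value */

    num = (double)n * sum_adj - sum * sum;
    *out = num / den;
    return 1;
}
```

A strictly alternating sample such as 0, 255, 0, 255 gives -1, a slow ramp gives a value near +1, and a constant sample is reported as undefined, which is the case that shows up in the high_rand output.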


Buzhashn low

Testing Buzhashn low on 200 samples
Entropy = 7.094984 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 11 percent.

Chi square distribution for 200 samples is 209.60, and randomly
would exceed this value 97.50 percent of the times.

Arithmetic mean value of data bytes is 120.9150 (127.5 = random).
Monte Carlo value for Pi is 3.151515152 (error 0.32 percent).
Serial correlation coefficient is 0.099943 (totally uncorrelated = 0.0).

Buzhashn low 10000/100: llps = 133, expecting 125.959

Buzhashn low does a lot better than both Buzhash runs on the Monte Carlo value for pi (0.32% error), but worse on the arithmetic mean (120.9 against 127.5). Its llps of 133 is about as far from the expected value as the 134 for Buzhash low.
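
The Monte Carlo figure is easier to judge once the number of points behind it is known. The sketch below follows the scheme described in the ent documentation, where each group of six bytes supplies a 24-bit x and a 24-bit y coordinate; the function name monte_carlo_pi and the points_out parameter are my own additions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Estimate pi from a byte stream: every 6 bytes form one point (24-bit x,
   24-bit y) and we count the points inside the quarter circle.
   Assumption: based on ent's documented description, not its source code. */
double monte_carlo_pi(const unsigned char *data, size_t n, size_t *points_out)
{
    const double radius = 16777215.0;        /* 2^24 - 1 */
    const double radius_sq = radius * radius;
    size_t points = n / 6;                   /* leftover bytes are ignored */
    size_t inside = 0;
    size_t p;

    assert(data != NULL && points > 0);

    for (p = 0; p < points; p++) {
        const unsigned char *b = data + 6 * p;
        double x = (double)(((unsigned long)b[0] << 16) |
                            ((unsigned long)b[1] << 8) | b[2]);
        double y = (double)(((unsigned long)b[3] << 16) |
                            ((unsigned long)b[4] << 8) | b[5]);
        if (x * x + y * y <= radius_sq)
            inside++;
    }

    if (points_out != NULL)
        *points_out = points;
    return 4.0 * (double)inside / (double)points;
}
```

With 200 bytes there are only 33 points, so the estimate can only take values of the form 4k/33. The reported 3.0303, 3.1515 and 3.2727 correspond to 25, 26 and 27 points inside the circle, and 4.0 means all 33 were inside. The 0.32% error is therefore the best this sample size can achieve, and the gap between the Buzhash and Buzhashn runs comes down to one or two points.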


Buzhashn typical

Testing Buzhashn typical on 200 samples
Entropy = 7.094984 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 11 percent.

Chi square distribution for 200 samples is 209.60, and randomly
would exceed this value 97.50 percent of the times.

Arithmetic mean value of data bytes is 120.9150 (127.5 = random).
Monte Carlo value for Pi is 3.151515152 (error 0.32 percent).
Serial correlation coefficient is 0.099943 (totally uncorrelated = 0.0).

Buzhashn typical 10000/100: llps = 127, expecting 125.959

Buzhashn typical gives the same ent statistics as Buzhashn low. The only difference is the llps, which at 127 is a lot closer to the expected value.


hash_CRC low

Testing hash_CRC low on 200 samples
Entropy = 3.470509 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 56 percent.

Chi square distribution for 200 samples is 7305.92, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 94.3400 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.390902 (totally uncorrelated = 0.0).

hash_CRC low 10000/100: llps = 405, expecting 125.959

These values are a lot worse than those for buzhash/buzhashn. The two major ones are the llps (405 against an expected 125.96) and the chi square value, which random data would exceed only 0.01% of the time.


hash_CRC typical

Testing hash_CRC typical on 200 samples
Entropy = 6.059310 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 24 percent.

Chi square distribution for 200 samples is 934.08, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 94.7650 (127.5 = random).
Monte Carlo value for Pi is 3.272727273 (error 4.17 percent).
Serial correlation coefficient is 0.129518 (totally uncorrelated = 0.0).

hash_CRC typical 10000/100: llps = 146, expecting 125.959

The chi square percentage (0.01%) and the arithmetic mean are as poor as for hash_CRC low. However, the entropy, the Monte Carlo value for pi and the llps have improved dramatically.


base256 low

Testing base256 low on 200 samples
Entropy = 3.987359 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 50 percent.

Chi square distribution for 200 samples is 4146.88, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 101.0700 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.290495 (totally uncorrelated = 0.0).

base256 low 10000/100: llps = 10000, expecting 125.959

This is the largest difference between expected and actual llps so far: 10000 against 125.96. The chi square value is also very large, which is undesirable.
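
To show what an llps of 10000 implies, here is a small sketch. The report does not define llps, so this assumes it is the largest number of keys that end up in a single table slot; the function name longest_chain is hypothetical and the lab's own measurement code may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Largest number of keys that land in one slot of a table with table_size
   slots, given the hash value of each key.
   Assumption: this is what "llps" measures; the report does not say.
   Returns 0 if memory for the counters cannot be allocated. */
size_t longest_chain(const unsigned long *hashes, size_t n_keys,
                     size_t table_size)
{
    size_t *load;
    size_t i, worst = 0;

    assert(hashes != NULL && table_size > 0);

    load = calloc(table_size, sizeof *load);
    if (load == NULL)
        return 0;

    for (i = 0; i < n_keys; i++) {
        size_t slot = (size_t)(hashes[i] % table_size);
        if (++load[slot] > worst)
            worst = load[slot];
    }

    free(load);
    return worst;
}
```

With n_keys = 10000 and table_size = 100 the average is 100 keys per slot, so a perfectly even spread would return 100. Under the assumption above, a result of 10000 means that every key landed in the same slot.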


base256 typical

Testing base256 typical on 200 samples
Entropy = 3.987359 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 50 percent.

Chi square distribution for 200 samples is 4146.88, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 101.0700 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is 0.290495 (totally uncorrelated = 0.0).

base256 typical 10000/100: llps = 671, expecting 125.959

The ent statistics are identical to base256 low. The llps improves from 10000 to 671, but that is still far above the expected value.


Java_Integer_hash low

Testing Java_Integer_hash low on 200 samples
Entropy = 2.178861 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 72 percent.

Chi square distribution for 200 samples is 29048.00, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 6.1250 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.227907 (totally uncorrelated = 0.0).

Java_Integer_hash low 10000/100: llps = 109, expecting 125.959


Java_Integer_hash typical

Testing Java_Integer_hash typical on 200 samples
Entropy = 2.178861 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 72 percent.

Chi square distribution for 200 samples is 29048.00, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 6.1250 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.227907 (totally uncorrelated = 0.0).

Java_Integer_hash typical 10000/100: llps = 932, expecting 125.959


Java_Object_hash low

Testing Java_Object_hash low on 200 samples
Entropy = 2.000000 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 75 percent.

Chi square distribution for 200 samples is 12600.00, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 95.5000 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.037404 (totally uncorrelated = 0.0).

Java_Object_hash low 10000/100: llps = 10000, expecting 125.959


Java_Object_hash typical

Testing Java_Object_hash typical on 200 samples
Entropy = 4.511741 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 43 percent.

Chi square distribution for 200 samples is 3604.16, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 78.2500 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.645940 (totally uncorrelated = 0.0).

Java_Object_hash typical 10000/100: llps = 406, expecting 125.959


Java_String_hash low

Testing Java_String_hash low on 200 samples
Entropy = 7.093661 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 11 percent.

Chi square distribution for 200 samples is 214.72, and randomly
would exceed this value 95.00 percent of the times.

Arithmetic mean value of data bytes is 130.9200 (127.5 = random).
Monte Carlo value for Pi is 3.030303030 (error 3.54 percent).
Serial correlation coefficient is 0.052529 (totally uncorrelated = 0.0).

Java_String_hash low 10000/100: llps = 109, expecting 125.959


Java_String_hash typical

Testing Java_String_hash typical on 200 samples
Entropy = 6.193853 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 22 percent.

Chi square distribution for 200 samples is 839.36, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 108.6100 (127.5 = random).
Monte Carlo value for Pi is 3.515151515 (error 11.89 percent).
Serial correlation coefficient is 0.103661 (totally uncorrelated = 0.0).

Java_String_hash typical 10000/100: llps = 123, expecting 125.959


rand low

Testing rand low on 200 samples
Entropy = 4.057145 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 49 percent.

Chi square distribution for 200 samples is 13296.32, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 44.6150 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.045063 (totally uncorrelated = 0.0).

rand low 10000/100: llps = 132, expecting 125.959

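Compared with the other hash functions, this Unix random number generator is neither the best nor the worst: its ent statistics are poor (entropy of only 4.06 bits per byte), but its llps of 132 is close to the expected value.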

rand typical

Testing rand typical on 200 samples
Entropy = 4.057145 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 49 percent.

Chi square distribution for 200 samples is 13296.32, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 44.6150 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is -0.045063 (totally uncorrelated = 0.0).

rand typical 10000/100: llps = 132, expecting 125.959

Similar results to rand low.


high_rand low

Testing high_rand low on 200 samples
Entropy = 0.000000 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 100 percent.

Chi square distribution for 200 samples is 51000.00, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 0.0000 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is undefined (all values equal!).

high_rand low 10000/100: llps = 133, expecting 125.959

This is the least random result of everything tested: every sampled byte is 0, so the entropy is 0 and the serial correlation coefficient is undefined. I expect the typical case to be similar.


high_rand typical

Testing high_rand typical on 200 samples
Entropy = 0.000000 bits per byte.

Optimum compression would reduce the size
of this 200 byte file by 100 percent.

Chi square distribution for 200 samples is 51000.00, and randomly
would exceed this value 0.01 percent of the times.

Arithmetic mean value of data bytes is 0.0000 (127.5 = random).
Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).
Serial correlation coefficient is undefined (all values equal!).

high_rand typical 10000/100: llps = 133, expecting 125.959

Similar to high_rand low.


Conclusion and Functions Ranked in Order of Randomness

The Unix random number functions (rand and high_rand) score poorly on the ent statistics compared to the other hash functions and do not produce true randomness.

I have ranked the functions as follows (based on my judgment):

1) BuzHash (very dense information storage)
2) BuzHashn (dense information storage)
3) Java_String_hash (good entropy on the low keys, not so good on the typical keys)
4) hash_CRC (good entropy on the typical keys, not so good on the low keys)
5) Java_Object_hash
6) Java_Integer_hash
7) base256
8) rand
9) high_rand
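
The entropy figures behind this ranking are bounded by the small sample size. As a worked sketch, assuming ent computes the usual Shannon entropy over byte frequencies, a sample of N = 200 bytes can contain at most 200 distinct values, so the reported entropy can never reach 8 bits per byte:

```latex
H = -\sum_{i=0}^{255} p_i \log_2 p_i
  \;\le\; \log_2 \min(N, 256)
  = \log_2 200 \approx 7.64 \text{ bits per byte}

% Example: four byte values, each occurring 50 times out of 200
H = -4 \cdot \tfrac{1}{4} \log_2 \tfrac{1}{4} = 2 \text{ bits per byte}
```

Against that ceiling of about 7.64, the values of roughly 7.0 to 7.1 for Buzhash, Buzhashn and Java_String_hash low are close to the best possible, while the 2.000000 reported for Java_Object_hash low is what four equally frequent byte values would give.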

http://www.fourmilab.ch/random/