SE250:lab-5:jhor053

From Marks Wiki

Lab 5

Task 1

With:

int sample_size = 1000;
int n_keys = 200;
int table_size = 100;

My sample size is fairly large because we wanted the data to be tested thoroughly enough to judge whether it is random. I also chose a keys-to-table-size ratio of 2 (200 keys for 100 slots) to check that each hash function can handle more keys than there are table slots.
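To make the keys-to-table-size ratio concrete, here is a minimal C sketch of that kind of experiment. It is not the lab's code. The key format ("key0", "key1", ...), the use of FNV-1a as the hash and the helper names are all assumptions for illustration. It hashes n_keys keys into table_size buckets and measures how far the bucket counts stray from the expected n_keys / table_size = 2 keys per bucket.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* FNV-1a (32-bit): a swapped-in example hash, NOT one of the lab's hash functions. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: hash the keys "key0".."key{n_keys-1}" into a caller-supplied
   counts[0..table_size-1], then return the chi-square statistic of those bucket
   counts against a perfectly even spread of n_keys / table_size per bucket. */
static double bucket_chi_square(int n_keys, int table_size, int counts[]) {
    char key[32];
    assert(n_keys > 0 && table_size > 0);
    memset(counts, 0, (size_t)table_size * sizeof counts[0]);
    for (int i = 0; i < n_keys; i++) {
        snprintf(key, sizeof key, "key%d", i);
        counts[fnv1a(key) % (uint32_t)table_size]++;
    }
    double expected = (double)n_keys / table_size; /* the load factor, 2.0 here */
    double chi = 0.0;
    for (int b = 0; b < table_size; b++) {
        double d = counts[b] - expected;
        chi += d * d / expected;
    }
    return chi;
}
```

With n_keys = 200 and table_size = 100, a well-behaved hash should give a chi-square value of around table_size - 1 = 99 on average. Much larger values mean the keys are clumping into a few buckets. With only 2 keys expected per bucket, the usual chi-square approximation is fairly rough, so treat the number as a guide.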

My results are:

For low_entropy_src:
Type          Entropy  ChiSq   Mean     Pi % err  Serial corr.
buzhash       7.84379  95.00%  128.086  0.29%     -0.017268
buzhashn      7.82387  90.00%  127.373  1.06%     -0.007118
hash_CRC      4.04588  0.01%   94.848   27.32%    -0.395249
base256       0.00000  0.01%   97.000   27.32%    undefined
Java_Integer  2.79173  0.01%   31.125   27.32%    -0.230200
Java_Object   2.00000  0.01%   77.000   27.32%    -0.521556
Java_String   7.91760  99.99%  126.441  1.25%     0.003240
rand          7.71844  0.01%   110.541  8.92%     -0.048389
high_rand     7.79205  25.00%  134.546  4.12%     -0.028254
high_rand       7.79205 25.00%  134.546 4.12%   -0.028254

Now for typical_entropy_src:
Type          Entropy  ChiSq   Mean     Pi % err  Serial corr.
buzhash       7.79778  50.00%  126.574  4.31%     -0.007005
buzhashn      7.82387  90.00%  127.373  1.06%     -0.007118
hash_CRC      4.21252  0.01%   92.006   26.56%    -0.465003
base256       0.00000  0.01%   97.000   27.32%    undefined
Java_Integer  2.79173  0.01%   31.125   27.32%    -0.230200
Java_Object   2.00000  0.01%   77.000   27.32%    -0.521556
Java_String   7.90224  99.99%  126.914  6.61%     0.025449
rand          7.76960  5.00%   112.412  11.98%    -0.044490
high_rand     7.82756  90.00%  128.999  1.82%     -0.025330
high_rand       7.82756 90.00%  128.999 1.82%   -0.025330
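The remaining two columns can be sketched the same way. The sketch below assumes the ent-style approach. It treats consecutive 6-byte groups as 24-bit (x, y) points, and the fraction of points landing inside a quarter circle gives an estimate of pi. The serial correlation coefficient compares each byte with the next one. The function names are hypothetical, and this is not the lab's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PI_REF 3.14159265358979323846

/* Estimate pi from consecutive 6-byte groups used as 24-bit (x, y) points,
   counting hits inside a quarter circle of radius 2^24 - 1 (assumed ent-style). */
static double monte_carlo_pi(const unsigned char *buf, size_t len) {
    const uint64_t r = (1u << 24) - 1;
    size_t tries = 0, hits = 0;
    assert(len >= 6);
    for (size_t i = 0; i + 6 <= len; i += 6) {
        uint64_t x = ((uint64_t)buf[i] << 16) | ((uint64_t)buf[i + 1] << 8) | buf[i + 2];
        uint64_t y = ((uint64_t)buf[i + 3] << 16) | ((uint64_t)buf[i + 4] << 8) | buf[i + 5];
        tries++;
        if (x * x + y * y <= r * r) hits++;
    }
    return 4.0 * (double)hits / (double)tries;
}

/* Percentage error of a pi estimate, as in the "Pi % err" column. */
static double pi_error_percent(double estimate) {
    double diff = estimate - PI_REF;
    if (diff < 0) diff = -diff;
    return 100.0 * diff / PI_REF;
}

/* Serial correlation coefficient between each byte and the next, wrapping the
   last byte round to the first. Sets *defined to 0 (and returns 0.0) when the
   denominator is zero, i.e. when every byte is identical. */
static double serial_correlation(const unsigned char *buf, size_t len, int *defined) {
    double n = (double)len, t1 = 0.0, t2 = 0.0, t3 = 0.0;
    assert(len > 1);
    for (size_t i = 0; i < len; i++) {
        double u = buf[i], v = buf[(i + 1) % len];
        t1 += u * v; /* sum of x_i * x_{i+1} */
        t2 += u;     /* sum of x_i */
        t3 += u * u; /* sum of x_i^2 */
    }
    double denom = n * t3 - t2 * t2;
    *defined = denom != 0.0;
    return *defined ? (n * t1 - t2 * t2) / denom : 0.0;
}
```

When every byte is identical, the serial correlation denominator is zero. That would explain why the base256 rows, with an entropy of 0.00000, report "undefined".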

The difference between rand and high_rand is that high_rand generally gives better results, at the cost of slightly more processing time and memory. high_rand tends to 'over' randomize, whereas rand 'under' randomizes. rand's means (110.541 and 112.412) sit well below the 127.5 expected for random bytes, so it is biased towards lower values, while high_rand's means (134.546 and 128.999) sit above it.

I would rank them from best to worst as follows:
1. buzhash: it generally produced the most random-looking output and came closest overall to the expected values for random data.
2. buzhashn: it was the most reliable, giving the same results for both the low-entropy and the typical-entropy sources.
3. high_rand: its results were a bit further off, but still acceptable.
4. rand: just below high_rand. Its values are reasonable, but they deviate further from the expected values, tending towards the low side.
5. Java_String: its results were good, but its ChiSq result of 99.99% let it down, as that sits at the extreme end of the range for randomness.
6. Java_Integer: slightly better than the functions below it, but it still effectively fails.
7. (equal) Java_Object, hash_CRC and base256: all three failed. Their results were far from the expected values for random data and generally of low quality.
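The reason for marking Java_String down can be made explicit. The documentation for the ent program gives rule-of-thumb bands for its chi-square percentage. Below 1% or above 99% is almost certainly not random, 1-5% or 95-99% is suspect, and 5-10% or 90-95% is almost suspect. This hypothetical C helper encodes those bands. How the exact boundaries are assigned is my assumption.

```c
#include <assert.h>

/* Rule-of-thumb bands for an ent-style chi-square percentage. */
enum chisq_verdict {
    CHISQ_CONSISTENT,     /* 10% to 90%: nothing suspicious */
    CHISQ_ALMOST_SUSPECT, /* 5-10% or 90-95% */
    CHISQ_SUSPECT,        /* 1-5% or 95-99% */
    CHISQ_NOT_RANDOM      /* below 1% or above 99% */
};

/* Hypothetical helper; boundary handling (< versus <=) is an assumption. */
static enum chisq_verdict classify_chisq_percent(double pct) {
    assert(pct >= 0.0 && pct <= 100.0);
    if (pct < 1.0 || pct > 99.0) return CHISQ_NOT_RANDOM;
    if (pct < 5.0 || pct > 95.0) return CHISQ_SUSPECT;
    if (pct < 10.0 || pct > 90.0) return CHISQ_ALMOST_SUSPECT;
    return CHISQ_CONSISTENT;
}
```

On these bands, Java_String's 99.99% falls in the same "not random" band as the 0.01% results. A very high percentage is as much a warning sign as a very low one.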

Task 2

Overall

A very good intro to hashing, and great input from John H explaining the different concepts and tests for randomness. The length was good (I was a bit slow this morning :S). A bit more explanation of the different tests on the handout would have helped, though.