SE250:lab-5:klod004

From Marks Wiki
Jump to navigation Jump to search

Lab 5 This lab was dealing with hashing, which I think is a rather confusing topic. There are only 2 tasks involved. First is to compare the randomness of the Hash Functions provided and the other to measure the performance of the hash functions.


Task 1


Arithmetic Mean: 127.5 is the expected value (Centre between highest and lowest values).

Monte Carlo: the error generated by this number should be as close to 0 as possible for higher randomness.

Serial Correlation: value should be close to 0 to show that there is no relationship between the values.


Sample_size is the amount of values that we want to put into the table.

n_keys is the number of rows in the table.

table_size is the size of the table itself.



The values for sample_size, n_keys and table_size was:


Results are in the format: Entropy, Chi Square, Arithmetic Mean, Monte Carlo, Serial Correlation, llps, expectation


Trial 1: 70, 100, 50. Results: 6.003258, 226.67, 125.2917, 3.0000, -.106681, 3, 2.94456.

Trial 2: 70, 200, 50. Results: 6.003258, 226.67, 125.2917, 3.0000, -0.106681, 9, 9.11619.

Trial 3: 70, 50, 200. Results: 6.003258, 226.67, 125.2917, 3.0000, -1.006681, 2, 2.37434.

Trial 4: 10000, 100, 2000. Results: 7.985498, 201.50, 125.8253, 3.181272509, -0.000047, 2, 1.9509.

Trial 5: 5000, 40, 1000. Results: 7.969523, 208.17, 127.0374, 3.121248499, 0.009778, 2, 1.55167.

Trial 6: 50, 4, 20. Results:5.585055, 233.54, 125.6731, 3.0000, -0.180685, 1, 1.30341

Trial 7: 10000, 1000, 100000. Results: 7.985498, 99%, 125.8253, 1.26%, -0.000047, 2, 2.00948


Now after talking with the lecturer, different set of trials were made; the typical entropy set was used as well; other functions were also used to find good values for n_keys and table_size. The value for the sample space was considered to be good as the entropy was more settled at 5000+ values.


After the new trials, all functions with typical and low entropy; the results yielded was: Values used were: SS = 10000, keys = 1000, table = 50000.


Buzhash low          7.98550 99.00%  125.825 1.26%   -0.000047
Buzhash typical      7.97970 25.00%  127.177 1.95%   -0.007153
Buzhashn low         7.98214 50.00%  127.589 2.10%   0.000600
Buzhashn typical     7.98214 50.00%  127.589 2.10%   0.000600
hash_CRC low         4.85247 0.01%   99.261  26.10%  -0.285059
hash_CRC typical     7.77880 0.01%   123.087 1.11%   -0.002250
base256 low          0.00000 0.01%   97.000  27.32%  undefined
base256 typical      3.90309 0.01%   107.987 27.32%  0.079032
Java Integer low     4.11606 0.01%   32.385  27.32%  -0.217081
Java Integer typical 4.11606 0.01%   32.385  27.32%  -0.217081
Java Object low      2.00000 0.01%   77.000  27.32%  -0.521556
Java Object typical  4.93851 0.01%   114.920 1.26%   -0.404698
Java String low      7.99463 99.99%  127.867 1.95%   0.009136
Java String typical  7.89301 0.01%   126.009 1.42%   -0.021360
RAND low             7.93533 0.01%   112.054 10.82%  -0.066125
RAND typical         7.93951 0.01%   111.686 10.82%  -0.060458
High RAND low        7.98301 75.00%  127.541 0.42%   0.001185
High RAND typical    7.98338 75.00%  128.211 1.49%   -0.014177


Values used were: SS = 10000, keys = 300, table = 50000.


Buzhash low           7.98550 99.00%  125.825 1.26%   -0.000047
Buzhash typical       7.97970 25.00%  127.177 1.95%   -0.007153
Buzhashn low          7.98214 50.00%  127.589 2.10%   0.000600
Buzhashn typical      7.98214 50.00%  127.589 2.10%   0.000600
hash_CRC low          4.85247 0.01%   99.261  26.10%  -0.285059
hash_CRC typical      7.77880 0.01%   123.087 1.11%   -0.002250
base256 low           0.00000 0.01%   97.000  27.32%  undefined
base256 typical       3.90309 0.01%   107.987 27.32%  0.079032
Java Integer low      4.11606 0.01%   32.385  27.32%  -0.217081
Java Integer typical  4.11606 0.01%   32.385  27.32%  -0.217081
Java Object low       2.00000 0.01%   77.000  27.32%  -0.521556
Java Object typical   4.93851 0.01%   122.920 28.77%  -0.427386
Java String low       7.99463 99.99%  127.867 1.95%   0.009136
Java String typical   7.89301 0.01%   126.009 1.42%   -0.021360
RAND low              7.93533 0.01%   112.054 10.82%  -0.066125
RAND typical          7.93951 0.01%   111.686 10.82%  -0.060458
High RAND low         7.98301 75.00%  127.541 0.42%   0.001185
High RAND typical     7.98338 75.00%  128.211 1.49%   -0.014177

Values used were: SS = 10000, keys = 200, table = 20000.

Buzhash low           7.98550 99.00%  125.825 1.26%   -0.000047
Buzhash typical       7.97970 25.00%  127.177 1.95%   -0.007153 
Buzhashn low          7.98214 50.00%  127.589 2.10%   0.000600
Buzhashn typical      7.98214 50.00%  127.589 2.10%   0.000600
hash_CRC low          4.85247 0.01%   99.261  26.10%  -0.285059
hash_CRC typical      7.77880 0.01%   123.087 1.11%   -0.002250
base256 low           0.00000 0.01%   97.000  27.32%  undefined
base256 typical       3.90309 0.01%   107.987 27.32%  0.079032
Java Integer low      4.11606 0.01%   32.385  27.32%  -0.217081
Java Integer typical  4.11606 0.01%   32.385  27.32%  -0.217081
Java Object low       2.00000 0.01%   77.000  27.32%  -0.521556
Java Object typical   4.93851 0.01%   114.920 1.26%   -0.404698
Java String low       7.99463 99.99%  127.867 1.95%   0.009136
Java String typical   7.89301 0.01%   126.009 1.42%   -0.021360
RAND low              7.93533 0.01%   112.054 10.82%  -0.066125
RAND typical          7.93951 0.01%   111.686 10.82%  -0.060458
High RAND low         7.98301 75.00%  127.541 0.42%   0.001185
High RAND typical     7.98338 75.00%  128.211 1.49%   -0.014177


With all these experiments, and by mostly trying to pass the Chi test as this was understood to be the hardest test to pass, the Buzhashn function was ranked to be the best function for hashing.

There was no time for me to do task 2, but all I could understand was that the values gotten from llps and llps_i should be as close to the expected value as possible.