SE250:lab-5:hpan027


Initial problems

A lot of time at the start of the lab went into trying to understand the code and the statistical results.


Determining the sample size

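To determine the sample size, I ran ent_test on Buzhash (low-entropy source) with a series of sample sizes. In this table the number after the function name is the sample size. The remaining columns, here and in the part one results, are in order: entropy (bits per byte), chi-square percentage, arithmetic mean of the bytes, Monte Carlo error in estimating pi, and serial correlation coefficient.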

Buzhash low 5		3.00000	50.00%	132.875	100.00%	-0.509682
Buzhash low 10		3.58496	50.00%	107.667	36.34%	0.235256
Buzhash low 100		6.15366	2.50%	130.460	3.45%	-0.088601
Buzhash low 1000	7.84646	97.50%	126.550	2.01%	0.006193
Buzhash low 10000	7.97970	25.00%	127.177	1.95%	-0.007153
Buzhash low 100000	7.99827	50.00%	127.587	0.14%	0.000712
Buzhash low 1000000	7.99989	99.99%	127.501	0.23%	-0.000832

The conclusion was that the results stop varying much at around 10,000 samples, so 10,000 was chosen as the sample size for the rest of the tests.
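
As a rough illustration of what the entropy and arithmetic mean statistics measure, here is a minimal Python sketch. It is not the ent_test code used in the lab; the function names are made up for this example, and it assumes the statistics are computed over the individual bytes of the hash output.

```python
import math
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution (0 to 8 bits per byte)."""
    n = len(data)
    if n == 0:
        return 0.0
    counts = Counter(data)  # how often each byte value 0..255 occurs
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def arithmetic_mean(data: bytes) -> float:
    """Mean byte value; uniformly distributed bytes average 127.5."""
    return sum(data) / len(data)

uniform = bytes(range(256))   # every byte value exactly once
constant = bytes([97]) * 100  # the same byte repeated

print(entropy_bits_per_byte(uniform), arithmetic_mean(uniform))
print(entropy_bits_per_byte(constant), arithmetic_mean(constant))
```

A sample that contains only k distinct byte values cannot score above log2(k) bits, which is one reason very small samples sit well below 8 bits per byte.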


Results for part one

rand low		7.71844	0.01%	110.541	8.92%	-0.048389
high rand low		7.79205	25.00%	134.546	4.12%	-0.028254
buzhash low		7.84379	95.00%	128.086	0.29%	-0.017268
buzhashn low		7.82387	90.00%	127.373	1.06%	-0.007118
hash_CRC low		4.04588	0.01%	94.848	27.32%	-0.395249
base256 low		0.00000	0.01%	97.000	27.32%	undefined
Java_Integer low	2.79173	0.01%	31.125	27.32%	-0.230200
Java_Object low		2.00000	0.01%	77.000	27.32%	-0.521556
Java_String low		7.91760	99.99%	126.441	1.25%	0.003240

rand high		7.76960	5.00%	112.412	11.98%	-0.044490
high rand high		7.82756	90.00%	128.999	1.82%	-0.025330
buzhash high		7.79778	50.00%	126.574	4.31%	-0.007005
buzhashn high		7.82387	90.00%	127.373	1.06%	-0.007118
hash_CRC high		7.20246	0.01%	114.932	2.01%	-0.032076
base256 high		3.91922	0.01%	106.410	27.32%	0.217294
Java_Integer high	2.79173	0.01%	31.125	27.32%	-0.230200
Java_Object high	3.77034	0.01%	41.971	27.32%	-0.099688
Java_String high	7.37782	0.01%	117.390	8.92%	-0.013887


Conclusions for part one

It was very difficult to rank the functions by "randomness" because it is hard to weigh the statistical tests against each other. In the end, the order was decided by how many categories each function "won" and how many it "lost".

1) buzhashn
2) buzhash
3) high rand
4) Java_String
5) rand
6) hash_CRC
7) base256
8) Java_Object
9) Java_Integer
  • Surprisingly, in terms of "score", switching between a low-entropy and a high-entropy source made little difference to how the hash functions performed. The ranking above uses an overall score across both sources, but the standings would not change much if the functions were ranked separately for each source.
  • The ranking would probably change with sample size, as was clear earlier when buzhash was run at several sample sizes. Some functions likely perform better within a certain range of sample sizes and so are disadvantaged by this particular choice.
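
The "won"/"lost" scoring can be sketched roughly as follows. The exact rule used for the ranking is not written down here, so this is only one plausible reading: for each statistic, the function closest to an assumed ideal value wins a point and the one farthest away loses a point. The ideal values, the names IDEAL, RESULTS and win_loss_scores, and the choice to leave out the chi-square column (its ideal is a range rather than a single value) are all assumptions of this sketch. The four rows are copied from the low-entropy results above.

```python
# Assumed ideal value for each statistic (not taken from the lab handout).
IDEAL = {"entropy": 8.0, "mean": 127.5, "pi_error": 0.0, "serial_corr": 0.0}

# Four rows from the low-entropy table in "Results for part one".
RESULTS = {
    "rand low":        {"entropy": 7.71844, "mean": 110.541, "pi_error": 8.92,  "serial_corr": -0.048389},
    "buzhash low":     {"entropy": 7.84379, "mean": 128.086, "pi_error": 0.29,  "serial_corr": -0.017268},
    "hash_CRC low":    {"entropy": 4.04588, "mean": 94.848,  "pi_error": 27.32, "serial_corr": -0.395249},
    "Java_String low": {"entropy": 7.91760, "mean": 126.441, "pi_error": 1.25,  "serial_corr": 0.003240},
}

def win_loss_scores(results, ideal):
    """Return wins minus losses for each function across all statistics."""
    scores = {name: 0 for name in results}
    for stat, target in ideal.items():
        # Distance of every function from the ideal value of this statistic.
        distance = {name: abs(row[stat] - target) for name, row in results.items()}
        scores[min(distance, key=distance.get)] += 1  # closest to ideal wins
        scores[max(distance, key=distance.get)] -= 1  # farthest from ideal loses
    return scores

scores = win_loss_scores(RESULTS, IDEAL)
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(ranking)
```

A scheme like this treats every statistic as equally important, which is exactly the weighting problem described above, so a different choice of ideals or weights could reorder the middle of the ranking.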


Conclusions for part two

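  • Buzhash is largely consistent with the expected llps for both sources
  • Java_String performs better than expected with the low-entropy source (its llps is below the expected value in every test) and about as expected with the high-entropy source
  • Java_Object seems to be broken with a low-entropy source: its llps equals the number of keys in every test
  • hash_CRC tends to have a higher llps than expected, most clearly with the low-entropy source and a table size of 10000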
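
To make the llps figures easier to read, here is a small Python sketch. It assumes that llps is the length of the longest chain (the most keys landing in one bucket) and that a pair such as 1000/10000 means number of keys / table size; neither is spelled out on this page. The lab's formula for the "expecting" value is not shown here either, so the sketch estimates it by simulating an ideal uniform hash instead. All function names are hypothetical.

```python
import random
from collections import Counter

def llps(keys, hash_fn, table_size):
    """Largest number of keys that hash_fn sends to a single bucket."""
    buckets = Counter(hash_fn(k) % table_size for k in keys)
    return max(buckets.values(), default=0)

def simulated_llps(n_keys, table_size, trials=20, seed=0):
    """Average llps when every key picks a bucket uniformly at random."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    total = 0
    for _ in range(trials):
        buckets = Counter(rng.randrange(table_size) for _ in range(n_keys))
        total += max(buckets.values())
    return total / trials

def constant_hash(key):
    """Worst case: every key gets the same hash value."""
    return 42

def identity_hash(key):
    """Best case for distinct small integers: no two keys collide."""
    return key

keys = range(1000)
print(llps(keys, constant_hash, 10000))   # every key shares one bucket
print(llps(keys, identity_hash, 10000))   # every key has its own bucket
print(simulated_llps(1000, 10000))        # estimate for an ideal uniform hash
```

Under these assumptions, an llps equal to the number of keys, as measured for Java_Object_hash with the low-entropy source, is what a hash function produces when it maps every key to the same bucket.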

Data for part two

Buzhash low 1000/10000: llps = 4, expecting 2.82556
hash_CRC low 1000/10000: llps = 2, expecting 2.82556
Java_Object_hash low 1000/10000: llps = 1000, expecting 2.82556
Java_String_hash low 1000/10000: llps = 1, expecting 2.82556

Buzhash low 1000000/100000: llps = 26, expecting 26.6057
hash_CRC low 1000000/100000: llps = 25, expecting 26.6057
Java_Object_hash low 1000000/100000: llps = 1000000, expecting 26.6057
Java_String_hash low 1000000/100000: llps = 18, expecting 26.6057

Buzhash low 10000/10000: llps = 7, expecting 6.67222
hash_CRC low 10000/10000: llps = 12, expecting 6.67222
Java_Object_hash low 10000/10000: llps = 10000, expecting 6.67222
Java_String_hash low 10000/10000: llps = 5, expecting 6.67222

Buzhash low 20000/10000: llps = 10, expecting 9.37449
hash_CRC low 20000/10000: llps = 22, expecting 9.37449
Java_Object_hash low 20000/10000: llps = 20000, expecting 9.37449
Java_String_hash low 20000/10000: llps = 6, expecting 9.37449

Buzhash low 40000/10000: llps = 15, expecting 13.7119
hash_CRC low 40000/10000: llps = 29, expecting 13.7119
Java_Object_hash low 40000/10000: llps = 40000, expecting 13.7119
Java_String_hash low 40000/10000: llps = 7, expecting 13.7119

Buzhash low 50000/10000: llps = 16, expecting 15.6448
hash_CRC low 50000/10000: llps = 49, expecting 15.6448
Java_Object_hash low 50000/10000: llps = 50000, expecting 15.6448
Java_String_hash low 50000/10000: llps = 10, expecting 15.6448


Buzhash low 100000/10000: llps = 22, expecting 24.2788
hash_CRC low 100000/10000: llps = 69, expecting 24.2788
Java_Object_hash low 100000/10000: llps = 100000, expecting 24.2788
Java_String_hash low 100000/10000: llps = 16, expecting 24.2788



Buzhash high 10000/100000: llps = 3, expecting 3.3271
hash_CRC high 10000/100000: llps = 3, expecting 3.3271
Java_Object_hash high 10000/100000: llps = 2, expecting 3.3271
Java_String_hash high 10000/100000: llps = 4, expecting 3.3271

Buzhash high 100000/100000: llps = 7, expecting 7.75952
hash_CRC high 100000/100000: llps = 9, expecting 7.75952
Java_Object_hash high 100000/100000: llps = 16, expecting 7.75952
Java_String_hash high 100000/100000: llps = 8, expecting 7.75952


Buzhash high 20000/10000: llps = 9, expecting 9.37449
hash_CRC high 20000/10000: llps = 11, expecting 9.37449
Java_Object_hash high 20000/10000: llps = 25, expecting 9.37449
Java_String_hash high 20000/10000: llps = 10, expecting 9.37449

Buzhash high 30000/10000: llps = 11, expecting 11.6473
hash_CRC high 30000/10000: llps = 12, expecting 11.6473
Java_Object_hash high 30000/10000: llps = 36, expecting 11.6473
Java_String_hash high 30000/10000: llps = 11, expecting 11.6473

Buzhash high 40000/10000: llps = 14, expecting 13.7119
hash_CRC high 40000/10000: llps = 15, expecting 13.7119
Java_Object_hash high 40000/10000: llps = 49, expecting 13.7119
Java_String_hash high 40000/10000: llps = 13, expecting 13.7119