SE250:lab-5:dols008


Task 1

I used a sample size of 100,000 because larger samples seemed to make very little difference to the results.

hash function             entropy  chi-sq exceed  mean     pi error  serial correlation
Buzhash low               7.99830  75.00%         127.578  0.71%     0.000327
Buzhash typical           7.99783  2.50%          127.374  0.77%     -0.000076
Buzhash n low             7.99824  50.00%         127.494  0.13%     -0.003092
Buzhash n typical         7.99824  50.00%         127.494  0.13%     -0.003092
Hash CRC low              5.59831  0.01%          81.783   24.53%    0.028545
Hash CRC typical          7.84046  0.01%          122.862  1.98%     0.018518
Base256 low               0.00000  0.01%          97.000   27.32%    undefined
Base256 typical           4.02297  0.01%          107.853  27.32%    0.034082
Add java integer low      4.82824  0.01%          43.883   27.32%    -0.092002
Add java integer typical  4.82824  0.01%          43.883   27.32%    -0.092002
Add java object low       2.00000  0.01%          77.000   27.32%    -0.521556
Add java object typical   5.72209  0.01%          117.318  2.95%     -0.350088
Add java string low       7.99957  99.99%         127.627  0.32%     -0.000272
Add java string typical   7.94554  0.01%          126.139  0.27%     0.021181
Add rand low              7.95308  0.01%          111.441  11.17%    -0.051837
Add rand typical          7.95272  0.01%          111.395  10.65%    -0.049131
Add high rand low         7.99828  75.00%         127.441  0.75%     -0.001213
Add high rand typical     7.99807  50.00%         127.406  0.07%     -0.002226

The more random a hash function's output is, the better. So I expect the better hash functions to have higher entropy (closer to 8 bits per byte), a mean closer to 127.5 and a correlation closer to 0. I don't understand the chi square distribution. Good hash functions would probably result in a more accurate calculation of pi, but the other numbers seem more concrete. So, based on these criteria, I would rank the hash functions from best to worst like this:

Buzhash n, Buzhash, java string, hash CRC, java object, java integer, base256.

It does look like high rand is more random than rand, and is about as good as the better hash functions. Java string seems to be better than everything else for low entropy input, but not quite as good for typical entropy input.
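To make the criteria above more concrete, here is a minimal Python sketch of how entropy, mean, a chi-square statistic and serial correlation could be computed for a stream of bytes. It assumes the usual textbook definitions of these measures; the function name byte_stats is made up for illustration, and this is not the code of the tool that produced the table. It does not compute the "exceed" percentage or the pi estimate.

```python
import math
from collections import Counter


def byte_stats(data: bytes):
    """Return (entropy, mean, chi_square, serial_correlation) for a byte stream.

    Hypothetical helper for illustration only; not the tool used in the lab.
    """
    n = len(data)
    if n == 0:
        raise ValueError("need at least one byte")
    counts = Counter(data)

    # Shannon entropy in bits per byte: 8.0 means all 256 values are equally likely.
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())

    # Arithmetic mean of the byte values: 127.5 for a perfectly uniform stream.
    mean = sum(data) / n

    # Chi-square statistic against a uniform distribution over the 256 byte values.
    expected = n / 256
    chi_square = sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))

    # Serial correlation between each byte and the next one (wrapping at the end).
    sum_x = sum(data)
    sum_xx = sum(b * b for b in data)
    sum_xy = sum(data[i] * data[(i + 1) % n] for i in range(n))
    denom = n * sum_xx - sum_x ** 2
    # A constant stream has zero variance, so the correlation is undefined.
    correlation = (n * sum_xy - sum_x ** 2) / denom if denom else None

    return entropy, mean, chi_square, correlation


# A stream made of one repeated byte value.
print(byte_stats(bytes([97]) * 1000))
```

With these definitions a stream that repeats the single byte 97 has entropy 0, mean 97 and an undefined correlation, which is the same pattern as the Base256 low row in the table.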

Task 2

Buzhash n and java integer didn't work because they have different function signatures from the other hash functions. Here are my results:

Buzhash low 60000/40000:		llps = 8,	expecting 8.8452
Buzhash typical 60000/40000:		llps = 9,	expecting 8.8452
Hash CRC low 60000/40000:		llps = 17,	expecting 8.8452
Hash CRC typical 60000/40000:		llps = 12,	expecting 8.8452
Base256 low 60000/40000:		llps = 60000,	expecting 8.8452
Base256 typical 60000/40000:		llps = 2020,	expecting 8.8452
Java object low 60000/40000:		llps = 60000,	expecting 8.8452
Java object typical 60000/40000:	llps = 22,	expecting 8.8452
Java string low 60000/40000:		llps = 4,	expecting 8.8452
Java string typical 60000/40000:	llps = 10,	expecting 8.8452

Java string is debatably the best hash function. Buzhash performed slightly better for typical data (llps of 9 versus 10), and a fair bit worse for low entropy data (8 versus 4). The prize for worst hash function goes to base256, with java object similarly terrible: both put all 60000 keys in a single probe sequence for low entropy data.
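The llps figures above can be illustrated with a small Python sketch. It assumes that llps means the largest number of keys that land in any one slot of the table, and that the "Java string" hash follows the multiply-by-31 scheme of Java's String.hashCode. Both are assumptions about the lab code, and the names java_string_hash and llps, as well as the generated keys, are hypothetical.

```python
from collections import Counter


def java_string_hash(key: str) -> int:
    """Multiply-by-31 string hash in the style of Java's String.hashCode.

    Kept as an unsigned 32-bit value here, which is an assumption for simplicity.
    """
    h = 0
    for ch in key:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def llps(keys, hash_fn, table_size: int) -> int:
    """Length of the longest probe sequence, taken here as the fullest slot."""
    counts = Counter(hash_fn(k) % table_size for k in keys)
    return max(counts.values(), default=0)


# Hypothetical keys: 60000 of them into a table of 40000 slots, as in the results above.
keys = [f"key{i}" for i in range(60000)]
print("java string style:", llps(keys, java_string_hash, 40000))
print("constant hash:    ", llps(keys, lambda k: 0, 40000))
```

A hash function that sends every key to the same slot gives an llps equal to the number of keys, which is what the llps = 60000 rows for base256 and java object on low entropy input look like.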