SE250:Lab-2006-05-09:mgar059

From Marks Wiki
Jump to navigation Jump to search

Lab Report for 9 May

Hashing: theory and practice

Question 1

The likelyhood of getting bins with chain lenghts "k" for a 1000 sided dice to number of runs were calculated. The first values are using poissons formula. As the number of rolls increase, it is clear that the probability of getting empty bins decreases and the chains start to get longer.

for 1000 rolls

For a k value of 0 the probability is: 36.8%
For a k value of 1 the probability is: 36.8%
For a k value of 2 the probability is: 18.4%
For a k value of 3 the probability is: 6.1%
For a k value of 4 the probability is: 1.5%
For a k value of 5 the probability is: 0.3%
For a k value of 6 the probability is: 0.1%
For a k value of 7 the probability is: 0.0%
For a k value of 8 the probability is: 0.0%
For a k value of 9 the probability is: 0.0%

for 2000 rolls

For a k value of 0 the probability is: 13.5%
For a k value of 1 the probability is: 27.1%
For a k value of 2 the probability is: 27.1%
For a k value of 3 the probability is: 18.0%
For a k value of 4 the probability is: 9.0%
For a k value of 5 the probability is: 3.6%
For a k value of 6 the probability is: 1.2%
For a k value of 7 the probability is: 0.3%
For a k value of 8 the probability is: 0.1%
For a k value of 9 the probability is: 0.0%

for 5000 rolls

For a k value of 0 the probability is: 0.7%
For a k value of 1 the probability is: 3.4%
For a k value of 2 the probability is: 8.4%
For a k value of 3 the probability is: 14.0%
For a k value of 4 the probability is: 17.5%
For a k value of 5 the probability is: 17.5%
For a k value of 6 the probability is: 14.6%
For a k value of 7 the probability is: 10.4%
For a k value of 8 the probability is: 6.5%
For a k value of 9 the probability is: 3.6%

Question 2

To verify the results in part one, an experiment was undertaken to find actual valuse of a distribution in a test. Random data was obtained from www.random.org. Four sets of data follow. These results refltect the theretical results from question 1. These results do vary a bit but appear to be are relativly consistant.

0:	378	(expected 367, Std.dev = 19.2)
1:	368	(expected 367, Std.dev = 19.2)
2:	164	(expected 183, Std.dev = 13.5)
3:	66	(expected 61, Std.dev = 7.8)
4:	16	(expected 15, Std.dev = 3.9)
5:	6	(expected 3, Std.dev = 1.7)

0:	363	(expected 367, Std.dev = 19.2)
1:	381	(expected 367, Std.dev = 19.2)
2:	169	(expected 183, Std.dev = 13.5)
3:	69	(expected 61, Std.dev = 7.8)
4:	16	(expected 15, Std.dev = 3.9)

0:	366	(expected 367, Std.dev = 19.2)
1:	364	(expected 367, Std.dev = 19.2)
2:	194	(expected 183, Std.dev = 13.5)
3:	60	(expected 61, Std.dev = 7.8)
4:	12	(expected 15, Std.dev = 3.9)

0:	380	(expected 367, Std.dev = 19.2)
1:	351	(expected 367, Std.dev = 19.2)
2:	183	(expected 183, Std.dev = 13.5)
3:	63	(expected 61, Std.dev = 7.8)
4:	21	(expected 15, Std.dev = 3.9)


For larger sets of numbers, the nextInt method was used instead of random data tables. These results also largly reflect the thoeretical results from question 1.

For 2000 using nextInt

0:	1134	(expected 735, Std.dev = 27.1)
1:	279	(expected 735, Std.dev = 27.1)
2:	259	(expected 367, Std.dev = 19.2)
3:	193	(expected 122, Std.dev = 11.0)
4:	78	(expected 30, Std.dev = 5.5)
5:	33	(expected 6, Std.dev = 2.4)
6:	21	(expected 1, Std.dev = 1.0)

For 5000 using nextInt

0:	4003	(expected 1839, Std.dev = 42.9)
1:	38	(expected 1839, Std.dev = 42.9)
2:	94	(expected 919, Std.dev = 30.3)
3:	133	(expected 306, Std.dev = 17.5)
4:	174	(expected 76, Std.dev = 8.7)
5:	166	(expected 15, Std.dev = 3.9)
6:	146	(expected 2, Std.dev = 1.4)
7:	112	(expected 0, Std.dev = 0.0)
8:	71	(expected 0, Std.dev = 0.0)
9:	31	(expected 0, Std.dev = 0.0)
10:	15	(expected 0, Std.dev = 0.0)
11:	14	(expected 0, Std.dev = 0.0)
12:	1	(expected 0, Std.dev = 0.0)

Visualisations

Question 1

The visualisation provides a graphical representation of the where elements are being stored and gives an idea of chain lengths in each bin.


  • Integer Hash (key 1): The Integer hash appears to be very good. The visualisation shows an even distribution. Every bin is used and all the chain lengths appear to be the same.


  • Integer(i*300) Hash (key 2): This hash appears to be relativly good for some bin sizes and poor for others. Frequently there are unused bins and very long chains. 660 is one such example. This function also has a high cost.


  • String Hash (key 3): This hash appears to be quite good for the majority of the time. Occasionaly however, the hashes are concentrated on some of the bins while others have none.


  • getWord Hash (key 4): This hash appears to be quite good. The keys were distributed randonly and stayed such with changes to the size of keys and bins.


  • Hash.hash(i) Hash (key 5): This hash appears to be quite good. The keys were distributed randonly and stayed such with changes to the size of keys and bins.


  • Hash.hash(i*300) Hash (key 6): This hash appears to be quite good. The keys were distributed randonly and stayed such with changes to the size of keys and bins.

The remaining hashes from keys 7-9 look very much like hash 4-6 displaying similar properties.


Question 2

JavaHash, ELFHash, KnuthHash and RCSHash appear to have a definate structure. Even the buzhash has a vague structure. Only the randomhash appeared to be truely random.

Questions

Question 1

The randomness comes from the process in hashing the data.

Question 2

Use a hash function which is very random and cheap to calcualte. Vary the hash function used so that the attacker does not know which hash to attack.

Question 3

By studying the code for a paticular hash. Identifying patterns in the hashes e.g. For some hash functions parts of the input may not be cosidered when entering the hash function. These could be changed without effecting the output hash.