SE250:Lab-2006-05-02:mgar059

From Marks Wiki
Jump to navigation Jump to search

Question 3

When adding a string to a HashSet of size n, what portion of the time is taken up in computing the hash code? Does the answer change for different sizes n?

Method

The average time to compute a single hash was caluclated from a number of initail operations. The sizes of the sets used ranged between 1000000 and 37000000. An average of 5 tests were used.


Test 1

The first test was using the string "a". This had limitations. Using "a" as a string for every hash function could effect the proportion of time calculating the hash as the HashSet has functionality for remembering hashs. As a consequence, the measurment of the time to calculate the hash could really be the time for the function to recognise the "a" and retrive a remembered hash. The time taken to create the string was not measured when measuring the time to take the hash of that string.


The average time taken for a single has was 1.309E-08s. The percentage of time taken for hashing when adding to the HashSet was 21.5%

Only a fifth of the time was spent on calculating the hash. For increasing sizes of the HashSet, the proportion of time spent calculating was approximatly constant at 21.5%. It does not seem from this test that hashing is the major computing cost to the function.

Test 2

The second test was using the string "hello" with a unique number appended on the end. This was to overcome the possibility that the hashing function remembered the previous hashed answers to short-cut the hasing process. A limitation when using this as the method was that the lenght of strings to be hased varied. Furthermore, the method for calculating the time for a single has included the time taken for the retrival of a string from a large array. At runtime the heap had to be increased to a couple hundred megabites to handle the sizes of arrays and the hash being used to do this. The time measurement was made for adding a unique string element from an array. The time taken to create these strings did not play a part in the time calculation.


The average time taken for a single has was 9.04E-08s. The percentage of time taken for hashing when adding to the HashSet was 16.5%

This again seems to show that the amount of time spent on the hashing was fairly small. This value is smaller than that of using the string "a" which isnt as expected. As such it is possible that test 1 was actually the time spent recognising the hash which for that example could have taken longer than actually calculating the hash each time. The reslut of this test could be skewed due to the limitations in gathering an accurate value for the time to calcluate a single hash. This was hard to do.


The time spent on the hashing computation in the HashSet was rather small at 1/5th of the time or less. If this is the case, the majority of the time is likelt to be spent calculateing where and how to store the data. The sizes of the strings used were relativly small. Despite the two tests where the longer string took a smaller proportion of time, it is my opinion that the time for calculating how and where to store the hash is consistent for every set size, and that increasing the lenght of the string would increase the proportion of time spent on the hashing function. These results arent very conclusive as there are many factors adding the the inaccuracys of the times generated.