SE250:lab-5:hbar055

First I tested the buzhash function. I chose sample sizes of 100, 10000 and then 100000. I felt these were good sample sizes as they let us test the capacity of each hash function and see how well it performs under heavy load.
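As a rough illustration of what a "Buzhash low on N samples" run involves, here is a minimal, self-contained sketch: hash a set of generated keys and keep only the low byte of each hash so the resulting byte stream can be fed to the ent statistics. The buzhash below is a textbook-style rotate-and-XOR hash seeded from rand(), not necessarily the lab's implementation, and the key strings are just made up for the example.

 #include <stdio.h>
 #include <stdlib.h>
 
 static unsigned table[256];
 
 /* Textbook-style buzhash: rotate-left by one bit, then XOR a table entry
    for each key byte (assumes a 32-bit unsigned). */
 static unsigned buzhash( const char *key, int len ) {
     unsigned h = 0;
     int i;
     for ( i = 0; i < len; i++ ) {
         h = ( h << 1 ) | ( h >> 31 );
         h ^= table[(unsigned char)key[i]];
     }
     return h;
 }
 
 int main( void ) {
     enum { NSAMPLES = 100 };
     unsigned char low[NSAMPLES];
     char key[32];
     int i;
 
     for ( i = 0; i < 256; i++ )            /* random byte-to-word table */
         table[i] = (unsigned)rand();
 
     for ( i = 0; i < NSAMPLES; i++ ) {
         int len = sprintf( key, "key%d", i );
         low[i] = (unsigned char)( buzhash( key, len ) & 0xFF );
     }
 
     /* low[] would then be handed to the ent statistics (entropy, chi
        square, mean, Monte Carlo pi, serial correlation). */
     printf( "collected %d low bytes, first = %d\n", NSAMPLES, (int)low[0] );
     return 0;
 }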

  • 1)
Testing Buzhash low on 100 samples
Entropy = 7.843786 bits per byte.
Optimum compression would reduce the size of this 100 byte file by 1 percent.
Chi square distribution for 100 samples is 214.46, and randomly would exceed this value 95.00 percent of the times.
Arithmetic mean value of data bytes is 128.0860 (127.5 = random).
Monte Carlo value for Pi is 3.132530120 (error 0.29 percent).
Serial correlation coefficient is -0.017268 (totally uncorrelated = 0.0).
Buzhash low 10/100: llps = 1, expecting 1
  • 2)
Testing Buzhash low on 100 samples
Entropy = 7.985498 bits per byte.
Optimum compression would reduce the size of this 10000 byte file by 0 percent.
Chi square distribution for 100 samples is 201.50, and randomly would exceed this value 99.00 percent of the times.
Arithmetic mean value of data bytes is 125.8253 (127.5 = random).
Monte Carlo value for Pi is 3.181272509 (error 1.26 percent).
Serial correlation coefficient is -0.000047 (totally uncorrelated = 0.0).
Buzhash low 10/100: llps = 1, expecting 1
  • 3)
Testing Buzhash low on 100 samples
Entropy = 7.998297 bits per byte.
Optimum compression would reduce the size of this 100000 byte file by 0 percent.
Chi square distribution for 100 samples is 235.54, and randomly would exceed this value 75.00 percent of the times.
Arithmetic mean value of data bytes is 127.5775 (127.5 = random).
Monte Carlo value for Pi is 3.119404776 (error 0.71 percent).
Serial correlation coefficient is 0.000327 (totally uncorrelated = 0.0).
Buzhash low 10/100: llps = 1, expecting 1
I interpreted the above results using the information I got from the 2007 Hypertext Textbook:

  • Entropy: The closer the value is to 8 bits per byte, the more random the numbers returned by the function must be.
  • Percentage by which the result could be compressed: The higher this percentage, the more redundancy the results must contain (i.e. repeated values). This means that a high value equates to low randomness.
  • Chi Square: The percentage indicates how often a truly random sequence would exceed the chi-square value obtained, which gives an indication of how random the numbers must be. Values close to 50% indicate the highest randomness.
  • Arithmetic mean: The expected mean for a set of random numbers is always the centre between the highest value and the lowest value. For this set of tests, 127.5 is the expected mean.
  • Monte Carlo: Successive values are used as the coordinates of points inside a square with an inscribed circle. The proportion of points that land inside the circle gives an estimate of Pi, and the error indicates by how much that estimate differs from the true value. Hence, the closer the error is to 0, the more random the numbers generated must be (a rough sketch of this calculation is given after this list).
  • Serial Correlation: Values close to 0 show that the relationship between consecutive numbers is fairly non-existent (and hence the numbers must be random).
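To make the statistics above more concrete, here is a minimal, self-contained sketch of how the entropy, arithmetic mean and Monte Carlo Pi estimate can be computed from a buffer of bytes. It is not the actual ent source: rand() stands in for the hash output, and the real tool packs several bytes into each Monte Carlo coordinate, whereas this sketch uses one byte per coordinate.

 #include <stdio.h>
 #include <stdlib.h>
 #include <math.h>
 
 int main( void ) {
     enum { N = 100000 };
     static unsigned char buf[N];
     long counts[256] = { 0 };
     double sum = 0.0, entropy = 0.0;
     long inside = 0, pairs = 0;
     int i;
 
     /* Stand-in data: rand() instead of a real hash function. */
     for ( i = 0; i < N; i++ )
         buf[i] = (unsigned char)( rand() & 0xFF );
 
     /* Byte frequencies for the entropy; running sum for the mean. */
     for ( i = 0; i < N; i++ ) {
         counts[buf[i]]++;
         sum += buf[i];
     }
 
     /* Entropy in bits per byte: -sum p*log2(p) over the 256 byte values. */
     for ( i = 0; i < 256; i++ ) {
         if ( counts[i] > 0 ) {
             double p = (double)counts[i] / N;
             entropy -= p * log( p ) / log( 2.0 );
         }
     }
 
     /* Monte Carlo Pi: treat consecutive byte pairs as (x, y) points in the
        unit square and count how many fall inside the quarter circle. */
     for ( i = 0; i + 1 < N; i += 2 ) {
         double x = buf[i] / 255.0, y = buf[i + 1] / 255.0;
         if ( x * x + y * y <= 1.0 )
             inside++;
         pairs++;
     }
 
     printf( "Entropy = %f bits per byte\n", entropy );
     printf( "Arithmetic mean = %f (127.5 = random)\n", sum / N );
     printf( "Monte Carlo value for Pi is %f\n", 4.0 * inside / pairs );
     return 0;
 }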

It can be seen that increasing the sample size from 100 to 10000 gives a considerable increase in randomness. However, there is not much further improvement when the sample size is increased from 10000 to 100000. Although the results for 100000 samples are definitely more random, the randomness seems to level off around this sample size.

Next, I used ent_test for the rest of the functions. I commented out #define VERBOSE_OUTPUT so that the results were displayed without all the text accompanying them.
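For reference, the change amounts to commenting out that one macro definition in the test harness source (the exact file name is not shown here):

 //#define VERBOSE_OUTPUT    /* commented out so the extra descriptive text is not printed */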