SE250:lab-3:hpan027


Not a very exciting lab, as most of the time is spent waiting for the program to run.

Task 1

The following code was added to the arraylist.c file

#include <stdio.h>   /* for printf, if arraylist.c does not already include it */
#include <time.h>    /* for clock() and clock_t */

int main() {

    ArrayList myArray;
    int i;
    int n = 1000000;
    element_t value = 3;
    clock_t clockStart;
    clock_t clockFinish;
    clock_t timeTaken;

    arraylist_init(&myArray);

    /* Time n calls to arraylist_push using the processor clock. */
    clockStart = clock();

    for (i = 0; i < n; i++) {
        arraylist_push(&myArray, value);
    }

    clockFinish = clock();

    /* Elapsed time in clock ticks (divide by CLOCKS_PER_SEC for seconds). */
    timeTaken = clockFinish - clockStart;

    printf("%ld\n", (long)timeTaken);

    return 0;
}

The time taken (in clock ticks) was recorded for different values of n between 1,000,000 and 1,000,000,000.

n		clock ticks
1,000,000	46
10,000,000	672
100,000,000	6640
1,000,000,000	out of memory
  • Time taken scales roughly linearly with n. However, somewhere before n reaches 1,000,000,000 the program is stopped by the assertion failure "alist->arr != (element_t*)0", i.e. the program has run out of memory and no larger block could be allocated for arr.
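
For reference, the push path presumably looks something like the sketch below. This is an assumption about what arraylist.c does (the real field names and growth factor may differ), but it shows where the quoted assertion comes from: once realloc cannot supply a larger block it returns NULL and the assert stops the program.

#include <assert.h>
#include <stdlib.h>

/* Stand-in types - the real definitions live in arraylist.h. */
typedef int element_t;
typedef struct { element_t *arr; size_t length; size_t capacity; } ArrayList;

/* Sketch of how arraylist_push presumably grows the backing array.
 * arraylist_init is assumed to have allocated ARRAYLIST_MIN_ALLOC slots. */
void arraylist_push_sketch(ArrayList *alist, element_t value) {
    if (alist->length == alist->capacity) {
        alist->capacity *= 2;    /* a growth factor of 2 is assumed here */
        alist->arr = realloc(alist->arr, alist->capacity * sizeof (element_t));
        /* When no larger block is available, realloc returns NULL and this
         * assertion stops the program - the failure seen before n reaches
         * 1,000,000,000. */
        assert(alist->arr != (element_t*)0);
    }
    alist->arr[alist->length++] = value;
}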


Task 2

Growth factor: 1.5

n		clock ticks
1,000,000	94
10,000,000	671
100,000,000	7297
1,000,000,000	out of memory
  • No drastic change here. It takes a little longer and runs out of memory at the same point.


Growth factor: 3.0

n		clock ticks
1,000,000	46
10,000,000	577
100,000,000	out of memory
1,000,000,000	out of memory
  • A slight improvement in speed here, but the program runs out of memory earlier.


Growth factor: 4.0

n		clock ticks
1,000,000	77
10,000,000	641
100,000,000	out of memory
1,000,000,000	out of memory
  • At this point it's pretty clear that there aren't enough data to draw any conclusions, so more data need to be collected.


  • The following data were collected using a loop with n as the counter variable:
    (n = 1000000; n < 100000000; n += 5000000)

File:Hpan027lab3.jpg

  • Time in clock ticks along the y axis, value of n along the x axis. Each series represents a different growth factor.
  • We can see there is some relationship between time taken and growth factor - increasing the growth factor tends to decrease the time taken.
  • Higher growth factors run out of memory sooner.
  • The above relationship can be quite easily explained by how many times the array list has to be reallocated:
Factor	1.5x	2x	3x	4x	5x
A	27	15	10	7	6
B	11	6	4	3	2
  • Row A is the number of reallocations required for a given factor to reach n = 1,000,000
  • Row B is the number of additional reallocations required for a given factor to reach n = 96,000,000
  • For larger growth factors, fewer reallocations are required - especially ones dealing with large chunks of memory - hence less time is taken (a rough way to reproduce these counts is sketched after this list)
  • The program clearly runs out of memory somewhere between n = 10,000,000 and n = 100,000,000
  • Closer examination of the values of n for different growth factors shows that:
    • Making the jump from 76,527,504 to 229,582,512 and from 67,108,864 to 268,435,456 for factors 3x and 4x respectively results in the program running out of memory
    • Jumping from 31,250,000 to 156,250,000 for factor 5x did not cause any problems
    • Hence the program runs out of memory somewhere between 600 MB and 875 MB of memory used
  • That was some pretty meaningless data
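
As a rough check on rows A and B, the reallocation counts can be reproduced with a few lines of stand-alone C. This is only a model - it assumes the capacity starts at ARRAYLIST_MIN_ALLOC (taken to be 8 here) and is multiplied by the growth factor on every reallocation, so the exact numbers depend on how the real arraylist rounds - but it shows the same trend of fewer reallocations for larger factors.

#include <stdio.h>

/* Model: capacity starts at min_alloc and is multiplied by the growth
 * factor each time the list fills up.  Count the reallocations needed
 * before the capacity reaches n elements. */
int count_reallocs(long n, long min_alloc, double factor) {
    double capacity = (double)min_alloc;
    int reallocs = 0;
    while (capacity < (double)n) {
        capacity *= factor;
        reallocs++;
    }
    return reallocs;
}

int main(void) {
    double factors[] = { 1.5, 2.0, 3.0, 4.0, 5.0 };
    int i;
    for (i = 0; i < 5; i++)
        printf("factor %.1f: %2d reallocs to reach n = 1,000,000\n",
               factors[i], count_reallocs(1000000, 8, factors[i]));
    return 0;
}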


Task 3

Comparing the value of ARRAYLIST_MIN_ALLOC with the number of reallocations required to reach n = 1,000,000 suggests that small changes to MIN_ALLOC should not make much difference, whereas increasing it by several orders of magnitude should have a greater effect.

MIN_ALLOC	Number of reallocs to reach n = 1,000,000
8		17
16		16
32		15
10000		7
100000		3

Results for n = 1,000,000 to n = 10,000,000 step 1,000,000

n \ MIN_ALLOC	8	16	32	10000	100000
1000000		62	46	62	62	61
2000000		109	94	110	110	110
3000000		157	157	156	187	156
4000000		264	171	235	219	235
5000000		329	250	297	250	282
6000000		329	328	359	376	359
7000000		406	438	439	437	453
8000000		484	453	468	453	516
9000000		579	562	593	515	546
  • For these smaller values of n there is no real difference between the MIN_ALLOC values

Hopefully there is some change for larger values of n - as this takes about a minute each time it is run.

Results for n = 1,000,000 to n = 100,000,000 step 5,000,000

n \ MIN_ALLOC	8	16	32	10000	100000
1000000		62	62	62	78	62
6000000		344	360	376	344	328
11000000	688	641	671	734	609
16000000	1015	1032	938	969	1016
21000000	1360	1296	1328	1313	1328
26000000	1422	1719	1563	1703	1703
31000000	1984	1703	1781	1626	2016
36000000	2312	2172	2406	2124	2062
41000000	2657	2453	2468	2578	2297
46000000	2906	2749	3047	3109	2515
51000000	3000	3063	3031	3297	2734
56000000	3468	3280	3203	3437	3360
61000000	3548	3422	3609	3562	3969
66000000	3969	3797	3969	4141	4266
71000000	5016	4469	4625	4220	4703
76000000	5375	5047	5891	4343	4797
81000000	5422	5188	5563	4985	4937
86000000	6047	5562	5687	5500	5469
91000000	5640	5750	6046	5936	5343
96000000	6079	6172	5813	5720	5703
  • Again no real difference

The time taken was expected to drop, but this didn't really happen. This could be due to the ratio of memory reallocations to the number of times a value is pushed.

i.e. memory is reallocated about 30 times, while there are about 970,000,000 calls to the arraylist_push function.

However, that doesn't explain why changing the growth factor makes a difference while the MIN_ALLOC value does not.

It is therefore more likely that reallocating small chunks of memory does not take much time at all - and small reallocations are really all that changing the MIN_ALLOC variable removes. Going from n = 1,000,000 to 100,000,000 still requires pretty much the same number of reallocs regardless of MIN_ALLOC, as the rough count below suggests.
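
One way to see this is to count how many of the reallocations actually involve large blocks. Under the same doubling model as the sketch above (an assumption, not the lab's exact code), raising MIN_ALLOC removes a fair number of reallocations, but almost all of the removed ones deal with small blocks that cost very little to copy:

#include <stdio.h>

/* Doubling model: count the total reallocations, and how many of them
 * copy an old block of at least one million elements, when growing from
 * min_alloc up to n. */
void count_reallocs_by_size(long n, long min_alloc) {
    long long capacity = min_alloc;
    int total = 0, large = 0;
    while (capacity < n) {
        if (capacity >= 1000000)
            large++;             /* a big block has to be copied */
        capacity *= 2;
        total++;
    }
    printf("MIN_ALLOC %-7ld: %2d reallocs, %d of them on large blocks\n",
           min_alloc, total, large);
}

int main(void) {
    count_reallocs_by_size(100000000, 8);
    count_reallocs_by_size(100000000, 100000);
    return 0;
}

Under these assumptions, raising MIN_ALLOC from 8 to 100,000 removes roughly 14 reallocations, but essentially all of them involve blocks far smaller than a million elements, so the measured times stay dominated by the same handful of large reallocations in both cases.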


Task 4

Time in clock ticks with and without calls to the ensure_capacity function (rows correspond to increasing values of n):

Without		With
47		46
360		265
656		516
953		844
1281		1062
1468		1250
1922		1421
2282		1672
2656		2063
2844		2421
2985		2750
3250		2906
3563		3047
3828		3311
4391		3516
5250		3749
5250		4094
5484		4266
5531		4532
6547		4797
  • There is a clear difference here. Notice that for smaller values of n the difference is not as great. This reinforces the idea that larger chunks of memory take more time to move - which is kind of intuitive in the first place.
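
For reference, the "With" column was presumably produced by reserving the full capacity once before the timed push loop, roughly as below (dropped into the Task 1 main in place of the original timing section). The name ensure_capacity is taken from the lab code, but its exact signature here is an assumption:

    /* Reserve room for all n elements up front so that arraylist_push
     * never has to realloc inside the timed loop (assumed signature). */
    ensure_capacity(&myArray, n);

    clockStart = clock();
    for (i = 0; i < n; i++) {
        arraylist_push(&myArray, value);
    }
    clockFinish = clock();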


Task 5

n		+1e6	+1e7	+1e8
1000000			31	47	47
6000000			468	297	281
11000000		1001	626	546
16000000		1656	937	843
21000000		2657	1265	1048
26000000		3843	1516	1265
31000000		5266	2094	1422
36000000		6781	2125	1687
41000000		8452	2859	1953
46000000		10750	3359	2141
51000000		12937	3718	2391
56000000		15173	4141	2891
61000000		17954	4719	3063
66000000		20469	5250	3328
71000000		23188	5875	3172
76000000		26938	6344	3813
81000000		30343	7047	4047
86000000		33548	7390	4250
91000000		38188	8312	4578
96000000		41781	8890	4767
Calls to realloc	970	97	9.7

File:Hpan027Task5.jpg

  • The number of realloc calls makes a more noticeable difference for larger values of n
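
These columns were presumably generated by extending the reserved capacity in fixed-size chunks (1e6, 1e7 or 1e8 elements at a time) rather than reserving everything at once - roughly like the sketch below, again dropped into the Task 1 main. The exact signature of ensure_capacity and the loop structure are assumptions:

    long step = 10000000;    /* the +1e7 column; use 1000000 or
                                100000000 for the other two columns */
    long reserved = 0;

    clockStart = clock();
    for (i = 0; i < n; i++) {
        if (i >= reserved) {
            /* Reserve the next chunk before pushing into it. */
            reserved += step;
            ensure_capacity(&myArray, reserved);
        }
        arraylist_push(&myArray, value);
    }
    clockFinish = clock();

This also lines up with the "Calls to realloc" row: roughly 970,000,000 pushes in total across all values of n, divided by the chunk size, gives about 970, 97 and 9.7 calls for the three columns.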


Task 6

arraylist_put(&myArray, value, 0);
  • Needless to say, it takes an unreasonable amount of time to shift the whole array on every single insertion, especially when dealing with such large values of n. I gave up waiting after ten minutes, as I had finished typing up the rest of my report.
  • Even for smaller values of n, putting a value at the front of the array will most likely take much longer than pushing onto the end, as the sketch below suggests.
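
The slowdown comes from the shifting itself: putting an element at index 0 presumably has to move every existing element up one slot before writing, something like the sketch below (an assumption about how arraylist_put works, using the same stand-in ArrayList and element_t types as the Task 1 sketch). Each call is therefore O(n), and n calls are O(n^2), which is why the run never finished.

#include <string.h>

/* Sketch of an insert-at-index operation (assumed behaviour of
 * arraylist_put).  Capacity growth is omitted - it works the same way
 * as in arraylist_push. */
void arraylist_put_sketch(ArrayList *alist, element_t value, int index) {
    /* Shift everything from index onwards up one slot.  For index 0 this
     * is the entire array - an O(n) memmove on every single insertion. */
    memmove(&alist->arr[index + 1], &alist->arr[index],
            (alist->length - index) * sizeof (element_t));
    alist->arr[index] = value;
    alist->length++;
}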