SE250:HTTB:Trade-offs

Introduction

When it comes to engineering software we may have particular constraints that we have to fulfill or desire to have, these can be put down to the basic three: Time, Resources and Use. In our endeavors we try to maximize all three but often we have to make Trade-offs. Here we will cover the Trade-offs and point out which is better for different situations/circumstances.

We can further specify our basic three to elaborate for Software Engineering:

Time
- We may have a set amount time, as little as possible or none at all
Resources
- A small or large amount of memory available.
- A slow or fast CPU
- A low or high amount of Hard Drive space (not covered here though).
Use
- May require to get your information back in a particular order
- May need to insert in either: In a particular order or randomly

Comparison of Data Structures

Data structures form the back bone of any program that needs to store information. They also decide how the information is going to be accessed as well as the time taken to access and store it.

Array-based Lists

Main article: SE250:HTTB:Array-based_Lists

Characterization of Array-based List

1.Size must be predetermined before the Array can be allocated.
2.Cannot grow beyond their predetermined size.
3.Whenever the list contains only a few elements,a substantial an\mount of space might be tied up in a largely empty array
4.There is no wasted space for an individual element. It is completey filled, there is no storage overhead.

Linked Lists

Main article: SE250:HTTB:Linked_Lists

Characterization of Linked List
1.It has the advantage that they only need space for the object actually on the list.
2.NO limit to the number of elements on a linked list,as long as there is free-store memory available
3.It requires a pointer be added to every list node

Array-based List and Linked-list

Efficiency of memory space

Linked lists require a pointer to be added to every list node.If the element size is small, then the overhead for links can be a significant fraction of the total storage. If an array-based list is completed filled, there is no storage overhead. In this situation, the array-based list is more space efficient by a constant factor.

A simple formula can be used to determine whether the array-based list or linked list will be more efficient in a particular situation.

n = number of elements
P = size of a pointer in storage units
E = size of a data element in storage units
D = maximum number of list elements

The amount of space required for the Array-based list:

D‧E

The amount of space required for the linked List:

n‧(P+E)

In general, the linked list requires less space than the array-based list when relatively few elements are in the list. Conversely, the array-based list become more space efficient when it is close to full.

Using the equation, we can determine the break-even point beyond which the array-based list is more space efficient in any particular situation

n〉D‧E/(P+E)

If P=E(4-byte long element and 4-byte pointer)
The break-even point is at D/2. That means an Array-based list would be more efficient whenever the array is more than half full.

Linked list is better when number of elements vary widely or is known. Array-based lists are generally more space efficient when the user knows approximately how large the list will become.

Access Time

Array-based Lists are faster for random access by position, and the operations always take O(1) time. Linked list has no explicit access to the previous element, and access by position requires marching down the list form the front to the specified position. These operations require O(n) time in the average and worst cases.

To visually sum this up see hals016's Screen-Cast on comparing Array-Based List's and Linked-Lists

Hash Tables

Main article: SE250:HTTB:Hash_Tables

Hash Tables are quite possibly the best data structures one can generally use if they don't want the data automatically sorted (E.g. Names sorted alphabetically), as they strike a balance between Memory use and CPU time.

Advantages
- Can be constant access and insert Time ( O(1) (...Link to Big O table)

Disadvantages
- Can have same memory overhead as a Linked-List
- Not user sorted
- Only as good as the Hashing function (Could index to particular rows too many times)

Binary Search Trees

Main article: SE250:HTTB:Binary_Search_Trees

Binary Search Trees have a desired property of being easily able to get sorted information as one of the quickest of our main data structures.

They can also have some calculation overhead if it is a self balancing tree as every time a node is inserted it will have to make the call of balancing the tree so as to have the minimal height. But this overhead can reduce its overhead (if it can effectively sort out the tree) as when it comes to accessing elements it will reduce the time taken (O(log n)).

Efficiency of memory space

The overhand in BST depends on several factors including which nodes store data values, whether there is a parent pointer, whether the leaves store child pointers and whether the tree is a full binary tree.

In a simple pointer-based binary tree, every node has two pointers to its children.

n = number of nodes
p = the amount of space required by a pointer in storage units
d = the amount of space required by a data value in storage units

The total space amount for a tree of n nodes:

n*(2p+d)

The total overhead space will be
2pn
for the tree.So the overhead fraction will be

2p/(2p+d)

If we assume that p=d, then a full tree has about two thirds of its total space taken up in overhead. Also, about half of the pointers are "wasted" NULL values that serve only to indicate tree structure, but which do not provide access to new data.

Algorithms

Algorithms decide how we interact with our data and as such the same Trade-offs apply. Here we will discuss how each apply to what you may want to use in practice.

State Based Search

Main article: SE250:HTTB:State-space_search

This is based on how much of the current state you know and use in deciding your searching algorithm and whether you can implement your knowledge or not.

Uninformed Search

Uninformed searches can be thought of as naive and a means of "brute-force". In other words they are like blind searches, they search a tree without any information that helps speed up the time to reach the goal. The two uninformed searches we are learning are the Breadth first search and the Depth first search. In these types of searches there are four main categories to determine the trade-offs:

Time complexity – How long, in the worst case, does the algorithm take to find the goal?
Space complexity – How much space is needed, in the worst case scenario, for storage by the algorithm?
Completeness – Is the algorithm guaranteed to find a solution?
Optimal – Does the algorithm find the shortest path to the goal? (if there is more than one goal)

To get an idea of how these algorithms work here are the links to the screencasts:

 BFS  and  DPS

Trade-offs: The following are used inside the equations: b: maximum branching factor of the search tree (largest number of children branching off one node) d: depth of the least-cost solution m: maximum depth of the state space (the maximum height when all the possibilities are shown) Breadth first search

Time complexity - Since in the worst case, breadth-first search has to consider all paths to all possible nodes. The time complexity is therefore "b + b2 + b3 + ... + bd"(every branch of every node) which asymptotically approaches O(b^(d+1)).
Space Complexity - Since all of the nodes of a level must be stored in memory until their children in the next level have been generated, the space complexity is proportional to the number of nodes at the deepest level. The asymptotic space complexity is the number of nodes at the deepest level, O(b^(d+1)).
Completeness - Yes since it visits every node.
Optimal - Yes, if cost per path is 1.

Depth first search

Time complexity - Assuming worst case, there is 1 goal leaf all the way to the RHS at depth d. DFS will be exponential O(b^m).
Space Complexity - Space complexity of DFS is much lower than BFS (breadth-first search). It also tends to behave much better with heuristic methods of choosing a likely-looking branch(leads closer to the goal). At depth m we have b nodes and b-1 nodes at earlier depths "total = b + (m-1)*(b-1)", therefore the space complexity is linear, O(bm). This is terrible if m is much larger than d however, if solutions are dense it may be much faster than BFS.
Completeness - No, fails in spaces with loops and large depth trees.
Optimal - No.

Conclusion

Time complexity is the same but in the worst-case BFS is always better than DFS. Sometimes DFS is better if there are many goals, no loops in the state spaces and no large depth paths. BFS is much worse memory-wise and may store the whole search space while the space of DPS is modeled linearly.

In conclusion, BFS is better if the goal is not deep, there are infinite paths, there are many loops and a small search space. While on the other hand, DFS is better if there are many goals and not many loops. Regardless of conditions, DFS is always better in terms of memory.

Informed Search

Informed searches are done by providing heuristic information with the problem that might help reduce the amount of traversal required to reach the goal by the search algorithm. The informed search we learn is Best First Search. In a informed search, the trade-off can be between these:

Time complexity - Whether or not the program is indeed faster.
Programming complexity - Whether the priority function is worth the time coding.
Completeness - Whether the heuristic information given is correct.
Optimal solution - Whether the priority function does find the shortest path to the goal.

ScreenCasts

User	Description
hlin079	The differences between hash tables and binary search trees
dcho040	Trade offs comparing 4 different data structures
hals016	The trade offs of a Linked List and an Array List (possible exam question)
hals016	The trade offs of Breadth-first search and Depth-first search algorithms