
Sunday, July 15, 2012

Data Structures

As soon as you read the title, your mind probably wanders to concepts like linked lists, trees, arrays etc. Before looking at any of them, I will discuss how a program runs and where its data gets stored in memory.
You write a program in a higher-level language and then compile it. The compiler finally converts it into machine-level instructions. These instructions are stored in the code area in memory and executed sequentially (remember the point I made while discussing types of programming languages? Machine language is procedural).
The code area and the other areas of memory I will be discussing are virtual. In reality, main memory (RAM), where the program gets stored, is managed by the OS.
Apart from the code area we have the globals area, where the global variables of a program get stored.
Two other important areas are,

  • the heap and
  • the stack

The Heap:
Note: I am not talking about the heap data structure.
You can explicitly allocate and deallocate space for your variables (dynamic allocation) from a big pool of memory called the heap. For data like arrays whose size is known only at run time, explicit memory allocation is the best choice. A pointer is then used to access the allocated memory. Note that the memory remains allocated until you deallocate it manually. So remember to deallocate every block of memory you allocate.
For example:
int *a = (int *)malloc(100 * sizeof(int));

This allocates memory to store 100 integers, and a pointer to the base address of the block is returned! (Check it against NULL before use; malloc returns NULL when allocation fails.)

To free this memory, C gives you the free function.

free(a);

Now the allocated memory will be freed.

The Stack:
In general, everyone is aware of what a stack is and what operations can be performed on it. To refresh: you can push data, pop data, and peek at what's on top. It follows LIFO (last in, first out) order.
So when a function is called, a frame for it gets pushed onto the stack. A stack is the optimal structure for this scenario for the following reason.
Suppose main calls function A, A calls B and B calls C.
The stack would be like,

C
B
A
main

When C returns, control passes back to B, then to A, and finally to main. A stack intuitively suits our needs here.

Let's see in detail what actually gets pushed:

  • the address of the instruction just after the function call (this tells the CPU where to resume when the function returns)
  • a placeholder for the return value
  • the function arguments
  • the local variables
The stack pointer points to the top of the stack. When the function returns, the stack pointer moves back to the start of the previous frame (having the effect of popping this frame without actually erasing the contents of the stack).

Enough said about how a program gets stored and executed. Let's move on to the data structures we are familiar with.




A data structure is just a format, or way, of storing and retrieving data efficiently.
A few examples are arrays, lists, trees, graphs, heaps, tries, hash tables etc...

Explaining every data structure is needless here, since you already have voluminous books covering them clearly.
I will just explain why you need data structures and which data structures suit which operations.

Need....
With ever-increasing data, we need efficient ways to store and retrieve it. In 2011 the world produced about 1.8 zettabytes of digital data. Unless the correct data structure is used for a given scenario, we will face difficulties even in day-to-day life. For example, take online ticket booking systems: with a growing population, we need efficient ways of storing and processing data to make the system work. Complex algorithms also need specific data structures to work properly. For example, implementing a hash table on top of a plain linked list is pointless, since every lookup would degrade into a linear scan.

Which data structure for which problem??
Hash tables are good at retrieving data (O(1) on average -- constant time!) but bad at finding the maximum or minimum value (heaps are good at that). In this way each data structure has its own complexity and speciality, so choosing the right data structure depends on the problem statement given. And most of the time there may be more than one data structure meeting the need! For example, trees are good for searching as well as for representing sorted lists.

Complexities:
For those wondering what the O(1) mentioned in the last paragraph means, here is a little about the complexity of algorithms.
Usually two algorithms are compared based on their time and space complexities. Since most systems have ample space nowadays, we mainly concentrate on time complexity when comparing algorithms. The complexity is usually expressed as a function of the input size n.

  • Best case complexity -- complexity calculated for the best possible input to the program. For example, an already sorted list is the best possible input for insertion sort.
  • Average case complexity -- complexity calculated on average over all inputs, using a probability distribution of the inputs.
  • Worst case complexity -- complexity calculated for the worst possible input to the program. For example, for quicksort with the first element as pivot, a reverse-sorted list is a worst possible input.
Big-O, or asymptotic, notation is usually used to denote the complexity of an algorithm. It denotes an upper bound on the growth function. For example, the average case complexity of quicksort is O(n log n), whereas its worst case complexity is O(n^2).
Big omega --> lower bound
Big theta --> tight bound

O(n^2) in the worst case means that if quicksort sorts 10 numbers in 1000 ms, then doubling the number of inputs (20) takes four times as long (4000 ms). For this reason it is called quadratic complexity.
Input size-->Time
10-->1000 ms
20-->4000 ms
30-->9000 ms

On the other hand, consider the average case of binary search: its complexity is O(log n), called logarithmic complexity.
So if it searches among 10 numbers in 100 ms, then squaring the input size only doubles the time.
Input size-->time
10 --> 100 ms
100 --> 200 ms
1000 --> 300 ms

Linear Time
O(n) --> the time taken grows linearly with the size of the input. For example, finding the maximum number in an unsorted array.
Input size-->Time
100-->100 ms
200-->200 ms
1000-->1000 ms

Constant time
O(1) --> whatever the size of the input, the time taken is constant.
For example, finding the maximum number in a max heap (it is always at the root).
Input size-->Time
100-->10 ms
200-->10 ms
1000-->10 ms



That's it for the introduction to data structures. More about searching and sorting in the next post. Cya.


Trained @ Sourcebits
