Algorithm running time discrepencies

K29 · Sep 21, 2012

Twice I have had to calculate the speed of algorithms depending on input size. The first time I was working in MATLAB(which is based on C) and was timing the difference in time to perform matrix operations on the identity matrix in standard storage and in sparse storage, as I grew the matrix size.

More recently I have been testing (for increasing array input sizes) the running time of the bisection search and the linear search in the worst case.

In all the above experiments I get correct results, for example for bisection search it increases linearly with array size. For bisection search I get a graph looking like log(n). For matrix multiplication under standard storage, operation time increased exponentially, and under sparse, some function of log(n) was observed.

HOWEVER, in all the above experiments the input size was increased AUTOMATICALLY with a loop, and left for a few hours to automatically generate all data. For example, for the linear search it would increase it by 50000 inputs and always look for the element in the last position.

The first call of the searching function/matrix operation always takes much longer than the rest. Only thereafter do I observe the expected trend. In the case of linear search it is always the first two calls of the function.

In the case of MATLAB for matrix operations the algorithms were most definitely correct; it simply consisted of timing A*B with

Code:

tic

and

Code:

toc

In the case of the search algorithms I've verified I've coded them correctly. I am using

Code:

System.nanotime

to time the algorithms.

Why does the first one(or two) runs take so long, while the expected trend only follows thereafter?

I am not running any other software at the same time, nor doing anything other than letting the algorithm run. I do not have a screensaver; UBUNTU just switches off the screen after a while. I have an Intel Celeron 2.13Ghz, and 2 gigs of DDR 2 RAM

For the java bisection search and linear search I am using an ArrayList datastructure. Note that to get() operation on the array list it is O(1) and this is the only operation I perform on it during the linear search.

Also note, that the input size seems to be sufficiently large. It takes a good 15 minutes to sort lists of this size ascending before running and timing the bisection search algorithm.

I have an attached an image of linear search results. You can see after the first two loops it starts following the O(n) trend(for worst case scenario).

rcgldr · Sep 21, 2012

The first call might be getting impacted by cache loading or dynamic memory allocation when the list is being created, but I'm not sure why the second call would be slow. You could try running the first call twice in a row to see of the second instance of the first call runs faster.

The other but less likely issue is that your timer could be wrapping around on the longer calls. How long can the timer run before it wraps around? Do you know which timer is being used to generate these timings?

15 minutes to sort a list? How large is this list, in terms of element size and number of elements?

K29 · Sep 21, 2012

rcgldr said:

The first call might be getting impacted by cache loading or dynamic memory allocation when the list is being created, but I'm not sure why the second call would be slow. You could try running the first call twice in a row to see of the second instance of the first call runs faster.

A fresh list is automatically created inside a loop, for different arraysizes. The code follows this pseudocode:
loop{
List of size 50 000 created.
Linear search performed and timed on this list.
List of size 100 000 created.
Linear search performed and timed on this list.
List of size 150 000 created.
Linear search performed and timed on this list.
etc...
} until user-specified number of list-size-increments are done

Does your above statement still hold?
I will still try running a call for size 50000 twice in a row later and let you know...

rcgldr said:

The other but less likely issue is that your timer could be wrapping around on the longer calls. How long can the timer run before it wraps around? Do you know which timer is being used to generate these timings?

The timer takes the System.nanotime before executing the algorithm and after executing the algorithm to determine the duration. This is stored in a 'long'. The time to do a linear search is less than a second (observed) and the resultant duration from the program (a few millliion nanoseconds) corresponds to this. I am not entirely confident it is not wrapping around though. I will look into this as well and post back...

Code:

if(SearchType.contentEquals("LinearSearch")){
			System.out.println("Starting linear search...");
			
			long startTime = System.nanotime();
			int found_index = linearSearch(ready_array, query);
			long endTime = System.nanoTime();
			duration = endTime - startTime;
			
		}
		
		return duration;

duration is of type long. The durations are later stored in an ArrayList<Long> for graph plotting.

rcgldr said:

15 minutes to sort a list? How large is this list, in terms of element size and number of elements?

It is an ArrayList<Integer>. Bubble-sorted for the Bisection Search. It takes 10-15 minutes to sort, when the listsize is > 100 000. It consists of random Integers ranging randomly from -2,147,483,648 to 2,147,483,647.

rcgldr · Sep 21, 2012

K29 said:

It is an ArrayList<Integer>.

I'm not familiar with MATLAB. For C++ STL (Standard Template Library), the list type of classes are implemented as doubly linked lists, while vector type of classes are implemented as arrays, which are much quicker for doing operations like search or sort. I don't know if the same applies to MATLAB. Some example times using the faster sort methods with C++ STL vector class (microsoft sorts are part of its STL, merge sort is one I wrote):

Code:

Time to sort 4,194,304 64 bit elements of psuedo random data:

microsoft sort             461 ms  (quick sort)
merge sort                 539 ms  (merge sort)
microsoft stable_sort      554 ms  (insertion (32 element) / merge sort)

64 bit mode, Windows XP64, Visual Studio 2005 compiler.
Intel core 2 X6800 CPU, 2GB 4-4-4-10 DDR2 ram, Intel D975XBX motherboard.

K29 · Sep 22, 2012

rcgldr said:
I'm not familiar with MATLAB. For C++ STL (Standard Template Library), the list type of classes are implemented as doubly linked lists, while vector type of classes are implemented as arrays, which are much quicker for doing operations like search or sort. I don't know if the same applies to MATLAB. Some example times using the faster sort methods with C++ STL vector class (microsoft sorts are part of its STL, merge sort is one I wrote):
Code:
Time to sort 4,194,304 64 bit elements of psuedo random data:

microsoft sort             461 ms  (quick sort)
merge sort                 539 ms  (merge sort)
microsoft stable_sort      554 ms  (insertion (32 element) / merge sort)

64 bit mode, Windows XP64, Visual Studio 2005 compiler.
Intel core 2 X6800 CPU, 2GB 4-4-4-10 DDR2 ram, Intel D975XBX motherboard.

My apologies I forgot to state I'm doing searching and sorting in java :)

The MATLAB project was for timing matrix operations

K29 · Sep 22, 2012

I am thinking my sorting is taking long (I have checked; a good hour on lists size 1 000 000) because ArrayList.add(index,item) runs in linear time, O(n)
Inside the bubble sort, this essentially makes the sorting algorithm O(n^4) instead of O(n^2) since I am doing two adds every iteration.

I may have to leave it like this, because the searching uses a get(), and a LinkedList's get takes O(n) time, while the ArrayList get() takes O(1), the purpose is to time the searching, not the sorting.

Unless I sort with a LinkedList and then convert it to an ArrayList. I'm not sure how long that will take

Edit: actually it seems like my best bet is to use a standard array. Every operation I need will be O(1).

I will post back whether this fixes the timing discrepencies. I still need to check for overflow as well.

rcgldr · Sep 22, 2012

If the number of elements or the maximum number of elements is known, you could use ensureCapacity to preallocate memory for the arraylist. This should speed up the adds . As you mentioned, a standard array may be faster. The C++ STL vector isn't much different than a standard array except that it can change in size (implemented by allocating a new block of memory, copying the old vector to the new memory, then releasing the memory for the old vector. usually there's some additional memory allocated so that allocate, copy, release doesn't occur on small increases in size), and there are some methods included for vectors, such as sort or stable_sort.

Edgardo · Oct 2, 2012

Don't use bubble sort since it is slow. Use the built in Collections.sort(myArrayList) method:
Sorting an ArrayList
According to this discussion Java7 uses Timsort.

For arrays use the built in Arrays.sort(myArray) method:
Sorting an array

Algorithm running time discrepencies

Attachments

Is A.I. more than the sum of its parts?

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Algorithm running time discrepencies

Attachments

Similar threads