# Intro to Algorithms for Programming

[Total: 9    Average: 4.2/5]

Many threads here at PF include some question about how to learn programming. This is asked by Physics students who want to learn programming in order to help their study, work on some project or create some simulation among other things but also by students in other fields of science and by other members, including self learners. It is obvious that an exhaustive coverage of this subject cannot exist. There is a whole lot of textbooks and online resources covering various aspects of this subject, at various levels, from various points of view and from both theoretical and practical standpoint as well as in-between and then, a lot of practical aspects are learned only through experience. This insight about algorithms is the first part of a series that will try to give a number of fundamental notions both at theoretical and practical level, as well as a number of explanations and tips, in order to address some needs of a beginning programmer to a (hopefully) good extent. Last but not least, the mathematics needed for anyone wishing to become a decent programmer, are the ones that are prerequisites in a formal Computer Science curriculum i.e. mainly Calculus, Linear Algebra and Discrete Mathematics.

Introduction

In the last fifty years or so, programming has been recognized as a field providing fundamental and critical knowledge which helps towards the success of many jobs, endeavors and tasks and one that can be treated and stated in a scientific manner. The field of programming has evolved from a craftsmanship to an academic activity. The first important contributions to this end were made by E.W. Dijkstra and C.A.R. Hoare. Dijkstra’s paper “Notes on Structured Programming” $^{[1]}$ opened a new way for programming as a scientific subject and an intellectual challenge and Hoare’s paper “An axiomatic Basis of Computer Programming”$^{[2]}$showed that programs accept precise analysis based on mathematical reasoning. These two papers focused on the development and analysis of programs. More specifically, they focused on the structure of the algorithms that live into these programs. Also, it became evident that the scientific approach to the development of programs affects mostly big complicated programs with complicated data sets. This way, a new methodology for programming, capable of solving the issues of data structuring, was delineated. A very important contribution that put the confusing terminology and the notions of data structures in some order was Hoare’s “Notes on Data Structuring”$^{[3]}$. Through this paper, it became evident that no decision can be made on data structuring, if we don’t know the algorithms that will work with these data and conversely the choice and the structure of algorithms depends heavily on the choice of data structures. So, algorithms and data structures are two notions intimately related.

After this brief history, we will first deal with these two fundamental and extremely important notions: algorithms and data structures.

Algorithms

The concept of algorithm is very old. The use of the concept can be ascribed to Greek mathematicians but the term algorithm itself derives from the 9th Century mathematician Muḥammad ibn Mūsā al’Khwārizmī$^{[4]}$. But what is an algorithm?

Problem

We will begin this discussion by first addressing what is a problem. In general, a problem is a difficult situation which we wish to solve in some way. In the context of the present discussion, a problem represents a situation, in which we need to perform some computation in order to get the result or to aid in the solution of another bigger problem.

Problems are divided according to their solvability in three categories:

• Solvable, for which their solution is already known
• Unsolvable, for which it is known that there is no solution so far
• Open, for which their solution, although not found yet, we are not sure if can be found

There is also a categorization of the solvable problems according to their structure. The structure of a problem refers to the individual subproblems of which our original problem consists.

• Unstuctured: they accept many potential solutions but there is not a specific one
• Half-structured: they have a limited number of potential solutions
• Structured: the solution is automated. The problems in this category can be:

–  Decision problems:  problems that are solved through “Yes” or “No”

–  Computational problems:  problems that require computations

Optimization problems:  finding the optimal (most efficient) solution.

But what are the reasons of using a computer to solve a problem? These are mainly four:

• Repetitive operations
• Complexity of computations
• Speed of execution
• Large data volumes

Next, we come to our main question of what is an algorithm.

Definition of an algorithm

Informally speaking, an algorithm is a sequence of steps or a kind of “recipe” we can follow, in order to solve some problem. Now, this is not a particularly useful definition for a programmer, as it is very general while the computer has finite resources and our life is also finite. This brings us to the formal definition of an algorithm: an algorithm is a finite sequence of actions (steps) which are strictly defined and can be executed in a finite timespan for the purpose of solving a problem.

Criteria for an algorithm

Algorithms must satisfy five criteria: First, an algorithm must be able to accept an input (if needed). Second, an algorithm must always provide an output. Third, an algorithm must be decisive i.e. its behavior must be explicitly described under any circumstances. Fourth, an algorithm must finish its job after a finite number of steps. Fifth, effectiveness: each and every step of the algorithm must be clear i.e. straightforward.

Representations of algorithms

But how we represent an algorithm? There are various ways to do this. The four most widely used are

• Using natural language
• Using natural language in a step-wise fashion
• Drawing a flowchart
• Writing pseudocode

I’ll give an example of an algorithm represented by i) Natural language in steps, ii) Flowchart.

Example 1

Problem: A number and its order of magnitude are given via the keyboard of a computer . We are asked to write an algorithm which computes and displays the MSB and the LSB of the number.

Solution:  i)

1.   Read an integer
2.   Read the order of magnitude of the number
3.   Divide the number by the order of magnitude and put the quotient to MSB
4.   Divide the number by 10 and put the remainder to LSB
5.   Display MSB and LSB

ii)

From the above example we see that the representation of an algorithm in natural language with steps has the advantage of being very close to how we humans think but it is not so close to an implementation in a programming language as it is abstract enough for this. The situation is even worse in this regard, if we use just natural language. Flow charts as ii) above use various shapes in order to state what each step represents i.e. use oval shape for start, rectangular shape for a process, parallelogram for input / output of data, diamond(rhombus) for decision etc. so they are more descriptive than natural language but they are not used so much anymore. For more on flowcharts see Wikipedia$^{[5]}$

In order to be as close as we can to the human way of thinking but at the same time also to the implementation in a programming language, we use pseudocode. Pseudocode is an informal high level description of the operating principle of a computer program, i.e. it uses the structural conventions of a programming language in its representation of an algorithm but is intended for human reading i.e. it cannot be executed directly on a computer. Now, because it is closer to a programming language implementation this means that it takes various forms, depending on which programming language we intend to use. In older times, there was widespread use of Pascal-like pseudocode i.e. pseudocode with Pascal language structural conventions. Now, C-like pseudocode is much more common for educational purposes regarding development of a procedural-type program. Below, I give a list of the elements used in Pascal-like pseudocode, not just for historical reasons but because although different in various aspects from other pseudocode styles, it still gives the description of an algorithm in a way that can be transformed easily into a real program in any procedural type programming language.

First line Algorithm<name>
Last line end<name>
Assignment variable $\leftarrow$ expression
Simple conditional if <condition> then <series of actions to be executed>
end_if
Composite (multi-part) conditional if <condition> then
<series of actions_1 to be executed>
else
<series of actions_2 to be executed>
end_if
Multi-case conditional case <expression> of
<case 1>
<series of actions_1 to be executed>
………………………………………………….
<case n>
<series of actions_n to be executed>
else
<series of actions_else to be executed>
end_case
Repetition
 while do end_while repeat until for from to step end_for
Output write<elements list>
Constants Defined through assignment to a variable
Variables (Input-Output) Data //…//
Results //…//
Functions We don’t define any specific functions. We use whichever we need (e.g. mathematical functions etc.)

Now, a few words about the above elements. Comments are explanatory text intended for other people reading the pseudocode. Assignment is giving a value to a variable. A conditional structure is used in order to execute some series of actions if a certain codition is met. A case structure offers better readability and helps avoiding deep nesting of simpler conditional structures, provided that it is appropriate for the case at hand. Repetition structures are used in order to execute some specific series of actions while (i.e. as long as) a certain condition is fulfilled or for a specific number of times. A constant takes a specific value that does not change during the execution of the algorithm (i.e. of the program that implements it). In contrast, a variable can take different values within a specific range e.g. integers. As operators for the math we use $+$, $-$,  $*$,  $/$,  $=$, $>$, $<$, $\geq$, $\leq$ as known from math,  mod for the remainder operator and  <>  for “not equal”.

All the above elements give a pretty good idea about the description of an algorithm in pseudocode. I have omitted more advanced constructs –  like procedures and functions for example, for two reasons. First, Pascal is almost not used anymore – I mean the old Turbo Pascal but of course there are modern descendants like Object Pascal, for anything other than educational purposes – and even this, is very limited now. So, anyone wanting to get into the programming realm will likely pick some other language at some point but all the above will still be useful. Second, all this stuff is formally taught along with a real programming language, so it is advisable that a reader that does not have such a requirement in his / her formal courses and / or wants to learn programming on his / her own should follow some tutorial or go through some textbook or other relevant resource, in order to learn the programming basics. The way I was taught these was pseudocode in a pure algorithmic and data structures course along with a course on Turbo Pascal. Later on, I picked up C and I can say that this is by far the most influential, concise and “lowest” high level procedural programming language I’ve ever met as is also its descendant – albeit not only procedural and way richer in many regards, C++. Another thing that I also want to point out, is that you may come across Pascal-like pseudocode that is somewhat different, as there is no absolute standardization worldwide but in any case the basic elements and notions are still the same. In the third part of this insights series I’ll try to give some facts, thoughts, tips and things drawn from my own experience as a software developer about programming languages, programming and tools of the trade.

As the reasons why emphasizing so much the pseudocode representation have become clear, I’ll give two examples of somewhat more involved (for the present context) algorithms, in order to see some of the elements of pseudocode in action.

Example 2

Problem: Write an algorithm that reads (i.e. takes as input) ten integers and finds minimum (min) and maximum (max).

Solution:

Algorithm MAXMIN

Data //max, min, i, n:INT//        ! here we do variable definition and declaration
max = -30000                              ! we give some arbitrary values for min, max and use the trick
min = 30000                                ! of max be low and min be high

i$\leftarrow$1                                              ! initialize counter for repetitions

while i$\leq$10 do                             ! while…do repetition for ten numbers
$\space$$\space$$\space$$\space$$\space$read n

if n$>$max then                            ! if…then conditional structures for comparing numbers
$\space$$\space$$\space$$\space$$\space$max$\leftarrow$n
end_if

if n$<$min then
$\space$$\space$$\space$$\space$$\space$min$\leftarrow$n
end_if

i$\leftarrow$i + 1                                           ! increase counter by 1 to proceed to the next iteration
end_while
write max,min                               ! display max and min number

In the above example, the arbitrary values we give for min, max are meant to convey the idea of the trick.  The specific values we can pick for them depend on the range of the integers we want to cover. Evidently, giving all integers greater than the minimum or less than the maximum we have set, will not work. As a more efficient way, we can write our algorithm as below (I’ll write only the interesting part; the reader is encouraged to fill in the missing parts by comparing to the above version):

Algorithm MAXMIN

## ………………

i$\leftarrow$2

max $\leftarrow$ n

min  $\leftarrow$ n

while i$\leq$10 do

$\space$$\space$$\space$$\space$$\space$read n

## ……………..

Example 3

Problem: Write an algorithm which computes the sum $S = 1 – 2 + 3 – 4 + 5 – … + 99 – 100$

Solution:

In order to make this more interesting, let’s do it in two different ways (here I have only the interesting part of the pseudocode; the rest elements can be easily filled in)

1st way

Split the sum into two sums (positive – negative)

sum1$\leftarrow$0

for i from 1 to 99 step 2
$\space$$\space$$\space$$\space$$\space$sum1$\leftarrow$sum1 + i
end_for

sum2$\leftarrow$0

for i from -2 to -100 step -2
$\space$$\space$$\space$$\space$$\space$sum2$\leftarrow$sum2 + i
end_for

sum$\leftarrow$sum1 + sum2

2nd way

sum$\leftarrow$0
sign$\leftarrow$$\space$-1

for i from 1 to 100
$\space$$\space$$\space$$\space$$\space$sign$\leftarrow$sign$\cdot$(-1)
$\space$$\space$$\space$$\space$$\space$sum$\leftarrow$sum + (sign$\cdot$i)
end_for

As a couple of last things for representations, first, writing first pseudocode in the design phase, we can test the results with pen and paper and see if our construct does what is intended to do or not, otherwise we’ll end up having logical errors that may be extremely difficult to spot or realize inside a real program. In this regard, it is very useful to create a table that holds the values of variables and counter(s) of the loop(s), execute the algorithm in our mind and write down the results. Are these correct? As a programmer becomes better, he / she will at some point start to use premade code for the algorithms, in the form of libraries or various other forms but the above recommendations still pay off in the long run because there are many cases, where we want to develop a version of an algorithm fitting specific needs that may not be readily available or modify some premade code according to our specific needs. Second, the representations presented so far are all very simple because we have not used any data structures. In the second part of this series I’ll give some more involved examples.

Correctness of algorithms

Algorithms are mathematical procedures and as such their correctness is something that can be (and is) checked and verified. We use math in order to prove that an algorithm gives always correct result(s). In order to check the correctness, it is not sufficient to test the algorithm with any number of different sets of data because this, gives no guarantee that the algorithm always gives correct results or even finishes its job at all. In order to prove the correctness of an algorithm in a strict way, two conditions must be satisfied: we must prove that the algorithm gives correct results in any case of execution and also prove that the execution of the algorithm indeed terminates in any case. For the first, a very often used method is mathematical induction. Let’s see an example of how this method works.

Example 4

Euclides algorithm for $GCD$(Greatest Common Divisor) of two positive integers a.k.a. $HCF$(Highest Common Factor)  , is a very old one (developed in about 300 B.C.). It is based on the relations

$GCD(x,y) = GCD(y$, remainder of the division $\frac{x}{y})$ if $y > 0$,
$GCD(x,y) = x$                                     if $y = 0$

where $x$,$y$ are any non-negative integers.

We’ll use natural language in steps because we are just interested in the proof of correctness

while y$<>0$ do
$\space$$\space$$\space$$\space$$\space$calculate the remainder of x/y
$\space$$\space$$\space$$\space$$\space$put the value of y to x
$\space$$\space$$\space$$\space$$\space$put the value of the remainder to y
$\space$$\space$$\space$$\space$$\space$**
write x

Inductive hypothesis: When we reach the line **, $GCD(x,y) = k$

Induction basis: Initially, $x = ka$ and $y = kb$ for two numbers $a$ and $b$ which don’t have any common factors. Let $r$ be the remainder of the division $\frac{x}{y}$. Then $x = my + r$. So, we have $ka = mkb + r$ (1) so $r = k(a – mb)$. Consequently,  $r$ must have $k$ as a factor. From (1) it also follows that $\frac{r}{k}$ has no common factor with $b$ otherwise $a$ would have this same factor but this is a contradiction as we have assumed that $a$ and $b$ don’t have any common factors. So, it follows that $y$ and $r$ have $k$ as highest common factor. So, the first time we reach the line ** we have $GCD(x,y) = GCD(y$, remainder of $\frac{x}{y}) = GCD(y,r) = k$.

Inductive step: From the above we also have that the $(n + 1)$-th time we reach the line ** , $GCD(x,y)$ is equal to the $GCD$ of $x$, $y$ for the $n$-th time we reached this same line. This is the value $k$. So, we have this same value for the $(n + 1)$ – th time and so we have proved that the inductive hypothesis holds true for $n + 1$ Q.E.D

So, we proved that every time we reach the line ** the result is correct. This guarantees that the algorithm is partially correct. We say “partially” because there is also the second condition I mentioned above: that we must also prove that the execution of the algorithm indeed terminates in any case. Up to here, we have no guarantee for this. In order to prove it, we notice that $y$ decreases at least by $1$ for each loop execution. For each subsequent execution, $y$ becomes equal to a number between $0$ and $y – 1$ so, finally, it reaches $0$ and the loop terminates. So, we have proved the second condition as well.

Classification of algorithms

Algorithms can’t be classified in some universal way. In a broad sense, they can be classified according to their design pattern, according to the kind of problem they solve or according to their complexity. According to the first, they can be classified into divide and conquer, greedy, dynamic programming etc. algorithms, so the way they solve the problem gives a hint about what kind of problems is each best at. According to their complexity, they can be classified to constant time, linear time, logarithmic time, quadratic time, exponential time etc. Algorithmic complexity is concerned about how fast or slow a particular algorithm performs.

Of particular interest for a beginning programmer are sorting algorithms. Sorting is the process of putting a list of objects (e.g. numbers, strings, characters) in ascending or descending order, according to some criteria. Sorting, besides being useful in itself,  is also useful in giving us a sorted list of objects as an input to some other process.
The first algorithm of this kind often taught in an introductory CS context, is Bubble Sort$^{[6]}$. This algorithm is easy to be understood and implemented but is not particularly efficient for big data sets. Two very efficient sorting algorithms are Merge Sort$^{[7]}$ and Quicksort$^{[8]}$ and variations of them like 3-way Merge Sort and Quick Sort with 3-way partition.

In the second part of this series, I’ll refer to the dynamic programming strategy and some interesting cases of algorithms that follow this strategy.

Analysis of algorithms

In order to check the efficiency of an algorithm in a theoretical scientific way, we perform the task of analysis. We can also compare two algorithms regarding their efficiency using this task. The term was coined by Donald Knuth. Using analysis we determine the computational complexity of an algorithm i.e. the amount of time, space(memory – storage) and  / or other resources necessary for its execution. For the complexity of algorithms we use Asymptotic Analysis.

Can’t we just run our program on our computer and see its performance? Or implement two algorithms that we want to compare regarding their performance and just run the programs?
The answer is that the problem we don’t take into account this way is the input size. It maybe the case that one (implemented) algorithm has good performance for some input sizes but this changes from a certain size on. Or in the case of comparing the implementations of two different algorithms, the first performs better for some input sizes but the second for other. Also, if we run them on two different machines, the results may be not what we expect. So, in order to get rid of specific details and evaluate the performance of some algorithm(s) according to their input size, we use Asymptotic Analysis. In this type of analysis, we analyze the running time or space usage of programs by trying to estimate the time or space as function of the input size. The asymptotic behavior of such a function $f(n)$ refers to the growth of this function as $n$ gets large. So, we have a good measure of the efficiency – implied from the complexity, of an algorithm  but asymptotic analysis is not perfect. If, for instance, we have two algorithms with the same order of growth whose running time differs by some constant, we can’t tell which one is better, as we ignore constants in Asymptotic Analysis. Also, in this analysis, we talk about input sizes larger than some constant but it may be the case that such an input size is never really given to our algorithm.

Asymptotic Analysis uses asymptotic notation$^{[9]}$ This is a mathematical representation of its complexity.
Big – Oh $(O)$ notation is used to define the upper bound of an algorithm in terms of time complexity.
Big – Omega $(\Omega)$ notation is used to define the lower bound of an algorithm in terms of time complexity.
Big – Theta $(\Theta)$ notation is used to define the average bound of an algorithm in terms of τime complexity.

In the next part of this series, I’ll talk about data structures.

#### Bibliography

Niklaus Wirth “Algorithms and Data Structures” Longman Higher Education 1986
Les Goldschlager and Andrew Lister “COMPUTER SCIENCE A MODERN INTRODUCTION” Second Edition, Prentice-Hall 1988

#### Recommended Bibliography

Thomas H. Cormen, Charles E. Leiserson, Ronald R. Rivest, Clifford Stein “Introduction to Algorithms” Third Edition, MIT Press 2009
Steven S. Skiena “The Algorithm Design Manual” Second Edition, Springer 2008

Jon Kleinberg,   “Algorithm Design” First Edition, Pearson 2005

Ronald L. Graham, Donald E. Knuth, Oren Patashnik “Concrete Mathematics: A Foundation for Computer Science” Second Edition, Addison-Wesley Professional 1994

#### Recommended Resources

“Algorithms Design and Analysis” Part 1, Stanford Lagunita

Tags:
25 replies