Friday, 7 September 2012

Introduction to Multithreading & Concurrency Part 2

Continuing on from the last post, I am going to discuss threads in a little more detail and, more specifically, show one way to implement them in C++.

There are a variety of ways to do this, but I am going to use the threading support that was added to the C++ Standard Library in C++11 through the <thread> header. To compile the code you will therefore need a fairly up-to-date compiler; personally I am using Visual Studio 2012. The theory is very similar regardless of which threading library you actually use (POSIX threads etc.), but for this example I'm sticking to the C++ Standard Library. So, ready to start?

Once you have a funky, new, up-to-date compiler you are ready to look into the goodness of multithreading. But where do I start, you ask? What does a multithreaded C++ program even look like, you say? Well, the answer to these questions is simple. A multithreaded C++ program looks like any ordinary C++ program and has the usual mix of variables, classes, functions, methods etc. The only real difference is that some of these functions may run concurrently, so there is more emphasis on making sure that shared data is safe for concurrent access. But don't worry too much, I will go over that in more detail in the near future. For now let's have a look at some code!

To start things off, let's look at a nice and simple C++ program: Hello World!

#include <iostream>

int main( int argc, char* argv[] )
{
    std::cout << "Hello World!" << std::endl;
    return 0;
}


A really nice and simple program that everyone who has coded in C++ for longer than 5 minutes has seen. If you haven't, well, you probably shouldn't be trying to learn multithreading just yet.

All this program does is write "Hello World!" to the standard output stream. Now compare it to the following code.

#include <iostream>
#include <thread> // The new header allowing us to multithread in C++!

void HelloWorld( )
{
    std::cout << "Hello Concurrent World!" << std::endl;
}

int main( int argc, char* argv[] )
{
    std::thread t(HelloWorld);
    t.join( );

    return 0;
}


So what's different about this code? This is a multithreaded version of the same program. Yup, that's right, a multithreaded Hello World program. Booyah!

So what are the real differences in the code then? Well, the first difference is the additional header file <thread>. The functions and classes for creating and managing threads are declared in this header. There are other useful headers, for protecting shared data and the like, that I will discuss in future posts.

The second difference is that the code for writing the message has now been moved to a separate function. The reason behind this is that every thread has to have an initial function, where the new thread begins. For the initial thread in an application this is the main( ) function, but for every other thread it is the function specified in the constructor of its std::thread object. In this program the thread object named t has the function HelloWorld( ) as its initial function.

Instead of writing directly to the standard output or calling the HelloWorld( ) function from main( ), this program launches a new thread to do it, and in doing so brings the total number of threads in the program to two: the initial thread that starts at main( ) and the new thread that starts at HelloWorld( ).

The final change is the t.join( ) call. After a new thread has been launched, the initial thread continues execution. If it didn't wait until the new thread finished, it would simply reach the return statement and exit the main function, possibly before the new thread has had a chance to finish what it had to do. (With std::thread it is actually stricter than that: destroying a std::thread object that is still joinable calls std::terminate.) This is why we use the call to t.join( ). It ensures that the calling thread (in main( )) waits for the new thread to finish before continuing on its merry way.

To recap, the changes are the <thread> include, the separate HelloWorld( ) function, and the std::thread t(HelloWorld) and t.join( ) lines in main( ).

This may seem like a lot of work to simply display a hello world message, and you're right, it is. It's certainly not worth multiple threads to do it, but at the same time it's a simple program that shows off multithreading in its most basic form.

This is simply a taster of what is to come. In the next post I will go into more detail regarding multithreading and concurrency, giving examples of programs that can benefit from running concurrently.

Until then, thanks for reading, James.

Introduction to Multithreading & Concurrency Part 1

Concurrency and multithreading are two areas of programming that I have been interested in for a fair while now, but for some reason I have neglected to look into them properly. I have therefore decided to fix this once and for all and invest some of my spare time in investigating these topics.

For those of you who do not know much about multithreaded programming or concurrency, I'll start by giving a brief explanation of these concepts. First up is a non-computer, non-code example.

At the simplest level, concurrency is allowing two or more separate activities to happen at the same time. A really simple way to think of this is to think of activities that you yourself carry out each day: walking while talking, performing different activities with each hand and, probably the most important example, the ability to go about your life independently of everyone else. For instance, you can play a video game while someone else does a completely different activity.

In computer terms, concurrency and multithreading are essentially the ability to run multiple pieces of code in parallel with each other rather than sequentially. These pieces of code can be completely independent of each other, but they can also depend more closely on each other if need be.

Just like everything else code related, there are multiple ways to go about it, but there are two main approaches when it comes to concurrency. To describe these approaches more easily, think of this scenario:

Two people work in the same building. If each person has their own office, they can go about their work in peace without being disturbed by the other person. It also allows them to have all their own materials and resources. Communication between the two people, however, is not straightforward; they have to phone, email or visit the other person's office. They do not have the ability to just turn round and speak to the other person. There is also the downside of having to manage the overheads of two offices, two sets of resources etc.

On the other hand, if the two people work in the same office, then communication between the pair is extremely easy and there is no need to pay the overheads of more than one office or more than one set of resources. The downside is the lack of peace for each worker and not having the resources available to you at all times, i.e. the other person may be using them.

These two examples are essentially the approaches that can be taken when it comes to concurrency. If each person represents a thread and each office represents a process, then the first example is the same as having multiple single-threaded processes, whereas the second is the same as having multiple threads in a single process. These approaches can be combined of course, and you can have multiple processes, some of which are multithreaded and some of which are not, but the basic principles remain the same.

It is the latter of these approaches that I am going to focus on - Concurrency with Multiple Threads.

A thread is essentially a lightweight process: each thread runs independently of the others and can run its own set of instructions. All the threads in a process, however, share the same address space, and most of the data can be accessed from all threads. Global variables remain global, and references and pointers can be passed around among threads.

Because the memory overhead is generally lower when launching multiple threads within a single process than when launching multiple single-threaded processes, this is the usual approach to concurrency, especially in C++. However, this method is not without its own problems, such as the management of shared resources. I will go into much more detail regarding this in future posts.

Concurrency is not always the best solution, however, and each project has to be considered carefully to make sure that multithreading and concurrency are worthwhile. Again, this is something that I will go into more detail about in the future.

In the next post I will go into some more detail regarding concurrency and multithreading and, more importantly, I will be posting some code that will better explain methods of concurrency and multithreading in C++.

Until next time,
Thanks for reading,
                                James