Thread Programming vs Parallel Programming

I came across some interesting information the other day while googling a thread pool problem: apparently there's this whole new "Parallel Programming" model coming in .NET 4.0. And, being the curious George I am, I read up a bit on it, and it's some pretty cool stuff. Lemme break it down for you.

Right now, if we want stuff running concurrently or asynchronously, we use threads (or the method we call spawns a thread for us). Which is all fine and good: as far as we're concerned, we have two (or more) separate methods running at the same time that can do work independently of the main application's thread. But what actually happens under the hood? In a time-slice based OS (like most versions of Windows, Mac OS X, and a good chunk of *nix builds), we get something like this:

As you can see, each thread is actually running at a different time; the switching just happens so fast that we don't really notice it. The OS assigns each process to a core, and then each thread within the process gets a slice of time to execute on the designated core. Which has worked pretty well for a number of years. But, because we've hit a practical limit as far as processor speeds go, manufacturers have started putting more cores on a CPU, which works pretty much like having multiple CPUs installed in a system. So now your OS can say "k, .NET app 1 runs on core 1, MSN can run on core 2, and because core 1 isn't really that busy we'll let uTorrent run on there too". Again, seems to work pretty well. But what happens when your application is a huge resource hog? If your app is using 100% of core 1, you're really only using 50% (or less) of the entire system, regardless of whether the other core is busy or not! Here's where one of the benefits of Parallel Programming kicks in:

With parallel programming, an application has access to all the cores/CPUs available to the system, and the framework decides what runs where and at what time. You might also notice that we now have "tasks" instead of threads – this is another key component of parallel programming. Instead of one big method that does what it needs to and then terminates when it's done (i.e. a thread), we have a series of tasks that can either run asynchronously or be dependent on one another. Once we have these tasks and dependencies defined, the parallel framework takes care of the execution plan for us. In a code comparison (yes, it's a horrible example):

Thread Programming
Thread t = new Thread(new ThreadStart(ThreadMethod));
t.Start();

public void ThreadMethod()
{
    Connection c = new Connection();
    while (c.IsOpen())
    {
        string s = c.ReadLine();
        // open in append mode so each message doesn't overwrite the last
        StreamWriter sw = new StreamWriter(@"c:\logs\streamlog.txt", true);
        sw.WriteLine(s);
        sw.Close();
        Console.WriteLine("Received message: " + s);
    }
}
Parallel Programming
Task listen = new Task(Listen);
listen.Start();

public void Listen()
{
    Connection c = new Connection();
    while (c.IsOpen())
    {
        // kick off an independent task for each incoming message
        new Task(() => ReadInput(c)).Start();
    }
}

public void ReadInput(Connection c)
{
    string s = c.ReadLine();
    new Task(() => WriteToLog(s)).Start();
    new Task(() => WriteToConsole(s)).Start();
}

public void WriteToLog(string s)
{
    StreamWriter sw = new StreamWriter(@"c:\logs\streamlog.txt", true);
    sw.WriteLine(s);
    sw.Close();
}

public void WriteToConsole(string s)
{
    Console.WriteLine(s);
}

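For comparison, here's roughly how that same dependency idea looks on the Task API in the .NET 4.0 preview bits – just a minimal sketch under my own assumptions (the connection is faked with a hard-coded string, and the log path is borrowed from the example above), not how the final framework will necessarily look:

using System;
using System.IO;
using System.Threading.Tasks;

class TaskDependencyDemo
{
    static void Main()
    {
        // Parent task: stand-in for reading one line off the connection.
        Task<string> read = Task.Factory.StartNew(() => "hello from the connection");

        // Both of these depend on the read finishing, but not on each other,
        // so the scheduler is free to run them on separate cores at once.
        // (Assumes c:\logs exists, same as the example above.)
        Task log = read.ContinueWith(t =>
            File.AppendAllText(@"c:\logs\streamlog.txt", t.Result + Environment.NewLine));
        Task console = read.ContinueWith(t =>
            Console.WriteLine("Received message: " + t.Result));

        // Block until the whole little task graph has drained.
        Task.WaitAll(log, console);
    }
}
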
Because each processing item is broken down into tasks, the tasks that are dependent on one another simply execute after their parent task is finished, and tasks that can run in parallel are executed at the same time (like WriteToLog and WriteToConsole in the example above). Granted, the example is very simple, and probably nothing like the actual implementation is going to be in .NET 4.0, but it gets the point across. Another neat thing to note is that LINQ is getting a parallel counterpart, PLINQ, so queries can automatically take advantage of this parallel processing model without having to change much of your code:

LINQ.AsParallel
return from baby in babies.AsParallel()
         where baby.Name == qi.Name && baby.State == qi.State &&
               baby.Year > qi.YearStart && baby.Year < qi.YearEnd
         orderby baby.Year ascending
         select baby;

In this example, we see a LINQ where clause that would normally check each "Baby" in the collection in sequence and decide whether or not to return it – by adding AsParallel(), PLINQ knows it can run each comparison as a task and (provided there are cores and/or CPU time to take advantage of) complete the operation much quicker.
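
As a rough, self-contained illustration of the same idea (made-up numbers and predicate, nothing to do with the baby query above), flipping a query from sequential to parallel really is just one extra method call:

using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        int[] numbers = Enumerable.Range(1, 1000000).ToArray();

        // Plain LINQ: one core walks the array element by element.
        int sequential = numbers.Where(n => IsInteresting(n)).Count();

        // PLINQ: AsParallel() partitions the array across the available cores
        // and runs the same filter on each chunk at the same time.
        int parallel = numbers.AsParallel().Where(n => IsInteresting(n)).Count();

        Console.WriteLine("sequential: {0}, parallel: {1}", sequential, parallel);
    }

    // Stand-in for an expensive predicate – the AsParallel() win only shows
    // up when there's enough per-element work to be worth spreading out.
    static bool IsInteresting(int n)
    {
        return n % 7 == 0;
    }
}

One thing to keep in mind: PLINQ doesn't promise to keep the source ordering unless you ask for it, so an explicit orderby like the one in the baby query is still doing real work.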
