# Multicore Parallel Programming & Pitfalls

CS 441/641 Lecture, Dr. Lawlor

Let's say I want to print out 20 integers, and for some insane reason I care how fast this happens.
```cpp
const int n=20;
double start=time_in_seconds();
for (int i=0;i<n;i++) {
	std::cout<<"i="<<i<<"\n";
}
double end=time_in_seconds()-start;
double per=end/n;
std::cout<<"Time: "<<per*1.0e6<<" microseconds per iteration\n";
```

(Try this in NetRun now!)

This takes 10us per print, which is pretty slow.  But it's only using one core.

Right, then: let's put more processors on the job, using the very simple OpenMP directives.  These split up the loop iterations across the cores, so core 0 gets iterations 0-4, core 1 iterations 5-9, core 2 iterations 10-14, and core 3 iterations 15-19.
```cpp
const int n=20;
double start=time_in_seconds();
#pragma omp parallel for    /* make this loop multicore! */
for (int i=0;i<n;i++) {
	std::cout<<"i="<<i<<"\n";
}
double end=time_in_seconds()-start;
double per=end/n;
std::cout<<"Time: "<<per*1.0e6<<" microseconds per iteration\n";
```

(Try this in NetRun now!)

Here's the other problem: it never quite runs the same way twice.  Hmmm.
Here are six runs, one per line (newlines in the real output collapsed here):
```
i=i=0i=1i=2i=3i=4i=5i=6i=7i=8i=915i=16i=17i=18i=19i=10i=11i=12i=13i=14
i=i=0i=1i=2i=3i=4i=10i=11i=12i=13i=1415i=16i=17i=18i=19i=5i=6i=7i=8i=9
i=i=0i=1i=2i=3i=4i=15i=16i=17i=18i=195i=6i=7i=8i=9i=10i=11i=12i=13i=14
i=i=0i=1i=2i=3i=4i=5i=6i=7i=8i=915i=16i=17i=18i=19i=10i=11i=12i=13i=14
i=i=0i=1i=2i=3i=4i=15i=16i=17i=18i=195i=6i=7i=8i=9i=10i=11i=12i=13i=14
i=i=i=0i=1i=2i=3i=45i=6i=7i=8i=915i=16i=17i=18i=19i=10i=11i=12i=13i=14
```
Notice the torn lines: a thread gets as far as printing "i=" before another thread barges in, and the number shows up much later.

We can improve matters a little bit by making each entire output line a "critical section", so it's done as a unit.  This makes the lines a little cleaner, and actually improves the speed a bit, to 24us per print, only 140% slower than single core.
```cpp
const int n=20;
double start=time_in_seconds();
#pragma omp parallel for    /* make this loop multicore! */
for (int i=0;i<n;i++) {
	#pragma omp critical  /* do this stuff single-core */
	{
		std::cout<<"i="<<i<<"\n";
	}
}
double end=time_in_seconds()-start;
double per=end/n;
std::cout<<"Time: "<<per*1.0e6<<" microseconds per iteration\n";
```

(Try this in NetRun now!)

Five runs, one per line (newlines in the real output collapsed here):
```
i=5i=6i=7i=8i=9i=0i=1i=2i=3i=4i=15i=16i=17i=18i=19i=10i=11i=12i=13i=14
i=5i=6i=7i=8i=9i=0i=1i=2i=3i=4i=15i=16i=17i=18i=19i=10i=11i=12i=13i=14
i=10i=11i=12i=13i=14i=0i=1i=2i=3i=4i=15i=16i=17i=18i=19i=5i=6i=7i=8i=9
i=5i=6i=7i=8i=9i=0i=1i=2i=3i=4i=15i=16i=17i=18i=19i=10i=11i=12i=13i=14
i=5i=6i=7i=8i=9i=0i=1i=2i=3i=4i=15i=16i=17i=18i=19i=10i=11i=12i=13i=14
```
The lines come out whole now, but each core's chunk still lands in whatever order the cores happen to finish.

The real fix is to get rid of the critical section entirely: give each iteration its own place to write, so the cores never fight over the output stream, then print everything serially after the parallel loop.
```cpp
const int n=10000;
double start=time_in_seconds();
char results[n][16]; /* array of resulting strings (so each core can write its own; 16 chars is plenty for "i=9999\n") */
#pragma omp parallel for    /* make this loop multicore! */
for (int i=0;i<n;i++) {
	sprintf(results[i],"i=%d\n",i);
}
for (int i=0;i<n;i++) { /* concat & print all strings, serially */
	std::cout<<results[i];
}
double end=time_in_seconds()-start;
double per=end/n;
std::cout<<"Time: "<<per*1.0e6<<" microseconds per iteration\n";
```