Random Facts

A 64-bit architecture can technically address up to 2^64 bytes, or 16 exabytes. In practice, x86-64 only uses the lower 48 bits, meaning it supports only 2^48 bytes = 256 TB

Multi-Value Types

Struct

A struct can be defined as such:

struct point { int x; int y; };
int main() {
	struct point p = {1,2};
}

or using the shorthand:

typedef struct {int x; int y;} point;
int main() {
	point p = {1, 2};
}

Strings

Strings in C are represented as arrays of characters

char greeting[] = "Hello World";

is the same as

char greeting[] = {'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', '\0'};

We use double quotes " to write an (ASCII) string literal and single quotes ' to write a character literal

Strings in C are terminated by the special character '\0' that is added automatically for string literals

To print a string with printf we use the %s format specifier

printf("%s\n", greeting);

If we forget the terminating '\0' character, this will print the contents of memory until it happens to hit a byte with the bit pattern of '\0'!

Functions

Definition vs Declaration

All a declaration does is specify the interface for how to interact with the function, i.e.:

int max(int lhs, int rhs);

This is what the compiler uses to verify the validity of calls; the linker will then search for the definition, which can be in a separate file or library
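
As a rough sketch (the file names are hypothetical, not from the notes), the split between declaration and definition across files might look like:

// max.h: the declaration, shared via the header
int max(int lhs, int rhs);

// max.c: the definition, compiled separately into max.o
int max(int lhs, int rhs) { if (lhs > rhs) { return lhs; } else { return rhs; } }

// main.c: the compiler checks this call against the declaration in max.h;
// the linker later resolves it to the definition in max.o
#include "max.h"
int main() { return max(5, 3); }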

The definition specifies how the function works:

A function definition in C looks like this:

int max(int lhs, int rhs) {
	if (lhs > rhs) { return lhs; } else { return rhs; } 
}

A definition consists of:

  • The return type, which specifies the data type of the value that will be returned after evaluating the function. If a function returns no value, the special type void is used as the return type.
  • The name, which should describe the behaviour of the function.
  • The parameter list, which specifies the data type and name of each parameter expected by the function.
  • The function body, a block containing the code executed when calling the function.

To call a function we provide an argument for each parameter and capture/process the return value

int x = max(5, 3);

Pass by Value vs Pass by Reference

Pass by Value

In programming, “Pass by Value” means that:

  • When a function is called, a copy of the argument’s value is passed to the function.
  • Changes made to the parameter inside the function do not affect the original argument outside the function.

This method is typically used with primitive data types like integers and characters.

It ensures that the original data remains unchanged and secure from unintended modifications within functions. However, it can be less efficient for large data structures or objects due to the overhead of copying data.

#include <stdio.h>
 
void modifyValue(int num) {
    num = 20;  // Attempt to modify the parameter within the function
    printf("Value inside function: %d\n", num);
}
 
int main() {
    int x = 10;
    modifyValue(x);  // x is passed by value
    printf("Value in main after function call: %d\n", x);  // x remains unchanged
    return 0;
}
 
// Output:
// Value inside function: 20
// Value in main after function call: 10

Side Note: Arrays

Arrays are typically not passed fully by value, as copying them would be expensive; in C, an array argument instead decays to a pointer to its first element

Pass by Reference

“Pass by Reference” involves passing the actual address (reference) of an argument to a function, rather than a copy of its value.

This allows the function to modify the original argument directly, reflecting changes outside the function.

It is more memory-efficient, especially for large objects or arrays, as it avoids the need for copying data. However, it can lead to bugs if the data is unintentionally altered.

This method is commonly used in languages that support reference types, like C++ and Java (via object references). C has no references; it emulates pass by reference by passing a pointer (which is itself copied by value), as in the example below.

#include <stdio.h>
 
void modifyReference(int *num) {
    *num = 20;  // Directly modify the value at the address
    printf("Value inside function: %d\n", *num);
}
 
int main() {
    int x = 10;
    modifyReference(&x);  // Address of x is passed
    printf("Value in main after function call: %d\n", x);  // x is changed to 20
    return 0;
}
 
// Output:
// Value inside function: 20
// Value in main after function call: 20

Function Pointer

Function names are automatically converted to function pointers, i.e. we do not have to write &func_name

return_type (* function_name)(argument_types)
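
A minimal sketch of declaring and calling through a function pointer (reusing the max function from above):

#include <stdio.h>

int max(int lhs, int rhs) { if (lhs > rhs) { return lhs; } else { return rhs; } }

int main() {
	int (*cmp)(int, int) = max; // no & needed: max converts to a function pointer
	printf("%d\n", cmp(5, 3));  // calls max through the pointer, prints 5
}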

Pointers

All pointers have a size of 8 bytes, at least on 64 bit architectures.

The & “address of” operator gives the address that a variable is stored at. The * “dereference” operator is the inverse: it lets us access the value stored at the address we are “pointing” to.
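
A minimal sketch of the two operators in action:

#include <stdio.h>

int main() {
	int x = 42;
	int * ptr = &x; // ptr now stores the address of x
	*ptr = 7;       // writes to x through the pointer
	printf("x = %d, *ptr = %d\n", x, *ptr); // prints: x = 7, *ptr = 7
}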

We typically use the macro NULL to represent a pointer that currently points to nothing. This comes with a caveat: dereferencing a NULL pointer will crash your program

Const

Pointers can be const, i.e. unmodifiable, in three ways (illustrated in the sketch after this list):

  1. The pointer itself, i.e. the address, can not be changed: float * const ptr;
  2. The value we are pointing to can not be changed: const float * ptr;
  3. Both value and pointer can not be changed: const float * const ptr;
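
A minimal sketch of which assignments the compiler rejects for each form:

int main() {
	float f = 1.0f, g = 2.0f;

	float * const ptr1 = &f;
	*ptr1 = 3.0f;    // ok: the value may change
	// ptr1 = &g;    // error: the address may not change

	const float * ptr2 = &f;
	ptr2 = &g;       // ok: the address may change
	// *ptr2 = 3.0f; // error: the value may not change

	const float * const ptr3 = &f;
	// ptr3 = &g;    // error: neither may change
	// *ptr3 = 3.0f; // error
}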

Pointer Arithmetic

Pointer arithmetic lets us modify the value of a pointer (useful in the context of arrays)

We can:

  1. Add and subtract integer values to/from a pointer (float, etc. is invalid)
  2. subtract two pointers from each other
  3. compare pointers
int vector[6] = {1, 2, 3, 4, 5, 6}; 
int * ptr = vector; // start at the beginning 
while (ptr <= &(vector[5])) {
	printf("%d ", *ptr); // print the element in the array 
	ptr++; // go to the next element 
 } 

Pointer arithmetic takes into account the size of the type the pointer is pointing to:

int * i_ptr = &i;
char* c_ptr = &c;
i_ptr++;    // this adds 4 bytes (1x sizeof(int)) to the address stored in i_ptr
c_ptr += 2; // this adds 2 bytes (2x sizeof(char)) to the address stored in c_ptr

Note: two pointers cannot be added or multiplied together (only subtracted and compared)

Linked Lists

Typically:

struct node { char value; struct node * next; };

With the last node pointing at NULL:

int main() {
	struct node c = {'c', NULL};
	struct node b = {'b', &c};
	struct node a = {'a', &b};
	
	
	struct node * ptr = &a; 
	while (ptr) {
		printf("%d\n", ptr->value);
		ptr = ptr->next; 
	} 
}

Note: ptr->m is a notation to access member m of a struct-pointer ptr and is equivalent to (*ptr).m

Void Pointer

This provides a generic means to address something, but carries no type information, which means that dereferencing a void pointer is forbidden; it must first be cast to a concrete pointer type
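
A minimal sketch: to use the value behind a void pointer we must first cast it back to a concrete pointer type:

#include <stdio.h>

int main() {
	int x = 42;
	void * vp = &x;         // generic address, no type information
	// printf("%d\n", *vp); // error: cannot dereference a void*
	int * ip = (int *)vp;   // cast back to a concrete pointer type
	printf("%d\n", *ip);    // prints 42
}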

Dynamic Memory Management

The heap is a region of a computer’s memory used for dynamic memory allocation, where variables are allocated and freed at runtime. Unlike the stack, which manages memory in a last-in-first-out (LIFO) manner for static memory allocation, the heap allows for more flexible memory usage, accommodating varying-sized data blocks as needed by an application.

Mismanagement of the heap, such as failing to free allocated memory, can lead to memory leaks.

Usage:

We request memory from the heap using the malloc function, specifying how many bytes we want:

void* malloc( size_t size); // in <stdlib.h>

This returns a void pointer to the first byte of the block on success, or NULL if failed.

Memory that is malloced must be freed

void free( void* ptr );

Otherwise this memory is “leaked”: it remains allocated, and therefore unusable by the rest of the program, but is never used. This means the memory is potentially wasted until the end of the program.
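
A minimal sketch of the full malloc/free cycle (the NULL check and the final assignment are conventions, not requirements):

#include <stdio.h>
#include <stdlib.h>

int main() {
	int * arr = malloc(10 * sizeof(int)); // request space for 10 ints
	if (!arr) { return 1; }               // malloc returns NULL on failure
	for (int i = 0; i < 10; i++) { arr[i] = i; }
	printf("%d\n", arr[9]);
	free(arr);  // every malloc needs a matching free
	arr = NULL; // avoids leaving a dangling pointer behind
}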

Segmentation Fault (SegFault)

Dangling pointer

  • a reference to memory which has been de-allocated and then possibly allocated to another variable
  • if you set a pointer to NULL after freeing it, it is easier to check whether the pointer is still “active”

Dereferencing a NULL pointer

  • NULL is an invalid memory location

Buffer overflow

  • accessing memory outside allocated bounds, typically with arrays
  • you can read the values in these extra locations, but this data may be corrupted

Stack overflow

  • trying to write more data than your allocated stack memory can hold
  • often triggered by a recursion without a base case

Heap overflow

  • e.g. caused by memory leaks exhausting the heap
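
A minimal sketch (not from the notes) of the dangling-pointer case and the NULL-after-free convention:

#include <stdlib.h>

int main() {
	int * p = malloc(sizeof(int));
	free(p);   // p is now dangling: it still holds the old address
	// *p = 5; // undefined behaviour: use-after-free
	p = NULL;  // NULLing after free makes the state checkable
	if (p) { *p = 5; } // safe: guarded by the NULL check
}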

Debugging

First, if we use the -g flag while compiling and enter lldb:

$ lldb -- ./program 12345
(lldb) run
Process 85058 launched: './program' (x86_64) 
2018-10-21 20:56:00.106714+0100 program[85058:12827554] detected buffer overflow 
Process 85058 stopped

At this point the debugger has stopped, and we can use a backtrace (bt) :

(lldb) bt 
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT 
* frame #0: 0x00007fff76d1ab86 libsystem_kernel.dylib`__pthread_kill + 10 
...
frame #6: 0x00007fff76ca8e84 libsystem_c.dylib`__strcpy_chk + 8 
frame #7: 0x0000000100000eaf program`overflow(argc=2, argv=0x00007ffeefbff300) at program.c:35 
frame #8: 0x0000000100000efb program`main(argc=2, argv=0x00007ffeefbff3c8) at program.c:42 
frame #9: 0x00007fff76bdc085 libdyld.dylib`start + 1 
frame #10: 0x00007fff76bdc085 libdyld.dylib`start + 1

Here frame 7 shows us the file (program.c) and line (35) that triggered the segfault

Linter

A linter is a tool that performs static analysis to flag suspicious code (the name comes from lint, the first UNIX tool to perform static analysis on C)

clang-tidy is a flexible tool that allows us to enforce coding guidelines and to modernize source code. It is possible to extend clang-tidy by writing your own checks

It is invoked like clang, accepting the same flags, but after two dashes: --. Individual checks can be enabled and disabled. Here we enable the checks for readability:

$ clang-tidy -checks="readability-*" 6.c -- -Wall -Werror
/Users/lito/Desktop/examples/6.c:2:16: warning: pointer parameter 'p' can be pointer to const [readability-non-const-parameter]
void test(int *p) {
           ~~~ ^
               const
/Users/lito/Desktop/examples/6.c:4:9: warning: statement should be inside braces [readability-braces-around-statements]
if (p)
      ^
       {

It suggests using const and putting braces around the branch of the if statement

Dynamic Analysis

There exists a family of bug detection tools that use dynamic analysis

These tools need the program to run and can only detect bugs which are encountered during the execution of a particular test input

The clang project calls these tools sanitizers. The most important are:

  • [[#addresssanitizer|AddressSanitizer]] - a memory error detector
  • [[#memorysanitizer|MemorySanitizer]] - a detector of uninitialized reads
  • [[#leaksanitizer|LeakSanitizer]] - a memory leak detector
  • [[#undefinedbehaviorsanitizer|UndefinedBehaviorSanitizer]] - a detector of undefined behaviour
  • [[#threadsanitizer|ThreadSanitizer]] - a data race detector

AddressSanitizer

AddressSanitizer is a memory error detector for:

  • Out-of-bounds access (buffer overflow)
  • Use-after-free (dangling pointer)
  • Double free

This is executed as such: clang -fsanitize=address -fno-omit-frame-pointer -O1 -g -Wall -Werror program.c -o program

Where the additions of:

  • -fno-omit-frame-pointer produces a readable call stack
  • -O1 enables basic optimizations

MemorySanitizer

MemorySanitizer detects uninitialized reads from memory: clang -fsanitize=memory -fno-omit-frame-pointer -g -O2 umr.c

Example Output:

$ clang -fsanitize=memory -fno-omit-frame-pointer -g -O2 umr.cc 
$ ./a.out 
WARNING: MemorySanitizer: use-of-uninitialized-value
#0 0x7f45944b418a in main umr.c:6 
#1 0x7f45938b676c in __libc_start_main libc-start.c:226

This tool is under active development and currently only available for Linux and BSD

LeakSanitizer

This detects memory leaks: clang -fsanitize=address -g memory-leak.c

$ clang -fsanitize=address -g memory-leak.c 
$ ./a.out 
==23646==ERROR: LeakSanitizer: detected memory leaks 
Direct leak of 7 byte(s) in 1 object(s) allocated from:
#0 0x4af01b in __interceptor_malloc /projects/compiler-rt/lib/asan/asan_malloc_linux.cc:52:3 
#1 0x4da26a in main memory-leak.c:4:7 
#2 0x7f076fd9cec4 in __libc_start_main libc-start.c:287 
SUMMARY: AddressSanitizer: 7 byte(s) leaked in 1 allocation(s).

UndefinedBehaviorSanitizer

UndefinedBehaviorSanitizer detects various types of undefined behaviour. Here, an integer overflow is detected:

int main(int argc, char **argv) {
	int k = 0x7fffffff; // this is the largest possible signed int value ...
	k += argc; // ... this will produce an integer overflow
	return 0; 
}
 
 
% clang -fsanitize=undefined -Wall -Werror intOverflow.c -o intOverflow 
% ./intOverflow
intOverflow.c:3:5: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'

RAII

RAII, or Resource Acquisition Is Initialization, is a programming paradigm used primarily in C++ to manage resource allocation and deallocation automatically. The core idea behind RAII is that an object’s lifetime, specifically when it is constructed and destructed, should control resource usage such as memory allocation, file handles, and network connections.

In RAII, resources are acquired in a constructor and released in a destructor. This ensures that resources are properly cleaned up when an object goes out of scope, thus preventing resource leaks and ensuring more robust exception safety.

For example, when an object that manages a file handle is created, it opens the file in its constructor. When this object is no longer needed and is destroyed, its destructor automatically closes the file. This automatic management of resources simplifies the code and reduces errors since the cleanup activities are made implicit and tied to the object lifecycle.

The beauty of RAII lies in its leveraging of C++'s deterministic object lifetime, which provides automatic resource management without requiring explicit cleanup code from the programmer. This results in cleaner, safer, and more maintainable code, particularly in the face of exceptions where manually managing resource cleanup can be complicated and error-prone.

For storing a single value on the heap we should use one of two “smart” pointers:

  • std::unique_ptr for unique ownership: the default case, when a single variable owns the value on the heap
  • std::shared_ptr for shared ownership: when it is not possible to identify a single owner (e.g. multi-threaded programs)

POSIX Threads

POSIX Threads (pthreads for short) is the most commonly used threading implementation for C. To use it we need to #include <pthread.h> and specify the compiler flag -lpthread

Create Thread

Threads are created with:

int pthread_create(pthread_t *thread,
	const pthread_attr_t *attr,
	void *(*start_routine)(void*),
	void *arg);

It takes four arguments:

  1. A thread identifier, i.e. a pointer to a memory location of type pthread_t.
  2. Thread attributes, which set properties such as scheduling policies or stack size. Passing NULL results in default attributes.
  3. A function pointer to the start_routine. This function takes a single argument of type void* and returns a value of type void*.
  4. The argument that is passed to start_routine.

It returns 0 if the thread is created successfully or a non-zero error code.

Passing pointers to and from start_routine allows the passing of arbitrary data.

It also requires care to ensure that the memory locations pointed to have appropriate lifetimes

Wait for Other Thread to Terminate

To wait for another thread to terminate we use pthread_join

int pthread_join(pthread_t thread, void **value_ptr);

It takes two arguments:

  1. A thread identifier.
  2. A pointer to a memory location of type void*. The return value of the start_routine passed to pthread_create will be copied to this location.

It returns 0 on success and a non-zero error code otherwise
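
Putting both calls together, a minimal sketch (the worker function and its doubling logic are made up for illustration):

#include <pthread.h>
#include <stdio.h>

void * worker(void * arg) {
	int * input = (int *)arg;
	static int result;  // static so the memory outlives the thread
	result = *input * 2;
	return &result;     // handed to whoever joins this thread
}

int main() {
	pthread_t thread;
	int input = 21;
	void * retval;

	int err = pthread_create(&thread, NULL, worker, &input);
	if (err != 0) { return 1; }

	err = pthread_join(thread, &retval);    // wait and collect the return value
	if (err != 0) { return 1; }
	printf("result: %d\n", *(int *)retval); // prints: result: 42
}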

Key Threading Concepts

Race Conditions

A Race Condition occurs when two or more threads or processes attempt to change shared data at the same time.

Imagine you and your friend are both trying to be the first to grab the last slice of pizza; you both reach for it simultaneously without knowing who will get it first. In computer terms, this uncertainty can cause problems.

For instance, if two threads are trying to update the same counter, they might both read the value at the same time, increment it, and write it back, resulting in a lost update if not handled correctly.
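
A minimal sketch (hypothetical, not from the notes) of exactly this lost-update race; counter++ is a non-atomic read-modify-write:

#include <pthread.h>
#include <stdio.h>

int counter = 0; // shared data

void * increment(void * arg) {
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		counter++; // not atomic: two threads can read the same old value
	}
	return NULL;
}

int main() {
	pthread_t t1; pthread_t t2;
	pthread_create(&t1, NULL, increment, NULL);
	pthread_create(&t2, NULL, increment, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("counter = %d\n", counter); // often less than the expected 200000
}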

Critical Regions

A Critical Region is a part of the code where shared resources are accessed or modified.

Using the pizza slice example, the moment when someone reaches out to grab the slice can be seen as a critical region. In programming, you need to make sure that only one thread or process enters this critical part at a time to prevent confusion or errors.

This is like saying only one person can reach for the pizza slice at any one moment, ensuring that the slice goes to one person without dispute.

Mutexes

Mutexes (Mutual Exclusions) are tools used to avoid race conditions by controlling access to critical regions.

A mutex is like a key to a locked door that leads to the critical region or the pizza slice. If one thread has the key (owns the mutex), it can enter the critical region while other threads must wait outside until the mutex is released (the door is unlocked).

This ensures that only one thread can access the shared resource at a time.
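
Applied to the racy counter sketch from above, a mutex around the critical region restores correctness:

#include <pthread.h>
#include <stdio.h>

int counter = 0;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER; // the "key" to the critical region

void * increment(void * arg) {
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		pthread_mutex_lock(&m);   // take the key; other threads wait here
		counter++;                // critical region
		pthread_mutex_unlock(&m); // hand the key back
	}
	return NULL;
}

int main() {
	pthread_t t1; pthread_t t2;
	pthread_create(&t1, NULL, increment, NULL);
	pthread_create(&t2, NULL, increment, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	printf("counter = %d\n", counter); // now always 200000
}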

Monitor

A monitor is a class that provides thread-safe access to a shared resource (like a bounded buffer). All public functions must provide mutual exclusion. A Java class where all the public methods are synchronised is an example of a monitor.

General Flow of Resolving Multithreading Issues

  1. Identify the Problem (Race Condition): You realize that if multiple threads try to update the same data concurrently, you might end up with incorrect data because the threads are racing each other to complete their task.
  2. Protect the Solution (Critical Region): You determine which parts of your code access or modify shared resources. This part is your critical region, which needs protection to ensure data accuracy.
  3. Implement a Tool (Mutex): You use a mutex as a tool to lock access to the critical region. Only the thread that holds the lock can enter this protected section at a time, while others must wait, thus preventing any overlap and ensuring data is updated correctly and consistently.

This means that, to avoid a race condition, a thread must acquire a lock (mutex) before executing a critical section, and release this lock on leaving.

If a thread tries to acquire a lock that is already owned by another thread, it is blocked until the owner releases the lock.

Thread-Safe Linked List in C++

We encapsulate an std::list<int> and add a thread-safe interface protected by a (private) std::mutex

#include <list>
#include <thread>
#include <optional>
#include <mutex>
struct list {
 
private:
	std::list<int> list;
	std::mutex mutex;    // mutex to protect critical section
	
public:
	void append_to_list(int value) {
	  std::unique_lock<std::mutex> lock(mutex); // lock mutex
	  list.push_back(value);
	}          // mutex will be automatically unlocked here
	
	std::optional<int> remove_from_list(int position) {
		std::unique_lock<std::mutex> lock(mutex); // lock mutex
		auto iter = list.begin();
		while (position > 0 && iter != list.end()) { iter++; position--; }
		if (position != 0 || iter == list.end()) { return {}; /* return nothing */ }
		int value = *iter;
		list.erase(iter);
		return value;
	}          // mutex will be automatically unlocked here
};
 

Deadlock and Busy Waiting in the Bounded Buffer Problem

In concurrent programming, deadlock can occur when multiple processes hold onto resources while waiting for others to release different resources, preventing any from moving forward.

For example: the bounded buffer, shared between producers (who add items) and consumers (who remove items), provides a clear illustration of potential deadlock situations.

Deadlock Scenario

Consider a setup with two mutexes: buffer_mutex for buffer access and count_mutex for managing the count of items. A deadlock may develop if:

  1. Producer Process:

    • Acquires buffer_mutex to add an item.
    • Also needs count_mutex to update the item count.
  2. Consumer Process:

    • Acquires count_mutex to check the buffer’s item count.
    • Attempts to acquire buffer_mutex to remove an item.

If both processes hold one mutex while attempting to acquire the other, neither can proceed:

  • The producer holds buffer_mutex and waits for count_mutex.
  • The consumer holds count_mutex and waits for buffer_mutex.

Busy Waiting as a Solution

Busy waiting involves a process repeatedly checking a condition while waiting for a resource to become available. It’s a simple method to avoid deadlock under certain conditions but can lead to inefficient CPU usage. In the bounded buffer context, busy waiting can be implemented by having processes repeatedly attempt to acquire a lock until they succeed, without ever blocking.

To incorporate busy waiting effectively:

  • Ensure processes release their mutex if they cannot acquire the second one, then retry acquiring both after a short delay.
  • Use it conservatively to avoid high CPU consumption, especially in systems where thread or process responsiveness is critical.

Busy waiting can be a practical, if resource-intensive, strategy to prevent deadlocks in systems like the bounded buffer, where resource allocation order might lead to process standstills. However, this approach should be balanced with efficient CPU usage to maintain system performance.

Tea Maker Example
pthread_mutex_t m;
bool teaIsReady = false;
 
void *me(void* arg) { 
  (void)arg;
  printf("Me: Waiting for my tea ...\n");
  pthread_mutex_lock(&m);
  while (!teaIsReady) {
    pthread_mutex_unlock(&m);
    printf("Me: (Unamused) // do nothing\n");
    pthread_mutex_lock(&m); }
  pthread_mutex_unlock(&m);
  printf("Me: (Happy) ... finished waiting.\n");
  return NULL; 
}
 
void *teaRobot(void* arg) { 
  (void) arg;
  printf(" Tea Robot: Making tea ...\n");
  usleep(randInt());
  pthread_mutex_lock(&m);
  teaIsReady = true;
  pthread_mutex_unlock(&m);
  printf(" Tea Robot: Done!\n");
  return NULL; 
}
 
int main() {
  pthread_t t1; pthread_t t2; int err;
  err = pthread_mutex_init(&m, NULL); assert(err == 0); 
  
  err = pthread_create(&t1, NULL, me, NULL); assert(err == 0);
  err = pthread_create(&t2, NULL, teaRobot, NULL); assert(err == 0);
  
  err = pthread_join(t1, NULL); assert(err == 0); 
  err = pthread_join(t2, NULL); assert(err == 0);
  
  pthread_mutex_destroy(&m); 
}
 

Conditional Variables as a Solution

As we saw, busy waiting is wasteful of energy and CPU cycles.

It is better instead to use a system where we control access to code: we use condition variables (CVs) to block threads until a condition changes, at which point a blocked thread is woken and can pick the lock back up.

pthread_cond_wait(&cv, &m) must be called with a locked mutex m. It releases the mutex and blocks the calling thread on cv.

pthread_cond_signal(&cv) assigns mutex ownership to one of the threads blocked on cv, and wakes it

The condition may have changed before the woken thread is scheduled, so it is important to check the condition again:

  pthread_mutex_lock(&m);
  while (!cond) {
    pthread_cond_wait(&cv, &m);
  }

This is the same idea as a hardware interrupt/signal

Tea Maker Example
pthread_mutex_t m; pthread_cond_t cv;
bool teaIsReady = false;
 
void *me(void* arg) { 
  (void)arg;
  pthread_mutex_lock(&m);
  while (!teaIsReady) {
     printf("Me: Waiting for my tea ...\n");
     pthread_cond_wait(&cv, &m); }
   printf("Me: (Happy) ... finished waiting.\n");
   printf("Me: Is the tea really finished? %s\n",
   teaIsReady ? "Yes" : "No");
   pthread_mutex_unlock(&m);
   return NULL; 
   }
 
void *teaRobot(void* arg) {
  (void) arg;
  printf(" Tea Robot: Making tea ...\n");
  usleep(randInt());
  pthread_mutex_lock(&m);
  teaIsReady = true;
  pthread_mutex_unlock(&m);
  pthread_cond_signal(&cv);
  printf(" Tea Robot: Done!\n");
  return NULL; 
}
 
int main() {
  pthread_t t1; pthread_t t2; int err;
  
  err = pthread_mutex_init(&m, NULL); assert(err == 0); 
  err = pthread_cond_init(&cv, NULL); assert(err == 0);
  
  err = pthread_create(&t1, NULL, me, NULL); assert(err == 0);
  err = pthread_create(&t2, NULL, teaRobot, NULL); assert(err == 0);
  
  err = pthread_join(t1, NULL); assert(err == 0);
  err = pthread_join(t2, NULL); assert(err == 0);
  
  pthread_cond_destroy(&cv); pthread_mutex_destroy(&m); 
}
 
Bounded Buffer Example

MAIN

struct BoundedBuffer {
  int start; int end; int size; int* buffer;
  pthread_mutex_t m;
  pthread_cond_t add;
  pthread_cond_t remove; 
};
 
typedef struct BoundedBuffer BoundedBuffer;
 
BoundedBuffer * createBoundedBuffer(int size) { ... }
void destroyBoundedBuffer(BoundedBuffer * bb) { ... }
void addItem(BoundedBuffer * bb, int item) { ... }
int removeItem(BoundedBuffer * bb) { ... }
void * producer(void * arg) { ... }
void * consumer(void * arg) { ... }
 
int main() {
	pthread_t t1; pthread_t t2; int err;
	BoundedBuffer * bb = createBoundedBuffer(4);
	
	err = pthread_create(&t1, NULL, consumer, bb); assert(err == 0);
	err = pthread_create(&t2, NULL, producer, bb); assert(err == 0);
	
	err = pthread_join(t1, NULL); assert(err == 0);
	err = pthread_join(t2, NULL); assert(err == 0);
}

PRODUCER

void * producer(void * arg) {
  BoundedBuffer * bb = (BoundedBuffer*)arg;
  for (int i = 0; i < 10; i++) {
     int item = randInt();
     printf("produced item %d\n", item);
     addItem(bb, item);
     usleep(randInt());
  }  
  return NULL;
}

CONSUMER

void * consumer(void * arg) {
  BoundedBuffer * bb = (BoundedBuffer*)arg;
  for (int i = 0; i < 10; i++) {
    int item = removeItem(bb);
    printf(" consumed item %d\n", item);
    usleep(randInt());
  }
  return NULL;
}

AddItem

void addItem(BoundedBuffer * bb, int item) {
  if (!bb) return;
  pthread_mutex_lock(&bb->m);
  while (bb->start == bb->end) { // buffer is full
     printf("== Buffer is full ==\n");
     pthread_cond_wait(&bb->add, &bb->m);
  }
  // buffer is no longer full
  bb->buffer[bb->start] = item;
  bb->start = (bb->start + 1) % bb->size; // move start one forward
  pthread_mutex_unlock(&bb->m);
  pthread_cond_signal(&bb->remove);
}

RemoveItem

int removeItem(BoundedBuffer * bb) {
  if (!bb) assert(0);
  pthread_mutex_lock(&bb->m);
  while ( ((bb->end + 1) % bb->size) == bb->start ) { // buffer is  empty
    printf("== Buffer is empty ==\n");
    pthread_cond_wait(&bb->remove, &bb->m);
  }
  // buffer is no longer empty
  bb->end = (bb->end + 1) % bb->size; // move end one forward
  int item = bb->buffer[bb->end];
  pthread_mutex_unlock(&bb->m);
  pthread_cond_signal(&bb->add);
  return item;
}

Coordination in Concurrent Programming

Partitioning

Determining what parts of the computation should be separately evaluated, e.g. a thread to serve each request, to render each frame of a film.

Data Sharing

What data to share between threads, and when to acquire it e.g. film rendering threads may hold 2 frames: 1 being processed and another ready to go as soon as the current frame is complete

Synchronisation

ensuring threads can cooperate without interference, e.g. if threads representing 2 players compete to get a single resource then only one succeeds.

Semaphore

Imagine you’re at a library with a limited number of study rooms. To manage these rooms so that no more than the available rooms are used at one time, the librarian uses a counter or a tracking system. This system ensures that everyone gets a turn without overcrowding. A semaphore operates similarly in computer systems—it’s a tool (more specifically, a variable) used to control access to a common resource by multiple processes in a concurrent system, such as a multitasking operating system.

Types of Semaphores

There are mainly two types of semaphores:

  1. Counting Semaphore
  2. Binary Semaphore (or Mutex)

Counting Semaphore

A counting semaphore can hold any non-negative value and is used to represent the number of resources available. Here’s how it works:

  • Initialize: You set a semaphore to the number of identical resources available (like the number of free study rooms).
  • Wait (or P operation): When a process needs to use a resource, it performs a wait operation. This decrements the semaphore’s value. If the value is positive, the resource is allocated to the process. If the value is zero (meaning no resources are free), the process has to wait until a resource becomes free.
  • Signal (or V operation): When a process is finished using the resource, it performs a signal operation, incrementing the semaphore’s value, signaling that a resource has become free.

For example, if there are 5 identical printers, the semaphore would be initialized to 5. Each time a process needs a printer, the semaphore is decremented. When a printer is released, the semaphore is incremented.

Binary Semaphore (Mutex)

A binary semaphore is simpler, as it can only be 0 or 1. It’s mainly used to ensure that only one process can access the resource at a time (mutual exclusion).

  • Initialize: Typically initialized to 1, indicating that the resource is available.
  • Wait: If a process wants the resource, it attempts to perform a wait operation. If the semaphore value is 1, the semaphore is set to 0 and the process uses the resource. If it’s already 0, the process must wait.
  • Signal: When the process no longer needs the resource, it performs a signal operation, setting the semaphore back to 1.

This type of semaphore acts like a key to a single room. If you have the key (semaphore is 1), you can enter the room. If someone else has the key (semaphore is 0), you must wait outside.
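
The bounded buffer example below uses course helper functions (sem_create, sem_signal); with standard POSIX semaphores, the study-room idea looks roughly like this sketch:

#include <semaphore.h>

int main() {
	sem_t rooms;
	sem_init(&rooms, 0, 5); // counting semaphore: 5 study rooms free

	sem_wait(&rooms); // P: take a room (decrements; blocks if none are free)
	// ... use the room ...
	sem_post(&rooms); // V: give the room back (increments, wakes a waiter)

	sem_destroy(&rooms);
}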

Bounded Buffer Example

struct BoundedBuffer { int start; int end; int size; int* buffer; };
typedef struct BoundedBuffer BoundedBuffer;
 
sem_t fillCount; // data in the buffer
sem_t emptyCount; // free space in the buffer
 
BoundedBuffer * createBoundedBuffer(int size) { ... }
void destroyBoundedBuffer(BoundedBuffer * bb) { ... }
void addItem(BoundedBuffer * bb, int item) { ... }
int removeItem(BoundedBuffer * bb) { ... }
void * producer(void * arg) { ... }
void * consumer(void * arg) { ... }
 
int main() {
  pthread_t t1; pthread_t t2; int err;
  BoundedBuffer * bb = createBoundedBuffer(4);
  fillCount = sem_create(0, 4); // no data in the buffer yet
  emptyCount = sem_create(4, 4); // all spaces in the buffer are free
  err = pthread_create(&t1, NULL, consumer, bb); assert(err == 0);
  err = pthread_create(&t2, NULL, producer, bb); assert(err == 0);
  err = pthread_join(t1, NULL); assert(err == 0);
  err = pthread_join(t2, NULL); assert(err == 0);
}
 

Producer

void * producer(void * arg) {
  BoundedBuffer * bb = (BoundedBuffer*)arg;
  for (int i = 0; i < 10; i++) {
    sem_wait(emptyCount);
    int item = randInt();
    printf("produced item %d\n", item);
    addItem(bb, item);
    sem_signal(fillCount);
    usleep(randInt());
  }
  return NULL;
}  

Consumer

void * consumer(void * arg) {
  BoundedBuffer * bb = (BoundedBuffer*)arg;
  for (int i = 0; i < 10; i++) {
    int item = removeItem(bb); // removeItem handles the semaphores itself
    printf(" consumed item %d\n", item);
    usleep(randInt());
  }
  return NULL;
}  

RemoveItem

int removeItem(BoundedBuffer * bb) {
  if (!bb) assert(0);
  sem_wait(fillCount); // wait for data in the buffer
  // buffer is no longer empty
  bb->end = (bb->end + 1) % bb->size; // move end one forward
  int item = bb->buffer[bb->end];
  sem_signal(emptyCount); // signal that a space is now free
  return item;
}

Note how much simpler this is than with condition variables

Lambda Expressions / Anonymous Functions

Example:

We have seen a function pointer used to pass a function as an argument, e.g. to create threads with pthread:

 
  void * PrintHelloWorld(void* arg) { (void)arg; printf("Hello World\n"); return NULL; }
  int main() {
    // ...
    int error = pthread_create(&thread, NULL, PrintHelloWorld, NULL);
                                       // ... ^ this is passed as a function pointer
  }

By writing a lambda expression we can write a function without naming it:

int main() { auto t = std::thread( [](){ printf("Hello World\n"); } ); t.join(); }

[](){ printf("Hello World\n"); is the lambda

Other names for lambdas are: function literal or anonymous functions

Lambda expressions in C++ are written as:

  [ /*captures*/ ] ( /*params*/ ) { /*body*/ }

The captures lists variables from the surrounding scope that are passed to the lambda when it is created

The params list the parameters that are passed to the lambda when it is called

The body is the function body that executes when the lambda is called

As with function parameters, variables are captured by value, i.e. they are copied

We can share access to a variable by capturing a pointer to it using this notation:

  int main() {
    auto l = list{}; l.append_to_list('a'); l.append_to_list('b');
    auto t1 = std::thread([l_ptr = &l] (){ l_ptr->remove_from_list(1); });
  t1.join();
  }

When calling a standard pthread start routine, the argument is passed as a void*, which means it loses its type information; using captures, however, we can maintain this information

Async Call

auto f6 = std::async([] { return fib(6); });

This is the same as:

std::future<int> f6 = std::async([] { return fib(6); });
printf("fib(6) = %d (computed asynchronously)\n", f6.get() );

Futures and Promises:

  • Futures: A future provides a mechanism to access the result of asynchronous operations. It acts as a handle to a result that is initially unknown but will be provided once the asynchronous operation completes.
  • Promises: A promise is an object that can deliver a value to a corresponding future. A promise can set the value once, and the future can read the value, potentially in another thread.
Simplification with Futures and Promises:

  • Futures and promises decouple the production of a value from its consumption, which simplifies code by eliminating the need for explicit thread management. This abstraction allows developers to focus on the computation’s logic rather than on synchronization details like locking mechanisms and condition variables, leading to code that is easier to write, read, and maintain.

We can see a future as the reading end of a communication channel: some_future.get() extracts the value

The writing end of the channel is called a promise: some_promise.set_value(42);

We can obtain a std::future object from a std::promise:

std::future<int> some_future = some_promise.get_future();

Promise Example

// This sum function writes to the sum_promise
void sum(std::vector<int>::iterator begin, 
                 std::vector<int>::iterator end, 
                 std::promise<int> sum_promise) 
{
  int sum = std::accumulate(begin, end, 0);
  sum_promise.set_value(sum); // 4. write result
}
 
int main() {
  auto numbers = std::vector<int>{ 1, 2, 3, 4, 5, 6 };
  std::promise<int> sum_promise; // 1. create promise for an int
  std::future<int> sum_future = sum_promise.get_future(); // 2. get future from promise
  // 3. create thread that takes ownership of the promise (ownership transfer with std::move)
  auto t = std::thread(sum, numbers.begin(), numbers.end(), std::move(sum_promise) );
  printf("result = %d\n", sum_future.get() ); // 4. wait and then read result
  t.join();
} 
 

Synchronisation

We can achieve this with

  • an std::promise<void>, a promise to produce nothing (but say when you’re done)
  • the std::future<T>::wait() method
void do_work(std::promise<void> barrier) {
  std::this_thread::sleep_for(std::chrono::seconds(1)); // do something (like sleeping)
  barrier.set_value(); // 4. send signal to other thread
}
int main() {
  std::promise<void> barrier; // 1. create promise
  std::future<void> barrier_future = barrier.get_future(); // 2. get future from it
  auto t = std::thread(do_work, std::move(barrier) ); // 3. launch thread
  barrier_future.wait(); // 4. wait for signal
  // 5. continue execution: we know that do_work has completed
  t.join(); }
 

Futures Example

int main() {
   auto fs = std::vector<std::future<int>> ();
      // launch 10 asynchronous tasks to compute fibonacci numbers
   for (int i = 0; i < 10; i++) { 
	   fs.push_back(std::async([i] { return fib(i+1); })); // capture i by value
   }
      // ... do some other work; then print the computed numbers
    for (int i = 0; i < 10; i++) {
	     printf("fib(%d) = %d\n", i+1, fs[i].get()); 
     }
  }
 

Call future.get() to retrieve the value. Blocks calling thread until the value has been computed

Parallel Sum

BAD EXAMPLE

int parallelSum(std::vector<int>::iterator begin, std::vector<int>::iterator end) {
  auto len = end - begin;
  
  // compute sequentially for small arrays
  if (len < 1000) { return std::accumulate(begin, end, 0); }
  auto mid = begin + len/2;
                 
  // launch asynchronous task for the left half of the array
  auto left_side = std::async([=] { return parallelSum(begin, mid); });
                           
  // compute right half of array recursively
  int right_side = parallelSum(mid, end);
                         
  // block to wait for left side to finish
  return left_side.get() + right_side;
}
 
int main() {
  std::vector<int> vec = createLargeVector();
  auto sum = parallelSum(vec.begin(), vec.end());
  printf("sum: %d\n", sum);
}

GOOD EXAMPLE

int parallelSum(std::vector<int>::iterator begin, std::vector<int>::iterator end, int depth = 0)
{
  auto len = end - begin;
  if (len < 1000 || depth > 3 ) { return std::accumulate(begin, end, 0); }
  auto mid = begin + len/2;
  
  auto left_side = std::async([=] { return parallelSum(begin, mid, depth + 1); });
 
  int right_side = parallelSum(mid, end, depth + 1 );
  return left_side.get() + right_side;
}
 

The naïve parallel sum creates too many threads, and the time spent managing all of these threads outweighs the time saved by running them in parallel.

If we have a small number of cores (say 8), we can fix this, as in the good example above, by only spawning asynchronous tasks for the first few recursive calls (limited by the depth parameter).

Task Packaging

int main() {
  auto task = std::packaged_task<int(int,int)>([](int a, int b) {
    return pow(a, b);
  });
 
  std::future<int> result = task.get_future() ;
 
    // The task can now be stored, or passed as a parameter. 
    // When we are ready to use it either
    // launch task in the same thread via:
    // task(2, 10);
    // or start a new thread:
  auto t = std::thread(std::move(task) , 2, 10);
  t.join();
  printf("task result: %d\n", result.get() );
}