Synchronisation

Two types of processes

Independent : Not affected by other process
Cooperating : Affected by each other
- Need data generated by the other process (data sharing
- Need to execute after a certain computation by another process (control)
- Split computation to many parallel processes (parallelism)

The method of communication is Synchronisation

Part of the OS
- Implementation depends on OS
- Another reason for compiling programs for particular OS
Mechanism for passing messages between processes
Processes can SEND and RECEIVE messages with data or control commands

Similar to trap commands between process and OS

Primitives

Mutex
Semaphore

Linux kernel synchronisation primitives:

Atomics: read-modify-write (RMW) operations
Memory barriers: impose a perceived partial ordering over the memory operations
Spin locks
Futexes And also special mutexes and semaphores

Mutex

A Boolean variable (locked, unlocked) Asking to lock a locked mutex blocks execution Resume on release Called locks, or binary semaphores

#include <pthread.h> 
pthread_mutex_t lock;
void* trythis(void* arg)
{
 
    pthread_mutex_lock(&lock);
 
    …  //do some work
 
    pthread_mutex_unlock(&lock);
 
}

Semaphore

A counter atomically incremented/decremented

If (--sem & sem <1) block
Block until sem >= 1

Modern processors provide special atomic instructions that allow implementing semaphores efficiently.

#include <pthread.h>
 
#include <semaphore.h> 
 
sem_t lock;
 
void* trythis(void* arg)
 
{
 
    sem_wait (&lock);
 
    …  //do some work
 
    sem_post (&lock);
 
}

Atomics

Read-modify-Write (RMW) operations No other write will occur to that location between the read and the write

Two classes: Operate on atomic_t or atomic64_t data type Operate on bitmaps Unsigned long/ array of unsigned long

#include <stdatomic.h>
atomic_int acnt;
…
    ++acnt;
…

Barriers

Multi-CPU memory access is relaxed - not ordered Strict overall order for threads before_atomic,after_atomic are not the only barriers available.

General barrier

Only serves as an instruction to the compiler to prevent reordering of memory access A general barrier does not affect runtime. It is only a compiler instruction to keep the correct order.

Mandatory barrier

General, write, read Mandatory barriers are most useful when communicating with memory-mapped external peripherals. There are three types, the general, write and read barrier and act as full system memory barriers! Any operation before the barrier must commit its outputs to memory before any operations after the barrier can start execution.

SMP conditional barrier

SMP barriers ensure consistency between CPU cores meaning cache coherency.

Implicit barrier

Use locking constructs to act as implicit SMP barriers

Implicit barriers use other locking constructs to perform the operation of a barrier , often as a loop to spin the program until the memory commitment condition is met.

atomic_fetch_add();
 
// is equivalent to :  
smp_mb before_atomic();  
atomic_fetch_add_relaxed();
 
smp_mb after_atomic();

Locks

Simplest form of locking Task goes into a loop doing nothing until it acquires a lock Occupies the CPU while waiting (busy wait) Should be put to sleep if there are other tasks using the CPU

while (! has_lock ) {
   // try to get the lock
}

Futexes

Fast user-space mutex Linux specific Optimised for performance for the case when there is no contention for resources

Occurs entirely in user space for the non-contended case Kernel is only involved to handle the contended case

Lightweight and scalable Semaphore semantics

Kernel Mutexes

Exclusive to the kernel Three-state atomic counter Unlocked 1 Locked no waiters 0 Locked with waiters < 0 Spin lock with wait-queue Three paths wen acquiring mutex

Fastpath Tries to atomically acquire the lock by decrementing the counter. If it was already taken by another task, it goes to the next possible path. This logic is architecture-specific but typically requires only a few instructions.
Midpath aka optimistic spinning, tries to spin for acquisition while the lock owner is running, and there are no other tasks ready to run that have higher priority (need_resched). The rationale is that if the lock owner is running, it is likely to release the lock soon.
Slowpath if the lock is still unable to be acquired, the task is added to the wait queue and sleeps until woken up by the unlock path. Under normal circumstances, it blocks as TASK_- UNINTERRUPTIBLE.

Core observation: busy-waiting for a few cycles has a lower overhead than putting a task on the wait queue

Kernel Semaphores

Sempaphores are also locks with blocking wait (sleep), they are a generalized version of mutexes. Where a mutex can only have values 0 or 1, a semaphore can hold
an integer count, i.e., a semaphore may be acquired count times before sleeping. If the count is zero, there may be tasks waiting on the wait_list. The spin lock controls access to the other members of the semaphore.

Unlike the mutex above, the semaphore always sleeps. Incrementing and decrementing the counter follows the generic semaphore rules but there is also a waiters queue like in kernel mutexes.

POSIX primitives

As Kernel Mutex & Semaphores happen in kernel space, there is an interface in user space to use them:

#include <pthread.h> 
 
pthread_mutex_t lock; = PTHREAD_MUTEX_INITIALIZER;  
pthread_mutex_init(&lock);
 
pthread_mutex_lock(&lock);
 
pthread_mutex_unlock(&lock);
 
pthread_mutex_destroy(&lock);
 
#include <semaphore.h>
 
//unnamed version
 
int sem_init(sem_t *sem, int pshared, unsigned int value);
 
int sem_destroy(sem_t *sem);
 
  
int sem_post(sem_t *sem);  //release
 
int sem_wait(sem_t *sem); //acquire  
// Variants  
int sem_trywait(sem_t *sem);
 
int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout);

Semaphores

Named: have a name id Two processes can operate under the same name
Unnamed: memory-based Memory is shared between threads/processes

Condition variables

Condition on the actual value of data
Wait until condition is satisfied without polling
Always used together with a mutex lock

Conditional variables can also have a signal or a broadcast sent out to one or all threads waiting for a condition to be met.

Quartz 4

Explorer