The Linux kernel schedules threads not processes The scheduler maintains a queue for each priority level
Priority can be modified by changing the “niceness” of a process
Two main philosophies:
Scheduling for Responsiveness
- When OS performs multi-tasking, it is actually sharing the CPU between different processes - giving each process in turn a slice of CPU time Scheduling for Performance
- switch to the next process when the current process is waiting for I/O
Scheduling Criteria
How would we know if we make the right decisions when designing a scheduling policy? Take a moment to think about this.
The easiest way is to measure the policy performance against scheduling criteria. The criteria are designed to let us understand the performance of a policy and the overall performance of the OS. The criteria are
-
CPU utilization: the percentage of CPU clock cycles that the CPU is using to execute tasks. Ideally the CPU should be busy 100% of the time (cf. top command)
-
Throughput: The number of processes completed per unit time (not cycles!). The goal is to maximize this number so that the user does not experience delays.
-
Load average: The average number of processes in the ready queue. The goal is to minimize this number so that there are not too many tasks waiting for a CPU to become available to avoid user experience issues.
-
Turnaround time: The elapsed time required for a particular task to complete from birth to death (typically not in cycles)
-
Waiting time: The time spent by a task in the ready queue
-
Response time: The time taken between submitting a request and obtaining a response
Scheduling Policies
The policies we will discuss are the simple FCFS and Round-robin, the Priority driven SJF, the Priority scheduling and the Real-time Scheduling.
The Simple policies have the smallest overhead as they are easy to implement. The priority driven scheduling starts considering some of the process attributes. The priority scheduling simply takes the explicitly priority attribute as the main deciding factor. The real time scheduling aims to support applications such as the auto-pilot of an airplane where a process must execute within strict time limits or the security of the airplane might be compromised.
Simple
Priority-driven scheduling
Priority scheduling
Starvation
Some of the policies like priority based can also cause unfair treatment for tasks. For example, some low priority task can always be placed at the tail of the Ready queue because of new high priority tasks arriving. These low priority tasks will never move to the Running state.
We call this Starvation. A solution is to add an Aging attribute to the task and make the policy consider this attribute.
Linux Completely Fair Scheduler (CFS)
For implementing normal scheduling policies (non real-time) All normal scheduling policies in the Linux kernel (SCHED_OTHER, SCHED_IDLE, and SCHED_BATCH) are implemented as part of what is known as the “Completely Fair Scheduler” (CFS).
In other words, the CFS attempts to balance the virtual runtime over all tasks. The CFS scheduler run queue is a priority queue with the task with the smallest virtual runtime at the head of the queue. The CFS algorithm computes the duration of the next time slice for this task based on the priorities of all tasks in the queue and runs it.
CFS basically models an “ideal, precise multitasking CPU” on real hardware. Red-black tree used to model priority queues CFS models an ideal precise multitasking CPU for N runnable processes, each should get a 1/N time slice, scaled according to priority CFS attempts to balance the virtual runtime over all the tasks CFS tracks how long a task has run on the processor The lower a task’s runtime, the more deserving the task is for time on the processor task with smallest virtual runtime is at head of priority queue
Kernel uses an RB tree to implement its priority queue Example use:
- The deadline and CFQ I/O schedulers use RB trees to track requests
- The CD/DVD driver packet does the same
- The high-resolution timer code uses an RB tree to organise time requests-
- The ext3 filesystem tracks directory entries using an RB tree
- VMAs are tracked using an RB tree, as are epol file descriptors, cryptographic keys and network packets in the scheduler
Kernel Preemption Models
No Forced Pre-emption (Server) Maximum throughput Pre-emption points include system call returns and interrupts Voluntary Kernel Pre-emption (Desktop) Reduced latency Adds explicit pre-emption points Preemptible Kernel (Low-Latency Desktop) Further reduced latency Most kernel code is pre-emptible Pre-emptible Kernel (Basic Real Time) Forces threaded interrupt handlers Test/debug PREEMPT_RT patch Full Pre-emptible Kernel (Real Time) Most kernel code pre-emptible Forces threaded interrupt handlers Locking and substitution used to reduce pre-emption disabled sections
7 State Model Different Scheduling
Long Term/Job Scheduling
General Processes being moved from secondary memory to “primary” memory
Medium Term Scheduler
Exists inside the OS, and decides between applicaitons (process clusters) Helps send the process back to memory 7 State Model :
// 7 State Model Image
Short Term Scheduler (Dispatcher)
Thread Management, selecting only READY processes for execution Context Switching **
dump scheduling
Linux scheduling commands and API For normal processes:
•For a process to be executed:
•`nice –n 12 command`
•For a currently running process:
•`renice –n 15 –p pid`
•Note that sudo is required to set niceness < 0
For real-time processes:
•`chrt` used to retrieve/set scheduling attributes
•Flags are used to set scheduling policy
•Note that sudo is required for all real-time processes
•`sudo chrt --rr 32 pwd`
Linux RB tree implementation is optimised for speed and better cache locality
Each instance of the struct rb_node is embedded in the data structure it organizes instead of the traditional approach of using pointers
Users of the API are expected to write their own tree search and insert functions which call the provided rbtree functions
Lock management is also left up to the user of the API