Networking in the OS

The network interface controller (NIC, also known as a network adapter) is a peripheral I/O device. As with all peripherals, access to it is controlled by a device driver, which must be part of the kernel.

However, the kernel also contains an implementation of the communication protocols that sit on top of this driver. It acts as a middleman between the data received by the driver and the application waiting for that data.

Performance is not the most important gain here. The main reasons for implementing TCP/IP and networking in the kernel are: 1. we take advantage of the OS's preemptive scheduling policies to give multiple processes access to the network, where each can be a client and/or a server at the same time; 2. the kernel can control incoming and outgoing data, which matters even more here because data can arrive from many different sources outside the system.

Communication networks have traditionally been represented as layered models, the best known being the OSI model.

The upper layers of the OSI model (transport, session, presentation, and application) are called “host” layers because their functionality is implemented, at least in principle, solely by the host systems; the intermediate systems in the network do not need to implement them. The functionality of the lower, media layers is typically implemented in the network adapter.

The Linux Networking Stack

The Linux kernel provides the link layer, network layer, and transport layer. The link layer is implemented through POSIX-compliant device drivers; the network and transport layers (TCP/IP) are implemented in the kernel code.

The physical network devices (NICs) are managed by device drivers. A device driver is a software interface between the kernel and the device hardware. On the kernel side, it uses a low-level but standardized API, so that drivers for different NICs can all be used in the same way.

The normal file operations (read, write, …) do not make sense when applied to the interaction between a driver and a NIC, so network devices do not follow the “everything is a file” philosophy. The main difference is that a file, and by extension a file storage device, is passive, whereas a network device actively wants to push incoming packets toward the kernel. NIC interrupts are therefore not the result of a previous kernel action (as is the case with, e.g., file operations), but of the arrival of a packet. Consequently, network interfaces exist in their own namespace with a different API.

The network layer connects the various protocols to a variety of hardware device drivers. Its calls work on a packet-by-packet basis, so it is not necessary to inspect the packet content or keep protocol-specific state information at this level.

Packets are then handed over to the actual network protocol functionality in the kernel. The TCP/IP protocol suite is known in the Linux kernel as inet. It is a whole suite of protocols, the best known of which are the Internet Protocol (IP), TCP, and UDP.

The network protocols interface with a protocol-agnostic layer that provides a set of common functions to support a variety of different protocols. This layer is called the sockets layer, and it supports not only the common TCP and UDP transport protocols but also the IP routing protocol, various Ethernet protocols, and others.

The socket interface is an abstraction for the network connection. The socket data structure contains all of the required state of a particular socket, including the particular protocol used by the socket and the operations that may be performed on it.
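As a small illustration, part of this per-socket state is visible from user space. The sketch below uses Python's standard `socket` module, which wraps the POSIX socket calls; it shows only the user-space view of a socket, not the kernel's internal socket structure, and the helper name `describe` is made up for this example.

```python
import socket

def describe(sock: socket.socket) -> dict:
    """Return the protocol-related state recorded on a socket object."""
    return {
        "family": sock.family.name,  # address family, e.g. AF_INET for IPv4
        "type": sock.type.name,      # SOCK_STREAM (TCP) or SOCK_DGRAM (UDP)
        "proto": sock.proto,         # 0 means "default protocol for this type"
    }

# Two sockets in the same address family but with different transport types.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp, \
     socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
    print(describe(tcp))
    print(describe(udp))
```

The same socket object also carries the operations that are valid for it, such as `connect()`, `sendall()` and `recv()` for a stream socket.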

Diagram

Sockets

Socket Buffers

A consequence of having many layers of network protocols, each one using the services of another, is that each protocol needs to add protocol headers (and/or footers) to the data as it is transmitted and to remove them as packets are received. This could make passing data buffers between the protocol layers difficult as each layer would need to find where its particular protocol headers and footers are located within the buffer. Copying buffers between layers would, of course, work, but it would be very inefficient.

Instead, the Linux kernel uses socket buffers (a.k.a. sk_buffs) to pass data between the protocol layers and the network device drivers. Socket buffers contain pointer and length fields that allow each protocol layer to manipulate the application data via standard functions.

Essentially, an sk_buff combines a control structure with a block of memory. It comes with routines to manipulate doubly linked lists of sk_buffs and functions for controlling the attached memory.
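The idea of adding and removing headers by moving offsets, rather than copying data, can be sketched with a toy model. This is not the kernel's data structure: `ToySkBuff` and its fields are hypothetical names chosen for illustration, and its `push` and `pull` methods only loosely mirror the kernel's `skb_push()` and `skb_pull()` helpers.

```python
class ToySkBuff:
    """Toy model of a socket buffer: one memory block plus data offsets."""

    def __init__(self, payload: bytes, headroom: int = 64):
        # Reserve empty space in front of the payload for headers added later.
        self.buf = bytearray(headroom) + bytearray(payload)
        self.data = headroom        # offset of the first valid byte
        self.tail = len(self.buf)   # offset one past the last valid byte

    def push(self, header: bytes) -> None:
        """Prepend a header by moving the data offset back (transmit path)."""
        if len(header) > self.data:
            raise ValueError("not enough headroom")
        self.data -= len(header)
        self.buf[self.data:self.data + len(header)] = header

    def pull(self, length: int) -> bytes:
        """Strip a header by moving the data offset forward (receive path)."""
        if length > self.tail - self.data:
            raise ValueError("pull past end of data")
        header = bytes(self.buf[self.data:self.data + length])
        self.data += length
        return header

    def view(self) -> bytes:
        """Return the bytes currently considered valid."""
        return bytes(self.buf[self.data:self.tail])


skb = ToySkBuff(b"hello")
skb.push(b"TCP|")      # transport layer adds its header
skb.push(b"IP|")       # network layer adds its header in front
print(skb.view())      # b'IP|TCP|hello'
print(skb.pull(3))     # b'IP|' (receiver strips the outer header)
print(skb.view())      # b'TCP|hello'
```

The payload bytes are written once and never moved; each layer only adjusts an offset and writes its own header into the reserved headroom.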

POSIX Socket Interface Library

If we focus on IPv4 TCP packets: All incoming IP network layer packets marked with the relevant TCP identifier in the IP protocol ID header field are passed upwards to TCP, and all outgoing TCP packets are passed down to the IP layer for sending. In turn, TCP is responsible for identifying the (16-bit) port number from the TCP packet header and forwarding the TCP packet payload to any active socket associated with the specified port number.
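The two demultiplexing steps described above (protocol ID in the IP header, then port number in the TCP header) can be sketched in user space. This is a simplified illustration rather than how the kernel does it: the function names are hypothetical, checksums are left at zero, and only the fixed parts of the headers are looked at.

```python
import struct

IPPROTO_TCP = 6  # IANA-assigned protocol number for TCP

def tcp_dest_port(packet: bytes):
    """Return the TCP destination port of an IPv4 packet, or None if not TCP."""
    if len(packet) < 20:
        return None
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:           # not an IPv4 packet
        return None
    ihl = (version_ihl & 0x0F) * 4      # IPv4 header length in bytes
    if packet[9] != IPPROTO_TCP:        # protocol field is at byte offset 9
        return None
    if len(packet) < ihl + 4:
        return None
    # The TCP header starts right after the IP header:
    # 16-bit source port, then 16-bit destination port, in network byte order.
    _src_port, dst_port = struct.unpack_from("!HH", packet, ihl)
    return dst_port

def make_packet(proto: int, dst_port: int) -> bytes:
    """Build a minimal IPv4 + TCP-shaped packet for testing (checksums are zero)."""
    loopback = bytes([127, 0, 0, 1])
    ip = struct.pack("!BBHHHBBH4s4s",
                     0x45, 0, 40, 0, 0, 64, proto, 0, loopback, loopback)
    tcp = struct.pack("!HHIIBBHHH",
                      40000, dst_port, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
    return ip + tcp

print(tcp_dest_port(make_packet(IPPROTO_TCP, 8080)))  # 8080
print(tcp_dest_port(make_packet(17, 53)))             # None (protocol 17 is UDP)
```

A real TCP implementation would use the full tuple of addresses and ports to find the matching socket; the port lookup here only shows where the 16-bit field sits.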

TCP is reliable and connection-oriented and as such employs various handshaking activities in the background between the TCP layers in the communicating nodes to handle the setup, reliability control and shutdown of the TCP connection.

The socket API provides a simplified programming model for the TCP-to-application interface, and connected stream sockets can be considered the communication endpoints of a virtual data circuit between two processes.

To establish a socket connection, one of the communicating processes (the server) needs to be waiting for a connection on a listening socket. The other process (the client) can then request a connection, and if the request succeeds the connection is made. The sequence of API calls is shown in the picture below.
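The call sequence can also be sketched in code. The example below uses Python's `socket` module, whose methods map onto the POSIX calls of the same names. Running both ends in one process over the loopback address, with a thread for the server, is a choice made for this sketch, and the function names are hypothetical.

```python
import socket
import threading

def read_all(sock: socket.socket) -> bytes:
    """Read until the peer signals end-of-stream."""
    chunks = []
    while True:
        chunk = sock.recv(1024)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)

def echo_once(listener: socket.socket) -> None:
    """Server side: accept() one connection and echo back what arrives."""
    conn, _peer = listener.accept()
    with conn:
        conn.sendall(read_all(conn))

def round_trip(message: bytes) -> bytes:
    # Server: socket() -> bind() -> listen(), then accept() in a thread.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))       # port 0: let the OS pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=echo_once, args=(listener,), daemon=True)
        server.start()
        # Client: socket() -> connect(), then send and receive.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))
            client.sendall(message)
            client.shutdown(socket.SHUT_WR)   # tell the server we are done sending
            reply = read_all(client)
        server.join()
    return reply

print(round_trip(b"hello"))
```

Note that `accept()` returns a new connected socket for the conversation, while the listening socket stays available for further connection requests.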

The read() and write() low-level I/O library functions are not part of the standard socket library; however, stream sockets behave in much the same manner as any other operating system device (standard input/output, file, etc.), and low-level system device I/O operations are therefore compatible with stream socket I/O.
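A minimal sketch of this compatibility: Python's `os.read()` and `os.write()` are thin wrappers over the read() and write() system calls and accept a socket's file descriptor directly. The example assumes a Unix-like system and uses `socket.socketpair()`, which gives a connected pair of Unix-domain stream sockets, as a stand-in for a TCP connection; the function name is hypothetical.

```python
import os
import socket

def fd_round_trip(message: bytes) -> bytes:
    """Send bytes through a stream socket using plain file-descriptor I/O."""
    a, b = socket.socketpair()   # connected pair of stream sockets
    with a, b:
        # fileno() exposes the underlying descriptor, usable like any other fd.
        os.write(a.fileno(), message)
        # For a short message this returns everything; in general, read()
        # on a stream socket may return fewer bytes than requested.
        return os.read(b.fileno(), len(message))

print(fd_round_trip(b"ping"))
```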

A read from a stream socket (using the read() or recv() functions) may not return all of the expected bytes in one go. The read operation may need to be repeated an unspecified number of times, with the results concatenated, until the full number of expected bytes has been received. If the expected number of bytes is not known in advance, the stream should be read a small block of bytes (possibly 1 byte) at a time until the receive count is identified, using either a data size field within the received data or a predefined data terminator sequence. It is up to the individual internet application to define any data size field syntax and/or data terminators used. Attempting to read more data than has been sent will block the read() or recv() call, which will hang waiting for new data.
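A common way to handle this is a helper that loops until exactly the requested number of bytes has arrived, combined with a size field in front of each message. In the sketch below, the 4-byte big-endian length prefix is an assumed application-level convention, not something TCP or the socket API defines, and the function names are hypothetical.

```python
import socket
import struct

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Keep calling recv() until exactly n bytes have been collected."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))   # may return fewer bytes than asked for
        if not chunk:                     # empty result: the peer closed the stream
            raise ConnectionError("peer closed before all bytes arrived")
        buf += chunk
    return bytes(buf)

def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte length field followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_msg(sock: socket.socket) -> bytes:
    """Read the length field first, then exactly that many payload bytes."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)

a, b = socket.socketpair()   # connected stream sockets, used here for the demo
with a, b:
    send_msg(a, b"hello world")
    print(recv_msg(b))       # b'hello world'
```

Because the receiver never asks for more bytes than the sender announced, it does not block waiting for data that will never come.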
