Resource Control & Isolation in Linux

Linux offers a robust suite of resource control and isolation mechanisms to manage and limit how processes and users utilize system resources. These mechanisms ensure fair resource allocation, prevent resource starvation, and enhance security by isolating processes from each other. Here's a breakdown of some key mechanisms:

Control Groups (cgroups)

Cgroups organize processes into groups and allocate resources to those groups. They are the foundation of common containerization technologies such as Docker. Their main function is to limit and monitor resource usage (CPU, memory, I/O, network) for groups of processes, and to provide resource accounting and prioritization.

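# Limiting memory usage for a process group (cgroups v2)

# Make sure the memory controller is enabled for child cgroups
echo "+memory" | sudo tee /sys/fs/cgroup/cgroup.subtree_control

# Create a cgroup (v2 has one unified hierarchy, so there is no memory/ subdirectory)
sudo mkdir /sys/fs/cgroup/my_group

# Set a memory limit of 200MB. "sudo echo ... > file" fails because the
# redirection is done by the unprivileged shell, so pipe into "sudo tee" instead.
echo 200M | sudo tee /sys/fs/cgroup/my_group/memory.max

# Get the PID of the process you want to limit (-o picks the oldest match)
PID=$(pgrep -o my_process)

# Add the process to the cgroup
echo "$PID" | sudo tee /sys/fs/cgroup/my_group/cgroup.procs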
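
To check which cgroup a process actually belongs to, /proc/<pid>/cgroup can be read. The following is a minimal sketch, assuming a cgroups v2 system; the helper names cgroup_v2_path and cgroup_of_pid are made up for illustration.

```shell
#!/bin/sh
# Print the cgroup v2 path from /proc/<pid>/cgroup-style input on stdin.
# Each line has the form "hierarchy-id:controller-list:path"; the v2 entry
# has hierarchy id 0 and an empty controller list, i.e. "0::/path".
cgroup_v2_path() {
    sed -n 's/^0:://p'
}

# Hypothetical helper: report the cgroup v2 path of a given PID.
cgroup_of_pid() {
    cgroup_v2_path < "/proc/$1/cgroup"
}

# Example: show the cgroup of the current shell, if /proc is available.
if [ -r /proc/self/cgroup ]; then
    cgroup_of_pid "$$"
fi
```

For a process that was added to my_group, the printed path should end in /my_group. On a system that only uses cgroups v1, nothing is printed.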

Namespaces

Namespaces isolate processes from each other by giving each group of processes its own instance of a system resource. They are often used together with cgroups to build containers.

# pid
Process ID isolation. Each PID namespace has its own PID numbering, so the same PID can refer to different processes in different namespaces.

# net
Network isolation. Each namespace has its own network interfaces, routing tables, etc.

# mnt
Mount point isolation. Allows different namespaces to have different filesystem views.

# ipc
Inter-process communication isolation. Shared memory and semaphores are namespaced.

# uts
UTS (Unix Timesharing System) isolation. Hostname and domain name are per-namespace.

# user
User and group ID isolation. A process can have different user IDs inside and outside a user namespace. This is crucial for container security, allowing processes to run as root inside the container without having root privileges on the host.
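
Each process exposes its namespaces as symbolic links under /proc/<pid>/ns, and two processes share a namespace when the link targets are identical. The sketch below compares them; the function names ns_id and same_ns are illustrative, and reading another process's links may require sufficient privileges.

```shell
#!/bin/sh
# Print the namespace identifier of a process, a string of the form
# "type:[inode]". Arguments: PID, namespace type (pid, net, mnt, ipc, uts, user).
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# Succeed if two processes share the namespace of the given type.
# Arguments: first PID, second PID, namespace type.
# Returns 2 if either link cannot be read.
same_ns() {
    a=$(ns_id "$1" "$3") || return 2
    b=$(ns_id "$2" "$3") || return 2
    [ "$a" = "$b" ]
}

# Example: compare the UTS namespace of this shell with that of its parent.
if same_ns "$$" "$PPID" uts; then
    echo "shell and parent share a UTS namespace"
else
    echo "different UTS namespaces (or links not readable)"
fi
```

A process started inside a container would typically show different identifiers from a process on the host for the namespace types the container runtime unshares.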

Resource Limits (ulimit/prlimit)

Resource limits control resource usage for individual users and processes. They are often used to cap things like core file size, data segment size, CPU time, and the number of open files.

# Limit the maximum number of open files for the current shell and its children
ulimit -n 2048

# Limit the virtual address space (memory) of a specific process.
# (--rss is accepted, but RLIMIT_RSS is not enforced by current Linux kernels.)
prlimit --pid <PID> --as=500000000 # 500MB
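
Every limit has a soft value, which is enforced, and a hard value, which is the ceiling up to which an unprivileged process may raise the soft value. The sketch below lowers the soft open-file limit for a single command only; with_nofile_limit is a hypothetical helper name, and it assumes a shell whose ulimit builtin supports -n, -S and -H (bash and dash do).

```shell
#!/bin/sh
# Run a command with a different soft limit on open files. The limit is
# changed inside a subshell, so the calling shell keeps its own limits.
# Arguments: new soft limit, then the command to run.
with_nofile_limit() {
    limit=$1
    shift
    ( ulimit -S -n "$limit" && exec "$@" )
}

# Show the current soft and hard limits of this shell.
echo "soft: $(ulimit -S -n)  hard: $(ulimit -H -n)"

# The child sees the lowered soft limit...
with_nofile_limit 64 sh -c 'echo "inside: $(ulimit -S -n)"'

# ...while the calling shell is unchanged.
echo "after: $(ulimit -S -n)"
```

Running the change in a subshell is a design choice: a soft limit lowered in the interactive shell would otherwise apply to everything started from it afterwards.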

Secure Computing Mode (seccomp)

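Seccomp restricts the system calls a process can make, which enhances security by reducing the kernel attack surface. What happens when a process attempts a disallowed system call depends on the filter's action: the process can be killed, or the call can fail with an error. The example below uses the libseccomp library and kills the process.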

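// Build with: gcc demo.c -o demo -lseccomp (needs the libseccomp development package)
#include <seccomp.h>
#include <unistd.h>

int main(void) {
    // Set up a very restrictive seccomp filter (allow only read, write, exit)
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL); // Default action: kill process
    if (ctx == NULL)
        return 1;

    if (seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0) < 0 ||
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0) < 0 ||
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit), 0) < 0 ||
        // Returning from main() ends the process with exit_group, not exit
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0) < 0 ||
        seccomp_load(ctx) < 0) {
        seccomp_release(ctx);
        return 1;
    }

    // Use write() directly: printf() may issue extra system calls for
    // buffering (e.g. fstat, mmap) that this filter does not allow
    static const char msg[] = "Hello, restricted world!\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1); // This will work
    // Any other system call will terminate the process

    return 0;
}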
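
Whether a running process has a seccomp filter installed can be read from the Seccomp: field of /proc/<pid>/status. A small sketch follows; the function name seccomp_mode is an assumption for illustration, and the field is only present on kernels built with seccomp support.

```shell
#!/bin/sh
# Translate the "Seccomp:" field of /proc/<pid>/status (read from stdin)
# into a name: 0 = disabled, 1 = strict mode, 2 = filter mode.
seccomp_mode() {
    awk '$1 == "Seccomp:" {
        if ($2 == 0) print "disabled"
        else if ($2 == 1) print "strict"
        else if ($2 == 2) print "filter"
        else print "unknown"
    }'
}

# Example: report the seccomp mode of the current shell.
if [ -r /proc/self/status ]; then
    seccomp_mode < /proc/self/status
fi
```

A process that has loaded a filter like the one above would report filter; inside many container runtimes, ordinary processes do too, because the runtime installs a default filter.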

Linux Security Modules (LSM - Security Enhancement & Process Isolation)

LSM is a framework for implementing mandatory access control (MAC) policies. SELinux and AppArmor are both built on it. These modules restrict access to resources based on predefined security policies.

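# /etc/apparmor.d/usr.bin.my_program (AppArmor profile)
#include <tunables/global>

/usr/bin/my_program {
  # Basic access most programs need (shared libraries, locale data, etc.)
  #include <abstractions/base>

  # Allow reading and writing to /tmp/my_data
  /tmp/my_data rw,

  # Everything not listed is denied by default. Do not add "deny /** rwm,":
  # deny rules take precedence over allow rules and would block /tmp/my_data too.
}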


# Load the profile
sudo apparmor_parser -r /etc/apparmor.d/usr.bin.my_program

# Check the status
sudo aa-status
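
Which security modules are active on a system is listed, comma-separated, in /sys/kernel/security/lsm. The sketch below checks that list for a given module; lsm_active is a hypothetical helper name, and the file is only available when securityfs is mounted.

```shell
#!/bin/sh
# Succeed if the named module appears in a comma-separated LSM list on stdin,
# the format used by /sys/kernel/security/lsm (e.g. "capability,yama,apparmor").
lsm_active() {
    tr ',' '\n' | grep -qxF -- "$1"
}

# Example: report whether SELinux and AppArmor are active on this system.
if [ -r /sys/kernel/security/lsm ]; then
    for m in selinux apparmor; do
        if lsm_active "$m" < /sys/kernel/security/lsm; then
            echo "$m: active"
        else
            echo "$m: not active"
        fi
    done
fi
```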

Kernel Space & User Space

The difference between user space and kernel space is a fundamental concept in operating system architecture, including Linux. It defines a separation of memory and execution privileges for the sake of stability and security.

User Space
  1. Where User Programs Run: This is the memory region where regular applications, user processes, and system libraries (like glibc) execute.
  2. Restricted Privileges: Processes running in user space have limited access to system hardware and resources. They cannot directly control devices such as disk drives or network cards, and they cannot use privileged CPU instructions or touch physical memory outside their own address space. Instead, they make system calls to ask the kernel to perform these privileged operations on their behalf.
  3. Memory Protection: Each user-space process operates within its own virtual memory space, protected from other processes. A bug in one user-space program is less likely to affect other processes or the stability of the kernel.
  4. Portability: User-space code is more portable because it doesn't directly interact with hardware.

Kernel Space
  1. The Core of the OS: This is where the kernel resides and executes. The kernel is the heart of the operating system, responsible for managing the system's resources.
  2. Unrestricted Privileges: The kernel has complete and unrestricted access to the system's hardware and resources. It can directly manipulate memory, control the CPU, and access peripheral devices.
  3. System Call Interface: User-space processes interact with the kernel through system calls. A system call is a request from a user-space process to the kernel to perform a specific action, such as reading from a file, allocating memory, or sending network packets.
  4. Stability and Security: The kernel is responsible for ensuring the stability and security of the system. By isolating user programs from direct hardware access, the kernel prevents them from interfering with each other or causing system crashes.

Key Differences
  1. Privileges: User space has limited access to system resources; kernel space has full access.
  2. Memory Access: User space processes have protected virtual memory; the kernel has direct access to all physical memory.
  3. Hardware Interaction: User space interacts with hardware indirectly via system calls; the kernel interacts directly.
  4. Stability: User space errors are isolated; kernel space errors can crash the system.
  5. Portability: User space code is more portable; kernel code is hardware-dependent.
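
The split is visible in process accounting: the kernel tracks separately how much CPU time a process spent running its own code in user space and how much the kernel spent working on its behalf, for example while servicing system calls. A rough sketch that reads both counters from /proc/<pid>/stat is shown below; cpu_ticks is an illustrative name, and the values are in clock ticks.

```shell
#!/bin/sh
# Print the CPU time a process has spent in user mode and in kernel mode,
# in clock ticks. Argument: PID.
cpu_ticks() {
    # Drop the leading "pid (comm) " so that spaces or parentheses in the
    # command name cannot shift the fields. After that, utime and stime
    # (fields 14 and 15 of the full line) are fields 12 and 13.
    sed 's/^.*) //' "/proc/$1/stat" |
        awk '{ print "user=" $12, "kernel=" $13 }'
}

# Example: show the counters for the current shell.
cpu_ticks "$$"
```

A process that mostly computes accumulates user time, while one that mostly performs I/O through system calls accumulates kernel time.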