Linux Performance Tuning: Proven Techniques Every SRE Must Master

Introduction

Default Linux settings are chosen to be safe for general-purpose use, not tuned for any particular high-load workload. Site Reliability Engineers (SREs) who keep busy systems fast and stable therefore need to know which kernel and system settings matter and what each one actually changes. This guide walks through practical tuning techniques, from memory and networking to I/O and CPU, and shows how to apply them consistently. Treat every value as a starting point: measure before and after each change, and remember that settings applied with sysctl -w or by writing to /sys are lost at reboot unless you persist them.

Step-by-Step Linux Optimization Guide

Step 1: Adjust Swappiness for Optimal Memory Management

Check current swappiness:

cat /proc/sys/vm/swappiness

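Swappiness controls how readily the kernel swaps out anonymous memory instead of dropping page cache (the default is 60). A low value such as 10 is a common starting point for latency-sensitive servers:

sudo sysctl -w vm.swappiness=10

Like every sysctl -w change in this guide, this is lost at reboot. To persist it, add the line vm.swappiness = 10 to a file under /etc/sysctl.d/ and reload with:

sudo sysctl --system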
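Because sysctl -w changes vanish at reboot, it helps to record each value in a drop-in file as you apply it. The helper below is a minimal sketch, assuming bash and awk; the default file name 99-performance.conf is a hypothetical choice, and the function replaces any earlier entry for the same key so it can be rerun safely.

```shell
#!/usr/bin/env bash
# Record a sysctl setting in a drop-in file so it is re-applied at boot.
# The default file name 99-performance.conf is a hypothetical choice; any
# *.conf file under /etc/sysctl.d/ is read by systemd-sysctl and by
# `sysctl --system`. Pass a third argument to write somewhere else.
persist_sysctl() {
  local key="$1" value="$2"
  local file="${3:-/etc/sysctl.d/99-performance.conf}"
  touch "$file" || return 1
  # Keep every line whose key differs from $key (comments included), then
  # append the new value, so repeated runs never leave duplicate entries.
  awk -v k="$key" '{
    line = $0
    sub(/[ \t]*=.*/, "", line)
    sub(/^[ \t]+/, "", line)
    if (line != k) print
  }' "$file" > "$file.tmp" || return 1
  printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
  mv "$file.tmp" "$file"
}
```

In a root shell, source it and run persist_sysctl vm.swappiness 10, then run sysctl --system to reload every drop-in.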

Step 2: Increase File Descriptor Limits

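Check the current soft and hard limits for your shell:

ulimit -Sn
ulimit -Hn

Raise both limits for all users:

printf '%s\n' '* soft nofile 65535' '* hard nofile 65535' | sudo tee -a /etc/security/limits.conf

pam_limits applies these at login, so they affect new sessions only. Wildcard entries do not apply to root, and services started by systemd ignore limits.conf entirely; set LimitNOFILE= in the service unit instead.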
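For a service managed by systemd, the open-file limit comes from LimitNOFILE= in its unit rather than from limits.conf. A minimal sketch that writes a drop-in override follows; the drop-in name nofile.conf is an arbitrary choice, and the optional third argument exists only to allow testing outside /etc.

```shell
#!/usr/bin/env bash
# Write a systemd drop-in that sets LimitNOFILE= for one unit.
# systemd reads every *.conf file in /etc/systemd/system/<unit>.d/, so the
# name nofile.conf is arbitrary. The third argument overrides the root
# directory (useful for testing); it defaults to /etc/systemd/system.
write_nofile_dropin() {
  local unit="$1" limit="$2"
  local root="${3:-/etc/systemd/system}"
  local dir="$root/${unit}.d"
  mkdir -p "$dir" || return 1
  printf '[Service]\nLimitNOFILE=%s\n' "$limit" > "$dir/nofile.conf"
  echo "$dir/nofile.conf"   # print the path that was written
}
```

For a hypothetical unit such as nginx.service, run write_nofile_dropin nginx.service 65535 as root, then systemctl daemon-reload and restart the unit; systemctl show -p LimitNOFILE nginx.service should report the new value.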

Step 3: Resource Isolation with cgroups

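On hosts using cgroup v1 with the libcgroup tools installed (cgroup-tools on Debian/Ubuntu), create a memory cgroup and cap it at 1 GiB:

sudo cgcreate -g memory:/critical_service
echo $((1024*1024*1024)) | sudo tee /sys/fs/cgroup/memory/critical_service/memory.limit_in_bytes

Most current distributions mount the unified cgroup v2 hierarchy instead, where the equivalent limit file is memory.max:

sudo mkdir /sys/fs/cgroup/critical_service
echo $((1024*1024*1024)) | sudo tee /sys/fs/cgroup/critical_service/memory.max

A cgroup only constrains processes placed in it, for example with sudo cgexec -g memory:critical_service <command> on v1, or by writing a PID to cgroup.procs on v2.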
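Which commands apply depends on the cgroup hierarchy the host mounts. The sketch below, assuming bash, GNU stat, and a root shell, detects the version and performs the cgroup v2 equivalent of the steps above. The CGROUP_ROOT override is a hypothetical testing hook, and on v2 the memory controller must be enabled in the parent's cgroup.subtree_control (systemd normally does this for the root group).

```shell
#!/usr/bin/env bash
# Report the cgroup hierarchy mounted at /sys/fs/cgroup (GNU stat).
# "cgroup2fs" is the unified v2 hierarchy; "tmpfs" means v1 or hybrid,
# with one subdirectory per controller (memory/, cpu/, ...).
cgroup_version() {
  case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

# cgroup v2 version of the steps above: create a child group, cap its
# memory through memory.max, and optionally move one PID into it.
# CGROUP_ROOT is a hypothetical override used for testing.
create_mem_cgroup_v2() {
  local name="$1" limit_bytes="$2" pid="${3:-}"
  local dir="${CGROUP_ROOT:-/sys/fs/cgroup}/$name"
  mkdir -p "$dir" || return 1
  echo "$limit_bytes" > "$dir/memory.max" || return 1
  if [ -n "$pid" ]; then
    echo "$pid" > "$dir/cgroup.procs"
  fi
}
```

For a process that systemd can launch directly, systemd-run --scope -p MemoryMax=1G <command> achieves the same limit without managing cgroup directories by hand.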

Step 4: Networking Optimization

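Let the kernel reuse TIME-WAIT sockets for new outgoing connections (this helps hosts that open many client connections, such as proxies) and raise the cap on listen backlogs:

sudo sysctl -w net.ipv4.tcp_tw_reuse=1
sudo sysctl -w net.core.somaxconn=4096

Check the current somaxconn first: kernels 5.4 and later already default to 4096, so a smaller value such as 1024 would lower it. The application must also request a large backlog in its listen() call, because the kernel uses the smaller of the two.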
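Because several defaults have grown across kernel releases, a tuning step should never lower a value by accident. This minimal sketch, assuming bash and single-integer sysctls, prints a sysctl -w command only when the proposed value is higher than the current one; PROC_SYS is a hypothetical override for testing.

```shell
#!/usr/bin/env bash
# Print a `sysctl -w` command only if it would raise the current value.
# Works for single-integer sysctls only (not tcp_rmem-style triples).
# PROC_SYS is a hypothetical override for testing; it defaults to /proc/sys.
raise_only() {
  local key="$1" want="$2" cur
  local path="${PROC_SYS:-/proc/sys}/${key//.//}"   # net.core.x -> net/core/x
  if ! cur=$(cat "$path" 2>/dev/null); then
    echo "skip $key: $path not found" >&2
    return 1
  fi
  if [ "$want" -gt "$cur" ]; then
    echo "sysctl -w $key=$want"
  else
    echo "keep $key=$cur (already >= $want)" >&2
  fi
}
```

On a kernel that already defaults to 4096, raise_only net.core.somaxconn 4096 prints no command, only a "keep" note on stderr.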

Step 5: Select Appropriate I/O Scheduler

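Check the current scheduler (replace sda with your device, for example nvme0n1); the active one is shown in brackets:

cat /sys/block/sda/queue/scheduler

On multi-queue (blk-mq) kernels, the only block layer since 5.0, the deadline scheduler is named mq-deadline; fast NVMe drives often perform best with none:

echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler

This setting does not survive a reboot; use a udev rule (see Step 18) to make it permanent.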
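To review every disk at once, the read-only sketch below parses the bracketed active scheduler and pairs it with the rotational flag. The function names are illustrative, and it assumes bash for the here-string.

```shell
#!/usr/bin/env bash
# Extract the active scheduler: the scheduler file lists every available
# scheduler and brackets the active one, e.g. "[mq-deadline] kyber bfq none".
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p' <<< "$1"
}

# Print device, active scheduler, and rotational flag (1 = spinning disk,
# 0 = SSD/NVMe) for every block device. Read-only; needs no root.
report_schedulers() {
  local q dev
  for q in /sys/block/*/queue/scheduler; do
    [ -r "$q" ] || continue
    dev=${q#/sys/block/}
    dev=${dev%%/*}
    printf '%-10s %-12s rotational=%s\n' "$dev" \
      "$(active_scheduler "$(cat "$q")")" \
      "$(cat "/sys/block/$dev/queue/rotational" 2>/dev/null)"
  done
}
```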

Step 6: Real-time Diagnostics with perf

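Sample where CPU time is spent across the whole system, in both kernel and user-space functions, refreshed live. perf is packaged as perf on RHEL-family systems and as linux-tools-$(uname -r) on Ubuntu:

sudo perf top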
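perf top is interactive; for a record you can share or compare later, perf record and perf report capture the same kind of data. The sketch assumes bash; the helper reads kernel.perf_event_paranoid to tell whether system-wide sampling is allowed without root, and the 30-second window and perf.data file name are illustrative defaults.

```shell
#!/usr/bin/env bash
# kernel.perf_event_paranoid: -1 or 0 lets unprivileged users sample all
# CPUs (perf top, perf record -a); 1 limits them to their own processes,
# and 2 or higher also hides kernel code. Root or CAP_PERFMON bypasses it.
paranoid_allows_systemwide() {
  local level="${1:-$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null)}"
  [ "${level:-2}" -le 0 ]
}

# Record system-wide samples with call graphs at 99 Hz for a fixed window,
# then print the top of the report. Defaults are illustrative.
profile_snapshot() {
  local seconds="${1:-30}" out="${2:-perf.data}"
  perf record -F 99 -a -g -o "$out" -- sleep "$seconds" &&
    perf report -i "$out" --stdio | head -n 60
}
```

Run sudo bash -c 'source ./profile.sh && profile_snapshot 30' (script name hypothetical), or call profile_snapshot directly when paranoid_allows_systemwide succeeds.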

Step 7: Disable Transparent Huge Pages (THP)

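Check THP status; the active mode is shown in brackets:

cat /sys/kernel/mm/transparent_hugepage/enabled

Some databases and latency-sensitive services recommend disabling THP, because khugepaged and on-demand compaction can cause latency spikes and memory bloat. Disable it for the running system:

echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

To make this permanent, add transparent_hugepage=never to the kernel command line. The madvise mode is a middle ground that uses huge pages only where applications request them.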
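Before and after disabling THP, it is worth checking how much memory it actually backs. This read-only sketch, assuming bash and awk, reads the AnonHugePages line from /proc/meminfo; the optional file argument is only a testing hook.

```shell
#!/usr/bin/env bash
# Anonymous memory currently backed by transparent huge pages, in kB.
thp_usage_kib() {
  awk '/^AnonHugePages:/ {print $2}' "${1:-/proc/meminfo}"
}

# Show both THP knobs (the bracketed word is the active mode) and usage.
thp_status() {
  local f
  for f in enabled defrag; do
    printf '%s: %s\n' "$f" \
      "$(cat "/sys/kernel/mm/transparent_hugepage/$f" 2>/dev/null)"
  done
  printf 'AnonHugePages: %s kB\n' "$(thp_usage_kib)"
}
```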

Step 8: Enable HugePages

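Static HugePages are separate from THP: they are reserved up front and used only by applications configured for them (for example through hugetlbfs or a database's huge-page setting). With the common 2 MiB page size, 1024 pages reserves 2 GiB that other processes can no longer use:

sudo sysctl -w vm.nr_hugepages=1024

Reserve them early (at boot, through /etc/sysctl.d/), because fragmented memory can prevent the kernel from allocating the full count later.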
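Sizing the reservation is simple arithmetic: divide the memory an application needs by the huge page size and round up. The sketch assumes bash and reads Hugepagesize from /proc/meminfo when no page size is given.

```shell
#!/usr/bin/env bash
# Number of static huge pages needed to back MIB mebibytes, rounded up.
# The page size (kB) defaults to Hugepagesize in /proc/meminfo, commonly
# 2048 kB on x86-64, and falls back to 2048 if it cannot be read.
hugepages_for_mib() {
  local mib="$1"
  local page_kb="${2:-$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null)}"
  page_kb="${page_kb:-2048}"
  echo $(( (mib * 1024 + page_kb - 1) / page_kb ))
}

# Show configured, free, and reserved huge pages plus the page size.
hugepage_status() {
  grep -E '^(HugePages_(Total|Free|Rsvd)|Hugepagesize):' /proc/meminfo
}
```

hugepages_for_mib 2048 2048 prints 1024, matching the 2 GiB reservation above; after applying the setting, HugePages_Total from hugepage_status shows whether the kernel managed to allocate them all.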

Step 9: Tweak Cache Behavior

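Start background writeback earlier and throttle writers sooner. vm.dirty_background_ratio is the percentage of available memory that may hold dirty pages before background writeback begins; vm.dirty_ratio is the point at which processes generating writes must flush data themselves: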

sudo sysctl -w vm.dirty_ratio=15
sudo sysctl -w vm.dirty_background_ratio=5
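The ratios are easier to judge as absolute sizes. This sketch, assuming bash and awk, converts them to MiB; treating MemAvailable as the kernel's "dirtyable" memory is an approximation, since the kernel computes that figure from free and reclaimable pages itself.

```shell
#!/usr/bin/env bash
# Convert dirty ratios into approximate MiB thresholds for a given amount
# of available memory (MemAvailable used as a stand-in for the kernel's
# dirtyable memory, which is free plus reclaimable pages).
dirty_thresholds_mib() {
  local avail_mib="$1" bg_ratio="$2" ratio="$3"
  echo "background_writeback_starts_at=$(( avail_mib * bg_ratio / 100 ))MiB"
  echo "writers_throttled_at=$(( avail_mib * ratio / 100 ))MiB"
}

# MemAvailable from /proc/meminfo, converted from kB to MiB.
mem_available_mib() {
  awk '/^MemAvailable:/ {print int($2 / 1024)}' /proc/meminfo
}
```

dirty_thresholds_mib "$(mem_available_mib)" 5 15 reports the thresholds for the values above. On hosts with a lot of RAM, even 5% can mean several GiB of dirty data; vm.dirty_background_bytes and vm.dirty_bytes set absolute limits instead, and writing either one zeroes its ratio counterpart.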

Step 10: Optimize IRQ Balancing

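irqbalance spreads hardware interrupts across CPUs. Many distributions ship it enabled; install and start it if it is missing:

sudo apt-get install -y irqbalance
sudo systemctl enable --now irqbalance

If you pin interrupts to specific CPUs by hand (common for low-latency network workloads), disable irqbalance or exclude those IRQs, because it will otherwise move them.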
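To check whether interrupts are really being spread out, sum the per-CPU columns of /proc/interrupts. This read-only awk sketch does that; take two snapshots a few seconds apart, since the counters are cumulative since boot. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Total interrupts handled per CPU, from /proc/interrupts (or a file in the
# same format). The header line names the CPU columns; each following line
# has a label, then one count per CPU, then descriptive text.
irq_per_cpu() {
  awk 'NR == 1 { n = NF; next }
       {
         for (i = 2; i <= n + 1 && i <= NF; i++)
           if ($i ~ /^[0-9]+$/) sum[i - 2] += $i
       }
       END { for (c = 0; c < n; c++) printf "CPU%d %d\n", c, sum[c] }' \
    "${1:-/proc/interrupts}"
}
```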

Step 11: Network Throughput Optimization

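Raise the per-CPU queue of received packets waiting to be processed (default 1000) so that traffic bursts on fast NICs are not dropped: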

sudo sysctl -w net.core.netdev_max_backlog=5000
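Whether the backlog is too small shows up in /proc/net/softnet_stat, which has one row of hexadecimal counters per CPU. The sketch assumes bash (for 16# arithmetic) and the long-standing column layout, where column 2 counts backlog drops and column 3 counts time squeezes.

```shell
#!/usr/bin/env bash
# Sum the per-CPU counters in /proc/net/softnet_stat (hex values).
# Column 2: packets dropped because the backlog queue was full, which is
# what netdev_max_backlog sizes. Column 3: "time squeeze", polls that ran
# out of budget with packets still waiting.
softnet_summary() {
  local dropped_total=0 squeezed_total=0 processed dropped squeezed rest
  while read -r processed dropped squeezed rest; do
    [ -n "$squeezed" ] || continue
    dropped_total=$(( dropped_total + 16#$dropped ))
    squeezed_total=$(( squeezed_total + 16#$squeezed ))
  done < "${1:-/proc/net/softnet_stat}"
  echo "dropped=$dropped_total time_squeeze=$squeezed_total"
}
```

A dropped total that keeps growing between runs suggests raising netdev_max_backlog; a growing time_squeeze points instead at net.core.netdev_budget.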

Step 12: Manage TCP SYN Backlog

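Raise the queue of half-open connections (SYN received, handshake not yet complete). The default scales with system memory and may already be higher, so check it first:

sysctl net.ipv4.tcp_max_syn_backlog
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=2048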
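Listen-queue pressure is visible in the TcpExt counters of /proc/net/netstat (also shown by nstat and netstat -s). The sketch below extracts one counter by name, assuming awk and the usual layout of a TcpExt header line followed by a TcpExt value line.

```shell
#!/usr/bin/env bash
# Print one TcpExt counter from /proc/net/netstat (or a file in the same
# format). The first "TcpExt:" line holds names, the second holds values.
tcpext_counter() {
  awk -v want="$1" '
    $1 == "TcpExt:" && !seen { for (i = 2; i <= NF; i++) name[i] = $i; seen = 1; next }
    $1 == "TcpExt:" && seen  { for (i = 2; i <= NF; i++) if (name[i] == want) print $i; exit }
  ' "${2:-/proc/net/netstat}"
}
```

Watch tcpext_counter ListenDrops and tcpext_counter ListenOverflows over time; a rising SyncookiesSent means the SYN queue overflowed and the kernel fell back to SYN cookies.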

Step 13: TCP Connection Timeout

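Shorten how long orphaned connections wait in FIN-WAIT-2 (default 60 seconds). This does not shorten TIME-WAIT, which is fixed at 60 seconds in Linux: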

sudo sysctl -w net.ipv4.tcp_fin_timeout=15
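To see which states dominate before changing timeouts, count sockets by state from ss -tan. This sketch assumes awk and ss's default layout, with a header line and the state in the first column.

```shell
#!/usr/bin/env bash
# Count TCP sockets per state from `ss -tan` output on stdin, skipping the
# header line. Output is sorted by state name.
tcp_state_counts() {
  awk 'NR > 1 { count[$1]++ } END { for (s in count) print s, count[s] }' |
    LC_ALL=C sort
}
```

Run ss -tan | tcp_state_counts; a large FIN-WAIT-2 count is what tcp_fin_timeout addresses, while TIME-WAIT buildup on client-heavy hosts is what tcp_tw_reuse from Step 4 targets.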

Step 14: Optimize TCP Buffer Sizes

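Raise the maximum socket buffer sizes to 16 MiB for high-bandwidth or high-latency links. net.core.rmem_max and wmem_max cap what applications may request with setsockopt(); TCP autotuning is bounded by the third (maximum) value of tcp_rmem and tcp_wmem, so raise those too:

sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"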
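The 16 MiB cap is easiest to justify with the bandwidth-delay product, the amount of unacknowledged data a single connection needs in flight to fill a path. The sketch below assumes bash integer arithmetic and link speeds in Mbit/s.

```shell
#!/usr/bin/env bash
# Bandwidth-delay product in bytes:
# (Mbit/s * 1,000,000 / 8) * (RTT ms / 1000) = Mbit/s * RTT ms * 125
bdp_bytes() {
  local mbit="$1" rtt_ms="$2"
  echo $(( mbit * rtt_ms * 125 ))
}

# Report whether a buffer cap (default 16777216 bytes, 16 MiB) covers the BDP.
bdp_fits() {
  local need cap="${3:-16777216}"
  need=$(bdp_bytes "$1" "$2")
  if [ "$need" -le "$cap" ]; then
    echo "ok: need $need bytes, cap $cap"
  else
    echo "too small: need $need bytes, cap $cap"
  fi
}
```

A 1 Gbit/s path with 100 ms RTT needs 12,500,000 bytes, which fits under 16 MiB; a 10 Gbit/s path at 50 ms needs 62,500,000 bytes and would call for a larger maximum.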

Step 15: Apply tuned-adm Profiles

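Install tuned, start its daemon, apply a profile, and confirm which profile is active:

sudo apt-get install tuned
sudo systemctl enable --now tuned
sudo tuned-adm profile throughput-performance
sudo tuned-adm active

tuned applies its own sysctl values, which can override settings made by hand in earlier steps; put your overrides in a custom profile so the two do not conflict.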
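To keep tuned and your own sysctl values from fighting, you can wrap them in a custom profile that includes a stock one. The sketch, assuming bash, writes /etc/tuned/<name>/tuned.conf with a [main] include= line and a [sysctl] section; the profile name is hypothetical and TUNED_ROOT is only a testing hook.

```shell
#!/usr/bin/env bash
# Write a custom tuned profile that inherits a stock profile and adds
# sysctl overrides given as key=value arguments.
# Usage: write_tuned_profile NAME BASE_PROFILE key=value...
write_tuned_profile() {
  local name="$1" base="$2"
  shift 2
  local dir="${TUNED_ROOT:-/etc/tuned}/$name"
  mkdir -p "$dir" || return 1
  {
    printf '[main]\nsummary=%s plus local sysctl overrides\ninclude=%s\n\n' "$base" "$base"
    printf '[sysctl]\n'
    printf '%s\n' "$@"
  } > "$dir/tuned.conf"
}
```

As root: write_tuned_profile sre-example throughput-performance vm.swappiness=10 vm.dirty_ratio=15, then tuned-adm profile sre-example.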

Step 16: Scheduler Tunables

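Autogroup puts each session's tasks into its own scheduling group. That improves interactivity on desktops (for example, while a large build runs), but on servers it can distort CPU sharing between worker processes and makes nice values apply only within a session. Check whether it is enabled:

cat /proc/sys/kernel/sched_autogroup_enabled

Database and other server tuning guides commonly disable it:

sudo sysctl -w kernel.sched_autogroup_enabled=0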

Step 17: Implement zswap

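zswap compresses pages on their way to swap and keeps them in a RAM pool, reducing swap I/O. It is a module parameter, not a sysctl, so enable it through sysfs:

echo 1 | sudo tee /sys/module/zswap/parameters/enabled

To enable it at boot, add zswap.enabled=1 to the kernel command line. zswap only helps if a swap device is configured.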
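After enabling zswap, confirm the module parameters took effect. This read-only sketch, assuming bash, prints every parameter in /sys/module/zswap/parameters; the available names vary by kernel version, and the directory argument is a testing hook.

```shell
#!/usr/bin/env bash
# Print every zswap module parameter as name=value.
zswap_params() {
  local dir="${1:-/sys/module/zswap/parameters}" f
  if [ ! -d "$dir" ]; then
    echo "zswap not available" >&2
    return 1
  fi
  for f in "$dir"/*; do
    if [ -f "$f" ]; then
      printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    fi
  done
}
```

Check swapon --show as well, since zswap has no effect without an active swap device.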

Step 18: SSD Optimization with udev

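Create a udev rule that selects mq-deadline for non-rotational sd* disks (SATA/SAS SSDs) whenever they are added or changed:

echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"' | sudo tee /etc/udev/rules.d/60-ssd.rules

Apply it without rebooting:

sudo udevadm control --reload-rules
sudo udevadm trigger --subsystem-match=block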

Step 19: Kernel Samepage Merging (KSM)

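KSM merges identical memory pages, which mainly benefits hosts running many similar virtual machines. It only scans memory that applications register with madvise(MADV_MERGEABLE), as QEMU/KVM does, and the scanning costs CPU time. Enable it: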

echo 1 | sudo tee /sys/kernel/mm/ksm/run
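KSM reports its effect in /sys/kernel/mm/ksm. According to the kernel documentation, pages_sharing counts how many additional mappings share a merged page, so multiplying it by the page size estimates the memory saved. The sketch assumes bash and getconf for the page size.

```shell
#!/usr/bin/env bash
# Estimate memory saved by KSM in MiB: pages_sharing * page size.
# The directory and page size can be overridden (useful for testing).
ksm_saved_mib() {
  local dir="${1:-/sys/kernel/mm/ksm}"
  local page_size="${2:-$(getconf PAGESIZE)}"
  local sharing
  sharing=$(cat "$dir/pages_sharing") || return 1
  echo $(( sharing * page_size / 1024 / 1024 ))
}
```

If savings stay small, pages_to_scan and sleep_millisecs in the same directory control how aggressively KSM scans, at the cost of more CPU.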

Step 20: Regular fstrim

Enable the weekly TRIM timer shipped with util-linux, which tells SSDs which blocks are no longer in use:

sudo systemctl enable --now fstrim.timer

Step 21: CPU Governor Adjustment

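Set the performance governor so CPUs stay at high clock speeds instead of ramping up under load. cpufreq-set changes a single CPU per call (CPU 0 unless -c is given), so write the governor for every CPU through sysfs instead:

echo performance | sudo tee /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor

Alternatively, sudo cpupower frequency-set -g performance (from the linux-tools or kernel-tools package) applies to all CPUs. Many cloud VMs do not expose cpufreq at all, in which case these files are absent.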
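To confirm the governor change reached every CPU, group the scaling_governor files by value. This read-only sketch, assuming bash and coreutils, does that; the directory argument is only a testing hook.

```shell
#!/usr/bin/env bash
# Count CPUs per active frequency governor, printed as "governor count".
governor_summary() {
  local root="${1:-/sys/devices/system/cpu}" f
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] && cat "$f"
  done | LC_ALL=C sort | uniq -c | awk '{print $2, $1}'
}
```

After the change, a single line such as performance followed by the CPU count means every CPU switched.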

Automating Performance Tuning

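Commands run by hand are lost at reboot and drift between hosts. Encode the settings in a configuration management tool such as Ansible or Chef so every host receives the same persistent configuration.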

Example Ansible Playbook for Performance Tuning

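# Requires the ansible.posix and community.general collections.
- hosts: all
  become: true
  tasks:
    - name: Set vm.swappiness persistently
      ansible.posix.sysctl:
        name: vm.swappiness
        value: '10'
        sysctl_file: /etc/sysctl.d/99-performance.conf
        state: present
        reload: true

    - name: Increase file descriptor limits (soft and hard)
      community.general.pam_limits:
        domain: '*'
        limit_type: "{{ item }}"
        limit_item: nofile
        value: '65535'
      loop:
        - soft
        - hard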
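Configuration management keeps settings consistent at deploy time, but hand edits can still drift afterwards. The sketch below compares a sysctl.d-style file of desired values against the running kernel and prints any difference; the function name is hypothetical, PROC_SYS is a testing hook, and it assumes bash and xargs.

```shell
#!/usr/bin/env bash
# Compare desired "key = value" lines (sysctl.d format) with the running
# kernel. Whitespace is normalized with xargs so multi-value keys such as
# net.ipv4.tcp_rmem (tab-separated in /proc) compare correctly.
# Returns 1 if any key differs or is missing, 0 otherwise.
sysctl_drift() {
  local file="$1" key value want have path drift=0
  while IFS='=' read -r key value; do
    key=$(echo "$key" | xargs)
    case "$key" in ''|'#'*|';'*) continue ;; esac   # skip blanks and comments
    want=$(echo "$value" | xargs)
    path="${PROC_SYS:-/proc/sys}/${key//.//}"
    have=""
    [ -r "$path" ] && have=$(xargs < "$path")
    if [ "$want" != "$have" ]; then
      echo "DRIFT $key: want '$want', have '${have:-<missing>}'"
      drift=1
    fi
  done < "$file"
  return "$drift"
}
```

For example, sysctl_drift /etc/sysctl.d/99-performance.conf returns nonzero when anything differs, so it can feed a cron job or a monitoring check.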

Actionable Takeaways: Your Tuning Checklist

Adjust kernel parameters (swappiness, dirty ratios, network buffers) and raise file descriptor limits
Implement cgroups for resource isolation
Optimize networking and the TCP stack
Choose appropriate I/O schedulers
Persist settings and automate them with Ansible or Chef
Monitor continuously with tools like perf
Apply the remaining techniques (THP, HugePages, zswap, KSM, CPU governor) where they fit your workload, measuring before and after each change

Applied step by step and validated against your own workload, these techniques give your hosts more headroom to absorb peak load and help keep services reliable.
