Introduction
Did you know that 80% of production outages can be traced back to misconfigured or under-optimized Linux systems? Site Reliability Engineers (SREs) are constantly challenged to keep systems running optimally under high workloads, making Linux performance tuning an essential skill. In this guide, you’ll discover powerful, practical techniques to proactively optimize your Linux systems, enhancing reliability, performance, and operational efficiency.
Step-by-Step Linux Optimization Guide
Step 1: Adjust Swappiness for Optimal Memory Management
Check current swappiness:
cat /proc/sys/vm/swappiness
Set recommended swappiness value:
sudo sysctl vm.swappiness=10
Step 2: Increase File Descriptor Limits
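Before changing anything in this step, it can help to see what a process is actually running with. Keep in mind that /etc/security/limits.conf is applied by pam_limits at login, so services started by systemd generally take their limit from LimitNOFILE= in the unit file instead. The following is a minimal sketch, not a definitive tool; the helper name parse_nofile is an assumption made for illustration.

```shell
#!/bin/sh
# Read a /proc/<pid>/limits table on stdin and print the soft and hard
# "Max open files" values. parse_nofile is a hypothetical helper name.
parse_nofile() {
    awk '/^Max open files/ { print "soft=" $4, "hard=" $5 }'
}

# Limits of the current shell (skipped quietly if /proc is unavailable).
[ -r "/proc/$$/limits" ] && parse_nofile < "/proc/$$/limits"
```

To inspect a long-running service, redirect from /proc/PID/limits of that service instead of the current shell.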
Check current limits:
ulimit -n
Update limits:
echo '* soft nofile 65535' | sudo tee -a /etc/security/limits.conf
echo '* hard nofile 65535' | sudo tee -a /etc/security/limits.conf
Step 3: Resource Isolation with cgroups
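The paths used in this step belong to the cgroup v1 memory controller. On hosts that mount the unified cgroup v2 hierarchy, the equivalent limit file is memory.max directly under the group's directory. A hedged sketch for checking which layout a host uses before running the commands; the function name cgroup_version and its optional root argument are assumptions for illustration.

```shell
#!/bin/sh
# Report which cgroup layout is mounted under a root directory.
# $1 = cgroup root (defaults to /sys/fs/cgroup); hypothetical helper.
cgroup_version() {
    root="${1:-/sys/fs/cgroup}"
    if [ -f "$root/cgroup.controllers" ]; then
        echo v2        # unified hierarchy exposes cgroup.controllers at the root
    elif [ -d "$root/memory" ]; then
        echo v1        # legacy layout has one directory per controller
    else
        echo unknown
    fi
}

# Print the limit file that matches the detected layout.
case "$(cgroup_version)" in
    v2) echo "limit file: /sys/fs/cgroup/critical_service/memory.max" ;;
    v1) echo "limit file: /sys/fs/cgroup/memory/critical_service/memory.limit_in_bytes" ;;
    *)  echo "no cgroup hierarchy found under /sys/fs/cgroup" ;;
esac
```

On v2 the memory controller may also need to be enabled in the parent's cgroup.subtree_control before memory.max appears in the child group.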
Create a memory cgroup:
sudo cgcreate -g memory:/critical_service
echo $((1024*1024*1024)) | sudo tee /sys/fs/cgroup/memory/critical_service/memory.limit_in_bytes
Step 4: Networking Optimization
Adjust TCP parameters:
sudo sysctl -w net.ipv4.tcp_tw_reuse=1
sudo sysctl -w net.core.somaxconn=1024
Step 5: Select Appropriate I/O Scheduler
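The scheduler file lists every scheduler the kernel offers for a device and marks the active one with square brackets, for example "[mq-deadline] kyber none". Device names and available schedulers vary by host, so a quick survey is worth doing first. A small sketch under those assumptions; active_choice is a hypothetical helper name.

```shell
#!/bin/sh
# Extract the active entry (the one in square brackets) from a sysfs
# selector line read on stdin. active_choice is a hypothetical helper.
active_choice() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Survey every block device that exposes a scheduler file.
for f in /sys/block/*/queue/scheduler; do
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}
    dev=${dev%%/*}
    printf '%s: active=%s available=%s\n' \
        "$dev" "$(active_choice < "$f")" "$(tr -d '[]' < "$f")"
done
```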
Check current scheduler:
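cat /sys/block/sda/queue/scheduler
Set the deadline scheduler (listed as mq-deadline on kernels that use the multi-queue block layer, the default since 5.0):
echo 'mq-deadline' | sudo tee /sys/block/sda/queue/scheduler
Step 6: Real-time Diagnostics with perf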
Monitor kernel-level events:
sudo perf top
Step 7: Disable Transparent Huge Pages (THP)
Check THP status:
cat /sys/kernel/mm/transparent_hugepage/enabled
Disable THP:
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Step 8: Enable HugePages
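Huge pages are reserved up front and are no longer available to ordinary allocations, so it is worth working out how much memory a given page count sets aside before applying it. A rough sketch of that arithmetic; the helper name hugepages_mib is an assumption, and the 2048 kB fallback assumes the common x86-64 default page size.

```shell
#!/bin/sh
# Memory reserved by a HugePages pool, in MiB.
# $1 = number of pages, $2 = huge page size in kB (as shown in /proc/meminfo)
hugepages_mib() {
    echo $(( $1 * $2 / 1024 ))
}

# Use the host's huge page size when available, else assume 2048 kB.
size_kb=$(awk '/^Hugepagesize:/ { print $2 }' /proc/meminfo 2>/dev/null)
echo "1024 pages reserve $(hugepages_mib 1024 "${size_kb:-2048}") MiB"
```

With 2048 kB pages, the 1024 pages configured in this step reserve 2 GiB.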
Configure HugePages:
sudo sysctl vm.nr_hugepages=1024
Step 9: Tweak Cache Behavior
Adjust dirty ratios:
sudo sysctl -w vm.dirty_ratio=15
sudo sysctl -w vm.dirty_background_ratio=5
Step 10: Optimize IRQ Balancing
Install and configure irqbalance:
sudo apt-get install irqbalance
sudo systemctl enable irqbalance
sudo systemctl start irqbalance
Step 11: Network Throughput Optimization
Adjust network backlog:
sudo sysctl -w net.core.netdev_max_backlog=5000
Step 12: Manage TCP SYN Backlog
Increase SYN backlog:
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=2048
Step 13: TCP Connection Timeout
Reduce FIN timeout:
sudo sysctl -w net.ipv4.tcp_fin_timeout=15
Step 14: Optimize TCP Buffer Sizes
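A useful yardstick for buffer maximums is the bandwidth-delay product of the path: the amount of data that can be in flight at once. The sketch below computes it so the 16 MiB value used in this step can be compared against a real link; the helper name bdp_bytes and the example link figures are assumptions. Note also that net.core.rmem_max and net.core.wmem_max cap buffers an application requests explicitly, while autotuned TCP sockets follow net.ipv4.tcp_rmem and net.ipv4.tcp_wmem.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes.
# $1 = bandwidth in Mbit/s, $2 = round-trip time in ms
# 1 Mbit/s carries 125 bytes per millisecond, hence the constant.
bdp_bytes() {
    echo $(( $1 * 125 * $2 ))
}

# Hypothetical path: 1 Gbit/s with a 100 ms round trip.
echo "BDP: $(bdp_bytes 1000 100) bytes (buffer max in this step: 16777216)"
```

For that example path the product is 12500000 bytes, which fits within a 16 MiB maximum.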
Set TCP buffer sizes:
sudo sysctl -w net.core.rmem_max=16777216
sudo sysctl -w net.core.wmem_max=16777216
Step 15: Apply tuned-adm Profiles
Install and apply profiles:
sudo apt-get install tuned
sudo tuned-adm profile throughput-performance
Step 16: Scheduler Tunables
Optimize scheduler responsiveness:
sudo sysctl -w kernel.sched_autogroup_enabled=1
Step 17: Implement zswap
Enable zswap:
echo 1 | sudo tee /sys/module/zswap/parameters/enabled
Step 18: SSD Optimization with udev
Create udev rule for SSD:
echo 'ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"' | sudo tee /etc/udev/rules.d/60-ssd.rules
Step 19: Kernel Samepage Merging (KSM)
Enable KSM:
echo 1 | sudo tee /sys/kernel/mm/ksm/run
Step 20: Regular fstrim
Schedule fstrim:
sudo systemctl enable fstrim.timer
sudo systemctl start fstrim.timer
Step 21: CPU Governor Adjustment
Set performance governor:
sudo apt-get install cpufrequtils
sudo cpufreq-set -g performance
Automating Performance Tuning
Consistency in configuration across systems is crucial. Automate using tools like Ansible or Chef.
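The sysctl -w commands in the steps above change only the running kernel and are lost on reboot. Even without a configuration management tool, the settings can be persisted by writing them to a drop-in file in an idempotent way. A minimal sketch, assuming a hypothetical helper named persist_sysctl; the demo writes to a temporary file so it is safe to run anywhere.

```shell
#!/bin/sh
# Write or update "key = value" in a sysctl drop-in file without
# leaving duplicate lines for the same key.
# $1 = file, $2 = key, $3 = value (persist_sysctl is a hypothetical helper)
persist_sysctl() {
    file=$1
    key=$2
    value=$3
    # Escape dots so the key is matched literally by grep.
    pat=$(printf '%s' "$key" | sed 's/\./\\./g')
    touch "$file"
    # Keep every line except an existing entry for this key, then append.
    grep -v "^[[:space:]]*$pat[[:space:]]*=" "$file" > "$file.tmp"
    printf '%s = %s\n' "$key" "$value" >> "$file.tmp"
    mv "$file.tmp" "$file"
}

# Demo against a temporary file: the second call replaces the first.
demo=$(mktemp)
persist_sysctl "$demo" vm.swappiness 60
persist_sysctl "$demo" vm.swappiness 10
cat "$demo"
rm -f "$demo"
```

On a real host the target would be a root-owned file under /etc/sysctl.d/ (for example 99-performance.conf, a name chosen here for illustration), followed by sudo sysctl --system to load it.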
Example Ansible Playbook for Performance Tuning
- hosts: all
  tasks:
    - name: Set vm.swappiness
      sysctl:
        name: vm.swappiness
        value: '10'
        state: present
        reload: yes
    - name: Increase file descriptor limits
      lineinfile:
        path: /etc/security/limits.conf
        line: '* soft nofile 65535'
        create: yes
Actionable Takeaways: Your Tuning Checklist
- Adjust kernel parameters (swappiness, file descriptors)
- Implement cgroups for resource isolation
- Optimize networking and TCP stack
- Choose appropriate I/O schedulers
- Automate tuning tasks with Ansible or Chef
- Monitor continuously using tools like perf
- Apply additional advanced optimization techniques listed above
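To work through the checklist it helps to record what a host is currently set to before and after tuning. The sketch below reads a handful of the tunables covered in this guide straight from /proc/sys, so it needs neither root nor the sysctl binary. The helper name show_tunable and its optional second argument (an alternative root directory) are assumptions for illustration.

```shell
#!/bin/sh
# Print "name = value" for one kernel tunable.
# $1 = dotted sysctl name, $2 = root directory (defaults to /proc/sys)
show_tunable() {
    path="${2:-/proc/sys}/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        printf '%s = (not available)\n' "$1"
    fi
}

# Audit a few of the settings from the steps above.
for name in vm.swappiness vm.dirty_ratio vm.dirty_background_ratio \
            net.core.somaxconn net.ipv4.tcp_fin_timeout \
            net.ipv4.tcp_max_syn_backlog; do
    show_tunable "$name"
done
```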
By implementing these Linux performance tuning techniques step by step, you're preparing your infrastructure to handle peak loads with better uptime and reliability.