xpam.pl

Dealing with latency-sensitive processes on Linux

Over the summer, players complained about micro lags on our DotA bots for no apparent reason. After a lot of debugging and some well-timed clues from the players, I pinpointed the cause: non-bot processes were spiking the CPU, and the lag followed.

Even something as simple as logrotate compressing log files would peg a core at 100% and cause a micro lag. This was a surprise on a 4-core server where the bots are relatively undemanding, hovering around 1-5% CPU utilization.

There were other culprits causing CPU spikes, such as PHP and PvPGN.

While our website gets relatively few visits these days, all kinds of (AI) crawlers and spambots generate constant traffic, indexing the 100th page of the ladder and similar waste. To eliminate this source, I finally had to bite the bullet and put Cloudflare in front of our web services, with additional captchas on paths sensitive to spam.

The simple answer to all these problems would be to move the bots to a dedicated server, but since we’re on a tight budget this was not an option.

Another approach would be to resolve these spikes at the source. But after profiling PvPGN for hours and getting nowhere, and identifying lots of SQL queries that needed optimization, it was clear that correcting these issues would be a lengthy process, and we needed a solution ASAP because games were unplayable.

For the solution, I decided to pin all the bots to two dedicated CPU cores and raise their priority with a negative nice value. Setting both is much easier through systemd, so I also converted all the bots into parameterized (template) systemd services, with a single service file managing N bots.

This is the end result, saved as /etc/systemd/system/la-host@.service:


[Unit]
Description=1.26 la-host for instance %i
After=bnetd.service

[Service]
Type=simple
WorkingDirectory=/opt/la/126/hosts
ExecStart=/opt/la/126/hosts/ghost++ cfgs/s%i.cfg
Restart=on-failure
User=la
Group=la
Nice=-5
CPUAffinity=2 3

[Install]
WantedBy=multi-user.target
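
With a template unit like this, each bot becomes an instance named la-host@<i>.service, where %i expands to the text after the @ sign. Below is a minimal sketch of building the command that enables several bots at once. It assumes the instances are numbered 1..N to match cfgs/s1.cfg, cfgs/s2.cfg and so on; the actual numbering and bot count are not stated here, and instance_units and enable_command are hypothetical helper names.

```python
from typing import List


def instance_units(count: int, template: str = "la-host@{}.service") -> List[str]:
    """Return unit names for instances 1..count of the template unit."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # Assumption: instances are numbered from 1, matching cfgs/s1.cfg, cfgs/s2.cfg, ...
    return [template.format(i) for i in range(1, count + 1)]


def enable_command(count: int) -> List[str]:
    """Build the argv that enables and starts all instances in one call."""
    # `systemctl enable --now` enables the units at boot and starts them immediately.
    return ["systemctl", "enable", "--now", *instance_units(count)]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(enable_command(3)))
```

The script only prints the command, so it can be reviewed before being run as root.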

 

At the same time, I pinned the problematic services to the other two cores by overriding their default service files through drop-in .d directories, for example in /etc/systemd/system/php8.3-fpm.service.d/10-affinity.conf:

[Service]
CPUAffinity=0 1

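Because the override lives in a drop-in file, it will not conflict with changes to the default service file during OS upgrades. The override takes effect after running systemctl daemon-reload and restarting the service.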

I should note that setting the nice value alone was not enough; it was the affinity that finally solved the problem. I kept the nice value anyway, since other non-pinned processes can still use cores 2 and 3 and could, in theory, cause problems.
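
One way to confirm that both settings took effect is to read them back from a running process. Below is a minimal sketch using Python's standard library on Linux. The helper name scheduling_info is hypothetical, and the PID would have to be looked up separately, for instance with systemctl show -p MainPID on one of the bot instances.

```python
import os
from typing import Dict, List, Union


def scheduling_info(pid: int = 0) -> Dict[str, Union[int, List[int]]]:
    """Return the allowed CPUs and nice value of a process (0 = current process)."""
    return {
        # CPUs the kernel is allowed to schedule this process on (Linux only).
        "cpus": sorted(os.sched_getaffinity(pid)),
        # Nice value: lower means higher priority.
        "nice": os.getpriority(os.PRIO_PROCESS, pid),
    }


if __name__ == "__main__":
    import sys

    # Pass a PID as the first argument, or omit it to inspect this script itself.
    target = int(sys.argv[1]) if len(sys.argv) > 1 else 0
    print(scheduling_info(target))
```

For a bot started from the la-host@.service unit shown earlier, the expected result would be CPUs 2 and 3 with a nice value of -5.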

Since these changes, all complaints have stopped and peace is back on the server.



Cen
