Server fails with "Too many open files" exceptions and 500K+ file descriptors in CLOSE_WAIT state
Posted: 11 Jun 2019, 01:49
Hello,
We've developed a multiplayer game using the HTML5 SDK on the client side. On the server, we create Room Extensions written in Java.
Production servers started to fail after a week with "Too many open files" exceptions. When I checked lsof, it showed 500K+ file descriptors in the CLOSE_WAIT state. All of them are IPv6 TCP connections from different external IPs (almost certainly connected players). Please advise how to fix this.
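To track the CLOSE_WAIT count over time without grepping lsof, one option is to read the kernel's socket tables directly. This is a minimal sketch of our own, not part of SmartFoxServer, and it assumes Linux: /proc/net/tcp and /proc/net/tcp6 list every TCP socket in the network namespace (all processes, not only the JVM), and the `st` column holds the state as a hex code, where 08 means CLOSE_WAIT. The class name TcpStateCounter is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts TCP sockets per state from /proc/net/tcp{,6} (Linux only).
public class TcpStateCounter {
    // Kernel TCP state codes, printed in hex in the "st" column of /proc/net/tcp and /proc/net/tcp6.
    private static final Map<String, String> STATES = new HashMap<>();
    static {
        STATES.put("01", "ESTABLISHED");
        STATES.put("02", "SYN_SENT");
        STATES.put("03", "SYN_RECV");
        STATES.put("04", "FIN_WAIT1");
        STATES.put("05", "FIN_WAIT2");
        STATES.put("06", "TIME_WAIT");
        STATES.put("07", "CLOSE");
        STATES.put("08", "CLOSE_WAIT");
        STATES.put("09", "LAST_ACK");
        STATES.put("0A", "LISTEN");
        STATES.put("0B", "CLOSING");
    }

    /** Counts sockets per TCP state from the lines of a /proc/net/tcp-style table. */
    public static Map<String, Integer> countStates(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            // Skip the header row ("sl local_address ...") and lines too short to have a state column.
            if (f.length < 4 || f[0].equals("sl")) {
                continue;
            }
            String code = f[3].toUpperCase();
            counts.merge(STATES.getOrDefault(code, "UNKNOWN_" + code), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        for (String table : new String[] {"/proc/net/tcp", "/proc/net/tcp6"}) {
            Path p = Paths.get(table);
            if (Files.isReadable(p)) {
                System.out.println(table + ": " + countStates(Files.readAllLines(p)));
            }
        }
    }
}
```

Logging this every few minutes next to the CCU count would show whether CLOSE_WAIT grows with each player disconnect or in bursts.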
We use SmartFoxServer 2.13.4.
The server runs on an AWS c5.large instance.
Java:
openjdk version "1.8.0_212"
OpenJDK Runtime Environment (build 1.8.0_212-8u212-b03-0ubuntu1.16.04.1-b03)
OpenJDK 64-Bit Server VM (build 25.212-b03, mixed mode)
Our players usually close the browser or tab after finishing a game or a set of games (even though we have Stand Up and Disconnect buttons). Typical load on the server is 100-300 CCU, 24x7.
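For context on why closed tabs could pile up as CLOSE_WAIT: a TCP socket enters CLOSE_WAIT when the peer (here, the browser) sends its FIN, and it stays there until the local process calls close() on its end. The sketch below shows the pattern that avoids this with plain java.net sockets: read until read() returns -1, then close. It only illustrates the mechanism under that assumption; the player sockets in our setup are presumably owned by SmartFoxServer itself rather than by our room extensions, so this is not something we can drop in as a fix. CloseOnEof and drainAndClose are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of closing our end of a socket once the peer has closed theirs.
public class CloseOnEof {
    /**
     * Reads from the socket until the peer closes its side (read() returns -1),
     * then closes our side. Returns the number of bytes read.
     */
    public static int drainAndClose(Socket s) throws IOException {
        int total = 0;
        // try-with-resources calls close() even if read() throws, so this code path
        // never leaves the socket sitting in CLOSE_WAIT after the peer's FIN.
        try (Socket sock = s; InputStream in = sock.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            int port = server.getLocalPort();
            // Simulated player: connect, send a few bytes, then disconnect (like closing a tab).
            Thread player = new Thread(() -> {
                try (Socket c = new Socket(loopback, port)) {
                    c.getOutputStream().write("hello".getBytes(StandardCharsets.US_ASCII));
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            player.start();
            Socket accepted = server.accept();
            int bytes = drainAndClose(accepted);
            player.join();
            System.out.println("read " + bytes + " bytes, server side closed: " + accepted.isClosed());
        }
    }
}
```

If the server code skipped the close() after read() returned -1, the accepted socket would remain in CLOSE_WAIT for as long as the process kept it open, which matches the symptom of descriptors accumulating over days.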
I also noticed that during a stress test with 500 CCU, the number of file descriptors in the ESTABLISHED state was 95K+:
root@banana:~# lsof -n | grep smartfox | grep ESTABLISHED | wc -l
95570
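One caveat about that number: lsof output may count the same descriptor more than once (on some Linux lsof versions, running it without -p repeats each open file once per thread, and a JVM runs many threads), so the grep total could overstate the real count. A minimal cross-check, assuming Linux and permission to read the target process's /proc entry, is to list /proc/<pid>/fd, which holds exactly one symlink per open descriptor; sockets show up as targets like socket:[inode]. FdCounter, fdTargets and summarize are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts a process's open descriptors via /proc/<pid>/fd (Linux only).
public class FdCounter {
    /** Returns the symlink target of every open descriptor of the given pid ("self" for this JVM). */
    public static List<String> fdTargets(String pid) throws IOException {
        List<String> targets = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", pid, "fd"))) {
            for (Path fd : fds) {
                try {
                    targets.add(Files.readSymbolicLink(fd).toString());
                } catch (IOException e) {
                    // The descriptor was closed between listing and reading; skip it.
                }
            }
        }
        return targets;
    }

    /** Groups descriptor targets into sockets, pipes, anonymous inodes and regular files. */
    public static Map<String, Integer> summarize(List<String> targets) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String t : targets) {
            String kind;
            if (t.startsWith("socket:")) {
                kind = "socket";
            } else if (t.startsWith("pipe:")) {
                kind = "pipe";
            } else if (t.startsWith("anon_inode:")) {
                kind = "anon_inode";
            } else {
                kind = "file";
            }
            counts.merge(kind, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With "self", the directory stream opened by fdTargets is itself included in the count.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> targets = fdTargets(pid);
        System.out.println(targets.size() + " open descriptors: " + summarize(targets));
    }
}
```

Run as `java FdCounter <smartfox-pid>`. The number inside socket:[...] is the socket inode, which also appears in the `inode` column of /proc/net/tcp6, so the two views can be matched up if needed.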
For the stress test we used a Java client with a direct TCP connection, not the HTML5 client.
Another question: our 500 CCU stress test probably generated load equivalent to about 4500 CCU, and the server was able to handle it. But 95,570 descriptors for 500 CCU is about 191 per CCU, so 4500 CCU would open roughly 860K (191 × 4500), which is close to (or above) the file limit on a c5.large. I suspect these two problems are related, but the CLOSE_WAIT problem is more pressing.
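On the limit itself: "Too many open files" is normally the message for EMFILE, i.e. the per-process descriptor limit (ulimit -n, RLIMIT_NOFILE), while the system-wide case reads "Too many open files in system". A minimal sketch for logging how close the JVM is to its own limit uses com.sun.management.UnixOperatingSystemMXBean, which OpenJDK 8 on Linux provides. Calling it periodically, for example from a scheduled task inside an extension, is an assumption about how we might wire it up, not something we do today; FdUsage is a hypothetical name.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reports this JVM's open descriptor count against its limit.
public class FdUsage {
    /** Returns {open, max} descriptor counts for this JVM, or null if the platform does not expose them. */
    public static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            // "max" is the RLIMIT_NOFILE value this process runs with, the limit behind EMFILE.
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this platform");
            return;
        }
        System.out.printf("open=%d max=%d (%.1f%% used)%n", usage[0], usage[1], 100.0 * usage[0] / usage[1]);
    }
}
```

Because the limit is per process, the relevant number to compare the ~191 descriptors per CCU against is this "max" value rather than an instance-wide figure.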
Thank you,
Sergey.