
SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 10:22
by Nexic
Hello,

We launched our game 3 days ago and have 5 servers, each with 1-3k concurrent users online at all times (averaging around 1,500). Every machine runs a vanilla CentOS 7 install with only SFS2X 2.13.3 and NetData (with a 300k open files limit). They are all Intel(R) Xeon(R) CPU E3-1270 v6 @ 3.80GHz (8 core) machines with 32GB RAM and unlimited 1Gbps lines.

We don't use any extensions or MMO features; just simple rooms, chat and UserVariable updates. No database, no emails, no UDP, no encryption, etc. Each machine uses the default number of threads (since the docs suggest not fiddling with them), but we did set the JVM to 8GB min and 16GB max. On average each user sends 5 messages per second, in rooms with a max size of 4 people. This generates approx. 80-160Mbps of traffic.

The problem is that these instances crash constantly and we have to restart them manually. According to NetData, the servers suddenly start running out of memory, despite the JVM max being set to 16GB, the machine having 32GB in total, and nothing else running (other than NetData, which uses a tiny amount of RAM). The logs show that SFS is being killed by the OS's OOM killer. Sometimes (but rarely) the entire machine locks up completely and needs a full restart.

In a desperate attempt to fix the problem, I have set all SFS instances to restart hourly. Whilst that has helped, it still hasn't solved the issue: one instance recently crashed like this after just 20 minutes of uptime. It's also not a great experience for our players, as each restart causes 10+ seconds of lag while the client automatically remakes/rejoins games.

Since it's happening on all 5 machines, each rented from different datacenters, I think it's safe to say it's not a hardware fault.

I should note that I have never seen any of the thread counts increase automatically, and all message queues show green in terms of load.

The problem does not appear to be directly linked to traffic. It's happened to machines with only 800 online, and I've also seen an instance run for hours with 3k online and not have issues. It feels very random.

We have tried:
- Increasing JVM memory to 12-24GB: didn't stop the problem, but might have made it slightly rarer (hard to tell though).
- Lowering JVM memory to 4-8GB: this caused SFS to literally reboot itself every 5 minutes.
- Increasing core and extension threads to 64/32: didn't seem to make any difference.

Do you have any suggestions for us? We're getting a bit desperate now to be honest :)

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 10:56
by Nexic
Here are some screenshots to show what's going on. Both were taken from the same server just now. SFS AdminTool reports only 9GB of memory allocated, but NetData reports Java using 27GB (SFS is the only Java application). This instance crashed shortly after and had to be restarted.

[Screenshot]

[Screenshot]

I did wonder if we somehow had multiple instances of SFS running from all the restarts, but that doesn't seem possible: after a restart, the memory usage reported by NetData and SFS AdminTool matches up perfectly again.

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 11:29
by Lapo
Hi,
I think something else is missing. If the server is running with no server-side code and a few thousand users, you should rarely see heap memory usage go over 1GB (that's the used heap, not the allocated RAM, which depends on your settings and the RAM available in the system).
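
To make the used-versus-allocated distinction concrete, here is a minimal Java sketch using only the standard `Runtime` API (the class and method names are hypothetical, not part of SFS2X). "Used" is heap occupied by objects; "allocated" is heap the JVM has committed, which with a large minimum heap setting starts big regardless of load.

```java
// Minimal sketch: "used" heap vs heap "allocated" (committed) by the JVM.
public class HeapUsage {

    /** Heap currently occupied by objects, including garbage not yet collected. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Heap the JVM has currently committed; starts near -Xms and can grow toward -Xmx. */
    static long allocatedBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Maximum heap the JVM will attempt to use (-Xmx); Long.MAX_VALUE if there is no limit. */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.printf("heap used=%d MB, allocated=%d MB, max=%d MB%n",
                usedBytes() / mb, allocatedBytes() / mb, maxBytes() / mb);
    }
}
```

Launched with, for example, `java -Xms8g -Xmx16g HeapUsage`, the allocated figure should come out close to 8GB while the used figure stays small.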

Also, the idea that the JVM is set to use 16GB and is found using 27GB instead sounds very suspicious.
I don't know the tool you're using, but I would question its accuracy, as the above seems difficult to believe unless the data has been interpreted incorrectly.

For starters, you could zip the SFS2X/config/ and SFS2X/zones folders and send them to our support@... email box with a reference to this discussion, so we can take a look at the config you're currently using.

Also, are you running the server with its default JRE, or has it been replaced?

Thanks

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 12:05
by Nexic
We've been using NetData on our previous online games for years and it's been rock solid; it has never misreported anything. I have also seen these Java usage numbers before just by going into the CLI and running top (though I don't have a screenshot; I'll try to get one next time it happens).

The JRE is the default one supplied. The only other thing we've changed is setting log4j logging to WARN level to try to prevent drives filling up (as a team member suggested here: viewtopic.php?t=10336).

I'll send a copy over to support now. I'd also be happy to let one of your team log in to one of our machines and poke around if necessary.

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 12:19
by Nexic
[Screenshot]

This shows Java using 22GB in both NetData and top, while SFS AdminTool reports only 10GB. It's not quite at crashing levels yet, but I think it illustrates the problem and proves NetData is not making it all up :)

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 12:30
by Nexic
I've sent in the requested files. Any help you could give me would be hugely appreciated. Time really is of the essence here as we'll probably start losing a lot of players if we can't get on top of this issue soon. Thanks!

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 14:33
by Lapo
Thanks,
Nexic wrote:
[Screenshot]
This shows java usage at 22GB in both netdata and top while SFS AdminTool only reports 10GB. Not quite at crashing levels yet but I think illustrates the problem and proves net data is not making it all up :)

I wasn't implying you were making it up :)
To be clear, keep in mind that the AdminTool measures heap memory only (allocated and used). If you plug in another monitoring tool such as JConsole or VisualVM, you'll find the same values.

The JVM has other memory areas that aren't counted in those heap figures (and some aren't easily measurable at runtime), such as thread stacks and PermGen (now called Metaspace). These usually make up a smaller portion of the overall RAM usage.
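
Some of these non-heap areas can in fact be read at runtime through the standard `java.lang.management` API. The sketch below (hypothetical class name) lists each memory pool and totals committed bytes by type. It is only a partial picture: thread stacks, direct NIO buffers and other native allocations don't appear in any pool, so a gap between these totals and what `top` reports can remain.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Minimal sketch: list the JVM's memory pools and total them by type.
// Pool names depend on the JVM version and garbage collector in use.
public class MemoryPools {

    /** Sums the committed bytes of every pool of the given type (HEAP or NON_HEAP). */
    static long committed(MemoryType type) {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == type && usage != null) {
                total += usage.getCommitted();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue;
            }
            System.out.printf("%-16s %-32s used=%,d committed=%,d%n",
                    pool.getType(), pool.getName(), u.getUsed(), u.getCommitted());
        }
        // Thread stacks, direct buffers and GC bookkeeping are not in any of these pools.
        System.out.printf("heap committed=%,d  non-heap committed=%,d%n",
                committed(MemoryType.HEAP), committed(MemoryType.NON_HEAP));
    }
}
```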

In your case it looks like you're using a massive amount of non-heap memory, which seems quite bizarre. In other words, going by the picture above, the whole application is using ~10GB of heap (~4GB actually in use) and ~13GB of other stuff.
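
The estimate above is essentially "process memory as the OS sees it" minus "heap". A rough way to take the same measurement from inside the JVM on Linux is to read `VmRSS` from `/proc/self/status` (the resident size `top` shows as RES) and compare it with the committed heap. This is a sketch with hypothetical names; the difference is only approximate, since committed heap pages that were never touched are not resident and can even make it negative.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Minimal sketch (Linux only): compare process RSS with the committed Java heap.
public class RssVsHeap {

    /** Resident set size in bytes from /proc/self/status, or -1 if unavailable. */
    static long rssBytes() {
        try {
            List<String> lines = Files.readAllLines(Paths.get("/proc/self/status"));
            for (String line : lines) {
                if (line.startsWith("VmRSS:")) {
                    // Expected format: "VmRSS:   123456 kB"
                    String[] parts = line.trim().split("\\s+");
                    if (parts.length >= 2) {
                        return Long.parseLong(parts[1]) * 1024L;
                    }
                }
            }
        } catch (IOException | NumberFormatException e) {
            // Not on Linux, or the file has an unexpected format.
        }
        return -1;
    }

    /** Heap committed by the JVM: roughly the "allocated" figure heap-based tools report. */
    static long heapCommittedBytes() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted();
    }

    public static void main(String[] args) {
        long rss = rssBytes();
        long heap = heapCommittedBytes();
        if (rss < 0) {
            System.out.println("RSS unavailable (not running on Linux?)");
            return;
        }
        long mb = 1024L * 1024L;
        System.out.printf("RSS=%d MB, heap committed=%d MB, RSS minus heap=%d MB (approx.)%n",
                rss / mb, heap / mb, (rss - heap) / mb);
    }
}
```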

Unless you're doing some crazy class-loading at runtime, I am having a hard time imagining what else could be using so much non-heap memory.
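
One common source of non-heap memory that heap-based tools such as the AdminTool won't show is direct NIO buffers, which are allocated outside the Java heap. Whether they played any role here isn't established in this thread; the sketch below (hypothetical class name) only shows that such allocations are visible through the standard `BufferPoolMXBean`, which could be polled alongside the heap figures.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Minimal sketch: direct NIO buffers live outside the Java heap, so heap-based
// tools don't count them, but the JVM exposes them through BufferPoolMXBean.
public class DirectBufferDemo {

    /** Total capacity of direct buffers currently allocated, or -1 if the pool isn't exposed. */
    static long directCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directCapacity();
        // 64 MB allocated off-heap: the "direct" pool should grow by this amount,
        // while heap usage barely changes.
        ByteBuffer buf = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        long after = directCapacity();
        System.out.printf("direct pool capacity: before=%,d after=%,d (buffer capacity %,d)%n",
                before, after, buf.capacity());
    }
}
```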

We'll take a look at the files you've sent.
Cheers

Re: SFS2x Memory Leak/Crash

Posted: 08 Sep 2018, 15:54
by Nexic
Thanks for the explanation and for your email. I will attempt the fixes you suggested and see if that solves the issue (I suspect it will).

Re: SFS2x Memory Leak/Crash

Posted: 10 Sep 2018, 07:57
by Lapo
Sure, let us know.

Re: SFS2x Memory Leak/Crash

Posted: 21 Nov 2018, 13:53
by zynbasil

I have the same issue with SFS2X 2.13.4.

Re: SFS2x Memory Leak/Crash

Posted: 21 Nov 2018, 16:19
by Lapo
zynbasil wrote:
I have the same issus with sfs2x 2.13.4

You already started a thread on this, let's keep the conversation in one place.
Thanks

Re: SFS2x Memory Leak/Crash

Posted: 14 Feb 2019, 09:45
by rewb0rn
Hey,

Have you been able to sort the problem out? We are currently experiencing similar problems: the system reports growing memory for the Java process, but it is not the heap.

Thanks in advance

Re: SFS2x Memory Leak/Crash

Posted: 14 Feb 2019, 17:08
by Lapo
rewb0rn wrote:
Hey,

have you been able to sort the problem out? We are currently experiencing similar problems.. System reporting growing memory for the Java process, but it is not the heap.

Thanks in advance

Yes, the problem was solved. Basically, they were having issues with the logging: a large number of errors were being generated every few seconds, putting lots of pressure on memory due to constant string generation, plus the logging overhead.
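
Part of that cost is that a log message is usually built before the logger decides whether to output it. In this thread the messages were errors, which a WARN threshold still lets through, so the real fix is stopping the errors at their source. The sketch below only illustrates the string-building cost and one general way to avoid paying it for messages below the threshold. It uses the JDK's `java.util.logging` as a stand-in for the server's log4j setup (whose API differs), and `describe` is a hypothetical placeholder for an expensive message.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal sketch: eagerly concatenated messages are built even when the level
// filters them out, whereas a Supplier is only invoked if the message is logged.
public class LazyLogging {
    private static final Logger LOG = Logger.getLogger(LazyLogging.class.getName());

    // Counts how many times the (hypothetical) expensive message was built.
    private static final AtomicInteger messagesBuilt = new AtomicInteger();

    /** Hypothetical stand-in for an expensive message, e.g. dumping user/room state. */
    static String describe(int userId) {
        messagesBuilt.incrementAndGet();
        return "user " + userId + " sent an unexpected request";
    }

    /** Eager: the string is built on every call, even when FINE is disabled. */
    static void eager(int userId) {
        LOG.fine("Error handling " + describe(userId));
    }

    /** Lazy: the lambda only runs if FINE is loggable (Logger.fine(Supplier), Java 8+). */
    static void lazy(int userId) {
        LOG.fine(() -> "Error handling " + describe(userId));
    }

    /** Runs n calls of each style with FINE disabled; returns {eagerBuilds, lazyBuilds}. */
    static int[] countBuilds(int n) {
        LOG.setLevel(Level.WARNING); // FINE is filtered out, similar to a WARN threshold
        messagesBuilt.set(0);
        for (int i = 0; i < n; i++) {
            eager(i);
        }
        int eagerBuilt = messagesBuilt.getAndSet(0);
        for (int i = 0; i < n; i++) {
            lazy(i);
        }
        int lazyBuilt = messagesBuilt.get();
        return new int[] {eagerBuilt, lazyBuilt};
    }

    public static void main(String[] args) {
        int[] counts = countBuilds(1000);
        // prints "eager built 1000, lazy built 0"
        System.out.println("eager built " + counts[0] + ", lazy built " + counts[1]);
    }
}
```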

Cheers

Re: SFS2x Memory Leak/Crash

Posted: 15 Feb 2019, 07:58
by rewb0rn
Thanks, I'll check whether that could be the case; I think we also added some logging recently.