Live deployment of SFS2X (2000+ CCU) - Report and Questions.

Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Live deployment of SFS2X (2000+ CCU) - Report and Questions.

Postby Fraggle » 06 Jul 2011, 12:39

Hi everyone,

I am thrilled to report that we have successfully launched our first real deployment of SmartFox (2X to be precise, but it is our first time with SmartFox anyway).

Our game has around 15k concurrent users at peak, but we are rolling the new version out slowly (up to 2k users at the moment), and for now we use SFS only for player-to-player chat and guild chat.

However, even though the feature is simple, our extension does a few things, such as talking to our main system through HTTP API calls (rather than a direct DB connection) for player login and buddy list management, so it is not a trivial workload.

First, I must say I'm really happy with the overall performance. As you can see in the linked screenshot, CPU usage stays quite low on an "average" server.
I can well believe that, with correctly written extension code, SmartFox will not be the bottleneck in reaching several thousand CCU.

https://img.skitch.com/20110706-qcmn2pg ... rcmcdc.jpg

However, the CPU usage is really erratic, going up (20%) and down (5%) all the time (you can see it in the screenshot as well). It is so regular that I believe it must be something I forgot to change in the configuration. Maybe the GC is running too often?

We also saturated the CPU at 100% twice (with a high number of CCU and slow HTTP API response times), which made SFS almost unresponsive. I don't know the exact cause yet.

I'd love some advice, so feel free to comment :)

Sebastien
mrpinc
Posts: 25
Joined: 02 Nov 2009, 21:12

Postby mrpinc » 06 Jul 2011, 17:18

Very interesting stuff, I'd love it if you would share more of this info.

In terms of optimizations I can only suggest you make sure that any unnecessary events don't get distributed and maybe you can increase the room count update interval if you are using that feature.
tchen
Posts: 191
Joined: 11 Dec 2010, 14:14

Postby tchen » 07 Jul 2011, 12:59

Thank you for sharing.

Regarding the HTTP calls, be sure to use asynchronous HTTP requests inside the SFS request handlers, so you don't block them. Ideally, almost all work should be shoved into the scheduler too.
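
To illustrate the idea, here is a minimal sketch in plain Java (no SmartFox classes). The names NonBlockingHandler, handle and ioPool are made up for the example, and the pool size of 8 is an arbitrary assumption; the point is only that the slow call runs on a separate pool and the reply is sent from a callback.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical handler that never blocks the calling thread on a slow call. */
class NonBlockingHandler {
    // Dedicated pool for blocking I/O, separate from the server's own threads.
    // The size (8) is an assumption, not a recommendation.
    private final ExecutorService ioPool = Executors.newFixedThreadPool(8);

    /**
     * Returns immediately; slowCall runs on ioPool and reply is invoked
     * with its result (or with a fallback message if the call fails).
     */
    CompletableFuture<Void> handle(Supplier<String> slowCall, Consumer<String> reply) {
        return CompletableFuture
                .supplyAsync(slowCall, ioPool)
                .exceptionally(err -> "error: " + err.getMessage())
                .thenAccept(reply);
    }

    /** Stops accepting new work; already submitted calls still finish. */
    void shutdown() {
        ioPool.shutdown();
    }
}
```

The calling thread gets the future back right away; whatever sends the response to the client would go in the reply callback.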
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Postby Fraggle » 07 Jul 2011, 13:00

I would love to, but how can I make them async? I'm doing them in USER_LOGIN.
tchen
Posts: 191
Joined: 11 Dec 2010, 14:14

Postby tchen » 09 Jul 2011, 00:40

Sorry, I missed the login bit. USER_LOGIN is problematic as it does need to return something.
:(
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Follow up :) - Would love some insights from GotoAndPlay

Postby Fraggle » 09 Jul 2011, 15:57

Hi again,

We have now reached more than 4500 CCU as we are allowing a bigger share of our userbase to use the new version.

I'm happy to say that I figured out the bit about CPU saturation:
I was creating a lot of rooms in the default group, so too many USER_COUNT_CHANGE, ROOM_CREATED and ROOM_JOIN events were being sent.
In my use case, each room can be in its own group, so I did that and, additionally, used the API call parameters to disable some server and client events.

However, I am still annoyed by the fact that USER_LOGIN and the buddy list storage system are not async.

The consequence for the user is that the (custom) login takes a long time, because it is waiting for SmartFox to have free time to process the event.
Once the user is logged in, everything is fast and responsive (i.e. private and buddy messages are relayed instantly). My dropped packet rate is below 1%.

I can and will optimize the time it takes for my http api call, but still.

I increased the number of threads in my extension to 16 to get more parallelism, but I'm not happy with this solution. I think using too many threads is not really optimal and causes performance issues of its own (if the hardware does not have enough cores).

Anyway, I'm still very happy with the early results. I'm just a bit anxious about releasing it to everyone (~15k CCU).

If you have any questions, let me know.

Sebastien
hasbean
Posts: 43
Joined: 02 Sep 2009, 11:39

Postby hasbean » 10 Jul 2011, 06:05

That's pretty awesome! What's the client?
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Postby Fraggle » 10 Jul 2011, 06:08

We are using Flash.
We also have an iPhone client, but we haven't updated it yet.

For the curious, the game is Urban Rivals (http://www.urban-rivals.com).
tchen
Posts: 191
Joined: 11 Dec 2010, 14:14

Postby tchen » 12 Jul 2011, 19:31

I'm not sure how much work it'd involve for you, but you could always use an escalation pattern when doing the login.

At least for us, we only do the bare minimum token-based auth before returning right away. User data, role promotion, initial room joins, etc. are all done on a deferred basis, either on a delayed scheduler queue or invoked by the client as a second stage of the login.

It requires a few more custom extension calls of course. I haven't played with the buddy list system to know whether something similar can be done there.
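
A minimal sketch of that two-stage login in plain Java, with no SmartFox classes involved. Every name here (TwoStageLogin, isTokenValid, profileOf) is hypothetical, the token check is a placeholder, and a private executor stands in for the server's task scheduler; treat it as an illustration of the pattern, not as how our extension is actually written.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical two-stage login: cheap check now, slow work later. */
class TwoStageLogin {
    // Stand-in for the server's task scheduler (assumption: 4 threads).
    private final ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(4);
    private final Map<String, String> profiles = new ConcurrentHashMap<>();
    // Stands in for the slow HTTP call that fetches user data.
    private final Function<String, String> slowProfileLoader;

    TwoStageLogin(Function<String, String> slowProfileLoader) {
        this.slowProfileLoader = slowProfileLoader;
    }

    /** Stage 1: runs on the login thread and returns without waiting for the slow call. */
    boolean login(String user, String token) {
        if (!isTokenValid(user, token)) {
            return false;
        }
        // Stage 2: deferred, so the login thread is released immediately.
        scheduler.schedule(
                () -> { profiles.put(user, slowProfileLoader.apply(user)); },
                50, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Placeholder check; a real one would verify a signed token locally. */
    private boolean isTokenValid(String user, String token) {
        return token != null && token.equals("token-for-" + user);
    }

    /** Returns null until the deferred load has completed. */
    String profileOf(String user) {
        return profiles.get(user);
    }

    /** Lets already scheduled loads finish, then stops the scheduler. */
    void shutdown() {
        scheduler.shutdown();
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The client has to cope with the profile not being there yet right after login, which is where the extra custom extension calls come in.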
rav
Posts: 82
Joined: 06 Dec 2010, 13:14

Re: Follow up :) - Would love some insights from GotoAndPlay

Postby rav » 13 Jul 2011, 05:41

Fraggle wrote: We have now reached more than 4500 CCU as we are allowing a bigger share of our userbase to use the new version.


Do you use a single server, or some clustering solution?
Lapo
Site Admin
Posts: 23008
Joined: 21 Mar 2005, 09:50
Location: Italy

Postby Lapo » 13 Jul 2011, 06:21

Thanks for reporting :)

Fraggle wrote: I increased the number of threads in my extension to 16 to get more parallelism, but I'm not happy with this solution. I think using too many threads is not really optimal and causes performance issues of its own (if the hardware does not have enough cores).


16 is reasonable and you might need even more. HTTP calls are really slow and require lots of threads. If you have high traffic you might need to push the thread count into the hundreds, depending on how heavily these HTTP services are used.

If you have a decent server (4x CPU or more) it shouldn't be a problem. One downside is that threads eat memory so watch your settings.
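
As a rough way to estimate what that means in practice, assuming the calls are blocking: the average number of threads tied up equals calls per second multiplied by seconds per call (Little's law). Here is a small sketch with hypothetical names and made-up numbers, not measurements from any real server:

```java
/** Rough pool sizing for blocking calls; all inputs are illustrative assumptions. */
class PoolSizing {
    /**
     * Little's law: threads busy on average = calls per second * seconds per call.
     * A headroom factor above 1.0 leaves slack for bursts.
     */
    static int threadsNeeded(double callsPerSecond, double secondsPerCall, double headroom) {
        return (int) Math.ceil(callsPerSecond * secondsPerCall * headroom);
    }

    /** Approximate stack space reserved by the pool for a given per-thread stack size (-Xss). */
    static long stackMemoryKb(int threads, int stackSizeKb) {
        return (long) threads * stackSizeKb;
    }
}
```

For example, 40 blocking calls per second at 0.5 s each with 2x headroom gives threadsNeeded(40, 0.5, 2.0) = 40 threads; with a 512 KB stack size that reserves about 20 MB of stack space.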

Cheers
Lapo
--
gotoAndPlay()
...addicted to flash games
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

A lot more questions now

Postby Fraggle » 13 Jul 2011, 13:56

Thanks, Lapo.

After spending 2 weeks running SmartFox and tweaking here and there, I have tons of questions and a few issues.

First of all, I am still getting those huge CPU spikes:
https://img.skitch.com/20110713-frdq33t ... rqdf78.jpg
(Every time it happens, there is a huge difference between the session count and the logged-in user count. Why? Shouldn't the numbers be the other way around? I don't understand how I can have more logged-in users than sessions.)

The system queues status panel shows almost no messages waiting in any queue, as you can see here: https://img.skitch.com/20110713-xkbwr56 ... ppshqr.jpg

I'll increase the number of threads if needed, but I need more information. In the SmartFox admin panel I can see:
• System Controller thread pool size
• Extensions Controller thread pool size
• Task scheduler thread pool size

Apart from the last one, which one does what, exactly?

I mean, what exactly happens when a user logs in or sends an extension command?
And what about the buddy list?

Also, I use @Instantiation(InstantiationMode.SINGLE_INSTANCE) for every handler. Is that a bad idea?

At the moment, my HTTP calls are in the USER_LOGIN handler, but I could split them in two and move the longest one into the buddy list loadBuddyList object. Would that help?

Also, I have run into several strange issues with SmartFox, such as:

• At some point, for some users, it seems that the SmartFox BitSwarm client (Flash) receives "LOGIN" with the data, but the ON_LOGIN event does not fire (I experienced it myself with SmartFox client debug turned on). After restarting SmartFox, the problem went away.

• It is really annoying that part of the client configuration can be done in code and part using an external file. My setup is quite complicated: a CDN, multiple websites using the same Flash client, a production environment and multiple dev environments. The external file doesn't really work for me, and I need to change the httpPort in code. We usually pass the config params via flashvars through PHP.

• If I use an external config file for the client, it ignores the host/port args of my connect() call (in case I want to override them).

• Sometimes I get messages like this in the log:
13 Jul 2011 16:56:00,903 WARN [pool-1-thread-5] entities.managers.SFSExtensionManager -
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Exception: java.lang.NullPointerException
Message: *** Null ***
Description: Error during event handling: java.lang.NullPointerException, Listener: { Ext: urbanRivalsChat, Type: JAVA, Lev: ZONE, { Zone: urbanRivalsChat }, {} }
+--- --- ---+
Stack Trace:
+--- --- ---+
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

or like this:
13 Jul 2011 16:56:15,967 WARN [com.smartfoxserver.v2.controllers.SystemController-1] v2.controllers.SystemController -
java.lang.NullPointerException


Note that there is no stack trace. I cannot log at DEBUG level: there is too much traffic, and the issues happen mostly under heavy traffic.

If you can help me out with my issues that would be fantastic :)

I can send you the source code (it's quite small) if you want.

At the moment, I cannot roll the new version out to everyone, mostly because of the CPU behavior. I'm sure I made a few design mistakes, so I can't wait to hear from the masters :)

Thanks,

Seb
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Re: Follow up :) - Would love some insights from GotoAndPlay

Postby Fraggle » 13 Jul 2011, 13:57

rav wrote:
    Fraggle wrote: We have now reached more than 4500 CCU as we are allowing a bigger share of our userbase to use the new version.

    Do you use a single server, or some clustering solution?


A single server. I have no plans to cluster.
I believe my game can run 40k players on one server, once I get good at SmartFox.
rav
Posts: 82
Joined: 06 Dec 2010, 13:14

Re: Follow up :) - Would love some insights from GotoAndPlay

Postby rav » 14 Jul 2011, 07:12

Fraggle wrote:
    rav wrote:
        Fraggle wrote: We have now reached more than 4500 CCU as we are allowing a bigger share of our userbase to use the new version.

        Do you use a single server, or some clustering solution?

    A single server. I have no plans to cluster.
    I believe my game can run 40k players on one server, once I get good at SmartFox.


IMHO that's rather optimistic, but it would be great if it were so.
Approximately how many extension requests does one user generate per second in your game?
Fraggle
Posts: 62
Joined: 02 Apr 2010, 06:41
Location: Paris, France

Re: Follow up :) - Would love some insights from GotoAndPlay

Postby Fraggle » 14 Jul 2011, 08:16

rav wrote:
    Fraggle wrote:
        rav wrote:
            Fraggle wrote: We have now reached more than 4500 CCU as we are allowing a bigger share of our userbase to use the new version.

            Do you use a single server, or some clustering solution?

        A single server. I have no plans to cluster.
        I believe my game can run 40k players on one server, once I get good at SmartFox.

    IMHO that's rather optimistic, but it would be great if it were so.
    Approximately how many extension requests does one user generate per second in your game?


Not many, more like 1 every 10 seconds. It's turn-based.
