Hi everyone,
I'm thrilled to report that we have successfully launched our first real deployment of SmartFox (2X, to be precise, but it's our first time either way).
We have around 15k concurrent users at peak in our game, but we are rolling it out slowly (up to 2k users at the moment), and for now we use SFS only for player-to-player chat and guild chat.
However, even though the feature is simple, our extension does a few things, such as talking to our main system through HTTP API calls (rather than through direct DB connections) for player login and buddy list management, so the results are not totally irrelevant.
First, I must say I'm really happy with the overall performance. As you can see in the linked screenshot, CPU usage stays quite low on an "average" server.
I can well believe that, with correctly written extension code, SmartFox will not be the bottleneck in reaching several thousand CCU.
https://img.skitch.com/20110706-qcmn2pg ... rcmcdc.jpg
However, I can see that CPU usage is really erratic, going up (20%) and down (5%) all the time (you can see it in the screenshot as well). It's so regular that I believe it must be something I forgot to change in the configuration. Maybe the GC is running too often?
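One way to test the GC hypothesis is to sample the JVM's own collector counters and check whether collections line up with the CPU peaks. The following is a minimal sketch using the standard java.lang.management API. The class name GcSampler is made up for illustration, and the code would have to run inside the same JVM as the server (for example from the extension) for the numbers to mean anything.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcSampler {
    /** Total number of collections so far, summed over all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate cumulative time spent collecting, in milliseconds. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        long prevCount = totalGcCount();
        long prevTime = totalGcTimeMillis();
        // Print the per-second deltas a few times; compare them with the CPU graph.
        for (int i = 0; i < 5; i++) {
            Thread.sleep(1000);
            long count = totalGcCount();
            long time = totalGcTimeMillis();
            System.out.printf("collections: +%d, gc time: +%d ms%n",
                    count - prevCount, time - prevTime);
            prevCount = count;
            prevTime = time;
        }
    }
}
```

If the deltas stay near zero while the CPU swings between 5% and 20%, the GC is probably not the cause.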
We also saturated the CPU at 100% twice (with a high number of CCU and slow HTTP API response times), making SFS almost unresponsive. I don't know the exact cause yet.
I'd love some advice, so feel free to comment.
Sebastien
Live deployment of SFS2X (2000+ CCU) - Report and Questions.
Follow up :) - Would love some insights from GotoAndPlay
Hi again,
We have now reached more than 4500 CCU, as we are letting a bigger share of our user base use the new version.
I'm happy to say that I figured out the CPU saturation issue:
I was creating a lot of rooms in the default group, so too many USER_COUNT_CHANGE, ROOM_CREATED, ROOM_JOIN... events were being sent.
In my use case, each room can be in its own group, so I did that; additionally, I used the API call parameters to disable some server and client events.
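To see why moving rooms out of the default group helps, here is a rough back-of-the-envelope model in plain Java. It does not use the SmartFox API. The class name and the event rate are purely illustrative, and it assumes that every connected client is subscribed to the default group while nobody subscribes to the per-room groups.

```java
final class RoomEventFanOut {
    /**
     * Messages the server pushes when roomEvents room-level events
     * (room created, user count changed, ...) occur in a group that
     * subscribers clients are listening to.
     */
    static long pushedMessages(long roomEvents, long subscribers) {
        return roomEvents * subscribers;
    }

    public static void main(String[] args) {
        long ccu = 4500;            // connected users (figure quoted in this thread)
        long eventsPerSecond = 50;  // assumed rate of room-list events, not measured

        // All rooms in the default group: every event is broadcast to every client.
        System.out.println(pushedMessages(eventsPerSecond, ccu));
        // One group per room with no subscribers: nothing is broadcast.
        System.out.println(pushedMessages(eventsPerSecond, 0));
    }
}
```

The cost grows with the product of event rate and subscriber count, which is why a busy default group can saturate the CPU long before the chat traffic itself does.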
However, I am still bothered by the fact that USER_LOGIN and the buddy list storage system are not async.
The consequence for the user is that the (custom) login takes a long time, because it waits for SmartFox to have free time to process the event.
Once the user is logged in, everything is fast and responsive (i.e., private and buddy messages are relayed instantly). My dropped packet rate is below 1%.
I can and will reduce the time my HTTP API call takes, but still.
I increased the number of threads in my extension to 16 to get more parallelism, but I'm not happy with this solution. I think using too many threads is not really optimal and has performance issues of its own (if the hardware does not have enough cores).
Anyway, I'm still very happy with the early results. I'm just a bit anxious about releasing it to everyone (~15k CCU).
If you have any questions, let me know.
Sebastien
We are using Flash.
We also have an iPhone client, but we haven't updated it yet.
For the curious, the game is Urban Rivals (http://www.urban-rivals.com).
I'm not sure how much work it'd involve for you, but you could always use an escalation pattern when doing the login.
At least for us, we only do the bare minimum token-based auth before returning right away. User data, role promotion, initial room joins, etc. are all done on a deferred basis, either on a delayed scheduler queue or invoked by the client as a second stage of the login.
It requires a few more custom extension calls of course. I haven't played with the buddy list system to know whether something similar can be done there.
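The escalation pattern described above can be sketched in plain Java roughly as follows. This is only an illustration, not SmartFox code: TwoStageLogin, isTokenValid and loadProfile are hypothetical names, the token check is a stand-in, and loadProfile represents whatever slow call (such as the HTTP API) would otherwise block the login.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class TwoStageLogin {
    private final ScheduledExecutorService deferred = Executors.newScheduledThreadPool(4);
    private final Map<String, String> profiles = new ConcurrentHashMap<>();

    /** Stage 1: cheap token check only, then return right away. */
    boolean login(String userId, String token) {
        if (!isTokenValid(userId, token)) {
            return false;
        }
        // Stage 2: the slow work runs later, off the login path.
        deferred.schedule(() -> {
            profiles.put(userId, loadProfile(userId));
        }, 100, TimeUnit.MILLISECONDS);
        return true;
    }

    /** Returns null until the deferred stage has completed for this user. */
    String profileOf(String userId) {
        return profiles.get(userId);
    }

    // Hypothetical stand-in for a real token check.
    private boolean isTokenValid(String userId, String token) {
        return token != null && token.equals("token-" + userId);
    }

    // Hypothetical stand-in for the slow call to the main system.
    private String loadProfile(String userId) {
        return "profile-of-" + userId;
    }

    /** Stops accepting work and waits for already scheduled tasks to finish. */
    void shutdown() {
        deferred.shutdown();
        try {
            deferred.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        TwoStageLogin service = new TwoStageLogin();
        System.out.println(service.login("seb", "token-seb")); // returns before the profile is loaded
        service.shutdown();
        System.out.println(service.profileOf("seb"));
    }
}
```

The trade-off is that the client has to cope with a short window in which the user is logged in but the deferred data has not arrived yet.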
Re: Follow up :) - Would love some insights from GotoAndPlay
Fraggle wrote: We have now reached more than 4500 CCU, as we are letting a bigger share of our user base use the new version.
Do you use a single server, or some clustering solution?
Thanks for reporting
16 is reasonable, and you might need even more. HTTP calls are really slow and require lots of threads. If you have high traffic, you might need to push the thread count into the hundreds, depending on how heavily these HTTP services are used.
If you have a decent server (4x CPU or more) it shouldn't be a problem. One downside is that threads eat memory so watch your settings.
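A rough way to estimate how many threads blocking HTTP calls need is Little's law: the average number of busy threads equals the request rate multiplied by the time each request blocks. The sketch below works through that arithmetic. The class name and all figures are assumptions for illustration, not measurements from this thread, and the per-thread stack size depends on the JVM and its -Xss setting.

```java
final class PoolSizing {
    /** Little's law: busy threads = arrival rate x time each call blocks. */
    static int threadsNeeded(double requestsPerSecond, double blockingSeconds) {
        return (int) Math.ceil(requestsPerSecond * blockingSeconds);
    }

    /** Rough stack memory reserved by the pool, in megabytes. */
    static long stackMegabytes(int threads, long stackKilobytesPerThread) {
        return threads * stackKilobytesPerThread / 1024;
    }

    public static void main(String[] args) {
        // Assumed load: 100 logins per second, each blocked 0.5 s on an HTTP call.
        int threads = threadsNeeded(100, 0.5);
        // Assumed stack size of 1024 KB per thread.
        long megabytes = stackMegabytes(threads, 1024);
        System.out.println(threads + " threads, about " + megabytes + " MB of stacks");
    }
}
```

Because blocked threads spend their time waiting rather than computing, the count can safely exceed the number of cores; the memory footprint is usually the limit to watch.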
Cheers
A lot more questions now
Thanks lapo.
After spending two weeks running SmartFox and tweaking here and there, I have tons of questions and a few issues.
First of all, I am still getting those huge CPU spikes:
https://img.skitch.com/20110713-frdq33t ... rqdf78.jpg
(Every time it happens, there is a huge difference between the number of sessions and the number of logged-in users. Why? Shouldn't the numbers be the other way around? I don't understand how I can have more logged-in users than sessions.)
The system queues status panel shows almost no messages waiting in any queue, as you can see here: https://img.skitch.com/20110713-xkbwr56 ... ppshqr.jpg
I'll increase the number of threads (if needed), but I need more information. In the SmartFox admin panel I can see:
System Controller thread pool size
Extensions Controller thread pool size
Task scheduler thread pool size
Apart from the last one, which one does what exactly?
I mean, when a user logs in or sends an extension command, what exactly happens?
And what about the buddy list?
Also, I use @Instantiation(InstantiationMode.SINGLE_INSTANCE) for every handler. Maybe that's a bad idea?
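Whether a single instance is a problem depends mostly on whether the handler keeps mutable state in its fields. Assuming single-instance handlers can be invoked from several pool threads at once (an assumption about the dispatch model, not something confirmed in this thread), the plain-Java sketch below shows the safe shape: per-request data in local variables, shared data in thread-safe types. SharedHandler is a hypothetical class, not a SmartFox handler.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class SharedHandler {
    // One instance serves every request, so fields are shared by all pool threads.
    private final AtomicLong handled = new AtomicLong();

    void handle(String command) {
        // Per-request data lives in locals; only the atomic counter is shared.
        String normalized = command.trim().toLowerCase();
        if (!normalized.isEmpty()) {
            handled.incrementAndGet();
        }
    }

    long handledCount() {
        return handled.get();
    }

    /** Calls one shared handler from several threads and returns the final count. */
    static long runConcurrently(int threads, int callsPerThread) {
        SharedHandler handler = new SharedHandler();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < callsPerThread; i++) {
                    handler.handle("chat");
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handler.handledCount();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8, 10_000));
    }
}
```

A handler written this way can be shared freely; one that stores the current user or request in an instance field cannot.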
At the moment, my HTTP calls are in the USER_LOGIN handler, but I can split them in two and move the longest one into the buddy list's loadBuddyList call. Would that be useful?
Also, I have run into several strange issues with SmartFox, such as:
• At some point, for some users, the SmartFox BitSwarm client (Flash) seems to receive "LOGIN" with the data, but the ON_LOGIN event does not fire (I experienced it myself with SmartFox client debug turned on). After restarting SmartFox, the problem went away.
• It is really annoying that part of the client configuration is done in code and part in an external file. My setup is quite complicated: CDN + multiple websites using the same Flash client + a prod environment and multiple dev environments. This doesn't really work for me, and I need to change the httpPort in code. We usually pass config params as flashvars through PHP.
• If I use an external config file for the client, it ignores the host/port args of my connect() call (in case I want to override them).
• Sometimes I get messages like this in the log:
13 Jul 2011 16:56:00,903 WARN [pool-1-thread-5] entities.managers.SFSExtensionManager -
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Exception: java.lang.NullPointerException
Message: *** Null ***
Description: Error during event handling: java.lang.NullPointerException, Listener: { Ext: urbanRivalsChat, Type: JAVA, Lev: ZONE, { Zone: urbanRivalsChat }, {} }
+--- --- ---+
Stack Trace:
+--- --- ---+
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
or like this:
13 Jul 2011 16:56:15,967 WARN [com.smartfoxserver.v2.controllers.SystemController-1] v2.controllers.SystemController -
java.lang.NullPointerException
Note that there is no stack trace. I cannot log at DEBUG level: there is too much traffic, and the issues mostly happen under heavy traffic.
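One possible explanation for the empty stack traces, offered as a hypothesis to verify rather than a diagnosis: HotSpot JVMs can stop filling in stack traces for implicit exceptions such as NullPointerException once they are thrown repeatedly from the same hot code, and the -XX:-OmitStackTraceInFastThrow option turns that optimization off. Independently of the JVM flag, a handler can catch its own failures and log enough context to locate the null without DEBUG logging. The sketch below uses hypothetical names (GuardedHandler, runGuarded) and is not part of the SmartFox API.

```java
import java.util.function.Consumer;

final class GuardedHandler {
    /**
     * Runs the handler on the event. Returns null on success, or a one-line
     * diagnostic describing what failed and on which input.
     */
    static <T> String runGuarded(String handlerName, T event, Consumer<T> handler) {
        try {
            handler.accept(event);
            return null;
        } catch (RuntimeException e) {
            // The frame count shows whether the JVM actually recorded a stack trace.
            return handlerName + " failed on event [" + event + "]: "
                    + e.getClass().getName()
                    + " (stack frames: " + e.getStackTrace().length + ")";
        }
    }

    public static void main(String[] args) {
        // Succeeds: prints null.
        System.out.println(runGuarded("chat", "hello", s -> s.length()));
        // Fails with a NullPointerException: prints the diagnostic line.
        System.out.println(runGuarded("login", (String) null, s -> s.length()));
    }
}
```

Logging the handler name and the offending input at WARN level costs little even under heavy traffic, because it only happens on the failure path.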
If you can help me out with my issues, that would be fantastic.
I can send you the source code (it's quite small) if you want.
At the moment, I cannot roll the new version out to everyone, mostly because of the CPU behavior. I'm sure I made a few design mistakes, so I can't wait to hear from the masters.
Thanks,
Seb
Re: Follow up :) - Would love some insights from GotoAndPlay
rav wrote: Do you use a single server, or some clustering solution?
A single server. I have no plans to cluster.
I believe my game can run 40k players on one server, once I get good at SmartFox.
Re: Follow up :) - Would love some insights from GotoAndPlay
Fraggle wrote: A single server. I have no plans to cluster.
I believe my game can run 40k players on one server, once I get good at SmartFox.
IMHO that's rather optimistic, but it would be great if it were so.
Approximately how many extension requests does one user generate per second in your game?
Re: Follow up :) - Would love some insights from GotoAndPlay
rav wrote: Approximately how many extension requests does one user generate per second in your game?
Not many, probably one every 10 seconds. It's turn-based.