Delay seen between joinRoom and USER_JOIN_ROOM event

Post here your questions about Actionscript and Java server side extensions development.

Moderators: Lapo, Bax

winstonwaite
Posts: 11
Joined: 17 Aug 2016, 20:01

Delay seen between joinRoom and USER_JOIN_ROOM event

Postby winstonwaite » 17 Aug 2016, 20:11

We have a game that uses SF for matchmaking. Users connect and are kept in a lobby... and the server pulls people out of the queue and starts to put them into game rooms.

We create the room and then loop over the users to call joinRoom on the newly created room. When the server event USER_JOIN_ROOM fired for a particular user... we then send them a message telling them they are ready to move forward with other logic.

When we load the server up... I start to see a very large delay between adding a user to a room and seeing this event come back. With 200 simulated users on my laptop... I start to see a delay of 15 seconds for the event to come back. I did notice in another forum post that it was mentioned that you dont need to listen to the event if you are server side... that once you call the api to joinRoom... the user is already joined so dont need to handle. I tried sending the message to the user right after the joinRoom and all of a sudden I get errors like this:

Code: Select all

17 Aug 2016 | 12:53:31,111 | WARN  | SFSWorker:Ext:3 | controllers.v290.ExtensionReqController |     | com.smartfoxserver.v2.exceptions.SFSExtensionException: Extension Request refused. Sender is not a User: { Id: 163, Type: DEFAULT, Logged: No, IP: 10.1.1.1:58426 }
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController.processRequest(ExtensionReqController.java:93)
   com.smartfoxserver.v2.controllers.v290.ExtensionReqController$1.run(ExtensionReqController.java:68)
   java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   java.lang.Thread.run(Unknown Source)


The forum post that seemed to claim you dont need to wait for the event was here: viewtopic.php?t=15378

Assuming that I do need to wait for the event for the user to actually be officially joined to the room... what might be causing so much delay? The server is not pegged on CPU (10-20%?). Bandwidth isn't crazy... 3kbps on average? Sometimes spikes but never for more than a second. We also ran this on a full-blown server... it was showing similar behavior around 2000 users.
User avatar
Lapo
Site Admin
Posts: 22999
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Delay seen between joinRoom and USER_JOIN_ROOM event

Postby Lapo » 18 Aug 2016, 07:22

Hi,
the 15 seconds delay is certainly not expected, but I don't have enough info to guess what might be wrong.

In any case it is also correct that you don't need to wait the USER_JOIN_ROOM event on the server side, after calling the API.joinRoom(...) method. The USER_JOIN_ROOM event is useful when the Room join is initiated from client side.

Before I reply to the rest of your post, I need to understand something. By default the API.joinRoom() sends an event to the client notifying that they've a joined a certain Room, with the details of the Room etc...

Isn't this enough for the client as a signal to proceed with the game logic?

I tried sending the message to the user right after the joinRoom and all of a sudden I get errors like this:

The error you have reported has nothing to do with a custom Extension message sent from client to server. In fact this error states the opposite. It is complaining about a non logged in client trying to invoke Extension commands... and failing because users needs to be logged in before the can interact with custom server side code.

Hope it helps
Lapo
--
gotoAndPlay()
...addicted to flash games
winstonwaite
Posts: 11
Joined: 17 Aug 2016, 20:01

Re: Delay seen between joinRoom and USER_JOIN_ROOM event

Postby winstonwaite » 18 Aug 2016, 18:34

The error you have reported has nothing to do with a custom Extension message sent from client to server. In fact this error states the opposite. It is complaining about a non logged in client trying to invoke Extension commands... and failing because users needs to be logged in before the can interact with custom server side code.


When you say "logged in" what do you mean? In our implementation... users connect and are moved into a matchmaking room. Then we have an interval timer that gets groups of users from the room and moves them into new rooms for games.

Is api.joinRoom() truly synchronous? Why have a server-side event then? Maybe just to keep architecture consistent? I have a strong feeling it is not. My reasoning is as follows:

This game has been played by many people for around a year. So in general... it works and we don't see an issue where users are trying to send messages to a room they dont belong to yet (we don't see the error about sending to a RoomExtension vs. ZoneExtension.) When under load... I start seeing long delays between calling joinRoom() and getting the event back on the server. After I read the forum post about not needing to wait for the server event to come back... I moved the logic out of the event handler... and directly after api.joinRoom(). Only then do I start to see the error about not being able to call a RoomExtension for a room that I dont belong to.

Here is the flow that has been going for at least a year:

1) User connects and we move them to a lobby.
2) Timer fires and starts processing users in the lobby
3) For each group of users it wants to match up... it creates a room and then calls api.joinRoom() passing each user to it.
4) In the server side USER_JOIN_ROOM event handler... we send a message down to the user which says they are ready to go.
5) The client receives this message and then starts sending messages to the extension handling gameplay for the room

With this... everything works fine unless under load. Once the USER_JOIN_ROOM event starts to get delayed... it becomes very obvious that they have been sitting in a holding pattern of sorts to get into the game. In fact... we won't let them in a game that has started too long in the past.

So... when I saw the message implying that api.joinRoom() is actually synchronous... I tried the following:
1) User connects and we move them to a lobby.
2) Timer fires and starts processing users in the lobby
3) For each group of users it wants to match up... it creates a room and then calls api.joinRoom() passing each user to it.
4) Instead of waiting for the USER_JOIN_ROOM server side event... right after api.joinRoom() i go ahead and send the user a message saying they are in the room and ready to go
5) The user receives this message and then tries to send a message back to the server... presumably trying to call the room extension
6) All of a sudden... the error I posted about a user not being able to call an extension method shows up

I infer from this that calling api.joinRoom() has not completed its work... and that the user trying to send a message before the USER_JOIN_ROOM event has been fired server side is the cause the the warning about not being able to call a room extension because he isn't in the room yet. Otherwise... why would introducing a delay by waiting for the event suddenly allow the user to send messages with no warnings? Again... the warnings only showed up when i tried to send a message to the user before the server side event was fired.

Also... realize that even a perfect api and system can be messed up by people not using it correctly. Our main question is why does the server start to have such a delay between joinRoom() and the event being fired? Are extension calls and internal server events pulled from the same queue? Something seems very off for the server to take 15 seconds to fire the event... when CPU is not pegged, the client and server are on the same machine and bandwidth in and out of server from admin screen perspective is not crazy (5-20 KBps... spikes occasionally to 200KBps but for under a second). The queue screen also doesn't show anything crazy and there are no dropped messages.

It could be that we do have some extension code that is waiting for IO or something and is blocking worker threads from moving onto SmartFox internal tasks... and that would actually be the best case... because we can tune our own code. Just not really sure how I can make it visible which items are blocking the queue (if that is the issue).

There could be different choices about using the client-side ROOM_JOIN event instead of sending our own down to say they are in the room... but somewhere along the line there will be a server-side event that needs to be processed... and right now it appears that the eventing is getting slowed down and we aren't sure what metrics might help us see where the hold up is.
winstonwaite
Posts: 11
Joined: 17 Aug 2016, 20:01

Re: Delay seen between joinRoom and USER_JOIN_ROOM event

Postby winstonwaite » 18 Aug 2016, 20:16

Oops! I pasted the wrong exception in the original post! It was this:

17 Aug 2016 | 10:58:51,638 | WARN | SFSWorker:Ext:2 | controllers.v290.ExtensionReqController | | com.smartfoxserver.v2.exceptions.SFSExtensionException: User cannot invoke Room extension if he's not joined in that Room. ( User Name: b198133914ec4c299900fd99ed8bc3aa, Id: 85, Priv: 0, Sess: 10.1.20.90:55670 ) , [ SFSGame: br18545, Id: 40, Group: default, public: true, minPlayers: 1 ]


The other exception is new... and happened when I was playing with the code... not something that we actually run.
winstonwaite
Posts: 11
Joined: 17 Aug 2016, 20:01

Re: Delay seen between joinRoom and USER_JOIN_ROOM event

Postby winstonwaite » 19 Aug 2016, 04:51

So I tracked down the slowdown... the test clients falsely subscribe to USER_EXIT (or a similar event that we aren't using in our client... dont have access to code at the moment) and causes a flurry of messages. We do still have a volume of messages that need to be processed. Perhaps half of the throughput could be useful at higher user counts. Say 5k? What sort of message throughput is expected from the framework?

Also noticed the config setting that lets you batch a particularly common message that gets sent in a scattered fashion.
User avatar
Lapo
Site Admin
Posts: 22999
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: Delay seen between joinRoom and USER_JOIN_ROOM event

Postby Lapo » 19 Aug 2016, 07:44

Is api.joinRoom() truly synchronous? Why have a server-side event then? Maybe just to keep architecture consistent? I have a strong feeling it is not. My reasoning is as follows:

Yes it is synchronous. It runs in the same thread of your code.
Why is there an server side event? Because you may be interested in knowing when a client joins a certain Room, when operation is requested from client side. (i.e. client sends a JoinRoomRequest).

If you join users from server side code, listening for the USER_JOIN_ROOM is redundant and adds more work to the event systen. In other words when you call SFSApi.joinRoom(...) you should make sure to set the fireServerEvent parameter to false.

SIDE NOTE: all events are asynchronous in SmartFoxServer, both on client and server side. So it's perfectly normal that you see a small delay between calling SFSApi.joinRoom() and receiving the (redundant) USER_JOIN_ROOM event.

The fact that you experience long delays between events suggests that your event handling code is running slow, i.e. holding the event threads for long time and thus creating a busy queue. You haven't specified which SFS2X version you're using. Based on the version there are different ways to configure the thread pools to improve performance and throughput.
You can read all the details here:
http://docs2x.smartfoxserver.com/Advanc ... extensions

1) User connects and we move them to a lobby.
2) Timer fires and starts processing users in the lobby
3) For each group of users it wants to match up... it creates a room and then calls api.joinRoom() passing each user to it.
4) Instead of waiting for the USER_JOIN_ROOM server side event... right after api.joinRoom() i go ahead and send the user a message saying they are in the room and ready to go
5) The user receives this message and then tries to send a message back to the server... presumably trying to call the room extension
6) All of a sudden... the error I posted about a user not being able to call an extension method shows up

This is indicative of the fact that some users haven't completed the login phase. So the problem is way up in the logic, even before this whole set of steps.
It sounds like your Extension queue is getting pretty busy when you ramp up the CCU and this causes slow downs and queued up requests at the Extension level. My suggestion is to check the state of the server queues when you reach an high CCU count via the AdminTool.

You should see small spikes of queued messages. If you have a busy queue all the times we should investigate the current configuration and what's keeping the threads busy.

Are extension calls and internal server events pulled from the same queue?

No. Please read the article I've linked above.


The queue screen also doesn't show anything crazy and there are no dropped messages.

I'd like to see screenshots of the queue status when you're seeing those Extension call errors.

So I tracked down the slowdown... the test clients falsely subscribe to USER_EXIT (or a similar event that we aren't using in our client... dont have access to code at the moment) and causes a flurry of messages.

I don't understand how the client subscribing to an event that is not used has anything to do with the flurry of Extension call errors...

Thanks
Lapo

--

gotoAndPlay()

...addicted to flash games

Return to “Server Side Extension Development”

Who is online

Users browsing this forum: No registered users and 13 guests