C++ HRC Not Working

dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

C++ HRC Not Working

Postby dhuang11 » 31 Jul 2015, 07:46

Hi

I'm trying to get C++ HRC working but am not having any luck. This is how we are testing:
1. Use SmartFox::killConnection() to simulate a connection going away - this is tied to a button inside our game
2. When we trigger a killConnection(), we see that the CONNECTION_RETRY event is fired in the client and also the equivalent in the server
3. CONNECTION_RESUME event is never fired.
4. We've tested using both the newest 1.6.0 API as well as the previous 1.1.6 API - both have the same problem and behave identically in these tests and trial/error modifications
5. The zone configuration on the server has reconnection seconds set to 15

Because we have the C++ source code to the SFS2X Client API, we walked through in our debugger and tried to find the cause of the issue. We see that the client is aware that it should fire a CONNECTION_RETRY event and goes into the following call sequence:
1. SmartFox::OnSocketReconnectionTry() - this fires the SFSEvent::CONNECTION_RETRY event back to our client and we do get this successfully
2. BitSwarmClient::RetryConnection() is called which ends up calling BitSwarmClient::OnRetryConnectionEvent() using a retryTimer
3. BitSwarmClient::OnRetryConnectionEvent() calls socket->Connect(address, lastTcpPort)
4. This goes into TCPSocketLayer::Connect() which fails at the following code:

Code: Select all

   if (State() != States_Disconnected)
   {
      boost::shared_ptr<string> message (new string("Calling connect when the socket is not disconnected"));
      LogWarn(message);   
      return;
   }

It returns early because the TCPSocketLayer state is not Disconnected.

To investigate further, we changed BitSwarmClient::OnRetryConnectionEvent() to call socket->Disconnect() before socket->Connect(address, lastTcpPort). With this change we get a physical socket reconnection, and sometimes the resume works fully. Other times it crashes and, more rarely, it attempts to reconnect without crashing but never reaches CONNECTION_RESUME. Our new code for BitSwarmClient::OnRetryConnectionEvent() looks like this:

Code: Select all

void BitSwarmClient::OnRetryConnectionEvent(const boost::system::error_code& code)
{
   if (code == boost::asio::error::operation_aborted)
   {
      // Timer has been stopped
      // Nothing to do
      return;
   }
   
   if (socket->IsConnected())
   {
      socket->Disconnect();
   }
   boost::shared_ptr<IPAddress> address (new IPAddress(IPAddress::IPADDRESSTYPE_IPV4, *lastIpAddress));
   socket->Connect(address, lastTcpPort);
}
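
The guard and the workaround above can be modelled in isolation. This is only a minimal sketch: SketchSocket, RetryWithoutReset and RetryWithReset are hypothetical names that are not part of the SFS2X API, and it assumes the socket layer still reports itself as connected when the retry timer fires.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the socket layer; not SFS2X code.
enum class SocketState { Disconnected, Connected };

class SketchSocket {
public:
    // Same shape as the quoted guard: refuse to connect unless the layer
    // believes it is fully disconnected.
    bool Connect() {
        if (state_ != SocketState::Disconnected) {
            warnings_.push_back("Calling connect when the socket is not disconnected");
            return false;
        }
        state_ = SocketState::Connected;
        return true;
    }

    void Disconnect() { state_ = SocketState::Disconnected; }
    bool IsConnected() const { return state_ == SocketState::Connected; }
    const std::vector<std::string>& Warnings() const { return warnings_; }

private:
    SocketState state_ = SocketState::Disconnected;
    std::vector<std::string> warnings_;
};

// Retry path as originally observed: goes straight to Connect().
inline bool RetryWithoutReset(SketchSocket& s) { return s.Connect(); }

// Retry path with the workaround: reset the layer's state first.
inline bool RetryWithReset(SketchSocket& s) {
    if (s.IsConnected()) {
        s.Disconnect();
    }
    return s.Connect();
}
```

In this model, calling RetryWithoutReset on a socket that is still marked connected only logs the warning, while RetryWithReset gets past the guard. It says nothing about the crash, which depends on the threading inside the real client.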


With this change, the connection sometimes resumes normally. Other times it crashes with a SIGABRT at the line boost::lock_guard<boost::recursive_mutex> lock(mtxDisconnection); in the following code block:

Code: Select all

void TCPClient::OnBoostAsioDataReceived(const boost::system::error_code& error, long int length)
{
   boost::shared_ptr<vector<unsigned char> > data;
   if (length > 0)
   {
      data = boost::shared_ptr<vector<unsigned char> >(new vector<unsigned char>());
      data->assign((unsigned char*)(boostTcpInputBuffer.data()), (unsigned char*)(boostTcpInputBuffer.data() + length));
   }
   else
   {
      data = boost::shared_ptr<vector<unsigned char> >(new vector<unsigned char>());
   }

   // Notify received data
   // Note that data length could be 0 when socket closure has been detected
   
   boost::lock_guard<boost::recursive_mutex> lock(mtxDisconnection);

   if (callbackTCPDataRead != NULL)
   {
      callbackTCPDataRead->Invoke(data);
   }

   boost::lock_guard<boost::recursive_mutex> unlock(mtxDisconnection);

   // Decrease counter of asynchronous read that are in progress
   if (counterAsyncReadOperationsInProgress > 0)
   {
      #if defined( WIN32 ) || defined( _WIN32 ) || defined( __WIN32__ ) || defined( __CYGWIN__ )
         BOOST_INTERLOCKED_DECREMENT(&counterAsyncReadOperationsInProgress);
      #else
         counterAsyncReadOperationsInProgress--;
      #endif
   }
}


Can someone help us fix this and get HRC working on C++? It looks like multiple threads are now contending for the same lock, and that is causing the failure. If we can get past that, it looks like forcing a socket disconnect before trying to reconnect will get HRC working for the C++ SFS2X API.
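
One detail in the quoted callback is worth isolating: a lock_guard variable named unlock does not release anything. It acquires the recursive mutex a second time, and both guards release only when the function returns. The sketch below illustrates that with the standard library's std::lock_guard and a hypothetical CountingMutex that merely records its ownership depth; it is not the SFS2X code, and it is not offered as an explanation of the SIGABRT.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for a recursive mutex that records how many
// times it is currently held. Single-threaded illustration only.
struct CountingMutex {
    int depth = 0;
    void lock()   { ++depth; }
    void unlock() { assert(depth > 0); --depth; }
};

// Pattern from the quoted code: the guard named `unlock` is a second
// acquisition, so the mutex is held twice until the function returns.
inline int DepthAfterSecondGuard(CountingMutex& m) {
    std::lock_guard<CountingMutex> lock(m);
    std::lock_guard<CountingMutex> unlock(m);
    return m.depth;  // value is read before either guard is destroyed
}

// Scoped alternative: braces limit how long the guard lives, so the
// mutex is released before the rest of the function runs.
inline int DepthAfterScopedBlock(CountingMutex& m) {
    {
        std::lock_guard<CountingMutex> lock(m);
        // ... invoke the data-read callback here ...
    }
    return m.depth;  // guard already destroyed at this point
}
```

If the intent is to hold the mutex only around the callback, an inner scope as in DepthAfterScopedBlock is one way to express it. Whether that is the right scope for the real client is a design decision for the API maintainers.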

Thanks!
Lapo
Site Admin
Posts: 19988
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: C++ HRC Not Working

Postby Lapo » 31 Jul 2015, 07:59

Thanks for reporting, I have added a ticket in our bug tracker.
Lapo
--
gotoAndPlay()
...addicted to flash games
dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

Re: C++ HRC Not Working

Postby dhuang11 » 03 Aug 2015, 06:42

Ok we have a working fix (I think)

On top of calling a Disconnect() as mentioned in the previous post, we also changed the following:

Code: Select all

void TCPClient::Shutdown()
{
   boost::lock_guard<boost::recursive_mutex> lock(mtxDisconnection);
   // Cancel all asynchronous operations associated with the socket
   // boostTcpSocket.shutdown(boost::asio::ip::tcp::socket::shutdown_both);
   boostTcpSocket.close();
   boostIoService.stop();
   boost::lock_guard<boost::recursive_mutex> unlock(mtxDisconnection);
}


This actually holds mtxDisconnection while TCPClient shuts down, which is what the mutex's name suggests it was intended for.

We also changed TCPSocketLayer like so:

Code: Select all

void TCPSocketLayer::Disconnect()
{
   if (State() != States_Connected)
   {
      boost::shared_ptr<string> message (new string("Calling disconnect when the socket is not connected"));
      LogWarn(message);   
      return;
   }

   isDisconnecting = true;
         
   try
   {
      connection->Dispose();
      connection->Shutdown();
   }
   catch (...)
   {
   }

   HandleDisconnection();
   isDisconnecting = false;
}


And finally changed this:

Code: Select all

void TCPClient::OnBoostAsioDataReceived(const boost::system::error_code& error, long int length)
{
   if (isDisposed)
   {
      return;
   }
   
   boost::shared_ptr<vector<unsigned char> > data;
   if (length > 0)
   {
      data = boost::shared_ptr<vector<unsigned char> >(new vector<unsigned char>());
      data->assign((unsigned char*)(boostTcpInputBuffer.data()), (unsigned char*)(boostTcpInputBuffer.data() + length));
   }
   else
   {
      data = boost::shared_ptr<vector<unsigned char> >(new vector<unsigned char>());
   }

   // Notify received data
   // Note that data length could be 0 when socket closure has been detected
   
   boost::lock_guard<boost::recursive_mutex> lock(mtxDisconnection);

   if (callbackTCPDataRead != NULL)
   {
      callbackTCPDataRead->Invoke(data);
   }

   boost::lock_guard<boost::recursive_mutex> unlock(mtxDisconnection);

   // Decrease counter of asynchronous read that are in progress
   if (counterAsyncReadOperationsInProgress > 0)
   {
      #if defined( WIN32 ) || defined( _WIN32 ) || defined( __WIN32__ ) || defined( __CYGWIN__ )
         BOOST_INTERLOCKED_DECREMENT(&counterAsyncReadOperationsInProgress);
      #else
         counterAsyncReadOperationsInProgress--;
      #endif
   }
}


To explain it all: we found race conditions between boost reading from the socket and the closure of the socket. Unregistering the boost asio callbacks does not remove callbacks that are already scheduled or in flight. We noticed that there is a "TCPClient::isDisposed" flag which a few of the callbacks check to decide whether to proceed, so we used it the same way in this callback. This works for us now. However, when there is an official fix, please let us know so we can roll back our changes and use the official SFS2X C++ API fixes instead.
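
The in-flight callback problem described above can be shown without a socket. This is a minimal sketch under stated assumptions: SketchLoop and SketchClient are hypothetical names, the loop is a plain single-threaded queue standing in for an asynchronous event loop, and the flag plays the role that isDisposed plays in the post.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <queue>
#include <utility>

// Hypothetical miniature event loop: handlers are queued first and
// executed later, so a handler can outlive the decision to shut down.
class SketchLoop {
public:
    void Post(std::function<void()> handler) { pending_.push(std::move(handler)); }

    void RunAll() {
        while (!pending_.empty()) {
            std::function<void()> handler = std::move(pending_.front());
            pending_.pop();
            handler();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

class SketchClient {
public:
    explicit SketchClient(SketchLoop& loop) : loop_(loop) {}

    // Queues a completion handler, as an asynchronous read would.
    void StartRead() { loop_.Post([this] { OnDataReceived(); }); }

    // Marks the client as disposed; already queued handlers stay queued.
    void Dispose() { disposed_ = true; }

    int Delivered() const { return delivered_; }

private:
    void OnDataReceived() {
        if (disposed_) {
            return;  // handler was in flight when Dispose() ran: ignore it
        }
        ++delivered_;
    }

    SketchLoop& loop_;
    std::atomic<bool> disposed_{false};
    int delivered_ = 0;
};
```

A read queued before Dispose() still runs afterwards, but the flag check turns it into a no-op. The flag only helps while the client object is still alive when the handler runs; it does not protect against the object being destroyed first.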

With these changes we can get a game to resume if we do a manual killConnection() test and the client can resume using the same physical network connection/connection details/IP address. However, our original intention was to build a mechanism that could deal with completely different network paths and I think we're very close but not quite there. Think of what happens when flipping between Wifi and 3G/4G on a mobile handset. So our sequence and what we're seeing as a response looks like this:
1. We determine that the mobile device's network connectivity has changed, for example wifi connectivity is lost or not responsive, and the mobile device has reverted to 3G/4G connection
2. When we get this event, we fire a SmartFox::killConnection() to trigger the events needed to do a connection retry and resume to flip over to new network path (note if we don't do this, the SFS2X API doesn't register a disconnection, but it completely stops responding)
3. We are able to receive the CONNECTION_RETRY event, and we can see that the connection is disconnected and reconnects successfully.
4. We are also able to see that a Handshake Request is sent to the SFS2X server and that it contains parameters/flags indicating that this is a reconnection handshake
5. We are also able to see that the connection is live - we've previously turned on PING_PONG requests, and we can see those are still coming through on the new connection - this verifies that we are connected and connectivity is good.
6. The server never responds to the Handshake request. The reconnection flow therefore stops at this point: the client has a valid, working socket connection and PING_PONG requests are coming back, but the Session is not resumed because the handshake response is never received.

Is the SFS2X server limiting handshakes/resuming sessions to originate from the same IP address that was previously connected to the session? Otherwise, it's not clear why we are not receiving responses to our Handshake requests at this point. Can someone help answer why this might be the case?

In the meantime, we're going to try the following:
1. Test whether sending a handshake request without the "reconnection flag" and previous session id gets a response during a reconnect over a different physical connection. If this works, it confirms our assumption above that the SFS2X server rejects reconnection handshakes from connections that are physically different from the original one.
2. We're also going to try to resume the connection by manually assigning the previous session id to the new connection on the C++ API side. This is probably a bad idea, but we're just going to try it to see if it can help reveal what else needs to be modified to get our use case working.
dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

Re: C++ HRC Not Working

Postby dhuang11 » 03 Aug 2015, 07:52

Ok so in response to previous post and what we've tried, if we don't send the previous sessionToken in the handshake during a reconnection using a secondary/different line, then we are able to get a response to the handshake request we send.

So this confirms that SmartFox server keeps sessions keyed off the client's physical connection. I'll guess that it uses the client's IP address as some sort of key. If we reconnect from a different connection/different IP address during a connection retry/resume, the session isn't able to transfer over to the new connection and thus doesn't respond to a handshake request. If the handshake request does not contain a session token, everything works fine. Or if the handshake request contains a session token and the connection is the same as the one previously used to connect to SFS2X server, then everything resumes as normal now (after the changes we made to the SFS2X Client C++ API).

With the current version of SmartFox, we cannot support a function we need - that is resuming game play when a user is switched between wifi and 3G on their mobile handset. It looks like what we need is for a reconnection handshake that sends a sessionToken to update the connection (IP Address?) details for the related session on the server. Can we get a patch on the server to fix this? We're the owners of 2 SFS2X Unlimited CCU licenses and are in the process of procuring license #3,4 and 5 this month before our new product goes live.
Lapo
Site Admin
Posts: 19988
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: C++ HRC Not Working

Postby Lapo » 03 Aug 2015, 08:19

"Is the SFS2X server limiting handshakes/resuming sessions to originate from the same IP address that was previously connected to the session? Otherwise, it's not clear why we are not receiving responses to our Handshake requests at this point. Can someone help answer why this might be the case?"

Actually no. There is no formal check to verify that a reconnection originates from the same IP address as the previous connection.

The problem with switching from WiFi to 3G "transparently" is that it may depend on multiple variables, such as the OS, the application runtime etc... Also the state of the TCP connection is not always deterministic. Timeouts and incomplete disconnections can create different behaviors.

"if we don't send the previous sessionToken in the handshake during a reconnection using a secondary/different line, then we are able to get a response to the handshake request we send."

The previous sessionToken is necessary to reconnect as the previous Session. If you don't send it, the server treats it as a new connection and you lose the state associated with the previous user.

You wrote that after the killConnection() test you never get the CONNECTION_RESUME event on the client. Are you checking the server side logs for possible errors?
Thanks

p.s. = we're going to check this as soon as possible, but it will be next week (after August 9) as our C++ developer is on vacation.
dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

Re: C++ HRC Not Working

Postby dhuang11 » 03 Aug 2015, 09:55

Hi!

Thanks for the follow-up.

We can get CONNECTION_RESUME to fire now after making our changes to the C++ API (detailed in the previous post). Basically, the C++ client needed to disconnect first and clean up the previous underlying socket before retrying the connection.

What led me to believe that the server could be keying the session off the IP address is that if we do a killConnection() on the same connection, everything resumes fine (this is after the changes we made to the C++ client). When we detect a Wifi to 3G event, we force a killConnection(). We can see that the client fully reconnects but never gets a response when sending the Handshake to the server. However, if we do the exact same thing but remove the sessionToken from the Handshake request on reconnection, we get a Handshake response from the server.

So to summarize:
1. If retry/resume on same connection, retry connection succeeds with new C++ API client code changes we made
2. If retry/resume on different connection (3G instead of wifi), connection is established, but we don't get response to Handshake request
3. If retry/resume on different connection (3G instead of wifi), connection is established, we can get a Handshake response from the server if we don't send the reconnection sessionToken.

Because of the difference in behavior from #2 and #3, it seems the server is validating something with the sessionToken that's preventing it from sending a Handshake response. Note that scenario 1 now works fine.
Lapo
Site Admin
Posts: 19988
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: C++ HRC Not Working

Postby Lapo » 12 Aug 2015, 15:13

Hi,
we have fixed issues #1 and #2 and are still investigating #3.
If you want to download a patch, see the attached file. It contains only the changes, so you will need the original C++ API 1.6.0 files.

Let us know your feedback
Attachments
SFS2X_API_C++_patch.zip
(23.54 KiB) Downloaded 105 times
dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

Re: C++ HRC Not Working

Postby dhuang11 » 13 Aug 2015, 03:01

Thanks. We'll give the patch a try and report back.
dhuang11
Posts: 38
Joined: 02 May 2014, 08:21

Re: C++ HRC Not Working

Postby dhuang11 » 17 Aug 2015, 08:41

Patch looks good.

Still waiting for feedback on the 3rd part of this issue.

Thanks.
Lapo
Site Admin
Posts: 19988
Joined: 21 Mar 2005, 09:50
Location: Italy

Re: C++ HRC Not Working

Postby Lapo » 17 Aug 2015, 17:46

Thanks for your feedback.

We have investigated the 3rd part of the issue you have reported. Unfortunately this is not under our control and the specific case of network switching is unlikely to work with the reconnection system.

If you switch from wireless to mobile (or vice versa), killConnection() won't really kill the socket. You can verify this with netstat: you will find that the previous connection using the old wifi IP address is still open.

So the server has never detected a disconnection and when the client tries a reconnection the server will complain about it.

The problem here is that calling killConnection() will not do what we would expect. A similar case happens if you connect, turn off the network, call killConnection and turn on the network again. What happens is that from client side the connection is closed, but from server it is not.

The TCP protocol was created 40 years ago and at that time mobile devices didn't exist... also every device uses a different TCP implementation, so it's difficult to predict how each TCP stack behaves.

It would be best to detect the network switch at the OS level, and start a brand new connection and login. Using the "forceLogout" option in the Zone you can make sure the new login will force the previous session out.
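
The decision this implies on the client side can be sketched as a small pure function. Everything here is hypothetical and illustrative: NetworkPath, RecoveryAction and ChooseRecovery are not SFS2X types, and the application is assumed to obtain the before/after path information from whatever connectivity API its platform provides.

```cpp
#include <cassert>
#include <string>

// Hypothetical snapshot of the active network path, filled in by the
// application from its platform's connectivity notifications.
struct NetworkPath {
    std::string interfaceKind;  // e.g. "wifi" or "cellular"
    std::string localAddress;   // local IP address on that interface
};

enum class RecoveryAction {
    None,           // nothing to do
    ResumeSession,  // same path: let the reconnection system retry
    FreshLogin      // path changed: new connection and new login
};

// Chooses how to recover, following the advice above: a change of
// network path is handled with a fresh login rather than a resume.
inline RecoveryAction ChooseRecovery(const NetworkPath& before,
                                     const NetworkPath& after,
                                     bool connectionLost) {
    const bool pathChanged = before.interfaceKind != after.interfaceKind ||
                             before.localAddress != after.localAddress;
    if (pathChanged) {
        return RecoveryAction::FreshLogin;
    }
    if (connectionLost) {
        return RecoveryAction::ResumeSession;
    }
    return RecoveryAction::None;
}
```

With the Zone's "forceLogout" option enabled as described above, the FreshLogin branch would replace the stale session on the server, while ResumeSession covers the same-connection case that already works after the patch.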

cheers
