First off, I wanted to say that my answer wasn't meant as a rant against you, just against the whole TCP vs UDP thing.
Sorry if it came across like that.
Secondly, I'm the worst person to say how to do an FPS specifically, as I haven't done one. My SFS games are almost without exception turn-based strategy/tactics games - if you don't count the "Unity Island Demo Chat MMO", that is.
That doesn't mean I don't have an idea of how to do them, as in my earlier years I spent 2 years of my life trying to make a flight simulator dogfight game using Torque and its internal networking. Can you say physics over network? FAIL.....
So I can translate some things from that experience to you, and hope some other people will chime in. Also this topic is rather non-Unity specific, so you might have luck in the regular parts of this forum too. No matter what kind of client you use (Flash, Unity, obj-c, ...) you will have to design the networking part to fit your specific game.
So that's the first thing - no 2 games are alike in their requirements, and there is no single right way to do it.
OK - and let me get this one out of the way: NEVER try to sync physics over the network.
You can run physics on the server and send the definitive result to the clients - or you can run it on the clients. But trying to sync 2 physics simulators running asynchronously on client AND server is doomed to fail. It is almost impossible to force a client simulation to interpolate or step back. Additionally, the timing is always different, as you have the transmission time. And the simulations are heavy users of floats - those often lose precision when they are packed for the network, and floating-point results can also differ slightly between machines - so the client and server will drift apart for certain. These differences accumulate over the time the simulation runs (like in a flight sim.......)
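To give a feel for the float drift, here is a small sketch in Python. It is a toy, not two real physics engines: it assumes the server simulates with 64-bit floats, a single velocity value is sent to the client as a 32-bit float (my assumption about the wire format), and both sides then run the exact same integration loop. The function names are made up for the example.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a 64-bit Python float through a 32-bit wire encoding."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def integrate(pos: float, vel: float, dt: float, steps: int) -> float:
    """Advance a 1D position with a constant velocity, one fixed step at a time."""
    for _ in range(steps):
        pos += vel * dt
    return pos


def drift_after(seconds: int, vel: float = 123.456789012345, hz: int = 60) -> float:
    """Distance between server and client positions after `seconds` of simulation."""
    steps = seconds * hz
    dt = 1.0 / hz
    server = integrate(0.0, vel, dt, steps)              # full-precision velocity
    client = integrate(0.0, to_float32(vel), dt, steps)  # velocity as received
    return abs(server - client)


if __name__ == "__main__":
    for seconds in (60, 600):
        print(seconds, "s of simulation -> drift", drift_after(seconds))
```

The error in the transmitted value is tiny, but the drift keeps growing for as long as the simulation runs. A real physics engine with collisions and forces will usually diverge a lot faster than this straight-line toy.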
Anyways - FPS games don't usually have a lot of physics in their core, but use them for funneh effects like barrels, crates and such. Here you'll have to ask yourself how important pixel-perfect sync between clients really is. Does it actually matter if a crate flies at exactly the same time on all clients on exactly the same trajectory? If not, then you could simply have the server send a single "do explode crate" command out to all clients, and they run the animation.
This goes for almost all other single shot commands too - like shooting a missile launcher or whatever.
If I were to do an FPS, I would design the networking around 3 fundamental types of communication:
1) positional sync that happens all the time (and thus has to be optimized)
2) server initiated single shot messages (example: results of other player actions or the world simulation like an exploding crate)
3) client initiated single shot messages (e.g. firing a missile launcher)
I'm always worried about security, and thus the rule of thumb is: don't trust clients!! So this means that the server runs the definitive world simulation, and the clients have to request permission from the server to do anything.
In cases 1) and 2), the client simply HAS to do what the server says. Period.
So in 1), if the client gets sent a different transform for another player, then it has to figure out how to move that player into the new transform in a visually pleasing way. That's where interpolation and prediction come in. If you look at the Island chat demo, this is already being taken care of on the interpolation side. The server sends the transforms of all other clients (in a very inefficient way) and the client code then interpolates between the position the object currently has on the client side and the position from the server.
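The interpolation idea can be sketched roughly like this. This is not the Island demo's actual code, just a minimal Python illustration under my own assumptions: `Vec3`, `smooth_towards` and the `rate` parameter are names invented for the example, and a real client would do the same for rotation too.

```python
from dataclasses import dataclass


@dataclass
class Vec3:
    x: float
    y: float
    z: float


def smooth_towards(current: Vec3, target: Vec3, dt: float, rate: float = 10.0) -> Vec3:
    """Move `current` a fraction of the way towards the server's `target`.

    Called once per rendered frame. `rate` (hypothetical tuning value) sets how
    quickly the avatar catches up; the blend factor is clamped so the avatar
    never overshoots the server position.
    """
    t = min(1.0, rate * dt)
    return Vec3(
        current.x + (target.x - current.x) * t,
        current.y + (target.y - current.y) * t,
        current.z + (target.z - current.z) * t,
    )


if __name__ == "__main__":
    shown = Vec3(0.0, 0.0, 0.0)
    from_server = Vec3(10.0, 0.0, 0.0)
    for frame in range(5):
        shown = smooth_towards(shown, from_server, dt=0.05)
        print(frame, shown)
```

Each frame the displayed avatar closes part of the gap instead of snapping, so a correction from the server shows up as a short glide rather than a teleport.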
The prediction side is missing in the demo. Prediction is a matter of "for the next frames - until I get the real position from the server - I will try to render the remote avatar where I think he is right now". Remember that the position you get from the server is always delayed compared to where the other client really is. Timing-wise you have:
remote client sends position changes to server -> network delay -> server calculates and verifies positional change and applies it to the world sim -> sends out the client positions once in a while to all clients -> network delay -> you get the packet with the position on your client
This is totally OK for a chat program, where it doesn't matter where the other client was 50-60 ms ago. We simply render the position a little behind.
But in an FPS where you have fast shooting machine guns, you definitely need to render the remote client "as close as possible" to the position where you predict it to be right now (trying to cut out the server and network delay). A simple algorithm here would be to take the last known position and velocity vector of the remote avatar and project it forward by the time that has passed since that update. If the real remote client turned out to be in the middle of a complex move, then this will be a little off the truth. But most likely it's a good guess, and in any case the interpolation will take care of moving the remote avatar into the real position. This comes at the cost of some jittering, but it's still better than not predicting at all.
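The "last position plus velocity" projection is usually called dead reckoning. A minimal Python sketch, assuming each server update carries a position, a velocity and a timestamp on a clock the client can compare against (the function and parameter names are mine, not from any library):

```python
from typing import Tuple

Vec = Tuple[float, float, float]


def predict_position(last_pos: Vec, last_vel: Vec, snapshot_time: float, now: float) -> Vec:
    """Project the last known state forward to `now` (simple dead reckoning).

    Assumes the avatar keeps moving with the velocity from the last update.
    If `now` is earlier than the snapshot (clock hiccup), no projection is done.
    """
    dt = max(0.0, now - snapshot_time)
    return tuple(p + v * dt for p, v in zip(last_pos, last_vel))


if __name__ == "__main__":
    # Last update: at t=10.0 the avatar was at the origin, moving 2 units/s
    # along x and -4 units/s along z. We render half a second later.
    print(predict_position((0.0, 0.0, 0.0), (2.0, 0.0, -4.0), 10.0, 10.5))
```

When the next real update arrives, you feed the predicted position and the new server position into your interpolation, so a wrong guess gets corrected smoothly.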
In an SFS context, and still on 1), you can use 2 core mechanics. First, you can attach some user variables holding the transform to a user object. SFS will then sync these for you. This might seem to be the easy way out - but it's very inefficient. The built-in commands are all XML based (without optimization choices). The other way involves more work from you, but opens up for optimizations: send extension messages. Here you can use the most efficient protocol - and you can even apply server side compression together with client side decompression of the transform itself.
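As an illustration of what "compressing the transform" can mean, here is a Python sketch that packs one avatar update into 16 bytes. The layout (16-bit user id, three 32-bit floats, heading squeezed into 16 bits) is purely my assumption for the example, and no SFS calls are shown; this only builds the payload you would then hand to your own extension message.

```python
import struct

# Hypothetical wire layout: user id (uint16), x, y, z (float32 each),
# yaw quantized to uint16. Little-endian, no padding: 16 bytes in total.
TRANSFORM_FORMAT = "<HfffH"


def encode_transform(user_id: int, x: float, y: float, z: float, yaw_deg: float) -> bytes:
    """Pack one avatar transform into a compact binary payload."""
    yaw_q = int(round((yaw_deg % 360.0) / 360.0 * 65535))
    return struct.pack(TRANSFORM_FORMAT, user_id, x, y, z, yaw_q)


def decode_transform(payload: bytes):
    """Unpack a payload produced by encode_transform."""
    user_id, x, y, z, yaw_q = struct.unpack(TRANSFORM_FORMAT, payload)
    return user_id, x, y, z, yaw_q / 65535 * 360.0


if __name__ == "__main__":
    payload = encode_transform(7, 1.5, 2.25, -3.0, 90.0)
    print(len(payload), "bytes:", decode_transform(payload))
```

The heading comes back with a small quantization error (well under a hundredth of a degree here), which nobody will see on screen. That is the kind of trade you cannot make with the built-in XML based sync.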
Server side you can also apply different optimizations. E.g. in an FPS, you would typically only care about high precision (looking at it from a single client) for enemy avatars very close to you and/or in view. The server can use this to avoid sending all client transforms to all clients on every "tick". So you would code up an extension on the server to filter who gets what this tick - e.g. 2 clients that are on opposite sides of the world might not get sent transforms for each other at all. Units "just around the corner" will get them every 10 ticks. And 2 clients that are very close to each other get updated as often as possible.
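A rough Python sketch of such a per-tick filter. It uses plain distance as a stand-in for "close", "around the corner" and "other side of the world"; a real game would probably also look at visibility. The thresholds and function names are invented for the example.

```python
import math
from typing import Dict, List, Optional, Tuple

Vec = Tuple[float, float, float]


def update_interval(distance: float, near: float = 30.0, far: float = 150.0) -> Optional[int]:
    """How many ticks pass between updates for a pair, or None for no updates.

    `near` and `far` are hypothetical tuning values in world units.
    """
    if distance <= near:
        return 1    # very close: every tick
    if distance <= far:
        return 10   # "just around the corner": every 10th tick
    return None     # too far away to matter


def updates_for_tick(tick: int, positions: Dict[str, Vec]) -> List[Tuple[str, str]]:
    """Return (receiver, subject) pairs: who gets whose transform this tick."""
    pairs = []
    for receiver, receiver_pos in positions.items():
        for subject, subject_pos in positions.items():
            if receiver == subject:
                continue
            interval = update_interval(math.dist(receiver_pos, subject_pos))
            if interval is not None and tick % interval == 0:
                pairs.append((receiver, subject))
    return pairs


if __name__ == "__main__":
    world = {"a": (0.0, 0.0, 0.0), "b": (10.0, 0.0, 0.0),
             "c": (100.0, 0.0, 0.0), "d": (1000.0, 0.0, 0.0)}
    print("tick 1: ", updates_for_tick(1, world))
    print("tick 10:", updates_for_tick(10, world))
```

The double loop is fine for a handful of players. With many players you would want some spatial partitioning so you don't compare everyone against everyone on every tick.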
I hope you get the picture.
There is a whole side that I haven't discussed (and won't). That is how the server gets positional input changes from a client. E.g. can the server trust the client enough that the client simply sends its current transform to the server as "the truth"? Or do you send "I would like to turn 90 degrees. May I?" messages? Tricky tricky.
Back to 2). Single shot messages from the server. Simply send them as xt messages - very easy. Nothing to really think about here (you might want to use the most efficient protocol, but it's still easy). So when some other client decides to fire his missile launcher, his avatar in your own client simply needs to start the firing animation too, based on the server telling you to.
3) is slightly more complex. In general you don't trust your client. So when the player presses the "fire missile launcher" button, then what happens?
I would go down the route of:
Player presses button -> client starts a small animation of the missile launcher "warming up" -> at the same time the client sends a request to the server asking for permission to fire the missile launcher -> server verifies that the given client has a ML and ammo and sends back an OK to fire -> the client finishes its "warm up" animation and runs the actual firing animation -> at the same time the server has applied the missile launcher firing to its world sim, and will send out this update to all clients on the next tick
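The server half of that flow could look something like this in Python. It is only a sketch of the "don't trust the client" check: `PlayerState`, `GameServer`, the `outbox` list and the message name are all hypothetical, and the actual sending of messages is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PlayerState:
    has_launcher: bool = False
    missiles: int = 0


class GameServer:
    def __init__(self) -> None:
        self.players: Dict[int, PlayerState] = {}
        # Messages queued here would be broadcast to all clients on the next tick.
        self.outbox: List[Tuple[str, int]] = []

    def handle_fire_request(self, player_id: int) -> bool:
        """Check a client's request to fire; return True if the server allows it.

        The server looks only at its own copy of the player state and never
        takes the client's word for what it owns.
        """
        player = self.players.get(player_id)
        if player is None or not player.has_launcher or player.missiles <= 0:
            return False                      # refuse: the client keeps quiet
        player.missiles -= 1                  # apply the shot to the world sim
        self.outbox.append(("missile_fired", player_id))
        return True


if __name__ == "__main__":
    server = GameServer()
    server.players[1] = PlayerState(has_launcher=True, missiles=1)
    print(server.handle_fire_request(1))  # allowed
    print(server.handle_fire_request(1))  # refused, out of ammo
    print(server.outbox)
```

The warm-up animation on the client covers the round trip, so the player gets instant feedback on the button press without the client ever firing on its own authority.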
The server then updates the position of the missile in the world sim, and keeps relevant clients updated with the position of the missile (maybe using physics). And if the missile hits anything, it is simply a matter of 2) again - the server tells everyone that the missile hit the tank and the tank exploded, so that the clients can render the explosion animations.
Hmmmmm - I think my morning coffee is now expended in my veins, and I'll stop here
As you noticed, almost nothing in the above is either Unity or SFS related, as this is a general design/architecture decision that applies to almost all FPS and MMO games. So I'm sure you can find tons of information online or in game design books.
Hope it helps