Muzzleflash – User

From Bohemia Interactive Community

We Need to Talk About Concurrency

In Arma, multiple scripts can be active at the same time, for example those created by spawn and execVM. Sometimes it can be very useful to split our application logic across multiple scripts that each handle part of it. For example, maybe we need a script to manage some special AI groups to guide them or track their state. Or maybe there is some complicated condition we need to monitor that triggers would be too cumbersome for. Having all these disparate bundles of logic inside a single script would complicate it too much, so instead we place them in different scripts.

However, Arma only runs one script at any given time; in other words, scripts do not run in parallel. There is no amount of computational work you can make happen faster in Arma by splitting it across multiple spawned scripts. Scheduled scripts do, however, run concurrently, since scripts don’t necessarily run to termination in ‘one go’ and may suspend at various places, for example when using the sleep or waitUntil commands. They can also be suspended because the scheduler has decided to suspend them, perhaps because a script is taking too long and the scheduler does not want to risk freezing the game and dropping the fps. By suspending the script, the scheduler can run other code, or stop and let the engine complete the game frame, and it will resume the script in a later frame.

We’ve already discussed how Arma has scheduled and unscheduled environments, and that in unscheduled environments scripts cannot wait, sleep or otherwise temporarily suspend themselves, so unscheduled scripts are immune to being suspended. Trying to suspend, for example with the sleep command, inside an unscheduled environment will fail. But unscheduled scripts may slow the game down, since the game will effectively freeze while an unscheduled script is running. You can check whether a script may suspend with the canSuspend command, but usually we will know the environment the script runs in.
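As a small illustration of canSuspend, here is a minimal sketch of a function that only sleeps when it is allowed to. The name _waitIfPossible and the chat messages are made up for this example.

```sqf
// Hypothetical helper: waits one second where suspension is allowed,
// and otherwise skips the wait instead of failing on sleep.
private _waitIfPossible = {
    if (canSuspend) then {
        sleep 1;
        systemChat "Scheduled: waited one second.";
    } else {
        systemChat "Unscheduled: cannot suspend, continuing immediately.";
    };
};

[] call _waitIfPossible;   // call runs in the environment of the caller
[] spawn _waitIfPossible;  // spawn always creates a scheduled script
```

Which branch the call line takes depends on where this snippet itself is executed; the spawned copy should always be able to suspend.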

Race Conditions

So why is this difference between scheduled and unscheduled environments so important? It is not so much about the environments themselves but about the scheduler suspending our code and how that affects us if multiple scripts are running concurrently. Normally when reading a script you would expect things to happen in order, typically left-to-right, top-to-bottom. But when scripts can be suspended - at any time - this gets more complicated. Try running the Race Condition Example below in the debug console inside Arma and set a watch field on Counter.

Race Condition Example

Counter = 0;
private _script = {
    for "_i" from 1 to 100000 do {
        private _a = Counter;
        _a = _a + 1;
        Counter = _a;
    };
};
[] spawn _script;
[] spawn _script;

We start two scripts that each read the counter into a local variable, increment the local variable, and write it back to the global variable. If there was no interference we would expect the result 200000 every time. Sometimes you may get lucky and actually get the result 200000 in the watch field for Counter, but most of the time I certainly did not. The reason is that, as mentioned above, the scheduler may suspend our script at any time.

Let us label the two executions of the script S1 and S2. Let us say we are at an iteration of the loop where S1 reads the Counter value into _a, suppose it was 2026, and increments _a to 2027. Now the scheduler suspends S1 and switches to S2, which runs identical code and is, let’s say, about to start an iteration. S2 might then also read 2026 into _a, increment it to 2027 and write it back out to Counter. In fact, it might do this many times in a row. Suppose it keeps going, writes 2089 into Counter and is then suspended. Now the scheduler resumes S1, which, if you recall, had read 2026 and incremented its _a to 2027, and is now going to write that value to the global variable Counter. That write sets Counter back from 2089 to 2027. Between them the two scripts have completed 64 iterations (63 by S2 and one by S1), but Counter has only advanced by one, so 63 increments are lost.

We call such an unfortunate situation, where a script reads and changes a piece of data while another script can modify the same data in between, a race condition.

Now, you may think that was a silly way to increment a counter: we could just replace the body of the for-loop with Counter = Counter + 1;. And yes, you are right, that does prevent the issue here. At least in my experiments the scheduler never suspends in the middle of a statement like that. But the entire loop is just a proxy for something that takes a ‘bit of time’, and the body is a proxy for some complex operation that happens inside. We cannot always find a single statement that makes all the changes that need to happen atomically.
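For reference, the single-statement variant described above would look like the sketch below. That it avoids the race rests only on the observation that the scheduler did not suspend in the middle of such a statement in my experiments, so treat it as an observation rather than a guarantee.

```sqf
Counter = 0;
private _script = {
    for "_i" from 1 to 100000 do {
        // Read, increment and write back in one statement.
        Counter = Counter + 1;
    };
};
[] spawn _script;
[] spawn _script;
```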

Preventing Suspension

How can we fix it? One way is to never run the code scheduled to begin with, as we did with spawn, and instead just call it, assuming we were originally in an unscheduled environment. Another option is to (mis)use the isNil command. It is normally used to check whether a variable, or an expression wrapped in code, evaluates to nil, but it also forces the evaluation of that code to be uninterruptible, as in the Outer isNil Example below.
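The first option, calling the code instead of spawning it, could look like the following sketch. It assumes the calling code is itself running unscheduled; the script body is unchanged from the Race Condition Example.

```sqf
Counter = 0;
private _script = {
    for "_i" from 1 to 100000 do {
        private _a = Counter;
        _a = _a + 1;
        Counter = _a;
    };
};
// call runs the code in the caller's environment and only returns when
// it is done, so the second call starts after the first has finished.
[] call _script;
[] call _script;
```

Since nothing can interleave with the two loops here, Counter should end at 200000, at the cost of the game being frozen until both calls have finished.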

Outer isNil Example

Counter = 0;
private _script = {
    isNil {    
        for "_i" from 1 to 100000 do {
            private _a = Counter;
            _a = _a + 1;
            Counter = _a;
        };
    };
};
[] spawn _script;
[] spawn _script;

With the entire loop wrapped in isNil, the scheduler will not suspend the loop at any point, only before or after the isNil command.

I stated earlier that the loop itself was a stand-in for any kind of ‘expensive’ piece of code. Since we are limited by how much time our code may spend in unscheduled environments we may want to actually stay in the scheduled environment. But how do we then ensure our update of the counter happens correctly (atomically)? We can wrap the actual statements that must happen together instead of wrapping the whole script body.

Inner isNil Example

Counter = 0;
private _script = {
    for "_i" from 1 to 100000 do {
        isNil {    
            private _a = Counter;
            _a = _a + 1;
            Counter = _a;
        };
    };
};
[] spawn _script;
[] spawn _script;

The result is equivalent to that of the Outer isNil Example, but how it is evaluated is not. Our script can still be suspended, but each inner update to Counter will never be interrupted.

Depending on the performance of your system, when running the above examples you may notice that the Outer isNil Example seems to ‘blink’ once when updating the watch field, and it may also be perceived as finishing faster than the Inner isNil Example. For the latter you may notice the watch field incrementing very quickly during evaluation. As noted earlier, unscheduled code cannot be interrupted, so with the outer isNil the entire loop (all iterations) must run to completion, and the rest of the game is actually frozen while it does so. With the inner isNil, each counter update (basically a single iteration) also freezes the game, but the loop itself can still be suspended between updates. That is why the scripts governing the watch fields are run by the scheduler intermixed with our scripts and can update the fields more frequently.

Do we need isNil?

We have seen how isNil can be used to prevent scheduled scripts from interacting poorly. So you might think the obvious solution is to always wrap the code of our scheduled scripts in isNil; then we can’t have race conditions or other undesirable interactions. And you are actually not wrong. If we have some code that is not ‘heavy’ - meaning it won’t freeze the game - then that is the simplest and safest (and maybe a tiny bit more performant) way to do it.

But often our code will already originate in an unscheduled environment and can’t be suspended anyway, so we don’t need isNil. When does our script code run?

  • Our code might run as the handler code for events, also simply called event handlers.
  • Triggers might activate (or deactivate) and run the associated code fragments.
  • Any code we have spawned (or started with execVM) will run intermittently until it terminates.
  • Special scripts like init.sqf, initServer.sqf, initPlayerLocal.sqf, etc.

So that gives four broad categories. The third one, code we have spawned, can only be spawned because we were already running code in some other context; otherwise how did we execute the spawn command? The special scripts usually run in the scheduled environment (the ones listed do). But both event handlers and triggers run code in the unscheduled environment. So unless we decide to spawn off some code from our event handlers or trigger activations, we do not need to use isNil at all, or be that careful. Such code will run to completion without being interrupted, so we do not need to worry about race conditions at all.
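To make the event handler case concrete, here is a minimal sketch that counts kills from inside a mission event handler. The global KillCount is a made-up name for this example. The handler body uses the same read, increment, write pattern that raced earlier, but because handler code runs unscheduled it cannot be suspended halfway through.

```sqf
// Hypothetical global counter for this example.
KillCount = 0;

addMissionEventHandler ["EntityKilled", {
    params ["_killed"];
    // Event handler code runs unscheduled, so these three statements
    // always complete together without any isNil around them.
    private _a = KillCount;
    _a = _a + 1;
    KillCount = _a;
}];
```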

If you can, only run code in the unscheduled environment!
“Is that not a somewhat extreme position to take?” you might ask.
Yes, I don’t know, maybe. But this entire section about the problems and solutions to race conditions is completely irrelevant if you only run unscheduled code, which as you know can’t be suspended: it always runs to completion (or possibly terminates early if there is an error in the code). Since it always runs to completion, it can’t race (as in have a race condition) with other unscheduled code. Most of the time we are waiting for something to happen in the game. When it does happen the engine raises an event and our event handling code runs in an unscheduled environment, and when all event handlers have run, the engine goes back to doing engine-stuff again.

So let me turn the question around: what code can’t run unscheduled? We already covered that heavy code should be able to suspend to avoid freezing the game. By heavy code I don’t mean code that merely calls createVehicle; that is itself a heavy command, but there is little we can do about that. Likewise, if we run somePos nearObjects ["CAManBase", 50000], which basically searches for humans in a 50 km radius around somePos, that engine command will run to completion and won’t suspend in the middle. But if we do that in a loop that runs for many iterations, we are now doing ‘heavy work’, and that will freeze the game in an unscheduled environment. For this particular example, a scheduled script could instead do smaller nearObjects checks inside a loop, which would enable the scheduler to suspend the script before it takes too much frame time.
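A rough sketch of that idea follows: instead of one huge search, a spawned script does several smaller nearObjects checks in a loop. The centre positions and the radius are arbitrary values chosen for illustration, and they do not cover the same area as the 50 km search.

```sqf
// Hypothetical inputs: a few search centres and a modest radius.
private _centers = [[1000, 1000, 0], [2000, 1000, 0], [1000, 2000, 0], [2000, 2000, 0]];
private _radius = 750;

[_centers, _radius] spawn {
    params ["_centers", "_radius"];
    private _found = [];
    {
        // Each nearObjects call still runs to completion, but it is small,
        // and the scheduler may suspend the script between iterations.
        // pushBackUnique skips units already found in an overlapping circle.
        { _found pushBackUnique _x } forEach (_x nearObjects ["CAManBase", _radius]);
    } forEach _centers;
    systemChat format ["Found %1 units", count _found];
};
```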

But often it might not be heavy work; instead we might have a monitoring script waiting for something for which there is no existing event, or a script that manages an AI group. And if such a scheduled script relies on shared data that is also used from elsewhere, such as uninterruptible unscheduled code (perhaps an event handler), we might be back in a situation similar to the Race Condition Example. Now you know the solution: we have to guard the accesses to the shared data with isNil in the scheduled code to avoid race conditions.
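As a sketch of such a situation, assume an event handler that appends killed units to a shared array and a scheduled script that processes them in batches. The global PendingUnits and the one-second polling interval are made up for this example. Without the isNil, a unit pushed by the handler between the copy and the reset would be silently dropped.

```sqf
// Hypothetical shared state: units that still need processing.
PendingUnits = [];

// Unscheduled producer: runs to completion whenever the event fires.
addMissionEventHandler ["EntityKilled", {
    params ["_killed"];
    PendingUnits pushBack _killed;
}];

// Scheduled consumer: may be suspended at any point outside isNil.
[] spawn {
    while {true} do {
        private _batch = [];
        // Copy and reset the shared array in one uninterruptible step.
        isNil {
            _batch = +PendingUnits;
            PendingUnits = [];
        };
        {
            // The slower per-unit work stays outside isNil,
            // so the script can still be suspended here.
            systemChat format ["Processing %1", _x];
        } forEach _batch;
        sleep 1;
    };
};
```

Only the two statements that touch the shared array are guarded, so the time spent uninterruptible stays as short as possible.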