MaHuJa/Sandbox – User

From Bohemia Interactive Community

Note: the following is a draft. Comments and corrections are welcome (discussion page, please) but keep in mind that this is not the finished text.


Armascript considered harmful

Cliched title, I know. Armascript is a name I invented to describe the scripting language used in the arma series of games, also known by its file extension sqf, to which the game owes most of its extensibility and power. In short, its power, already back in Operation Flashpoint (where this feature was revolutionary), is probably the biggest reason for the success of ArmA. I believe BIS are well aware of this.

Armascript, as of arma2, is a huge improvement over the original ofp script language (known by the file extension sqs). The language has certainly evolved since then; with each iteration (ofp, ofp:resistance, arma1, arma2) we've seen large improvements. However, it occurs to me that these improvements constitute evolution rather than design. Each change, while an improvement in itself, therefore does not always mesh well with the other developments in the language.

It's my observation that the proportion of armascripters who are also professional programmers is unusually high. I do not have a statistically meaningful sample, so contributions in that field would be welcome.

One possible factor is that these programmers are drawn to a proper simulation like arma in the first place. (The community overall is also unusually tolerant of bugs.) I'm focusing on another: The language is distinctly unsuitable for new/non-professional programmers. Just look at how much the professionals are struggling.


Here are a few fundamental problems:

Forced multithreaded programming

We have to program as though we were making (pre-emptively) multithreaded programs. This is something that causes professional programmers no end of headaches, even in better languages. The "top" programmers agree that multithreaded programming is one of the hardest things you can do, the exception being those who work with tools that do the job for them (and whose tasks suit those tools).

A different thread can change things at any time. Worse yet, we are also not given proper synchronization primitives. The best we have is an intrusive hack:

code;
waituntil { critical_section_code; true};
code;

When I say intrusive, I mean that the code fragment needs to have that final "true" statement, which precludes abstracting it. Apparently the critical_section_code also performs a lot better when run in this fashion, primarily because the 0.03ms delay isn't present, and also because it benefits from hardware cache effects. The cost, should it take too long, is probably mainly to the framerate.

If the critical section is too large for instant processing, or simply needs to work over time, you have to grab a lock like this:

code;
waituntil { _l = lockname; lockname = true; !_l }; // test-and-set; lockname must start out false
critical_section_code;
lockname = false; // release the lock
code;

Race Conditions

Many treatises on race conditions in MT programming speak of it actually failing as if it were a rare occurrence. ("It might work fine for several years")

In Armascript, it's not. It's common. And it breaks stuff arbitrarily. I've seen many script failures in armascript that are a direct result of race conditions. One of the major reasons is that the majority of armascript programmers don't have the skills to get it right. There is so much code that will break when the race conditions don't go as prayed for that at least one such failure can be expected per game.
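A minimal sketch of such a race, assuming two scripts are spawned at about the same time. The names TAG_counter and TAG_fnc_bump are made up for the illustration, and the sleep only stands in for any point at which the scheduler might switch scripts:

```sqf
TAG_counter = 0;

// Hypothetical function: increment a shared global in two steps.
TAG_fnc_bump = {
    private "_old";
    _old = TAG_counter;      // read the shared value
    sleep 0.01;              // stand-in for a scheduler switch at a bad moment
    TAG_counter = _old + 1;  // write back a value that may already be stale
};

[] spawn TAG_fnc_bump;
[] spawn TAG_fnc_bump;
// Both scripts can read 0 before either writes,
// which leaves TAG_counter at 1 instead of 2.
```

Without the sleep the same interleaving is still possible, just rarer, which is what makes it hard to reproduce.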

Distributed programming

If there's one thing that's harder on programmers than (pre-emptive) multithreaded programming, it's distributed programming. And once a mission is loaded into multiplayer, the programmer had better have it down right. But it gets worse; the "local/global" behaviour of many commands is not properly documented, or depends on the context.

Again, we are provided no synchronization. As for making it ourselves, I've seen a good one: it comes down to putting a string in a variable, publicvariable-ing it (note that these two steps have to happen atomically, see above) and having an EH on the server compile and run it. (Apparently publicvariable on the code datatype is horrible for performance, presumably lag-wise, even for very small blocks of code.)
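The string-passing approach can be sketched roughly as follows. The variable name TAG_serverExec is an assumption made for the example, and the atomicity concern mentioned above is not handled here:

```sqf
// Server side: compile and run whatever string arrives in TAG_serverExec.
if (isServer) then {
    "TAG_serverExec" addPublicVariableEventHandler {
        // _this select 0 is the variable name, _this select 1 its new value
        call compile (_this select 1);
    };
};

// Sending side: put the code in a string and broadcast it.
// The handler does not fire on the machine that does the broadcasting.
TAG_serverExec = "diag_log 'requested by a client'";
publicVariable "TAG_serverExec";
```

Note that every other client receives the string as well, even though only the server acts on it.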

Combining the above you can create a session-wide lock, as needed. The fact that you need to, though, speaks for itself. Alternatively, send a string (never code) to be compiled and run at the server, which is really just a way to de-distribute it.

Yet another issue is the locality of game objects, and how this can magically break code, especially combined with JIP. We have neither the means to override it, nor the means to detect where an object is local if it is not local to us. (And by the time our sent code arrives, it may have changed.)

Race conditions are an issue here too, but it's slightly less of an issue because it's so obvious to everyone.

Also, I would like to note how some commands, e.g. sidechat, are local-effect; because it was assumed, in the days of sqs and little design behind this, that the scripts would be running the same everywhere.

One more complaint to add here: Transmitting info to another computer cannot be done without broadcasting it everywhere. If there is some size to the data, that's going to multiply the traffic levels a lot.
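A rough sketch of what addressing a single machine looks like under that constraint: the message goes to everyone, and every receiver has to decide for itself whether to discard it. TAG_mail and TAG_someUnit are hypothetical names, with TAG_someUnit assumed to be the unit controlled by the intended recipient:

```sqf
// On every client: accept the message only if it is addressed to our own unit.
"TAG_mail" addPublicVariableEventHandler {
    private "_mail";
    _mail = _this select 1;              // [recipient, payload]
    if ((_mail select 0) == player) then {
        hint str (_mail select 1);
    };
};

// On the sender: the payload still travels to all machines.
TAG_mail = [TAG_someUnit, "payload for one player only"];
publicVariable "TAG_mail";
```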

Performance issues

There's one rule above all other rules when it comes to performance in armascript. Don't do it (yourself). Armascript is slow. Even for relatively simple mathematical tasks, it'll still be better to contort the game engine to do it for you. This is opposite to most environments where accessing external resources is usually the more expensive way to do something.

One recent example, which surprised the original programmer: if you want to know whether some position is within a tilted rectangle, it is faster to create a trigger and a gamelogic, and for each check move the gamelogic and look it up in that trigger's list, than it is to do the trigonometry by script.
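For reference, the scripted check that loses this comparison looks roughly like the sketch below. The function name and parameter layout are assumptions made for the example; the rotation is taken as degrees clockwise from north, matching how directions are normally given in the game:

```sqf
// Hypothetical helper: true if _pos lies inside a rectangle centred on _center,
// with half-widths _a (sideways) and _b (forwards), rotated _dir degrees.
TAG_fnc_inTiltedRect = {
    private ["_pos", "_center", "_a", "_b", "_dir", "_dx", "_dy", "_lx", "_ly"];
    _pos    = _this select 0;
    _center = _this select 1;
    _a      = _this select 2;
    _b      = _this select 3;
    _dir    = _this select 4;

    _dx = (_pos select 0) - (_center select 0);
    _dy = (_pos select 1) - (_center select 1);

    // Rotate the offset into the rectangle's own axes (trig commands use degrees).
    _lx = _dx * cos _dir - _dy * sin _dir;
    _ly = _dx * sin _dir + _dy * cos _dir;

    (abs _lx <= _a) && (abs _ly <= _b)
};

// Usage: a 40 by 20 rectangle at [1000, 2000], rotated 30 degrees.
_inside = [getPos player, [1000, 2000], 20, 10, 30] call TAG_fnc_inTiltedRect;
```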

And don't even think of using trigonometry to measure the distance between two coordinates; the distance command does this better. But wait, you need to create two objects (gamelogic preferred) and setposasl them, or the result will be wrong if that area of terrain isn't flat. (Well, unless they were world coordinates and that was exactly what you wanted.) Oh, and, another part of code wanted such info for another set of data? Too bad. This might not apply if you can stick the whole thing in a single statement. (Shall we invent the term "statement packing"?)

  • Todo: Investigate the impact of scripts on server cpu load; I have several reasons to believe it's unusually heavy.

The difficulty of getting things right

Bugs will occur in any program, in any language. However, the nature and rate of bugs will vary depending on the language. While there are clearly languages which are worse (see brainfuck), I find armascript ranks quite low. This is especially true given the complications I noted above. And the lack of debugging tools makes this even worse. diag_log was no less than a breakthrough.

  • Non-linear syntax

Armascript is an operator-based language. This is even more obvious to anyone who used its sqs incarnation. It means that each command is in the form of

op B;
A op B;

This makes a lot of sense for -a, a+b, a=b, and so on. It's also (seemingly by coincidence) a fairly close match to the usual object-oriented notations, {player setpos somepos} instead of player.setpos(somepos). When a command needs more than two operands, that problem is solved by making an array out of B. It's almost decent, as workarounds go.

However, it's also being used where less appropriate. As a particular example, { ... ... ... } foreach whatever; In order to know the context of the block, you need to read below it. Then you can go back up and understand what it's for. Most languages are, and I dare say for a reason, made such that a human reading it from top to bottom can understand it. As a workaround, I usually copy the foreach to a comment on top of the block, but such comments tend to fall out of sync with their original.
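A small illustration of the problem and of the comment workaround; the loop body is arbitrary:

```sqf
// forEach units group player   <- workaround: repeat the context up front
{
    // Reading top to bottom, nothing here says what _x is ...
    _x setDamage 0;
    _x setUnitPos "DOWN";
} forEach units group player;   // ... until this line is reached
```

If the last line is later changed to iterate over something else, the comment at the top silently becomes wrong.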


No less-than-run code checking

The only way you can check your code, even for syntax, is to run it and observe that it has the desired effects. Even the compile command will not catch (all) your syntax errors. On some errors I have seen compilation fail silently: no error shown, nothing logged, the code just never ran. The actual error? A missing semicolon (;).

Squint is a good attempt, but falls short in several areas; some of which, due to the nature of armascript, it cannot fix. For example the surprisingly plentiful code that, for armascript reasons, must be contained in "strings" rather than {blocks}.
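Trigger statements are one case where code has to be passed as strings. A short sketch, with the position and area chosen arbitrarily:

```sqf
_trg = createTrigger ["EmptyDetector", getPos player];
_trg setTriggerArea [50, 50, 0, false];
_trg setTriggerActivation ["ANY", "PRESENT", true];

// Condition, activation and deactivation are all strings. To an external
// checker they are just text, and quotes inside them need the other quote
// style (or doubling).
_trg setTriggerStatements ["this", "hint 'Someone entered the area'", ""];
```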

Is armascript, in its current condition, salvageable?

We can live with it as it is; but we will suffer the cost of doing so. (Fewer programmers, each working under bad conditions -> Less added value. AKA less fun for the rest of us.) How many projects have never been completed and released because their makers gave up on armascript? Those we know of are just the tip of the iceberg. At some point, I considered the idea of making a C++ compiler backend that would accept a rather large subset of C++ and turn it into armascript. This was a viable idea, but I never got around to actually doing it. We also do not want to switch to an alternative that's just plain worse, even should it have prettier syntax. There's also the issue of how much work implementing it will be for BIS. (The parts they cannot leave to the community, that is.)

There are many minor annoyances with armascript which can be fixed; the language can evolve further. However, there is a limit to how much you can do without breaking existing code, and the issues listed above are fundamental ones, any fix to which WILL break code. And lots of it.

Thus, to fix it, we will need to deprecate it, such that it will only be used for backward compatibility. Similarly to how sqs was deprecated in favor of sqf.


What features do we need of a replacement?

We should consider every feature armascript has, look at WHY that feature is there, and compile a list of those requirements. Only when we have the most fundamental parts down, can we begin looking at what we would replace it with. Points that must be covered:

  • Multithreading: .03, atomicity/pre-emptive*,
  • Distributed
  • Events
  • Localization
  • The game engine does not need access to the variables contained; nothing has effect unless it does a command to the game engine.

Some features we should probably do away with:

  • Distributed parts can arbitrarily write into each other's memory spaces.



Clean upgrade path

  • Must have a way to interact with sqf code. If it is not workaround-y in use, it can be a mechanism for forward use as well.
  • Make it viable to have a program do the (bulk) conversion of armascript to the new solution. Any "call compile string" probably means the function needs to be rebuilt by hand, though.


Schedule?

  • Whenever a new solution has been found, and implemented, it can be released any time. Sqf was added in operation flashpoint: resistance, which probably has some connection to the changes it introduced into the game engine. A different solution now may not need such facilities.
  • Full deprecation of armascript(sqf) should be done with a major release; no less than the release of a full expansion; think size and scope of operation arrowhead.
  • The next full release after that, Arma3, can then conceivably be the first version that does not support sqf.