Code Optimisation
Revision as of 01:36, 1 May 2018
Make it work.
"Premature optimization is the root of all evil."
Donald Knuth
No need to worry about making it work at light speed if it doesn't even do what it is supposed to. Focus on getting a working product first.
Make it fast.
Optimisation is everything when running lots of instances with low delays. However, there is such a thing as premature optimisation. Also, avoid excessive cleverness.
"Excessive cleverness is doing something in a really clever way when actually you could have done it in a much more straightforward but slightly less optimal manner. You've probably seen examples of people who construct amazing chains of macros (in C) or bizarre overloading patterns (in C++) which work fine but which you look at an go "wtf"? EC is a variation of premature-optimisation. It's also an act of hubris - programmers doing things because they want to show how clever they are rather than getting the job done." - sbsmac
Written it twice? Put it in a function
Pre-compilation by the game engine can save up to 20x the processing time, even if the initial load takes slightly longer. If you've written it twice, or if the same code is being compiled over and over (perhaps a script run by execVM in a loop), make it into a function (FUNCVAR = compile preprocessFileLineNumbers "filename.sqf");
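A minimal sketch of the compile-once pattern described above. The function name TAG_fnc_markPosition and the file name "fn_markPosition.sqf" are placeholders made up for illustration, not part of any library:

```sqf
// Hypothetical names: TAG_fnc_markPosition and "fn_markPosition.sqf" are placeholders.
// Compile the file once, e.g. in init.sqf:
TAG_fnc_markPosition = compile preprocessFileLineNumbers "fn_markPosition.sqf";

// Every later use runs the already compiled code; the file is not compiled again.
[getPos player] call TAG_fnc_markPosition;
[[0,0,0]] call TAG_fnc_markPosition;

// By contrast, this compiles the file on every single execution:
// [getPos player] execVM "fn_markPosition.sqf";
```

The compile cost is paid once, so the saving grows with the number of calls.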
preprocessFileLineNumbers
The preprocessFileLineNumbers command remembers what it has done, so loading a file once will load it into memory. Therefore, if you want a function precompiled but not saved (for example, to refrain from using global variables), you could simply use:
call compile preprocessFileLineNumbers "file"
Remember that the only loss of performance will be the compile time of the string returned and then the call of the code itself.
Length
If a script or function is longer than around 200-300 lines, then perhaps (by no means true in all cases) you need to rethink the structure of the script itself: whether everything in it is within the scope of the required functionality, and whether it could be done in a cleaner, faster and better way.
Fewer statements => faster code
This may sound too obvious, but... optimise the code by removing redundant statements. The following code examples do the same thing, but the latter is 1.5 times faster:
_arr = [1,2];
_one = _arr select 0;
_two = _arr select 1;
_three = _one + _two;
_arr = [1,2];
_three  = (_arr select 0) + (_arr select 1);
NOTE: "fewer statements" refers to fewer statements that the engine needs to execute. Not to be confused with "fewer statements to write".
Variable Names
Scripts and functions that use long variable names will run more slowly than those with short names. Using cryptically short variable names is not recommended without explanatory comments.
_pN = "John Smith"; //this line executes in half the time of the line below
_playerNameBecauseThePlayerIsImportantAndWeNeedToKnowWhoTheyAreAllTheTimeEspeciallyInsideThisImpressiveFunction = "John Smith";
Conditions
if (_group knowsAbout vehicle _object > 0 && alive _object && canMove _object && count magazines _object > 0) then {
	//custom code
};
You may expect the engine to stop reading the condition as soon as the group has no knowledge of the object, but that's false. The engine will continue evaluating the condition until the end, even if one of the previous conditions evaluated to false.
if (_group knowsAbout vehicle _object > 0) then {
      if (alive _object && canMove _object && count magazines _object > 0) then {
            //custom code
      };
};
Now the engine will only continue reading the condition if the group has some knowledge about the object. Alternatively you can use lazy evaluation syntax. If normal evaluation syntax is (bool1 .. bool2 .. bool3 .. ...), lazy evaluation syntax is (bool1 .. {bool2} .. {bool3} .. ...). Now let's look at the above example using lazy evaluation:
if (_group knowsAbout vehicle _object > 0 && {alive _object} && {canMove _object} && {count magazines _object > 0}) then {
            //custom code
};
Using lazy evaluation is not always the best way, as it can speed up the code as well as slow it down, depending on the condition being evaluated:
['true || {false} || {false}'] call BIS_fnc_codePerformance; //fastest
['true || false || false'] call BIS_fnc_codePerformance; //normal
['false || false || false'] call BIS_fnc_codePerformance; //same as above
['false || {false} || {false}'] call BIS_fnc_codePerformance; //slowest
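The difference between normal and lazy evaluation can be made visible by counting how often a check actually runs. This is only an illustrative sketch: _fnc_isReady is a hypothetical check that always returns false, and the private keyword syntax assumes Arma 3.

```sqf
private _calls = 0;
// Hypothetical check that always fails and counts its own executions
private _fnc_isReady = { _calls = _calls + 1; false };

// Normal syntax: both operands are evaluated before && combines them
private _normal = (call _fnc_isReady) && (call _fnc_isReady);
private _callsNormal = _calls; // 2

_calls = 0;
// Lazy syntax: the code block on the right only runs if the left side is true
private _lazy = (call _fnc_isReady) && {call _fnc_isReady};
private _callsLazy = _calls; // 1

systemChat format ["normal: %1 calls, lazy: %2 calls", _callsNormal, _callsLazy];
```

The saving comes from the skipped check, so lazy syntax pays off most when the later conditions are expensive and the first one often fails.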
isNil
isNil String is quite a bit faster than isNil Code
var = 123; 
isNil "var";
// is faster than
isNil {var};
Make it pretty.
Documentation, readability, and all that jazz. Clean code is good code.
If Else If Else If Else ...
If you can't escape this using a switch control structure, then try to rethink the functionality, especially if only one option needs to match.
On the other hand, switch is slower than if then else. To keep the tidiness of switch and the speed of if, use if exitWith combined with call:
call {
	if (cond1) exitWith {/*code 1*/};
	if (cond2) exitWith {/*code 2*/};
	if (cond3) exitWith {/*code 3*/};
	//default code
};
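A concrete instance of the call/exitWith pattern, with made-up values: _distance and the three labels are assumptions chosen only for illustration.

```sqf
private _distance = 350; // hypothetical input value

// The first matching exitWith leaves the call scope and supplies its return value
private _range = call {
	if (_distance < 100) exitWith {"close"};
	if (_distance < 500) exitWith {"medium"};
	"far" // default, reached only if no condition matched
};

systemChat _range; // "medium"
```

Conditions after the first match are never evaluated, which is where the speed advantage over a long if/else chain comes from.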
if () then {} 
 is faster than 
 if () exitWith {} 
 is faster than 
 if () then {} else {} 
 or 
 if () then [{},{}]
However there is no noticeable difference in speed in the following:
_a = 0; if (true) then {_a = 1};
_a = if (true) then [{1},{0}];
_a = if (true) then {1} else {0};
Constants
Using a hard coded constant more than once? Use preprocessor directives rather than storing it in memory or cluttering your code with numbers. Such as:
a = _x + 1.053;
b = _y + 1.053;
And
_buffer = 1.053;
a = _x + _buffer;
b = _y + _buffer;
Becomes:
#define BUFFER 1.053
a = _x + BUFFER;
b = _y + BUFFER;
This also allows quick modification of the code, with the obvious loss of dynamics; but in that case it isn't a constant, is it?
Loops
These first two loop types are identical in speed (+/- 10%), and are more than 3x as fast as the following two loop types.
- for "_y" from # to # step # do { ... };
- { ... } foreach [ ... ];
Whereas these two loops are much slower and, for maximum performance, should be avoided.
- while { expression } do { code };
- for [{ ... },{ ... },{ ... }] do { ... }
waitUntil can be used when you want something to run only once per frame, which can be handy for limiting scripts that may be resource heavy.
- waitUntil {expression};
As requested: this information was gathered with CBA_fnc_benchmarkFunction, using around 10,000 iterations. It was not tested across different machines, and *may* vary between them (ArmA2 is special, remember :P):
fA = {
	private "_i";
	_i = 0;
	while {_i < 1000} do {
		_i = _i + 1;
		private "_t";
		_t = "0";
	};
};
fB = {
	for "_i" from 0 to 999 do { // 1000 iterations, matching the while loop in fA
		private "_t";
		_t = "0";
	};
};
This code then performs 10,000 tests and returns the average time taken for each function, measured via diag_tickTime.
[fA,[],10000] call CBA_fnc_benchmarkFunction;
[fB,[],10000] call CBA_fnc_benchmarkFunction;
10,000 Iterations Limit in Loops
A while do loop is limited to 10,000 iterations in the non-scheduled environment. In the scheduled environment no such limit applies.
Threads
The game runs in a scheduled environment, and there are two ways you can run your code: scheduled and non-scheduled.
Where the scope originates determines how the code is executed. Scheduled code is subject to delays between the engine's passes over the script, and execution times can depend on the load on the system at the time.
Some basic examples:
- Triggers are inside what we call the 'non-scheduled' environment;
- All pre-init code executions are without scheduling;
- FSM conditions are without scheduling;
- Event handlers (on units and in GUI) are without scheduling;
- SQF code called from SQS code runs without scheduling.
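One way to check which environment a piece of code runs in is the canSuspend command. The sketch below assumes Arma 3 v1.58 or later, where canSuspend is available; the "Fired" event handler is just an arbitrary example of non-scheduled code.

```sqf
// Scheduled: code started with spawn may be suspended, so canSuspend is true
[] spawn {
	diag_log format ["spawn: canSuspend = %1", canSuspend];
};

// Non-scheduled: event handler code runs to completion, so canSuspend is false
player addEventHandler ["Fired", {
	diag_log format ["Fired EH: canSuspend = %1", canSuspend];
}];
```

Commands such as sleep and waitUntil only work where canSuspend returns true.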
The 3ms run time
A scheduled script can only run for a maximum of 3 ms per frame before it is suspended, to be resumed on the next frame or even later. For more information see Scheduler.
When am I creating new threads?
The spawn/execVM/exec commands create small entries within the script scheduler. As the scheduler works through each entry individually, the delay before it returns to your script to proceed with the next line of your code can be very high (in high load situations, delays of up to a minute can be experienced!).
Obviously this is only an issue when your instances last longer than their execution time, i.e. spawned loops with sleeps that never end or that last a long time.
Avoid O(n^2)!!
Commonly you may set up nested forEach loops. For example:
{
	{ ...} foreach [0,0,0]; 
} foreach [0,0,0];
This example is of the order O(n^2) (3^2 = 9 iterations). For arrays that are twice as big, it will run 4 times slower, and for arrays that are 3 times as big, it will run 9 times slower! Of course, you don't always have a choice, and if one (or both) of the arrays is guaranteed to be small, it's not really as big of a deal.
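The quadratic growth can be checked by counting how often the inner code runs. A small sketch, assuming Arma 3 (params, private keyword); _fnc_countInnerRuns is a hypothetical helper written only for this illustration:

```sqf
// Hypothetical helper: counts executions of the inner loop body
private _fnc_countInnerRuns = {
	params ["_array"];
	private _runs = 0;
	{
		{ _runs = _runs + 1 } forEach _array;
	} forEach _array;
	_runs
};

[[0,0,0]] call _fnc_countInnerRuns;             // 9
[[0,0,0,0,0,0]] call _fnc_countInnerRuns;       // 36 (2x the size, 4x the work)
[[0,0,0,0,0,0,0,0,0]] call _fnc_countInnerRuns; // 81 (3x the size, 9x the work)
```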
Deprecated/Slow Commands
Adding elements to an array
- pushBack was added in Arma 3 1.26 and is currently the fastest command to push an element into an array; as of 1.29 it also returns the index of the element. Quick tests show it's around 2x faster than the set method below. Not to mention it is also easier to read.
_a pushBack _v
- set is around 2x faster than binary addition
_a set [count _a,_v]
Instead of:
_a = _a + [_v]
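Besides speed, the three methods differ in whether they modify the existing array or build a new one. A short sketch with illustrative variable names (Arma 3 syntax assumed):

```sqf
private _a = [1,2];
private _ref = _a;              // second reference to the same array

private _index = _a pushBack 3; // in place: _a and _ref are [1,2,3], _index is 2
_a set [count _a, 4];           // in place: _a and _ref are [1,2,3,4]

_a = _a + [5];                  // new array: _a is [1,2,3,4,5], _ref stays [1,2,3,4]
```

Binary addition has to copy every existing element into the new array, which is consistent with it being the slowest of the three.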
Removing elements from an array
deleteAt - Removes array element at the given index and returns removed element (modifies the original array, just like resize or set)
_array = [1,2,3];
_array deleteAt 1;
systemChat str _array; // -> [1,3]
Faster than...
When removing elements from an array in FIFO order, the set removal method works best, even though it makes a copy of the array.
ARRAYX set [0, objnull];
ARRAYX = ARRAYX - [objnull];
Combining arrays
- When adding an array to an existing array variable, append is fastest
arr1 = [1,2,3,4,5,6,7,8,9,0]; arr2 = arr1; arr1 append arr2;
//0.015 ms
arr1 = [1,2,3,4,5,6,7,8,9,0]; arr2 = arr1; arr1 + arr2;
//0.016 ms (Arma 3 after optimisation)
append modifies the existing array while "+" produces a copy, hence "+" is a little bit slower.
- When not saving the array to a variable, use +.
([veh1] + _array2) call BIS_fnc_setPitchBank
//0.004 ms
_array1 = [veh1];
_array1 append _array2;
_array1 call BIS_fnc_setPitchBank
//0.0054 ms
Comparing arrays
To compare arrays prior to Arma 3, use the following function:
KK_fnc_arraysAreEqual = {str (_this select 0) in [str (_this select 1)]};
Example:
hint str ([[1,2,[3]], [1,2,[3]]] call KK_fnc_arraysAreEqual); //true
In Arma 3 use the isEqualTo command.
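For comparison with the function above, a few isEqualTo calls (Arma 3 only). Note that isEqualTo compares strings case-sensitively:

```sqf
[1,2,[3]] isEqualTo [1,2,[3]]; // true, nested arrays are compared too
[1,2,[3]] isEqualTo [1,2,[4]]; // false
"abc" isEqualTo "ABC";         // false, string comparison is case-sensitive
```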
Comparing values by type
"a" isEqualType 0
//0.0009 ms
Is much faster than
typeName "a" == typeName 0
//0.0032 ms
Checking if array is []
The traditional (count _arr == 0) is pretty fast, but direct comparison with the new comparison command is a little faster: (_arr isEqualTo [])
count _arr == 0
// 0.0014 ms
_arr isEqualTo []
// 0.0013 ms
Position World is the fastest
getPosASL, getPosATL and visiblePositionASL are faster than getPos, position and visiblePosition. But the getPosWorld command, new to Arma 3, is the fastest of them all.
getPosWorld player
//0.0014 ms
visiblePositionASL player
//0.0014 ms
visiblePosition player
//0.0048 ms
getPos player
//or
position player
//0.005 ms
Config path delimiter
>> is slightly faster than / when used in config path with configFile or missionConfigFile, i.e.
configFile >> "CfgVehicles"
//0.0019 ms
is faster than
configFile/"CfgVehicles"
//0.0023 ms
Reusing configs
The delimiter >> (or /) is just a script command with 2 arguments, arg1 >> arg2, so constructing a config path simply chains several commands so that the result of one command becomes the argument of the next. Therefore, when repeated requests to a config are required, it makes sense to store the closest Config result in a variable for performance.
_cfgCar = configFile >> "CfgVehicles" >> "Car";
_access = _cfgCar >> "access";
_type = _cfgCar >> "type";
....
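The same idea applied to reading values in a loop. This is a sketch assuming Arma 3 (apply, private keyword); the two class names are only examples and any CfgVehicles classes would do:

```sqf
// Resolve the common part of the path once
private _cfgVehicles = configFile >> "CfgVehicles";

// Example class names; only the last two steps of each path are resolved per element
private _displayNames = ["B_MRAP_01_F", "B_Truck_01_transport_F"] apply {
	getText (_cfgVehicles >> _x >> "displayName")
};
```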
nearEntities vs nearestObjects
- nearEntities is much faster than nearestObjects, depending on the range and the number of objects within that range.
If the range is set to more than 100 meters, it is highly recommended to use nearEntities instead of nearestObjects.
Note: nearEntities only searches for objects which are alive. Killed units, destroyed vehicles, static objects and buildings will be ignored by the nearEntities command.
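For reference, the two commands take their arguments in a different order. A sketch with an arbitrary 300 m radius and example type names:

```sqf
// Living units and vehicles of the given types within 300 m of the player
private _entities = player nearEntities [["Man", "Car"], 300];

// nearestObjects equivalent: result is sorted by distance, nearest first,
// and is not restricted to alive entities
private _objects = nearestObjects [player, ["Man", "Car"], 300];
```

If neither sorting nor dead or static objects are needed, nearEntities avoids that extra work.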
forEach vs count
- Both commands step through the supplied array of elements one by one, and both hold a reference to the current element in the _x variable. However, a count loop is a little faster than a forEach loop, but it does not have the _forEachIndex variable, and the code inside count is expected to return Boolean or Nothing, while count itself returns a Number.
{diag_log _x} count [1,2,3,4,5,6,7,8,9];
//is faster than
{diag_log _x} forEach [1,2,3,4,5,6,7,8,9];
_someoneIsNear = {_x distance [0,0,0] < 1000} count allUnits > 0;
//is still faster than
_someoneIsNear = {
	if (_x distance [0,0,0] < 1000) exitWith {true};
	false
} forEach allUnits;
Filtering array with select {}
If you want to filter an array, you can loop through it with forEach/count and use if (condition) inside, OR use select {}.
In this example, testArray is filled with numbers from 0 to 1000 and our filter condition is that the number is even.
result = []; 
{ 
	if (_x % 2 == 0) then 
	{ 
		result pushBack _x; 
	}; 
} forEach testArray;
//2.57 ms
result = (testArray select {_x % 2 == 0});
//1.55 ms
So if you would like, for example, to add these even numbers up:
result = 0; 
{ 
	if (_x % 2 == 0) then 
	{ 
		result = result + _x; 
	}; 
} forEach testArray;
//2.79 ms
result = 0; 
{ 
	result = result + _x; 
} forEach (testArray select {_x % 2 == 0});
//2.44 ms
Filtering your base array with select {} will be faster.
for [] vs for "_i"
- One may think that "for [] do" is faster than "for "_i" from 1 to 10 step 1 do" because fewer keywords are used.
But the opposite is true.
for [{_x= 1},{_x <= 10},{_x = _x + 1}] do {true};
//0.0532 ms
for "_i" from 1 to 10 step 1 do {true};
//0.015 ms
format vs +
- When adding more than two strings, format is faster than +.
Adding 3 strings:
a = format ["Hi, my name is %1%2","bob, what's yours","?"]
//0.004 ms
a =  "Hi, my name is " + "bob, what's yours" + "?"
//0.0043 ms
Adding 2 strings:
a = format ["Hi, my name is %1","bob, what's yours?"]
//0.0038 ms
a =  "Hi, my name is " + "bob, what's yours?"
//0.0035 ms
Adding large strings together
For small strings a = a + b works fine; however, the bigger the string gets, the slower this becomes:
s = ""; for "_i" from 1 to 10000 do {s = s + "123"}; //30000 chars @ 290ms
The solution is to build the string as an array and then convert the array to a string:
s = []; for "_i" from 1 to 10000 do {s pushBack "123"}; s = s joinString ""; //30000 chars @ 30ms
select vs if
a = "You're " + (["a loser","awesome!"] select true)
//0.0046 ms
a = "You're " + (if true then [{"awesome!"},{"a loser"}])
//0.0054 ms
Checking if unit is on foot
isNull objectParent player
//0.0013 ms
is a little faster than traditional
vehicle player == player
//0.0022 ms
createVehicle(Local)
The position used by createVehicle(Local) is not exact, so you must use setPos anyway, and creating the object at the target position first is very slow. Creating the object at [0,0,0] and then setting the position is faster.
_obj = 'Land_Stone_4m_F' createVehicle [0,0,0]; //also createVehicleLocal
_obj setPos (getPos player); //0.03 ms (100 test cycles)
is 200 times faster than...
_obj = 'Land_Stone_4m_F' createVehicle (getPos player); //also createVehicleLocal
_obj setPos (getPos player); //5.9 ms (100 test cycles)
createSimpleObject vs createVehicle
createSimpleObject is over 43x faster than createVehicle!
createVehicle ["Land_VR_Shape_01_cube_1m_F",[0,0,0],[],0,"none"];// ~3.5 ms
createSimpleObject ["a3\structures_f_mark\vr\shapes\vr_shape_01_cube_1m_f.p3d",[0,0,0]];// ~0.08 ms
private ["_var"] vs private _var
private ["_a", "_b", "_c", "_d"];
_a = 1; _b = 2; _c = 3; _d = 4; 
// 0.0040 ms
is slower than
private _a = 1; private _b = 2; private _c = 3; private _d = 4; 
// 0.0023 ms
However,
private ["_a", "_b", "_c", "_d"];
for "_i" from 1 to 100 do
{
	_a = 1; _b = 2; _c = 3; _d = 4;
};
// 0.146327 ms
is usually faster than
for "_i" from 1 to 100 do
{
	private _a = 1; private _b = 2; private _c = 3; private _d = 4;
};
// 0.186776 ms
Resolve any script errors
If a command throws an error because of incorrect or illegal input, it will write this into the .rpt file regardless of whether or not -showScriptErrors is enabled. Many mission makers choose to disable on-screen errors; however, game performance may degrade significantly if the errors are not dealt with. Compare the following:
systemChat "123"; // execution time ~0.00271ms
systemChat 123; // obvious type error, execution time ~0.172206ms, 63 times slower!
How to test and gain this information yourself?
There are a few ways to measure run times inside ArmA2, mostly by taking the difference between two readings of the time itself. The CBA package includes a function for you to test with; however, if you are remaining addon free or cannot use it, the following code setup is just as effective, and allows different ways to retrieve the information (chat text, rpt file, clipboard):
_fnc_dump = {
	player globalchat str _this;
	diag_log str _this;
	//copytoclipboard str _this;
};
_t1 = diag_tickTime;
// ... code to test
(diag_tickTime - _t1) call _fnc_dump;
In Arma 3 you can simply use the built-in library function BIS_fnc_codePerformance, now integrated into the debug console as the speedometer button.
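A single diag_tickTime difference is too coarse for code that takes microseconds, so it helps to average over many runs. The following is a rough sketch in Arma 3 syntax; _fnc_measure is a hypothetical helper, not a BIS or CBA function, and the result is only meaningful when run non-scheduled (e.g. from the debug console), since a scheduled script may be suspended mid-measurement.

```sqf
// Hypothetical helper: average run time of _code in milliseconds
private _fnc_measure = {
	params ["_code", ["_cycles", 1000]];
	private _start = diag_tickTime;
	for "_i" from 1 to _cycles do { call _code };
	// diag_tickTime is in seconds; convert to average milliseconds per run
	(diag_tickTime - _start) / _cycles * 1000
};

private _ms = [{ [1,2,3] + [4,5,6] }, 10000] call _fnc_measure;
diag_log format ["average: %1 ms", _ms];
```

The figure includes the small overhead of the loop and the call itself, so it is best used to compare two variants rather than as an absolute number.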
