Language Selection – Talk

Local Code Pages mess

OFP uses ASCII-formatted text documents and expects code to be provided as ASCII strings. For the primary western languages, the 'Latin' code pages generally handle this appropriately. However, when you try to branch out your translations into non-Latin languages, that's where it breaks. Your stringtable files are actually stored as numbers representing the intended characters. Windows matches the numbers read from the file against a look-up table (the code page) to pick a character to send to OFP, which then picks a texture portion to render as text. A lot of the non-Latin code pages arrange their extended diacritical marks differently, leading to chaos.

An example of this would be attempting to write a stringtable file for English and Czech on a machine using a US/UK code page. If the Czech characters look correct when Windows renders the file using the western code page, they will be scrambled when that same file is read on a machine using the Czech code page, and vice versa. Note that this doesn't significantly impact the Latin languages of English, French, German, Spanish, and Italian, but it seriously impacts non-Latin languages such as the Scandinavian and especially the Slavic languages.
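A rough sketch of the mismatch described above, in Python. It is only an illustration: the helper name `render` and the sample word are made up here, and OFP itself is not involved. It just decodes the same stored bytes under three Windows code pages.

```python
import unicodedata

def render(raw_bytes, code_page):
    """Decode bytes the way a machine set to the given code page would."""
    return raw_bytes.decode(code_page)

# One stored number (byte 0xE8), three different look-up tables.
raw = bytes([0xE8])
for cp in ("cp1252", "cp1250", "cp1251"):
    # Print the Unicode character name so the output stays plain ASCII.
    print(cp, unicodedata.name(render(raw, cp)))

# A Czech word saved on a machine using the Central European code page...
stored = "\u010desky".encode("cp1250")   # "cesky" with a caron on the c
# ...then read back on a machine using the Western code page.
misread = render(stored, "cp1252")       # the caron c turns into an e-grave
```

The bytes in the file never change; only the table used to interpret them does, which is why the same file looks right on one machine and scrambled on another.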

Furthermore, the limited character resolution means that there will be no practical solution for East Asian languages in OFP. Workarounds can be made by using resource pictures in place of textual overlays. Additionally, OFP supports right-justification in resources, but does not have proper RTL string handling capabilities.

For the reasons mentioned above, it's strongly recommended that a Unicode spreadsheet application be used to build and edit the stringtables, with a final export to the text stringtable.csv files.
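The export step could look something like the following Python sketch. Everything specific in it is an assumption for illustration: the `LANGUAGE` header row, the `STR_START` key, the sample translations, and the helper name `export_stringtable`. The point is only that the master copy stays in Unicode and each export is encoded to one explicit code page, failing loudly if a character does not fit.

```python
import csv
import io

# Hypothetical rows as they might come out of a Unicode spreadsheet.
ROWS = [
    ["LANGUAGE", "English", "Czech"],
    ["STR_START", "Start", "Za\u010d\u00edt"],
]

def export_stringtable(rows, code_page):
    """Serialize rows as CSV text, then encode to a single code page."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\r\n").writerows(rows)
    # errors="strict" raises instead of silently substituting characters.
    return buffer.getvalue().encode(code_page, errors="strict")

# The Central European code page can hold the Czech column.
central_european = export_stringtable(ROWS, "cp1250")

# The Western code page has no c-caron, so the export is refused.
try:
    export_stringtable(ROWS, "cp1252")
    western_ok = True
except UnicodeEncodeError:
    western_ok = False
```

In a real workflow the returned bytes would be written to stringtable.csv in binary mode, so that nothing re-encodes them on the way to disk.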

Shinraiden 06:23, 20 July 2006 (CEST)

Note - a Korean version of OFP exists; it was published by Infogrames Korea. I have no idea if or where this version can still be obtained. --Suma 12:43, 20 July 2006 (CEST)

Making a bold, but fairly confident statement, this entire issue would be solved by using Unicode text strings, as already employed for some sections of Elite (SaveMeta, blah). UTF-8 is fine, and the Notepad editor already supports Unicode transparently. The 'user' is unaware of it when editing.

As for codepages, I'm on shakier ground. The 'standard' for Windows is WinANSI (codepage 1252), not a DOS/OEM one (850, etc.). But, yes, the principle is right: the Cyrillic codepage is 1251 from memory, not sure and it doesn't matter, it's different, as is Central European (1250, or ISO-8859-2 outside Windows). UTF-8 would _mostly_ solve this issue, as the lead bytes dictate what to use, while maintaining a fairly low memory profile of single-byte characters for plain ASCII text (not true for Korean, of course). Actually, I'm just rambling, but this is a talk page <grin>. The answer is Unicode, because, for Windows at least, just about every font available supports the full range of characters. I've played a few 'cyrillic' missions and the text, to say the least, is 'interesting'. Be nice to sort this out once and for all.
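To put rough numbers on the lead-byte point, here is a small Python sketch. The helper name `utf8_profile` and the sample characters are picked here purely for illustration. It reports the first byte and the total byte count of a single character in UTF-8.

```python
def utf8_profile(ch):
    """Return (lead byte as hex text, total byte count) for one character."""
    encoded = ch.encode("utf-8")
    return f"0x{encoded[0]:02X}", len(encoded)

# Sample characters, written as escapes so the listing stays plain ASCII.
samples = {
    "Latin a": "a",
    "Czech c-caron": "\u010d",
    "Cyrillic i": "\u0438",
    "Korean han": "\ud55c",
}

for label, ch in samples.items():
    lead, size = utf8_profile(ch)
    print(f"{label}: lead byte {lead}, {size} byte(s)")
```

Plain ASCII stays at one byte, the Czech and Cyrillic samples take two, and the Korean sample takes three, so the memory profile only stays single-byte for unaccented Latin text.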

--ook? 02:20, 21 July 2006 (CEST)