Language Selection – Talk

From Bohemia Interactive Community
Latest revision as of 03:06, 21 July 2006

Local Code Pages mess

OFP uses ASCII-formatted text documents and expects code to be provided as ASCII strings. For the primary western languages, the 'Latin' code pages generally handle this appropriately. However, when you try to branch out your translations into non-Latin languages, that's where it breaks. Your stringtable files are actually stored as numbers representing the intended characters. Windows then matches the numbers read from the file against a look-up table (the code page) to pick a character to send to OFP, which then picks a texture portion to render as text. A lot of the non-Latin code pages sort their extended diacritical marks differently, leading to chaos.

An example of this would be attempting to write a stringtable file for English and Czech on a machine using a US/UK code page. If the Czech characters look correct when Windows renders the file using the western code page, they will be scrambled if you attempt to read that same file on a machine using the Czech code page, and vice versa. Note that this doesn't significantly impact the Latin languages of English, French, German, Spanish, and Italian, but it seriously impacts non-Latin languages such as the Scandinavian and especially the Slavic languages.
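To make the mismatch concrete, here is a minimal Python sketch. It assumes the Czech machine saves the file as Windows-1250 and the western machine reads it as Windows-1252; the sample sentence is only an illustration and is not taken from any OFP stringtable.

```python
# Czech sample text; the bytes on disk depend on the code page used to save it.
text = "Příliš žluťoučký kůň"

# Saved on a machine using the Central European code page (assumed: Windows-1250).
raw = text.encode("cp1250")

# Read back on a machine that assumes the Western code page (assumed: Windows-1252).
# errors="replace" covers byte values that Windows-1252 leaves undefined.
garbled = raw.decode("cp1252", errors="replace")

print(text)
print(garbled)

# Plain ASCII survives the round trip because both code pages agree below 0x80.
ascii_only = "Mission complete"
ascii_round_trip = ascii_only.encode("cp1250").decode("cp1252")
```

The same byte values are simply looked up in a different table, so only the characters above 0x7F change.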

Furthermore, the limited character resolution means that there will be no practical solution for East Asian languages in OFP. Workarounds can be made by using resource pictures in place of textual overlays. Additionally, OFP supports right-justification in resources, but does not have proper RTL string handling capabilities.

For the reasons mentioned above, it's strongly recommended that a Unicode spreadsheet application be used to build and edit the stringtables, with a final export to the text stringtable.csv files.
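The export step could be sketched roughly as follows in Python. This is not an official tool: the function name `export_text`, the default code page `cp1250`, and the sample row are all assumptions made for illustration. The idea is only that the Unicode master is converted to a single-byte code page at the very end, and that the conversion fails loudly if a character has no slot in that code page.

```python
def export_text(unicode_text: str, codepage: str = "cp1250") -> bytes:
    """Encode Unicode text into a single-byte code page (hypothetical helper).

    Raises ValueError naming the first character the code page cannot hold,
    instead of silently writing a wrong byte.
    """
    try:
        return unicode_text.encode(codepage)
    except UnicodeEncodeError as err:
        bad = unicode_text[err.start:err.end]
        raise ValueError(f"{bad!r} cannot be stored in {codepage}") from err


# Hypothetical stringtable row: identifier, English text, Czech text.
row = "STR_HELLO,Hello,Dobrý den"
print(export_text(row))
```

Failing on unsupported characters is a deliberate choice here: a translator finds out at export time rather than in game.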

Shinraiden 06:23, 20 July 2006 (CEST)

Note - A Korean version of OFP exists; it was published by Infogrames Korea. I have no idea whether or where this version can still be obtained. --Suma 12:43, 20 July 2006 (CEST)

Making a bold but fairly confident statement: this entire issue would be solved by using Unicode text strings, as already employed for some sections of Elite (SaveMeta, blah). UTF-8 is fine, and the Notepad editor already supports Unicode transparently. The 'user' is unaware of it when editing.
As for codepages, I'm on shakier ground. The 'standard' for Windows is winansi (codepage 1252), not a us/ansi derivative (850, etc). But, yes, the principle is right: the Slavic codepage is 1258 from memory, not sure and it doesn't matter, it's different, as are central European (iso-8859-13). UTF-8 would _mostly_ solve this issue, as the lead bytes dictate what to use, while maintaining a fairly low memory profile of single-byte characters (not true for Korean of course). Actually, I'm just rambling, but this is a talk page <grin>. The answer is Unicode, because for Windows at least, just about every font available (in Windows) supports the full range of characters. I've played a few 'Cyrillic' missions and the text, to say the least, is 'interesting'. Be nice to sort this out once and for all.
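The point about lead bytes can be illustrated with a short Python sketch. The helper name `sequence_length` and the sample characters are assumptions chosen for the example. It shows that the first byte of a UTF-8 sequence announces how many bytes follow: ASCII stays at one byte, while accented Latin and Cyrillic letters take two and Korean syllables take three.

```python
def sequence_length(lead: int) -> int:
    """Return the length of a UTF-8 sequence from its lead byte (hypothetical helper)."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: two-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: three-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: four-byte sequence
        return 4
    raise ValueError("not a UTF-8 lead byte")


# One ASCII letter, one Czech letter, one Cyrillic letter, one Korean syllable.
for ch in "AřЖ한":
    encoded = ch.encode("utf-8")
    print(ch, encoded.hex(), sequence_length(encoded[0]))
```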

--ook? 02:20, 21 July 2006 (CEST)

Whatever mechanism is used for the official Korean ports, and the rumored unofficial Japanese and Chinese ports, is a painful hack at best. This goes back not only to this issue of codepages and the lack of multi-byte support, but also to the nature of the font system. In OFP, the fonts are done as textures, with an index file serving as a pseudo-codepage matching characters to texture map regions, which are then rendered. That's unfortunate, and means that the necessary accents and diacriticals may not be available for some European languages, and certainly not for unannounced languages. I recall some effort was made to reverse-engineer the font mapper file format, but I don't know of any significant efforts to backdoor more language support in. Ideally it would be nice if Unicode strings could be rendered as-is; there are plenty of DirectX demos that show this in action. At any rate, this still doesn't resolve the unrelated RTL matters, as well as the related UI design considerations. Shinraiden 04:04, 21 July 2006 (CEST)
Clarification: For the reasons addressed above, you should edit and maintain your CSV files in a Unicode-capable spreadsheet, but just make sure to export to text before game testing or release distribution. Shinraiden 04:06, 21 July 2006 (CEST)