[sword-devel] derivation of the Chinese module

Matthew Patenaude mnglfiddle at gmail.com
Mon Jul 25 06:04:10 MST 2011

In reply to the following:

>Thanks Chris,

>You asked - Has any "expert Chinese reader" actually noted any problem?

>Last October, I had some discussions with a Chinese friend who had noted
>numerous issues in our previous ChiUns module. I'm trying to re-establish
>contact with him.  If he doesn't respond soon, I will discuss this topic
>with some of my other Chinese friends.

>In theory at least, there may be issues with mechanical conversion,
>particularly if this is done only at the individual codepoint level. The
>need for "one to many" replacements, as well as to go to higher levels than
>individual characters could have serious consequences, including some
>instances where the meaning is changed.

>Further to this, as the module has Strongs and other markup, this may mean
>that doing the conversion on the source text (OSIS) will cause the semantic
>context to be obscured by the interspersing of XML elements.

>Thanks for the lead about the built in tool in Mac OS X. While I was
>Googling, I came across http://openvanilla.org/

>"OpenVanilla is a collection of popular Chinese, Taiwanese, Japanese and
>symbol input methods and language tools. It is also easy to customize or
>create your own input methods with OpenVanilla's flexible design. Available
>on Mac OS X and Windows."

>Evidently, the Mac OS X Chinese text converter is based on this.
>I'm also pursuing some other lines of research (another conversion tool),
>though not yet ready to report on the outcome.

>Finally, what MediaWiki uses for this task is freely available as a PHP
>script, though I have not looked further than this page yet.


To which module is this discussion referring, exactly? I would be happy to
look up some passages and see whether I notice any particular errors in the
conversion. However, I do know that any mechanical conversion should be
checked carefully by a native expert. Chinese speakers do use automated
tools, but they will also tell you that those tools leave a lot that has to
be corrected by hand. It is not a simple one-to-one conversion, since in some
cases more than one traditional character was simplified to the same
simplified character. The conversion from simplified to traditional
("complex") is always somewhat more problematic, but the other direction also
frequently leaves problems.
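
As a rough illustration of why the conversion is not one-to-one, here is a
minimal Python sketch. The three-entry table and the function names are my
own, made up for this example; they are not taken from the module, from
OpenVanilla, or from the MediaWiki script, and a real table would be far
larger. It only shows that going from simplified to traditional can leave
several candidates for one character, while the reverse lookup is unambiguous
for these same characters.

```python
# Hypothetical, deliberately tiny table: one simplified character mapped to
# the traditional characters that were merged into it.
SIMPLIFIED_TO_TRADITIONAL = {
    "发": ["發", "髮"],        # to send/emit vs. hair
    "后": ["後", "后"],        # after/behind vs. queen/empress
    "干": ["乾", "幹", "干"],  # dry vs. to do/trunk vs. shield
}


def ambiguous_positions(text):
    """List (index, character, candidates) for each character in `text`
    that has more than one traditional candidate in the table."""
    return [
        (i, ch, SIMPLIFIED_TO_TRADITIONAL[ch])
        for i, ch in enumerate(text)
        if len(SIMPLIFIED_TO_TRADITIONAL.get(ch, [])) > 1
    ]


def build_reverse_table(table):
    """Invert the table: each traditional character maps to exactly one
    simplified character (true for this sample table only)."""
    reverse = {}
    for simplified, traditionals in table.items():
        for traditional in traditionals:
            reverse[traditional] = simplified
    return reverse


TRADITIONAL_TO_SIMPLIFIED = build_reverse_table(SIMPLIFIED_TO_TRADITIONAL)

# Every ambiguous spot needs a decision from context, ideally by a reader.
for index, char, candidates in ambiguous_positions("头发"):
    print(index, char, candidates)
```

A codepoint-level converter has to pick one candidate at each of those
positions without looking at the surrounding word, which is exactly where a
human check is needed.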

