[sword-devel] Chinese "words"

Troy A. Griffitts sword-devel@crosswire.org
Fri, 27 Jun 2003 16:26:01 -0700


Frank,
	Thanks.  So when will you have gecko linked into an example app 
compiled with the free version of Borland C++Builder?

Here are the links to get the software:

http://www.forum.nokia.com/main/1,,030,00.html?fsrParam=1%2D3&fileID=2879

That link includes the nokia phone plugins, as well, so maybe you could 
get gecko to build for that platform, while you're at it.

Please also feel free to offer any useful suggestions you might have for 
us in the meantime.

	-Troy.

;)





YTang0648@aol.com wrote:
> In a message dated 27/06/2003 10:41:34 Pacific Daylight Time, 
> crenz-swordproject@web42.com writes:
> 
>     Sorry for being away for most of this month... am working my way
>     through 200+ sword-related e-mails and saw this one:
> 
>      >NEW CHINESE TEXTS:  It seems in our older Union texts, we added spaces
>      >between every character to help with line wraps and word breaks.
> 
> I think the right thing to do is to change your layout engine to support 
> correct Chinese line wrapping, instead of adding space (which should not 
> be there) to work around the limitation in the layout engine.
> 
>      >Is this needed in the new NCV texts?  It seems they have spaces
>      >included at certain places.
> 
>     Chinese texts usually don't have spaces except after punctuation
>     marks. 
> 
> There is no space after punctuation either. No spaces, period.
> 
>     I'll install NCV and take a look at the spaces it has.
> 
>      >I noticed this using the Hanzi dictionary which always
>      >tried to lookup a 'word' instead of an individual glyph.
> 
> Chinese does have the concept of a "word". But that is very different from 
> the concept of a Latin word.
> First, spaces are not used to separate words.
> Second, there is no easy way to parse a word.
> Third, a word can be a single character or be composed of 2-6 characters.
> Fourth, there are compound words, so sometimes there is no easy way to 
> tell the boundary of a word even if you are a native Chinese speaker.
>  
> Google implements very good Chinese search. Maybe you should look at how 
> they do the search job.
> 
> 
>     I didn't do anything to make it look up a 'word'; in fact I don't know
>     how to make it look up an individual glyph only ;-). It is often not
>     very useful to look up only one character (imagine looking up "foot"
>     and "ball" vs. looking up "football". The first lets you somewhat guess
>     the meaning, but the second gives the exact information). So it should
>     be possible to select a few characters and look them up in the
>     dictionary with the mouse or keyboard. However, for "standard lookup"
>     (i.e. without text being selected) looking up the current character
>     only instead of the whole 'word' probably would be more useful, since
>     with most modules the 'word' is going to be the whole line.
> 
>     Greetings,
>        Christian
>
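
For illustration only, the lookup problem discussed above (no spaces
between words, words of one to six characters, looking up the current
character vs. a longer unit) can be sketched with a greedy forward
longest-match against a dictionary. This is a minimal sketch under stated
assumptions, not how SWORD or any existing frontend does it: the names
TOY_LEXICON, segment and word_at are hypothetical, the lexicon is a
three-entry toy, and the six-character limit is taken from the range
mentioned in the quoted message. Greedy matching will also get ambiguous
compound words wrong, as the message warns.

```python
# Hypothetical toy lexicon; a real one would come from a dictionary module.
TOY_LEXICON = {"上帝", "世人", "足球"}


def segment(text, lexicon, max_len=6):
    """Split text into words by greedy forward longest-match.

    At each position the longest dictionary entry (up to max_len
    characters) is taken; if none matches, the single character is
    emitted as its own word.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def word_at(text, pos, lexicon, max_len=6):
    """Return the segmented word that covers character index pos."""
    start = 0
    for word in segment(text, lexicon, max_len):
        if start <= pos < start + len(word):
            return word
        start += len(word)
    raise IndexError(pos)
```

With something like this, a "standard lookup" without a selection could
pass the cursor position to word_at and fall back to the single character
when no longer dictionary entry covers it, while an explicit selection
would still be looked up as typed.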