[sword-devel] Chinese "words"

Troy A. Griffitts sword-devel@crosswire.org
Fri, 27 Jun 2003 19:11:40 -0700


Hey Frank,
	Sorry for the sarcastic email :)  I hope you took it in light spirits. 
  The truth is, we've been talking about changing the 'layout engine' 
for quite some time.  We've been wanting to change from Microsoft's RTF 
RichEdit32 control to use gecko, but we've been quite frustrated getting 
help linking it into an app with Borland's C++Builder.  I've actually 
had it working with the ActiveX interface, but it didn't seem to give us 
any control over the DOM of the document.

	What I'd like is to be able to get a reference to whichever Node is 
currently under the mouse position.  If we can use the ActiveX control for this, 
then I'd appreciate some help if you have any insights. Otherwise, I 
really would like a silly example app that just links to gecko and 
displays a basic "hello world" rendered text area, _using Borland 
C++Builder_.


	It would be a great contribution to our project, as this is one of the 
biggest tasks slated for a next release.

	Again, apologies for the sarcasm,
		-Troy.



YTang0648@aol.com wrote:
> In a message dated 27/06/2003 10:41:34 Pacific Daylight Time, 
> crenz-swordproject@web42.com writes:
> 
>     Sorry for being away for most of this month... am working my way
>     through 200+ sword-related e-mails and saw this one:
> 
>      >NEW CHINESE TEXTS:  It seems in our older Union texts, we added
>     spaces
>      >between every character to help with line wraps and word breaks. 
> 
> I think the right thing to do is to change your layout engine to support 
> correct Chinese line wrapping, instead of adding space (which should not 
> be there) to work around the limitation in the layout engine.
> 
>     Is
>      >this needed in the new NCV texts?  It seems they have spaces
>     included at
>      >certain places. 
> 
>     Chinese texts usually don't have spaces except after punctuation
>     marks. 
> 
> Nor is there a space after punctuation. No spaces, period.
> 
>     I'll install NCV and take a look at the spaces it has.
> 
>      >I noticed this using the Hanzi dictionary which always
>      >tried to look up a 'word' instead of an individual glyph.
> 
> Chinese does have the concept of a "word", but it is very different from 
> the concept of a word in Latin-script languages.
> First of all, spaces are not used to separate words.
> Second, there is no easy way to parse a word.
> Third, a word can be a single character or be composed of 2-6 characters.
> Fourth, there are compound words, so sometimes there is no easy way to 
> tell the boundary of a word even if you are a native Chinese speaker.
>  
> Google implements very good Chinese search. Maybe you should look at how 
> they handle search.
> 
> 
>     I didn't do anything to make it look up a 'word'; in fact I don't know
>     how to make it look up an individual glyph only ;-). It is often not
>     very useful to look up only one character (imagine looking up "foot"
>     and "ball" vs. looking up "football": the first lets you guess at
>     the meaning, but the second gives the exact information). So it should
>     be possible to select a few characters and look them up in the
>     dictionary with the mouse or keyboard. However, for "standard lookup"
>     (i.e. without text being selected) looking up the current character
>     only instead of the whole 'word' probably would be more useful, since
>     with most modules the 'word' is going to be the whole line.
> 
>     Greetings,
>        Christian
>
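
For what it's worth, the "standard lookup" idea discussed above (start at the
character under the cursor, but prefer a longer dictionary entry when one
exists) can be sketched as a simple forward longest-match. This is only an
illustration, not how SWORD or Google does it: the function name
longest_match_at, the sample dictionary, and the sample sentence are all
hypothetical, and the 6-character cap is taken from the "2-6 characters"
remark in the quoted message.

```python
def longest_match_at(text, pos, dictionary, max_len=6):
    """Return the longest dictionary entry that starts at text[pos].

    Falls back to the single character at pos when no multi-character
    entry matches. max_len=6 is an assumption, not a rule of the language.
    """
    longest_possible = min(max_len, len(text) - pos)
    # Try the longest candidate first, shrinking down to two characters.
    for length in range(longest_possible, 1, -1):
        candidate = text[pos:pos + length]
        if candidate in dictionary:
            return candidate
    # No multi-character entry found: look up the single character.
    return text[pos]


# Hypothetical sample data, for illustration only.
sample_dictionary = {"足球", "足", "球"}
sample_text = "我喜欢足球"

word_at_cursor = longest_match_at(sample_text, 3, sample_dictionary)
```

Longest-match is a crude heuristic and will mis-segment compound words, as
noted above, so an explicit user selection should still override it.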