[sword-devel] imp2ld encoding problem
Troy A. Griffitts
scribe at crosswire.org
Tue Nov 22 17:29:36 MST 2005
Yiguang,
Have you tried opening your imp file with a text editor that
understands UTF-8? vi (vim) should do fine. If the data in the imp
file looks OK (and is indeed UTF-8), then you should be good to go
for sword. Actually, a web browser might be easiest: just open your
imp file in Firefox, manually select the UTF-8 encoding, and see
whether it shows up OK.
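If you want a yes/no answer instead of eyeballing the file, a strict
decode will report whether the bytes are well-formed UTF-8. Below is a
minimal Java sketch. The class and method names are hypothetical, and
it uses only standard java.nio calls:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Utf8Check {
    // Strict check: returns true only if every byte sequence is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on the first bad sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file.imp>");
            System.exit(2);
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "valid UTF-8" : "NOT valid UTF-8");
    }
}
```

Passing this check only shows that the bytes are well-formed UTF-8.
GB2312 or BIG5 text will usually fail it, but the check cannot tell
you which encoding the file actually uses.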
Hope we can get things working well for you,
-Troy.
Yiguang Hu wrote:
> Thanks Chris. I used UTF-8 in the .conf file. It
> didn't work.
> By trying different encodings, I mean I tried to use
> Word and a text editor to read the generated LD database
> (the *.dat file) by selecting different encodings (GB2312,
> BIG5, UTF-8, CN2202, etc.); none of them worked.
>
> I am not familiar with the C language (I assume imp2ld is
> a C program), since I have not coded in it for several
> years, so I don't know whether some hidden conversion
> takes place. Java does have some hidden conversions if
> encodings are not specified correctly. The heart of
> that problem is:
> String str = new String(byte[], ENCODING) / str = new String(byte[]);
> and byte[] bt = str.getBytes(ENCODING) / bt = str.getBytes().
> If ENCODING is not specified, the default encoding is
> picked up from the JVM environment, and it will
> corrupt data if the default encoding is ASCII (for
> example, under an en_US locale) while the data are
> actually DBCS or MBCS characters, such as Chinese, in
> whatever encoding. These conversions are very common
> in Java and can cause problems, for example when
> converting stream bytes into a string or writing a
> string to a file through a stream.
>
> Could there be a similar issue in C/C++?
>
> Thanks
> Yiguang
>
>
> --- Chris Little <chrislit at crosswire.org> wrote:
>
>
>> imp2ld faithfully converts an IMP file to an LD database. There is
>> no text encoding transformation of the data involved, so what you
>> put in your file is exactly what will be placed in the module and
>> is exactly what you will get back (from a front-end or mod2imp).
>>
>> The invalid character warning can be ignored. The only character
>> transformations that imp2ld performs relate to sorting the
>> dictionary keys, so the worst case would involve entries in the
>> wrong order. (Correct me if I'm wrong about this Troy.)
>>
>> I'm not sure what you meant about trying different encodings. Which
>> values did you try? The .conf file for your module should include a
>> line that says "Encoding=UTF-8" if you have UTF-8 input.
>>
>> --Chris
>>
>>Yiguang Hu wrote:
>>
>>> I ran into an encoding problem when I tried to use imp2ld to
>>> convert a Chinese theological terms encyclopedia into a module
>>> that sword can use. The input text file is UTF-8 encoded, with
>>> the format:
>>> $$$English KeyWord Chinese Translation
>>> The meaning of the term
>>> $$$....
>>> For example:
>>> $$$Abbess 女修道院長
>>> 為女修道院之女領袖,其職任不如男修道院長設立之早,其權亦不如男修道院長之大。有時亦管理男修道院。
>>> $$$Abbey 修道院
>>> 又稱*Monastery。原為一修道士團之名稱,由一位院長管理。以後他們所居住之屋宇、禮拜堂等,概稱為修道院。
>>> $$$Abbot 修道院長
>>> 為修道院領袖之稱,意即父也。修道院長原係平信徒,從第七世紀起,教會定為聖職。通常為其本院弟兄所選舉,其職任乃終身。
>>> $$$Abbot, George 阿波特(1562-1633)
>>> 英國教宗;坎特布里大主教;*聖經欽定本的合編者。
>>> $$$Abelard, Peter or Abailard 亞比拉(1079-1142)
>>>
>>> I used imp2ld to generate the module. There were many errors about
>>> invalid characters, but it nevertheless generated the module. The
>>> problem is that the module characters are saved in the wrong
>>> encoding. I tried reading them with different encodings, and none
>>> of them makes the characters understandable, as shown below:
>>> Abbey ä¿®éé¢
>>> ãå稱*Monasteryãåçºä¸ä¿®é士åä¹å稱ï¼ç±ä¸ä½é¢é·ç®¡çã以å¾ä»åæå±ä½ä¹å±å®ã禮æåçï¼æ¦ç¨±çºä¿®éé¢ã
>>> Abbot ä¿®éé¢é·
>>> ãçºä¿®éé¢éè¢ä¹ç¨±ï¼æå³ç¶ä¹ãä¿®éé¢é·åä¿å¹³ä¿¡å¾ï¼å¾ç¬¬ä¸ä¸ç´èµ·ï¼ææå®çºèè·ãé常çºå¶æ¬é¢å¼åæé¸èï¼å¶è·ä»»ä¹çµèº«ã
>>> Abbot, George é¿æ³¢ç¹ï¼1562-1633ï¼
>>>
>>> Has anyone experienced this, and does anyone know how to solve
>>> this problem?
>>>
>>> BTW, I have a couple of short Java programs that generate the
>>> above dictionary file format and Bible text, so you can use imp2vs
>>> and imp2ld to convert them into sword modules. I will be glad to
>>> put the code somewhere to share if anyone is interested in it.
>>>
>>> Thanks
>>> Yiguang
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
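Yiguang's point about hidden conversions can be checked against the
garbled output quoted above. The Java sketch below (class and method
names are hypothetical) encodes 修, the first character of the Abbey
key, as UTF-8 and then decodes those bytes as ISO-8859-1. 修 is
U+4FEE, which UTF-8 encodes as E4 BF AE, and ISO-8859-1 maps those
three bytes to ä, ¿ and ®, the same "ä¿®" that begins the garbled
Abbey line. That pattern is consistent with Chris's statement that
imp2ld stores the bytes unchanged and with the text being decoded
with the wrong charset when it is read back, although the sample
alone does not prove it.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode as UTF-8, then decode the same bytes as ISO-8859-1 (one char per byte).
    // This mismatch turns each Chinese character into two or three Latin-1 characters.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same explicit charset: the text survives intact.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String key = "\u4FEE"; // 修 (U+4FEE), UTF-8 bytes E4 BF AE
        String garbled = misdecode(key);
        // Print code points rather than glyphs so the output does not depend on the console charset.
        garbled.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();                         // line 1: U+00E4 U+00BF U+00AE
        System.out.println(roundTrip(key).equals(key)); // line 2: true
    }
}
```

Passing the charset explicitly on both sides, as in roundTrip, avoids
depending on the JVM default charset that Yiguang describes.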