<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=windows-1252"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Chris,<br>
<br>
I imagine that sorting by Unicode codepoint order works for most
languages, but it doesn't for Vietnamese. Most Vietnamese letters are
basic Latin characters, but some (such as "đ" and vowels with
diacritics) have higher codepoints and so sort after "z". <br>
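<br>
To illustrate, here is a minimal Python sketch (not SWORD or imp2ld
code). The word list, <code>VI_ALPHABET</code>, <code>TONE_MARKS</code>
and <code>vi_sort_key</code> are my own hypothetical names, and the key is
a simplified approximation of Vietnamese dictionary order, not a full
collation: <br>
<pre wrap="">
```python
import unicodedata

# Assumption: Vietnamese base alphabet in dictionary order (no tone marks).
VI_ALPHABET = "aăâbcdđeêghiklmnoôơpqrstuưvxy"
# Assumption: combining tone marks ordered huyền, hỏi, ngã, sắc, nặng.
TONE_MARKS = "\u0300\u0309\u0303\u0301\u0323"


def vi_sort_key(word):
    """Simplified key: compare base letters first, tones only as a tie-breaker."""
    letters, tones = [], []
    for ch in unicodedata.normalize("NFC", word.lower()):
        tone = 0
        base = ""
        # Decompose the character and split off any tone mark.
        for part in unicodedata.normalize("NFD", ch):
            if part in TONE_MARKS:
                tone = TONE_MARKS.index(part) + 1
            else:
                base += part
        # Recompose so that a + breve becomes "ă" again, and so on.
        base = unicodedata.normalize("NFC", base)
        if len(base) == 1 and base in VI_ALPHABET:
            letters.append((1, VI_ALPHABET.index(base)))
        else:
            # Spaces and anything outside the alphabet sort before letters.
            letters.append((0, ord(base[0]) if base else 0))
        tones.append(tone)
    return (letters, tones)


# Hypothetical sample keys, normalized so the codepoints are predictable.
words = [unicodedata.normalize("NFC", w)
         for w in ["cá", "đá", "da", "ác bá", "an", "ăn", "xa"]]

codepoint_order = sorted(words)             # plain codepoint comparison
vi_order = sorted(words, key=vi_sort_key)   # simplified Vietnamese order

print(codepoint_order)  # ['an', 'cá', 'da', 'xa', 'ác bá', 'ăn', 'đá']
print(vi_order)         # ['ác bá', 'an', 'ăn', 'cá', 'da', 'đá', 'xa']
```
</pre>
In the codepoint-sorted list, "ác bá", "ăn" and "đá" all land after
"xa", which matches what I am seeing in the module. <br>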
<br>
This is probably very low on the priority list, and I'm not sure how
much work it would involve, but I would suggest at some point adding
a command-line option to imp2ld that either (1) sorts the keys by
Unicode codepoint order (the default) or (2) retains the order of
the IMP file (no sorting at all). That way, languages that do not
alphabetize well by Unicode codepoint order can remain in
alphabetical order (assuming the module creator sorted the entries correctly). <br>
<br>
Daniel<br>
<br>
Chris Little wrote:
<blockquote cite="mid4724D3F7.3020006@crosswire.org" type="cite">
<pre wrap="">Daniel,
The order of keys in an LD module is according to the codepoint order in
Unicode. The keys are kept in this order to permit binary
searching. There is currently no way to perform localized collation.
The platform and locale shouldn't play a role in this. If they do, it's
a bug.
--Chris
Daniel Owens wrote:
</pre>
<blockquote type="cite">
<pre wrap="">I am working on creating dictionary modules based on the Free Vietnamese
Dictionary Project. The Vietnamese-English dictionary is working, but
some words are not in alphabetical order, and I am trying to find out
how to maintain the original alphabetization.
I noticed this when all of the words beginning with a vowel having
diacritics/tones or beginning with a "đ" were sorted to the end of the
dictionary. The DAT file maintains the original order, which is more
accurate. It must be that the IDX file generated by imp2ld creates its
own index and alphabetizes according to its own scheme. The entries of
each word are tagged as ThML. Here is a slightly random entry:
$$$ác bá
<entry key="ác bá" type="main" id="n20"><b>ác bá</b><br />[noun]<br />-
Cruel landlord, village tyrant<br /></entry>
Is there a way to keep imp2ld from changing the order of the index? I am
happy to send someone the IMP file if that helps. I pasted the CONF file
at the bottom of this message.
Daniel
CONF File:
[VietAnh]
DataPath=./modules/lexdict/rawld4/vietanh/vietanh
ModDrv=RawLD4
Encoding=UTF-8
SourceType=THML
SwordVersionDate=2007-10-27
Version=1.0
Lang=vi
Description=FVDP Vietnamese-English Dictionary
About=- This is the Vietnamese-English dictionary database of the Free
Vietnamese Dictionary Project. It contains more than 23.400 entries with
definitions and illustrative examples.\par\par- This database was
compiled by Ho Ngoc Duc and other members of the Free Vietnamese
Dictionary Project
(<a class="moz-txt-link-freetext" href="http://www.informatik.uni-leipzig.de/~duc/Dict/">http://www.informatik.uni-leipzig.de/~duc/Dict/</a>)\par\par- Copyright (C)
1997-2003 The Free Vietnamese Dictionary Project\par\par- This program
is free software; you can redistribute it and/or modify it under the
terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version. This program is distributed in the hope that
it will be useful, but WITHOUT ANY WARRANTY; without even the implied
warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
TextSource=<a class="moz-txt-link-freetext" href="http://www.informatik.uni-leipzig.de/~duc/Dict/">http://www.informatik.uni-leipzig.de/~duc/Dict/</a>
_______________________________________________
sword-devel mailing list: <a class="moz-txt-link-abbreviated" href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>
<a class="moz-txt-link-freetext" href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a>
Instructions to unsubscribe/change your settings at above page
</pre>
</blockquote>
</blockquote>
</body>
</html>