[sword-devel] Website - module lists

DM Smith dmsmith555 at yahoo.com
Wed Dec 17 07:03:13 MST 2008


On Dec 16, 2008, at 9:16 PM, Chris Little wrote:

>
>
> Peter von Kaehne wrote:
>> Language:af means little - Afrikaans would be nicer. How can I get
>> there?
>
> See http://www.sil.org/iso639-3/download.asp
>
> The official ISO 639-3 to English list is there under Language Names  
> Index.
>
> I know that GS has a localized equivalent to this, but I'm not sure  
> where it came from.

JSword has grokked it from SIL. I update it once or twice a year, since
the list is not static.

This could be readily used for the web module lister pages, since both  
are Java.

You can get it from here (there are several representations):
http://www.crosswire.org/svn/jsword/trunk/common/src/main/java/org/crosswire/common/util/
iso639_all.txt        All the values, over 7000 of them. A bit much
                      to use directly.
iso639.properties     English. A subset of iso639_all.txt.
iso639_de.properties  German
iso639_fa.properties  Farsi/Persian
iso639_vi.properties  Vietnamese

A couple of things to note:
At the top of iso639_all.txt is a comment block explaining how I
created the file.

The markup is a Latin-1 representation of Unicode. Anything that is
7-bit ASCII is unchanged, but every other character is written as
\uxxxx, where xxxx is the hexadecimal Unicode code point. This can be
undone with native2ascii -reverse.

The translations are not "official" translations, but were done by  
native or fluent speakers for JSword.
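For illustration, here is how a Java program sees those escapes:
java.util.Properties.load decodes the \uxxxx sequences as it reads, so
no separate conversion step is needed at run time. The sample entries
(including the German name for French) are made up for this sketch and
are not copied from the JSword files.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class Iso639Escapes {
    // Hypothetical sample entries in the style of a translated iso639 file:
    // plain ASCII is written as-is, other characters as backslash-u escapes.
    static final String SAMPLE =
            "af=Afrikaans\n"
            + "fr=Franz\\u00f6sisch\n";

    // Properties.load turns each backslash-u escape back into the real character.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String french = load(SAMPLE).getProperty("fr");
        // Print numbers rather than the word itself to avoid console-encoding issues.
        System.out.println(french.length());        // 11
        System.out.println((int) french.charAt(5)); // 246, i.e. U+00F6
    }
}
```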

>
>
> (I don't recommend copying the value given for "grc" though, since  
> it is technically incorrect as "grc" means Ancient Greek, not Koine,  
> and it is incorrect in the case of an increasing set of modules,  
> e.g. the LXX, which is not in Koine Greek. More broadly, I wouldn't  
> recommend using localized values for what are L2-languages only. No  
> one speaks native Ancient Greek, so why localize it to a less- 
> familiar form? No one speaks native Latin, so why localize it to  
> "lingua latina"? Etc. Hebrew is a problematic case since "he"  
> represents both modern and Biblical Hebrew. We should probably look  
> for/add a subtag to differentiate the two.)

In the SIL catalog (or, for Peter, catalogue), grc is "Greek, Ancient
(to 1453)".

For Hebrew, there are 3 codes (there used to be a 4th, iw, which Java
uses internally):
heb and he for Hebrew
hbo for Hebrew, Ancient

On a related note, there are multiple scripts for some languages. I  
wish there were a standard for scripts that would be ready to use.

For Chinese, there are traditional and simplified scripts. In JSword, we
have represented these with a country code, using the ISO 3166 standard.
Country codes can be found in:
http://www.crosswire.org/svn/jsword/trunk/common/src/main/java/org/crosswire/common/util/
iso3166.properties

Notes:
Unicode is represented with \uxxxx escapes, a requirement of Java
property files.

The values for CN and TW have been changed to Simplified and
Traditional, since we use the country code to indicate the script,
not the country.
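A minimal sketch of turning such a language and country pair into a
display label. The two maps stand in for iso639.properties and
iso3166.properties; their contents and the label helper are
hypothetical, not JSword's code.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ScriptLabels {
    // Hypothetical subsets of iso639.properties and iso3166.properties.
    // CN and TW carry the script name rather than the country name.
    static final Map<String, String> LANGUAGES = new HashMap<>();
    static final Map<String, String> COUNTRIES = new HashMap<>();

    static {
        LANGUAGES.put("zh", "Chinese");
        COUNTRIES.put("CN", "Simplified");
        COUNTRIES.put("TW", "Traditional");
    }

    // Hypothetical helper: "Chinese (Traditional)" for zh-TW, plain "Chinese" for zh.
    static String label(String languageTag) {
        Locale locale = Locale.forLanguageTag(languageTag);
        String language = LANGUAGES.getOrDefault(locale.getLanguage(), locale.getLanguage());
        String script = COUNTRIES.get(locale.getCountry());
        return script == null ? language : language + " (" + script + ")";
    }

    public static void main(String[] args) {
        System.out.println(label("zh-CN")); // Chinese (Simplified)
        System.out.println(label("zh-TW")); // Chinese (Traditional)
        System.out.println(label("zh"));    // Chinese
    }
}
```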


>
>
> I think we should also push a language tag lookup mechanism/database  
> back into libsword so that this data is exposed to all users,  
> assuming it isn't already present in ICU. This probably isn't the  
> first time this has been proposed.

This is a good idea! Hopefully, it can be reused. Java has it built
in. I like Java's mechanism, and if one has to be invented, perhaps
its mechanism would be a good pattern to follow.

Here is how Java does it:
Property files have lists of key=value pairs.

Java has the notion of the user's Locale, which is represented as a
language (2-letter, lower case, ISO 639) and a country (2-letter, upper
case, ISO 3166). It only knows of a subset of languages and countries.
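A quick illustration with java.util.Locale. Note that
Locale.forLanguageTag takes the hyphenated tag form (en-GB), while the
property file suffixes use underscores (en_GB, which is what
Locale.toString() returns). The englishName helper is hypothetical, and
which names the JDK knows depends on its locale data.

```java
import java.util.Locale;

public class LocaleParts {
    // English display name of a language, taken from the JDK's own locale data.
    static String englishName(String languageTag) {
        return Locale.forLanguageTag(languageTag).getDisplayLanguage(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        // A Locale pairs a lower-case ISO 639 language with an upper-case ISO 3166 country.
        Locale british = Locale.forLanguageTag("en-GB");
        System.out.println(british.getLanguage()); // en
        System.out.println(british.getCountry());  // GB
        System.out.println(british);               // en_GB
        System.out.println(englishName("af"));
    }
}
```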

When looking up a key, Java checks the following files in the reverse
of the order listed (most specific first) until it finds a match:
*.properties - The default. Not necessarily English or even a single
language.

*_xx.properties - A language-specific version, where xx is a 2-letter
language code from ISO 639.

*_xx_YY.properties - A language- and country-specific version, where
YY is a 2-letter country code from ISO 3166.

Note:
It is common for a country-specific file to contain only what is
different, or what needs to be protected from changes in the underlying
files.

E.g.
*.properties contains COLOR=color
*_en_GB.properties contains COLOR=colour
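The lookup order and the COLOR example can be sketched by hand. This is
a simplified model, not the JDK's implementation: the maps stand in for
property files, and the real ResourceBundle.getBundle also consults the
default locale in some cases and throws MissingResourceException when
nothing matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FallbackLookup {
    // Each map stands in for one property file, keyed by its file-name suffix:
    // "" is *.properties and "_en_GB" is *_en_GB.properties.
    static final Map<String, Map<String, String>> FILES = new HashMap<>();

    static {
        Map<String, String> base = new HashMap<>();
        base.put("COLOR", "color");
        base.put("GREETING", "Hello");
        Map<String, String> british = new HashMap<>();
        british.put("COLOR", "colour"); // only what differs from the base file
        FILES.put("", base);
        FILES.put("_en_GB", british);
    }

    // Candidate suffixes, most specific first: _xx_YY, then _xx, then the default.
    static List<String> candidates(Locale locale) {
        List<String> suffixes = new ArrayList<>();
        if (!locale.getCountry().isEmpty()) {
            suffixes.add("_" + locale.getLanguage() + "_" + locale.getCountry());
        }
        if (!locale.getLanguage().isEmpty()) {
            suffixes.add("_" + locale.getLanguage());
        }
        suffixes.add("");
        return suffixes;
    }

    // Returns the first match along the chain, or null if no file defines the key.
    static String lookup(String key, Locale locale) {
        for (String suffix : candidates(locale)) {
            Map<String, String> file = FILES.get(suffix);
            if (file != null && file.containsKey(key)) {
                return file.get(key);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Locale uk = Locale.forLanguageTag("en-GB");
        Locale us = Locale.forLanguageTag("en-US");
        System.out.println(lookup("COLOR", uk));    // colour
        System.out.println(lookup("COLOR", us));    // color
        System.out.println(lookup("GREETING", uk)); // Hello
    }
}
```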

>
>
>> The little info button is irrelevant, as if the name is highlighted
>> as a link, it is obvious where to get info. So I will drop that.
>
> I'm not sure that's true. I would assume (as a naive user who had  
> never been to the site before) that clicking on the name of the  
> module would start a download (in spite of the presence of the 3  
> download buttons).
>
>> I never liked the internal module names as a presentation item. I
>> think the short description is much more useful for that. I think I
>> would want to drop the internal module name.
>
> While we should present an abbreviated (short form) name, we should
> also still make clear what the actual module name is. So I strongly
> suggest not dropping the internal module name.
>
>> But there is more:
>> What navigation would be efficient, robust and would bypass
>> scrolling to an ever-expanding list?
>> I am sure most frontend developers have the same problem.
>> I am wondering about two drop-downs - language and category
>> If you choose nothing you get all
>> If you choose English you get all English
>> if you choose English and Bible you get only English Bibles
>> Has anyone a better idea?
>
> Sounds good, but it will need to build itself based on the modules  
> present on the server. That includes needing to build the language  
> list based on installed modules. In other words, I don't want to  
> have to go in and add new languages to a JSP every time I upload a  
> module that implements a new language. *AND* The language lists need  
> to be repository-specific. So, the public list should not show  
> languages only present in beta modules (and vice versa).

I agree in principle with not needing to add new ones. But it is  
resource intensive. That's why JSword has pruned the list of 7000 to  
those that are in use. Maybe there should be a mechanism to build the  
pruned list from a master list based upon modules that are present in  
both repositories.
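One way such a pruning step might look, as a sketch: the master map
stands in for iso639_all.txt and the code lists for the languages of the
modules in each repository. All names and data here are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class PrunedLanguages {
    // Hypothetical excerpt of the ~7000-entry master list (code -> English name).
    static final Map<String, String> MASTER = new HashMap<>();

    static {
        MASTER.put("af", "Afrikaans");
        MASTER.put("en", "English");
        MASTER.put("grc", "Greek, Ancient (to 1453)");
        MASTER.put("hbo", "Hebrew, Ancient");
        MASTER.put("zu", "Zulu");
    }

    // Keep only the entries whose codes are used by at least one module.
    static Map<String, String> prune(Map<String, String> master, Collection<String> codesInUse) {
        Map<String, String> pruned = new TreeMap<>(); // sorted by code
        for (String code : codesInUse) {
            // Show the bare code rather than drop a language the master list lacks.
            pruned.put(code, master.getOrDefault(code, code));
        }
        return pruned;
    }

    public static void main(String[] args) {
        // Hypothetical language codes of the modules in the public and beta repositories.
        Set<String> inUse = new LinkedHashSet<>(Arrays.asList("en", "grc", "en"));
        inUse.addAll(Arrays.asList("af", "hbo"));
        System.out.println(prune(MASTER, inUse));
    }
}
```

Running the master list through this step whenever a module is uploaded
would keep the shipped list small without hand-editing.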

I agree that the presentation of languages should be based on the
modules in that one repository.

Likewise, the list of module types should be based on the types of  
modules actually in the repository.
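A sketch of the two drop-downs Peter proposed, with both lists built
from the modules in a single repository so nothing needs hand-editing
when a new language arrives. The Module record, the type labels, and the
sample modules are hypothetical; a real page would read them from the
repository's module metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class RepositoryFilter {
    // Hypothetical minimal module record: internal name, language code, module type.
    static final class Module {
        final String name;
        final String language;
        final String type;

        Module(String name, String language, String type) {
            this.name = name;
            this.language = language;
            this.type = type;
        }
    }

    // Hypothetical contents of one repository.
    static final List<Module> SAMPLE = Arrays.asList(
            new Module("KJV", "en", "Bible"),
            new Module("MHC", "en", "Commentary"),
            new Module("Afr1953", "af", "Bible"));

    // Drop-down values are derived from this repository's modules only.
    static SortedSet<String> languages(List<Module> repo) {
        SortedSet<String> result = new TreeSet<>();
        for (Module m : repo) {
            result.add(m.language);
        }
        return result;
    }

    static SortedSet<String> types(List<Module> repo) {
        SortedSet<String> result = new TreeSet<>();
        for (Module m : repo) {
            result.add(m.type);
        }
        return result;
    }

    // A null selection means "any", so choosing nothing returns every module.
    static List<String> filter(List<Module> repo, String language, String type) {
        List<String> names = new ArrayList<>();
        for (Module m : repo) {
            boolean languageMatches = language == null || language.equals(m.language);
            boolean typeMatches = type == null || type.equals(m.type);
            if (languageMatches && typeMatches) {
                names.add(m.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(languages(SAMPLE));             // [af, en]
        System.out.println(types(SAMPLE));                 // [Bible, Commentary]
        System.out.println(filter(SAMPLE, null, null));    // [KJV, MHC, Afr1953]
        System.out.println(filter(SAMPLE, "en", null));    // [KJV, MHC]
        System.out.println(filter(SAMPLE, "en", "Bible")); // [KJV]
    }
}
```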

In Him,
	DM




