[sword-devel] 3-letter language character codes

DM Smith dmsmith at crosswire.org
Mon Nov 9 13:45:44 MST 2009


For those who are interested, here is the Perl script, makeISO639.pl,  
that I use to create the listing for JSword.
In order for names to sort better, I'm using the "inverted" name that  
puts the family name in front of the qualifier.
This means that all the Zapotec languages sort together.
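
As a rough illustration of the sorting difference, here is a small sketch.
It is not part of makeISO639.pl: the two sample tables and the helper
codes_sorted_by are made up for this example, and only a handful of names
are included.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample only: four codes with their print names ...
my %print_name = (
    zai => 'Isthmus Zapotec',
    ita => 'Italian',
    zav => 'Yatzachi Zapotec',
    zul => 'Zulu',
);
# ... and the same codes with the family name in front of the qualifier.
my %inverted_name = (
    zai => 'Zapotec, Isthmus',
    ita => 'Italian',
    zav => 'Zapotec, Yatzachi',
    zul => 'Zulu',
);

# Hypothetical helper: return the codes ordered by the names in the given table.
sub codes_sorted_by {
    my ($names) = @_;
    my @codes = sort { $names->{$a} cmp $names->{$b} } keys %$names;
    return @codes;
}

# Print names separate the two Zapotec entries: zai ita zav zul
print "By print name:    ", join(' ', codes_sorted_by(\%print_name)), "\n";
# Inverted names keep them next to each other: ita zai zav zul
print "By inverted name: ", join(' ', codes_sorted_by(\%inverted_name)), "\n";
```

With the print names, Italian lands between the two Zapotec entries; with
the inverted names they end up adjacent.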

(Note: I have to run the script's output through native2ascii to create  
a property file):
******************************************************************************
#!/usr/bin/perl
# This file is used to create a Java property file from SIL's ISO639-3 files.
# Those files change frequently both in content and layout.
# Adjust this program as needed.
#
# The files are currently downloaded from:
#       http://www.sil.org/iso639-3/iso-639-3_20090210.tab
#       http://www.sil.org/iso639-3/iso-639-3_Name_Index_20090210.tab
#       http://www.sil.org/iso639-3/iso-639-3_Retirements_20090126.tab
#
# Run the program as:
#       makeISO639.pl > iso639.txt
#
# Sort the file if desired with:
#       makeISO639.pl | sort -t = -k 2 > iso639.txt
#
# Convert it from UTF-8 to Java's ASCII representation with:
#       native2ascii -encoding utf-8 iso639.txt > iso639.properties

use strict;
use Unicode::Normalize;
binmode(STDOUT, ":utf8");

my $nameIndex = "iso-639-3_Name_Index_20090210.tab";
my $langCodes = "iso-639-3_20090210.tab";
my $deadCodes = "iso-639-3_Retirements_20090126.tab";
my %names = ();
open(my $nameIndexFile, "<:utf8", $nameIndex) or die "Cannot open $nameIndex: $!";
# skip the first line
my $firstLine = <$nameIndexFile>;
while (<$nameIndexFile>)
{
         # chomp ms-dos line endings
         s/\r//o;
         chomp();
         # Skip blank lines
         next if (/^$/o);
         # ensure it is normalized to NFC
         $_ = NFC($_);
         my @line = split(/\t/o, $_);
         # key: code and print name; value: inverted name
         $names{$line[0],$line[1]} = $line[2];
}

open(my $langFile, "<:utf8", $langCodes) or die "Cannot open $langCodes: $!";
# skip the first line
$firstLine = <$langFile>;
while (<$langFile>)
{
         # chomp ms-dos line endings
         s/\r//o;
         chomp();
         # Skip blank lines
         next if (/^$/o);
         # ensure it is normalized to NFC
         $_ = NFC($_);
         my @line = split(/\t/o, $_);
         # exclude extinct languages
         next if ($line[5] eq 'E');
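         # look up the inverted name by code and reference name
         my $name = $names{$line[0],$line[6]};
         # emit the 2-letter code as well when the language has one
         print "$line[3]=$name\n" if ($line[3]);
         print "$line[0]=$name\n";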
}

# The dead codes file is iso-8859-1. This may change at some date.
open(my $deadFile, "<:encoding(iso-8859-1)", $deadCodes) or die "Cannot open $deadCodes: $!";
# skip the first line
$firstLine = <$deadFile>;
while (<$deadFile>)
{
         # chomp ms-dos line endings
         s/\r//o;
         chomp();
         # Skip blank lines
         next if (/^$/o);
         # ensure it is normalized to NFC
         $_ = NFC($_);
         my @line = split(/\t/o, $_);
         print "$line[0]=$line[1]\n";
}
******************************************************************************
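
If native2ascii is not at hand, the escaping step could probably be done in
Perl as well. The following is only a sketch and not part of makeISO639.pl:
the sub names escape_for_properties and java_escape are invented here, and
it assumes the files named on the command line are UTF-8. It has not been
compared byte for byte with native2ascii output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: format one code point as a Java \uXXXX escape.
# Code points above U+FFFF are written as a UTF-16 surrogate pair.
sub java_escape {
    my ($cp) = @_;
    if ($cp > 0xFFFF) {
        my $offset = $cp - 0x10000;
        return sprintf('\\u%04x\\u%04x',
            0xD800 + ($offset >> 10),
            0xDC00 + ($offset & 0x3FF));
    }
    return sprintf('\\u%04x', $cp);
}

# Hypothetical helper: replace every non-ASCII character in a string
# with its escape, leaving ASCII characters untouched.
sub escape_for_properties {
    my ($text) = @_;
    $text =~ s/([^\x00-\x7F])/java_escape(ord($1))/ge;
    return $text;
}

# Escape each file named on the command line and write the result to STDOUT.
# With no arguments nothing is read or printed.
for my $path (@ARGV) {
    open(my $in, "<:utf8", $path) or die "Cannot open $path: $!";
    while (my $line = <$in>) {
        print escape_for_properties($line);
    }
    close($in);
}
```

Assuming the sketch is saved as escape.pl (a made-up name), it would be run
as:
      perl escape.pl iso639.txt > iso639.properties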

On Nov 9, 2009, at 2:01 PM, DM Smith wrote:

> Here is a list of the proposed changes for the last update of 2009  
> (review ends December 15, so I think we can expect a new listing  
> shortly after that):
> 	http://www.sil.org/iso639-3/chg_requests.asp
> The last column gives the reason for the request.
>
> Perhaps of interest are some Iranian languages.
>
> In His Service,
> 	 DM
>
> On Nov 9, 2009, at 1:32 PM, DM Smith wrote:
>
>> On 11/09/2009 11:51 AM, Karl Kleinpaste wrote:
>>> DM Smith<dmsmith at crosswire.org>  writes:
>>>
>>>> ISO-639-3 is a changing set of codes.
>>>>
>>> ...
>>>
>>>> These all changed on 2009-01-16.
>>>>
>>> What is the point of "standardized" abbreviations if the  
>>> "standard" is
>>> not fixed?  "ckw" is replaced with "cak", "tzz" with "tzo"?  For  
>>> whose
>>> benefit is that, other than as a make-work issue for people like us?
>>>
>> I don't know all the history, and what I know may be a bit faulty.
>>
>> There are about 7500 languages. The beginnings of the ISO-639 were  
>> in the Ethnologue, started in 1950. ISO-639-1 was adopted in 1988.  
>> ISO-639-2 was adopted in 1998 and covered about 400 languages.  
>> ISO-639-3 was given to SIL in 2002 and the first adoption of it was  
>> published in 2007. So only a few years ago, the list was quite  
>> small. At that time, some of our modules had Ethnologue codes of the  
>> form x-aaa or x-yyy-aaa.
>>
>> At this point ISO-639-3 encompasses all 2 and 3 letter codes. It is  
>> actively maintained and updates happen at least once a year.
>>
>> Much of the effort to define languages revolves around literacy and  
>> Bible translation. It is widely held that the return of Christ is  
>> predicated on the gospel being preached to every tongue and there  
>> is an effort to get the Bible into every spoken language. Many  
>> languages have no alphabet. My daughter and her husband spent the  
>> summer finalizing the alphabets for 3 closely related languages. At  
>> this point they, and the team that they were on, believe that these  
>> are 3 distinct languages and not merely dialects of each other. As  
>> such, they would have three different codes and language names. If  
>> later, these were found to be merely dialectally different, the 3  
>> alphabets might be merged into one and the 3 different codes and  
>> their names would be replaced with one name.
>>
>> If you look at the reasons for retirement, many of them were 'M', that  
>> is merging several codes into one code.
>>
>> On a similar note, the two letter codes are not stable either.  
>> Hebrew used to have the code 'iw' now it has the code of 'he'.  
>> Likewise for Indonesian, it used to have the code 'in', but now it  
>> is 'id'. Now with the latest CLDR, 'in' is an alias for 'id'.
>>
>> These two have bitten me as Java silently transforms the current  
>> code to the obsolete one. 'iw', Hebrew, bit me a few years back.  
>> Indonesian, 'in', was last week as Tonny supplied an Indonesian  
>> translation for JSword. We had to name the resource files with the  
>> obsolete name to get it to work.
>>
>> In Him,
>>   DM
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page



