[jsword-devel] Missing short bible book names for most languages

Tue Jan 11 04:32:35 MST 2011

On Jan 10, 2011, at 6:37 PM, Martin Denham wrote:

> Hi, I have hit a problem and I don't know if anybody has any suggestions.  
> 
> A long-standing request for And Bible has been to show the book names and chapters in a grid for navigation purposes.  To fit the book names on the screen I used the short names from BibleNames.properties, e.g. 'Gen', 'Exo', and it was working very well until I tried a test in a foreign language. I then discovered that very few of the localisations have short names. If there is no short name then the long name is used, and there is no way to fit 66 long names in a grid on a small mobile phone screen.
> 
> To compound the problem, it is not just a few languages that are missing short names; it is nearly all of them:
> 
> No short names: af, ar_EG, bg, cs, da, es, fi, fr, he, hu, id, it, la, lt, nl, no, pl, pt_BR, pt, ro, ru, sk, sl, sv, th, uk
> Short names: de, en, et, fa, in, ko, vi, zh_CN, zh
> 
> I did wonder if there might be an alternative source of short names, possibly from another sword project like Xiphos, that we could use to insert the correct short names in our properties files.

The Bible book names that the SWORD front-ends use are from locales.d (under svn here: http://crosswire.org/svn/sword/trunk/locales.d/ ). This is where we derived the various BibleNames.properties.

The languages for which SWORD has standard abbreviations have an "xx_abbrev.conf" file, where xx is the language code.

SWORD stores abbreviations in one of two places: either in xx-utf8.conf (or its cp1252 equivalent, xx.conf) or in xx_abbrev-utf8.conf (or its cp1252 equivalent, xx_abbrev.conf). In both of these files the abbreviations are stored in the [Book Abbrevs] section.
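
To make the file layout concrete, here is a minimal Java sketch that pulls the entries out of a [Book Abbrevs] section. It assumes the conf files are plain INI-style text with one key=value pair per line. The class and method names are made up for illustration and are not part of SWORD or JSword.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not SWORD or JSword code.
final class BookAbbrevsReader {
    /** Collects key=value pairs from the [Book Abbrevs] section, keeping file order. */
    static Map<String, String> read(String confText) {
        Map<String, String> abbrevs = new LinkedHashMap<>();
        boolean inSection = false;
        for (String raw : confText.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("[") && line.endsWith("]")) {
                // A new section header: only [Book Abbrevs] is of interest.
                inSection = line.equals("[Book Abbrevs]");
                continue;
            }
            int eq = line.indexOf('=');
            if (inSection && eq > 0) {
                // putIfAbsent keeps the first entry when a key is repeated.
                abbrevs.putIfAbsent(line.substring(0, eq).trim(),
                                    line.substring(eq + 1).trim());
            }
        }
        return abbrevs;
    }
}
```

Entries outside the [Book Abbrevs] section are ignored, and a LinkedHashMap is used so that the order of the file survives the parse.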

The purpose of the abbreviations differs between these two files.

In the xx-utf8.conf file, they are used as lookup patterns: given a text or user input, which Bible book is being referred to? This file typically contains many common prefixes of the book name, with and without punctuation and spaces. There is no attempt to identify a proper/standard abbreviation in this file. This file is sensitive to order. The abbreviation Jud might be a good abbreviation for both Judges and Jude; the first one listed is the winner. It may be that the shortest value in the file is a fine abbreviation to show the user, but I wouldn't vouch for that. Who knows, it might just be an offensive word.
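
The order sensitivity can be illustrated with a small Java sketch. It is only a model of "the first one listed is the winner", not SWORD's actual matching code. The class name and the sample entries are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of first-match-wins; not SWORD code.
final class FirstMatchWins {
    /** Returns the book of the first entry whose pattern equals the input, ignoring case. */
    static String resolve(List<String[]> entries, String input) {
        String wanted = input.trim();
        for (String[] entry : entries) {
            if (entry[0].equalsIgnoreCase(wanted)) {
                return entry[1];
            }
        }
        return null; // no pattern matched
    }

    // Made-up entries in file order: {lookup pattern, book}.
    static final List<String[]> SAMPLE = Arrays.asList(
        new String[] {"JUDG", "Judges"},
        new String[] {"JUD", "Judges"},
        new String[] {"JUDE", "Jude"},
        new String[] {"JUD", "Jude"});
}
```

With these sample entries, "Jud" resolves to Judges simply because that entry comes first; swapping the two JUD lines would make it resolve to Jude.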

In the xx_abbrev-utf8.conf file there is an attempt to identify a standard book abbreviation. These files map from standard English Bible book names to the language's standard abbreviations and from those abbreviations to the standard English abbreviations. (Note: by standard, I mean that it is identified as the unique/key name in these files.)

In JSword, we have the BibleNames_xx.properties files with entries of the form:
Judg.Full=Judges
Judg.Short=Judg
Judg.Alt=jdg,jud
where the key is made up of two parts: an OSIS book abbreviation followed by Full, Short or Alt.

The values given are:
Full - The full (long) name of the book. There might be other names that are equally good, but only one is here.
Short - A standard abbreviation. This is worthy of being shown to a user. There might be other abbreviations that are equally good, but only one is here. By and large, these don't have spaces, e.g. 1 Cor is 1Cor.
Alt - other abbreviations by which the book is known. None of these are prefixes. Though they might be good abbreviations, these are intended for internal use only. That's why they are lowercase.
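
As a rough sketch of how such entries could be read, the Java below loads them with java.util.Properties and falls back to the full name when a locale has no Short entry, which is the behaviour Martin describes. The class and method names are invented for the example and are not JSword's API. Splitting Alt on commas is an assumption based on the sample entry above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not JSword's actual API.
final class BookNameEntries {
    /** Parses text in .properties syntax. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** The Short entry for an OSIS book, or the Full entry when Short is missing. */
    static String shortName(Properties props, String osis) {
        String s = props.getProperty(osis + ".Short");
        if (s == null || s.trim().isEmpty()) {
            return props.getProperty(osis + ".Full");
        }
        return s.trim();
    }

    /** The Alt entry split on commas (assumed separator); empty when absent. */
    static String[] altNames(Properties props, String osis) {
        String alt = props.getProperty(osis + ".Alt", "").trim();
        return alt.isEmpty() ? new String[0] : alt.split("\\s*,\\s*");
    }
}
```

A navigation grid could call shortName for each of the 66 books and compare the result with the Full entry to spot locales that still need abbreviations.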

As an aside, here is how JSword does a lookup of a Bible book name from user or programmatic input:
* First the input is normalized by stripping out some punctuation and spaces and then lowercasing. This is the same normalization that is applied to the stored long, short and alt names.
The search then proceeds in the following order, returning the first normalized match:
* It is compared to an OSIS abbreviation for an exact match.
* It is searched against the user's locale with the following:
** Exact match against the full name
** Exact match against the short name
** Exact match against an alt name
** Extended search against the alt, long and short names, book by book from Genesis to Revelation:
*** Is an alternate a prefix of the input, or the input a prefix of the alternate?
*** Is the input a prefix of the long name?
*** Is the short name a prefix of the input, or the input a prefix of the short name?
* Finally it is searched against the default (English) locale, just as with the user's locale.
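
The lookup order above can be sketched in Java roughly as follows. This is one reading of the steps, not the JSword source. The class, the exact set of punctuation that is stripped, the use of Locale.ROOT for lowercasing and the sample book lists are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the lookup order; not the JSword implementation.
final class BookLookupSketch {
    /** One book in one locale; names are normalized once, when stored. */
    static final class Book {
        final String osis;
        final String full;
        final String shortName; // "" when the locale has no short name
        final String[] alts;

        Book(String osis, String full, String shortName, String... alts) {
            this.osis = osis;
            this.full = normalize(full);
            this.shortName = normalize(shortName);
            this.alts = new String[alts.length];
            for (int i = 0; i < alts.length; i++) {
                this.alts[i] = normalize(alts[i]);
            }
        }
    }

    /** Assumed normalization: drop whitespace and ASCII punctuation, then lowercase. */
    static String normalize(String s) {
        return s.replaceAll("[\\s\\p{Punct}]+", "").toLowerCase(Locale.ROOT);
    }

    /** True if either string is a prefix of the other; empty names never match. */
    static boolean eitherIsPrefix(String name, String key) {
        return !name.isEmpty() && (name.startsWith(key) || key.startsWith(name));
    }

    /** Searches one locale; books must be listed in canonical order. */
    static String searchLocale(String key, List<Book> books) {
        for (Book b : books) {
            if (b.full.equals(key)) return b.osis;          // exact full name
        }
        for (Book b : books) {
            if (b.shortName.equals(key)) return b.osis;     // exact short name
        }
        for (Book b : books) {
            if (Arrays.asList(b.alts).contains(key)) return b.osis; // exact alt name
        }
        for (Book b : books) {                              // extended search
            for (String alt : b.alts) {
                if (eitherIsPrefix(alt, key)) return b.osis;
            }
            if (b.full.startsWith(key)) return b.osis;
            if (eitherIsPrefix(b.shortName, key)) return b.osis;
        }
        return null;
    }

    /** OSIS match first, then the user's locale, then English; null if nothing matches. */
    static String lookup(String input, List<Book> userLocale, List<Book> english) {
        String key = normalize(input);
        if (key.isEmpty()) return null;
        for (Book b : english) {
            if (b.osis.equalsIgnoreCase(key)) return b.osis;
        }
        String found = searchLocale(key, userLocale);
        return found != null ? found : searchLocale(key, english);
    }

    // Made-up sample data: an English list and a locale with full names only.
    static final List<Book> ENGLISH_SAMPLE = Arrays.asList(
        new Book("Gen", "Genesis", "Gen"),
        new Book("Judg", "Judges", "Judg", "jdg", "jud"),
        new Book("Jude", "Jude", "Jude"));
    static final List<Book> USER_SAMPLE = Arrays.asList(
        new Book("Gen", "1. Mose", ""),
        new Book("Judg", "Richter", ""),
        new Book("Jude", "Judas", ""));
}
```

With this sample data, "Rich" is found through the prefix test on the long name, so the locale does not need a list of prefixes. "Jud." resolves to Judg against the English list alone (exact alt match) but to Jude when the sample locale is searched first, because "jud" is a prefix of "judas".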

In this context, the JSword lookup mechanism makes many of the abbreviations in the locales.d files unnecessary.

If JSword gets abbreviations, we'll also submit them to the SWORD project.

In Him,
	DM

> 
> Thanks
> Martin