[sword-devel] lack of apocrypha localization
DM Smith
dmsmith at crosswire.org
Sun Oct 15 14:38:48 MST 2017
> On Oct 15, 2017, at 5:02 PM, David Haslam <dfhmch at googlemail.com> wrote:
>
> In reply to:
>
>>
>> And how about another problem?
>>
>> What does a cross-reference to 1 Kings 13:12 point to?
>
> All references within a module should use an OSIS ID. If so, it is not a
> problem.
>
> User input is a different story.
>
>>
>> In a Catholic Bible with 1/2/3/4 Kings, it should point to 1 Samuel 13:12,
>> but would our scripts to fix cross-references even have a clue about this?
>
> Rather than asking us to dig into the code, please try doing a test on your
> own and report the outcome.
>
> ---
>
> Here's my reply:
>
> I'm talking about the Perl script that Peter uses when the modules team
> receives a submission for a Bible containing cross-references, and [say] the
> source files are provided only as USFM.
USFM is user input. AFAIK, USFM doesn’t constrain references to a standard naming convention.
>
> First step is to use usfm2osis.py to get an initial OSIS XML file.
> Second step is the attempt to fix the cross-references in the XML file.
>
>
> The Perl script attempts to convert the translator-supplied, human-readable
> xrefs to proper OSIS xrefs with the correct target osisRef value.
It uses the SWORD engine to do this. I don’t know if it supplements with other mappings.
>
> If the locale file is keyed only by language, this would succeed only if
> none of the Bible versions in that language used the same name for two
> different books.
>
> Yet that's what we have even with the English language.
>
> e.g., "1 Kings" in some Catholic Bibles is the book that most Protestant
> Bibles call 1 Samuel.
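To make the collision concrete, here is a minimal Python sketch. The dictionaries are hypothetical stand-ins, not the real SWORD locale file format or API: a language-wide table can hold only one target for the name "1 Kings", so a Bible using 1-4 Kings numbering needs a per-module override that takes precedence.

```python
# Hypothetical language-wide table (English): exactly one target per name.
ENGLISH_LOCALE = {
    "1 Samuel": "1Sam",
    "2 Samuel": "2Sam",
    "1 Kings": "1Kgs",
    "2 Kings": "2Kgs",
}

# Hypothetical per-module override for a Bible that numbers 1-4 Kings.
FOUR_KINGS_OVERRIDE = {
    "1 Kings": "1Sam",
    "2 Kings": "2Sam",
    "3 Kings": "1Kgs",
    "4 Kings": "2Kgs",
}


def resolve(book_name, override=None):
    """Return the OSIS book ID, preferring a module-specific override."""
    if override and book_name in override:
        return override[book_name]
    # Fall back to the language-wide table; None if the name is unknown.
    return ENGLISH_LOCALE.get(book_name)


print(resolve("1 Kings"))                       # 1Kgs
print(resolve("1 Kings", FOUR_KINGS_OVERRIDE))  # 1Sam
```

The same name yields two different OSIS IDs, which is why a lookup keyed by language alone cannot be correct for both kinds of Bible.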
>
> There are loads of other issues with the Perl script, but this one seems to
> be a Top Level Design issue for the underlying methodology.
>
> This is always the hardest part of module preparation to get right.
This is a simple problem to solve in Perl or Python.
Have a new program (Perl, Python, whatever) read the USFM and create a small table of the book names it uses, partially filled in with the OSIS book IDs the SWORD engine can determine.
A human verifies the generated table and adjusts it as needed, feeding changes back into the locale files when new information is learned.
Have the current program take that table as input and use it to create OSIS IDs.
Over time the set of tables should stabilize and become reusable. If they don't, that points to a design problem in USFM: it does not standardize Bible book names.
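A minimal Python sketch of that three-step workflow follows, under stated assumptions. It takes book names from the USFM \h and \toc1, \toc2, \toc3 markers. The SWORD lookup is replaced by a hypothetical SUGGESTIONS dictionary, and build_table, load_table and to_osis_ref are illustrative names, not existing tools. Only simple "Book C:V" references are handled.

```python
import csv
import io
import re

# Hypothetical stand-in for the names the SWORD engine can already resolve.
SUGGESTIONS = {"1 Samuel": "1Sam", "1 Kings": "1Kgs", "3 Kings": "1Kgs"}

# USFM markers that carry a book's display names.
NAME_MARKER = re.compile(r"^\\(?:h|toc[123])\s+(.+?)\s*$")
# A simple "Book Chapter:Verse" reference, e.g. "1 Kings 13:12".
REF = re.compile(r"^(.+?)\s+(\d+):(\d+)$")


def build_table(usfm_text, suggestions):
    """Step 1: list each book name in the USFM with a suggested OSIS ID.

    Names without a suggestion get an empty cell for the reviewer.
    """
    rows, seen = [], set()
    for line in usfm_text.splitlines():
        match = NAME_MARKER.match(line.strip())
        if match and match.group(1) not in seen:
            name = match.group(1)
            seen.add(name)
            rows.append((name, suggestions.get(name, "")))
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()


def load_table(tsv_text):
    """Step 3 input: read the human-verified name -> OSIS ID table."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank lines and rows the reviewer left unresolved.
    return {row[0]: row[1] for row in reader if len(row) == 2 and row[1]}


def to_osis_ref(ref, table):
    """Step 3: convert e.g. '1 Kings 13:12' to an osisRef via the table."""
    match = REF.match(ref.strip())
    if not match or match.group(1) not in table:
        raise ValueError(f"cannot resolve reference: {ref!r}")
    book, chapter, verse = match.groups()
    return f"{table[book]}.{chapter}.{verse}"


usfm = "\\id 1SA\n\\h 1 Kings\n\\toc1 The First Book of Kings\n\\toc2 1 Kings\n"
draft = build_table(usfm, SUGGESTIONS)
print(draft)  # the suggestion 1Kgs is wrong for this Bible

# Step 2: a human corrects the draft for a Bible using 1-4 Kings numbering.
verified = "1 Kings\t1Sam\nThe First Book of Kings\t1Sam\n"
print(to_osis_ref("1 Kings 13:12", load_table(verified)))  # 1Sam.13.12
```

The draft deliberately carries the wrong language-level guess (1Kgs) so the reviewer can see and correct it. Once verified, the same table can be reused for any later submission with the same naming.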
In Him,
DM Smith