[sword-devel] usfm to osis converter...

Michael H cmahte at gmail.com
Fri Jul 31 12:44:23 MST 2015


Properly coded USFM lists the book names used in cross references in the
header info of each book (\toc1, \toc2, \toc3, or \h).  That info, added
back to a well-formed AV11N, should make the xrefs easy to parse.
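
As a rough illustration of the header idea, here is a minimal Python sketch
that collects the names found on \h and \toc1-3 lines and maps them to a book
code.  The function name book_names, the sample text and the choice of "MAT"
as the code are all made up for the example; real USFM headers may need more
care.

```python
import re

# Matches header lines such as "\toc2 Matthew" or "\h Matthew" and
# captures the marker and the name text that follows it.
HEADER_RE = re.compile(r"^\\(toc[123]|h)\s+(.+?)\s*$")


def book_names(usfm_text, book_code):
    """Map every name found on a \\h or \\toc1-3 line to book_code."""
    names = {}
    for line in usfm_text.splitlines():
        m = HEADER_RE.match(line)
        if m:
            names[m.group(2)] = book_code
    return names


# Hypothetical header of a single book, for illustration only.
sample = "\n".join([
    r"\id MAT",
    r"\h Matthew",
    r"\toc1 The Gospel according to Matthew",
    r"\toc2 Matthew",
    r"\toc3 Mt",
    r"\c 1",
])

table = book_names(sample, "MAT")
```

Running this over every book of a module would give one lookup table of the
names that module itself uses in its xrefs.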

I haven't worked on it myself, but it should be easier to detect the chapter
and verse fields by looking only at the digits than by trying to parse
separator characters.  That is, the end of a number should be any non-digit,
[^\d] or \D in regex, not a member of a list of separator chars such as
[\:\.\,\-] etc.  The form for xrefs should be: target book, target chapter
num, target verse num, (optional target chapter num for the end of the
range), ending verse of the range.
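
A minimal Python sketch of the digits-only idea, under some loud assumptions:
the BOOKS table is hypothetical (it stands in for names gathered from the
book headers), parse_xref is a made-up name, and the meaning of each number
is guessed purely from how many numbers follow the book name.  It handles one
reference at a time and ignores what the separators mean.

```python
import re

# Hypothetical name table, e.g. built from the \toc/\h header lines.
BOOKS = {"Matthew": "Matt", "Mattheus": "Matt",
         "John": "John", "Johannes": "John"}


def parse_xref(text, books=BOOKS):
    """Return (book, chapter, verse, end_chapter, end_verse) or None."""
    text = text.strip()
    # Try the longest names first so a short name never shadows a longer one.
    for name in sorted(books, key=len, reverse=True):
        if text.startswith(name):
            # Every run of digits is a number; anything else (\D) ends it.
            numbers = [int(n) for n in re.findall(r"\d+", text[len(name):])]
            break
    else:
        return None  # unknown book name
    if len(numbers) == 2:        # chapter, verse
        chapter, verse = numbers
        end_chapter, end_verse = chapter, verse
    elif len(numbers) == 3:      # chapter, verse, ending verse
        chapter, verse, end_verse = numbers
        end_chapter = chapter
    elif len(numbers) == 4:      # chapter, verse, end chapter, ending verse
        chapter, verse, end_chapter, end_verse = numbers
    else:
        return None              # not a shape this sketch understands
    return (books[name], chapter, verse, end_chapter, end_verse)
```

With this, parse_xref("Matthew 5:3") and parse_xref("Mattheus 5,3") give the
same result, since the separator is never inspected.  It does not distinguish
a verse list from a verse range, so whether that is good enough depends on
the module.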

What am I missing?



On Fri, Jul 31, 2015 at 11:46 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:

> > From: "Healing Advisor" <healingadvisor at gmx.com>
>
> > On 31/07/15 16:09, Ryan wrote:
> >
> > > This would be why I haven't done anything with xrefs. I am sure I don't
> > > possess the knowledge required to handle them properly. :)
> >
> > In the original, they don't quite work properly.
>
> They never worked anywhere. The reason is that the xref content is never
> standardised and needs parsing from very different and variable
> starting points.
>
> The current approach with all the scripts was/is to leave them untouched
> and then use a second script which relies on the Sword library to fix the
> xrefs.
>
> I have a very messy-looking script in sword-tools/modules/crossreferences
> which deals with xrefs and produces valid OSIS. It always needs fine-tuning
> for each individual module and is a pain in the neck - but nothing better
> exists and, to be honest, nothing better will likely exist as the
> variability is a given.
>
> "Matthew 5:3, John 7:1" in English would be expressed as "Mattheus 5,3;
> Johannes 7,1" in German. Etc.
>
> Standardised translations into languages with no history of previous Bible
> publishing - i.e. much of Wycliffe's work - tend to be straightforward as
> at least the punctuation is always the same and carries the same meaning,
> but languages with a long history of Bible publication and translation will
> invariably have their very own ways of deploying dots, commas, colons,
> hyphens and the like, and attach very individual and variable meanings to
> these.
>
> Peter