[sword-devel] osis2mod import issue
Mattias Põldaru
mahfiaz at gmail.com
Thu Jun 4 08:13:02 MST 2009
On one fine day, Wed, 2009-06-03 at 19:25, DM Smith wrote:
> On Jun 3, 2009, at 1:36 PM, Mattias Põldaru wrote:
>
> > Hi everybody.
> >
> > It is nice to see you (DM, I suppose) got the osis2mod working in no
> > time at all. There is one more issue with preverse stuff. Some
> > whitespace gets counted as preverse on my file and I think this is
> > wrong, although it isn't that complicated at all to remove whitespace
> > from my source document. I paste an example here.
> >
> >
> > Here is the input osis file. Please correct me, if I have something
> > wrong here.
> > <!-- start of example clip -->
> > <div type="bookGroup">
> > <title>Vana Testament</title>
> > <div type="book" osisID="Gen" canonical="true">
> > <title type="main">1. Moosese</title>
> > <div type="section" scope="Gen.1.1-Gen.2.3" >
> > <title>Maailma ja inimese loomine</title>
> > <chapter sID="Gen.1" osisID="Gen.1" />
> > <title type="chapter">1. peatükk</title>
> > <p>
> > <verse sID="Gen.1.1" osisID="Gen.1.1" />
> > Alguses lõi Jumal taevad ja maa.
> > <verse eID="Gen.1.1" />
> > </p>
> > <p>
> > <verse sID="Gen.1.2" osisID="Gen.1.2" />
> > Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim
> > hõljus vete kohal.
> > <verse eID="Gen.1.2" />
> > </p>
> > <!-- end of example clip -->
> >
> >
> >
> >
> > And here is the corresponding module output. Please notice the
> > preverse that contains only a single space.
> > <!-- start of example clip -->
> > <div sID="gen1" type="bookGroup"/> <title>Vana Testament</title> <div
> > canonical="true" osisID="Gen" sID="gen2" type="book"/> <title
> > type="main">1. Moosese</title> <div sID="gen3" scope="Gen.1.1-Gen.2.3"
> > type="section"/> <title>Maailma ja inimese loomine</title>
> > <chapter osisID="Gen.1" sID="Gen.1"/> <title type="chapter">1.
> > peatükk</title> <div sID="gen4" type="paragraph"/>
> > Alguses lõi Jumal taevad ja maa. <div eID="gen4" type="paragraph"/>
> > <div type="x-milestone" subType="x-preverse" sID="pv1"/><div
> > sID="gen5"
> > type="paragraph"/> <div type="x-milestone" subType="x-preverse"
> > eID="pv1"/> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja
> > Jumala Vaim hõljus vete kohal. <div eID="gen5" type="paragraph"/>
> > <!-- end of example clip -->
>
> The pre-verse contains "<p> " (the paragraph start and the space)
>
> Handling of whitespace is a bit problematic. What osis2mod does is
> replace sequences of whitespace (newlines, spaces and tabs) with a
> single space. If a verse contains leading or trailing space, it is
> trimmed. (I don't think it should do this trimming.)
>
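The whitespace handling described above can be sketched roughly as follows. This is only an illustration in Python, not the actual osis2mod code (which is C++); the function names collapse_whitespace and normalize_verse_text are made up for the example.

```python
import re

# Runs of spaces, tabs and newlines, as described in the paragraph above.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with a single space."""
    return _WS_RUN.sub(" ", text)


def normalize_verse_text(text: str) -> str:
    """Collapse whitespace, then trim a leading or trailing space.

    Hypothetical helper: it mirrors the described behaviour, including
    the trimming step that DM says he would rather not have.
    """
    return collapse_whitespace(text).strip(" ")
```

With this sketch, the newline after the opening paragraph tag survives collapsing as exactly one space, which is the single space that ends up in the preverse.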
> osis2mod does not have knowledge of the containment model of the
> OSIS schema. If it did, it could remove whitespace between element
> tags that don't allow for text.
>
> In this case, the OSIS schema allows for whitespace after the opening
> paragraph tag and before the verse tag. One could have:
> <p>yada yada yada <verse>verse text</verse> yada yada yada</p>
> In this case, it would be inappropriate to trim the whitespace off of
> the text that precedes the verse.
>
> If we can come up with a good heuristic I'd be glad to implement it.
>
For the case I have, it would be sufficient to check whether the preverse
contains any printing characters and, if it does not, to skip adding the
empty preverse.
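A minimal sketch of that check, in Python rather than the C++ of osis2mod. It assumes the preverse fragment is available as a string and that tags can be stripped with a simple regular expression; has_printing_text and wrap_preverse are hypothetical names, not existing osis2mod functions.

```python
import re

# Naive tag matcher; assumes no '>' appears inside attribute values.
_TAG = re.compile(r"<[^>]*>")


def has_printing_text(fragment: str) -> bool:
    """True if the fragment has any non-whitespace character outside tags."""
    text = _TAG.sub("", fragment)
    return any(not ch.isspace() for ch in text)


def wrap_preverse(fragment: str, n: int) -> str:
    """Wrap fragment in x-preverse milestones only if it has printing text.

    A fragment made up of markup and whitespace alone is returned
    unchanged, so nothing is lost; it just is not marked as preverse.
    """
    if not has_printing_text(fragment):
        return fragment
    start = f'<div type="x-milestone" subType="x-preverse" sID="pv{n}"/>'
    end = f'<div type="x-milestone" subType="x-preverse" eID="pv{n}"/>'
    return start + fragment + end
```

For the clip above, the fragment `<div sID="gen5" type="paragraph"/> ` has no printing text and would be left unwrapped, while a section title before a verse would still be wrapped as preverse.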
Mattias