[sword-devel] osis2mod import issue
Mattias Põldaru
mahfiaz at gmail.com
Fri Jun 5 02:14:01 MST 2009
One fine day, Thu, 2009-06-04 at 12:53, DM Smith wrote:
> Mattias Põldaru wrote:
> > One fine day, Wed, 2009-06-03 at 19:25, DM Smith wrote:
> >
> >> On Jun 3, 2009, at 1:36 PM, Mattias Põldaru wrote:
> >>
> >>
> >>> Hi everybody.
> >>>
> >>> It is nice to see you (DM, I suppose) got osis2mod working in no
> >>> time at all. There is one more issue with preverse content. Some
> >>> whitespace gets counted as preverse in my file, and I think this is
> >>> wrong, although it would not be hard to remove the whitespace
> >>> from my source document. I paste an example here.
> >>>
> >>>
> >>> Here is the input OSIS file. Please correct me if I have something
> >>> wrong here.
> >>> <!-- start of example clip -->
> >>> <div type="bookGroup">
> >>> <title>Vana Testament</title>
> >>> <div type="book" osisID="Gen" canonical="true">
> >>> <title type="main">1. Moosese</title>
> >>> <div type="section" scope="Gen.1.1-Gen.2.3" >
> >>> <title>Maailma ja inimese loomine</title>
> >>> <chapter sID="Gen.1" osisID="Gen.1" />
> >>> <title type="chapter">1. peatükk</title>
> >>> <p>
> >>> <verse sID="Gen.1.1" osisID="Gen.1.1" />
> >>> Alguses lõi Jumal taevad ja maa.
> >>> <verse eID="Gen.1.1" />
> >>> </p>
> >>> <p>
> >>> <verse sID="Gen.1.2" osisID="Gen.1.2" />
> >>> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja Jumala Vaim
> >>> hõljus vete kohal.
> >>> <verse eID="Gen.1.2" />
> >>> </p>
> >>> <!-- end of example clip -->
> >>>
> >>>
> >>>
> >>>
> >>> And here is the corresponding module output. Please notice the
> >>> preverse that contains only a single space.
> >>> <!-- start of example clip -->
> >>> <div sID="gen1" type="bookGroup"/> <title>Vana Testament</title> <div
> >>> canonical="true" osisID="Gen" sID="gen2" type="book"/> <title
> >>> type="main">1. Moosese</title> <div sID="gen3" scope="Gen.1.1-Gen.2.3"
> >>> type="section"/> <title>Maailma ja inimese loomine</title>
> >>> <chapter osisID="Gen.1" sID="Gen.1"/> <title type="chapter">1.
> >>> peatükk</title> <div sID="gen4" type="paragraph"/>
> >>> Alguses lõi Jumal taevad ja maa. <div eID="gen4" type="paragraph"/>
> >>> <div type="x-milestone" subType="x-preverse" sID="pv1"/><div
> >>> sID="gen5" type="paragraph"/> <div type="x-milestone" subType="x-preverse"
> >>> eID="pv1"/> Ja maa oli tühi ja paljas ja pimedus oli sügavuse peal ja
> >>> Jumala Vaim hõljus vete kohal. <div eID="gen5" type="paragraph"/>
> >>> <!-- end of example clip -->
> >>>
> >> The pre-verse contains "<p> " (the paragraph start and the space)
> >>
> >> Handling of whitespace is a bit problematic. What osis2mod does is
> >> replace sequences of whitespace (newlines, spaces and tabs) with a
> >> single space. If a verse contains leading or trailing space, it is
> >> trimmed. (I don't think it should do this trimming.)
> >>
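The whitespace handling described above can be sketched roughly as follows. This is only an illustration in Python, not osis2mod's actual code, and the function name normalize_verse_whitespace is made up for the example:

```python
import re

def normalize_verse_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs and newlines to one space, then trim."""
    # Any sequence of whitespace becomes a single space.
    collapsed = re.sub(r"[ \t\r\n]+", " ", text)
    # Leading and trailing space is then dropped (the trimming DM questions).
    return collapsed.strip(" ")
```

With this sketch, a verse body that arrives with surrounding newlines and a stray tab ends up as one line of text with single spaces between the words.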
> >> What osis2mod does not have is knowledge of the containment model of
> >> the OSIS schema. If it did, it could remove whitespace between
> >> element tags that don't allow for text.
> >>
> >> In this case, the OSIS schema allows for whitespace after the opening
> >> paragraph tag and before the verse tag. One could have:
> >> <p>yada yada yada <verse>verse text</verse> yada yada yada</p>
> >> In this case, it would be inappropriate to trim the whitespace off of
> >> the text that precedes the verse.
> >>
> >> If we can come up with a good heuristic I'd be glad to implement it.
> >>
> >>
> > For the case I have, it would be sufficient to check if the preverse has
> > any printing characters and not to add an empty preverse.
> >
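A minimal sketch of that check, assuming the preverse is available as a string of markup. It is written in Python for illustration only, and has_printing_text is a hypothetical helper, not something that exists in osis2mod:

```python
import re

def has_printing_text(fragment: str) -> bool:
    """Return True if anything besides markup and whitespace remains."""
    # Drop every tag, keeping only the character data between tags.
    without_tags = re.sub(r"<[^>]*>", "", fragment)
    # What is left counts as printing text only if it is not all whitespace.
    return without_tags.strip() != ""
```

A fragment that holds only a milestone element and a space would be reported as having no printing text. The check says nothing about where such an element should go instead.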
>
> The preverse is not empty, it contains
> <div type="paragraph" sID="gen5">
> which is the transformation of <p> into a milestoned representation.
>
> It also has a single space following that element.
>
> Where should the paragraph be put? It either is appended to the prior
> verse or it is pre-verse.
>
> The one solution I thought of is that any whitespace immediately
> following a block element start (<div>, <lg>, <p>, ...) can be deleted.
> Likewise for any whitespace immediately before the end element.
>
> Would this work?
>
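To make the proposal concrete, here is a rough sketch of it, assuming the input is a plain string and that a regular expression is good enough for a demonstration (a real implementation would work inside the parser). It is Python, not osis2mod code; the names BLOCK_ELEMENTS and trim_block_whitespace are invented for the example, and the element list holds only the three elements named above:

```python
import re

# Assumed list: only the block elements named in the mail.
BLOCK_ELEMENTS = ("div", "lg", "p")

_names = "|".join(BLOCK_ELEMENTS)
# A start tag of a block element (not self-closing), followed by whitespace.
_AFTER_START = re.compile(r"(<(?:%s)\b[^<>]*(?<!/)>)\s+" % _names)
# Whitespace followed by the end tag of a block element.
_BEFORE_END = re.compile(r"\s+(</(?:%s)\s*>)" % _names)

def trim_block_whitespace(osis: str) -> str:
    """Delete whitespace right after a block start and right before a block end."""
    osis = _AFTER_START.sub(r"\1", osis)
    return _BEFORE_END.sub(r"\1", osis)
```

In this sketch the newline between <p> and the verse start milestone disappears, while text such as "<p>yada yada yada <verse>..." is left alone because the whitespace there does not directly follow the block start.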
> In Him,
> DM
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
I reported this against Xiphos, since it may be a bug there. You will
find a screenshot in the report.
https://sourceforge.net/tracker/?func=detail&aid=2801620&group_id=5528&atid=105528
Mattias