[jsword-devel] Fwd: [sword-devel] Tables across verse boundaries

Martin Denham mjdenham at gmail.com
Wed Mar 19 15:26:16 MST 2014


This discussion of well-formed xml in individual verses has been very
helpful for me.  The most common request I have for And Bible is for *verse
highlighting*, but I have delayed implementation because the only method I
can think of is like <span class='highlight'>verse text</span> (tell me if
there is a better way).  However this would fail badly if the verse had
mis-matched xml tags and possibly highlight the whole chapter or nothing at
all.  So it is helpful to know that JSword forces well-formed xml in verses.
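
A minimal sketch of the guarded highlighting described above, in Python for
brevity rather than And Bible's own code. The function name highlight_verse
and the dummy <v> wrapper are assumptions made for illustration only. It
wraps a verse in the span only when the verse parses as a well-formed
fragment, and otherwise leaves it untouched:

```python
import xml.etree.ElementTree as ET


def highlight_verse(verse_xml):
    """Wrap a verse in a highlight span only if it is a well-formed fragment."""
    try:
        # A dummy root lets mixed text and elements parse as one document.
        ET.fromstring("<v>" + verse_xml + "</v>")
    except ET.ParseError:
        # Unbalanced tags: a span here could leak over the chapter, so skip it.
        return verse_xml
    return "<span class='highlight'>" + verse_xml + "</span>"
```

With this guard a verse such as "<tr><td>Judah" comes back unchanged instead
of opening a span that never closes.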

As a suggestion for the above-mentioned requirements of a table spanning
multiple verses while also maintaining well-formed verses, could we support
'*between verse*' content?
E.g.
Between verse: \tr
Verse: .. Judah ...
Between verse: \tr
Verse: .. Issachar ...

Whether implemented in modules or not, this could already be used without
changing modules because if a verse is not well-formed it will normally
have extra tags at the beginning or end which could be sliced off and
pushed into 'between verse' content.
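
One way the slicing could work, as a rough Python sketch. The names
unmatched_tags and split_between_verse are hypothetical, and the regular
expression is a simplification that ignores comments and CDATA. It finds the
tags that have no partner inside the verse and moves them into 'before' or
'after' content only when they sit at the very start or end:

```python
import re

# Matches one tag: group 1 is "/" for a closing tag, group 2 the element
# name, group 3 is "/" for a self-closing tag such as a milestone.
TAG = re.compile(r"<(/?)([^\s<>/]+)[^<>]*?(/?)>")


def unmatched_tags(raw):
    """Return the tags in raw that have no partner inside raw, in order."""
    open_stack, stray = [], []
    for m in TAG.finditer(raw):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue
        if not closing:
            open_stack.append((name, m))
        elif open_stack and open_stack[-1][0] == name:
            open_stack.pop()
        else:
            stray.append(m)  # closing tag whose opener is in an earlier verse
    stray.extend(m for _, m in open_stack)  # openers closed in a later verse
    return sorted(stray, key=lambda m: m.start())


def split_between_verse(raw):
    """Return (before, verse, after), or None if slicing the ends is not enough."""
    stray = unmatched_tags(raw)
    start, end = 0, len(raw)
    i, j = 0, len(stray)
    # Peel unmatched tags that touch the start of the verse.
    while i < j and not raw[start:stray[i].start()].strip():
        start = stray[i].end()
        i += 1
    # Peel unmatched tags that touch the end of the verse.
    while j > i and not raw[stray[j - 1].end():end].strip():
        end = stray[j - 1].start()
        j -= 1
    if i != j:
        return None  # an unmatched tag sits in the middle of the verse
    return raw[:start], raw[start:end], raw[end:]
```

For example, split_between_verse("Judah</cell></row>") gives
("", "Judah", "</cell></row>"), so the verse itself stays well-formed.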

I am not sure if the above suggestion has any value, but if not then please
consider the 'highlighting a verse' requirement before compromising the
well-formedness of verses.

Martin


On 19 March 2014 00:27, DM Smith <dmsmith at crosswire.org> wrote:

> The other fixer upper was tag-soup.
>
> On Mar 18, 2014, at 8:22 PM, DM Smith <dmsmith at crosswire.org> wrote:
>
>
> On Mar 18, 2014, at 5:02 PM, Chris Burrell <christopher at burrell.me.uk>
> wrote:
>
> Yup - so I was looking at the code tonight.
>
> I don't think the problem is quite as bad/hard to fix as you make it sound.
>
> I think there are two types of issues
> - a verse on its own not producing correct XML
> - a bunch of XML together not producing well nested XML
>
> Not sure how to solve the second, but the easy (?) solution on the second
> one is to amalgamate all the raw text first before parsing it. Now that we
> pass the whole passage down one more level, it shouldn't be too difficult
> to do that?
>
>
> Amalgamation may minimize the problem. Especially if we are displaying a
> chapter at a time. But if we display a search results list, a parallel
> display or an arbitrary passage chosen by the user then it may exhibit
> problems.
>
> The problem is still the same.
>
> We've also talked about expanding the context of what fails by grabbing an
> adjacent verse and adding it to the amalgamation and re-parsing.
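
The amalgamate-and-widen idea might look roughly like the Python sketch
below. This is not JSword code: parse_with_context, the <passage> wrapper and
the list-of-strings input are assumptions for illustration. It joins the
requested verses, and on a parse failure pulls in one more verse on each side
and tries again:

```python
import xml.etree.ElementTree as ET


def parse_with_context(verses, first, last):
    """Parse verses[first..last] joined together, widening on failure.

    verses is a list of raw verse strings in canonical order. Returns
    (lo, hi, element), where lo..hi is the range that finally parsed.
    """
    lo, hi = first, last
    while True:
        try:
            joined = "".join(verses[lo:hi + 1])
            return lo, hi, ET.fromstring("<passage>" + joined + "</passage>")
        except ET.ParseError:
            if lo == 0 and hi == len(verses) - 1:
                raise  # nothing left to add; the text really is broken
            lo = max(lo - 1, 0)
            hi = min(hi + 1, len(verses) - 1)
```

The caller still has to decide what to do with the extra verses, which is why
a search results list or a parallel display remains awkward.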
>
> It is not hard to write a parser. That's essentially what we have with the
> ThML parser. Such a parser could know when it sees an unmatched start or
> end tag. Presuming that the module is valid, well-formed as a whole we can
> either prefix or append the missing tag to the result. This would "solve"
> the problem. (The ThML parser does not do that).
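
A small Python sketch of that prefix/append repair, assuming (as above) that
the module is well-formed as a whole. The name balance is hypothetical and
the tag regex is a simplification. Note that a re-created opening tag loses
whatever attributes the real one carried:

```python
import re

# Group 1 is "/" for a closing tag, group 2 the element name,
# group 3 is "/" for a self-closing tag.
TAG = re.compile(r"<(/?)([^\s<>/]+)[^<>]*?(/?)>")


def balance(raw):
    """Prefix missing start tags and append missing end tags to a verse."""
    open_stack = []    # start tags seen here that are still waiting for an end
    missing_open = []  # end tags seen here whose start is in an earlier verse
    for m in TAG.finditer(raw):
        closing, name, self_closing = m.groups()
        if self_closing:
            continue
        if not closing:
            open_stack.append(name)
        elif open_stack and open_stack[-1] == name:
            open_stack.pop()
        else:
            missing_open.append(name)
    # The first stray end tag closes the innermost element, so the
    # re-created start tags go in reverse order.
    prefix = "".join("<%s>" % name for name in reversed(missing_open))
    suffix = "".join("</%s>" % name for name in reversed(open_stack))
    return prefix + raw + suffix
```

So balance("Judah</cell></row>") yields "<row><cell>Judah</cell></row>".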
>
>
> On the second, there may be some nice XML parsers that fix stuff up more
> gracefully as well...
>
>
> By definition an XML parser must fail on bad input. I've not seen any that
> fix up broken xml. Every year I do a survey of available parsers not just
> XML to see if there is something that might help. One that caught my eye:
> JTidy.
>
> JTidy understands the xhtml spec and can take badly formed HTML and clean
> it up. I was trying to figure out if I could re-write it for another
> schema, or to take a schema and generate a cleanup technique. It was more
> complicated than I was willing to get into.
>
> DM
>
>
> Chris
>
>
> ---------- Forwarded message ----------
> From: DM Smith <dmsmith at crosswire.org>
> Date: 18 March 2014 20:45
> Subject: Re: [sword-devel] Tables across verse boundaries
> To: christopher at burrell.me.uk, SWORD Developers' Collaboration Forum <
> sword-devel at crosswire.org>
>
>
> On Mar 18, 2014, at 3:29 PM, Chris Burrell <christopher at burrell.me.uk>
> wrote:
>
> Hi DM
>
> 1- You're right, it was my mistake around across verses. Ezra 1 would be
> an example where you have 3 rows per verse, and a table over two verses.
>
> No problem. It's hard to debug a problem where the text is made up.
>
>
> 2- My issue with the markup and having the verse number inside the cell
> was that I got a 'nesting' warning by mod2osis. Is that something I just
> ignore? (i.e. "verse sID" in the first cell with "verse eID" in the second
> cell)
>
>
> The nesting warnings are relatively benign. They indicate that the verse
> in isolation is not well-formed XML and that when displayed in certain
> contexts it will have problems.
>
> That the verse sID is in one cell and the verse eID is in another by
> itself is not a problem. It is more a question if the raw data from the
> module is a well-formed fragment.
>
>
>
> 3- I had another look at the output, and the module does in fact have the
> table in it. It looks like it wrapped it into verse 8, as expected. So it
> seems, that maybe this is an issue specific to JSword?
>
>
> It is a particularly bad problem with JSword. JSword passes the verse raw
> data to an XML parser to create an XML fragment, which fails when the data
> is not well-formed. When the exception is caught, we then strip all markup
> out of the raw data and re-parse it.
> This is particular to JSword.
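
The parse-then-strip fallback can be pictured with the Python sketch below.
It only mirrors the behaviour described above and is not JSword's actual
code; verse_to_element and the <fragment> wrapper are names assumed for
illustration:

```python
import re
import xml.etree.ElementTree as ET


def verse_to_element(raw):
    """Parse a verse; if it is not well-formed, drop all markup and re-parse."""
    try:
        return ET.fromstring("<fragment>" + raw + "</fragment>")
    except ET.ParseError:
        # Fallback: remove every tag so that only the text remains.
        # Entities are left alone, so a stray "&" would still fail here.
        text_only = re.sub(r"<[^<>]*>", "", raw)
        return ET.fromstring("<fragment>" + text_only + "</fragment>")
```

A verse like "<row><cell>Judah</cell>" therefore comes out as the bare text
"Judah", with the table structure lost.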
>
> However, when the verse is shown in isolation by any SWORD frontend or in
> a table cell, it most likely will not display as intended. It's that JSword
> does it one worse. If we wish to discuss JSword's shortcoming more, we
> should do that on jsword-devel or create an issue for it (if there isn't
> already one, as we have talked about this problem in the long past.)
>
>
> Chris
>
>
>
> On 18 March 2014 13:50, Jonathan Morgan <jonmmorgan at gmail.com> wrote:
>
>> Hi DM,
>>
>> On Tue, Mar 18, 2014 at 12:01 PM, DM Smith <dmsmith at crosswire.org> wrote:
>>
>>> On Mar 17, 2014, at 1:07 PM, Chris Burrell <chris at burrell.me.uk> wrote:
>>>
>>> > Hello
>>> >
>>> > I'm looking at converting a module that has tables across verse
>>> boundaries... Is this supported?
>>>
>>> It should be. At least by osis2mod. I don't know if SWORD renderers have
>>> code for tables. I'll leave that for someone else to answer. JSword
>>> probably will choke on tables. I'll go into that in a bit.
>>>
>>
>> Last time we discussed OSIS tables they weren't supported by the SWORD
>> renderers.
>> I don't think anything has changed.
>>
>> Jon
>>
>>>
>>> > I'm using the sword utilities to convert the module, however, I'm
>>> seeing that the 'table' element is getting dropped?
>>>
>>> I'm presuming that you are using osis2mod. osis2mod should not drop
>>> anything. To verify what osis2mod creates I recommend creating a raw module
>>> (that is, use no compression flags) and use the -d 2 flag. This will put
>>> milestones for the start and end of the verses into the module. Then you
>>> can use a text editor (stay away from Notepad as the line endings may not
>>> be Windows-friendly) to look at the file and search for the constructs.
>>>
>>> >  (both using mod2imp to check,
>>>
>>> Using mod2imp is also useful because it marks each index entry with the
>>> verse slot name. But it may not be necessary, if the raw file gives what
>>> you wish.
>>>
>>> > as well as using JSword).
>>>
>>> JSword has some problems going to OSIS. It assumes that each verse is
>>> well-formed xml. If it is not, it strips all xml, leaving text (with notes
>>> inline).
>>>
>>> This is a fairly safe assumption, but tables will probably make that
>>> fail.
>>>
>>> This assumption is something that all SWORD/JSword frontends make at
>>> some point. Two examples:
>>> - Search results lists that show verse content as well as references.
>>> - Stacked or side-by-side parallel display.
>>>
>>> >
>>> > If this is supported, does someone have some example mark-up that I
>>> could use as a starting point?
>>>
>>> I'm trying to understand where in a Bible a table would be useful. I can
>>> see it in an introduction. But spanning verses? No way. There is no tabular
>>> data in the Bible. (Please correct me if I'm wrong!)
>>>
>>> I have seen people use tables to control rendering. If this is what is
>>> being done, someone needs guidance.
>>>
>>> In a commentary, which is indexed by verse numbers, anything could
>>> happen.
>>>
>>> Regarding sample markup, it is analogous to simple HTML tables, but
>>> other than <table> the element names are different.
>>> The <table> element can be wholly contained within:
>>> <div>
>>> <chapter>
>>> <speech>
>>> <note>
>>> <cell>
>>> <p>
>>> Nothing else can be a parent to <table>.
>>>
>>> A table has a few attributes, cols and rows to give dimensions;
>>> canonical to indicate whether it contains canonical material; and the
>>> standard OSIS attributes.
>>> It can contain a <head> and also <row> elements. Both are optional, but
>>> it doesn't make sense to have a table without rows.
>>>
>>> I'm not clear what the purpose of head is. It can contain much of the
>>> same content as a verse.
>>>
>>> The <row> element can only contain <cell> elements and it has a role
>>> attribute that can have a value of label or data. It also has a canonical
>>> attribute and the standard OSIS attributes.
>>>
>>> The <cell> element can contain pretty much anything that a <div> or a
>>> <chapter> can contain except <div> and <chapter>. It also has the same role
>>> attribute, but defaults to data. It also has an align attribute with a
>>> value from left, right, center, justify, start and end. And of course it
>>> has canonical and standard OSIS attributes.
>>>
>>> Since a table cannot be milestoned, the element it is contained within
>>> also cannot be milestoned. The manual states that for any given element you
>>> can choose to use the milestoned version or the container version but not
>>> both in the same document.
>>>
>>> I guess a verse can be split across multiple cells and even rows by
>>> using the milestoned version of a verse.
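
As a starting point, here is a hedged sample of a table with milestoned
verses, embedded in a short Python check. The element and attribute names
follow the description above; the row contents, the verse_slice helper and
the choice of Ezra 2 are illustrative assumptions and do not come from any
real module. The check confirms that the table as a whole is well-formed
while the text between one verse's milestones is not:

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<table>
  <row role="label"><cell>Family</cell><cell>Number</cell></row>
  <row><cell><verse osisID="Ezra.2.3" sID="Ezra.2.3"/>Parosh</cell>
       <cell>2172<verse eID="Ezra.2.3"/></cell></row>
  <row><cell><verse osisID="Ezra.2.4" sID="Ezra.2.4"/>Shephatiah</cell>
       <cell>372<verse eID="Ezra.2.4"/></cell></row>
</table>
"""


def is_well_formed(fragment):
    """True if the fragment parses once wrapped in a dummy root element."""
    try:
        ET.fromstring("<v>" + fragment + "</v>")
        return True
    except ET.ParseError:
        return False


def verse_slice(xml, osis_id):
    """Return the text between a verse's start and end milestones."""
    start_tag = '<verse osisID="%s" sID="%s"/>' % (osis_id, osis_id)
    end_tag = '<verse eID="%s"/>' % osis_id
    start = xml.index(start_tag) + len(start_tag)
    return xml[start:xml.index(end_tag)]


root = ET.fromstring(SAMPLE)  # the whole table parses
started = [v.get("sID") for v in root.iter("verse") if v.get("sID")]
ended = [v.get("eID") for v in root.iter("verse") if v.get("eID")]
```

Each sID has a matching eID, yet verse_slice(SAMPLE, "Ezra.2.3") begins
"Parosh</cell>" and is not a well-formed fragment on its own.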
>>>
>>> If a <table> only has a single column, a <list> may be a better
>>> container.
>>>
>>> Hope this helps.
>>>
>>> Together in His Service,
>>>         DM
>>>
>>>
>>> _______________________________________________
>>> sword-devel mailing list: sword-devel at crosswire.org
>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>> Instructions to unsubscribe/change your settings at above page
>>>
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>>
>
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>
>
>
> <smime.p7s>_______________________________________________
> jsword-devel mailing list
> jsword-devel at crosswire.org
> http://www.crosswire.org/mailman/listinfo/jsword-devel