[sword-devel] Sword enhancement proposal [was: HTML filter cross references link]
DM Smith
dmsmith555 at yahoo.com
Tue Jul 29 04:40:43 MST 2008
On Jul 29, 2008, at 3:10 AM, Chris Little wrote:
>
>
> Greg Hellings wrote:
>> On Tue, Jul 29, 2008 at 1:45 AM, Manfred Bergmann
>> <bergmannmd at web.de> wrote:
>>> Although I don't understand right now how the Sword module data is stored,
>>> my proposal here is that Sword should have a simple intermediate XML format
>>> that can be used by API users to have full access to the module data.
>>> Simple HTML/RTF can still be produced from this intermediate format by
>>> Sword. But HTML should not be used to give access to the module data while
>>> at the same time raw data access should not be used.
>>> Having XSDs would make it easy for API-users to use XML->Object binding (I
>>> only know JAXB in Java but this might be available to most languages as it
>>> is used in protocols like SOAP).
>>> Also XSLT stylesheets can be used to produce HTML or whatever output.
>>> Frontends could choose to use the HTML rendered output or choose totally
>>> different approaches by using the data of the intermediate XML.
>>> Let me know what you think.
>>
>> It seems to me that this is one of the better ideas. After all, the
>> library should supply display-agnostic data to the front end, which
>> then renders it into a display format, rather than presenting it with
>> a list of a few preselected display formats which are supported at the
>> engine level.
>
> If you want OSIS, just ask the engine for OSIS. There's no requirement
> that you tell the API to render text as HTML or RTF. You can just as
> easily tell the API to render to OSIS, and it will happily perform (or
> at least attempt) the conversion from GBF and ThML to OSIS. The GBFOSIS
> and ThMLOSIS filters might need a little more work, but they should
> already work fairly well.
The SWORD OSIS filters need some work. They are fairly old and
somewhat incomplete.
Several have noted that this is the approach of JSword. Because JSword
has been doing this from the beginning, we produce pretty good OSIS
2.1 from GBF and ThML, and excellent from Plaintext. We map every
element of these to OSIS. Where GBF or ThML has an element that is not
meaningful to OSIS, we create a <seg type="x-yyyy">.....</seg>,
allowing the possibility of losslessly reversing back into the original.
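To make the <seg> fallback concrete, here is a minimal sketch in Java. It is
not JSword's actual code: the class name SegFallback, its methods, and the
two-entry mapping table are all hypothetical, and it uses the JDK's built-in
DOM rather than JDOM so that it has no external dependencies. The only part
taken from the description above is the idea itself: a source tag with no OSIS
meaning becomes a <seg> whose type is "x-" plus the original tag name, so the
original tag can be recovered later.

```java
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SegFallback {
    // Hypothetical, deliberately tiny table of source tags that do have an
    // OSIS meaning. A real converter would know far more tags than this.
    private static final Map<String, String> HI_TYPES =
            Map.of("i", "italic", "b", "bold");

    private static final String EXT_PREFIX = "x-";

    // Create an empty DOM document to hold the generated elements.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Map one source tag to an OSIS element. Known tags become <hi>;
    // anything else becomes <seg type="x-TAG"> so nothing is thrown away.
    static Element toOsis(Document doc, String sourceTag, String text) {
        Element element;
        String hiType = HI_TYPES.get(sourceTag);
        if (hiType != null) {
            element = doc.createElement("hi");
            element.setAttribute("type", hiType);
        } else {
            element = doc.createElement("seg");
            element.setAttribute("type", EXT_PREFIX + sourceTag);
        }
        element.setTextContent(text);
        return element;
    }

    // Reverse direction: recover the original tag name from a fallback
    // <seg>, or return null if the element is not one of ours.
    static String originalTag(Element element) {
        String type = element.getAttribute("type");
        if ("seg".equals(element.getTagName()) && type.startsWith(EXT_PREFIX)) {
            return type.substring(EXT_PREFIX.length());
        }
        return null;
    }
}
```

Because the original tag name survives in the type attribute, a reverse filter
only needs originalTag() to restore what the forward conversion could not
express in OSIS.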
We used to use JAXB to construct the DOM directly into core with
schema validation, but we found that it was way too slow. We still
build the DOM using JDOM, and it is very fast. We made the change
sometime around OSIS 1.5. I've tried to keep it producing valid 2.1,
but without a validating parser, that is not certain. And even if it
is valid, that does not mean that it is good OSIS.
This is just to say that JSword can be used as a pretty good basis for
improving the SWORD filters.
As to TEI dictionaries, our assumption is that the OSIS folks still
plan to adopt it fairly wholesale. That said, JSword merely pretends
that TEI is part of OSIS. For SWORD, I don't see any point in
TEI-to-OSIS filters.
A bit off topic, JSword then takes the OSIS or TEI DOM and processes
it via XSLT to produce HTML. The HTML is then handed to Java to render.
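As a rough illustration of that XSLT step, the sketch below runs a toy
stylesheet over a small OSIS-like fragment using the JDK's standard
javax.xml.transform API. Everything specific in it is an assumption made for
the example: the class name OsisToHtml, the two template rules, and the
fragment itself, which omits the OSIS namespace to keep the match patterns
short. It also feeds the transformer a string rather than a DOM. JSword's real
stylesheet is far more involved.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OsisToHtml {
    // Toy stylesheet (hypothetical): a verse becomes a paragraph and
    // italic highlighting becomes <i>. Namespaces are left out on purpose.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='verse'>"
      + "<p><xsl:apply-templates/></p>"
      + "</xsl:template>"
      + "<xsl:template match='hi[@type=\"italic\"]'>"
      + "<i><xsl:apply-templates/></i>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to an XML string and return the HTML text.
    static String render(String osisXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter html = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(osisXml)),
                    new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String osis = "<verse osisID='Gen.1.1'>In the "
                    + "<hi type='italic'>beginning</hi></verse>";
        System.out.println(render(osis));
    }
}
```

The point of the design is that the front end only has to swap the stylesheet
to get a different presentation; the OSIS produced by the filters stays the
same.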
In Him,
DM