[jsword-devel] XSLT and enrichment of OSIS Text...
DM Smith
dmsmith at crosswire.org
Thu Nov 4 09:30:52 MST 2010
On 11/04/2010 08:03 AM, Chris Burrell wrote:
> Hi All
>
> I have a requirement to take some OSIS XML, enrich it, and then
> transform it to HTML. I want to keep the enrichment as flexible as
> possible, in terms of the data I add. But one example would be adding
> the original Greek/Hebrew text to every word that is annotated with a
> Strong/Morph combination (optional Morph).
Another would be to knit the apparatus for a GNT into a GNT. Another, to
highlight search hits.
>
> Originally, I parsed the XML out into POJOs, iterated through the POJOs
> to add the information, and then iterated through them again to output some
> HTML. I've had a rethink, and believe any of the three ways
> listed below might be a better way of doing things. Perhaps this
> might also be of interest for integration into the existing JSword
> libraries (or perhaps they can already do that?).
It would be good to add. I think the framework is there, though it might
need tweaking.
The basic process is that the filters transform module content into OSIS
as XML, not text. Then SAX is used to do the transformations from there.
I think JSword allows for the chaining of SAX parsers, wrapping the
output of one with another. Thus one could wrap the current SAX stream
with an annotating filter.
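A minimal sketch of such an annotating filter, using only the JDK's standard SAX and JAXP classes rather than JSword's own filter classes (which are not shown here): it wraps a parser and adds an orig attribute to each <w> whose lemma is found in a lookup table. The class names, the orig attribute, and the in-memory table are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class AnnotatingFilterDemo {

    /** Wraps another XMLReader and adds orig="..." to <w> elements with a known lemma. */
    static class AnnotatingFilter extends XMLFilterImpl {
        private final Map<String, String> originals; // lemma -> original-language word

        AnnotatingFilter(XMLReader parent, Map<String, String> originals) {
            super(parent);
            this.originals = originals;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            String lemma = atts.getValue("lemma");
            if ("w".equals(localName) && lemma != null && originals.containsKey(lemma)) {
                // Copy the incoming attributes and append the enrichment.
                AttributesImpl enriched = new AttributesImpl(atts);
                enriched.addAttribute("", "orig", "orig", "CDATA", originals.get(lemma));
                atts = enriched;
            }
            // Pass the (possibly enriched) event on to the next handler in the chain.
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Parses OSIS, runs it through the filter and serializes the filtered events as XML. */
    static String annotate(String osis, Map<String, String> originals) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();
        XMLReader filter = new AnnotatingFilter(parser, originals);

        // An identity Transformer turns the filtered SAX events back into XML text.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(osis))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String osis = "<verse osisID=\"John.1.1\"><w lemma=\"strong:G3056\">Word</w></verse>";
        // The value is the Greek word "logos", written with Unicode escapes.
        System.out.println(annotate(osis, Map.of("strong:G3056", "\u03bb\u03cc\u03b3\u03bf\u03c2")));
    }
}
```

The filter never builds a tree, so the enrichment costs one map lookup per word as the events stream past.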
>
> I have seen some code in Bible Desktop and the core JSword libraries
> to do XSLT transformation, so it would make sense to reuse that since
> it does a very good job.
You could do it there, but it might not be performant. It might be
better to stack SAX filters.
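To illustrate stacking (a sketch under assumptions, not JSword code): one hypothetical filter class is instantiated twice, once for lexicon data and once for search hits, and the outermost filter feeds a small XSLT stylesheet through a SAXSource. The stylesheet then only renders attributes that are already present instead of doing lookups itself. The hit and orig attributes, the stylesheet, and the sample data are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class StackedFiltersDemo {

    /** Adds attribute `name` to <w> elements whose lemma has an entry in `values`. */
    static class WordAttributeFilter extends XMLFilterImpl {
        private final String name;
        private final Map<String, String> values;

        WordAttributeFilter(XMLReader parent, String name, Map<String, String> values) {
            super(parent);
            this.name = name;
            this.values = values;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            String lemma = atts.getValue("lemma");
            if ("w".equals(localName) && lemma != null && values.containsKey(lemma)) {
                AttributesImpl copy = new AttributesImpl(atts);
                copy.addAttribute("", name, name, "CDATA", values.get(lemma));
                atts = copy;
            }
            super.startElement(uri, localName, qName, atts);
        }
    }

    // Tiny stylesheet: it only renders attributes the filters have already added.
    private static final String XSL =
            "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html'/>"
            + "<xsl:template match='verse'><p><xsl:apply-templates/></p></xsl:template>"
            + "<xsl:template match='w'><span title='{@orig}'>"
            + "<xsl:if test='@hit'><xsl:attribute name='class'>hit</xsl:attribute></xsl:if>"
            + "<xsl:value-of select='.'/></span></xsl:template>"
            + "</xsl:stylesheet>";

    static String render(String osis, Map<String, String> lexicon, Map<String, String> hits)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader parser = factory.newSAXParser().getXMLReader();

        // Each filter wraps the previous reader: parser -> lexicon data -> search hits.
        XMLReader enriched = new WordAttributeFilter(parser, "orig", lexicon);
        XMLReader highlighted = new WordAttributeFilter(enriched, "hit", hits);

        // The outermost filter is the SAX source for the XSLT step.
        Transformer xslt = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        xslt.transform(new SAXSource(highlighted, new InputSource(new StringReader(osis))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String osis = "<verse osisID='John.1.1'><w lemma='strong:G746'>beginning</w> "
                + "<w lemma='strong:G3056'>Word</w></verse>";
        Map<String, String> lexicon = Map.of("strong:G746", "arche", "strong:G3056", "logos");
        Map<String, String> hits = Map.of("strong:G3056", "true");
        System.out.println(render(osis, lexicon, hits));
    }
}
```

Adding another enrichment (say, a GNT apparatus) would mean wrapping one more filter around the chain, with no change to the stylesheet beyond rendering the new attribute.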
> So the question is
>
> 1- Is it easy enough to add attributes to the XML (before I send
> it down to the SAX provider/XSLT converter)?
Yes. See above. No code samples though. See
o.c.b.display.basic.TextPaneBookDataDisplay.refresh() for a starting point.
> or 2- Even better, it's probably possible to have the transformation
> itself (XSLT file) call a Java function as it goes through.
We do this today. Quick primer:
In the <xsl:stylesheet> element add two attributes:
xmlns:jsword="http://xml.apache.org/xalan/java"
extension-element-prefixes="jsword"
The first binds a prefix of our choosing (jsword) to Xalan's Java
extension namespace, which lets the stylesheet call Java; the second
marks that prefix as an extension prefix.
Then create reusable service objects. In BibleDesktop's XSLT
(.../bibledesktop/src/main/resources/xsl.cswing/simple.xsl) we use:
<!-- Create a global key factory from which OSIS ids will be generated -->
<xsl:variable name="keyf"
select="jsword:org.crosswire.jsword.passage.PassageKeyFactory.instance()"/>
<!-- Create a global number shaper that can transform 0-9 into other
number systems. -->
<xsl:variable name="shaper"
select="jsword:org.crosswire.common.icu.NumberShaper.new()"/>
Here you see the prefix, declared above, in use.
Then these can be used in the following fashion:
<xsl:variable name="passage" select="jsword:getValidKey($keyf, @osisID)"/>
<a href="#{substring-before(concat(@osisID, ' '), ' ')}">
<xsl:value-of select="jsword:getName($passage)"/>
</a>
and:
<xsl:variable name="chapter" select="jsword:shape($shaper,
substring-before(substring-after($firstOsisID, '.'), '.'))"/>
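Following the same pattern for the Greek/Hebrew enrichment, a service object only needs a public constructor and public methods. The class below, its lookup method, and its two-entry table are hypothetical; the XSLT calls in the comment assume Xalan-J and the jsword prefix declared as above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical service object for Xalan-J's Java extension mechanism.
 * Assuming the jsword prefix is declared as above, a stylesheet could use it as:
 *   <xsl:variable name="lexicon" select="jsword:OriginalTextLookup.new()"/>
 *   <xsl:value-of select="jsword:lookup($lexicon, @lemma)"/>
 * (Use the fully qualified class name if the class is placed in a package.)
 */
public class OriginalTextLookup {
    private final Map<String, String> words = new HashMap<>();

    /** Public no-arg constructor, so XSLT can create one via ClassName.new(). */
    public OriginalTextLookup() {
        // Sample entries only; a real implementation would read a lexicon module.
        words.put("G2316", "\u03b8\u03b5\u03cc\u03c2");       // theos
        words.put("G3056", "\u03bb\u03cc\u03b3\u03bf\u03c2"); // logos
    }

    /**
     * Returns the original-language word for an OSIS lemma such as "strong:G3056".
     * Only the first of several space-separated values is used; unknown values give "".
     */
    public String lookup(String lemma) {
        if (lemma == null || lemma.trim().isEmpty()) {
            return "";
        }
        String first = lemma.trim().split("\\s+")[0];
        String key = first.startsWith("strong:") ? first.substring("strong:".length()) : first;
        return words.getOrDefault(key, "");
    }
}
```

Creating the object once in a global xsl:variable, as simple.xsl does for keyf and shaper, avoids rebuilding the table for every word.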
Note: Much of simple.xsl is geared toward the weak HTML implementation
in Java. E.g., it does not use spans or CSS positioning. It would
be cool to do a true interlinear in HTML and CSS. I've got sample HTML
for that.
> or 3- perhaps I can provide a map of all the relevant data that needs
> to be inserted down to the transformation and reference it as a template?
If you mean within XSLT, I don't recommend it, as XSLT is clumsy and
slow at doing lookups.
Another possibility is to hook into the hyperlink capabilities to fetch
the appropriate info. It might also be possible, though BD does not
support it, to have a click-on-word capability that would fetch the
underlying OSIS and use that to fetch the info.
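A rough sketch of the hyperlink idea, assuming the HTML carries links of the form strong:G3056 (mirroring the OSIS lemma values) that are resolved only when followed. The LinkLookup class and its scheme-to-table map are hypothetical and are not BibleDesktop's actual link handling.

```java
import java.net.URI;
import java.util.Map;

/**
 * Hypothetical on-demand lookup for the hyperlink approach: the HTML only carries
 * links such as href="strong:G3056", and the data is fetched when a link is followed.
 */
public class LinkLookup {
    // scheme (e.g. "strong", "morph") -> key -> display text
    private final Map<String, Map<String, String>> sources;

    public LinkLookup(Map<String, Map<String, String>> sources) {
        this.sources = sources;
    }

    /** Resolves "scheme:key" to display text, or returns null when nothing matches. */
    public String resolve(String href) {
        URI uri = URI.create(href);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return null; // not a scheme:key link
        }
        Map<String, String> table = sources.get(scheme);
        return table == null ? null : table.get(uri.getSchemeSpecificPart());
    }

    public static void main(String[] args) {
        LinkLookup links = new LinkLookup(Map.of(
                "strong", Map.of("G3056", "logos (word)"),
                "morph", Map.of("robinson:N-NSM", "noun, nominative singular masculine")));
        System.out.println(links.resolve("strong:G3056"));
        System.out.println(links.resolve("morph:robinson:N-NSM"));
    }
}
```

The trade-off versus enriching up front is that nothing extra is computed for words the user never clicks, at the cost of a lookup per click.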
I'm thinking of using Flying Saucer (https://xhtmlrenderer.dev.java.net/)
as the display engine to display OSIS directly. This would minimize
the transformations.
>
> I guess I'm after both advice, and if it's already been done, then
> some code samples!? There seem to be a bunch of resources on the
> internet talking about extending XSLT with java functions. However,
> given I'm not very familiar with JSword XML implementations (or with
> the inner workings of XSL), I would be keen to find out if anyone has
> envisaged it, and if I would be able to reuse the existing JSword
> framework for it?
Much of the transformation code lives in BibleDesktop. Refactoring it
and putting it into JSword and/or common would be good.
In Christ's Service,
DM