[jsword-devel] XSLT and enrichment of OSIS Text...

DM Smith dmsmith at crosswire.org
Thu Nov 4 09:30:52 MST 2010


On 11/04/2010 08:03 AM, Chris Burrell wrote:
> Hi All
>
> I have a requirement to take some OSIS xml, enrich it, and then 
> transform it to HTML. I want to keep the enrichment as flexible as 
> possible, in terms of the data I add. But one example would be adding 
> the original Greek/Hebrew text to every word that is annotated with a 
> Strong/Morph combination (optional Morph).
Another would be to knit the apparatus for a GNT into a GNT. Another, to 
highlight search hits.

>
> Originally, I parsed the XML out into POJOs, iterated through the POJOs 
> to add the information, and then iterated through again to output some 
> HTML. I've had a rethink, and believe any of the three ways 
> provided below might be a better way of doing things... Perhaps this 
> might also be of interest to integrate to the existing JSword 
> libraries (or perhaps it can already do that?).
It would be good to add. I think the framework is there, though it might 
need tweaking.

The basic process is that the filters transform module content into OSIS 
as XML, not text. Then SAX is used to do the transformations from there. 
I think JSword allows for the chaining of SAX parsers, wrapping the 
output of one with another. Thus one could wrap the current SAX stream 
with an annotating filter.
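
A rough sketch of what such an annotating filter could look like, using only 
the standard SAX classes (XMLFilterImpl, AttributesImpl). This is not code 
from JSword: the class names, the "original" attribute and the in-memory 
lexicon map are made up for illustration, and a real filter would look the 
word up in a lexicon module instead.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: copies a lexicon entry onto each <w lemma="..."> element. */
class LemmaAnnotatingFilter extends XMLFilterImpl {
    private final Map<String, String> lexicon; // assumed lookup: lemma -> original word

    LemmaAnnotatingFilter(XMLReader parent, Map<String, String> lexicon) {
        super(parent);
        this.lexicon = lexicon;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String lemma = atts.getValue("lemma");
        String original = ("w".equals(qName) && lemma != null) ? lexicon.get(lemma) : null;
        if (original != null) {
            // Attributes is read-only, so copy it before adding the new attribute.
            AttributesImpl enriched = new AttributesImpl(atts);
            enriched.addAttribute("", "original", "original", "CDATA", original);
            atts = enriched;
        }
        super.startElement(uri, localName, qName, atts); // pass the event downstream
    }
}

class AnnotateDemo {
    /** Runs the OSIS fragment through the filter and collects the added values. */
    static List<String> annotate(String osis, Map<String, String> lexicon) {
        List<String> seen = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            XMLReader filter = new LemmaAnnotatingFilter(reader, lexicon);
            // Whatever sits downstream of the filter sees the enriched attributes.
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName,
                                         Attributes atts) {
                    String original = atts.getValue("original");
                    if (original != null) {
                        seen.add(original);
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(osis)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        String osis = "<verse><w lemma=\"strong:G3056\">Word</w> <w>was</w></verse>";
        // The value is the Greek word "logos", written with Unicode escapes.
        Map<String, String> lexicon =
                Map.of("strong:G3056", "\u03bb\u03cc\u03b3\u03bf\u03c2");
        System.out.println(annotate(osis, lexicon));
    }
}
```

Because the filter is itself an XMLReader, it can be wrapped by another 
filter or handed to anything that accepts a SAX source.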

>
> I have seen some code in Bible Desktop and the core JSword libraries 
> to do XSLT transformation, so it would make sense to reuse that since 
> it does a very good job.
You could do it there, but it might not be performant. It might be 
better to stack SAX filters.
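
To illustrate the stacking idea, here is a minimal sketch in which two 
filters are chained and the resulting SAX stream is fed straight into an 
XSLT transform through a SAXSource. It uses only the JDK's JAXP classes. The 
filter class, the "original" and "hit" attributes, the hard-coded lookup and 
the tiny stylesheet are all invented for the example and are not taken from 
JSword or BibleDesktop.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.function.Function;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical filter: adds one attribute to each <w> for which valueFor is non-null. */
class AddAttributeFilter extends XMLFilterImpl {
    private final String name;
    private final Function<Attributes, String> valueFor;

    AddAttributeFilter(XMLReader parent, String name, Function<Attributes, String> valueFor) {
        super(parent);
        this.name = name;
        this.valueFor = valueFor;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String value = "w".equals(qName) ? valueFor.apply(atts) : null;
        if (value != null) {
            AttributesImpl copy = new AttributesImpl(atts);
            copy.addAttribute("", name, name, "CDATA", value);
            atts = copy;
        }
        super.startElement(uri, localName, qName, atts);
    }
}

class FilterChainDemo {
    // Illustrative stylesheet only; it uses no extension functions.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='verse'><p><xsl:apply-templates/></p></xsl:template>"
        + "<xsl:template match='w'><span>"
        + "<xsl:if test='@hit'><xsl:attribute name='class'>hit</xsl:attribute></xsl:if>"
        + "<xsl:if test='@original'><xsl:attribute name='title'>"
        + "<xsl:value-of select='@original'/></xsl:attribute></xsl:if>"
        + "<xsl:value-of select='.'/></span></xsl:template>"
        + "</xsl:stylesheet>";

    static String render(String osis) {
        try {
            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            // Inner filter: stand-in for a lexicon lookup keyed on the lemma.
            XMLReader withOriginal = new AddAttributeFilter(reader, "original",
                    a -> "strong:G3056".equals(a.getValue("lemma")) ? "logos" : null);
            // Outer filter: sees what the inner filter added and marks those words.
            XMLReader withHits = new AddAttributeFilter(withOriginal, "hit",
                    a -> a.getValue("original") != null ? "yes" : null);

            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new SAXSource(withHits, new InputSource(new StringReader(osis))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
                render("<verse><w lemma=\"strong:G3056\">Word</w> <w>was</w></verse>"));
    }
}
```

The enrichment happens while the document streams through, so the 
stylesheet only has to render attributes that are already there.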

> So the question is
>
>     1- is it easy enough to add attributes to the XML (before I send 
> it down to the SAX provider/XSLT converter).
Yes. See above. No code samples though. See 
o.c.b.display.basic.TextPaneBookDataDisplay.refresh() for a starting point.

> or 2- Even better, it's probably possible to have the transformation 
> itself (XSLT file) call a Java function as it goes through.
We do this today. Quick primer:
In the <xsl:stylesheet> element add two attributes:
   xmlns:jsword="http://xml.apache.org/xalan/java"
   extension-element-prefixes="jsword"
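The first binds the jsword prefix (a name of our choosing) to Xalan's 
Java extension namespace, which is what lets us call Java. The second 
marks that prefix as an extension namespace so it is not copied into the 
output.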

Then create reusable service objects. In BibleDesktop's xslt 
(.../bibledesktop/src/main/resources/xsl.cswing/simple.xsl) we use:
   <!-- Create a global key factory from which OSIS ids will be generated -->
   <xsl:variable name="keyf"
     select="jsword:org.crosswire.jsword.passage.PassageKeyFactory.instance()"/>
   <!-- Create a global number shaper that can transform 0-9 into other
        number systems. -->
   <xsl:variable name="shaper"
     select="jsword:org.crosswire.common.icu.NumberShaper.new()"/>
Here you see the prefix, declared above, in use.

Then these can be used in the following fashion:
   <xsl:variable name="passage" select="jsword:getValidKey($keyf, @osisID)"/>
   <a href="#{substring-before(concat(@osisID, ' '), ' ')}">
     <xsl:value-of select="jsword:getName($passage)"/>
   </a>
and:
   <xsl:variable name="chapter"
     select="jsword:shape($shaper,
             substring-before(substring-after($firstOsisID, '.'), '.'))"/>
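
The two XPath string idioms used above can be tried outside a stylesheet. 
The sketch below evaluates them with the JDK's javax.xml.xpath API. The 
helper class is hypothetical, and the osisID value is passed in as an XPath 
variable ($osisID) rather than read from an attribute, purely to keep the 
example small.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class OsisIdXPathDemo {
    /** Evaluates an XPath 1.0 expression as a string, resolving $name from vars. */
    static String eval(String expression, Map<String, String> vars) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setXPathVariableResolver(name -> vars.get(name.getLocalPart()));
        try {
            // A dummy document supplies the context; the expressions only use variables.
            return xpath.evaluate(expression, new InputSource(new StringReader("<x/>")));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String firstId = "substring-before(concat($osisID, ' '), ' ')";
        String chapter = "substring-before(substring-after($firstOsisID, '.'), '.')";

        // An osisID may hold several space-separated ids; keep only the first.
        System.out.println(eval(firstId, Map.of("osisID", "Gen.1.1 Gen.1.2"))); // Gen.1.1
        // The appended space makes the single-id case work as well.
        System.out.println(eval(firstId, Map.of("osisID", "Gen.1.1")));         // Gen.1.1
        // The chapter is the text between the first and second '.'.
        System.out.println(eval(chapter, Map.of("firstOsisID", "Gen.1.1")));    // 1
    }
}
```

Without the concat(), substring-before() returns an empty string when the 
id contains no space, which is why the extra space is appended first.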

Note: Much of simple.xsl is geared to the weak HTML implementation in 
Java, which, for example, lacks spans and CSS positioning. It would be 
cool to do a true interlinear in HTML and CSS. I've got sample HTML for 
that.

> or 3- perhaps I can provide a map of all the relevant data that needs 
> to be inserted, down to the transformation and reference it as a template?
If you mean within xslt then I don't recommend it as xslt is clumsy and 
slow in doing lookups.

Another possibility is to hook into the hyperlink capabilities to fetch 
the appropriate info. It might also be possible, though BD does not 
support it, to have a click-on-word capability that would fetch the 
underlying OSIS and use that to fetch the info.

I'm thinking of using "flying saucer" as the display engine to directly 
display OSIS. (https://xhtmlrenderer.dev.java.net/) This would minimize 
the transformations.

>
> I guess I'm after both advice, and if it's already been done, then 
> some code samples!? There seem to be a bunch of resources on the 
> internet talking about extending XSLT with java functions. However, 
> given I'm not very familiar with JSword XML implementations (or with 
> the inner workings of XSL), I would be keen to find out if anyone has 
> envisaged it, and if I would be able to reuse the existing jsword 
> framework for it?

Much of the transformation is done in BibleDesktop. Refactoring it and 
moving it into JSword and/or common would be good.

In Christ's Service,
     DM


