[sword-devel] Portuguese translation - request for help

Peter von Kaehne refdoc at gmx.net
Thu Sep 9 15:52:31 MST 2010


Several questions re XSLT:

How can I select nodes based on (parts or features of) the content of the
actual text node, e.g. whether the text is capitalised?

How can I output the value of an attribute?

How can I stop the text content from being printed while still
processing the child nodes?
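For reference, and to the best of my understanding of XSLT 1.0: content can be tested in a predicate, e.g. heading[translate(., 'abcdefghijklmnopqrstuvwxyz', 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') = .] matches headings with no lower-case ASCII letters (accented letters such as ã or ç would have to be added to both strings; XSLT 2.0 offers upper-case(.) = . instead), and contains() or starts-with() cover other features. An attribute value is output with <xsl:value-of select="@n"/>. Text gets printed because the built-in template rule for text nodes copies them to the output; <xsl:apply-templates select="*"/> processes only the child elements, and an empty <xsl:template match="text()"/> suppresses text everywhere. The sketch below mimics the same three operations in Python with the standard-library ElementTree, so it is not XSLT; the element names heading and verse and the attribute n are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat, page-ordered input; tag and attribute names are illustrative.
SAMPLE = """<page>
  <heading>GÉNESIS</heading>
  <verse n="1">Texto do versículo 1.</verse>
  <heading>Capítulo 1</heading>
</page>"""

def is_capitalised(text):
    # True if the text has letters and every letter is upper case.
    # Checking per character also handles accented capitals such as É or Ç.
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and all(c.isupper() for c in letters)

def capitalised_headings(root):
    # Q1: select elements by a feature of their text content,
    # roughly heading[upper-case(.) = .] in XSLT 2.0.
    return [h.text for h in root.iter("heading") if h.text and is_capitalised(h.text)]

def verse_numbers(root):
    # Q2: output an attribute value, like <xsl:value-of select="@n"/>.
    return [v.get("n") for v in root.iter("verse")]

def child_tags_without_text(elem):
    # Q3: visit the child elements but never emit elem's own text,
    # like <xsl:apply-templates select="*"/>.
    return [child.tag for child in elem]

root = ET.fromstring(SAMPLE)
print(capitalised_headings(root))     # ['GÉNESIS']
print(verse_numbers(root))            # ['1']
print(child_tags_without_text(root))  # ['heading', 'verse', 'heading']
```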

I think I have made decent progress on this text, but there were long
periods of inactivity before I understood the next steps.

FWIW my process:

I received PDF files; no better source exists. I used pdf2xml to
create an XML representation of the underlying PostScript. I then
fairly painstakingly analysed the font size and other characteristics
to decide which bit represents which structure.

A Perl script now produces an XML file based on this analysis.
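The font-size analysis and the tagging step can be sketched as a small rule table, assuming each extracted text run has already been reduced to its text, font size and a bold flag. The Token type, the thresholds and the tag names book, chapter and verse are all hypothetical; real values would come from inspecting the pdf2xml output. It is written in Python rather than Perl to keep one language for the examples.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape

@dataclass
class Token:
    text: str
    size: float   # font size in points (hypothetical field)
    bold: bool

# Hypothetical rules: (test, tag name). The first match wins, so order matters.
RULES = [
    (lambda t: t.size >= 16, "book"),
    (lambda t: t.size >= 12 and t.bold, "chapter"),
    (lambda t: t.size <= 7 and t.text.isdigit(), "verse"),
]

def classify(token):
    for test, role in RULES:
        if test(token):
            return role
    return "text"  # default: running text

def to_xml(tokens):
    # One flat element per token, still in page order, with no hierarchy.
    lines = []
    for t in tokens:
        role = classify(t)
        lines.append(f"<{role}>{escape(t.text)}</{role}>")
    return "\n".join(lines)

print(to_xml([Token("Capítulo 1", 12.0, True), Token("1", 6.0, False)]))
```

Keeping the rules in an ordered table makes it easy to adjust a threshold after looking at another page of output without touching the tagging code.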

This XML file is still ordered by page, as a print layout, without
any deeper hierarchy, so it has no actual textual structure. But at
least the structure becomes visible in the way I have named the tags.

I then use an XSLT stylesheet to create USFM from the text. USFM is
closer to this structureless text than OSIS is.
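Why USFM suits flat input can be illustrated with a sketch: USFM markers are line-oriented, so one pass over the elements in document order is enough, and page boundaries simply disappear. This is Python with ElementTree rather than XSLT, and it assumes the hypothetical tags book, chapter, verse and text; the \mt1, \c, \p and \v markers are standard USFM, although a real file would also need an \id line, omitted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical flat, page-ordered intermediate XML (tag names are illustrative).
FLAT = """<doc>
<page n="1">
<book>Génesis</book>
<chapter>1</chapter>
<verse>1</verse><text>Texto do versículo 1.</text>
<verse>2</verse><text>Texto do versículo 2,</text>
</page>
<page n="2">
<text>continuado na página seguinte.</text>
</page>
</doc>"""

def flat_to_usfm(root):
    lines = []
    # iter() walks every element in document order; doc and page
    # elements match no branch below, so page breaks simply vanish.
    for elem in root.iter():
        text = (elem.text or "").strip()
        if elem.tag == "book":
            lines.append("\\mt1 " + text)
        elif elem.tag == "chapter":
            lines.append("\\c " + text)
            lines.append("\\p")
        elif elem.tag == "verse":
            lines.append("\\v " + text)
        elif elem.tag == "text":
            # Running text joins the current line, even across a page break.
            if lines:
                lines[-1] += " " + text
            else:
                lines.append(text)
    return "\n".join(lines)

print(flat_to_usfm(ET.fromstring(FLAT)))
```

An equivalent XSLT would need little more than one template per tag name, since no grouping into chapters or verses is required.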

Finally, I need another Perl script (not yet written) to clean things
up a bit, but that will be straightforward.
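The clean-up pass could look something like this sketch, again in Python rather than Perl; the three rules are hypothetical examples of typical PDF-extraction debris, not a list of what this text actually needs. Rejoining words hyphenated at line ends is deliberately left out, because Portuguese uses hyphens inside words (e.g. with enclitic pronouns).

```python
import re

def clean_usfm(text):
    # Collapse runs of spaces and tabs left over from PDF extraction.
    text = re.sub(r"[ \t]+", " ", text)
    # Remove stray spaces before punctuation: "terra ." -> "terra."
    text = re.sub(r" +([,.;:!?])", r"\1", text)
    # Strip trailing whitespace and drop empty lines.
    lines = [line.rstrip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)

print(clean_usfm("\\v 1  Texto   do versículo 1 .  \n\n\\v 2 Mais texto ;"))
```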

Thanks

Peter
