[sword-devel] Persian Module

Chris Little chrislit at crosswire.org
Wed Jan 12 21:21:34 MST 2005


Hi Peter,

vkaehne at doctors.org.uk wrote:
> Dear knowing ones,
> 
> There are a few things I am struggling just now with and wonder whether I could get some advice:
> 
> As described previously my text is some XML variety, the dump of paratext. Everything is marked up - which is good, but uses different tags than OSIS - which is bad. I am in the process of changing it over to OSIS, but as I cannot yet script I must do things by hand - which is a bit grim.

Do you have access to the original Paratext SFM files? Those would be
more trustworthy as source material.

And you really DON'T want to do this all by hand. If you don't feel
qualified to convert this (or you feel like you would be spending way
too much time doing things by hand), you can send it to me and I would
be happy to do the conversion. However, note that I'm currently working
on another Bible and have about 4 other modules that I need to work on
after that, so it would probably be about a month before I get time to
work on it. There might be others on this list who would be willing to
work on it, too, who might have time at the moment. (Anyone?)

> q1) In an OSIS-prepared module, do the verses need an osisID? I reverted the Suaheli module (mod2osis) and found that only the chapters are tagged, while the verses appear to be simply a verse per line. I assume that the software counts the verses "by hand". Is this true?

Yes, each verse must be contained in a <verse> element and each <verse>
element must have an osisID. Without it, osis2mod can't tell that it's a
verse at all. mod2osis may not be functioning properly, but it really
should be generating <verse> elements with osisIDs. (I'll check on this.)

> q2) Do the chapters need to have a complete osisID a la "Matt.1", or are short versions possible? I read in the OSIS manual that a simple leading blank will be interpreted as referring to the current text, but the reference is a bit ambiguous and not covered by an example.

The full osisID is necessary. mod2osis definitely could not understand
an abbreviated form, but I also don't believe there is an abbreviated
form defined in OSIS. I don't know what the OSIS Manual is talking about
where it says that about a leading blank, but I don't have a copy in
front of me at the moment.

> q3) Currently the chapters are coded as <chapter value="1"> and the verses as <verse value="1">. A simple search and replace would need to be done at chapter level to get all verses coded, or at book level to get all chapters coded properly, but something like a sed script would probably do this in a minute for the whole book. Are there any sample scripts around that would do the above, which I could adapt? A regex would probably also cover this, but I am clueless about those too.

You need something that can keep track of state since every book,
chapter, & verse (minimally) needs a full osisID. I don't think sed or
regex replaces could handle this. I use Perl, but any scripting language
should suffice. (Sorry, I don't have anything on hand to offer, though.)
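
To illustrate what "keeping track of state" means, here is a rough
sketch in Python (not Perl, and not a tested converter). It assumes the
tags look exactly like the <chapter value="1"> and <verse value="1">
forms quoted above, that one book is processed at a time, and that the
caller supplies the OSIS book abbreviation (e.g. "Matt"). The function
name add_osis_ids is made up for this example.

```python
import re

# Matches the opening tags as quoted in the question (an assumption
# about the real file): <chapter value="N"> or <verse value="N">.
TAG_RE = re.compile(r'<(chapter|verse) value="(\d+)">')


def add_osis_ids(text, book):
    """Rewrite value="N" attributes into full osisIDs for one book.

    `book` is the OSIS book abbreviation supplied by the caller,
    e.g. "Matt". The current chapter number is remembered between
    matches, which is the state a plain search and replace lacks.
    """
    chapter = None  # most recent chapter number seen so far

    def replace(match):
        nonlocal chapter
        kind, number = match.group(1), match.group(2)
        if kind == "chapter":
            chapter = number
            return f'<chapter osisID="{book}.{chapter}">'
        if chapter is None:
            raise ValueError("found a verse before any chapter")
        return f'<verse osisID="{book}.{chapter}.{number}">'

    return TAG_RE.sub(replace, text)
```

Calling add_osis_ids(book_text, "Matt") would turn the first verse of
the second chapter into <verse osisID="Matt.2.1">. Closing tags and all
other markup are left untouched.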

> q4) The Bible is obviously in Unicode with intermittent changes from l->r and r->l. The - to me - odd result of this is that each verse follows this scheme: "<verse value="1"></verse> edhfoo fgfuwgfp ", with the text trailing the end marker. At least this is what I see when I open the module in gedit, emacs and kate. Is this normal and OK?

No, this is not valid. The <verse> element needs to surround the content
of the verse. This sounds like it is the fault of bad SFM to XML
conversion, not a rendering or bidi problem.
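
If going back to the SFM files is not an option, the empty elements
could in principle be repaired after the fact. The following Python
sketch assumes that each verse's text is plain text with no nested
markup and runs up to the next tag; anything more complex would need a
real XML parser. The name wrap_verse_text is made up for this example.

```python
import re

# An opening <verse ...> tag immediately closed by </verse>, followed
# by the text that should have been inside it (everything up to the
# next "<"). Assumes the verse text itself contains no markup.
EMPTY_VERSE_RE = re.compile(r'(<verse\b[^>]*>)\s*</verse>([^<]*)')


def wrap_verse_text(text):
    """Move text trailing an empty <verse></verse> pair inside it."""

    def replace(match):
        start_tag, content = match.group(1), match.group(2)
        # Surrounding whitespace is trimmed and each verse is put on
        # its own line.
        return f"{start_tag}{content.strip()}</verse>\n"

    return EMPTY_VERSE_RE.sub(replace, text)
```

With this, '<verse value="1"></verse> edhfoo fgfuwgfp ' becomes
'<verse value="1">edhfoo fgfuwgfp</verse>' followed by a newline.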

--Chris



