[osis-users] Validation related OSIS questions

Markku Pihlaja markku.pihlaja at sempre.fi
Wed Nov 14 03:40:14 MST 2012

Thanks Peter,

I guess I need to enlighten you on the goals of our project a bit.

But since this is a rather lengthy explanation, I'll just mention to those
who read my original questions that two questions still remain relevant:


1) Is it possible to declare new custom attributes on tags, such as
<verse osisID="Gen.3.8"  sID="Gen.3.8" x-FI_ID="1. Moos. 3:8" />
and if so, how do I do that?

2) Is this ok for encoding the same special character for different usages:
<milestone type="x-punctuation-dash" marker="mdash" />
<milestone type="x-range-dash" marker="mdash" />


And now to the explanation.

We are not creating a file solely for programmers to use in producing
different electronic applications of the Bible. Actually, that might not
even be our primary goal. Or well, it is one of two. The main point of this
project is to update the official source version of the current official
Finnish translation of the Bible, which was made in 1992. The source files
are word processor files also from that time and in need of technical

In addition to just updating the file format, we naturally also want to
supply structural info about the contents instead of the more formatting
oriented info in the old versions.

There are two different main target groups for this source file. Those
interested in electronic applications are one - a growing one, admittedly.
But we also want to serve the more traditional target group: book
publishers. This is why our goal is not to produce highly optimized XML
code but rather to also consider the "readability" and "usability" of the code.
Those words are in quotation marks because we're not talking about purely
human reading. But we admit that the technical facilities of book
publishers might not always be perfect, and in particular using some
external library or coding in general might be quite an effort for some of
them. So the processing might be half-human - that means for example using
search & replace operations to produce the desired formatting for a printed

Including the Finnish abbreviation in every single verse might be
overkill, I admit. The abbreviation seldom gets printed with every verse,
and usually not with every chapter either. But for those cases when it is,
we want to supply - as an alternative - something easier than code
libraries and lookup tables. The extra x-FI_ID attribute would enable
relatively simple search & replace operations to reach the goal. I also
believe it might make things easier for the programmer as well.
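
For comparison, the lookup-table alternative can be sketched in a few lines of Python. This is only a rough illustration, not a proposal for the actual tooling: the table name, the function names, and the regular expression are all hypothetical, and only the "Gen" entry is taken from the example above. It assumes verse start milestones are written exactly in the osisID-then-sID form shown in question 1.

```python
import re

# Hypothetical lookup table. Only the "Gen" entry comes from this thread;
# a real table would need one entry per book.
FI_BOOK_ABBREVIATIONS = {"Gen": "1. Moos."}


def finnish_ref(osis_id):
    """Turn an osisID such as 'Gen.3.8' into the notation '1. Moos. 3:8'."""
    book, chapter, verse = osis_id.split(".")
    return f"{FI_BOOK_ABBREVIATIONS[book]} {chapter}:{verse}"


# Matches a verse start milestone written as in the example above
# (osisID first, then sID). Other attribute orders are not handled.
VERSE_START = re.compile(r'<verse\s+osisID="([^"]+)"\s+sID="[^"]+"\s*/>')


def label_verses(xml_text):
    """Replace each verse start milestone with its Finnish reference."""
    return VERSE_START.sub(lambda m: finnish_ref(m.group(1)), xml_text)


sample = '<verse osisID="Gen.3.8"  sID="Gen.3.8" />'
print(label_verses(sample))
```

With the extra attribute in the file, the same result is reachable by a plain search & replace on the attribute value, which is the point of supplying it.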

I'm perfectly aware that from a purely technical point of view it does mean
loads of redundant data. But in database terms, we don't need normal forms
for our database, we'd rather supply more flexible alternative ways of
processing the data.


As for encoding the ranges in titles as ranges: I will very probably do
that. But I'm still keeping in mind the search & replace type of publisher
/ editor. Having done quite a lot of searching & replacing myself, I know
how easy it is to do a global replace that accidentally also turns Bill
Gates into Bsick Geatss ;), especially when dealing with too large amounts
of data to confirm every replace. That was my original motivation for
avoiding use of e.g. the actual em dash character "–" for several different
purposes. And the display part of the range would still include the dash
even though the range was encoded.

But yes, it certainly also makes sense to me what you suggest: that for an
application it's definitely a better approach to search for a range markup
than scan for characters that look like a range.
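
To make the difference concrete, here is a minimal Python sketch of searching by markup instead of by character. It assumes the two milestone types from question 2 are used as written there, and it works on a small made-up fragment without the OSIS namespace; a real OSIS file declares a namespace, so the tag name in the query would have to be qualified accordingly. The function name and the sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up fragment: one range dash in a title, two punctuation dashes in text.
SAMPLE = (
    "<div>"
    "<title>Second Speech of Moses (4:44"
    '<milestone type="x-range-dash" marker="mdash" />'
    "11:32)</title>"
    "<p>punctuation "
    '<milestone type="x-punctuation-dash" marker="mdash" />'
    " like this "
    '<milestone type="x-punctuation-dash" marker="mdash" />'
    " in a sentence</p>"
    "</div>"
)


def dash_milestones(xml_text, dash_type):
    """Return all milestone elements whose type attribute equals dash_type."""
    root = ET.fromstring(xml_text)
    return [m for m in root.iter("milestone") if m.get("type") == dash_type]


print(len(dash_milestones(SAMPLE, "x-range-dash")))
print(len(dash_milestones(SAMPLE, "x-punctuation-dash")))
```

A search of this kind never confuses the two usages, whereas a scan for the dash character itself would return all three occurrences together.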

About the SWORD library:
We are planning to include some references to tools that can be utilized
with our OSIS file and will be glad to mention SWORD there. But could you
write me a "hard facts" nutshell about it? I tried browsing the SWORD
website for a while but couldn't find some essential info, such as
which programming language the library is for - and what exactly it is meant
for. For example "Research manipulation of Biblical texts" doesn't really
say much.



2012/11/13 Peter von Kaehne <refdoc at gmx.net>

> On 13/11/12 16:29, Markku Pihlaja wrote:
>> Thanks again!
>> I'll first give you one further question and then comment on your
>> previous answers.
>> What would be a good way of including language versions of verse and
>> chapter id's in the markup? I previously checked here that osisID's have
>> to use the standard keywords and syntax. But I'd love to be able to
>> supply the Finnish abbreviation of each verse as additional information.
>> That is: when the osisID of a verse is "Gen.3.8", it would make life
>> much easier for utilizers of this OSIS file if the verse also somehow
>> contained the Finnish standard notation "1. Moos. 3:8".
> I think this is in general total overkill. Assuming you are not creating
> an OSIS document because OSIS is so beautiful, but in order to use it in
> software, you have two problems:
> 1) how to find a Finnish referenced verse ("1. Moos. 3:8") or verse range,
> 2) ensure appropriate display of references.
> Both can be easily solved without having this kind of information a
> million times repeated within the text.
> For (1) you need a parsing solution which will parse arbitrary Finnish
> references and create an OSIS reference from that. CrossWire's Sword
> library does that and does it well.
> For (2) - again, this is a matter of simple lookup in a table during
> rendering. Again, the Sword library would solve that for you.
> In essence, you should not look at the OSIS references as English
> abbreviations but as tokens for the computer which happen to be somewhat
> similar to English abbreviations.
>> The "obvious" way would be to be able to add a new attribute to the
>> verse tag, like:
>> <verse osisID="Gen.3.8"  sID="Gen.3.8" FI_ID="1. Moos. 3:8" />
>> but that probably isn't possible, is it? Or can I somehow declare new
>> custom attributes like Chris declared new custom dash entities in his
>> last reply?
> As an aside, any extra attributes you wish to use should look like
> x-MyAttribute. so, here e.g. "x-fi_ID"
>> 2012/11/8 Peter von Kaehne <refdoc at gmx.net>
>>      > I'd like to be able to use some code or entity instead of an
>>     actual dash
>>      > characters (– or —), at least in some places, since we have two
>>      > different semantics for the dashes and I'd like to keep them
>>     separate in the code.
>>     Don't have an answer for that, but what is the semantic and is there
>>     not a better way to code it than the somewhat arbitrary length of a
>>     dash character?
>> That's a fair question. Indeed it would be nice to find a better way
>> (I'm not using the length to separate these cases but just different
>> notations of the same length), but I haven't (at least yet) found the
>> better way.
>> The two different cases are normal em dashes within sentences as
>> punctuation – just like the dashes in this sentence – and then to
>> indicate a range of chapters and verses in some headings. The latter is
>> not in the markup but in the content to be printed (or otherwise shown
>> to the reader). For example: "Second Speech of Moses (4:44–11:32)" just
>> before Deut.4.44. The range has been included in the official
>> translation by the translation committee and thus cannot be omitted.
> References as part of titles exist in OSIS and would encode what you want
> to encode.
>> At least in Finnish we nowadays use the em dash to indicate ranges as
>> well as punctuation. And I'd just like to enable the users of this OSIS
>> file to search for one or the other without getting ambiguous or extra
>> results.
> So, if you encode the range properly, your search should then go for the
> range/passage rather than simply for a string of text which happens to look
> like a reference. Does this make sense?
>> 2012/11/9 Chris Little <chrislit at crosswire.org>
>>         How would you suggest that an exception like this should be
>>         coded? Add
>>         some custom type attribute value to indicate special handling in
>>         layout?
>>     This was exactly the case for which <chapter> was made milestonable.
>>     You can switch all of your chapter elements to milestones:
>> I was hoping for some other solution. My impression is that these
>> milestone versions of structure indicators weaken the value and
>> usability of markup: I'd guess there are numerous tools that assume
>> "strong" markup where at least the basic structures are marked with
>> proper start and end tags instead of milestones.
> You are right wrt generic xml tools. Specifically a DOM query or an XPATH
> expression based query which picks up what is easily described as a "child"
> of a verse or chapter is a lot more complicated to create if start and end
> tag are milestoned.
> But - bear in mind again, the Sword library is an entirely different tool,
> not an XML tool, and it is set up to give you fine-grained access,
> irrespective of XML niceties.
> Peter
> _______________________________________________
> osis-users mailing list
> osis-users at crosswire.org
> http://www.crosswire.org/mailman/listinfo/osis-users