[osis-core] 4 issues
Chris Little
osis-core@bibletechnologieswg.org
Sat, 31 Jan 2004 11:10:41 -0600
Troy A. Griffitts wrote:
> o discuss best practices for saving VALUES. My case: I have a
> Greek lexicon with OCCURRENCE information. I want to mark the occurrence
> text so I can optionally show or hide it, but would also like to save
> the OCCURRENCE DATA in an attribute. Example:
>
> logos - word
>
> <seg type="x-occurance:157">This word occurs one hundred and fifty
> seven times in the New Testament</seg>
>
>
> The above example is my BEST PRACTICE idea, but I would like to know
> your thoughts on placing the 157 VALUE where I have.
How about <seg type="x-occurenceCount" n="157">This word...</seg>.
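To make the difference concrete, here is a minimal sketch (Python standard library only) of how a consumer would read the count in each form. The two helper names are made up for illustration, and the element text is abbreviated; the attribute values are copied verbatim from the two examples above.

```python
import xml.etree.ElementTree as ET

# The two forms under discussion, with attribute values copied verbatim.
troy = ET.fromstring('<seg type="x-occurance:157">This word...</seg>')
chris = ET.fromstring('<seg type="x-occurenceCount" n="157">This word...</seg>')


def count_from_type(seg: ET.Element) -> int:
    """Hypothetical helper: the value is packed into type after a ':'."""
    # The reader has to know the "name:value" convention and split by hand.
    _, _, value = seg.get("type").partition(":")
    return int(value)


def count_from_n(seg: ET.Element) -> int:
    """Hypothetical helper: the value sits alone in the n attribute."""
    return int(seg.get("n"))
```

With the value in n, the type attribute stays a fixed string that can be matched exactly (for example to show or hide the segment), and no extra parsing convention is needed to get at the number.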
> o I was assured <w> ATTRIBUTES would NOT be forced to be an
> osisID. This was the purpose of osisGenType. But in forcing the prefix
> (to which I DID WILLINGLY concede), we seem to have added additional
> restrictions which make my documents invalid. Here's what we did A LONG
> TIME AGO:
>
> <w
> lemma="x-Strongs:1234|x-Strongs:2345"
> morph="x-Robinsons:V-AAI1P|x-Robinsons:N-ASM"
> >
> eternity
> </w>
>
>
> from the discussed change, I think I need to:
>
> <w
> lemma="strongs:1234 strongs:2345"
> morph="robinsons:V-AAI1P robinsons:N-ASM"
> >
> eternity
> </w>
>
> But this is not valid.
>
> Problem 1: WE AGREE NOT TO LIMIT THE TEXT BEYOND NOT INCLUDING A SPACE.
> Which is what my last example shows, and is still invalid (I think
> because of the '-'). I think this is just an oversight.
>
> Problem 2: I really liked the '|' better than the space. I remember
> discussing this with Patrick and I think we decided that we knew of
> codes that included spaces.
>
> MY ARGUMENT:
>
> ' ' is a language script character of many languages.
> '|' is NOT. It is a computer symbol used expressly for delimiting
> purposes.
>
>
> I can:
>
> PREFER: morph="robinsons:V-AAI1P|robinsons:N-ASM"
> LIVE WITH: morph="robinsons:V-AAI1P robinsons:N-ASM"
> REALLY DON'T WANT: to transform lemma/morph values to osisID
> restrictions with an escape character
> CANNOT LIVE WITH: forcing transformation of the lemma/morph values with
> no common escape character
In light of our motion towards supporting PSIs and info:, perhaps we
should adopt something like the URI format for our regex, allowing %HEX
escapes and spaces to divide elements. That provides alphanumerics plus
"-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")".
(http://www.rfc-editor.org/rfc/rfc2396.txt for details.)
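As a rough sketch of what such a regex could look like, assuming each element is "prefix:code" and that the code part allows exactly the RFC 2396 unreserved characters plus %HEX escapes. The prefix pattern and the function name are my assumptions for illustration; none of this is taken from the current schema.

```python
import re

# RFC 2396 "unreserved": alphanumerics plus the marks - _ . ! ~ * ' ( )
UNRESERVED = r"[A-Za-z0-9\-_.!~*'()]"
# RFC 2396 "escaped": a percent sign followed by two hex digits
ESCAPED = r"%[0-9A-Fa-f]{2}"
# Assumption: a prefix is a letter followed by letters, digits or hyphens.
VALUE = re.compile(rf"[A-Za-z][A-Za-z0-9\-]*:(?:{UNRESERVED}|{ESCAPED})+")


def split_values(attr: str) -> list[str]:
    """Split a space-delimited attribute value and check each element."""
    values = attr.split()
    for value in values:
        if VALUE.fullmatch(value) is None:
            raise ValueError(f"invalid value: {value!r}")
    return values
```

Under these assumptions "robinsons:V-AAI1P robinsons:N-ASM" splits into two valid elements, since "-" is an unreserved mark. A code that itself contains a space would be written with %20. The '|' form is rejected, because '|' is not in the unreserved set.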
HOWEVER, that said... it is still MY opinion that we should make these
valid osisIDs (Troy's "REALLY DON'T WANT" case). This gives us a place
to look stuff up. Failing this, we don't have a way to look these
values up in other documents and find out what they mean (unless we
define a mapping mechanism such as "reserved" -> "_").
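For illustration, a mapping of that kind might be sketched as follows. The allowed character set here is a placeholder I picked for the example, not the actual osisID regex, and the function name is hypothetical.

```python
import re

# Placeholder assumption: only letters, digits and "_" survive unchanged.
# The real set of characters allowed in an osisID is defined by the schema.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_]")


def to_id_segment(code: str) -> str:
    """Replace every character outside the assumed allowed set with '_'."""
    return NOT_ALLOWED.sub("_", code)
```

Note that a plain substitution like this is lossy: "V-AAI1P" and "V_AAI1P" both come out as "V_AAI1P", so the original code cannot be recovered from the mapped value. An escape character would avoid that collision.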
--Chris