[sword-devel] OSIS Glosses?

DM Smith dmsmith at crosswire.org
Fri Dec 12 14:03:30 MST 2014


I’ve used such long, multi-character entities before. They require a DTD: the entity is declared within the DOCTYPE statement. I’m not at all sure how to blend a DOCTYPE and a schema; the OSIS schema does not provide a DTD.
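A minimal sketch of what such a multi-character entity looks like, using Python's standard `xml.etree.ElementTree` (an assumption; any conforming parser should behave the same). The entity name `lorry` and the `<w>` document are hypothetical. It only shows the DTD side; how a DOCTYPE like this would coexist with the OSIS schema is the open question above and is not addressed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset (inside DOCTYPE) declares
# a general entity whose replacement text is a whole string, not one character.
doc = """<!DOCTYPE w [
  <!ENTITY lorry "articulated&#160;lorry">
]>
<w>&lorry;</w>"""

root = ET.fromstring(doc)

# By the time the tree exists, &lorry; (and the &#160; inside it) is gone:
# the element's text is just the expanded string.
print(repr(root.text))  # 'articulated\xa0lorry'
```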

I wish I had read your answer before I replied to Greg with the same. :) Excellent.

— DM
> On Dec 12, 2014, at 9:51 AM, Sebastien KOECHLIN <seb.sword at koocotte.org> wrote:
> 
> Hello,
> 
> In fact, an entity can store more than a single character. It can be a string (less
> common) or a more complex structure (I've never seen it in real usage).
> 
> See
> http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx
> for examples.
> 
> 
> To answer Greg: when you read an XML file, the parser builds an in-memory
> tree of elements (which carry attributes) and text nodes (plus less common
> node types: comments, processing instructions, ...).
> 
> When a text node is parsed, any escape or entity is substituted with its
> final value, resulting in a "canonical" string.  You cannot tell whether a
> character was written raw in the file or as an entity.  So "Sword",
> "&#83;word", "&#x53;word" and "<![CDATA[Sword]]>" all result in the same text
> node. You don't have to care how the text was written in the file; you get
> the same final result.
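A short check of that claim, sketched with Python's `xml.etree.ElementTree` (any conforming parser should agree). A CDATA section is spelled `<![CDATA[...]]>`; the `<t>` wrapper element is hypothetical.

```python
import xml.etree.ElementTree as ET

# Four spellings of the same text, as listed in the message.
spellings = [
    "<t>Sword</t>",
    "<t>&#83;word</t>",          # decimal character reference
    "<t>&#x53;word</t>",         # hexadecimal character reference
    "<t><![CDATA[Sword]]></t>",  # CDATA section
]

# After parsing, only the resulting text is left; the spelling is lost.
texts = [ET.fromstring(s).text for s in spellings]
print(texts)  # ['Sword', 'Sword', 'Sword', 'Sword']
```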
> 
> 
> HTTP (not HTML) uses a different encoding system, %<hex> (for example %20
> for a space), which makes it easy to mix both escaping systems.  This could
> be used to escape spaces (%20), colons (%3A) and percent signs (%25) in gloss,
> lemma and morph.  It should allow any character to be represented in the content.
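A rough sketch of the percent-encoding idea in Python, using `urllib.parse.quote`/`unquote` from the standard library. The `prefix:value` layout follows the gloss example in this thread; the helper names `encode_entry` and `decode_attr` are hypothetical, and prefixes are assumed to contain no `:` or space, so they are left unencoded. Because an encoded value never contains a raw space or colon, splitting on spaces and then on the first colon is unambiguous.

```python
from urllib.parse import quote, unquote

def encode_entry(prefix: str, value: str) -> str:
    # safe="" encodes everything except letters, digits and "_.-~":
    # " " -> %20, ":" -> %3A, "%" -> %25; non-ASCII becomes UTF-8 %XX bytes.
    return prefix + ":" + quote(value, safe="")

def decode_attr(attr: str) -> dict[str, str]:
    # Entries are separated by single spaces; the first ":" ends the prefix.
    entries = {}
    for entry in attr.split(" "):
        prefix, _, encoded = entry.partition(":")
        entries[prefix] = unquote(encoded)
    return entries

gloss = " ".join([
    encode_entry("en_US", "18 wheeler"),
    encode_entry("en_UK", "articulated lorry: 100%"),
])
print(gloss)  # en_US:18%20wheeler en_UK:articulated%20lorry%3A%20100%25
print(decode_attr(gloss))  # {'en_US': '18 wheeler', 'en_UK': 'articulated lorry: 100%'}
```

Since the encoded values contain no `:` or whitespace, they would also still fit the `([^:\s])+` part of the osisGenRegex quoted in DM's message.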
> 
> 
> 
> On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:
>> If that's the case, how does it handle escaping <>? I believe entity
>> replacement is after XML validation but before passing them to a
>> transformer or such.
>> On Dec 12, 2014 7:52 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
>> 
>>> Best I can recall:
>>> Nope. An entity is merely an alternate way of specifying a character. The
>>> XML parser is supposed to replace the entity with the corresponding code
>>> point before the value is evaluated against the schema.
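To illustrate Greg's `<>` question and the answer above: `<` and `>` are written as the predefined entities `&lt;` and `&gt;`, and the parser hands the application the plain characters. The same happens to a numeric reference such as `&#58;`, so an entity cannot serve as an "escaped" colon: after parsing it is an ordinary `:`. Sketched with Python's `xml.etree.ElementTree`; the `<w>` element and its attribute values are hypothetical.

```python
import xml.etree.ElementTree as ET

# &lt; &gt; &amp; &quot; &apos; are the five entities XML predefines;
# numeric references such as &#58; (":") need no DTD either.
elem = ET.fromstring('<w gloss="a&lt;b&gt;c" lemma="strong&#58;H1234"/>')

print(elem.get("gloss"))  # a<b>c
# The &#58; is now a plain ":", indistinguishable from one typed directly.
print(elem.get("lemma"))  # strong:H1234
```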
>>> 
>>> On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.hellings at gmail.com>
>>> wrote:
>>> 
>>> It should be possible to escape any such characters with an XML entity, no?
>>> On Dec 12, 2014 7:44 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
>>> 
>>>> 
>>>>> On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:
>>>>> 
>>>>> Sent: Friday, 12 December 2014 at 13:16
>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
>>>>> 
>>>>>> Not sure, but I thought we used optional prefixes to specify the kind
>>>>>> of gloss if there are multiple, e.g.,
>>>>>> gloss="en_US:18&nbsp;wheeler en_UK:articulated&nbsp;lorry"
>>>>> 
>>>>> Should there be an option to escape colons?
>>>> 
>>>> IMHO:
>>>> Yes.
>>>> 
>>>> The definition of gloss in the schema is xs:string, not osisGenRegex.
>>>> The former places no semantics on the content and allows for an empty
>>>> string.
>>>> 
>>>> If gloss should have a semantic, then it should be changed in the OSIS
>>>> spec.
>>>> 
>>>> The latter is used by lemma and morph and is specified as:
>>>> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
>>>> which basically is work:value.
>>>> If I read this right, it does not allow ":" to be escaped. I know we
>>>> allow lemma=“x:a y:b”, but I don’t see that this allows the pattern to
>>>> be repeated, separated by spaces.
>>>> 
>>>> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+  [ not
>>>> tested ]
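A rough way to try the proposal without an XSD validator, in Python's `re`. This is an approximation, not the schema's own engine: `re` has no `\p{L}`/`\p{N}`, so `\w` (Unicode alphanumerics plus `_`) stands in for `(\p{L}|\p{N}|_)`; Python's `\s` is broader than XSD's; and XSD patterns are implicitly anchored, hence `re.fullmatch`. `PROPOSED` is the untested change suggested above.

```python
import re

# Approximation: Python's re lacks \p{L} and \p{N}, so \w (Unicode
# alphanumerics plus "_") stands in for (\p{L}|\p{N}|_). Python's \s also
# matches more than XSD's \s (space, tab, CR, LF).
ORIGINAL = r"((((\w)+)(\.(\w))*:)?([^:\s])+)"
# The untested proposal from the thread: also accept a backslash-escaped colon.
PROPOSED = r"((((\w)+)(\.(\w))*:)?(\\:|[^:\s])+)"

def valid(pattern: str, value: str) -> bool:
    # XSD patterns must match the whole value, hence fullmatch.
    return re.fullmatch(pattern, value) is not None

for value in ["strong:H1234", "x:a y:b", r"x:a\:b"]:
    print(value, valid(ORIGINAL, value), valid(PROPOSED, value))
```

Under these assumptions the three inputs give (True, True), (False, False) and (False, True) for the original and proposed patterns: the proposal accepts a backslash-escaped colon, and neither pattern on its own accepts space-separated pairs such as `x:a y:b`.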
>>>> 
>>>> In His Service,
>>>>        DM
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
>>> 
> 
