[sword-devel] OSIS Glosses?
DM Smith
dmsmith at crosswire.org
Fri Dec 12 14:03:30 MST 2014
I’ve used such long, multi-character entities before. They require a DTD and are defined within the DOCTYPE declaration. I’m not at all sure how to blend a DOCTYPE and a schema. The OSIS schema does not have any.
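
For illustration, a minimal sketch of a multi-character entity declared in the internal DTD subset and expanded by the parser. It assumes Python's standard xml.etree.ElementTree; the entity name "lorry" and the bare <w> element are hypothetical and this is not a valid OSIS document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "lorry" is declared inside the DOCTYPE
# (internal DTD subset) and stands for a whole string, not a single character.
DOC = """<!DOCTYPE w [
  <!ENTITY lorry "articulated lorry">
]>
<w gloss="&lorry;">&lorry;</w>"""

root = ET.fromstring(DOC)

# The parser replaces the entity reference before the application sees it,
# both in element content and in attribute values.
text_value = root.text
attr_value = root.get("gloss")

print(text_value)
print(attr_value)
```

Whether a document carrying its own DOCTYPE can also be validated against the OSIS schema is a separate question that this sketch does not address.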
I wish I had read your answer before I replied to Greg with the same. :) Excellent.
— DM
> On Dec 12, 2014, at 9:51 AM, Sebastien KOECHLIN <seb.sword at koocotte.org> wrote:
>
> Hello,
>
> In fact, an entity can store more than a single character. It can be a string (less
> common) or a more complex structure (I've never seen it in real usage).
>
> See
> http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx
> for examples.
>
>
> To answer Greg, when you read an XML file, you create a memory tree
> structure of elements (having attributes) and text nodes (and less common
> nodes: comments, processing instructions...).
>
> When the text node is parsed, any escaping or entity is substituted with its
> final value, resulting in a "canonical" string. You cannot tell whether a
> character was written raw in the file or as an entity. So "Sword",
> "&#83;word", "&#x53;word" and "<![CDATA[Sword]]>" result in the same text
> node. You don't have to care how the text was written in the file; you get
> the same final result.
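
A small sketch of the point above, assuming Python's standard xml.etree.ElementTree. The particular character references (&#83; and &#x53;, both for "S") and the <w> wrapper element are chosen here purely for illustration.

```python
import xml.etree.ElementTree as ET

# Four different ways of writing the same text in the file.
VARIANTS = [
    "<w>Sword</w>",              # raw characters
    "<w>&#83;word</w>",          # decimal character reference for "S"
    "<w>&#x53;word</w>",         # hexadecimal character reference for "S"
    "<w><![CDATA[Sword]]></w>",  # CDATA section
]

# After parsing, every variant yields the same text node.
texts = [ET.fromstring(v).text for v in VARIANTS]

# Escaped markup characters come back as plain characters as well.
escaped = ET.fromstring("<w>&lt;b&gt; &amp; more</w>").text

print(texts)
print(escaped)
```

The same mechanism covers Greg's question about escaping <>: the parsed text simply contains the literal characters.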
>
>
> HTTP (not HTML) uses a different encoding system, %<hex> (for example,
> %20 for a space), which makes it easy to mix both escaping systems. This could
> be used for escaping spaces (%20), colons (%3A) and percent signs (%25) in gloss,
> lemma and morph. It should allow any character to be represented in the content.
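
A rough sketch of how that could look, assuming Python's urllib.parse for the percent-encoding. The helper names encode_gloss and decode_gloss, the "prefix:text" layout, and the sample gloss text are hypothetical; nothing here is part of the OSIS spec or the SWORD API.

```python
from urllib.parse import quote, unquote


def encode_gloss(entries):
    """Join (prefix, text) pairs as space-separated 'prefix:text' items.

    safe="" makes quote() escape everything except unreserved characters,
    so spaces, colons and percent signs inside the text are all encoded.
    """
    return " ".join(
        f"{quote(prefix, safe='')}:{quote(text, safe='')}"
        for prefix, text in entries
    )


def decode_gloss(value):
    """Split on whitespace, then on the first colon, and undo the escaping."""
    pairs = []
    for item in value.split():
        prefix, sep, text = item.partition(":")
        if not sep:
            # No colon at all: treat the whole item as text with no prefix.
            prefix, text = "", prefix
        pairs.append((unquote(prefix), unquote(text)))
    return pairs


entries = [("en_US", "18 wheeler"), ("en_UK", "ratio 1:2 at 50%")]
encoded = encode_gloss(entries)
decoded = decode_gloss(encoded)

print(encoded)
print(decoded)
```

Note that quote() also percent-encodes non-ASCII characters as UTF-8 bytes, which is more than the three characters mentioned above; a real implementation might escape only space, colon and percent.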
>
>
>
> On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:
>> If that's the case, how does it handle escaping <>? I believe entity
>> replacement is after XML validation but before passing them to a
>> transformer or such.
>> On Dec 12, 2014 7:52 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
>>
>>> Best I can recall:
>>> Nope. An entity is merely an alternate way of specifying a character. The
>>> XML parser is supposed to replace the entity with the corresponding code
>>> point before the value is evaluated against the schema.
>>>
>>> On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.hellings at gmail.com>
>>> wrote:
>>>
>>> It should be possible to escape any such characters with an XML entity, no?
>>> On Dec 12, 2014 7:44 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
>>>
>>>>
>>>>> On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:
>>>>>
>>>>> Sent: Friday, 12 December 2014 at 13:16
>>>>> From: "Troy A. Griffitts" <scribe at crosswire.org>
>>>>>
>>>>>> Not sure, but I thought we used optional prefixes to specify the kind
>>>>>> of gloss if there are multiple, e.g.,
>>>>>> gloss="en_US:18 wheeler en_UK:articulated lorry"
>>>>>
>>>>> Should there be an option to escape colons?
>>>>
>>>> IMHO:
>>>> Yes.
>>>>
>>>> The definition of gloss in the schema is xs:string, not osisGenRegex.
>>>> The former places no semantics on the content and allows for an empty
>>>> string.
>>>>
>>>> If gloss should have semantics, then its definition should be changed in
>>>> the OSIS spec.
>>>>
>>>> The latter is used by lemma and morph and is specified as:
>>>> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
>>>> which basically is work:value.
>>>> If I read this right it does not allow for : to be escaped. I know we
>>>> allow lemma=“x:a y:b” but I don’t see that this allows for the pattern to
>>>> be repeated, separated by spaces.
>>>>
>>>> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+ [ not
>>>> tested ]
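
A quick way to try both patterns, sketched in Python. Two assumptions: Python's re module has no \p{L} or \p{N}, so \w is used as an approximation of (\p{L}|\p{N}|_), and fullmatch() is used because XML Schema patterns must match the whole value.

```python
import re

# Approximation of osisGenRegex: (\p{L}|\p{N}|_) is replaced by \w.
ORIGINAL = re.compile(r"(((\w+)(\.\w)*:)?([^:\s])+)")

# The same pattern with the proposed change, allowing "\:" inside the value.
PROPOSED = re.compile(r"(((\w+)(\.\w)*:)?(\\:|[^:\s])+)")


def matches(pattern, value):
    """Return True if the whole value matches, as XSD pattern facets require."""
    return pattern.fullmatch(value) is not None


results = {
    "simple": matches(ORIGINAL, "strong:G1234"),
    "repeated": matches(ORIGINAL, "x:a y:b"),
    "escaped_original": matches(ORIGINAL, r"x:a\:b"),
    "escaped_proposed": matches(PROPOSED, r"x:a\:b"),
    "unescaped_proposed": matches(PROPOSED, "x:a:b"),
}

for name, ok in results.items():
    print(name, ok)
```

Under these assumptions the original pattern accepts a single work:value pair but rejects both the space-separated form and an escaped colon, while the proposed pattern accepts the escaped colon and still rejects an unescaped one.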
>>>>
>>>> In His Service,
>>>> DM
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page
>>>
>