[sword-devel] OSIS Glosses?

Sebastien KOECHLIN seb.sword at koocotte.org
Fri Dec 12 07:51:34 MST 2014


Hello,

In fact, an entity can store more than a single character. It can be a string (less
common) or a more complex structure (I've never seen it in real usage).

See
http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx
for examples.
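
As a small illustration of an entity holding a string or a piece of markup,
here is a minimal sketch using Python's standard xml.etree.ElementTree.  The
entity names "proj" and "ref" and the sample document are invented for this
example; they come from no real OSIS file.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entities "proj" and "ref" are made up here.
# "proj" expands to a plain string, "ref" expands to a small element.
DOC = """<!DOCTYPE w [
  <!ENTITY proj "The SWORD Project">
  <!ENTITY ref "<name>CrossWire</name>">
]>
<w>&proj; by &ref;</w>"""

root = ET.fromstring(DOC)

# The string entity is already expanded inside the text node.
leading_text = root.text

# The markup entity became a real child element of <w>.
child = root.find("name")

print(repr(leading_text))
print(child.tag, child.text)
```

Once the parser is done, nothing in the tree tells you that entities were
involved.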


To answer Greg: when you read an XML file, the parser builds an in-memory
tree of elements (with their attributes) and text nodes, plus less common
nodes such as comments and processing instructions.

When a text node is parsed, every escape sequence or entity is replaced by
its final value, resulting in a "canonical" string.  You cannot tell whether
a character was written raw in the file or as an entity.  So "Sword",
"&#83;word", "&#x53;word" and "<![CDATA[Sword]]>" all result in the same text
node.  You don't have to care how the text was written in the file; you get
the same final result.
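
A quick way to check this is the following sketch, again with Python's
standard xml.etree.ElementTree.  The element name "w" is only a placeholder.
The last lines also cover Greg's question about < and >: written as &lt; and
&gt;, they reach the application as ordinary characters.

```python
import xml.etree.ElementTree as ET

# Four spellings of the same content; <w> is just a placeholder element.
variants = [
    "<w>Sword</w>",
    "<w>&#83;word</w>",          # decimal character reference
    "<w>&#x53;word</w>",         # hexadecimal character reference
    "<w><![CDATA[Sword]]></w>",  # CDATA section
]

texts = [ET.fromstring(v).text for v in variants]
all_same = all(t == "Sword" for t in texts)
print(texts, all_same)

# Predefined entities are resolved the same way, so the application
# only ever sees the plain characters.
escaped = ET.fromstring("<w>a &lt; b &amp;&amp; c &gt; d</w>").text
print(escaped)
```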


HTTP (not HTML) uses a different encoding system for URLs, %<hex> (for
example %20 for a space), which makes it easy to combine the two escaping
systems.  It could be used to escape spaces (%20), colons (%3A) and percent
signs (%25) in gloss, lemma and morph.  That should make it possible to
represent any character in the content.
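
To make the idea concrete, here is a rough sketch in Python using
urllib.parse.quote and unquote.  The "prefix:value" layout follows the gloss
example quoted below; the helper names encode_attr and decode_attr are
hypothetical, and neither OSIS nor SWORD defines this scheme today.

```python
from urllib.parse import quote, unquote


def encode_attr(pairs):
    """Join (prefix, value) pairs into one space-separated attribute value.

    Hypothetical scheme: every reserved character, including space,
    colon and percent, is percent-encoded inside each part.
    """
    return " ".join(
        f"{quote(prefix, safe='')}:{quote(value, safe='')}"
        for prefix, value in pairs
    )


def decode_attr(value):
    """Reverse encode_attr: split on whitespace, then on the first colon."""
    pairs = []
    for token in value.split():
        prefix, _, body = token.partition(":")
        pairs.append((unquote(prefix), unquote(body)))
    return pairs


glosses = [
    ("en_US", "18 wheeler"),
    ("en_UK", "articulated lorry"),
    ("x", "ratio 3:1 at 100%"),
]
encoded = encode_attr(glosses)
decoded = decode_attr(encoded)
print(encoded)
print(decoded)
```

Because spaces and colons inside a value are always encoded, the raw space
and the first raw colon of each token stay unambiguous separators.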



On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:
> If that's the case, how does it handle escaping <>? I believe entity
> replacement is after XML validation but before passing them to a
> transformer or such.
> On Dec 12, 2014 7:52 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
> 
> > Best I can recall:
> > Nope. An entity is merely an alternate way of specifying a character. The
> > XML parser is supposed to replace the entity with the corresponding code
> > point before the value is evaluated against the schema.
> >
> > On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.hellings at gmail.com>
> > wrote:
> >
> > It should be possible to escape any such characters with an XML entity, no?
> > On Dec 12, 2014 7:44 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
> >
> >>
> >> > On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:
> >> >
> >> > Gesendet: Freitag, 12. Dezember 2014 um 13:16 Uhr
> >> > Von: "Troy A. Griffitts" <scribe at crosswire.org>
> >> >
> >> >> Not sure, but I thought we used optional prefixes to specify the kind
> >> >> of gloss if there are multiple, e.g.,
> >> >> gloss="en_US:18&nbsp;wheeler en_UK:articulated&nbsp;lorry"
> >> >
> >> > Should there be an option to escape colons?
> >>
> >> IMHO:
> >> Yes.
> >>
> >> The definition of gloss in the schema is xs:string, not osisGenRegex.
> >> The former places no semantics on the content and allows for an empty
> >> string.
> >>
> >> If gloss should have a semantic, then it should be changed in the OSIS
> >> spec.
> >>
> >> The latter is used by lemma and morph and is specified as:
> >> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
> >> which basically is work:value.
> >> If I read this right it does not allow for :  to be escaped. I know we
> >> allow lemma=“x:a y:b” but I don’t see that this allows for the pattern to
> >> be repeated, separated by spaces.
> >>
> >> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+  [ not
> >> tested ]
> >>
> >> In His Service,
> >>         DM
> >


