[sword-devel] OSIS Glosses?
Greg Hellings
greg.hellings at gmail.com
Fri Dec 12 08:08:37 MST 2014
Yes, that was my point. However, Sword does not use standard XML parsing
tools that operate this way and thus we might be able to handle an entity
and the raw character differently. This would still permit the file to
validate with standard XML tools as well.
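
A minimal sketch of that idea, not how Sword actually does it: split the raw attribute text on the first literal colon, and only afterwards decode entities, so that a colon written as &#58; is never treated as a separator. The function name `split_prefix` and the sample value are hypothetical, and `html.unescape` is used only as a convenient stand-in for character-reference decoding (it also accepts HTML named entities, which plain XML does not).

```python
import html

def split_prefix(raw_value):
    """Split 'prefix:text' on the first raw colon, then decode entities.

    raw_value is the attribute text exactly as written in the file,
    before any entity substitution (an assumption of this sketch).
    """
    prefix, sep, text = raw_value.partition(":")
    if not sep:
        # No raw colon at all: the whole value is text without a prefix.
        prefix, text = "", raw_value
    return html.unescape(prefix), html.unescape(text)

# The colon inside the gloss is written as &#58;, so it survives the split.
print(split_prefix("en_US:ratio 3&#58;1"))
```

A standard XML parser would hand the application "en_US:ratio 3:1" with both colons indistinguishable, which is why this only works in a parser that sees the raw text.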
--Greg
On Dec 12, 2014 8:52 AM, "Sebastien KOECHLIN" <seb.sword at koocotte.org>
wrote:
> Hello,
>
> In fact, an entity can store more than a single character. It can be a
> string (less common) or a more complex structure (I've never seen it in
> real usage).
>
> See
> http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx
> for examples.
>
>
> To answer Greg, when you read an XML file, you create a memory tree
> structure of elements (having attributes) and text nodes (and less common
> nodes: comments, processing instructions...).
>
> When the text node is parsed, any escaping or entity is substituted with its
> final value, resulting in a "canonical" string. You cannot tell if any
> character was put raw in the file, or if it was an entity. So "Sword",
> "&#83;word", "&#x53;word" and "<![CDATA[Sword]]>" result in the same text
> node. You don't have to care how the text was written in the file, you get
> the same final result.
>
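
The behaviour described above can be checked with any standard parser. Here is a small sketch using Python's `xml.etree.ElementTree`; the element name `w` and the particular character references are only illustrative choices.

```python
import xml.etree.ElementTree as ET

# Four ways of writing the same text: raw, decimal character reference,
# hexadecimal character reference, and a CDATA section.
variants = [
    "<w>Sword</w>",
    "<w>&#83;word</w>",
    "<w>&#x53;word</w>",
    "<w><![CDATA[Sword]]></w>",
]

# After parsing, only the resolved text is left in the tree.
texts = [ET.fromstring(v).text for v in variants]
print(texts)
```

All four documents yield the same text node, so an application sitting behind such a parser has no way to recover which spelling was used in the file.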
>
> HTTP (not HTML) uses a different encoding system with %<hex> (for example
> %20 for space) that allows both escaping systems to be mixed easily. This
> could be used for escaping spaces (%20), colons (%3A) and percent signs
> (%25) in gloss, lemma and morph. It should allow any character to be
> represented in the content.
>
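
A rough sketch of how percent-encoding could work for a prefixed gloss value, assuming entries are separated by single spaces and the prefix ends at the first colon. The helpers `encode_gloss` and `decode_gloss` and the `x` prefix are hypothetical; `quote` and `unquote` come from Python's `urllib.parse`.

```python
from urllib.parse import quote, unquote

def encode_gloss(entries):
    """Turn (prefix, text) pairs into one space-separated attribute value."""
    # safe="" makes quote() escape every reserved character,
    # including space, colon and percent.
    return " ".join(f"{prefix}:{quote(text, safe='')}" for prefix, text in entries)

def decode_gloss(value):
    """Inverse of encode_gloss: split on spaces, then on the first colon."""
    pairs = []
    for token in value.split(" "):
        prefix, _, text = token.partition(":")
        pairs.append((prefix, unquote(text)))
    return pairs

entries = [
    ("en_US", "18 wheeler"),
    ("en_UK", "articulated lorry"),
    ("x", "ratio 3:1 (100%)"),  # contains a space, a colon and a percent sign
]
encoded = encode_gloss(entries)
print(encoded)
print(decode_gloss(encoded) == entries)
```

Because the escaping happens at the application level, the value passes through a standard XML parser unchanged, unlike an XML entity.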
>
>
> On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:
> > If that's the case, how does it handle escaping <>? I believe entity
> > replacement is after XML validation but before passing them to a
> > transformer or such.
> > On Dec 12, 2014 7:52 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
> >
> > > Best I can recall:
> > > Nope. An entity is merely an alternate way of specifying a character.
> > > The XML parser is supposed to replace the entity with the corresponding
> > > code point before the value is evaluated against the schema.
> > >
> > > On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.hellings at gmail.com>
> > > wrote:
> > >
> > > It should be possible to escape any such characters with an XML
> > > entity, no?
> > > On Dec 12, 2014 7:44 AM, "DM Smith" <dmsmith at crosswire.org> wrote:
> > >
> > >>
> > >> > On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <refdoc at gmx.net> wrote:
> > >> >
> > >> > Sent: Friday, 12 December 2014 at 13:16
> > >> > From: "Troy A. Griffitts" <scribe at crosswire.org>
> > >> >
> > >> >> Not sure, but I thought we used optional prefixes to specify the kind
> > >> >> of gloss if there are multiple, e.g.,
> > >> >> gloss="en_US:18 wheeler en_UK:articulated lorry"
> > >> >
> > >> > Should there be an option to escape colons?
> > >>
> > >> IMHO:
> > >> Yes.
> > >>
> > >> The definition of gloss in the schema is xs:string, not osisGenRegex.
> > >> The former places no semantics on the content and allows for an empty
> > >> string.
> > >>
> > >> If gloss should have semantics, then that should be changed in the OSIS
> > >> spec.
> > >>
> > >> The latter is used by lemma and morph and is specified as:
> > >> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
> > >> which basically is work:value.
> > >> If I read this right it does not allow for : to be escaped. I know we
> > >> allow lemma=“x:a y:b” but I don’t see that this allows for the pattern
> > >> to be repeated, separated by spaces.
> > >>
> > >> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+
> > >> [ not tested ]
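
One way to try the untested change is to compare the two patterns on a few values. This is only an approximation: Python's `re` has no `\p{L}` or `\p{N}` classes, so `\w` stands in for `(\p{L}|\p{N}|_)`, and `fullmatch` is used because XML Schema patterns are implicitly anchored. The names `SCHEMA_PATTERN`, `ESCAPED_PATTERN` and `valid` and the sample values are made up for the test.

```python
import re

# osisGenRegex as quoted above, with \w approximating (\p{L}|\p{N}|_).
SCHEMA_PATTERN = re.compile(r"(((\w+)(\.\w)*:)?([^:\s])+)")
# Same pattern with the proposed change: a backslash-colon pair is allowed.
ESCAPED_PATTERN = re.compile(r"(((\w+)(\.\w)*:)?(\\:|[^:\s])+)")

def valid(pattern, value):
    """True if the whole value matches, mirroring schema validation."""
    return pattern.fullmatch(value) is not None

samples = [
    "strong:G2316",   # plain work:value
    "x:a y:b",        # two values separated by a space
    r"strong:a\:b",   # value containing an escaped colon
]
for s in samples:
    print(s, valid(SCHEMA_PATTERN, s), valid(ESCAPED_PATTERN, s))
```

In this approximation the escaped colon is rejected by the existing pattern and accepted by the changed one, while the space-separated pair is rejected by both, which matches the reading above that repetition is not covered.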
> > >>
> > >> In His Service,
> > >> DM
> > >> _______________________________________________
> > >> sword-devel mailing list: sword-devel at crosswire.org
> > >> http://www.crosswire.org/mailman/listinfo/sword-devel
> > >> Instructions to unsubscribe/change your settings at above page
> > >