<p dir="ltr">Yes, that was my point. However, Sword does not use standard XML parsing tools that operate this way, so we might be able to handle an entity and the raw character differently. This would still permit the file to validate with standard XML tools as well.</p>
<p dir="ltr">--Greg</p>
<div class="gmail_quote">On Dec 12, 2014 8:52 AM, "Sebastien KOECHLIN" <<a href="mailto:seb.sword@koocotte.org">seb.sword@koocotte.org</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
In fact, an entity can store more than a single character. It can be a string (less<br>
common) or a more complex structure (I've never seen it in real usage).<br>
<br>
See<br>
<a href="http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx" target="_blank">http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx</a><br>
for examples.<br>
<br>
<br>
To answer Greg: when you read an XML file, you build an in-memory tree<br>
of elements (with attributes) and text nodes (plus less common<br>
nodes: comments, processing instructions...).<br>
<br>
When a text node is parsed, every escape or entity is substituted with its<br>
final value, resulting in a "canonical" string. You cannot tell whether a<br>
character was written raw in the file or as an entity. So "Sword",<br>
"&#83;word", "&#x53;word" and "<![CDATA[Sword]]>" result in the same text<br>
node. You don't have to care how the text was written in the file; you get<br>
the same final result.<br>
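<br>
A minimal sketch of this behaviour, assuming Python's standard<br>
xml.etree.ElementTree as a stand-in for "standard XML tools" (it is not what<br>
Sword itself uses). The element name w and the helper parsed_texts are<br>
made up for illustration:<br>

```python
import xml.etree.ElementTree as ET

# Four spellings of the same content: raw text, decimal character
# reference, hexadecimal character reference, and a CDATA section.
VARIANTS = [
    "<w>Sword</w>",
    "<w>&#83;word</w>",
    "<w>&#x53;word</w>",
    "<w><![CDATA[Sword]]></w>",
]


def parsed_texts(documents):
    """Parse each document and return the text of its root element."""
    return [ET.fromstring(doc).text for doc in documents]


texts = parsed_texts(VARIANTS)
print(texts)

# The same holds for attribute values: a colon written as a character
# reference reaches the application as a plain colon.
gloss = ET.fromstring('<w gloss="en:a&#58;b"/>').get("gloss")
print(gloss)
```

With such a parser, the application cannot tell the four spellings apart,<br>
so a colon written as &#58; cannot act as an "escaped" colon.<br>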
<br>
<br>
HTTP (not HTML) uses a different encoding system, %<hexa> (for example<br>
%20 for space), which makes it easy to mix both escaping systems. This could<br>
be used for escaping spaces (%20), colons (%3A) and percent signs (%25) in gloss,<br>
lemma and morph. It should allow any character to be represented in the content.<br>
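<br>
A rough sketch of how that could look for gloss, assuming the<br>
"prefix:text" layout with space-separated entries discussed in this thread.<br>
The helpers encode_gloss and decode_gloss are hypothetical and not part of<br>
Sword or OSIS; only urllib.parse.quote and unquote are real library calls:<br>

```python
from urllib.parse import quote, unquote


def encode_gloss(entries):
    """Join (prefix, text) pairs into one attribute value.

    Spaces, colons and percent signs inside each part are percent-encoded,
    so a raw space only separates entries and a raw colon only separates
    the prefix from the text.
    """
    return " ".join(
        f"{quote(prefix, safe='')}:{quote(text, safe='')}"
        for prefix, text in entries
    )


def decode_gloss(value):
    """Split an attribute value back into (prefix, text) pairs."""
    pairs = []
    for token in value.split(" "):
        prefix, sep, text = token.partition(":")
        if not sep:
            # No colon at all: treat the whole token as unprefixed text.
            prefix, text = "", prefix
        pairs.append((unquote(prefix), unquote(text)))
    return pairs


entries = [
    ("en_US", "18 wheeler"),
    ("en_UK", "articulated lorry"),
    ("note", "ratio 1:2 is 50%"),
]
encoded = encode_gloss(entries)
decoded = decode_gloss(encoded)
print(encoded)
```

Because the percent sign itself is encoded as %25, decoding is unambiguous<br>
and the round trip returns the original pairs.<br>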
<br>
<br>
<br>
On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:<br>
> If that's the case, how does it handle escaping <>? I believe entity<br>
> replacement is after XML validation but before passing them to a<br>
> transformer or such.<br>
> On Dec 12, 2014 7:52 AM, "DM Smith" <<a href="mailto:dmsmith@crosswire.org">dmsmith@crosswire.org</a>> wrote:<br>
><br>
> > Best I can recall:<br>
> > Nope. An entity is merely an alternate way of specifying a character. The<br>
> > XML parser is supposed to replace the entity with the corresponding code<br>
> > point before the value is evaluated against the schema.<br>
> ><br>
> > On Dec 12, 2014, at 8:49 AM, Greg Hellings <<a href="mailto:greg.hellings@gmail.com">greg.hellings@gmail.com</a>><br>
> > wrote:<br>
> ><br>
> > It should be possible to escape any such characters with an XML entity, no?<br>
> > On Dec 12, 2014 7:44 AM, "DM Smith" <<a href="mailto:dmsmith@crosswire.org">dmsmith@crosswire.org</a>> wrote:<br>
> ><br>
> >><br>
> >> > On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <<a href="mailto:refdoc@gmx.net">refdoc@gmx.net</a>> wrote:<br>
> >> ><br>
> >> > Sent: Friday, 12 December 2014 at 13:16<br>
> >> > From: "Troy A. Griffitts" <<a href="mailto:scribe@crosswire.org">scribe@crosswire.org</a>><br>
> >> ><br>
> >> >> Not sure, but I thought we used optional prefixes to specify the kind<br>
> >> of gloss if there are multiple, e.g., gloss="en_US:18&nbsp;wheeler<br>
> >> en_UK:articulated&nbsp;lorry"<br>
> >> ><br>
> >> > Should there be an option to escape colons?<br>
> >><br>
> >> IMHO:<br>
> >> Yes.<br>
> >><br>
> >> The definition of gloss in the schema is xs:string, not osisGenRegex.<br>
> >> The former places no semantics on the content and allows for an empty<br>
> >> string.<br>
> >><br>
> >> If gloss should have semantics, then that should be changed in the OSIS<br>
> >> spec.<br>
> >><br>
> >> The latter is used by lemma and morph and is specified as:<br>
> >> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)<br>
> >> which basically is work:value.<br>
> >> If I read this right it does not allow for : to be escaped. I know we<br>
> >> allow lemma=“x:a y:b” but I don’t see that this allows for the pattern to<br>
> >> be repeated, separated by spaces.<br>
> >><br>
> >> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+ [ not<br>
> >> tested ]<br>
> >><br>
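A quick way to try the proposed change, sketched under one loud assumption:<br>
Python's re module has no \p{L} or \p{N}, so \w is used here as an<br>
approximation of (\p{L}|\p{N}|_). The names ORIGINAL, PROPOSED and accepts<br>
are made up for this example. XML Schema patterns match the whole value,<br>
hence fullmatch:<br>

```python
import re

# Approximations of the schema pattern: \w stands in for (\p{L}|\p{N}|_),
# which Python's re module cannot express directly.
ORIGINAL = re.compile(r"(((\w+)(\.\w)*:)?([^:\s])+)")
PROPOSED = re.compile(r"(((\w+)(\.\w)*:)?(\\:|[^:\s])+)")


def accepts(pattern, value):
    """Return True if the whole value matches, as XML Schema requires."""
    return pattern.fullmatch(value) is not None


samples = ["strong:G1234", "x:a:b", r"x:a\:b", "x:a y:b"]
results = {s: (accepts(ORIGINAL, s), accepts(PROPOSED, s)) for s in samples}
for sample, (old, new) in results.items():
    print(f"{sample!r}: original={old} proposed={new}")
```

Under this approximation, only the changed pattern accepts a value with a<br>
backslash-escaped colon, and neither pattern accepts two work:value pairs<br>
separated by a space.<br>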
> >> In His Service,<br>
> >> DM<br>
> >> _______________________________________________<br>
> >> sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
> >> <a href="http://www.crosswire.org/mailman/listinfo/sword-devel" target="_blank">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>
> >> Instructions to unsubscribe/change your settings at above page<br>
> ><br>
<br>
</blockquote></div>