[sword-devel] XML attribute delimiters in OSIS files?
Troy A. Griffitts
scribe at crosswire.org
Wed Oct 26 11:50:23 MST 2011
Hey guys. Just did some testing. If you have a look at
sword/tests/xmltest and try the problem case:
./xmltest "<title type='nested \"quotation\" '/>"
(xmltest already tries to add an attribute to your input which tests for
embedded quotes, so you'll see an addedAttribute in your output)
You get:
[scribe at charis tests]$ ./xmltest "<title type='nested \"quotation\" '/>"
<title type='nested "quotation" '/>
<title type='nested "quotation" '/>
<title addedAttribute='with a " quote' type='nested "quotation" '/>
Tag name: [title]
- attribute: [addedAttribute] = [with a " quote]
4 parts:
with
a
"
quote
- attribute: [type] = [nested "quotation" ]
3 parts:
nested
"quotation"
isEmpty: 1
isEndTag: 0
It is a little odd that the second attribute has "3 parts", but looking
at the example given, it has a space at the end, so I suppose this
might be correct.
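
For what it's worth, the "3 parts" count is what a plain split on single
spaces would give. A minimal Python sketch, assuming the parts are produced
by splitting the attribute value on one space character (split_parts is a
hypothetical helper for illustration, not the SWORD code itself):

```python
def split_parts(value, sep=" "):
    """Split an attribute value on every occurrence of sep.

    Assumption: a trailing separator yields a final empty part,
    which is what would make a value ending in a space count one extra.
    """
    return value.split(sep)


for value in ['with a " quote', 'nested "quotation" ']:
    parts = split_parts(value)
    print(len(parts), "parts:", parts)
```

Under that assumption the trailing space in the type attribute accounts for
the third, empty part.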
Hope this is helpful in tracking this down,
Troy
On 10/26/2011 06:38 PM, DM Smith wrote:
> On 10/26/2011 09:47 AM, Peter von Kaehne wrote:
>> Is there any actual credible reason for having quotation marks in
>> attributes? I agree that it may be grammatically correct for XML as
>> such, but OSIS's attributes are defined and do not contain quotation
>> marks. And x-marked attributes are largely thrown out during the
>> osis2mod run, no? Or at least ignored - apart from our own - like
>> x-preverse.
>>
>> Peter
>
> I had never spent the time to look at the allowable attribute values
> in an OSIS document. Now, having looked at the schema, it is allowed
> to nest quotes. See below for details.
>
> I think there are many good reasons that a single quote will be found
> in an attribute value. Many languages use it for other things than
> quoting.
>
> I can only think of a few, probably obscure, reasons for a double
> quote to be there. E.g chapterTitle='xxx aka "yyy"', who='James
> "Jimmy" Smith', ...
>
> Osis2mod *should* allow for all well-formed, valid (both syntactically
> and semantically) OSIS documents. Regarding quoting attribute values,
> the recommendation still stands, use double quotes if at all possible,
> but also avoid &quot; and &apos; too. (Note that these entities are
> only needed within attribute values and never elsewhere in the text.)
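
As an illustration of the general XML quoting rule, Python's standard
xml.sax.saxutils.quoteattr picks the delimiter from the value and only
falls back to an entity when both kinds of quote appear. This is a sketch
of the XML rule, not of what osis2mod does; make_actor and the sample
values are made up for the example:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def make_actor(who):
    # quoteattr returns the value already wrapped in suitable delimiters,
    # escaping the double quote only if the value contains both kinds.
    return "<actor who=%s/>" % quoteattr(who)


samples = ['James "Jimmy" Smith', "Ya'akov", 'Both " and \' here']
for who in samples:
    xml = make_actor(who)
    # A conforming XML parser hands back the original value unchanged.
    assert ET.fromstring(xml).get("who") == who
    print(xml)
```

A value with only double quotes comes out wrapped in single quotes, and the
other way round, so a consumer has to accept either delimiter.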
>
> (Below I'm using x at y to mean element x with attribute y.)
>
> In looking at this, I think there are some bugs in the definition of
> l at type, lg at type, and rdg at type.
>
> In Him,
> DM
>
> Here are the attributes that allow for arbitrary text:
> actor at who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> contributor at file-as
> <xs:attribute name="file-as" type="xs:string" use="optional"/>
> a at href
> <xs:attribute name="href" type="xs:string" use="required"/>
> abbr at expansion
> <xs:attribute name="expansion" type="xs:string" use="optional"/>
> chapter at chapterTitle
> <xs:attribute name="chapterTitle" type="xs:string" use="optional"/>
> figure at alt, @catalog, @location, @rights, @size, @src
> <xs:attribute name="alt" type="xs:string" use="optional"/>
> <xs:attribute name="catalog" type="xs:string" use="optional"/>
> <xs:attribute name="location" type="xs:string" use="optional"/>
> <xs:attribute name="rights" type="xs:string" use="optional"/>
> <xs:attribute name="size" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string"/>
> index at index, @level1, @level2, @level3, @level4, @see
> <xs:attribute name="index" type="xs:string" use="required"/>
> <xs:attribute name="level1" type="xs:string" use="required"/>
> <xs:attribute name="level2" type="xs:string" use="optional"/>
> <xs:attribute name="level3" type="xs:string" use="optional"/>
> <xs:attribute name="level4" type="xs:string" use="optional"/>
> <xs:attribute name="see" type="xs:string" use="optional"/>
> item at role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> label at role
> <xs:attribute name="role" type="xs:string" use="optional"/>
> milestone at marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> milestoneEnd at start
> <xs:attribute name="start" type="xs:string" use="required"/>
> milestoneStart at end
> <xs:attribute name="end" type="xs:string" use="required"/>
> name at regular
> <xs:attribute name="regular" type="xs:string" use="optional"/>
> q at level, @marker, @who
> <xs:attribute name="level" type="xs:string" use="optional"/>
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speaker at who
> <xs:attribute name="who" type="xs:string" use="optional"/>
> speech at marker
> <xs:attribute name="marker" type="xs:string" default="DEFAULT"
> use="optional"/>
> title at short
> <xs:attribute name="short" type="xs:string" use="optional"/>
> w at gloss, @src, @xlit
> <xs:attribute name="gloss" type="xs:string" use="optional"/>
> <xs:attribute name="src" type="xs:string" use="optional"/>
> <xs:attribute name="xlit" type="xs:string" use="optional"/>
> Globally (globalWithType, globalWithoutType)
> @annotateWork, @resp, @n
> <xs:attribute name="annotateWork" type="xs:string" use="optional"/>
> <xs:attribute name="resp" type="xs:string" use="optional"/>
> <xs:attribute name="n" type="xs:string" use="optional"/>
> Milestone attributes
> @sID, @eID
> <xs:attribute name="sID" type="xs:string" use="optional"/>
> <xs:attribute name="eID" type="xs:string" use="optional"/>
> osisID, osisRef, osisAnnotateType regexes allowing quotation marks:
> (look for [^...] constructs)
> <xs:pattern
> value="((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)"/>
> <xs:pattern
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?"/>
> <xs:pattern
> value="(((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?:)?((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?(\-((((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*)+)(!((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?)?(@(cp\[(\p{Nd})*\]|s\[(\p{L}|\p{N})+\](\[(\p{N})+\])?))?)?"/>
> Attribute extension regex:
> <xs:pattern value="x-([^\s])+"/>
> l at type
> <xs:union memberTypes="osisLine attributeExtension xs:string"/>
> lg at type
> <xs:union memberTypes="osisLineGroup attributeExtension xs:string"/>
> <xs:simpleType name="osisLineGroup">
> <xs:restriction base="xs:string">
> <!-- <xs:enumeration value="doxology"/> -->
> </xs:restriction>
> </xs:simpleType>
> rdg at type
> <xs:union memberTypes="osisRdg attributeExtension xs:string"/>
>
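
To see that the attribute extension pattern quoted above really admits
quotation marks, here is a small Python check. It assumes that re.fullmatch
is a fair stand-in for the implicit anchoring of xs:pattern, which holds for
this pattern because it only uses \s; is_attribute_extension is a
hypothetical helper, and the sample values are invented:

```python
import re

# The attributeExtension pattern as quoted from the schema.
ATTRIBUTE_EXTENSION = re.compile(r"x-([^\s])+")


def is_attribute_extension(value):
    # xs:pattern must match the whole value, hence fullmatch.
    return ATTRIBUTE_EXTENSION.fullmatch(value) is not None


for value in ['x-preverse', 'x-"quoted"', "x-it's", 'x-has space', 'preverse']:
    print(repr(value), is_attribute_extension(value))
```

Anything after "x-" that is not whitespace is accepted, so both kinds of
quote can legitimately show up in such a value.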
>>
>>
>> -------- Original Message --------
>>> Date: Wed, 26 Oct 2011 08:59:14 -0400
>>> From: DM Smith<dmsmith at crosswire.org>
>>> To: SWORD Developers' Collaboration Forum<sword-devel at crosswire.org>
>>> Subject: Re: [sword-devel] XML attribute delimiters in OSIS files?
>>> Ah, now I understand. This is a bug. And should be fixed. (BTW, not
>>> having the entire thread reproduced in each email makes it harder to
>>> understand the context of the email. I don't like having to go digging
>>> for the context. Having looked, I see that the first email in the
>>> thread defines delimiters.)
>>>
>>> But I'm not sure where it should be fixed. I haven't looked at the
>>> code, but as I recall, we use the SWORD parser to obtain the attribute
>>> value. My guess is that it is returning it with the quotes. If the
>>> problem is there and we fix it there, it may break a whole host of
>>> other things. (This parser is not a true XML parser, but one that is
>>> highly optimized for speed and thus we work with its definition.)
>>>
>>> It should be easy to change osis2mod to work. I'll look into doing
>>> this soon.
>>>
>>> That said, it is and has been the recommendation that double quotes be
>>> used to wrap attribute values. It is valid to use single quotes, but it
>>> may (does) expose bugs. Fixing this bug does not change this
>>> recommendation.
>>>
>>> Until osis2mod has been changed and it is available, it is advisable to
>>> change the input so that the quoting of sID/eID pairs is identical.
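
A rough Python sketch of the comparison that would be needed: strip the
delimiters while reading each attribute, then compare the bare values. This
is only an illustration of the idea, not the SWORD parser or osis2mod; the
attributes helper, the regex, and the q elements are assumptions made up for
the example:

```python
import re

# name = "value" or name = 'value'; the delimiter is not part of the value.
ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")


def attributes(tag):
    """Return a dict of attribute name to value, without the delimiters."""
    result = {}
    for match in ATTR.finditer(tag):
        name, double_quoted, single_quoted = match.groups()
        # Exactly one of the two value groups is set for each match.
        result[name] = double_quoted if double_quoted is not None else single_quoted
    return result


start = attributes('<q sID="reference"/>')
end = attributes("<q eID='reference'/>")

# With the delimiters stripped, the pair matches regardless of quote style.
assert start["sID"] == end["eID"]

# Comparing the text with its delimiters still attached would not match.
assert '"reference"' != "'reference'"
```

A value that keeps its delimiters makes the sID and eID above look
different, which would fit the silent mismatch described in this thread.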
>>>
>>> In Him,
>>> DM
>>>
>>> On Oct 26, 2011, at 6:38 AM, David Haslam wrote:
>>>
>>>> Mixing double and single quotes, as per earlier messages in this
>>>> thread.
>>>>
>>>> Example (minus the chaff):
>>>>
>>>> sID="reference"
>>>> .....
>>>> eID='reference'
>>>>
>>>> But this time for the same verse, just as Chris replied, rather than
>>>> in completely separate OSIS elements.
>>>>
>>>> As this is just an observation, I see no immediate need to give a
>>>> detailed example of what happens to the module.
>>>> To locate the places where I spotted it yesterday would take some
>>>> time.
>>>>
>>>> Perhaps the most interesting thing is that there was no error message
>>>> from osis2mod.
>>>>
>>>> And I agree with Chris, the OSIS needs fixing first, before using as
>>>> input for osis2mod.
>>>>
>>>> David
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>> http://sword-dev.350566.n4.nabble.com/XML-attribute-delimiters-in-OSIS-files-tp3907261p3940110.html
>>>
>>>> Sent from the SWORD Dev mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> sword-devel mailing list: sword-devel at crosswire.org
>>>> http://www.crosswire.org/mailman/listinfo/sword-devel
>>>> Instructions to unsubscribe/change your settings at above page