<div dir="ltr">There are two period-delimited lists in the OSIS spec: osisWork and the identifier and subidentifier components of what I call "osisPassage". These are respectively:<br><br>Bible.en.KJV<br>John.3.16!a.1<br>
<br>The regular expression for osisWork is:<br><span style="font-family: courier new,monospace;">((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?</span><br><br>Whereas the regular expression for the segment lists in osisPassage is:<br>
<span style="font-family: courier new,monospace;">((\p{L}|\p{N}|_|(\\[^\s]))+)((\.(\p{L}|\p{N}|_|(\\[^\s]))+)*)?</span><br><br>Namely, the osisPassage segments are allowed to have escaped characters whereas the osisWork segments are not. Is this intentional? Why would one allow escapes but the other not?<br>
<br>BTW, I have simplified the osisWork regular expression to:<br><br><span style="font-family: courier new,monospace;">segment_regexp = re.compile(ur"""</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> (?P&lt;segment&gt; \w+ )</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> (?P&lt;delimiter&gt; \. | $ )?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">""", re.VERBOSE | re.UNICODE)</span><br><br>And the osisPassage identifier/subidentifier segments to:<br><br><span style="font-family: courier new,monospace;">segment_regex = re.compile(ur"""</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;"> (?P&lt;segment&gt; (?: \w | \\\S )+ )</span><br style="font-family: courier new,monospace;"><span style="font-family: courier new,monospace;"> (?P&lt;delimiter&gt; \. | $ )?</span><br style="font-family: courier new,monospace;">
<span style="font-family: courier new,monospace;">""", re.VERBOSE | re.UNICODE)</span><br><br>Each pattern is matched repeatedly, one segment at a time, until the end of the string. The Unicode-aware <span style="font-family: courier new,monospace;">\w</span> character class in Python may not cover exactly the same characters as the XML Schema classes used in the spec (<span style="font-family: courier new,monospace;">\p{L}</span>, <span style="font-family: courier new,monospace;">\p{N}</span> and <span style="font-family: courier new,monospace;">_</span>), but the two should be very close and practically equivalent.<br>
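<br>For illustration, here is a rough sketch of how the repeated matching could work, written for Python 3 (so plain r"" strings instead of ur""). The names WORK_SEGMENT, PASSAGE_SEGMENT and split_segments are made up for this example, and I have made two assumptions that differ from the patterns above: the delimiter group is required rather than optional, and \Z is used instead of $ so that a trailing newline is not silently accepted.<br>

```python
import re

# Hypothetical names; segment rules follow the simplified patterns in this post.
WORK_SEGMENT = re.compile(r"""
    (?P<segment> \w+ )
    (?P<delimiter> \. | \Z )
""", re.VERBOSE)

PASSAGE_SEGMENT = re.compile(r"""
    (?P<segment> (?: \w | \\\S )+ )
    (?P<delimiter> \. | \Z )
""", re.VERBOSE)


def split_segments(text, pattern):
    """Return the list of segments in text, or None if text is not a valid list."""
    segments = []
    pos = 0
    while True:
        # Anchor each match at the current position in the string.
        match = pattern.match(text, pos)
        if match is None:
            return None
        segments.append(match.group("segment"))
        pos = match.end()
        # An empty delimiter means \Z matched, i.e. we reached the end.
        if match.group("delimiter") == "":
            return segments


print(split_segments("Bible.en.KJV", WORK_SEGMENT))
print(split_segments("John.3.16", PASSAGE_SEGMENT))
print(split_segments(r"John\.3", WORK_SEGMENT))
print(split_segments(r"John\.3", PASSAGE_SEGMENT))
```

With these definitions, an escaped period such as the one in John\.3 is accepted as part of a single segment by the osisPassage pattern and rejected outright by the osisWork pattern, which is exactly the difference I am asking about. Note that a full osisPassage like John.3.16!a.1 would first have to be split on the "!" into its identifier and subidentifier lists.<br>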
<br>So is there a reason why osisWork and osisPassage have different segments allowed?<br><br>Weston<br></div>