[osis-users] osis.py
Weston Ruter
westonruter at gmail.com
Sat Jun 26 07:23:36 MST 2010
Excellent questions, Robert.
The OSIS XML Schema has the following regular expression for the osisWork
type:
((\p{L}|\p{N}|_)+)((\.(\p{L}|\p{N}|_)+)*)?
In Python (with re.UNICODE) I'm simplifying this to:
\w+(\.\w+)*
Note that this is even more restrictive than the passage part of an osisID:
((\p{L}|\p{N}|_|(\\[^\s]))+)(\.(\p{L}|\p{N}|_|(\\[^\s]))*)*
Which again I'm simplifying in Python to:
(\w|\\\S)+(\.(\w|\\\S)+)*
Note that for both osisWork and osisPassage, not even a bare hyphen is
technically allowed, so using "Bible.en.OET-LV" would be illegal.
Furthermore, osisWorks don't allow escapes (osisPassages do), so
"Bible.en.OET\-LV" would also be illegal. Quoted segments are allowed in
neither (Bible.en."Freely-Given.org".OET-LV.2011). I am not sure why the
osisWork pattern is a more limited subset of the one used in osisPassage. Not
being able to include a domain name in an osisWork seems like a big
drawback.
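To make the cases above concrete, here is a minimal sketch that checks the example strings against the two simplified patterns. The helper names is_work and is_passage are only for illustration and are not part of osis.py:

```python
import re

# Simplified patterns from this message. re.UNICODE is already the default
# for str patterns in Python 3; it is passed here only to mirror the text.
OSIS_WORK = re.compile(r"\w+(\.\w+)*", re.UNICODE)
OSIS_PASSAGE = re.compile(r"(\w|\\\S)+(\.(\w|\\\S)+)*", re.UNICODE)


def is_work(s):
    """True if the whole string matches the simplified osisWork pattern."""
    return OSIS_WORK.fullmatch(s) is not None


def is_passage(s):
    """True if the whole string matches the simplified passage pattern."""
    return OSIS_PASSAGE.fullmatch(s) is not None


examples = [
    "Bible.en.KJV",                              # plain segments
    "Bible.en.OET-LV",                           # bare hyphen
    r"Bible.en.OET\-LV",                         # backslash-escaped hyphen
    'Bible.en."Freely-Given.org".OET-LV.2011',   # quoted segment
]

for s in examples:
    print(s, "work:", is_work(s), "passage:", is_passage(s))
```

Only the plain form passes both checks; the escaped hyphen passes the passage pattern but not the work pattern, and the bare hyphen and quoted forms fail both.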
As far as encoding "OET-LV" in the osisWork goes, since hyphens aren't allowed,
one option is to use "OET_LV". But actually, it would probably be
best to just break it up into two segments: "OET.LV". Multi-segment work
names aren't yet supported by osis.py (it allows a single segment for
publisher and a single segment for the work name).
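The two encodings can be sketched with a small helper. The name to_segments and its split flag are hypothetical, not something osis.py provides, and the mapping is lossy (the original punctuation can't be recovered from the result):

```python
import re

# One dot-free osisWork segment: a run of word characters.
SEGMENT = re.compile(r"\w+")


def to_segments(name, split=True):
    """Turn a name such as 'OET-LV' into legal osisWork segment(s).

    Hypothetical helper, not part of osis.py. With split=True each run of
    word characters becomes its own segment ('OET.LV'); with split=False
    the runs are joined with underscores into one segment ('OET_LV').
    """
    parts = SEGMENT.findall(name)
    if not parts:
        raise ValueError("no legal characters in %r" % (name,))
    return ".".join(parts) if split else "_".join(parts)


print(to_segments("OET-LV"))                         # multi-segment form
print(to_segments("OET-LV", split=False))            # single-segment form
print(to_segments("Freely-Given.org", split=False))  # publisher as one segment
```

Keeping the publisher in a single segment (split=False) fits the current osis.py limitation of one segment each for publisher and work name.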
Troy mentioned that nothing was generally agreed upon beyond
"Type.lang.ABBR" (e.g. Bible.en.KJV), but I have been thinking [1] about
standard ways to indicate version, revision, and edition numbers or names,
like perhaps:
v2_1
r2341
edName
[1]
http://github.com/openscriptures/api/blob/92b6ee5420c269830baf85503270ccd4cdf4d6c5/osis.py#L451
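As a rough sketch of how such segments might be told apart, assuming the tentative "v", "r", and "ed" prefixes above (none of this is agreed upon, and classify_segment is a hypothetical name, not part of osis.py):

```python
import re

# Assumed conventions, taken from the tentative examples above:
#   v2_1   -> version 2.1 (underscores stand in for dots)
#   r2341  -> revision 2341
#   edName -> edition "Name"
VERSION = re.compile(r"v(\d+(?:_\d+)*)")
REVISION = re.compile(r"r(\d+)")
EDITION = re.compile(r"ed(\w+)")


def classify_segment(segment):
    """Return a (kind, value) pair for one osisWork segment."""
    m = VERSION.fullmatch(segment)
    if m:
        return ("version", tuple(int(n) for n in m.group(1).split("_")))
    m = REVISION.fullmatch(segment)
    if m:
        return ("revision", int(m.group(1)))
    m = EDITION.fullmatch(segment)
    if m:
        return ("edition", m.group(1))
    return ("other", segment)


for seg in ["v2_1", "v1_0_1", "r2341", "edName", "KJV"]:
    print(seg, classify_segment(seg))
```

One caveat with prefixes like these: a work name that happens to start with "ed" or "r" followed by digits would be misread, so a real scheme would probably need a less ambiguous marker.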
Troy and Chris: any more insights into the osisWork identifier?
Thanks!
Weston
On Fri, Jun 25, 2010 at 9:39 PM, Robert Hunt <hunt.robertj at gmail.com> wrote:
> On 21/06/10 19:28, Weston Ruter wrote:
>
> All of the objects are now built out for osis.py, a Python module for
> representing OSIS "things". These include:
>
> - OsisWork (Bible.en.ChurchOfEngland.KJV.1611)
>    - type (Bible)
>    - language (en)
>    - publisher (ChurchOfEngland)
>    - name (KJV)
>    - pub_date (1611)
>    - pub_date_granularity (1)
>
> I'm planning to start studying, testing and using Weston's code in two
> weeks time, but in just re-reading this email I have some questions. I am
> working to start a new Bible translation. The details would be:
>
> - type (Bible)
> - language (en)
> - publisher (Freely-Given.org)
> - name (OET-LV)
> - pub_date (2011)
> - pub_date_granularity (1) ??? What's this
>
> My main question is: What if the publisher name has a dot in it like the
> above? Can it be quoted (or have the dot escaped)?
> e.g., OsisWork (Bible.en."Freely-Given.org".OET-LV.2011) or OsisWork
> (Bible.en.Freely-Given\.org.OET-LV.2011)
>
> Other questions include:
> What if there's a version number? e.g., 0.2 or 1.0.1
> What if there's an edition name? e.g., Men's Study Edition. (but maybe
> that's irrelevant if the Biblical text remains constant and it's only a
> "packaging" decision regarding additional notes and side-boxes???).
>
> Just thinking out loud,
> Robert.