[osis-users] OSIS with TEI and XHTML5 (was Fwd: Standardizing on a Web Infrastructure and Web Service API for Scripture)
Weston Ruter
westonruter at gmail.com
Tue Jan 19 23:03:54 MST 2010
Hi Troy :-) Thanks for following up on this.
> What _practical_ benefit do you see of importing elements directly from the
> TEI specification? Really?
>
My point isn't about importing from TEI specifically (a specification I am
actually quite unfamiliar with); rather, re-using an existing specification and
its established semantics seems preferable to imperfectly cloning bits and
pieces of it. Since OSIS doesn't directly import from TEI, existing XML
processors that understand markup in the TEI namespace won't know what to do
with OSIS. Whether the elements are imported from TEI, from XHTML, or from a
combination of both doesn't much matter.
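A minimal Python sketch of the namespace problem, using only the standard library's ElementTree. It assumes the OSIS 2.1 namespace URI (http://www.bibletechnologies.net/2003/OSIS/namespace) and the TEI P5 namespace URI (http://www.tei-c.org/ns/1.0); the tei_handlers table is a hypothetical stand-in for any TEI-aware processor. Because ElementTree, like XML processors generally, identifies an element by its expanded {namespace}local name, an OSIS <p> never matches a handler written for the TEI <p>, even though both share the local name and the intended semantics.

```python
import xml.etree.ElementTree as ET

# Namespace URIs (assumed: OSIS 2.1 and TEI P5).
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical TEI-aware processor: handlers keyed on expanded element names.
tei_handlers = {
    f"{{{TEI_NS}}}p": lambda el: "paragraph",
    f"{{{TEI_NS}}}q": lambda el: "quotation",
}

def classify(xml_text: str) -> list[str]:
    """Report what the TEI-aware processor recognizes for each element."""
    root = ET.fromstring(xml_text)
    results = []
    for el in root.iter():  # document order, root first
        handler = tei_handlers.get(el.tag)  # el.tag is "{namespace}local"
        results.append(handler(el) if handler else f"unknown: {el.tag}")
    return results

tei_doc = f'<div xmlns="{TEI_NS}"><p>In the beginning...</p></div>'
osis_doc = f'<div xmlns="{OSIS_NS}"><p>In the beginning...</p></div>'

print(classify(tei_doc))   # the TEI <p> is recognized as a paragraph
print(classify(osis_doc))  # the OSIS <p> is not, despite the same local name
```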
> As to the XHTML suggestion, you know OSIS is geared at marking the semantics
> of the text and not how the text should be displayed. XHTML has the direct
> opposite design goal (not as much with CSS and XHTML, but still elements are
> geared around marking out display elements). My suspicion is that using
> XHTML elements directly would bring in way too much display oriented logic.
>
This is true of HTML4 and XHTML 1.0 Transitional, but XHTML 1.0 Strict has
reversed this trend, as has the drive toward purely semantic markup. Compared
with TEI, vastly more systems support the semantics of XHTML. If OSIS imported
elements from the XHTML namespace, the learning curve for authors would be much
less steep. Additionally, OSIS could potentially be embedded seamlessly within
XHTML, providing semantics for marking up scripture on the Web.
> OSIS, though in some ways unwieldy, gives me a finite set of tags to handle
> when writing software to parse OSIS.
>
Perhaps the finite set is part of the unwieldiness: in order for OSIS to be
able to mark up scripture, it has to define every structure that can be found
in scripture. This results in a huge specification that can be difficult to
maintain and may very well omit some construct that is needed. If, however,
OSIS defined only the elements and attributes unique to marking up scripture,
it could leave the common document-structure elements (p, q, header, etc.) to
another specification, such as TEI or XHTML5.
> OSIS, though in some ways unwieldy, gives me a finite set of tags to handle
> when writing software to parse OSIS. If OSIS were to import TEI tags
> directly, I am sure there are plenty of global attributes and other aspects
> of the TEI specification regarding those attributes that I would have to
> handle as a software engineer, even though they will likely never be used or
> worth allowing for use against the time it would take me to implement code
> to handle all aspects of those TEI tags and their children/attributes. And
> per my first question above, do you have a particular use case where it
> would be advantageous that the tags were actually imported from the TEI
> specification?
>
I can't argue with you here :-) Having a finite set of tags is very useful
(and essential), but that set could be preserved even while importing. The OSIS
XML Schema could restrict when and where non-OSIS namespaced elements appear,
and which ones they are allowed to be. This would retain the finite set while
re-using existing semantics. Furthermore, in some contexts, such as notes,
elements from arbitrary namespaces could be allowed: this would permit mixing
in an inline SVG map, HTML5 audio or video clips, or something else not yet
thought of.
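In XML Schema itself, this would presumably be expressed by referencing the imported elements from the other namespace and by placing an xs:any wildcard (for example with namespace="##other") in the content model of note. The Python sketch below only approximates such a policy to illustrate the idea; the allowlist in IMPORTED_XHTML, the rule that any namespace is permitted inside an OSIS note, and the OSIS namespace URI are all assumptions, not part of any published OSIS schema.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"  # assumed
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical policy: the XHTML elements OSIS would import.
IMPORTED_XHTML = {"p", "q", "abbr", "a", "table", "div"}
NOTE_TAG = f"{{{OSIS_NS}}}note"

def split_tag(tag: str) -> tuple[str, str]:
    """Split an ElementTree '{ns}local' tag into (ns, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag

def violations(el: ET.Element, in_note: bool = False) -> list[str]:
    """Return tags that fall outside the finite set allowed at their position."""
    found = []
    ns, local = split_tag(el.tag)
    allowed = (
        ns == OSIS_NS                                   # native OSIS element
        or (ns == XHTML_NS and local in IMPORTED_XHTML)  # imported element
        or in_note                                      # open content in notes
    )
    if not allowed:
        found.append(el.tag)
    for child in el:
        found.extend(violations(child, in_note or el.tag == NOTE_TAG))
    return found

doc = f'''<osis xmlns="{OSIS_NS}" xmlns:h="{XHTML_NS}"
      xmlns:svg="http://www.w3.org/2000/svg">
  <h:p>Text with an <h:abbr title="Luke">Lk</h:abbr> reference.</h:p>
  <note><svg:svg width="10" height="10"/></note>
  <svg:svg width="10" height="10"/>
</osis>'''

# Only the SVG outside the note is reported.
print(violations(ET.fromstring(doc)))
```

The finite set stays finite for the validator (OSIS plus an explicit list of imported names), while notes are deliberately left open.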
Having a finite set of tags and the benefits thereof also raises the issue
of the need for an unambiguous document structure, or rather a single
document structure. As you've said before, you can't actually use OSIS as
your raw data format because an OSIS document can be authored in many
different ways. I will pick this up in another thread.
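To illustrate the "many different ways" problem: as I understand OSIS 2.1, a verse may be encoded either as a container element or as a pair of milestones linked by sID and eID, so any consumer has to normalize both. This is a rough Python sketch under that assumption (the OSIS 2.1 namespace URI is assumed; walk and verse_texts are hypothetical helper names). It ignores edge cases such as overlapping milestones or container verses nested inside milestoned ones.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Union

OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"  # assumed
VERSE = f"{{{OSIS_NS}}}verse"

def walk(el: ET.Element) -> Iterator[tuple[str, Union[ET.Element, str]]]:
    """Yield ('start', element) and ('text', string) events in document order."""
    yield ("start", el)
    if el.text:
        yield ("text", el.text)
    for child in el:
        yield from walk(child)
        if child.tail:  # text following the child, still inside el
            yield ("text", child.tail)

def verse_texts(root: ET.Element) -> dict[str, str]:
    """Map osisID -> plain text for both container and milestoned verses."""
    verses: dict[str, str] = {}
    current = None  # osisID of the currently open milestoned verse
    for kind, item in walk(root):
        if kind == "start" and item.tag == VERSE:
            if "sID" in item.attrib:    # start milestone
                current = item.get("osisID")
                verses.setdefault(current, "")
            elif "eID" in item.attrib:  # end milestone
                current = None
            else:                       # container verse
                verses[item.get("osisID")] = "".join(item.itertext())
        elif kind == "text" and current is not None:
            verses[current] += item
    return verses

container = f'''<chapter xmlns="{OSIS_NS}" osisID="Gen.1">
<verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
</chapter>'''

milestoned = (
    f'<div xmlns="{OSIS_NS}" type="section"><p>'
    '<verse sID="Gen.1.1" osisID="Gen.1.1"/>'
    "In the beginning God created the heaven and the earth."
    '<verse eID="Gen.1.1"/></p></div>'
)

# Both encodings normalize to the same mapping.
print(verse_texts(ET.fromstring(container)))
print(verse_texts(ET.fromstring(milestoned)))
```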
I guess I would like to see OSIS stop being an isolated specification that
doesn't play nicely with other XML vocabularies.
What do you think?
Weston
On Mon, Jan 18, 2010 at 12:01 PM, Troy A. Griffitts <scribe at crosswire.org> wrote:
> OK Weston, I'll have a go at helping open this discussion.
>
> What _practical_ benefit do you see of importing elements directly from the
> TEI specification? Really?
>
> Can you give an example of an XHTML element which should be used directly
> over an OSIS element? (other than <hi>. <hi> doesn't count. <hi> was added
> to sway publishers to use OSIS and at least keeps display oriented markup
> isolated to a single tag).
>
>
> I cannot see any practical advantage to your suggestions. Theoretically,
> maybe. But here are my arguments against so you can try to convince me
> otherwise.
>
> OSIS, though in some ways unwieldy, gives me a finite set of tags to handle
> when writing software to parse OSIS. If OSIS were to import TEI tags
> directly, I am sure there are plenty of global attributes and other aspects
> of the TEI specification regarding those attributes that I would have to
> handle as a software engineer, even though they will likely never be used or
> worth allowing for use against the time it would take me to implement code
> to handle all aspects of those TEI tags and their children/attributes. And
> per my first question above, do you have a particular use case where it
> would be advantageous that the tags were actually imported from the TEI
> specification?
>
> As to the XHTML suggestion, you know OSIS is geared at marking the
> semantics of the text and not how the text should be displayed. XHTML has
> the direct opposite design goal (not as much with CSS and XHTML, but still
> elements are geared around marking out display elements). My suspicion is
> that using XHTML elements directly would bring in way too much display
> oriented logic.
>
>
> Just being divisive to stir up conversation :)
>
>
> Troy
>
>
>
>
>
> Weston Ruter wrote:
>
>> Posting to osis-users. Full thread on Open Scriptures:
>> http://groups.google.com/group/open-scriptures/browse_thread/thread/d7de2e8a9a8fec1b/a1b20121f5c9286a
>>
>> Something else that I'd like to discuss is incorporating stand-off markup
>> in OSIS.
>>
>> ---------- Forwarded message ----------
>> From: *Weston Ruter* <westonruter at gmail.com>
>> Date: Mon, Dec 21, 2009 at 9:00 AM
>> Subject: Re: Standardizing on a Web Infrastructure and Web Service API for
>> Scripture
>> To: Steve DeRose <sderose at acm.org>
>> Cc: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>,
>> Patrick Durusau <Patrick at durusau.net>, Kirk Lowery <klowery at whi.wts.edu>,
>> "Troy A. Griffitts" <scribe at crosswire.org>, sblexec <sblexec at sbl-site.org>,
>> open-scriptures <open-scriptures at googlegroups.com>, "David Austin (bghp)"
>> <daustin at bible.org>, SABDA <sabda4god at gmail.com>, James Tauber
>> <jtauber at jtauber.com>, Ulrik Sandborg-Petersen <ulrikp at emdros.org>,
>> David Eyk <deyk at crossway.org>
>>
>>
>> Dear Steve:
>> Thank you very much for your thoughtful reply, and for illuminating the
>> TEI roots for OSIS. First, one point of clarification:
>>
>> > XHTML5 is a fine thing, obviously far better than HTML itself. But
>> > it gives you no rules about the things that OSIS specifies. It gives
>> > you almost no semantics for the things it defines (other than
>> > layout). And it lacks tons of specific things: poetic markup,
>> > epistolary units of all kinds, Biblical and other formal references
>> > schemes. TEI and OSIS provide all this kind of stuff.
>> >
>> > If you go with "XHTML5", you will inevitably find yourself
>> > re-inventing OSIS-like conventions: What names/abbrevs will you use
>> > for books, translations, and the like? How will you punctuate
>> > References? What syntax will you use for range references? How will
>> > you represent the various kinds of notes, and where will you place
>> > them? What will you do when verses and paragraphs overlap? How will
>> > we distinguish the canonical texts from notes, headings, and so on?
>>
>>
>> I wasn't advocating abandoning the unique semantic elements that OSIS
>> defines in favor of some XHTML microformat conventions. Rather, I was
>> suggesting that where OSIS and XHTML5 have elements with equivalent
>> semantics (a, abbr, figure, header, table, date, div, hi, list, p, q, etc.),
>> the elements from the XHTML namespace be used. So I was in no way
>> thinking of throwing out <verse osisID="Luke.2.1" /> in favor of some <span
>> class="osisVerseID:Luke.2.1" /> or something. OSIS conventions for work
>> names/abbrevs, poetic markup, epistolary units, range references, notes, and
>> all of the other elements unique to OSIS would remain unchanged. With regard
>> to verse and paragraph overlap, the milestoned <verse> element would be
>> obligatory (and Book-Section-Paragraph structure mandatory, as opposed to
>> Book-Chapter-Verse).
>>
>> I understand that most OSIS elements inherit their semantics from TEI, but
>> their names are copied from TEI rather than being wholly imported within the
>> TEI XML namespace: so for machines encountering an OSIS document and a TEI
>> document containing (mostly) the same elements, they wouldn't be interpreted
>> as being the same due to the differing namespaces (as you know, of course).
>> Thank you for explaining the extensive community and support surrounding
>> TEI; using TEI elements verbatim in OSIS makes complete sense. However, even
>> with the popularity of TEI, elements in the XHTML namespace are far more
>> widely recognized, if not by scholars then by everyone else: if an OSIS
>> document used relevant elements from the XHTML namespace, the resulting
>> markup could be displayed directly in browsers without any transform (or
>> stylesheets). Likewise, the learning curve for new OSIS authors would be
>> gentler, because OSIS would explicitly reuse the elements that web
>> developers are already intimately acquainted with.
>>
>> Either way, whether or not XHTML can be employed for semantically
>> equivalent elements, I am interested in the next version of OSIS explicitly
>> importing its inherited elements from the namespace of the inherited XML
>> vocabulary. For that matter, I am interested in the next version of OSIS
>> period. I love OSIS, the work you've put into it, and I am excited to see
>> its use spread; to that end, I want to see it maintained and improved to be
>> as effective as possible. Is there active work on a next edition of
>> OSIS?
>>
>> Again, thank you so much, Steve, for taking the time to give your
>> inestimable insight into OSIS and TEI.
>>
>> I look forward to your next reply.
>>
>>
>> Blessings and Merry Christmas!
>> Weston Ruter
>> openscriptures.org
>>
>>
>>
>>
>> 2009/12/21 Steven DeRose <sderose at speakeasy.net>
>>
>>
>> Much more I could say on this, and no doubt others will jump in; but
>> let me answer one key question that affects all the others:
>>
>> OSIS *does* use a pre-existing XML vocabulary: OSIS is almost
>> entirely a pure subset of TEI. The extensions are tiny, and very
>> specific to Biblical materials (for example, a very specific
>> encoding for Biblical references).
>>
>> TEI has many millions of dollars, over 20 years, and many thousands of
>> expert hours of labor in it. It is almost universally used for
>> serious encoding of literary, linguistic, and historical
>> texts. This you can easily verify via Google or at your local
>> university. If someone wants a grant to encode some important work,
>> say from the National Endowment for the Humanities, the Mellon
>> Foundation, or other large-scale funders, using anything *but* TEI
>> is so unusual that they need to specifically make a case for it in
>> their proposals (certainly in a very specialized case that can be
>> done; but TEI has proven so valuable and so effective that it better
>> be a very specialized case before one gives up the huge advantages
>> of TEI). There are countless projects using TEI throughout, thus
>> lots of tools and expertise available.
>>
>> Also, a lot of the TEI data is data that has important connections
>> to the data OSIS people care about -- the collected works of
>> important theologians, historians, and philosophers; the Greek and
>> Latin classics; English and other literature that explores Biblical
>> themes (Dostoevsky and Milton, to name two of the most obvious
>> examples). Few if any serious projects relating to any of this use
>> HTML or XHTML for their data. Of course most everybody delivers HTML
>> to browsers; but it's trivial to convert TEI to HTML or XHTML, and
>> extremely non-trivial to go the other way.
>>
>> XHTML5 is a fine thing, obviously far better than HTML itself. But
>> it gives you no rules about the things that OSIS specifies. It gives
>> you almost no semantics for the things it defines (other than
>> layout). And it lacks tons of specific things: poetic markup,
>> epistolary units of all kinds, Biblical and other formal references
>> schemes. TEI and OSIS provide all this kind of stuff.
>>
>> If you go with "XHTML5", you will inevitably find yourself
>> re-inventing OSIS-like conventions: What names/abbrevs will you use
>> for books, translations, and the like? How will you punctuate
>> References? What syntax will you use for range references? How will
>> you represent the various kinds of notes, and where will you place
>> them? What will you do when verses and paragraphs overlap? How will
>> we distinguish the canonical texts from notes, headings, and so on?
>>
>> Countless such questions arise, and if you go with XHTML5 (or
>> XHTML349.2, for that matter), you will have to make up your own
>> answer to each. At that point, it shouldn't surprise you that every
>> other project comes up with a slightly different set of answers. And
>> that means that every time you pass data from project A to B, the
>> developers of either A or B (or both) have to write converters.
>> Sounds like a waste of time (= poor stewardship) to me. At the very
>> start of a project many of these questions may seem trivial or
>> irrelevant; but as your project grows they'll all arise and you'll
>> either make a decision; or you can decide not to decide -- which is
>> itself a decision against consistency, portability, and verifiability.
>>
>> It seems to me inaccurate to say that there is some massive range of
>> tools for XHTML but not for XML. There are lots of HTML tools, but
>> if you look at their output you'll find that they almost all produce
>> HTML so messy (often invalid, seldom XHTML, and sometimes not even
>> well-formed), that you'll either end up with data that can't be used
>> in much of anything *except* browsers, or you'll end up writing all
>> that conversion/cleanup code again. If I were a wagering man, I'd
>> wager a lot of money that you've already had to do some of that. If
>> you've got the development skills to modify open-source XHTML tools
>> (which were you thinking of?) to support your own extensions, then
>> you could modify them to do OSIS with little more work (and if you
>> use XML tools, you get most of that support for free with XML
>> Schema, Schematron, etc.).
>>
>> Is there any XHTML5 tool out there that can't deal with arbitrary
>> XML? Not many; because it's a silly move on the developers' part to
>> make one; that's because the incremental work is trivial -- if you
>> already support styling tag X a certain way when X is a member of
>> the fixed list of XHTML tags, you already know how to support
>> styling tag X when X is *not* a member of that fixed list. There is
>> also a vast range of general XML tools out there, and in general
>> they provide far more functionality than HTML or XHTML tools (simply
>> because they have to, in order not to be laughed out of the XML marketplace).
>>
>> Steve DeRose
>>
>>
>>
>>
>>
>> On Sat, 2009-12-19 at 00:22 -0800, Weston Ruter wrote:
>>
>>> Thank you so much, Stephen. Your historical information is
>>> extremely helpful.
>>>
>>> Is anyone able to address the current state of OSIS and future
>>> plans for the standard? Namely, how is it currently addressing
>>> Stephen's points:
>>>
>>> 1. OSIS not being designed for delivery of partial documents,
>>> 2. Its large metadata overhead,
>>> 3. Ability to include “virtual” elements, as is required for
>>>    partial documents.
>>> Furthermore:
>>>
>>> > For the ESV Study Bible in 2008, we again considered using
>>> > OSIS as the primary XML format for the notes and quickly
>>> > decided to go with XHTML5 instead. There are so many more
>>> > tools for dealing with HTML designed to solve real-world
>>> > problems; it was more efficient to use HTML even though it
>>> > didn't map perfectly to our domain.
>>>
>>>
>>> This identifies a concern I have about OSIS and how it relates to
>>> other XML vocabularies, namely XHTML5. OSIS defines many elements
>>> (a, abbr, figure, header, table, date, div, hi, list, p, q, etc.)
>>> which are already assigned rich semantics and presentational logic
>>> in the XHTML namespace: why not reuse existing XML vocabularies
>>> instead of independently (re)defining them? If OSIS depended on
>>> XHTML:
>>>
>>> 1. It would allow OSIS to be embedded directly into (X)HTML web pages
>>>    and be properly understood by the browser: Bible websites could
>>>    extend their existing HTML websites with OSIS markup to make them
>>>    more semantically rich, readable both by machines and by web browsers.
>>> 2. Existing WYSIWYG HTML editors could be more easily extended to support
>>>    the additional OSIS-specific markup.
>>> 3. Having OSIS rely on XHTML would also greatly reduce the size of the
>>>    OSIS specification, and new authors would require much less time to
>>>    get up to speed because the spec would only define the elements unique
>>>    to scriptural markup.
>>> So I wonder if an OSIS 3.0 could then explicitly reference the
>>> relevant elements from other XML vocabularies, especially XHTML5?
>>> Thoughts?
>>>
>>> Is there anyone currently active at the Bible Technologies Group?
>>>
>>> Blessings,
>>> Weston
>>>
>>>
>>> 2009/12/16 Stephen Smith <stephen.smith at gmail.com>
>>>
>>>
>>> There are several reasons why Crossway's XML differs from OSIS:
>>>
>>> 1. As David Eyk notes, we created the existing XML documents in
>>> May-June 2002, when OSIS was still in flux. In particular, the
>>> milestoning process was much more complicated.
>>> 2. We were working from initial XML files provided by a vendor and
>>> didn't want to change them too much.
>>> 3. OSIS is paragraph-based, rather than verse-based, making it
>>> difficult to meet our immediate need--loading the data into a
>>> relational database.
>>> 4. At the time, OSIS had some mandatory structural elements that we
>>> weren't able to create.
>>> 5. I was hoping that someone else would take the XML from the web
>>> service and write an XSLT to transform it into OSIS so we didn't have
>>> to.
>>> 6. OSIS wasn't designed for delivery of partial documents: it wasn't
>>> immediately clear to me how to structure the metadata in a response
>>> when someone is only looking at, say, John 3:16. Further, the metadata
>>> overhead in such a request, as compared to the desired content, was
>>> prohibitive. Partial documents also require the use of "virtual"
>>> elements--you need to add beginning and ending paragraph tags if
>>> you're looking at a verse that appears in the middle of a paragraph,
>>> for example, and open/close quotes properly. I don't believe that OSIS
>>> has a handy facility for including these kinds of elements.
>>>
>>> As for mapping the Crossway XML onto OSIS, it should be
>>> straightforward. Everything we did with the ESV we did with the goal
>>> of producing a world-class OSIS ESV by 2012; I tried to do one big
>>> project per year to create metadata required by OSIS. Between 2002 and
>>> 2007, we created metadata and evolved the schema to map cleanly to
>>> OSIS--upgrading the quotation system, classifying footnotes, adding
>>> catchwords, categorizing names, identifying speakers of quotes. All
>>> this metadata uses OSIS vocabulary where possible. (Most of this
>>> metadata isn't available through the API.) Even after this work, it
>>> will still take many more hours to produce a document that fully
>>> conforms to OSIS at the "Scholarly" level defined in the spec.
>>>
>>> The goal has always been to move away from the Crossway XML to a
>>> compliant OSIS document. I just never felt we could produce documents
>>> that conformed to the Scholarly OSIS Document / Trusted Quality
>>> requirements. I saw no point in releasing anything at a lower
>>> conformance level unless, as I mentioned, someone wanted to create an
>>> interim XSLT. Further, as nearly all consumption of the ESV API was
>>> through the HTML format, there wasn't a lot of demand for the XML.
>>>
>>> For the ESV Study Bible in 2008, we again considered using OSIS as the
>>> primary XML format for the notes and quickly decided to go with XHTML5
>>> instead. There are so many more tools for dealing with HTML designed
>>> to solve real-world problems; it was more efficient to use HTML even
>>> though it didn't map perfectly to our domain.
>>>
>>> I hope that answers your historical questions.
>>>
>>> Stephen
>>>
>>>
>>> On Dec 16, 4:02 am, Weston Ruter <westonru... at gmail.com> wrote:
>>> > Greetings Crossway, CrossWire, the Bible Technologies Group, SBL, and
>>> > esteemed members of the Bible+Tech community:
>>> >
>>> > I am researching data formats used to represent scripture (including
>>> > XML vocabularies, DB schemas, and *ad hoc* text file formats) with the
>>> > hope of contributing towards the development of a standard API that is
>>> > able to commonly represent all of the constructs used by each. With such
>>> > a standard API, the hope is that (web) developers would be able to
>>> > access scriptural data from the array of Bible societies (e.g.
>>> > Bible.org) using one standardized web service interface (i.e., so that
>>> > mashups of multiple translations from different sources would become
>>> > easy to implement, for example: <http://pixelfaith.com/bible/#Luke/2>).
>>> >
>>> > I have been studying the Crossway XML format and I am curious as to
>>> > why Crossway didn't use OSIS. Were there any limitations in OSIS that
>>> > caused you to develop your own XML vocabulary? Furthermore, why does
>>> > development of OSIS seem to have ceased, with the last revision being
>>> > over three years ago (6 March 2006)? Moving forward, has any
>>> > discussion happened regarding merging Crossway XML into an OSIS 3.0?
>>> >
>>> > More to the crux of my inquiry, has Crossway considered any
>>> > collaboration to standardize an API such as you provide to access the
>>> > ESV? Or is anyone aware of any such effort currently being worked on?
>>> > I am aware through Troy Griffitts of the web service API the CrossWire
>>> > Bible Society has developed in coordination with the development of
>>> > OSIS, and I am in no way wanting to supplant their excellent work. But
>>> > I am interested in looking at what a Web-centric API would look like
>>> > built from the ground up using the latest Internet standards with an
>>> > eye for Ajax applications, web mashups, and (most importantly)
>>> > semantically Linked Data. (I would hope any efforts in this area
>>> > simply flow back into CrossWire's efforts for the next version of
>>> > their API, which could perhaps then be more widely adopted.)
>>> >
>>> > What OSIS seeks to do for markup, I would like to see done with an
>>> > API to give developers a standard way of accessing the data in the
>>> > texts on the Web. In other words and in short, I am interested in the
>>> > development of a standardized web service API and Document Object
>>> > Model (DOM) for OSIS.
>>> >
>>> > I am presenting this topic at the BibleTech:2010 Conference.
>>> >
>>> > Obviously, any such standardization effort would have to be a joint
>>> > effort by all of us. Looking forward to hearing from you!
>>> >
>>> > Blessings and Merry Christmas!
>>> > Weston Ruter
>>> > OpenScriptures.org
>>>
>>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> osis-users mailing list
>> osis-users at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/osis-users
>>
>
>
>