[osis-users] OSIS with TEI and XHTML5 (was Fwd: Standardizing on a Web Infrastructure and Web Service API for Scripture)
Weston Ruter
westonruter at gmail.com
Thu Jan 14 17:02:36 MST 2010
Posting to osis-users. Full thread on Open Scriptures:
http://groups.google.com/group/open-scriptures/browse_thread/thread/d7de2e8a9a8fec1b/a1b20121f5c9286a
Something else that I'd like to discuss is incorporating stand-off markup in
OSIS.
---------- Forwarded message ----------
From: Weston Ruter <westonruter at gmail.com>
Date: Mon, Dec 21, 2009 at 9:00 AM
Subject: Re: Standardizing on a Web Infrastructure and Web Service API for Scripture
To: Steve DeRose <sderose at acm.org>
Cc: SWORD Developers' Collaboration Forum <sword-devel at crosswire.org>,
Patrick Durusau <Patrick at durusau.net>, Kirk Lowery <klowery at whi.wts.edu>,
"Troy A. Griffitts" <scribe at crosswire.org>, sblexec <sblexec at sbl-site.org>,
open-scriptures <open-scriptures at googlegroups.com>,
"David Austin (bghp)" <daustin at bible.org>, SABDA <sabda4god at gmail.com>,
James Tauber <jtauber at jtauber.com>, Ulrik Sandborg-Petersen <ulrikp at emdros.org>,
David Eyk <deyk at crossway.org>
Dear Steve:
Thank you very much for your thoughtful reply, and for illuminating the TEI
roots for OSIS. First, one point of clarification:
> XHTML5 is a fine thing, obviously far better than HTML itself. But it gives
> you no rules about the things that OSIS specifies. It gives you almost no
> semantics for the things it defines (other than layout). And it lacks tons
> of specific things: poetic markup, epistolary units of all kinds, Biblical
> and other formal reference schemes. TEI and OSIS provide all this kind of
> stuff.
>
> If you go with "XHTML5", you will inevitably find yourself re-inventing
> OSIS-like conventions: What names/abbrevs will you use for books,
> translations, and the like? How will you punctuate References? What syntax
> will you use for range references? How will you represent the various kinds
> of notes, and where will you place them? What will you do when verses and
> paragraphs overlap? How will we distinguish the canonical texts from notes,
> headings, and so on?
>
I wasn't advocating abandoning the unique semantic elements that OSIS
defines in favor of some XHTML microformat conventions. Rather, I was
suggesting that where OSIS and XHTML5 have elements with equivalent
semantics (a, abbr, figure, header, table, date, div, hi, list, p, q, etc.),
the elements from the XHTML namespace be used. So I was in no way
thinking of throwing out <verse osisID="Luke.2.1" /> in favor of some <span
class="osisVerseID:Luke.2.1" /> or something. OSIS conventions for work
names/abbrevs, poetic markup, epistolary units, range references, notes, and
all of the other elements unique to OSIS would remain unchanged. With regard
to verse and paragraph overlap, the milestoned <verse> element would be
obligatory (and Book-Section-Paragraph structure mandatory, as opposed to
Book-Chapter-Verse).
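To make concrete what keeping the OSIS reference conventions means for software, here is a minimal Python sketch that parses simple osisID-style references such as Luke.2.1, and range references such as Luke.2.1-Luke.2.7, into structured values. It is an illustration under stated assumptions, not the OSIS grammar: only Book.Chapter[.Verse] is handled (no work prefixes or other features of the full reference syntax), and the names Ref, parse_ref, and parse_osis_ref are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified OSIS-style reference: Book.Chapter or Book.Chapter.Verse,
# e.g. "Luke.2.1" or "1Cor.13". Work prefixes and other parts of the
# full OSIS reference syntax are deliberately not handled.
_REF = re.compile(r"(?P<book>[1-4]?[A-Za-z]+)\.(?P<chapter>\d+)(?:\.(?P<verse>\d+))?")


@dataclass(frozen=True)
class Ref:
    book: str
    chapter: int
    verse: Optional[int] = None  # None means "the whole chapter"


def parse_ref(text: str) -> Ref:
    m = _REF.fullmatch(text)
    if m is None:
        raise ValueError(f"not a simple OSIS reference: {text!r}")
    verse = m.group("verse")
    return Ref(m.group("book"), int(m.group("chapter")), int(verse) if verse else None)


def parse_osis_ref(text: str) -> tuple[Ref, Ref]:
    """Split "A" or a range "A-B" into (start, end); a single reference is its own range."""
    start, sep, end = text.partition("-")
    first = parse_ref(start)
    return first, (parse_ref(end) if sep else first)


print(parse_osis_ref("Luke.2.1-Luke.2.7"))
# (Ref(book='Luke', chapter=2, verse=1), Ref(book='Luke', chapter=2, verse=7))
```

Representing a single reference as a one-item range keeps downstream code uniform: every consumer deals with a (start, end) pair.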
I understand that most OSIS elements inherit their semantics from TEI, but
their names are copied from TEI rather than being wholly imported within the
TEI XML namespace: so for machines encountering an OSIS document and a TEI
document containing (mostly) the same elements, they wouldn't be interpreted
as being the same due to the differing namespaces (as you know, of course).
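The namespace point is easy to see with any namespace-aware parser. In the Python sketch below, which uses only the standard xml.etree.ElementTree module, three elements all named p, one each in the OSIS, TEI, and XHTML namespaces, come back with three different expanded names, so software treats them as unrelated elements. The URIs are the ones commonly used for OSIS 2.x, TEI P5, and XHTML; the wrapper root element exists only for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs commonly used for OSIS 2.x, TEI P5, and XHTML.
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"
TEI_NS = "http://www.tei-c.org/ns/1.0"
XHTML_NS = "http://www.w3.org/1999/xhtml"

doc = f"""<root xmlns:osis="{OSIS_NS}" xmlns:tei="{TEI_NS}" xmlns:h="{XHTML_NS}">
  <osis:p>OSIS paragraph</osis:p>
  <tei:p>TEI paragraph</tei:p>
  <h:p>XHTML paragraph</h:p>
</root>"""

root = ET.fromstring(doc)

# ElementTree reports each tag as an expanded name: "{namespace-uri}local-name".
tags = [child.tag for child in root]
local_names = {tag.split("}", 1)[1] for tag in tags}

for tag in tags:
    print(tag)
# Same local name "p" three times, but three distinct expanded names.
print(local_names, len(set(tags)))
```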
Thank you for explaining the extensive community and support surrounding
TEI—using TEI elements verbatim in OSIS makes complete sense. However, even
with the popularity of TEI, it goes without saying that elements in the
XHTML namespace are infinitely better recognized, if not by scholars then by
everyone else: if an OSIS document used relevant elements from the XHTML
namespace, the resulting markup could be displayed directly in browsers
without any transform (or stylesheets) necessary. Likewise, there would be a
decreased learning curve for new OSIS authors because it would explicitly
reuse the elements that web developers are already intimately acquainted
with.
Either way, whether or not XHTML can be employed for semantically equivalent
elements, I am interested in the next version of OSIS explicitly importing
its inherited elements from the namespace of the inherited XML vocabulary.
For that matter, I am interested in the next version of OSIS, period. I love
OSIS, the work you've put into it, and I am excited to see its use spread;
to that end, I want to see it maintained and improved to be as effective as
possible. Is there active work on a next edition of OSIS?
Again, thank you so much, Steve, for taking the time to give your
inestimable insight into OSIS and TEI.
I look forward to your next reply.
Blessings and Merry Christmas!
Weston Ruter
openscriptures.org
2009/12/21 Steven DeRose <sderose at speakeasy.net>
Much more I could say on this, and no doubt others will jump in; but let me
> answer one key question that affects all the others:
>
> OSIS *does* use a pre-existing XML vocabulary: OSIS is almost entirely a
> pure subset of TEI. The extensions are tiny, and very specific to Biblical
> materials (for example, a very specific encoding for Biblical references).
>
> TEI has many millions of dollars, over 20 years, and many thousands of expert
> hours of labor in it. It is almost universally used for serious encoding of
> literary, linguistic, and historical texts. This you can easily
> verify via Google or at your local university. If someone wants a grant to
> encode some important work, say from the National Endowment for the
> Humanities, the Mellon Foundation, or other large-scale funders, using
> anything *but* TEI is so unusual that they need to specifically make a case
> for it in their proposals (certainly in a very specialized case that can be
> done; but TEI has proven so valuable and so effective that it better be a
> very specialized case before one gives up the huge advantages of TEI). There
> are countless projects using TEI throughout, thus lots of tools and
> expertise available.
>
> Also, a lot of the TEI data is data that has important connections to the
> data OSIS people care about -- the collected works of important theologians,
> historians, and philosophers; the Greek and Latin classics; English and
> other literature that explores Biblical themes (Dostoevsky and Milton, to
> name two of the most obvious examples). Few, if any, serious projects relating
> to any of this use HTML or XHTML for their data. Of course most everybody
> delivers HTML to browsers; but it's trivial to convert TEI to HTML or XHTML,
> and extremely non-trivial to go the other way.
>
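To illustrate the asymmetry described above, the following is a minimal Python sketch of the easy direction, TEI to XHTML, using only xml.etree.ElementTree. The mapping table is a hypothetical choice covering a handful of TEI elements, not a standard crosswalk; real conversions are often written in XSLT against the full TEI vocabulary. Note how distinct TEI elements (lg, l) collapse onto the same generic XHTML div, with only a class attribute left as a trace and TEI attributes dropped entirely: that lost information is exactly what a reverse conversion would have to reconstruct.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from a few TEI local names to XHTML element names.
# Unlisted TEI elements fall back to <span>. Output elements are left
# without a namespace for brevity.
TAG_MAP = {"p": "p", "head": "h2", "lg": "div", "l": "div", "quote": "blockquote"}


def tei_to_xhtml(el: ET.Element) -> ET.Element:
    """Recursively rename TEI elements to XHTML; TEI attributes are not copied."""
    local = el.tag.removeprefix(TEI)  # str.removeprefix needs Python 3.9+
    if local == "hi" and el.get("rend") == "italic":
        out = ET.Element("em")
    else:
        target = TAG_MAP.get(local, "span")
        out = ET.Element(target)
        if target != local:
            out.set("class", local)  # only trace of the original TEI element name
    out.text = el.text
    for child in el:
        converted = tei_to_xhtml(child)
        converted.tail = child.tail
        out.append(converted)
    return out


tei = ET.fromstring(
    '<lg xmlns="http://www.tei-c.org/ns/1.0">'
    '<l>Of man\'s first <hi rend="italic">disobedience</hi></l></lg>'
)
print(ET.tostring(tei_to_xhtml(tei), encoding="unicode"))
# <div class="lg"><div class="l">Of man's first <em>disobedience</em></div></div>
```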
> XHTML5 is a fine thing, obviously far better than HTML itself. But it gives
> you no rules about the things that OSIS specifies. It gives you almost no
> semantics for the things it defines (other than layout). And it lacks tons
> of specific things: poetic markup, epistolary units of all kinds, Biblical
> and other formal reference schemes. TEI and OSIS provide all this kind of
> stuff.
>
> If you go with "XHTML5", you will inevitably find yourself re-inventing
> OSIS-like conventions: What names/abbrevs will you use for books,
> translations, and the like? How will you punctuate References? What syntax
> will you use for range references? How will you represent the various kinds
> of notes, and where will you place them? What will you do when verses and
> paragraphs overlap? How will we distinguish the canonical texts from notes,
> headings, and so on?
>
> Countless such questions arise, and if you go with XHTML5 (or XHTML349.2,
> for that matter), you will have to make up your own answer to each. At that
> point, it shouldn't surprise you that every other project comes up with a
> slightly different set of answers. And that means that every time you pass
> data from project A to B, the developers of either A or B (or both) have to
> write converters. Sounds like a waste of time (= poor stewardship) to me. At
> the very start of a project many of these questions may seem trivial or
> irrelevant; but as your project grows they'll all arise and you'll either
> make a decision; or you can decide not to decide -- which is itself a
> decision against consistency, portability, and verifiability.
>
> It seems to me inaccurate to say that there is some massive range of tools
> for XHTML but not for XML. There are lots of HTML tools, but if you look at
> their output you'll find that they almost all produce HTML so messy (often
> invalid, seldom XHTML, and sometimes not even well-formed), that you'll
> either end up with data that can't be used in much of anything *except*
> browsers, or you'll end up writing all that conversion/cleanup code again.
> If I were a wagering man, I'd wager a lot of money that you've already had
> to do some of that. If you've got the development skills to modify
> open-source XHTML tools (which were you thinking of?) to support your own
> extensions, then you could modify them to do OSIS with little more work (and
> if you use XML tools, you get most of that support for free with XML Schema,
> Schematron, etc.).
>
> Is there any XHTML5 tool out there that can't deal with arbitrary XML? Not
> many; because it's a silly move on the developers' part to make one; that's
> because the incremental work is trivial -- if you already support styling
> tag X a certain way when X is a member of the fixed list of XHTML tags, you
> already know how to support styling tag X when X is *not* a member of that
> fixed list. There is also a vast range of general XML tools out there, and
> in general they provide far more functionality than HTML or XHTML tools
> (simply because you have to to not be laughed out of the XML marketplace).
>
> Steve DeRose
>
> On Sat, 2009-12-19 at 00:22 -0800, Weston Ruter wrote:
>
> Thank you so much, Stephen. Your historical information is extremely
> helpful.
>
> Is anyone able to address the current state of OSIS and future plans for
> the standard? Namely, how is it currently addressing Stephen's points:
>
> 1. OSIS not being designed for delivery of partial documents,
> 2. Its large metadata overhead,
> 3. Ability to include “virtual” elements, as is required for partial
> documents.
>
> Furthermore:
>
> > For the ESV Study Bible in 2008, we again considered using OSIS as the
> > primary XML format for the notes and quickly decided to go with XHTML5
> > instead. There are so many more tools for dealing with HTML designed to
> > solve real-world problems; it was more efficient to use HTML even though it
> > didn't map perfectly to our domain.
>
>
> This identifies a concern I have about OSIS and how it relates to other XML
> vocabularies, namely XHTML5. OSIS defines many elements (a, abbr, figure,
> header, table, date, div, hi, list, p, q, etc.) which are already assigned
> rich semantics and presentational logic in the XHTML namespace: why not
> reuse existing XML vocabularies instead of independently (re)defining them?
> If OSIS depended on XHTML:
>
> 1. It would make OSIS able to be directly embedded into (X)HTML web
> pages and be properly understood by the browser: Bible websites could extend
> their existing HTML websites with OSIS markup to make them more semantically
> rich, readable both to machines and web browsers.
> 2. Existing WYSIWYG HTML editors could be more easily extended to
> support the additional OSIS-specific markup.
> 3. Having OSIS rely on XHTML would also greatly reduce the size of the
> OSIS specification, and new authors would require much less time to get up
> to speed because the spec would only define the elements unique to
> scriptural markup.
>
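Point 1 can be sketched concretely. The Python example below uses xml.etree.ElementTree to build a small XHTML page in which an OSIS verse milestone (the sID/eID pair OSIS uses for milestoned verses) sits inside an ordinary XHTML p element, each element in its own namespace. It assumes the OSIS 2.x namespace URI; whether a particular browser renders or ignores the foreign elements is not something this sketch tests. Note that ET.register_namespace changes a module-wide setting.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
OSIS_NS = "http://www.bibletechnologies.net/2003/OSIS/namespace"  # assumed OSIS 2.x URI

# Serialize XHTML as the default namespace and OSIS with an "osis:" prefix.
# Note: register_namespace changes a module-wide setting.
ET.register_namespace("", XHTML_NS)
ET.register_namespace("osis", OSIS_NS)


def xhtml(tag: str) -> str:
    return f"{{{XHTML_NS}}}{tag}"


def osis(tag: str) -> str:
    return f"{{{OSIS_NS}}}{tag}"


html = ET.Element(xhtml("html"))
body = ET.SubElement(html, xhtml("body"))
para = ET.SubElement(body, xhtml("p"))

# Milestoned OSIS verse: an sID start marker and a matching eID end marker
# inside an ordinary XHTML paragraph.
start = ET.SubElement(para, osis("verse"), {"sID": "Luke.2.1", "osisID": "Luke.2.1"})
start.tail = "Text of Luke 2:1."
ET.SubElement(para, osis("verse"), {"eID": "Luke.2.1"})

markup = ET.tostring(html, encoding="unicode")
print(markup)

# A namespace-aware consumer can still find the OSIS verses after a round trip.
reparsed = ET.fromstring(markup)
verse_ids = [v.get("osisID") for v in reparsed.iter(osis("verse")) if v.get("sID")]
print(verse_ids)
```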
> So I wonder if an OSIS 3.0 could then explicitly reference the relevant
> elements from other XML vocabularies, especially XHTML5? Thoughts?
>
> Is there anyone currently active at the Bible Technologies Group?
>
> Blessings,
> Weston
>
>
> 2009/12/16 Stephen Smith <stephen.smith at gmail.com>
>
> There are several reasons why Crossway's XML differs from OSIS:
>
> 1. As David Eyk notes, we created the existing XML documents in
> May-June 2002, when OSIS was still in flux. In particular, the milestoning
> process was much more complicated.
> 2. We were working from initial XML files provided by a vendor and
> didn't want to change them too much.
> 3. OSIS is paragraph-based, rather than verse-based, making it
> difficult to meet our immediate need--loading the data into a
> relational database.
> 4. At the time, OSIS had some mandatory structural elements that we
> weren't able to create.
> 5. I was hoping that someone else would take the XML from the web
> service and write an XSLT to transform it into OSIS so we didn't have
> to.
> 6. OSIS wasn't designed for delivery of partial documents: it wasn't
> immediately clear to me how to structure the metadata in a response
> when someone is only looking at, say, John 3:16. Further, the metadata
> overhead in such a request, as compared to the desired content, was
> prohibitive. Partial documents also require the use of "virtual"
> elements--you need to add beginning and ending paragraph tags if
> you're looking at a verse that appears in the middle of a paragraph,
> for example, and open/close quotes properly. I don't believe that OSIS
> has a handy facility for including these kinds of elements.
>
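Item 6, together with the paragraph-versus-verse tension in item 3, can be made concrete with a small sketch. The Python code below walks a paragraph-based document whose verses are marked with sID/eID milestones, collects the pieces of one verse from every paragraph it touches (rows of this shape could also be loaded into a verse-based relational table), and then re-wraps each piece in a synthesized p element. Elements are left un-namespaced for brevity, verse milestones are assumed not to nest, and marking synthesized paragraphs with type="x-virtual" is a hypothetical convention of this sketch, not something OSIS defines. Quotation marks that open or close outside the excerpt are not handled.

```python
import xml.etree.ElementTree as ET

# Hypothetical paragraph-based sample: verse milestones (sID/eID) sit inside <p>,
# and Luke.2.2 starts in one paragraph and ends in the next.
DOC = (
    "<div>"
    '<p><verse sID="v1" osisID="Luke.2.1"/>Verse one. <verse eID="v1"/>'
    '<verse sID="v2" osisID="Luke.2.2"/>Verse two begins</p>'
    '<p>and ends here. <verse eID="v2"/>'
    '<verse sID="v3" osisID="Luke.2.3"/>Verse three.<verse eID="v3"/></p>'
    '<p><verse sID="v4" osisID="Luke.2.4"/>A whole paragraph.<verse eID="v4"/></p>'
    "</div>"
)


def verse_fragments(root: ET.Element, osis_id: str) -> list[tuple[str, bool]]:
    """Return (text, paragraph_has_other_text) for each <p> the verse touches.

    Assumes each <p> holds only text and verse milestones, and that verses
    never nest, so at most one verse is open at any point.
    """
    fragments = []
    current = None  # osisID of the open verse; carries across paragraph boundaries
    for p in root.iter("p"):
        inside, other_text = [], False
        # p.text precedes the first milestone; each milestone's tail follows it.
        segments = [(None, p.text)] + [(m, m.tail) for m in p]
        for milestone, text in segments:
            if milestone is not None and milestone.tag == "verse":
                # An sID milestone opens a verse; an eID milestone closes it.
                current = milestone.get("osisID") if milestone.get("sID") else None
            if text and text.strip():
                if current == osis_id:
                    inside.append(text)
                else:
                    other_text = True
        if inside:
            fragments.append(("".join(inside).strip(), other_text))
    return fragments


def render_partial(root: ET.Element, osis_id: str) -> str:
    """Wrap each fragment of one verse in a synthesized <p>."""
    container = ET.Element("div")
    for text, paragraph_continues in verse_fragments(root, osis_id):
        para = ET.SubElement(container, "p")
        if paragraph_continues:
            # Hypothetical marker: the real paragraph extends beyond this excerpt.
            para.set("type", "x-virtual")
        para.text = text
    return ET.tostring(container, encoding="unicode")


root = ET.fromstring(DOC)
print(render_partial(root, "Luke.2.2"))
# <div><p type="x-virtual">Verse two begins</p><p type="x-virtual">and ends here.</p></div>
print(render_partial(root, "Luke.2.4"))
# <div><p>A whole paragraph.</p></div>
```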
> As for mapping the Crossway XML onto OSIS, it should be
> straightforward. Everything we did with the ESV we did with the goal
> of producing a world-class OSIS ESV by 2012; I tried to do one big
> project per year to create metadata required by OSIS. Between 2002 and
> 2007, we created metadata and evolved the schema to map cleanly to
> OSIS--upgrading the quotation system, classifying footnotes, adding
> catchwords, categorizing names, identifying speakers of quotes. All
> this metadata uses OSIS vocabulary where possible. (Most of this
> metadata isn't available through the API.) Even after this work, it
> will still take many more hours to produce a document that fully
> conforms to OSIS at the "Scholarly" level defined in the spec.
>
> The goal has always been to move away from the Crossway XML to a
> compliant OSIS document. I just never felt we could produce documents
> that conformed to the Scholarly OSIS Document / Trusted Quality
> requirements. I saw no point in releasing anything at a lower
> conformance level unless, as I mentioned, someone wanted to create an
> interim XSLT. Further, as nearly all consumption of the ESV API was
> through the HTML format, there wasn't a lot of demand for the XML.
>
> For the ESV Study Bible in 2008, we again considered using OSIS as the
> primary XML format for the notes and quickly decided to go with XHTML5
> instead. There are so many more tools for dealing with HTML designed
> to solve real-world problems; it was more efficient to use HTML even
> though it didn't map perfectly to our domain.
>
> I hope that answers your historical questions.
>
> Stephen
>
>
>
> On Dec 16, 4:02 am, Weston Ruter <westonru... at gmail.com> wrote:
> > Greetings Crossway, CrossWire, the Bible Technologies Group, SBL, and
> > esteemed members of the Bible+Tech community:
> >
> > I am researching data formats used to represent scripture—including XML
> > vocabularies, DB schemas, and *ad hoc* text file formats—with the hope of
> > contributing towards the development of a standard API that is able to
> > commonly represent all of the constructs used by each. With such a standard
> > API, the hope is that (web) developers would be able to access scriptural
> > data from the array of Bible societies (e.g. Bible.org) using one
> > standardized web service interface (i.e. that mashups of multiple
> > translations from different sources would become easy to implement, for
> > example: <http://pixelfaith.com/bible/#Luke/2>).
> >
> > I have been studying the Crossway XML format and I am curious as to why
> > Crossway didn't use OSIS. Were there any limitations in OSIS that caused you
> > to develop your own XML vocabulary? Furthermore, why does development of OSIS
> > seem to have ceased, with the last revision being over three years ago
> > (6 March 2006)? Moving forward, has any discussion happened regarding merging
> > Crossway XML into an OSIS 3.0?
> >
> > More to the crux of my inquiry, has Crossway considered any collaboration to
> > standardize an API such as you provide to access the ESV? Or is anyone aware
> > of any such effort currently being worked on? I am aware through Troy
> > Griffitts of the web service API the CrossWire Bible Society has developed
> > in coordination with the development of OSIS, and I am in no way wanting to
> > supplant their excellent work. But I am interested in looking at what a
> > Web-centric API would look like built from the ground up using the latest
> > Internet standards with an eye for Ajax applications, web mashups, and (most
> > importantly) semantically Linked Data. (I would hope any efforts in this
> > area simply flow back into CrossWire's efforts for the next version of their
> > API, which could perhaps then be more widely adopted.)
> >
> > What OSIS seeks to do for markup, I would like to see done with an API to
> > give developers a standard way of accessing the data in the texts on the
> > Web. In other words and in short, I am interested in the development of a
> > standardized web service API and Document Object Model (DOM) for OSIS.
> >
> > I am presenting this topic at the BibleTech:2010 Conference.
> >
> > Obviously, any such standardization effort would have to be a joint effort
> > by all of us. Looking forward to hearing from you!
> >
> > Blessings and Merry Christmas!
> > Weston Ruter
> > OpenScriptures.org