[osis-core] references and self-ids Part 1 - Assumptions/statements
Steve DeRose
osis-core@bibletechnologieswg.org
Wed, 10 Jul 2002 13:03:57 -0400
At 09:57 PM -0600 07/08/02, Todd Tillinghast wrote:
>I have read through Patrick's most recent posting as well as several
>of the other postings.
>
>Just to make sure we are all on the same page I put together the
>following statements which I believe are TRUE and represent our intent
>with respect to references. If you don't believe any of these
>statements to be true please post a response.
>(I am using the term reference in the broadest sense to mean either self
>identification or to mean an external reference that is either work
>specific or work abstract.)
>1) The simple part of the reference (ex Gen.1.1) has no meaning outside
>of the context of at least a reference system.
If you really mean "reference system" here, as in, we can't know that
this is not the essay on Gennesee Gin, part 1, poem 1, then sure. If
you mean versification scheme, then I'd say it's meaningful, but not
fully specified -- most of the time not knowing the v. scheme won't
matter.
>2) Whether validated or not a reference system defines the set of valid
>references.
OK
>2) There may be zero or more specific works that are compliant with and
>use a given reference system.
Yes.
>3) In some cases a reference system and a specific work are equivalent,
>either because a specific work defines the reference system or because
>there is only one instance of the work.
If by equivalent you mean, predictable from each other, or in a
one-to-one relationship, then yes.
>4) It is possible to create a reference to a specific work using a
>reference system that it does not support. (The
>translation/transformation would be left to software to resolve.)
Yes (actually, probably left to a mapping table to specify, and then
for software to resolve).
>5) Although a reference system is required, the specific work is
>optional.
Right.
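
The mapping table mentioned under point 4 could look roughly like the
following. This is only a sketch in Python: the system names ("sysA",
"sysB"), the identifiers, the table entry, and the function name
resolve() are all hypothetical placeholders rather than real
versification data, and the choice to pass unlisted identifiers
through unchanged is an assumption.

```python
# Hypothetical mapping tables, keyed by (source system, target system).
# Each table lists only the identifiers that differ between the systems.
# The names and the single entry are placeholders, not real data.
MAPPING_TABLES = {
    ("sysA", "sysB"): {"Bk.2.1": "Bk.1.30"},
}


def resolve(identifier, source, target):
    """Translate an identifier from one reference system to another.

    Assumption: identifiers not listed in a table are the same in both
    systems, so they are returned unchanged.
    """
    if source == target:
        return identifier
    table = MAPPING_TABLES.get((source, target), {})
    return table.get(identifier, identifier)
```

The table specifies the correspondences; the software only looks them
up, which keeps the data separate from the code that resolves it.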
>
>
>It seems that we desire to do the following:
>A) Self-identify text (mainly verses but also ranges of text, chapters,
>and books).
>B) Have the self-identifying identifiers be tied to a reference system
>OR a reference system and a work without having to explicitly state the
>reference system and work with each identifier.
>C) Be able to self-identify text from more than one reference system
>and/or work within a single document.
>D) Create references (not self-identification).
>E) Create a reference to a range of text.
>F) Describe a reference at greater granularity than the reference system
>defines. (grain)
I'd agree with all those. I'd also pin down a few more details,
such as:
E1) A range may start and end at any reference-system-specified unit,
or any grain within one.
E2) A range must be confined to a single work.
E3) A range (or for that matter any reference) may become meaningless
when mapped from one edition of a work to another, for various
reasons such as the referenced text not being included in some
editions, or a range's ends being re-ordered.
F1) Grains are not expected to map across editions (unless they are
very close, such as successive minor edits of the same translation,
like NIV-US-1999 vs. NIV-US-2001); thus in general any reference with
a grain should also specify a particular work, and any reference
mapped to another version should ignore a grain specification (or
perhaps offer it with a warning or something like that).
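
One possible reading of F1, sketched in Python: when a reference is
re-targeted at another version, the grain is discarded and a warning
is issued. The function name map_to_version(), the tuple layout, and
the grain string used in the usage note are hypothetical; translating
the identifier itself between versions is deliberately left out.

```python
import warnings


def map_to_version(identifier, grain, target_version):
    """Re-target a reference at another version, discarding any grain.

    Returns a (version, identifier, grain) tuple whose grain is always
    None, since grains are not expected to survive the mapping.
    """
    if grain is not None:
        # Surface the loss rather than dropping the grain silently.
        warnings.warn(
            f"grain {grain!r} dropped when mapping {identifier} "
            f"to {target_version}"
        )
    return (target_version, identifier, None)
```

For example, map_to_version("Gen.1.1", "char:5", "NIV-US-2001") would
return the verse-level reference and warn that the (made-up) grain
"char:5" was ignored.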
I guess I'd also add:
A work is properly an abstract notion, roughly corresponding to a
unique author/title pair, and may exist in many concrete editions,
varying in language, writing system for that language, translator,
edition, transcription, and so on.
Reference systems are assumed to be hierarchical, such that it is
meaningful and reasonable to fall back by deleting tokens from the
right (for example, dropping back from verses to chapters, or lines
to pages, etc).
(I can imagine non-hierarchical systems, but allowing for them would
either prevent such fallback, or require us to always flag which
systems are and aren't. I'd rather make the assumption and loosen it
someday later if we have to).
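
The fallback described above (deleting tokens from the right) can be
sketched in a few lines of Python. The function names and the idea of
checking candidates against a set of known identifiers are
assumptions made for illustration.

```python
def fallbacks(identifier):
    """Yield the identifier, then each ancestor made by dropping the last token."""
    tokens = identifier.split(".")
    while tokens:
        yield ".".join(tokens)
        tokens.pop()


def resolve_with_fallback(identifier, known):
    """Return the most specific form of identifier found in known, or None."""
    for candidate in fallbacks(identifier):
        if candidate in known:
            return candidate
    return None
```

So a reference to a verse that a given edition lacks would fall back
to its chapter, and then to its book, before giving up.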
Also, I think we'll need consistent terminology for the various bits
and pieces of all this. How about:
Reference: Data that specifies a contiguous location in a work or
some version of a work. Logically, this includes work, starting
identifier and grain, and ending identifier and grain.
Work: A title or version.
Title: An abstract work of literature, which subsumes all editions,
translations, transcriptions, and other forms of the work.
Version: A particular concrete instantiation of a title, in a
particular language, a particular translation, transcription,
edition, and so on.
Identifier: A canonical reference to a location in a work, specified
as a series of dot-separated tokens that name successive hierarchical
divisions, such as book/chapter/verse; volume/page/line;
act/scene/speech; and so on.
Grain: A machine-interpretable string that specifies a location
smaller than any identifier can specify for a given work. A grain
location is specified relative to the nearest identifiable location,
in terms of counting Unicode code points, searching for a string
match, or other generic means (generic in the sense that the
interpretation of a grain specification does *not* depend on the
particular reference system in use).
Unit reference: A reference to a work as a whole, a single identified
unit in a work, or a particular grain within such a unit.
Range reference: A reference that is not a unit reference, and so
must be specified by its starting and ending identifier and grain.
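
To make the terminology above concrete, here is one way the pieces
might fit together as a Python data model. The class and field names
are hypothetical, the grain is kept as an opaque string because no
grain syntax has been defined here, and treating a missing end as "a
unit reference" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Location:
    # Dot-separated tokens such as "Gen.1.1"; None means the work as a whole.
    identifier: Optional[str] = None
    # Opaque grain string, interpreted relative to the identifier.
    grain: Optional[str] = None


@dataclass(frozen=True)
class Reference:
    work: Optional[str]              # a title or version; optional
    start: Location
    end: Optional[Location] = None   # None: the reference covers only start

    def is_unit(self):
        """True for a unit reference, False for a range reference."""
        return self.end is None or self.end == self.start
```

A range reference is then simply any Reference for which is_unit()
is false, matching the definition by exclusion given above.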
>Todd
--
Steve DeRose -- http://www.stg.brown.edu/~sjd
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@speakeasy.net
Backup email: sderose@mac.com, sjd@stg.brown.edu