[osis-core] Repost of Notes from Dallas meetings 1/2002
Steven J. DeRose
osis-core@bibletechnologieswg.org
Wed, 27 Aug 2003 18:17:02 -0400
(this has an early bit about ref sys mapping files...)
------------------------------
Virtually entirely completely unorganized notes from BTG meetings,
Dallas, Jan 2002.
Types of identifier components
Matthew
5
5a
A
iv
greek/hebrew letters?
unnumbered items (psalm superscriptions, etc.).
derive all from XML Schema datatypes
A component shall be an XML Schema datatype.
Enumerated list of names
Integer (range)
Letter (range)
No case distinctions
Sequence of the above
1-5, 5a, 5b, 6-50, A-F
<component-definition>
<component-name>
<short>bbn</short>
<descr>Bible Book Names</descr>
</component-name>
</component-definition>
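A value list like "1-5, 5a, 5b, 6-50, A-F" above could be expanded and checked roughly as follows. This is only a sketch: the names expand_spec and is_valid are made up for illustration, the range syntax is assumed from that one example line, and values are case-folded because the notes say "No case distinctions".

```python
import re

def expand_spec(spec):
    """Expand a spec such as '1-5, 5a, 5b, 6-50, A-F' into a set of allowed values.

    Assumed syntax (not taken from any OSIS spec): comma-separated items, each
    a single value, an integer range 'm-n', or a single-letter range 'X-Y'.
    """
    allowed = set()
    for item in (part.strip() for part in spec.split(",")):
        if not item:
            continue
        int_range = re.fullmatch(r"(\d+)-(\d+)", item)
        letter_range = re.fullmatch(r"([A-Za-z])-([A-Za-z])", item)
        if int_range:
            lo, hi = int(int_range.group(1)), int(int_range.group(2))
            allowed.update(str(n) for n in range(lo, hi + 1))
        elif letter_range:
            lo, hi = (ord(c.lower()) for c in letter_range.groups())
            allowed.update(chr(c) for c in range(lo, hi + 1))
        else:
            # A single enumerated value such as '5a'; fold case.
            allowed.add(item.lower())
    return allowed

def is_valid(value, allowed):
    """Case-insensitive membership test against an expanded spec."""
    return value.lower() in allowed

allowed = expand_spec("1-5, 5a, 5b, 6-50, A-F")
```

With this, is_valid("5a", allowed) and is_valid("F", allowed) hold, while "51" and "G" are rejected.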
Steps to defining the Biblical stuff:
Define components:
OSIS-works (Bible, Josephus, Plato-Phaedrus,...)
Bible-book-names
(layered Heb -> Cath -> Orth -> Prot?)
Chapter-numbers
(1, 1a, A,...)
Verse-numbers
(1, 1a, A,...)
Bible versification schemes
(prot, cath, heb, niv, nasb....)
-------
Ref system definition (change scheme to system) consists of:
Name/ID of edition (possibly abstract)
(heb, prot, cath, orth, niv, nasb...)
Display names (by language)
Description
declarations of predefined component types used
(later decided components will all be OSIS-defined)
derivations of new component types used
(not)
declaration of the aggregation (canonical identifier) form:
List of component types
(global: separated by dots)
name, and whether optional
where to get default for missing components
Inherit down tree via attribute osis:component-name='value' ?>
<?OSIS work=bible edition=NIV book=Genesis ?>
in header
latest preceding PI?
declaration of scheme this one is based on
Provide OSIS-space attributes for each component, that inherit
(down the tree)
(through milestone-pairs)
...
(attr per component is grody; single attr is ugly for inheritance)
maybe have just two attrs:
osis:work Bible.NIV
osis:ref Gen.1.1
osis:work level is globally defined by us, and typically inherited
like xml:lang
osis:ref is the in-document locator, defined by the work's
reference-system-dcl. This has to be given starting at the top; no
defaulting.
This has the advantage that you can define element types per work, that
default the osis:work attribute so that you can be terse:
<josephus-ref> vs. <bible-ref>
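The two-attribute idea (osis:work inherited like xml:lang, osis:ref given in full) might be resolved along these lines. A sketch only: the namespace URI, the sample document, and the function name resolve_refs are hypothetical, and milestone-pair inheritance is not handled.

```python
import xml.etree.ElementTree as ET

OSIS_NS = "http://example.org/osis"  # hypothetical namespace URI, illustration only
WORK = f"{{{OSIS_NS}}}work"
REF = f"{{{OSIS_NS}}}ref"

# Made-up sample: element names and values are placeholders.
SAMPLE = f"""
<doc xmlns:osis="{OSIS_NS}" osis:work="Bible.NIV">
  <p osis:ref="Gen.1.1">In the beginning...</p>
  <note osis:work="Josephus.Antiquities">
    <q osis:ref="1.2.3">...</q>
  </note>
  <p>No reference here.</p>
</doc>
""".strip()

def resolve_refs(element, inherited_work=None):
    """Yield (work, ref) pairs; osis:work comes from the nearest ancestor-or-self."""
    work = element.get(WORK, inherited_work)
    ref = element.get(REF)
    if ref is not None:
        # osis:ref is never defaulted; it is taken exactly as written.
        yield (work, ref)
    for child in element:
        yield from resolve_refs(child, work)

pairs = list(resolve_refs(ET.fromstring(SAMPLE)))
```

Here the first reference picks up Bible.NIV from the root, and the one inside the note picks up the overriding work.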
Don't try to validate number ranges via refsysdcl; somebody else's problem.
RSD can declare list of OSIS-defined component types, plus min level
to be specified.
Mapping:
<corr from=ref to=ref/>
Anything not listed is assumed to match up.
Do we treat anything as being ordered? Like for mapping ps.1.h-1 to ps.1.1
Allow
<corr from="Ps.1.1-Ps.1.20" to="Ps.1.2-Ps.1.21">
This does stupid lexical iteration over the range.
This applies if they renumbered the same stuff.
If they re-order text and numbers together, that's just rendering.
If they re-assign text to different numbers, then *that's* a change.
Can't do this across chapter boundaries, since wouldn't know last verse num
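The "stupid lexical iteration" over a range pair could look like the sketch below. Assumptions: the last component is a plain integer, a hyphen separates the two ends of a range, and both ends share the same book and chapter. The function names are invented for illustration.

```python
def split_ref(ref):
    """Split 'Ps.1.20' into ('Ps.1', 20): a prefix plus an integer last component."""
    prefix, _, last = ref.rpartition(".")
    return prefix, int(last)

def expand_range(range_ref):
    """Expand 'Ps.1.1-Ps.1.20' into ['Ps.1.1', ..., 'Ps.1.20']."""
    start, end = range_ref.split("-")
    start_prefix, first = split_ref(start)
    end_prefix, last = split_ref(end)
    if start_prefix != end_prefix:
        # Crossing a chapter boundary would need the last verse number,
        # which the mapping file does not know.
        raise ValueError(f"range crosses a boundary: {range_ref}")
    return [f"{start_prefix}.{n}" for n in range(first, last + 1)]

def expand_corr(from_range, to_range):
    """Pair up the two expanded ranges item by item."""
    sources = expand_range(from_range)
    targets = expand_range(to_range)
    if len(sources) != len(targets):
        raise ValueError("ranges differ in length")
    return dict(zip(sources, targets))

table = expand_corr("Ps.1.1-Ps.1.20", "Ps.1.2-Ps.1.21")
```

Lettered verses such as 5a would need more than this integer-only iteration.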
Special case for fine-grain stuff like word/char?
(Prob if not special-cased in spec: lower level(s) require software
counting; may need to distinguish in the ref system dcl file.)
<ident-dcl min-level='book'>
<component name='book' type='OSIS:bbn'/>
<component name='chapter' type='int'/>
<component name='verse' type='int_letter'/>
<component name='word' type='int' intrinsic='wordtoken'/> ???
</ident-dcl>
<map from='NIV'> <!-- to is myself -->
<corr from='Ps' to='Ps'/> <!-- implies corr of all below -->
<corr from='Ps.1.H' to='Ps.1.1'/>
<corr from='Ps.1.1-Ps.1.30' to='Ps.1.2-Ps.1.31'/> <!-- insert/shift -->
<corr from='Mk.16.8-Mk.16.30' to='NIL'/> <!-- delete -->
<corr from='Gen.1.1-Gen.1.3' to='Gen.1.1-3'/> <!-- merge -->
<corr from='NIL' to='Gen.1.4a'/> <!-- insert -->
This doesn't say *where* inserted. Do we have to care?
Anything not stated is assumed to correspond
</map>
(probably shouldn't use hyphen for range delim, conflicts with
page-range, merged verse idents like in TEV, etc.)
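Looking up a single reference against such a map, with the "anything not stated is assumed to correspond" rule, might be sketched like this. The table is a hypothetical hand-expanded excerpt, and map_ref is an invented name; merges and inserts are not modelled.

```python
NIL = "NIL"

# Hypothetical table, hand-expanded from entries like those in the <map> above.
CORR = {
    "Ps.1.H": "Ps.1.1",
    "Ps.1.1": "Ps.1.2",
    "Ps.1.2": "Ps.1.3",
    "Mk.16.9": NIL,
}

def map_ref(ref, corr=CORR):
    """Return the target ref, or None if the source has no counterpart.

    A ref that is not listed is assumed to match up, so it is returned unchanged.
    """
    target = corr.get(ref, ref)
    return None if target == NIL else target
```

So map_ref("Ps.1.H") gives "Ps.1.1", an unlisted ref such as "Gen.1.1" comes back unchanged, and a deleted verse gives None.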
For other languages, they can define their own (say) book names, and
map to us, but we don't register a new set of booknames for them. For
blind interchange everybody uses the normative names.
What about numbers? Tibetan digits (possibly not even base 10?)
Could do this as a localization hack:
name to name
number to number
digit to digit
<lang-map lang1='EN' lang2='FR'>
<element l1='parafo' l2='p'/>
<attribute of-element='*' l1='typo' l2='type'/>
<attr-token of-element='*' of-attribute='*' l1='kjhhhk' l2='Genesis'/>
<attr-digit l1='i' l2='1'/>
</lang-map>
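The digit-to-digit part of the localization hack can be sketched as a plain character translation. This assumes the local numerals are positional base 10, which the notes leave open; the function name normalize_digits is invented here. Tibetan digits occupy U+0F20 through U+0F29.

```python
# Tibetan digits U+0F20..U+0F29 mapped to ASCII 0-9, one code point per digit.
TIBETAN_TO_ASCII = {0x0F20 + i: str(i) for i in range(10)}

def normalize_digits(text, table=TIBETAN_TO_ASCII):
    """Replace each local digit with its ASCII counterpart; leave the rest alone."""
    return text.translate(table)

# Chapter 1, verse 10 written with Tibetan digits.
normalized = normalize_digits("Gen.\u0f21.\u0f21\u0f20")
```

A numeral system that is not base 10 would need real number conversion rather than a per-digit table.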
Identifier spaces:
We provide list of works, E.g. journal title
They define types for identifiers (year, issue, date, page-range)
We can then validate loosely, but not strictly for numeric ranges.
How do we deal with distinctions:
Author/work: Josephus Antiquities (and edition?)
Bible/Edition/Book
------------------------------------------------------------------------------------------
2002-01-26 --
Can validate the whole punctuated strings, so can punctuate anyway.
How to punctuate?
a) require dots everywhere -- familiar/easy
b) require spaces everywhere -- easier to validate, conforms to NMTOKENS
c) let rsd declare punctuation between components -- flexible but
seems excessive
d) anything goes, non-name, non-dot, non-hyphen are delimiters -- way flexible
For async elements (chapter and verse and any future),
provide container elements and recommend but not require using it
when possible
provide start/end pair as well.
do we need DIVs for anything besides linegroups (and OT/NT)? Probably not.
include TEIform everywhere we can
(note types: see other mail)
add word-level annotation element with attributes:
lemma
strong's number
part-of-speech
morphology
Include discourse markup?
what about discontiguous lemmas (LOOK the word UP)?
Contractions? "Functioning as"
gloss lang...
<word x-schemename:POS='N-NM-S'>
problem: we're sort of constraining all namespaces;
but we have the right to disallow extended attrs on our elements
Let them use namespaces, or put schemename inside value?
later: norm/reg, sic/corr, abbr/expan, translit
should we provide a way to indicate canonical status:
canonical, apocryphal/deuterocanonical, OT vs. NT pseudepigrapha
should we represent somehow, what portion of the Bible is included in
the document?
must it include at least one book?
Should we somehow indicate it's just a NT + Psalms?
*** Other projects we should do:
* A collection (and public call for) objects in real Bibles that we
haven't covered. (cf)
* A collection of markup anomalies in Bible text (cf)
Accessibility information???
must identify the reference system(s) this text supports.
refsDecl in header to identify the one (for now) used here.
Within notes (cf TEV Gen 1.1) like "The phrase 'in the beginning'
refers....." Should this be XSEM's <refText> (which is documented as
where the text should be generated, but their TEV actually encloses
the text), or just be a type of q?
Types (source?):
NOTEREF -- quoting the very text this note is about
DOCUMENT -- quoting elsewhere in this same document
BIBLE -- Bible Text in some other version
(OT quoted in NT may want special formatting)
READING -- potential bible text alternative
OTHER -- non-Biblical stuff
Should this have a REF attribute on it, or should we embed a REF?
Put on same element to avoid scope ambiguity, and to express that
this is a special kind of quote, not a coincidence.
should we do salute?
what's like it:
salutation, poem, closing, letter, hymn refrain in hymn,
5 layout conventions:
Italics for words not in original
Divine name issues
L small caps ORD in all caps if translated from yhwh
NSRV distinctions of how written:
When story was written in historical present, translated into past
tense, they mark all verbs with a star in front.
OT quotes in NT typically italicized, set as indented blocks
OT quote + inserted word, different italics. (NKJV Matt 13:15 word "Their")
Should we treat these via TEI <supplied> (not quite the same) and
other elements, or as types or reasons of <em>.