[osis-core] div type="date"
Steven J. DeRose
osis-core@bibletechnologieswg.org
Fri, 6 Jun 2003 12:08:29 -0400
Finally catching up on all these threads.
It suddenly got easier when I was bitching to Patrick about Eudora
not supporting threads well -- like, it should let you option-click
on a subject and gather together all the stuff from that thread in
date order. To illustrate to myself that it didn't work, I did it as
I spoke -- lo and behold, it *does* work.
Anyways, as for date, it seems to me there are a number of cases we
want to cover:
* Books with daily readings, like My Utmost for His Highest, etc.
* Books with weekly readings, like a lectionary. These may be
organized by date, or by numbered Sundays of the church year, or "4th
Sunday after Pentecost", etc.
* Books with multiple readings per day, for set but abstract times
(thanks Todd)
Vigils, Matins, Lauds, Terce, Sext, None, Vespers, & Compline
Lauds/Vespers
Sunrise/Sunset
AM/PM
The Muslim prayer times: Salat al-Fajr, Salat al-Zuhr, Salat al-`Asr,
Salat al-Maghrib, Salat al-`Isha, and the optional
Salat al-Lail and Salat al-Dzuha
(see
http://aaiil.org/text/books/mali/muslimprayerbook/fivedailyprayers.shtml)
It appears ok to leave off the common "Salat al-". And
then there's an "'Id".
* Dramatic time within plays -- the next day, the following Tuesday,
the ides of March (every month has an ides, but it's not always the
15th).
* Time-lines of events, such as Bishop Ussher's (in)famous chronology.
If you want, I can post you my unbroken lineage all the way back to
Adam....
* Books organized by entirely different calendar systems: Hebrew,
Islamic, Chinese, etc. I'm tempted to punt on these due to my
virtually complete ignorance of the subject.
* Times counted in hours from sunrise or sunset. So far I'm punting this.
Whether to include Muslim prayer times could be considered
controversial; but even apart from trying to be inviting, I think
it's useful because we'll likely find Christian prayer books
organized on the Muslim schedule -- consider Christians living in
Islamic countries -- I'll bet *someone* has written a prayer book or
devotional organized that way; or at least a travelogue or cultural
study.
I think we'll find prayer books and other materials to encode that
use all of these cases; and so far as I know no one else has
addressed these issues, though TEI considered them long ago and more
or less punted, except that the <time> element has a type attribute:
(am | pm | 24hour | descriptive).
Note that we have a problem not contemplated in Internet specs:
unspecified *high* order portions. For example, referring to any
"March 1" regardless of year.
RFC 3339 specifically punts on BCE dates, and on date ranges or
intervals (why does everybody always want to punt on ranges????), and
on non-Gregorian dates. RFC 3339 also does not appear to state
whether a partial time refers to the range of all possible low-order
values that were omitted, or to the first time point in that range.
So, I propose:
1) That we prepare a best-practice guide for time-organized data.
2) That it use time formats basically following IETF RFC 3339
(excerpted below).
3) That we require the separator character "T", in upper case, when
times are given.
4) That we reiterate the RFC requirement for 4-digit years.
5) That we extend the syntax of RFC3339 to permit omitting the year,
or the second, or the second and minute, or the time in its entirety.
Thus, these would become legal:
09-11T10:42:11
2003-06-04T11:23
2003-06-04T11
We're ok for processing because if the initial component is 4 digits
it's a year, and if it's not, then it's a month (or maybe we'll
detect an error down the line). An alternative would be to use the
year 0000 for this, but it feels misleading to me.
6) That we define a partial date/time specification to mean any time
within the range it covers. For example, if you leave off the
seconds, any time during the specified minute is fair game. In
exactly the same way, if you leave off the year, the date/time refers
to the given moment or range in *any* year.
7) That we extend the syntax of RFC 3339 to permit an initial hyphen
to indicate years BCE. Those using atoi() to get the year will be
fine (except for calculating time intervals, since there was no year
0).
8) That we extend the syntax of RFC 3339 to permit named/abstract
times following the T. The production below would be changed to:
partial-time = time-hour [":" time-minute [":" time-second
               [time-secfrac] ] ]
               / "~(" name ")"
The twiddle flags that we're dealing with abstract time; the name
must be an XML name, and must be one that we define or must start
with "x-". I propose we define the following initial list (per XML,
all names are case-sensitive):
Vigils, Matins, Lauds, Terce, Sext, None, Vespers, Compline
Sunrise/Sunset
Morning/Afternoon/Evening/Night
AM/PM
Fajr, Zuhr, `Asr, Maghrib, `Isha, Lail, Dzuha, 'Id
(only, the backquotes would need to change; maybe use "." or "_"?)
9) We define that abstract times may not be sorted merely
mechanically, and not all of them can be sorted at all (for example,
how do you compare Vespers with Evening?). However, we should define
the correct relative order within each line shown above (which I
think is what I have listed -- any corrections, anybody?)
10) We define that a date/time may be preceded by "~", indicating
that it is approximate.
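To make items 3 through 10 concrete, here is a rough Python sketch of a parser for the extended syntax. Everything in it (SPEC_RE, DateSpec, parse_spec, covers, years_between) is a made-up name for illustration, not anything from RFC 3339 or an existing library. It accepts either "-" (as in the RFC grammar) or ":" between date fields, simplifies "XML name" to letters, digits, ".", "-" and "_", does not enforce the "x-" rule, drops fractional seconds, and leaves out time-zone offsets entirely.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern for the proposed extensions; not part of RFC 3339.
SPEC_RE = re.compile(r"""
    (?P<approx>~)?                          # item 10: leading twiddle = approximate
    (?:(?P<bce>-)?(?P<year>\d{4})[-:])?     # items 5 and 7: optional year, optional BCE hyphen
    (?P<month>\d{2})[-:](?P<day>\d{2})      # month and day are always present here
    (?:T(?:                                 # item 3: upper-case T only
        (?P<hour>\d{2})
        (?::(?P<minute>\d{2})
        (?::(?P<second>\d{2})(?:\.\d+)?)?)? # item 5: minute and second may be dropped
      |
        ~\((?P<name>[A-Za-z_][\w.-]*)\)     # item 8: abstract time, simplified XML name
    ))?
""", re.VERBOSE)


class DateSpec(NamedTuple):
    approximate: bool
    year: Optional[int]    # negative for BCE, None when omitted
    month: int
    day: int
    hour: Optional[int]
    minute: Optional[int]
    second: Optional[int]
    name: Optional[str]    # abstract time such as "Vespers"


def parse_spec(text: str) -> DateSpec:
    """Parse one date/time in the proposed syntax, or raise ValueError."""
    m = SPEC_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid date/time spec: {text!r}")

    def num(group: str) -> Optional[int]:
        value = m.group(group)
        return None if value is None else int(value)

    year = num("year")
    if year is not None and m.group("bce"):
        year = -year
    return DateSpec(m.group("approx") is not None, year, num("month"),
                    num("day"), num("hour"), num("minute"), num("second"),
                    m.group("name"))


def covers(spec: DateSpec, year, month, day, hour, minute, second) -> bool:
    """Item 6: an omitted field matches any value of that field."""
    wanted = (spec.year, spec.month, spec.day, spec.hour, spec.minute, spec.second)
    actual = (year, month, day, hour, minute, second)
    return all(w is None or w == a for w, a in zip(wanted, actual))


def years_between(start: int, end: int) -> int:
    """Item 7: signed years as atoi() would give them; there is no year 0."""
    span = end - start
    if start < 0 < end:
        span -= 1
    elif end < 0 < start:
        span += 1
    return span
```

With this, parse_spec("03-01") has year None, so covers() accepts March 1 of any year at any time of day, and years_between(-1, 1) gives 1 rather than 2. A spec with an abstract name has no hour, so covers() treats it as matching the whole day; a real implementation would want something smarter there.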
Does that cover things reasonably well?
It does punt on approximate dates, like "in the 1930's", and times
relative to sunrise and sunset, like "in the 3rd watch of the night"
or "it is only the third hour"; or "in the 3rd year of the reign of
Artaxerxes", and on ranges in general. For ranges, I think I'd prefer
a <date-range> element, with from/to attributes. Version 2, maybe.
We could deal with these by allowing low-order wildcards in years,
and names relative to which the date and/or time field would be
calculated; but you can't do much to process them. If we allowed
named eras, such as kings' reigns, at least one could compare within
them; but it doesn't seem all that important in the short term; maybe
leave for version 5.2 or so?
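The "comparable only within a group" rule, which item 9 proposes for abstract times and which would apply just as well to named eras, might look something like this in Python. The group labels and the names GROUPS, RANK and compare are hypothetical, and the ordering simply assumes the lists in item 8 are already in the right order (which is exactly what still needs checking).

```python
from typing import Optional

# Ordered groups copied from the proposed name list; the group labels are invented here.
GROUPS = {
    "canonical-hours": ["Vigils", "Matins", "Lauds", "Terce",
                        "Sext", "None", "Vespers", "Compline"],
    "sun": ["Sunrise", "Sunset"],
    "day-parts": ["Morning", "Afternoon", "Evening", "Night"],
    "half-days": ["AM", "PM"],
}

# Map each name to (group label, position within the group).
RANK = {name: (group, i)
        for group, names in GROUPS.items()
        for i, name in enumerate(names)}


def compare(a: str, b: str) -> Optional[int]:
    """Return -1, 0 or 1 when both names share a group, else None."""
    group_a, rank_a = RANK[a]
    group_b, rank_b = RANK[b]
    if group_a != group_b:
        return None  # e.g. Vespers vs. Evening: not comparable
    return (rank_a > rank_b) - (rank_a < rank_b)
```

So compare("Lauds", "Vespers") is -1, while compare("Vespers", "Evening") is None, which a sort routine would have to treat as "don't know" rather than as equal.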
Does that get us reasonably far for fairly little work?
--------------------------------------
References:
This one is worth looking over -- they cover an awful lot of issues:
http://www.faqs.org/rfcs/rfc3339.html Date and Time on the Internet:
Timestamps
Here's the basic grammar from there:
date-fullyear = 4DIGIT
date-month = 2DIGIT ; 01-12
date-mday = 2DIGIT ; 01-28, 01-29, 01-30, 01-31 based on month/year
time-hour = 2DIGIT ; 00-23
time-minute = 2DIGIT ; 00-59
time-second = 2DIGIT ; 00-58, 00-59, 00-60 based on leap second rules
time-secfrac = "." 1*DIGIT
time-numoffset = ("+" / "-") time-hour ":" time-minute
time-offset = "Z" / time-numoffset
partial-time = time-hour ":" time-minute ":" time-second [time-secfrac]
full-date = date-fullyear "-" date-month "-" date-mday
full-time = partial-time time-offset
date-time = full-date "T" full-time
NOTE: Per [ABNF] and ISO8601, the "T" and "Z" characters in this
syntax may alternatively be lower case "t" or "z" respectively.
This date/time format may be used in some environments or contexts
that distinguish between the upper- and lower-case letters 'A'-'Z'
and 'a'-'z' (e.g. XML). Specifications that use this format in
such environments MAY further limit the date/time syntax so that
the letters 'T' and 'Z' used in the date/time syntax must always
be upper case. Applications that generate this format SHOULD use
upper case letters.
And here are their examples (note the last one!):
5.8. Examples
Here are some examples of Internet date/time format.
1985-04-12T23:20:50.52Z
This represents 20 minutes and 50.52 seconds after the 23rd hour of
April 12th, 1985 in UTC.
1996-12-19T16:39:57-08:00
This represents 39 minutes and 57 seconds after the 16th hour of
December 19th, 1996 with an offset of -08:00 from UTC (Pacific
Standard Time). Note that this is equivalent to 1996-12-20T00:39:57Z
in UTC.
1990-12-31T23:59:60Z
This represents the leap second inserted at the end of 1990.
1990-12-31T15:59:60-08:00
This represents the same leap second in Pacific Standard Time, 8
hours behind UTC.
1937-01-01T12:00:27.87+00:20
This represents the same instant of time as noon, January 1, 1937,
Netherlands time. Standard time in the Netherlands was exactly 19
minutes and 32.13 seconds ahead of UTC by law from 1909-05-01 through
1937-06-30. This time zone cannot be represented exactly using the
HH:MM format, and this timestamp uses the closest representable UTC
offset.
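Two of those equivalences are easy to check with Python's standard datetime module. This is only a quick sanity check, assuming Python 3.7 or later (which allows UTC offsets that are not whole minutes); the variable names are arbitrary.

```python
from datetime import datetime, timedelta, timezone

# 1996-12-19T16:39:57-08:00 should equal 1996-12-20T00:39:57Z.
pst = timezone(timedelta(hours=-8))
local = datetime(1996, 12, 19, 16, 39, 57, tzinfo=pst)
utc = datetime(1996, 12, 20, 0, 39, 57, tzinfo=timezone.utc)
same_instant = (local == utc)

# Noon in the Netherlands at the legal offset of +00:19:32.13 should equal
# 1937-01-01T12:00:27.87+00:20, the rounded form given in the RFC.
legal_offset = timezone(timedelta(minutes=19, seconds=32, microseconds=130000))
noon_local = datetime(1937, 1, 1, 12, 0, 0, tzinfo=legal_offset)
rounded = datetime(1937, 1, 1, 12, 0, 27, 870000,
                   tzinfo=timezone(timedelta(minutes=20)))
netherlands_match = (noon_local == rounded)

print(same_instant, netherlands_match)
```

Both comparisons come out true. The leap-second examples cannot be checked this way, because datetime rejects a seconds value of 60.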
------------------------------------------
Here's most of the bibliography from there. Note the "International
Earth Rotation Service Bulletins" -- now there's a service crew that
would be interesting to work on. What do they do, attach big rockets
to try to keep the rotation from slowing down and messing up all the
atomic clocks?
[IMAIL] Crocker, D., "Standard for the Format of Arpa Internet Text
Messages", STD 11, RFC 822, August 1982.
[IMAIL-UPDATE] Resnick, P., "Internet Message Format", RFC 2822,
April 2001.
[ISO8601] "Data elements and interchange formats -- Information
interchange -- Representation of dates and times", ISO 8601:1988(E),
International Organization for Standardization, June, 1988.
[ISO8601:2000] "Data elements and interchange formats -- Information
interchange -- Representation of dates and times", ISO 8601:2000,
International Organization for Standardization, December, 2000.
[HOST-REQ] Braden, R., "Requirements for Internet Hosts --
Application and Support", STD 3, RFC 1123, October 1989.
[IERS] International Earth Rotation Service Bulletins,
<http://hpiers.obspm.fr/eop-pc/products/bulletins.html>.
[NTP] Mills, D., "Network Time Protocol (Version 3) Specification,
Implementation and Analysis", RFC 1305, March 1992.
[ITU-R-TF] International Telecommunication Union Recommendations
for Time Signals and Frequency Standards Emissions.
<http://www.itu.ch/publications/itu-r/iturtf.htm>
--
Steve DeRose -- http://www.derose.net
Chair, Bible Technologies Group -- http://www.bibletechnologies.net
Email: sderose@acm.org or steve@derose.net