[osis-core] USFM and "x-" on attribute values

Todd Tillinghast osis-core@bibletechnologieswg.org
Thu, 19 Feb 2004 17:24:45 -0700


Patrick,

I don't think there is a pressing need to use "x-" values when
converting from SFM or USFM to OSIS.

As I view USFM format markers, they fall into two categories:
1) Format markers that have a clear semantic/logical meaning
    p for paragraph
    s for section title
    v for verse 
    c for chapter
    ... many more that are not complicated by formatting information

2) Format markers that are complicated by formatting information
    pc for paragraph centered (this is the appropriate format marker for
inscriptions, but it can be used any time you want to indicate that text is
to be centered and is part of a paragraph)
    qr for poetry right aligned 
    qc for poetry center aligned
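A category-one marker can be converted with a plain lookup table. The following is a minimal sketch, assuming a simplified model in which each marker maps to a single OSIS element name; the table contents and the function name map_category_one are hypothetical, and a real converter would also have to handle nesting (for example, a section title living inside a div of type "section").

```python
# Hypothetical lookup table for category-one USFM markers: each marker has
# one clear semantic meaning, so a single OSIS element suffices.
# The OSIS targets shown are illustrative, not an official mapping.
CATEGORY_ONE = {
    "p": "p",        # paragraph
    "s": "title",    # section title (inside <div type="section">)
    "v": "verse",    # verse
    "c": "chapter",  # chapter
}

# Category-two markers mix meaning with formatting; a flat table is not
# enough for these, so they are kept apart for contextual handling.
CATEGORY_TWO = {"pc", "qr", "qc"}


def map_category_one(marker: str) -> str:
    """Return the OSIS element name for a category-one USFM marker."""
    name = marker.lstrip("\\")  # accept "\p" as well as "p"
    if name in CATEGORY_TWO:
        raise ValueError(f"\\{name} needs context to disambiguate")
    try:
        return CATEGORY_ONE[name]
    except KeyError:
        raise ValueError(f"no category-one mapping for \\{name}") from None
```

For instance, map_category_one("\\s") returns "title", while map_category_one("qc") raises an error, because qc belongs to category two.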

For category one format markers a natural mapping exists between the
format markers and OSIS elements.

For category two format markers a natural mapping also exists between the
format marker and an OSIS element, but a given format marker may map to
different OSIS elements in different cases, and many format markers may
map to a single OSIS element, depending on who created the USFM file.
Fortunately, for category two format markers the appropriate OSIS
element can in most cases be disambiguated based on the passage.
(For example, if we know where inscriptions occur and we know the two or
three formatting styles used to format inscriptions, we can use this
information to construct a condition/context-based mapping between
format markers and OSIS elements.)  Exceptions are resolved by manually
assessing the text to determine the appropriate OSIS element.
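The condition/context-based mapping described here can be sketched as a function of both the marker and the passage. This is only a sketch under stated assumptions: the inscription references, the osisRef-style reference strings, the default targets, the use of an OSIS "inscription" element, and the manual-override table are all hypothetical placeholders that an encoder would fill in from their own study of the text.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical: passages known (from manual study) to contain inscriptions.
INSCRIPTION_REFS: FrozenSet[str] = frozenset({"Dan.5.25", "Matt.27.37"})

# Hypothetical: formatting styles this particular USFM file uses for inscriptions.
INSCRIPTION_MARKERS: FrozenSet[str] = frozenset({"pc", "qc"})

# Hypothetical manual overrides for exceptions: (marker, osisRef) -> element.
OVERRIDES: Dict[Tuple[str, str], str] = {}

# Default OSIS target when no context rule applies (illustrative only).
DEFAULTS: Dict[str, str] = {"pc": "p", "qc": "l", "qr": "l"}


def map_category_two(marker: str, osis_ref: str) -> Optional[str]:
    """Pick an OSIS element for a category-two marker using its passage."""
    # 1. Exceptions decided by manual assessment take priority.
    if (marker, osis_ref) in OVERRIDES:
        return OVERRIDES[(marker, osis_ref)]
    # 2. Context rule: a known inscription style in a known inscription passage.
    if marker in INSCRIPTION_MARKERS and osis_ref in INSCRIPTION_REFS:
        return "inscription"
    # 3. Otherwise fall back to the marker's ordinary meaning (None if unknown).
    return DEFAULTS.get(marker)
```

With these placeholder tables, map_category_two("pc", "Dan.5.25") yields "inscription" while map_category_two("pc", "Gen.1.1") falls back to "p". Checking overrides first keeps the manually assessed exceptions authoritative over the general rule.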

In the analysis Chris, Kees, and I did while in TX in January, we
identified a few cases where there is no clean way to encode a USFM
format marker using OSIS elements.  Most of these cases involve less
commonly used format markers.  Chris posted the suggested adjustments.
(This was in addition to the solution for qc and qr.)

Across all of the email I have received from people trying to encode
Bibles, as well as in my own experience encoding a handful of texts, the
qc and qr format markers are the cases where there was no good matching
OSIS mechanism.  (There are a number of format markers in USFM that are
less commonly used, and some of those prompted the suggestions in Chris'
post.)

We should expect that there will be a need for a few more enumerated
type values over time as people stretch the limits of USFM.  In our
normal pattern of operation we will address them as they come up.

PLEASE LET'S NOT ENCOURAGE PEOPLE TO USE "x-" WHEN THERE ARE CONCRETE
MAPPINGS TO OSIS ELEMENTS.
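Why concrete enumerated values are preferable to "x-" values shows up in how a type attribute is checked. The sketch below assumes a simplified rule modeled on the OSIS convention that a type value must either come from an enumerated list or carry the "x-" prefix; the enumerated values and the function name is_valid_type are hypothetical.

```python
import re

# Hypothetical enumerated values for some type attribute.
ENUMERATED = frozenset({"section", "chapter", "paragraph"})

# User extensions: "x-" followed by one or more non-whitespace characters.
EXTENSION = re.compile(r"x-\S+")


def is_valid_type(value: str) -> bool:
    """Accept an enumerated value or any "x-" extension value."""
    return value in ENUMERATED or EXTENSION.fullmatch(value) is not None
```

A misspelled enumerated value such as "sectoin" is rejected, but "x-sectoin" is accepted, since any "x-" value passes. That is the sense in which "x-" values get no real validation, and every encoder is free to invent a different one for the same USFM marker.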

Note: It is possible for a USFM file to use a format marker purely to
record how text should be laid out in a specific rendering.  I have seen
a case where a chapter started with a very short paragraph that only
filled one line and was followed by a second paragraph.  Because of the
drop-cap chapter number, they marked the second paragraph as something
other than a simple paragraph so that its first line would not be
indented (an indent would not look nice next to the drop cap).  It is my
belief that these sorts of tricks should not be encoded in OSIS documents.

Todd


> -----Original Message-----
> From: osis-core-admin@bibletechnologieswg.org
> [mailto:osis-core-admin@bibletechnologieswg.org] On Behalf Of Patrick Durusau
> Sent: Thursday, February 19, 2004 2:56 PM
> To: osis-core@bibletechnologieswg.org
> Cc: Jim_Albright@wycliffe.org
> Subject: [osis-core] USFM and "x-" on attribute values
> 
> Greetings,
> 
> In some side traffic Jim Albright has responded to my suggestion of
> mapping USFM values (where necessary) to the type attributes with the
> "x-" extension that:
> 
> > Again this will mean a LOT OF STANDARDS when using the x-extension
> > method of handling situations. I believe we have a reasonable set of
> > enumerated values that can be added. Including them in OSIS will help
> > standardize the situation. Allowing everyone to choose x-what they want
> > will put us back to where we came from...
> > Read MANY STANDARDS EQUALS NO STANDARD!!!!!
> 
> Interesting point. I was assuming that a standardized set of "x-"
> values would solve the problem. By standardized I mean standardized by
> the organization using them. Of course, that would mean that you would
> not get schema validation of the values for free, which is something I
> did not consider.
> 
> Fortunately, the schema is designed so that we can enumerate values for
> the type attributes without breaking prior versions (so long as we are
> adding values), since all types without enumerated values were required
> to have the "x-" extensions anyway.
> 
> Since the USFM to OSIS case is a very important one, the most important
> one I can imagine, what is the general feeling about incorporating
> (where necessary) the values required for the USFM to OSIS mapping?
> 
> Users who are not interested in the USFM values would be free to ignore
> them, so I don't think there is a downside to including them.
> 
> Something similar to what we worked on in Dallas for type on <seg>?
> 
> Contrariwise, it is a case of going from USFM, which is supposed to
> already be standardized, to OSIS. If one starts with a non-standardized
> file, then the result will vary from conversion to conversion.
> 
> Sounds like the conversion to OSIS would become the test of whether the
> USFM file meets its own standardization practices. In other words, if the
> values from the USFM file resulted in a non-allowable value for an OSIS
> attribute, you would get a schema validation error. Hmm, that seems like a
> long way around to go in terms of getting validation on the usage in the
> USFM file.
> 
> Thoughts? Comments?
> 
> Hope everyone is having a great day!
> 
> Patrick
> 
> --
> Patrick Durusau
> Director of Research and Development
> Society of Biblical Literature
> Patrick.Durusau@sbl-site.org
> Chair, V1 - Text Processing: Office and Publishing Systems Interface
> Co-Editor, ISO 13250, Topic Maps -- Reference Model
> 
> Topic Maps: Human, not artificial, intelligence at work!
> 
> 
> _______________________________________________
> osis-core mailing list
> osis-core@bibletechnologieswg.org
> http://www.bibletechnologieswg.org/mailman/listinfo/osis-core