[osis-core] On using OpenOffice as an OSIS editor

Kirk Lowery osis-core@bibletechnologieswg.org
Fri, 14 Jun 2002 08:25:23 -0400


Note that a project is *already* underway by the Linux 
Documentation folks to do the same for the DocBook DTD, so that 
OpenOffice can be used for the Linux Documentation Project, which uses 
DocBook for its work. I can get you the info about who exactly is 
doing it, their status, and what they've learned.

I think we're going to see a lot of DTDs/Schemas "ported" to OpenOffice, 
so the implementation path is going to be well-worn...

Kirk

Patrick Durusau wrote:
> Harry,
> 
> Just a brief and inadequate reply to your post on OpenOffice. ;-)
> 
> I have been using it for several weeks and, while sometimes slow, it 
> seems fairly stable.
> 
> Not certain that we would have to use macros. The issue has arisen in 
> TEI land (again!) of how to get users better tools for entering markup. 
> One suggestion has been to use OpenOffice (with styles) and an XSLT 
> stylesheet to convert the underlying XML from OpenOffice XML into TEI. 
> (This originated in a discussion between Sebastian Rahtz and myself over 
> writing an export filter for OpenOffice. Since OpenOffice has a native 
> XML format, XSLT would be a simple way to test the difficulty of going 
> from OpenOffice XML to TEI, without the overhead of writing the filter.)
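
As a rough illustration of the styles-plus-transformation idea described above (not an actual stylesheet from that discussion), the sketch below uses Python's ElementTree in place of XSLT to map paragraph style names to TEI-like elements. The namespace URI, the style names, and the mapping table are assumptions made up for the example; a real OpenOffice file would need to be inspected first.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the OpenOffice 1.0 "text" module; check a real file.
TEXT_NS = "http://openoffice.org/2000/text"

# Hypothetical mapping from user-visible style names to target element names.
STYLE_TO_ELEMENT = {"Quotation": "q", "Verse Line": "l"}


def convert(office_xml):
    """Turn each text:p into a target element chosen by its style name."""
    src = ET.fromstring(office_xml)
    out = ET.Element("body")
    for para in src.iter(f"{{{TEXT_NS}}}p"):
        style = para.get(f"{{{TEXT_NS}}}style-name")
        # Unknown styles fall back to a plain paragraph.
        tag = STYLE_TO_ELEMENT.get(style, "p")
        el = ET.SubElement(out, tag)
        # Flatten inline markup (spans etc.) to plain text for this sketch.
        el.text = "".join(para.itertext())
    return out
```

Flattening inline markup keeps the sketch short; a fuller conversion would also have to map character styles, which is where most of the difficulty is likely to be.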
> 
> If I get some time this weekend, I may try to input a chapter or so, 
> probably the Matthew chapter that has been the subject of so much 
> discussion, to see what sort of XML we would get from OpenOffice with no 
> tweaking. Might be a good measure of how much trouble we would encounter 
> with such an approach.
> 
> Thanks!
> 
> Patrick
> 
> Harry Plantinga wrote:
> 
>> Preface: I've thought for years how to make ThML easy
>> for a non-XML-user to edit and I haven't yet come up with a solution 
>> that gets the documents all the way
>> to the valid XML stage. I've tried using Word as an
>> editor with a custom stylesheet and macros, and that's
>> about the best solution I've had, but it leaves quite
>> a bit of work for an expert to correct markup, validate
>> the document, convert to XML, etc. Often several hours
>> per document.
>>
>> I'd more or less given up on Word because I want the
>> resulting documents to be valid XML, not requiring additional work. 
>> (Requiring an XML expert to finish up
>> documents is a major bottleneck in the pipeline, to mix
>> metaphors slightly.) The obvious approach is an XML editor, and this 
>> summer I'm experimenting with XMetaL. 
>> In theory it is a very nice approach. You can edit in
>> a view that looks as wysiwyg as CSS can make it. You can write Word 
>> import macros and save in HTML
>> or PDF as well as XML. You can preview in a browser
>> with XSLT and CSS styling.  You can add macros, buttons,
>> and the like to the user interface.
>>
>> In practice, it's working out reasonably well. The main
>> gotchas are that the software is poorly documented in some
>> cases, slow, buggy, and possibly in flux (Corel recently
>> bought SoftQuad). Oh, and it costs hundreds of dollars
>> and runs only on Windows.
>>
>> Reading up on the archive for this list, I came across
>> the discussion about using OpenOffice, and I thought I'd
>> give it another look. (Last time I checked, it couldn't print, etc.)
>> I expected to report that it wouldn't be
>> appropriate without extensive source code hacking, for
>> the same reason that Word isn't great: the content model
>> is pretty flat and basic, making it hard to use to validate
>> more complex content models.
>>
>> ============= summary ========================
>>
>> I came away from my exploration thinking that one could
>> do a pretty decent job of an OSIS editor with fairly
>> extensive macro programming but no source-code hacking. Maybe a few 
>> months' effort. There are sufficient UI elements to do a 
>> decent to good job, but not a great one: I doubt it will be possible to 
>> prevent illegal structure entry. It'll require a "validate" button and 
>> a validation process to correct errors before the document can be 
>> saved in OSIS format.
>>
>> =========== about OpenOffice ==========================
>>
>> OpenOffice has several modules: word processor, spreadsheet,
>> drawing program, presentation program, etc. All use XML
>> as their native file format. The suite has recently been
>> released in Version 1.0.  It's quite a full-featured near-clone of 
>> Microsoft Office, and it works quite well.
>> There are still lots of little gotchas in reading or saving
>> Microsoft Office documents though. OpenOffice is free, open,
>> and available for Windows, Mac, Linux, Unix, etc.  Download
>> from www.openoffice.org
>>
>>
>> =========== about OpenOffice's text DTD module =========
>>
>> The OpenOffice DTD has many modules. One, called text, is
>> the primary one for the word processor, though it doesn't
>> contain the table elements.  It has 181 elements, including
>> 84 with content model PCDATA and 38 EMPTY. The
>> main structure is that sections contain paragraph-level
>> elements (p, h, lists, tables, indexes, etc.).  Paragraph-
>> level elements contain inline elements (PCDATA, span, tabstop,
>> bookmark, drawing, a, set-page-variable, reference-mark-start,
>> footnote-ref, etc. etc.).
>>
>> It doesn't have a nice mapping to OSIS, but it may be possible
>> to "fake it" as described below.
>>
>> ========= Proposal for editing OSIS with OpenOffice =========
>>
>> osisText/header:
>>  - store information in predefined OpenOffice elements or
>>    an OpenOffice field element of type user-defined.
>>  - make an OpenOffice form to enter the data in the document.
>>
>> OSIS front, body, back
>>  - use OpenOffice section elements
>>
>> OSIS divs
>>  - use the outlining facility of OpenOffice. Specifically, use the
>>    <h> element, which has a numeric level attribute.
>>  - each heading is the start of a new div, with the heading
>>    level giving the nesting depth of the div
>>  - each div ends at the next heading paragraph
>>  - text of the heading could be used as the divTitle
>>  - maybe display the heading in reverse video to show that
>>    the text of the heading is not part of the document flow
>>  - bonus: OpenOffice "outline view" would show the div structure
>>    of the document.
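
The heading-to-div rule above can be sketched in a few lines. This is a minimal illustration, not a real OpenOffice macro: it assumes the document has already been reduced to a flat list of heading and paragraph tuples, and the element names body, div, divTitle and p are used in simplified form.

```python
import xml.etree.ElementTree as ET


def headings_to_divs(blocks):
    """Nest a flat list of blocks into divs.

    blocks: list of ("h", level, text) or ("p", None, text) tuples
    (a hypothetical intermediate form, with level >= 1 for headings).
    """
    body = ET.Element("body")
    # Stack of (level, element) for the currently open containers;
    # the body sits at level 0 and is never popped.
    stack = [(0, body)]
    for kind, level, text in blocks:
        if kind == "h":
            # A heading closes every open div at the same or a deeper level.
            while stack[-1][0] >= level:
                stack.pop()
            div = ET.SubElement(stack[-1][1], "div")
            ET.SubElement(div, "divTitle").text = text
            stack.append((level, div))
        else:
            # Ordinary paragraphs go into the innermost open div.
            ET.SubElement(stack[-1][1], "p").text = text
    return body
```

One consequence of this rule is that text cannot follow a nested div inside its parent without a new heading, which may or may not be acceptable for OSIS documents.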
>>
>> OSIS linegroups
>>  - the list facility appears to have sufficient structure
>>    to handle lines and linegroups
>>
>> Verses split across paragraph boundaries
>>  - select the text of the verse and click a "verse" button. A
>>    macro could prompt for the verse identifier and add prev and
>>    next attributes to span any paragraph boundaries.
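
What such a "verse" macro might produce can be sketched as follows. This is only an illustration: the attribute name id and the ".segN" suffix scheme for the later pieces are assumptions invented for the example; only the use of prev and next attributes comes from the proposal above.

```python
import xml.etree.ElementTree as ET


def split_verse(verse_id, segments):
    """Return one verse element per paragraph segment, chained by prev/next.

    segments: the pieces of verse text, one per paragraph the verse touches.
    """
    # Hypothetical ID scheme: the first piece keeps the verse ID,
    # later pieces get a numbered suffix.
    ids = [verse_id if i == 0 else f"{verse_id}.seg{i}"
           for i in range(len(segments))]
    verses = []
    for i, text in enumerate(segments):
        verse = ET.Element("verse", {"id": ids[i]})
        if i > 0:
            verse.set("prev", ids[i - 1])   # link back to the earlier piece
        if i < len(ids) - 1:
            verse.set("next", ids[i + 1])   # link forward to the later piece
        verse.text = text
        verses.append(verse)
    return verses
```

A verse that fits inside one paragraph yields a single element with neither prev nor next set.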
>>
>> Loading, saving
>>  - a macro or plug-in could read and save OSIS documents.
>>  - bonus: importing Word documents, saving HTML and PDF.
>>
>> Word-level markup
>>  - I suppose you could do it with a combination of macros,
>>    spans, and user-defined field elements.
>>
>> -Harry
>>
> 


-- 
Kirk E. Lowery, Ph.D.
Director, Westminster Hebrew Institute
Adjunct Professor of Old Testament
Westminster Theological Seminary, Philadelphia