[sword-devel] search idea

Paul Gear sword-devel@crosswire.org
Thu, 13 Jan 2000 09:22:15 +0000


Matthias Ansorg wrote:

> On Tue, 4 Jan 2000, Trevor Jenkins wrote:
>
> >>>(By the way, where is a specification of ThML?)
> >>
> >> Try http://www.ccel.org/ThML/ThML1.0.htm
> >
> >Thanks. At first glance this is a really heavyweight markup scheme in the
> >sense that all the element names are very long. I'd like to see some
> >minimisation there. But comments upon ThML will have to wait until I've got
> >more time.
>
> Right. It is the same problem as with HTML, on which ThML is based: unnecessarily long tag
> names blow up the file size without adding information. The specifications of ThML were 90K in
> version 0.93 (PDF) and 240K in version 1.0 (HTML) - this is not due to the new features ...
> Another issue is how easily the tag names can be memorized when learning a markup language.

If file size is a problem, throw gzip into the file I/O layer.  I understand zlib works pretty well
there, and bzip2 also has a library.  Far better to do that than to have a markup where you're not
sure whether '<bt>' means 'book title' or 'bibliography text'.  Long tag names take up more space, but
this can be overcome with compression, and the benefits for understandability are enormous.  (And if
you start complaining about too many keystrokes, I'll start talking about macros...  ;-)
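A minimal sketch of the compression argument, in Python (the thread names no language; Python's standard `gzip` and `bz2` modules wrap zlib and libbzip2). The tag names `bookTitle`, `bibliographyText`, `bt` and `bx` are hypothetical and not taken from the ThML specification. The script builds the same repetitive document twice, once with long and once with short tag names, and compares raw and compressed sizes.

```python
import bz2
import gzip


def make_doc(title_tag: str, biblio_tag: str, entries: int = 500) -> bytes:
    """Build a repetitive marked-up document using the given (hypothetical) tag names."""
    parts = []
    for i in range(entries):
        parts.append(
            f"<{title_tag}>Book {i}</{title_tag}>"
            f"<{biblio_tag}>Reference text number {i}</{biblio_tag}>\n"
        )
    return "".join(parts).encode("utf-8")


def sizes(doc: bytes) -> dict:
    """Return the raw, gzip-compressed and bzip2-compressed sizes in bytes."""
    return {
        "raw": len(doc),
        "gzip": len(gzip.compress(doc)),
        "bzip2": len(bz2.compress(doc)),
    }


# Same content, different tag-name lengths.
long_doc = make_doc("bookTitle", "bibliographyText")
short_doc = make_doc("bt", "bx")

if __name__ == "__main__":
    long_sizes, short_sizes = sizes(long_doc), sizes(short_doc)
    for key in ("raw", "gzip", "bzip2"):
        # Extra bytes that the long tag names cost at each stage.
        print(key, long_sizes[key], short_sizes[key], long_sizes[key] - short_sizes[key])
```

Because the repeated tag names become back-references once compressed, the size gap between the two versions should shrink sharply after compression. For the file I/O layer itself, `gzip.open(path, "wt", encoding="utf-8")` returns a text-mode file object that compresses transparently on write.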

Basing ThML on HTML has another big benefit: most editors already understand 75% of the tags.  This is
important if we are ever to get a large body of literature that is freely available.

Paul
---------
"He must become greater; I must become less." - John 3:30
http://www.bigfoot.com/~paulgear