[sword-devel] Project Gutenberg Etexts
Steve Tang
sword-devel@crosswire.org
Tue, 25 Jun 2002 06:14:57 -0600 (MDT)
> The problem with Project Gutenberg is that all the books are in plain
> ASCII, with NO markup. So you will need to insert paragraph breaks,
> minimally. You may wish to insert scripture reference tags, if present.
> And many pieces of markup like emphasized text have been lost thanks to
> Project Gutenberg.
Parsing natural language is difficult, certainly beyond our reach for the
moment. But just parsing chapters or paragraphs should be relatively
straightforward and therefore perl-able.
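
To make that concrete, here is a rough sketch of the chapter/paragraph
split. It is written in Python rather than perl, but the same regexes
would carry over. It rests on assumptions not taken from any particular
etext: that chapter headings sit on their own line and start with
"CHAPTER" or "BOOK" plus a numeral, and that paragraphs are separated by
blank lines. The names split_chapters and split_paragraphs are made up
for illustration, and real Gutenberg files will likely need the heading
pattern adjusted per book.

```python
import re

# Assumption: headings are lines such as "CHAPTER IV" or "BOOK 2".
# Adjust this pattern for the etext at hand.
HEADING = re.compile(r"^(?:CHAPTER|BOOK)[ \t]+[IVXLCDM\d]+\b.*$", re.MULTILINE)


def split_paragraphs(text):
    """Split on blank lines and rejoin hard-wrapped lines into one string."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


def split_chapters(text):
    """Return a list of (heading, paragraphs) pairs, in document order."""
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end]
        chapters.append((match.group(0).strip(), split_paragraphs(body)))
    return chapters


sample = """CHAPTER I

First paragraph, wrapped
over two lines.

Second paragraph.

CHAPTER II

Only paragraph here.
"""

chapters = split_chapters(sample)
for heading, paragraphs in chapters:
    print(heading, len(paragraphs))
```

Anything before the first heading (the Gutenberg header, title page and
so on) is skipped by this sketch, which may or may not be what you want.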
>
> There's no (reasonable) possibility of an automatic converter like thml2gbs
> for Gutenberg works since they lack markup and any kind of organization.
>
> Good luck though! I'm sure CCEL would be happy to take books like City of
> God if you put them into a good XML format.
>
> --Chris
>
I thought Sword had a 'general book tool', but I don't know how to use it.
Steve Tang...