[sword-devel] How to get vowels in the Arabic bible module?
Sebastien Koechlin
seb.sword at koocotte.org
Thu Jan 24 01:18:39 MST 2008
On Wed, Jan 23, 2008 at 08:33:52PM -0800, Chris Little wrote:
> > While exporting to xhtml is clearly of benefit, the "as usual" is still
> > a few steps too steep for me to do in a couple of hours.
> >
> > Do you have a transformation tool from xhtml to osis or thml?
>
> As usual (for me) is writing a script to convert whatever text I have
> into OSIS. (I always use Perl because it's simple and has the best regex
> implementation I know of.) You just have to look for patterns and try to
> exploit them to tease out a regularized text.
I did the same. You can find the script at
http://koocotte.org/darby/darbywork.pl and I also wrote a Makefile, which is
in the same directory.
Using the Makefile, I do the transformation in several steps:
1. Download the latest version (the author already provides an HTML version,
exported from MS Word, for download).
2. Replace the MS Word apostrophes: 'perl -p -e "s/’/\'/gm;" $input > $output'
3. Run the file through 'tidy --word-2000 ...' to get well-formed XHTML.
4. Patch it at a low level to correct some markup errors, add a fake chapter 1
for books without chapters...
5. Run my big Perl script, which reads the file twice: the first pass collects
the notes, the second produces a "light" OSIS file.
6. Using XSLT, add headers to the "light" file to get a real OSIS file.
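To give a rough idea of steps 2 and 5, here is a minimal sketch in Python
(it is not the Perl code from darbywork.pl). The input markup is an
assumption made up for the example: verses as <p class="v" id="...">, notes
as <p class="note" id="...">, and note references as <a href="#..."/>. The
function names are hypothetical too. Only the two-pass idea (collect the
notes first, then write the verses) comes from the steps above.

```python
import re

# Hypothetical input markup, assumed only for this sketch.
NOTE_RE = re.compile(r'<p class="note" id="(\w+)">(.*?)</p>')
VERSE_RE = re.compile(r'<p class="v" id="([\w.]+)">(.*?)</p>')
REF_RE = re.compile(r'<a href="#(\w+)"/>')


def normalize_apostrophes(text):
    """Step 2: replace the typographic apostrophe (U+2019) with a plain one."""
    return text.replace("\u2019", "'")


def collect_notes(lines):
    """First pass: map each note id to its text."""
    notes = {}
    for line in lines:
        match = NOTE_RE.search(line)
        if match:
            notes[match.group(1)] = match.group(2)
    return notes


def inline_notes(body, notes):
    """Replace each note reference with a <note> element holding its text."""
    def replace(ref):
        note_id = ref.group(1)
        if note_id not in notes:
            return ref.group(0)  # unknown reference: leave it untouched
        return "<note>%s</note>" % notes[note_id]
    return REF_RE.sub(replace, body)


def emit_light_osis(lines, notes):
    """Second pass: write one <verse> element per verse line."""
    verses = []
    for line in lines:
        match = VERSE_RE.search(line)
        if match:
            osis_id, body = match.groups()
            verses.append('<verse osisID="%s">%s</verse>'
                          % (osis_id, inline_notes(body, notes)))
    return verses


def convert(text):
    """Run both passes over the same text, as in step 5."""
    lines = normalize_apostrophes(text).splitlines()
    return emit_light_osis(lines, collect_notes(lines))
```

Reading the file twice keeps the second pass simple: a note can be defined
anywhere in the file, even after the verse that refers to it.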
After this, I have rules to produce the '.conf' file (also using XSLT), to
check the OSIS file, to compile the SWORD module, to install it, to produce
HTML files using XSLT, to zip all this for downloaders, and to clean up.
Perl regexes are really powerful, but I hit some bugs that resulted in
segmentation faults with UTF-8 text. Use a revision control system (I use CVS).
As for make and the Makefile: I can correct anything in any program, and then
I only have to run 'make' to rebuild the needed files and install the module.
It saves me a lot of time.
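For anyone who has not used make: the time saving comes from its basic rule
that a target is rebuilt only when it is missing or older than one of the
files it depends on. A minimal Python sketch of that check follows; the
function name is hypothetical, and real make does much more (dependency
chains, pattern rules, and so on).

```python
import os


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild as soon as one source was modified after the target.
    return any(os.path.getmtime(src) > target_mtime for src in sources)
```

So after editing only the XSLT for the headers, for example, a check like
this would skip the download and cleanup steps and redo only the later ones.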
--
Seb, autocuiseur