Module Making
The information below is intended to give you a short introduction. Some of it might be out of date. Please refer to our developer's wiki for in-depth info.
1. Introduction
A SWORD module consists of a set of binary files in any of an increasing number of formats created for SWORD plus a .conf file that specifies the location and attributes of the module.
The .conf file is located in a standard location, such as the mods.d directory of the SWORD install directory, that may be specified to The SWORD Engine in a number of ways that are outside the scope of this document. This file should be created with a standard text editor like notepad, emacs, vi, or pico. Its contents are described below in section 2.
The module files themselves usually require some pre-processing before they are ready to be imported to SWORD. How you go about this pre-processing is something you will need to decide for yourself. You may be able to do all of your pre-processing with simple search & replace operations in notepad or more complex regular expression search & replace operations in emacs, but the majority of modules will probably require even more complex editing using a scripting language such as Perl plus a fair amount of manual correction. On the other hand, some modules may come in a standard format such as ThML or OSIS encoded files, which do not require any modification, assuming they are valid documents. Document pre-processing is outside the scope of this document, but we will explain how you need to format documents to prepare them for import to SWORD, both in terms of encoding and markup.
Once you have a document ready for import, you will need to run it through an importer to create the SWORD module files, which will then be placed in the module directory you specify in your .conf file.
After this, you may test your work and consider submitting it to The SWORD Project for public distribution from our website.
2. .conf Files
2.1. .conf File Overview
.conf files contain all the information The SWORD Engine needs to find your module and display it correctly. Each line consists of a key followed by an equals sign and its value. Here is a sample .conf file:
[Sample]
DataPath=./modules/texts/ztext/sample/
ModDrv=zText
CompressType=ZIP
BlockType=Book
SourceType=GBF
CipherKey=1234ABCD5678EFGH
GlobalOptionFilter=GBFFootnotes
GlobalOptionFilter=GBFStrongs
GlobalOptionFilter=GBFHeadings
GlobalOptionFilter=GBFMorph
Feature=StrongsNumbers
Feature=DailyDevotion
Version=1.1
History_1.1=enciphered module
Font=Ezra SIL
Category=Sample texts
Lang=he
Encoding=UTF-8
Description=The Sample Translation
About=This .conf file is intended as a sample and \par {\i1 should not} be used for any other purposes
LCSH=Bible. English.
DistributionLicense=Public Domain
TextSource=CCEL
Now let's go through it line by line:
The first line contains the name of the module in square
brackets.
DataPath is the path to the module relative to the SWORD module root directory. All modules are typically stored under the modules directory. Its subdirectories are texts, comments, lexdict, and genbook. The texts directory currently has rawtext and ztext directories, which contain directories for each of the installed Bible texts. The comments directory contains zcom, hrefcom, rawcom, and rawfiles directories, one for each of the commentary formats in use. The lexdict directory has zld, rawld, and rawld4 directories, one for each LD-related format. Currently, the genbook directory only contains the rawgenbook directory. This sample text is a Bible, so it has been located in ./modules/texts/ztext/sample/.
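As a rough illustration of the key and value layout described above, the following Python sketch reads .conf text into a module name plus a mapping from each key to a list of values, since keys such as GlobalOptionFilter and Feature may repeat. The function name parse_conf is hypothetical and this is not how The SWORD Engine itself reads the file; it is only a minimal sketch that assumes one key=value pair per line.

```python
from collections import defaultdict


def parse_conf(text):
    """Parse .conf-style text into (module_name, {key: [values]})."""
    name = None
    entries = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The first line holds the module name in square brackets.
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        # Repeated keys accumulate in order of appearance.
        entries[key.strip()].append(value.strip())
    return name, dict(entries)


sample = """[Sample]
DataPath=./modules/texts/ztext/sample/
ModDrv=zText
GlobalOptionFilter=GBFFootnotes
GlobalOptionFilter=GBFStrongs
Feature=StrongsNumbers
"""

name, conf = parse_conf(sample)
print(name)
print(conf["GlobalOptionFilter"])
```

An empty value such as "CipherKey=" is kept as an empty string, so the presence of the key can still be detected.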
2.2. .conf File Attributes
ModDrv indicates the driver used for reading the module. At the time of writing, the valid values for this field are: RawText, zText, RawCom, HREFCom, zCom, RawLD, RawLD4, zLD, RawGenBook, and RawFiles.
CompressType indicates what type of compression is used in compressed modules (zText, zCom, & zLD). Currently supported compression types include Zip and LZSS, with Zip being the preferred setting.
BlockType indicates the block granularity for verse-based compressed modules
(zText & zCom). Granularities include Book, Chapter, and Verse. Typically, Bibles should be compressed using Book granularity and commentaries
using Chapter granularity, but you should test them. The smaller the
granularity, the faster the module will be, but the larger it will
be as well.
BlockCount indicates the number of items compressed in a single block for block-based compressed modules (zLD). This can be any integer, with the default being 200. Higher values will make the module slower, but smaller.
SourceType can take a value of GBF, ThML, or OSIS to
indicate the markup format of the module data.
CipherKey is where you will put the unlock key for enciphered
modules. Leave a blank line ("CipherKey=") to indicate that the
module is enciphered but has no unlock key. (Omit for unlocked
modules.)
GlobalOptionFilter lines indicate additional filters that can be run on the text. You may use multiple GlobalOptionFilter lines if your module has more than one of these features. The GBFFootnotes, GBFStrongs, GBFMorph, GBFHeadings, and GBFRedLetterWords filters each enable certain features of some GBF texts to be toggled. Correspondingly, ThMLFootnotes, ThMLStrongs, ThMLMorph, ThMLHeadings, ThMLVariants, ThMLScripref, and ThMLLemma apply to some ThML texts, and OSISStrongs and OSISMorph apply to some OSIS texts. Greek texts with diacritics may make use of the UTF8GreekAccents filter, regardless of markup, to toggle these features. Hebrew texts with pointing and cantillation may make use of the UTF8HebrewPoints and UTF8Cantillation filters respectively, regardless of markup, to toggle these features.
Feature indicates special features of the text. You may use multiple Feature lines if your module has more than one of these features. Currently
supported values are: StrongsNumbers (for modules that include Strong's numbers), DailyDevotion (for daily devotionals using one of the LD drivers), GreekDef & HebrewDef (for modules with Strong's number encoded Greek or Hebrew definitions), Glossary (for collections of glosses using one of the LD drivers), GreekParse & HebrewParse (for modules with Greek or Hebrew morphology expansions).
Version is the module's revision number. Incrementing it when changes are made alerts users of the SWORD Installers to the presence of updated modules. Please start with version 1.0 and increment by 0.1 for minor updates and by larger values for more major updates such as a new text source.
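The minor-update convention can be sketched in Python as follows. The helper names bump_minor and history_line are hypothetical, and the History line layout is simply copied from the sample .conf above. Decimal is used rather than float so that adding 0.1 does not introduce rounding noise.

```python
from decimal import Decimal


def bump_minor(version):
    """Return the version string incremented by 0.1, e.g. '1.1' -> '1.2'."""
    return str(Decimal(version) + Decimal("0.1"))


def history_line(version, note):
    """Build a History line in the same shape as the sample .conf."""
    return f"History_{version}={note}"


new_version = bump_minor("1.0")
print("Version=" + new_version)
print(history_line(new_version, "enciphered module"))
```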
History lines alert users to what has changed between different
versions.
Font is the font to be used for display of the module. Omit this line to
use the default font. Do not make use of font-specific encodings in your documents, but use Unicode instead and the Private Use Area if necessary for codepoints that are not handled by Unicode.
Category allows the module to be placed in different module
categories by the installer and CrossWire's download page. We use
this to segregate modules with unorthodox teachings or that need
beta versions of the software. Omit this line to include a module in the main list of
modules.
Lang is the primary language code of the module and should include a value according to RFC 3066. ISO 639-1 codes
are the preferred code (e.g. en for English). If there is none for
the given language, use an ISO 639-2/T code (e.g. ceb for Cebuano).
See http://lcweb.loc.gov/standards/iso639-2/englangn.html for
ISO 639-1 and 639-2/T codes. In cases where no ISO 639 code is
available, use "x-E-" followed by the SIL Ethnologue code for the
language (e.g. x-E-KEK for Kekchi). If a text is country specific,
such as the Anglicized NIV, include the ISO 3166-1 country code
after the language code and an underscore (e.g. en_GB for UK
English). See http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
for ISO 3166-1 codes.
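A quick shape check for Lang values might look like the Python sketch below. It only tests the form described above (a two or three letter lowercase code, or "x-E-" plus an Ethnologue code, optionally followed by an underscore and a country code); it does not confirm that the code actually exists in ISO 639 or ISO 3166-1. The assumption that Ethnologue codes are three uppercase letters is taken from the KEK example, and the name looks_like_lang is hypothetical.

```python
import re

LANG_RE = re.compile(
    r"^(?:[a-z]{2,3}|x-E-[A-Z]{3})"  # ISO 639 code, or x-E- plus Ethnologue code
    r"(?:_[A-Z]{2})?$"               # optional underscore and country code
)


def looks_like_lang(value):
    """True if value has the general shape of a Lang code."""
    return LANG_RE.match(value) is not None


for candidate in ("en", "ceb", "x-E-KEK", "en_GB", "English"):
    print(candidate, looks_like_lang(candidate))
```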
Encoding is the encoding name for the module character encoding.
Currently, the only supported values are Latin-1 and UTF-8. (Latin-1
is default, but UTF-8 is preferred.)
Description is a short (1 line) description of the module, such
as its title.
About is a lengthier description and may include copyright,
source, etc. information. It may be formatted as RTF or as plain text.
TextSource is a description of the source of the text, either in prose (such as "CCEL") or as a URL.
LCSH is the Library of Congress Subject Heading. You may search the Library of Congress catalog at http://catalog.loc.gov/ or use it as a guide for determining an appropriate LCSH for books that are not in the Library of Congress.
DistributionLicense describes the license under which the module is being distributed. This value is case-sensitive and should be one of the following: "Public Domain", "Copyrighted", "Copyrighted; Permission to distribute granted to CrossWire", or "Copyrighted; Free non-commercial distribution". If you need to use additional values, please notify us.
Copyright is the copyright notice for the work, including the year of copyright and the owner of the copyright.
CopyrightContactName is the name of the copyright holder.
CopyrightContactAddress is the mailing address of the copyright holder.
CopyrightContactEmail is the email address of the copyright holder.
DistributionSource indicates where the text may be found, such as a URL.
DistributionNotes indicates any additional notes about distribution of the module.
SwordVersionDate identifies the date of this version of the module.
MinimumVersion identifies the minimum version of the Sword library required for this module.
Obsoletes lists any modules that are made obsolete by this module (usually this indicates former names of the module).
3. Preparing a Text for Import
The SWORD Project currently requires that all submitted texts be Unicode (specifically UTF-8) encoded documents. We recommend that texts be marked up in OSIS, but will still accept those marked up in either ThML or GBF.
3.1. Encoding
For English language texts that only make use of ASCII characters, no change to the source encoding will be required. For other European language and most other languages, there probably exist simple encoding converters for ISO and national standards to UTF-8. For more complex source encodings, you may need to create your own converter or adapt an existing one. Some currently available conversion tools that you may find useful, depending on your platform and needs, include:
uconv (part of ICU), available compiled for Win32 at http://crosswire.org/ftpmirror/pub/sword/utils/win32/uconv.zip or in source format from ICU at http://oss.software.ibm.com/icu/.
font2uni from CCEL, available at http://www.ccel.org/info/gkheb/.
uconv is best suited for standard encodings and font2uni is best suited for font-specific encodings. When creating XML texts, the only entities that should be used are &amp; for '&' and &lt; for '<'. All other entities should be encoded as their UTF-8 equivalents.
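For simple cases, both steps can be approximated in a few lines of Python. This is only a sketch and not a replacement for uconv or font2uni: latin1_to_utf8 assumes the source really is Latin-1, and expand_entities (a hypothetical name) replaces named HTML entities with their characters while leaving the entities for '&' and '<' untouched. Numeric character references are not handled here.

```python
import re
from html.entities import name2codepoint

KEEP = {"amp", "lt"}  # the two entities that should remain in XML texts


def expand_entities(text):
    """Replace named entities with their characters, except those in KEEP."""
    def repl(match):
        name = match.group(1)
        if name in KEEP or name not in name2codepoint:
            return match.group(0)  # leave kept or unknown entities as-is
        return chr(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", repl, text)


def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes."""
    return data.decode("latin-1").encode("utf-8")


print(expand_entities("caf&eacute; &amp; th&eacute;"))
print(latin1_to_utf8(b"caf\xe9"))
```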
3.2. Markup
Internally, SWORD can process text in one of three formats: GBF, ThML, and OSIS. From these formats, it can convert to other formats including RTF and HTML for display. GBF has been our preferred format in the past for Bibles and commentaries. ThML became our preferred format for commentaries, dictionaries, and general books. Now we are moving to the OSIS format, but are still at work bringing all of its features to SWORD. You may find documentation for each of these standards at their respective websites:
General Bible Format (GBF) : http://www.ebible.org/bible/gbf.htm
Theological Markup Format (ThML) : http://www.ccel.org/ThML/
Open Scriptural Information Standard (OSIS) : http://www.bibletechnologies.net/
In SWORD, for modules encoded with ThML and OSIS, each verse, dictionary entry, and book division needs to be well-formed XML or it will result in display problems in some frontends. SWORD only handles a subset of the tags used by each of these standards that we have found necessary, but we are willing to support additional tags as the need arises.
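One way to test an individual entry for well-formedness before import is to wrap it in a dummy root element and hand it to an XML parser. The Python sketch below does this with the standard library; the wrapper element name "entry" and the function name is_well_formed are arbitrary choices for illustration. It checks well-formedness only, not validity against the ThML DTD or the OSIS schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(entry):
    """True if the entry parses as XML once wrapped in a dummy root element."""
    try:
        ET.fromstring(f"<entry>{entry}</entry>")
    except ET.ParseError:
        return False
    return True


print(is_well_formed("In the <i>beginning</i>"))
print(is_well_formed("In the <i>beginning"))
```

Note that an unclosed tag such as a bare <br> or an unescaped '&' will be reported as not well-formed by this check.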
Supported GBF tags include: <WG>, <WH>, <WTG>, <WTH>, <RX>, <RF>, <FI>, <FB>, <FN>, <FR>, <FS>, <FU>, <FO>, <FV>, <CA>, <CL>, <CG>, <CM>, <CT>, <JR>, <JC>, <JL>, <TT>, & <TS> (plus closing tags where appropriate). In addition, SWORD allows full use of UTF-8 rather than merely ASCII as the GBF standard specifies.
Supported ThML tags include: <sync> (with type parameters of Strongs, morph, & lemma), <scripRef>, and <note> (plus closing tags where appropriate). XHTML tags that ThML inherits, which may be used in SWORD modules, include <div> (with types of sechead for section headings and title for titles), <i>, <br>, and <b>. Additional XHTML tags may be interpreted by those SWORD frontends that render HTML, but will not be translated to RTF for the Win32 frontend.
OSIS support is still being implemented, but we plan to support the full tagset.
3.3. Import formats
3.3.1. ThML and OSIS Formatted General Books
With ThML and OSIS formatted general books, provided your document is valid XML according to the ThML DTD or the OSIS Schema, you should not need to do any further processing. You can use your XML file with thml2gbs (for ThML) or xml2gbs (for OSIS).
3.3.2. vpl Format
vpl or verse-per-line format may only be used in creating Bibles. This format requires that each line start with a verse reference that SWORD can understand, such as "Genesis 1:1" or "Jn 3:16". Most English abbreviations are acceptable. Following the verse reference, the verse itself should be written. For example:
Genesis 1:1 In the beginning God created the heaven and the earth.
Genesis 1:2 And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
This format is used with the utility vpl2mod, discussed below. To import Bibles that have combined verses, you will need to use imp format instead of vpl.
3.3.3. imp Format
imp or import format may be used in creating all types of modules (Bibles, commentaries, dictionaries, daily devotionals, glossaries, general books, etc.). Each entry in an imp file may take as many lines as are needed. The first line of the entry will have a format such as "$$$<key>" and will be followed by all lines of text that should be included with that entry. So our above example in imp format would be written as:
$$$Genesis 1:1
In the beginning God created the heaven and the earth.
$$$Genesis 1:2
And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.
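The correspondence between the vpl and imp forms of these two verses is mechanical, and can be sketched in Python as below. The regular expression is an assumption about what a verse reference looks like (an optional leading digit, a book name or abbreviation, then chapter:verse); SWORD's own reference parser accepts more forms than this. The name vpl_to_imp is hypothetical.

```python
import re

# Assumed reference shape: optional leading digit, book name, chapter:verse.
VPL_RE = re.compile(r"^((?:[1-3]\s?)?[A-Za-z][A-Za-z. ]*?\s+\d+:\d+)\s+(.*)$")


def vpl_to_imp(lines):
    """Convert verse-per-line text into a list of imp-format lines."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        m = VPL_RE.match(line)
        if m is None:
            raise ValueError(f"no verse reference found: {line!r}")
        out.append("$$$" + m.group(1))  # key line
        out.append(m.group(2))          # verse text
    return out


vpl = ["Genesis 1:1 In the beginning God created the heaven and the earth."]
print("\n".join(vpl_to_imp(vpl)))
```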
Commentaries would follow the same format, but would probably include a greater number of lines of text. If your Bible or commentary uses a single entry to handle multiple verses, simply give a list or range of verses as the key (e.g. "$$$Genesis 1:1-5", "$$$Exodus 1", "$$$Leviticus 1:1,5"). Lexicons, dictionaries, glossaries and daily devotionals would take a form such as:
$$$Adam
Adam was the first man created by God.
$$$Eve
Eve was the first woman created by God.
For daily devotionals, you must encode the key as "$$$mm.dd", such as "$$$01.01" for January 1st and "$$$12.31" for December 31st.
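The "$$$mm.dd" keys can be generated rather than typed by hand. The Python sketch below formats a date as a devotional key and lists the keys for a whole year; using a leap year so that February 29th is included is an assumption made here, not something the format requires.

```python
import datetime


def devotional_key(day):
    """Format a date as an imp key for a daily devotional, e.g. '$$$01.01'."""
    return day.strftime("$$$%m.%d")


def all_keys(year=2024):
    """Return the keys for every day of the given year (2024 is a leap year)."""
    day = datetime.date(year, 1, 1)
    keys = []
    while day.year == year:
        keys.append(devotional_key(day))
        day += datetime.timedelta(days=1)
    return keys


keys = all_keys()
print(keys[0], keys[-1], len(keys))
```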
General books are encoded with each book division as a separate entry. The entries are then listed as a tree hierarchy with keys similar to a file system directory structure. For example, if you were encoding Josephus' Works, you might have a structure like this:
$$$/War
The War of the Jews
$$$/War/Book 1
Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1
Chapter 1 of Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1/Section 1
Section 1 of Chapter 1 of Book 1 of the War of the Jews
$$$/War/Book 1/Chapter 1/Section 2
Section 2 of Chapter 1 of Book 1 of the War of the Jews
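If the book is already held as a nested structure, the path-style keys can be produced by a simple recursive walk. In the Python sketch below the input shape (a dict mapping each division title to a pair of text and child divisions) and the name tree_to_imp are assumptions chosen for illustration; parents are emitted before their children, as in the listing above.

```python
def tree_to_imp(tree, prefix=""):
    """Flatten {title: (text, children)} into imp lines, parents first."""
    lines = []
    for title, (text, children) in tree.items():
        key = f"{prefix}/{title}"
        lines.append("$$$" + key)
        lines.append(text)
        lines.extend(tree_to_imp(children, key))
    return lines


works = {
    "War": ("The War of the Jews", {
        "Book 1": ("Book 1 of the War of the Jews", {
            "Chapter 1": ("Chapter 1 of Book 1 of the War of the Jews", {}),
        }),
    }),
}

imp_lines = tree_to_imp(works)
print("\n".join(imp_lines))
```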
4. Importing
Now that your text is ready to be imported, you will need to use one of the command line utilities for converting documents to SWORD format. Depending on the format of your document at this point, you will need to use the appropriate importer.
If your text is a valid ThML document, use thml2gbs.
If your text is a valid OSIS document, use xml2gbs.
If your text is a vpl format Bible, use vpl2mod.
If your text is an imp format Bible or commentary, use imp2vs.
If your text is an imp format dictionary, lexicon, glossary, or daily devotional, use imp2ld.
If your text is an imp format general book, use imp2gbs.
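The choices above amount to a small lookup table, sketched here in Python. The function name pick_importer and the lowercase labels used for formats and module kinds are hypothetical; only the utility names come from the list above.

```python
IMP_IMPORTERS = {
    "bible": "imp2vs",
    "commentary": "imp2vs",
    "dictionary": "imp2ld",
    "lexicon": "imp2ld",
    "glossary": "imp2ld",
    "daily devotional": "imp2ld",
    "general book": "imp2gbs",
}


def pick_importer(fmt, kind=None):
    """Return the name of the import utility for a document format and kind."""
    fmt = fmt.lower()
    if fmt == "thml":
        return "thml2gbs"
    if fmt == "osis":
        return "xml2gbs"
    if fmt == "vpl":
        if kind not in (None, "bible"):
            raise ValueError("vpl format may only be used for Bibles")
        return "vpl2mod"
    if fmt == "imp":
        if kind not in IMP_IMPORTERS:
            raise ValueError(f"unknown module kind for imp format: {kind!r}")
        return IMP_IMPORTERS[kind]
    raise ValueError(f"unknown format: {fmt}")


print(pick_importer("imp", "commentary"))
print(pick_importer("OSIS"))
```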
You may find these files in the SWORD Project source distribution or compiled for Win32 at http://crosswire.org/ftpmirror/pub/sword/utils/win32/. Each utility has brief usage information that can be viewed by running it once without any arguments.
5. Additional Utilities
There are additional utilities that may be used on SWORD modules:
5.1. Compressing Modules
To compress a Bible, commentary, or LD module, use the mod2zmod utility. First you will need to install the module so that it can be accessed using the SWORD engine. Next, run "mod2zmod <modname> <datapath> [blockType [compressType]]". blockType can be 4 = book (default), 3 = chapter, or 1 = verse and indicates the granularity of the compression blocks. The larger the block is, the longer it will take to access a piece of the text, but the smaller the resulting module will be. compressType can be either 1 = LZSS (default) or 2 = Zip.
You may wish to try different compression settings to find out which is best for your module. Typically, we use chapter compression for large commentaries, book compression for Bibles, and the Zip compression algorithm.
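When comparing several settings, it can help to build the mod2zmod argument list from readable names instead of remembering the numeric codes. The Python sketch below only assembles the arguments; the numeric codes are the ones quoted above and should be checked against the usage message of your copy of the utility. The name mod2zmod_args is hypothetical, and actually running the command (for example with subprocess.run) is left to the reader.

```python
# Numeric codes as quoted in the text above; verify against your mod2zmod.
BLOCK_TYPES = {"verse": 1, "chapter": 3, "book": 4}
COMPRESS_TYPES = {"lzss": 1, "zip": 2}


def mod2zmod_args(modname, datapath, block="book", compress="zip"):
    """Build the argument list for a mod2zmod call."""
    return [
        "mod2zmod",
        modname,
        datapath,
        str(BLOCK_TYPES[block]),
        str(COMPRESS_TYPES[compress]),
    ]


print(" ".join(mod2zmod_args("Sample", "./modules/texts/ztext/sample/")))
```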
5.2. Locking Modules
To lock a rawText Bible or rawCom commentary module, use the cipherraw utility. Just run "cipherraw </path/to/module> '<key>'". We know of no limitations on the key, but we have historically used a 16 character key, and it should be difficult to guess.
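A hard-to-guess 16 character key can be generated with Python's secrets module, as in the sketch below. Restricting the key to ASCII letters and digits is an assumption made here to avoid shell quoting problems; it is not a documented requirement of cipherraw.

```python
import secrets
import string


def make_cipher_key(length=16):
    """Return a random key made of ASCII letters and digits."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


print(make_cipher_key())
```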
5.3. Checking for Missing Verses
You can use the utility emptyvss to find verses in a module that contain no text, since this may indicate errors in the module. Just run "emptyvss <module name>" to generate a list.
6. Submitting content to the SWORD Project
After you have tested your module, you may wish to submit it to the SWORD Project for public release so that other people can benefit from your work. All modules submitted to the SWORD Project for distribution either on the internet or on CDs should include both the module as a single document and the .conf file.
The module itself should be an uncompiled, plaintext document in
either vpl (verse-per-line), imp (import), ThML, or OSIS format, ready to be run through vpl2mod, one of the imp2* tools, or xml2gbs.
Before any module will be considered for posting, we expect that the following minimum set of tags be included in its .conf file: DataPath, ModDrv, Lang, Description, About, DistributionLicense, and TextSource. We also strongly prefer that an LCSH line be included with the .conf file, but will look the LCSH up ourselves if you have trouble deciding on a value. (You can look at other .conf files for examples.)
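A pre-submission check of these minimum tags might be sketched in Python as follows. The function name check_conf is hypothetical and the check is deliberately shallow: it only looks for the presence of each key, compares DistributionLicense against the four values listed in section 2.2, and notes a missing LCSH line.

```python
REQUIRED = ("DataPath", "ModDrv", "Lang", "Description", "About",
            "DistributionLicense", "TextSource")

LICENSES = {
    "Public Domain",
    "Copyrighted",
    "Copyrighted; Permission to distribute granted to CrossWire",
    "Copyrighted; Free non-commercial distribution",
}


def check_conf(text):
    """Return a list of problems found in .conf text (empty if none)."""
    keys = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            keys.setdefault(key.strip(), value.strip())
    problems = [f"missing {k}" for k in REQUIRED if k not in keys]
    lic = keys.get("DistributionLicense")
    if lic is not None and lic not in LICENSES:  # comparison is case-sensitive
        problems.append(f"unrecognised DistributionLicense: {lic}")
    if "LCSH" not in keys:
        problems.append("LCSH line recommended")
    return problems


draft = """[Sample]
DataPath=./modules/texts/ztext/sample/
ModDrv=zText
Lang=he
Description=The Sample Translation
DistributionLicense=public domain
"""

for problem in check_conf(draft):
    print(problem)
```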
When you feel your module is ready to be submitted, you may
email it to modules@crosswire.org. If you
are unable to email it or would prefer to send the files by some other
means, you may contact us at the same email address, and we can
discuss other arrangements.