[sword-svn] r89 - in trunk: . modules modules/calvinscommentaries python python/swordutils python/swordutils/xml
lukeplant at www.crosswire.org
Thu Jul 19 15:51:33 MST 2007
Author: lukeplant
Date: 2007-07-19 15:51:32 -0700 (Thu, 19 Jul 2007)
New Revision: 89
Added:
trunk/modules/calvinscommentaries/
trunk/modules/calvinscommentaries/README
trunk/modules/calvinscommentaries/calvinscommentaries.conf
trunk/modules/calvinscommentaries/combine_calcom.py
trunk/python/
trunk/python/swordutils/
trunk/python/swordutils/__init__.py
trunk/python/swordutils/xml/
trunk/python/swordutils/xml/__init__.py
trunk/python/swordutils/xml/combine.py
trunk/python/swordutils/xml/thml.py
trunk/python/swordutils/xml/utils.py
Log:
Added a Python library of various tools for making modules, and a
specific script for creating a combined Calvin's Commentaries module
Added: trunk/modules/calvinscommentaries/README
===================================================================
--- trunk/modules/calvinscommentaries/README (rev 0)
+++ trunk/modules/calvinscommentaries/README 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,45 @@
+
+Conversion of Calvin's commentaries into OSIS format and a Sword module
+
+Requirements:
+-------------
+- ThML sources: calcom??.xml files, as downloaded from CCEL.
+ For convenience, a recent version of the files can be downloaded here:
+ http://lukeplant.me.uk/misc/sword/calcom_sources.tar.bz2
+ Extract this file.
+- thml2osis.xslt from
+ http://crosswire.org/svn/sword-tools/trunk/thml2osis/xslt/
+- xsltproc for processing the above
+- Python, to run the script that combines the calcom??.xml files
+- Python swordutils library:
+ http://crosswire.org/svn/sword-tools/trunk/python
+ A checkout of this directory should be in your PYTHONPATH
+
+Make the module
+---------------
+
+$ ./combine_calcom.py calcom_sources/calcom??.xml
+(output stored in calvinscommentaries.thml)
+$ xsltproc --novalid path/to/thml2osis.xslt calvinscommentaries.thml > calvinscommentaries.osis
+
+TODO
+- convert OSIS commentary to Sword module
+
+Explanation of these steps
+--------------------------
+1) 'Correct' some of the ThML files. In particular, change the
+ 'scripCom' tags so that they enclose the text they refer to,
+ rather than merely appearing at the beginning of it.
+ This is done as part of combine_calcom.py
+
+2) Combine all the ThML files into one big one, and at the same time:
+ - modify the header information, using one of the calcom??.xml files
+ as a template
+ - make any corrections necessary to the ThML for the new context
+
+ Output: calvinscommentaries.thml
+
+3) Convert to OSIS, using thml2osis.xslt
+
+4) TODO - convert to Sword module. The current osis2mod utility expects
+ commentaries to be marked up like Bibles.
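The two commands under "Make the module" can also be chained from Python. The following is a minimal sketch, assuming Python 3, that combine_calcom.py is executable in the current directory, and that xsltproc is on the PATH. build_commands and make_module are hypothetical helper names and are not part of swordutils.

```python
import glob
import subprocess

# Hypothetical wrapper around the two shell commands in "Make the module".
# Script name, output file names and the --novalid flag come from the README.


def build_commands(sources, xslt_path):
    """Return the argv lists for the combine step and the OSIS step."""
    combine_cmd = ["./combine_calcom.py"] + list(sources)
    osis_cmd = ["xsltproc", "--novalid", xslt_path, "calvinscommentaries.thml"]
    return combine_cmd, osis_cmd


def make_module(source_dir, xslt_path):
    # The shell sorts the expansion of calcom??.xml; glob.glob does not
    # promise an order, so sort explicitly. combine_calcom.py uses the
    # first file as the header template.
    sources = sorted(glob.glob(source_dir + "/calcom??.xml"))
    if not sources:
        raise FileNotFoundError("no calcom??.xml files in " + source_dir)
    combine_cmd, osis_cmd = build_commands(sources, xslt_path)
    subprocess.run(combine_cmd, check=True)  # writes calvinscommentaries.thml
    # xsltproc writes the result to stdout, so redirect it into the .osis file
    with open("calvinscommentaries.osis", "wb") as out:
        subprocess.run(osis_cmd, stdout=out, check=True)
```

With the sources extracted as described above, make_module("calcom_sources", "path/to/thml2osis.xslt") would be the equivalent of running the two shell commands by hand.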
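Step 2 amounts to keeping the head of one file and concatenating the bodies of all of them. Below is a small-scale sketch using the standard-library minidom. It loads every document at once, which is exactly the memory cost the committed script avoids with LazyNodes, and it leaves out the header corrections. combine_bodies is a hypothetical name.

```python
from xml.dom import minidom


def combine_bodies(documents, body_tag="ThML.body"):
    """Eagerly merge the body children of several parsed documents.

    The first document is the template: its head is kept, and its body is
    emptied and refilled with the body children of every document in order,
    including its own.
    """
    template = documents[0]
    main_body = template.getElementsByTagName(body_tag)[0]
    # Snapshot every body's children first, because emptying main_body
    # below would otherwise lose the template's own content.
    per_doc_children = [
        list(doc.getElementsByTagName(body_tag)[0].childNodes)
        for doc in documents
    ]
    for child in list(main_body.childNodes):
        main_body.removeChild(child)
    for children in per_doc_children:
        for node in children:
            # importNode deep-copies the node into the template document
            main_body.appendChild(template.importNode(node, True))
    return template
```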
Added: trunk/modules/calvinscommentaries/calvinscommentaries.conf
===================================================================
--- trunk/modules/calvinscommentaries/calvinscommentaries.conf (rev 0)
+++ trunk/modules/calvinscommentaries/calvinscommentaries.conf 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,17 @@
+[CalvinsCommentaries]
+DataPath=./modules/comments/zcom/calvinscommentaries/
+ModDrv=zCom
+BlockType=CHAPTER
+SourceType=OSIS
+CompressType=ZIP
+Lang=en
+Description=Calvin's Collected Commentaries
+About=John Calvin's commentaries on many books of the Bible, collected \
+into a single volume from material found at Christian Classics Ethereal Library \par \
+Converted to Sword module format by Luke Plant <L.Plant.98 at cantab.net>
+Version=1.0
+Encoding=UTF-8
+LCSH=Bible--Commentaries.
+DistributionLicense=Public Domain
+TextSource=http://www.ccel.org/
+MinimumVersion=1.5.2
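The .conf entries above use a simple Key=Value layout, with a trailing backslash continuing a value onto the next line (as in About). Here is a minimal parser sketch. It assumes one module per file, and it assumes that continuation lines are joined by simply dropping the backslash. parse_sword_conf is hypothetical and is not the Sword library's own configuration reader. Keys that Sword allows to repeat (such as GlobalOptionFilter) would need a list rather than a single string.

```python
def parse_sword_conf(text):
    """Return (module_name, {key: value}) for a one-module Sword .conf."""
    name = None
    entries = {}
    key = None  # set while a backslash-continued value is being read
    for raw in text.splitlines():
        line = raw.rstrip()
        if key is None:
            if not line or line.startswith("#"):
                continue  # blank line, or a '#' comment (assumed syntax)
            if line.startswith("[") and line.endswith("]"):
                name = line[1:-1]  # the [ModuleName] section header
                continue
            k, sep, value = line.partition("=")
            if not sep:
                continue  # not a Key=Value line; ignore it
            key = k.strip()
            entries[key] = ""  # a repeated key overwrites the earlier one here
        else:
            value = line  # continuation of the previous value
        if value.endswith("\\"):
            entries[key] += value[:-1]  # drop the backslash, keep reading
        else:
            entries[key] += value
            key = None
    return name, entries
```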
Added: trunk/modules/calvinscommentaries/combine_calcom.py
===================================================================
--- trunk/modules/calvinscommentaries/combine_calcom.py (rev 0)
+++ trunk/modules/calvinscommentaries/combine_calcom.py 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,78 @@
+#!/usr/bin/env python
+
+# Converts the source calcom??.xml files into a single
+# ThML file, with corrections made to allow it to be
+# used as a Sword module
+
+#------------------------------------------------------------
+# CONFIG
+
+PUBLISHERID = u"lukeplant.me.uk"
+
+#------------------------------------------------------------
+
+from xml.dom import minidom
+from xml import xpath
+from datetime import datetime
+from swordutils.xml import thml, utils
+from swordutils.xml.utils import RemoveNode, GeneralReplaceContents, ReplaceContents, do_replacements
+from swordutils.xml.combine import LazyNodes
+import sys
+
+
+now = datetime.now() # for general timestamping purposes
+
+
+def do_head_replacements(doc):
+
+ corrections = {
+ "//DC.Title[@sub='Main']": ReplaceContents(u"Calvin's Combined Commentaries"),
+ "//DC.Title[@sub='authTitle']": RemoveNode(),
+ "//DC.Title[@sub='Alternative']": RemoveNode(),
+ "//printSourceInfo": ReplaceContents(u"<published>Multiple printed works, Baker</published>"),
+ "//electronicEdInfo/bookID": ReplaceContents(u"calvincommentaries"),
+ "//DC.Identifier": RemoveNode(), # TODO - new identifier?
+ "//electronicEdInfo/editorialComments":
+ GeneralReplaceContents(lambda t: u"Multiple ThML files combined into single ThML file by a script. Original editorial comments: " + t),
+ "//electronicEdInfo/revisionHistory":
+ GeneralReplaceContents(lambda t: unicode(now.strftime('%Y-%m-%d')) + u": Multiple ThML files combined into single ThML file by a script. Original revision history: " + t),
+ "//electronicEdInfo/publisher": ReplaceContents(PUBLISHERID),
+
+ }
+ do_replacements(doc, corrections)
+
+def do_body_corrections(doc):
+ # Correct <scripCom>
+ rootNode = utils.getRoot(doc)
+ thml.expandScripComNodes(rootNode)
+ # Other corrections
+ corrections = {
+ # id attributes can now contain duplicates due to combination
+ # of multiple files, so we remove them all.
+ "//@id": RemoveNode(),
+
+ }
+ do_replacements(doc, corrections)
+
+def combine(templatefile, allfiles):
+ # Get the main one
+ templatexml = minidom.parse(templatefile)
+ mainBody = utils.getNodesFromXPath(templatexml, '//ThML.body')[0]
+ mainBody.childNodes = []
+ do_head_replacements(templatexml)
+ # The following childNodes will be lazily evaluated as
+ # templatexml.writexml iterates over them
+ mainBody.childNodes = LazyNodes(templatexml, allfiles, do_body_corrections, '//ThML.body')
+
+ fh = open('calvinscommentaries.thml', 'wb')
+ utils.writexml(templatexml, fh)
+ fh.close()
+
+def main(filenames):
+ combine(filenames[0], filenames)
+
+if __name__ == "__main__":
+ if len(sys.argv) < 2:
+ print "Usage: ./combine_calcom.py filename.xml [filename2.xml ...]"
+ sys.exit(1)
+ main(sys.argv[1:])
Property changes on: trunk/modules/calvinscommentaries/combine_calcom.py
___________________________________________________________________
Name: svn:executable
+ *
Name: svn:eol-style
+ native
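do_head_replacements and do_body_corrections both map XPath expressions to action objects. The PyXML xml.xpath module imported here is not part of current Python. The sketch below reproduces the same pattern with xml.etree.ElementTree instead. ElementTree's findall supports only a subset of XPath (for example .//tag[@attr='value']) and cannot select attribute nodes, so the "//@id" rule becomes a separate loop. Remove, ReplaceText, do_replacements and strip_ids are illustrative names. Unlike GeneralReplaceContents, ReplaceText sees only the text content of an element, not its serialised child markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical ElementTree port of the "XPath -> action" dictionary pattern.


class Remove:
    def act(self, elem, parents):
        parents[elem].remove(elem)


class ReplaceText:
    """Replace an element's contents with text computed from its old text."""

    def __init__(self, func):
        self.func = func

    def act(self, elem, parents):
        old = "".join(elem.itertext())
        for child in list(elem):
            elem.remove(child)
        elem.text = self.func(old)


def do_replacements(root, replacements):
    # ElementTree has no parent pointers, so build a child -> parent map
    parents = {child: parent for parent in root.iter() for child in parent}
    for path, action in replacements.items():
        # materialise the matches before mutating the tree
        for elem in list(root.findall(path)):
            action.act(elem, parents)


def strip_ids(root):
    # counterpart of the "//@id": RemoveNode() entry
    for elem in root.iter():
        elem.attrib.pop("id", None)
```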
Added: trunk/python/swordutils/__init__.py
===================================================================
Property changes on: trunk/python/swordutils/__init__.py
___________________________________________________________________
Name: svn:eol-style
+ native
Added: trunk/python/swordutils/xml/__init__.py
===================================================================
Property changes on: trunk/python/swordutils/xml/__init__.py
___________________________________________________________________
Name: svn:eol-style
+ native
Added: trunk/python/swordutils/xml/combine.py
===================================================================
--- trunk/python/swordutils/xml/combine.py (rev 0)
+++ trunk/python/swordutils/xml/combine.py 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,29 @@
+# Utilities for combining multiple module source files
+# into one.
+
+from xml.dom import minidom
+from swordutils.xml import utils
+
+class LazyNodes(object):
+ # Pulling all the documents in at once uses up too much memory.
+ # This class is responsible for acting as a replacement
+ # 'childNodes' which loads documents one at a time,
+ # does corrections on them and spews out the body nodes
+ def __init__(self, maindoc, files, alterationfunc, nodepath):
+ self.maindoc = maindoc # Don't actually need this
+ self.files = files
+ self.iterated_count = 0
+ self.nodepath = nodepath
+ self.alterationfunc = alterationfunc
+
+ def __iter__(self):
+ self.iterated_count += 1
+ if self.iterated_count == 2:
+ # We've got a big performance bug if this happens.
+ raise Exception('Performance bug')
+ for f in self.files:
+ doc = minidom.parse(f)
+ self.alterationfunc(doc)
+ body = utils.getNodesFromXPath(doc, self.nodepath)[0]
+ for n in body.childNodes:
+ yield n
Property changes on: trunk/python/swordutils/xml/combine.py
___________________________________________________________________
Name: svn:eol-style
+ native
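LazyNodes relies on minidom serialising an element by iterating over its childNodes, so a generator can stand in for the real list. Below is a Python 3 sketch of the same idea. Here the single-pass guard lives in __iter__ itself, so it trips as soon as a second iteration starts. In the original, the counter sits inside the generator body, so it only increments when the first item is requested. LazyBodyChildren is a hypothetical name.

```python
from xml.dom import minidom


class LazyBodyChildren:
    """Single-pass, lazily evaluated stand-in for a childNodes list.

    Each source is parsed only when iteration reaches it, corrected with
    `alter`, and its body children are yielded. A second iteration raises,
    because it would silently re-parse every file.
    """

    def __init__(self, sources, alter, body_tag):
        self.sources = sources    # file names or binary file objects
        self.alter = alter        # callable that corrects a parsed document
        self.body_tag = body_tag
        self._used = False

    def __iter__(self):
        if self._used:
            raise RuntimeError("LazyBodyChildren can only be iterated once")
        self._used = True
        return self._generate()

    def _generate(self):
        for source in self.sources:
            doc = minidom.parse(source)
            self.alter(doc)
            body = doc.getElementsByTagName(self.body_tag)[0]
            # copy the list in case the consumer moves nodes while we iterate
            yield from list(body.childNodes)
```

Only the document currently being parsed needs to be fully held, provided the consumer (for example a serialiser) does not keep references to the nodes it has already written.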
Added: trunk/python/swordutils/xml/thml.py
===================================================================
--- trunk/python/swordutils/xml/thml.py (rev 0)
+++ trunk/python/swordutils/xml/thml.py 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,87 @@
+# Utility functions for manipulating ThML
+
+from xml.dom import minidom
+from swordutils.xml import utils
+
+
+def isScripCom(node):
+ return node.nodeName == u'scripCom'
+
+def findParentDiv(node):
+ pnode = node.parentNode
+ if pnode is None:
+ raise Exception("Cannot find parent div for node %r" % node)
+ if pnode.nodeType == minidom.Document.ELEMENT_NODE \
+ and pnode.nodeName.startswith(u'div'):
+ return pnode
+ else:
+ return findParentDiv(pnode)
+
+def moveToParent(node, destParent):
+ if node.parentNode is destParent:
+ return
+ else:
+ pnode = node.parentNode
+ pnode.removeChild(node)
+ pnode.parentNode.insertBefore(node, pnode)
+ return moveToParent(node, destParent)
+
+def _findNextScripComNode(node, return_parent):
+ if node is None:
+ return None
+ if isScripCom(node):
+ if return_parent:
+ return node.parentNode
+ else:
+ return node
+
+ else:
+ # Search deeper, but return node that is on the
+ # same level as our original node
+ descendent = _findNextScripComNode(node.firstChild, True)
+ if descendent is not None:
+ if return_parent:
+ return descendent.parentNode
+ else:
+ return descendent
+ else:
+ return _findNextScripComNode(node.nextSibling, False)
+
+def _expandScripComNode(scNode):
+ nextSCN = _findNextScripComNode(scNode.nextSibling, False)
+ collection = []
+ n = scNode.nextSibling
+ while (n is not None and n is not nextSCN):
+ collection.append(n)
+ n = n.nextSibling
+ for n in collection:
+ n.parentNode.removeChild(n)
+ scNode.appendChild(n)
+
+def expandScripComNodes(node):
+ """Expands all empty <scripCom> nodes so that they contain
+ the nodes that they refer to, using neighboring <scripCom>
+ nodes and the structure of the XML as a guide,
+ starting at the supplied node"""
+
+ if isScripCom(node):
+ # Often placed as markers instead of enclosing
+ # the nodes to which they apply.
+ if not node.hasChildNodes():
+ # Try to find scope over which the <scripCom> element
+ # should actually be placed.
+ # Rules:
+ # - move the scripCom element 'up' the tree until it is
+ # a direct child of a `divX' node, placing it before
+ # any of its parent nodes along the way
+ # - make all its sibling nodes that are below it
+ # into child nodes, up to the point where there
+ # is another <scripCom> element
+ div = findParentDiv(node)
+ moveToParent(node, div)
+ _expandScripComNode(node)
+
+ if node.childNodes.length > 0:
+ for n in node.childNodes:
+ expandScripComNodes(n)
+
Property changes on: trunk/python/swordutils/xml/thml.py
___________________________________________________________________
Name: svn:eol-style
+ native
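The two rules described in the comment inside expandScripComNodes can also be expressed without recursion. The first rule hoists the marker until it sits directly under a divX element. The second pulls in the following siblings up to the next scripCom. The sketch below assumes a minidom document, treats a scripCom as an empty marker when it has no child nodes, and collects all markers before rearranging the tree. It is a re-implementation for illustration, not the committed code. hoist_to_div, has_scripcom, absorb_following and expand_scripcoms are hypothetical names.

```python
from xml.dom import Node, minidom


def is_div(node):
    return node.nodeType == Node.ELEMENT_NODE and node.nodeName.startswith("div")


def hoist_to_div(marker):
    """Move the marker up, before each ancestor, until its parent is a divX."""
    parent = marker.parentNode
    while not is_div(parent):
        grand = parent.parentNode
        if grand is None or grand.nodeType == Node.DOCUMENT_NODE:
            raise ValueError("scripCom has no enclosing div")
        grand.insertBefore(marker, parent)  # also detaches it from `parent`
        parent = grand


def has_scripcom(node):
    """True if node is, or contains, a scripCom element."""
    if node.nodeType != Node.ELEMENT_NODE:
        return False
    return node.nodeName == "scripCom" or bool(node.getElementsByTagName("scripCom"))


def absorb_following(marker):
    """Make following siblings children of the marker, up to the next scripCom."""
    sib = marker.nextSibling
    while sib is not None and not has_scripcom(sib):
        nxt = sib.nextSibling
        marker.appendChild(sib)  # appendChild detaches sib from its old parent
        sib = nxt


def expand_scripcoms(doc):
    # snapshot the empty markers first, since the loop rearranges the tree
    markers = [sc for sc in doc.getElementsByTagName("scripCom")
               if not sc.hasChildNodes()]
    for marker in markers:
        hoist_to_div(marker)
        absorb_following(marker)
```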
Added: trunk/python/swordutils/xml/utils.py
===================================================================
--- trunk/python/swordutils/xml/utils.py (rev 0)
+++ trunk/python/swordutils/xml/utils.py 2007-07-19 22:51:32 UTC (rev 89)
@@ -0,0 +1,65 @@
+# General XML utilities
+
+from xml.dom import minidom
+from xml import xpath
+import codecs
+
+def getFileWriter(fileHandle):
+ """Gets a 'writer' for a file object that encodes
+ as UTF-8"""
+ return codecs.lookup("UTF-8").streamwriter(fileHandle)
+
+def writexml(doc, fileHandle):
+ """Writes an XML document to a file handle"""
+ doc.writexml(getFileWriter(fileHandle), encoding="UTF-8")
+
+def getNodesFromXPath(document, path):
+ """Selects nodes specified by 'path' from 'document',
+ where path is a string or a compiled xpath object"""
+ if isinstance(path, basestring):
+ path = xpath.Compile(path)
+ return path.select(xpath.CreateContext(document))
+
+_rootxpath = xpath.Compile('/')
+def getRoot(doc):
+ """Returns the root node of a document"""
+ return getNodesFromXPath(doc, _rootxpath)[0]
+
+
+# Classes to help us with modifications
+class RemoveNode:
+ def act(self, node):
+ if isinstance(node, minidom.Attr):
+ node.ownerElement.removeAttribute(node.name)
+ else:
+ node.parentNode.removeChild(node)
+
+class GeneralReplaceContents:
+ """Replace the contents of a node,
+ with user providable function for calculating replacement text
+ """
+ def __init__(self, replacefunc):
+ self.replacefunc = replacefunc
+ def act(self, node):
+ origText = u''.join(c.toxml() for c in node.childNodes)
+
+ # Usually replacefunc will just return text,
+ # but we allow it to return xml as well
+ newNodes = minidom.parseString(u'<dummy>' + self.replacefunc(origText) + u'</dummy>' )
+ # newNodes is a DOM instance, and it has a dummy
+ # element wrapping the nodes we actually want.
+ node.childNodes = newNodes.childNodes[0].childNodes
+
+class ReplaceContents(GeneralReplaceContents):
+ def __init__(self, replacementtext):
+ assert isinstance(replacementtext, unicode)
+ def _replacefunc(text):
+ return replacementtext
+ self.replacefunc = _replacefunc
+
+def do_replacements(doc, replacements):
+ ctx = xpath.CreateContext(doc)
+ for path, action in replacements.items():
+ xp = xpath.Compile(path)
+ for n in xp.select(ctx):
+ action.act(n)
Property changes on: trunk/python/swordutils/xml/utils.py
___________________________________________________________________
Name: svn:eol-style
+ native
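getFileWriter and writexml wrap a file handle in a UTF-8 codecs writer, and GeneralReplaceContents assigns a new list to childNodes directly. In Python 3 the same effects can be sketched with io.TextIOWrapper, and with appendChild/importNode. The latter keep the parentNode and ownerDocument pointers of the new nodes consistent. write_utf8 and replace_contents are hypothetical helpers, not part of swordutils.

```python
import io
from xml.dom import minidom


def write_utf8(doc, binary_fh):
    """Write a minidom document as UTF-8 to a binary file handle."""
    text_fh = io.TextIOWrapper(binary_fh, encoding="utf-8")
    doc.writexml(text_fh, encoding="UTF-8")  # also emits the XML declaration
    text_fh.flush()
    text_fh.detach()  # leave binary_fh open for the caller


def replace_contents(node, replacefunc):
    """Replace node's children with the (text or XML) result of replacefunc.

    replacefunc receives the serialised old children, as in
    GeneralReplaceContents; its result is parsed inside a dummy wrapper.
    """
    orig_text = "".join(c.toxml() for c in node.childNodes)
    wrapper = minidom.parseString("<dummy>" + replacefunc(orig_text) + "</dummy>")
    for child in list(node.childNodes):
        node.removeChild(child)
    for child in list(wrapper.documentElement.childNodes):
        # copy each new node into node's own document before attaching it
        node.appendChild(node.ownerDocument.importNode(child, True))
```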
More information about the sword-cvs mailing list