[sword-devel] SWORD Java support?
Joe Walker
sword-devel@crosswire.org
Fri, 18 Aug 2000 00:13:46 +0100
This is a multi-part message in MIME format.
--------------B1906975EECC0E99489BA1F1
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
"Troy A. Griffitts" wrote:
>
> Please. Many people have asked for a Java interface to our API and
> modules. The vss format is quite simple really. Rather than descibe
> it, I'd rather point you in the direction of
> sword/src/modules/common/rawverse.cpp This file contains the low level
> routines to operate on the .vss file, but basically it's a 6 byte
> record: struct { unsigned long offset, unsigned short size }
I've attached an initial stab.
> I've been following the premise of porting class for class from C++ to
> Java so that someone familiar with the API could use either language
> with a minimum learning curve. Any comments on this?
There are 2 versions - RawVerse follows your approach, and SwordBible
re-does the i/f
My 2c?
>From my point of view, closeness to the sword API won't help me much
because I don't know a huge amount about Sword.
But I would be helped from a clean i/f which I would not get from a
close port.
A good example is findoffset() which has 2 return params - not
possible in Java. You'll see I chose to create a quick Location inner
class to get around the problem, but it does make life messy.
The alternative in SwordBible just has a getText() method which is
far cleaner.
But I'm biased because I don't know Sword internals very well. (However
I guess I'm learning fairly rapidly) So the concensus may be different.
The reading works. The writing has not been tested, and will fail due
to endian problems.
There was a few bits of code to quietly tidy up if something went
wrong that i missed out in both versions because normally I want to
know sooner rather than later if it is broken. This is fine for
testing but the approach for 'live' may be different.
Open to your thoughts.
Joe.
--------------B1906975EECC0E99489BA1F1
Content-Type: text/plain; charset=us-ascii;
name="RawVerse.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="RawVerse.java"
package com.eireneh.bible.book.sword;
import java.io.*;
/**
* Code for class 'RawVerse'- a module that reads raw text files
* ot and nt using indexs ??.bks ??.cps ??.vss and provides lookup and parsing
* functions based on class VerseKey
*/
public class RawVerse
{
/** constant for the introduction */
public static final int TESTAMENT_INTRO = 0;
/** constant for the old testament */
public static final int TESTAMENT_OLD = 1;
/** constant for the new testament */
public static final int TESTAMENT_NEW = 2;
/**
* RawVerse Constructor - Initializes data for instance of RawVerse
* @param path - path of the directory where data and index files are located.
* be sure to include the trailing separator (e.g. '/' or '\')
* (e.g. 'modules/texts/rawtext/webster/')
*/
public RawVerse(String path) throws FileNotFoundException
{
idx_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot.vss", "r");
idx_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt.vss", "r");
txt_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot", "r");
txt_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt", "r");
// The original had a dtor that did the equiv of .close()ing the above
// I'm not sure that there is a delete type ability in Book.java and
// the finalizer for RandomAccessFile will do it anyway so for the
// moment I'm going to ignore this.
// The original also stored the path, but I don't think it ever used it
// The original also kept an instance count, which went unused (and I
// noticed in a few other places so it is either c&p or a pattern?
// Either way the assumption that there is only one of a static is not
// safe in many java environments (servlets, ejbs at least) so I've
// deleted it
}
/**
* Finds the offset of the key verse from the indexes
* @param testament testament to find (0 - Bible/module introduction)
* @param idxoff offset into .vss
* @param start address to store the starting offset
* @param size address to store the size of the entry
*/
public Location findOffset(int testament, long idxoff) throws IOException
{
Location loc = new Location();
// There was a bodge here to move testament around if someone wanted
// to read the intro? We just have the set of static finals above
// if (testament == 0)
// testament = idx_raf[1] == null ? 1 : 2;
// There was a test here to check ensure that is idx_raf[testament-1]
// was null then we returned an default Location (of 0,0). However
// This seems like papering over any errors so I have left it out for
// the time being
// I've now totally re-written this because we did have byte-sex
// problems. The file is little endian, and we read big endianly.
// read the next 6 byes.
idx_raf[testament].seek(idxoff*6);
byte[] read = new byte[6];
idx_raf[testament].readFully(read);
int[] temp = new int[6];
for (int i=0; i<temp.length; i++)
{
temp[i] = read[i] >= 0 ? read[i] : 256 + read[i];
System.out.println("temp["+i+"]="+temp[i]);
}
loc.start = (temp[3] << 24) | (temp[2] << 16) | (temp[1] << 8) | temp[0];
loc.size = (temp[5] << 8) | temp[4];
// the original lseek used SEEK_SET. This is the only option in Java
// The *6 is because we use 4 bytes for the offset, and 2 for the length
// There used to be some code at the start of the method like:
// idxoff *= 6;
// But itn't good to alter parameters and here is the only place that
// it is used.
// There was some BIGENDIAN swapping stuff here. To be honest I
// can't be bothered to think about whether or not this is needed
// right now.
// *start = lelong(*start);
// *size = leshort(*size);
// There was also some code here to patch over any errors if you
// could only read one of the 2 bytes from above. I'm not sure that
// that is a good idea, so I've left it out.
return loc;
}
/**
* Gets text at a given offset.
* @param testament testament file to search in (0 - Old; 1 - New)
* @param loc Where to read from
*/
public String getText(int testament, Location loc) throws IOException
{
// The original had the size param as an unsigned short.
// It also used SEEK_SET as above (default in Java)
byte[] buffer = new byte[loc.size];
txt_raf[testament].seek(loc.start);
txt_raf[testament].read(buffer);
// We should probably think about encodings here?
return new String(buffer);
}
/**
* Prepares the text before returning it to external objects
* @param buf buffer where text is stored and where to store the prep'd text
*/
protected String prepText(String text)
{
StringBuffer buf = new StringBuffer(text);
boolean space = false;
boolean cr = false;
boolean realdata = false;
char nlcnt = 0;
int to = 0;
for (int from=0; from<buf.length(); from++)
{
switch (buf.charAt(from))
{
case 10:
if (!realdata)
continue;
space = (cr) ? false : true;
cr = false;
nlcnt++;
if (nlcnt > 1)
{
// buf.setCharAt(to++, nl);
buf.setCharAt(to++, '\n');
// nlcnt = 0;
}
continue;
case 13:
if (!realdata)
continue;
buf.setCharAt(to++, '\n');
space = false;
cr = true;
continue;
}
realdata = true;
nlcnt = 0;
if (space)
{
space = false;
if (buf.charAt(from) != ' ')
{
buf.setCharAt(to++, ' ');
from--;
continue;
}
}
buf.setCharAt(to++, buf.charAt(from));
}
// This next line just ensured that we were null terminated.
// buf.setCharAt(to, '\0');
// There followed a lot of code that stomed \o to the end of the
// string if there was whitespace there. trim() is easier.
return buf.toString().trim();
}
/**
* Sets text for current offset
* @param testament testament to find (0 - Bible/module introduction)
* @param idxoff offset into .vss
* @param buf buffer to store
*/
protected void setText(int testament, long idxoff, String buf) throws IOException
{
// As in getText() we don't alter the formal parameter
// idxoff *= 6;
// As in getText() There was some messing around with testament
// if (testament == 0)
// testament = idx_raf[1] == null ? 1 : 2;
// outsize started off being unsigned
// and it looks like "unsigned short size;" is not used
short outsize = (short) buf.length();
// There was some more BIGENDIAN nonsense here. Again ignoring the
// MACOSX bits it looked like:
// start = lelong(start);
// outsize = leshort(size);
// I've also moved things around very slightly, the endian bits came
// just before the writeShort();
idx_raf[testament].seek(idxoff*6);
long start = idx_raf[testament].readLong();
idx_raf[testament].writeShort(outsize);
// There is some encoding stuff to be thought about here
byte[] data = buf.getBytes();
txt_raf[testament].seek(start);
txt_raf[testament].write(data);
}
/**
* Creates new module files
* @param path Directory to store module files
*/
public static void createModule(String path) throws IOException
{
truncate(path + "ot.vss");
truncate(path + "nt.vss");
truncate(path + "ot");
truncate(path + "nt");
// I'm not at all sure what these did. I'd guess they wrote data to
// the files we just created? But how they'd neatly (or otherwise) go
// about this is beyond me right now.
// RawVerse rv(path);
// VerseKey mykey("Rev 22:21");
}
/**
* Create an empty file, deleting what was there
*/
private static void truncate(String filename) throws IOException
{
// The original code did something like this. I recon this basically
// deleted and recreated (empty) the named file.
// unlink(buf);
// fd = FileMgr::systemFileMgr.open(buf, O_CREAT|O_WRONLY|O_BINARY, S_IREAD|S_IWRITE);
// FileMgr::systemFileMgr.close(fd);
File file = new File(filename);
file.delete();
file.createNewFile();
}
/**
* There has to be a better method than this. findoffset() returned a start
* and and offset, and multiple return values are not possible in Java.
* It seems to me that returning start and size from a public i/f represents
* showing our callers more than we should and I expect that the solution
* lies in a thorough sorting out if the interface, but I want to keep
* the methods unchanged as reasonable right now.
*/
public class Location
{
/** Where does the data start */
public long start = 0;
/** The data length. Is short long enough? the original was unsigned short */
public int size = 0;
/**
* Debug only
*/
public String toString()
{
return "start="+start+", size="+size;
}
}
/**
* A test program
*/
public static void main(String[] args)
{
try
{
// To start with I'm going to hard code the path
String path = "/usr/apps/sword/modules/texts/rawtext/kjv/";
RawVerse verse = new RawVerse(path);
Location loc = verse.findOffset(RawVerse.TESTAMENT_NEW, 6);
String pre = verse.getText(RawVerse.TESTAMENT_NEW, loc);
System.out.println("loc="+loc);
System.out.println("pre="+pre);
System.out.println("post="+verse.prepText(pre));
}
catch (Exception ex)
{
ex.printStackTrace();
}
}
/** The array of index files */
private RandomAccessFile[] idx_raf = new RandomAccessFile[3];
/** The array of data files */
private RandomAccessFile[] txt_raf = new RandomAccessFile[3];
}
--------------B1906975EECC0E99489BA1F1
Content-Type: text/plain; charset=us-ascii;
name="SwordBible.java"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="SwordBible.java"
package com.eireneh.bible.book.sword;
import java.io.*;
/**
*
*/
public class SwordBible
{
/** constant for the introduction */
public static final int TESTAMENT_INTRO = 0;
/** constant for the old testament */
public static final int TESTAMENT_OLD = 1;
/** constant for the new testament */
public static final int TESTAMENT_NEW = 2;
/**
* Open the sword data files.
* @param path - path of the directory where data and index files are located.
*/
public SwordBible(String path) throws FileNotFoundException
{
idx_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot.vss", "r");
idx_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt.vss", "r");
txt_raf[TESTAMENT_OLD] = new RandomAccessFile(path + "ot", "r");
txt_raf[TESTAMENT_NEW] = new RandomAccessFile(path + "nt", "r");
}
/**
* Finds the offset of the key verse from the indexes
* @param testament testament to find (0 - Bible/module introduction)
* @param idxoff offset into .vss
* @param start address to store the starting offset
* @param size address to store the size of the entry
*/
public String getText(int testament, long idxoff) throws IOException
{
long start;
int size;
// Read the next 6 byes.
idx_raf[testament].seek(idxoff*6);
byte[] read = new byte[6];
idx_raf[testament].readFully(read);
// Un-2s-complement them
int[] temp = new int[6];
for (int i=0; i<temp.length; i++)
{
temp[i] = read[i] >= 0 ? read[i] : 256 + read[i];
}
// The data is little endian - extract the start and size
start = (temp[3] << 24) | (temp[2] << 16) | (temp[1] << 8) | temp[0];
size = (temp[5] << 8) | temp[4];
// Read from the data file.
// I wonder if it would be safe to do a readLine() from here.
// Probably be safer not to risk it since we know how long it is.
byte[] buffer = new byte[size];
txt_raf[testament].seek(start);
txt_raf[testament].read(buffer);
// We should probably think about encodings here?
return new String(buffer);
}
/** The array of index files */
private RandomAccessFile[] idx_raf = new RandomAccessFile[3];
/** The array of data files */
private RandomAccessFile[] txt_raf = new RandomAccessFile[3];
/**
* Quick test program
*/
public static void main(String[] args)
{
try
{
// To start with I'm going to hard code the path
String path = "/usr/apps/sword/modules/texts/rawtext/kjv/";
SwordBible data = new SwordBible(path);
String text = data.getText(SwordBible.TESTAMENT_NEW, 6);
System.out.println("text="+text);
}
catch (Exception ex)
{
ex.printStackTrace();
}
}
}
--------------B1906975EECC0E99489BA1F1--