[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Fri Jul 19 04:26:09 EDT 2024

On 19.07.24 08:59, Aaron Rainbolt wrote:
> hmm, it sounds like you're thinking something along the lines of
> remotely accessing the SWORD modules *directly*? My thought was
> something more along the lines of a SWORD client making a call to a
> specialized SWORD server that read the file and returned the desired
> verse references or whatever for it. The server would take some sort of
> syntax as input and then spit out basically the same kind of info that
> mod2imp spits out, then send it back over the wire for the client's
> libsword to parse, mutate, and eventually hand to whatever rendering
> engine the frontend uses. Trying to remotely access parts of a SWORD
> module sounds like a nightmare, and if the modules were loaded all at
> once it would kind of undo the point since we already can download
> entire modules at once. I think most of your concerns below are solved
> by this way of doing things (although I'm sure it comes with its own
> fun set of problems).

Ah yes, my apologies! I suppose I misread your message, got stuck on
presenting my own perspective, and wrote a largely irrelevant reply.
Please disregard it.

> As for the actual issues themselves, on a scale of 1 to 10 how
> difficult do you think it would be to have a repository descriptor file
> that is located in a predictable place and that contains data about
> what modules exist on the server and how to find them? This is
> more-or-less what the apt package manager does and given Debian and
> Ubuntu's success it seems to work well.

Depends on what you mean by difficulty. I suppose it might only take a 
few working days for the initial design of a simple file format for your 
protocol.
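
As a rough illustration of how small such a descriptor could be, here is a minimal sketch. Everything in it is an assumption made for the example: the JSON encoding, the field names ("format", "modules", "name", "version", "path"), the module names, and the idea of serving it from a fixed path such as repository.json. None of this is an existing SWORD convention.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleEntry:
    name: str     # module identifier (made-up names below)
    version: str  # module version string
    path: str     # location relative to the repository root


def parse_descriptor(text: str) -> dict[str, ModuleEntry]:
    """Parse a hypothetical JSON repository descriptor into a name -> entry map."""
    data = json.loads(text)
    # A format number lets the layout change later without breaking old clients.
    if data.get("format") != 1:
        raise ValueError("unsupported descriptor format")
    entries = {}
    for item in data["modules"]:
        entry = ModuleEntry(item["name"], item["version"], item["path"])
        entries[entry.name] = entry
    return entries


# Hypothetical descriptor contents; all values are invented for illustration.
EXAMPLE = """
{
  "format": 1,
  "modules": [
    {"name": "ExampleBible", "version": "1.0", "path": "modules/ExampleBible.zip"},
    {"name": "ExampleCommentary", "version": "2.3", "path": "modules/ExampleCommentary.zip"}
  ]
}
"""

if __name__ == "__main__":
    for entry in parse_descriptor(EXAMPLE).values():
        print(entry.name, entry.version, entry.path)
```

A client would fetch this one file from the predictable location and learn from it which modules exist and where to find them, without parsing directory listings.
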
>> Another obstacle to defining a new repository format/protocol is that
>> there is no complete and sound formal specification for the module
>> configuration file format and its fields. The descriptions in the
>> SWORD wiki are incomplete and contain ambiguity.
> 
> This is not something that my server idea overcomes, so I'll think
> about that. Perhaps it would be worth digging into just that and
> overcoming it by strictly defining the configuration file format?

You can certainly use libsword in the backend, but I don't think
that the protocol you're suggesting has to be directly tied to the SWORD
configuration format. What I mistakenly wrote about was at an entirely
lower level, and these SWORD specifics might not concern you at all. If
you want something like this:

   libsword <-> server <-NEW PROTOCOL-> client

then it might make sense to ignore everything below the libsword API, and
perhaps also the libsword API itself, when designing the new protocol.
Use the API, but don't inherit it. From this perspective perhaps
"SWORD-over-network" is a slight misnomer?
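
To make "use the API, but don't inherit it" a bit more concrete, here is a minimal sketch of a server-side handler whose wire format knows nothing about libsword. The JSON message fields ("protocol", "module", "key", "text"), the EntryBackend interface, and the in-memory stand-in are all hypothetical; a real backend might call libsword behind the same interface, but no libsword calls are shown or implied here.

```python
import json
from typing import Protocol

PROTOCOL_VERSION = 1  # hypothetical; sent both ways so mismatches can be detected


class EntryBackend(Protocol):
    """Whatever the server uses internally to look up an entry."""

    def get_entry(self, module: str, key: str) -> str: ...


class InMemoryBackend:
    """Stand-in backend for the example; a real one might wrap libsword."""

    def __init__(self, entries: dict[tuple[str, str], str]) -> None:
        self._entries = entries

    def get_entry(self, module: str, key: str) -> str:
        return self._entries[(module, key)]


def handle_request(backend: EntryBackend, raw: str) -> str:
    """Decode one JSON request, look up the entry, and encode a JSON response."""
    request = json.loads(raw)
    if request.get("protocol") != PROTOCOL_VERSION:
        return json.dumps({"ok": False, "error": "unsupported protocol version"})
    module, key = request.get("module"), request.get("key")
    if not isinstance(module, str) or not isinstance(key, str):
        return json.dumps({"ok": False, "error": "bad request"})
    try:
        text = backend.get_entry(module, key)
    except KeyError:
        return json.dumps({"ok": False, "error": "not found"})
    return json.dumps({"ok": True, "protocol": PROTOCOL_VERSION, "text": text})


if __name__ == "__main__":
    backend = InMemoryBackend({("ExampleBible", "Gen 1:1"): "placeholder entry text"})
    request = {"protocol": 1, "module": "ExampleBible", "key": "Gen 1:1"}
    print(handle_request(backend, json.dumps(request)))
```

Because the messages only mention modules, keys, and text, the backend could be swapped out without changing the protocol, and the explicit version field is one simple way to carry versioning information over the wire.
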

>> While perhaps not strictly a blocker to creating a new repository
>> format/protocol, there are no formal specifications for the
>> module content and content index files. I remember these formats
>> having been described as internal libsword details which don't
>> require specification, because the format and libsword might change.
>> However, I think this reasoning is incorrect, because files of these
>> formats are exchanged over the wire, used in multiple repositories
>> not all of which are managed by Crosswire, and libsword wants to retain
>> backwards compatibility with older modules as well.
> 
> I agree with you w.r.t. the shortcomings of this. It also makes me
> realize that it means that the libsword on the server would have to be
> "close enough" to the libsword of the client in order for my server
> idea to work, because otherwise the server's libsword will send markup
> data that the client can't process. If backwards compatibility is still
> maintained, some way of transferring versioning information over the
> wire might be enough.

I again apologize for the confusion I caused by my reply. I'll try to
untangle this.

When accessing modules the libsword API abstracts away the (outer) 
format of the content and index files, and presents to the user a way to 
read individual content entries (e.g. verses). So this outer (or 
container) format might not be of concern to you. The entries themselves
are fragments of OSIS, ThML, TEI, GBF, plain text, etc., but the exact
formats are also somewhat underspecified. Libsword allows you to apply
"filters" (transformations) on these entries, including ones which 
convert the entries to other formats, e.g. to (fragments of) HTML.

It is up to the protocol whether and where (client or server) any filters
are applied.
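
One way to picture this choice is a small table of named transformations that either side could run. The sketch below is only an illustration: the format names "raw" and "plain", the render function, and the toy tag-stripping filter are all made up, and the filter is not one of libsword's filters nor a proper OSIS/ThML converter.

```python
import re
from typing import Callable


def strip_markup(entry: str) -> str:
    """Toy filter: drop anything that looks like an XML tag."""
    return re.sub(r"<[^>]+>", "", entry)


# Hypothetical format names mapped to transformations.
FILTERS: dict[str, Callable[[str], str]] = {
    "raw": lambda entry: entry,  # pass the stored markup through untouched
    "plain": strip_markup,       # crude markup removal, for illustration only
}


def render(entry: str, requested: str) -> str:
    """Apply the filter registered under the requested format name."""
    chosen = FILTERS.get(requested)
    if chosen is None:
        raise ValueError(f"unknown format: {requested}")
    return chosen(entry)


if __name__ == "__main__":
    sample = '<w lemma="x">In</w> the beginning'
    print(render(sample, "raw"))
    print(render(sample, "plain"))
```

If the server calls render before sending, clients stay simple but only get the formats the server offers; if the server always sends "raw" and the client calls render, the client has to understand the stored markup.
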

>> In my opinion the repository format should not much depend on the
>> underlying transport protocol (HTTP(S), FTP, local filesystem) and
>> should not require special handling on the server side. For HTTP this
>> means that all repository files may be served statically on a regular
>> web server without requiring extra server-side scripting. Just files
>> and directories, no parsing of directory indexes, only retrieval of
>> regular files by their path.
> 
> Hmm, I don't see how this is really possible in a "retrieve part of a
> module" situation. I mean it probably would work if you used HTTP
> partial downloads to retrieve the blocks of files you want, but that
> sounds like it would probably require quite a lot of HTTP requests to
> load one chapter from a module, which would probably put undue load on
> the server and slow down the client.

Have you thought about the possibility of generating all assets (e.g. 
entries/verses, and possibly multiple different versions thereof) on the 
server-side statically?
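
A minimal sketch of what such static generation could look like follows, assuming a made-up layout of one small file per verse at module/book/chapter/verse.txt. The layout, the function names, and the sample entries are hypothetical, and book names are assumed to be simple identifiers that are safe to use as directory names.

```python
import tempfile
from pathlib import Path

# Key type for the example: (module, book, chapter, verse).
EntryKey = tuple[str, str, int, int]


def entry_path(module: str, book: str, chapter: int, verse: int) -> Path:
    """Relative path of one entry in the hypothetical static layout."""
    return Path(module) / book / str(chapter) / f"{verse}.txt"


def generate_static(root: Path, entries: dict[EntryKey, str]) -> list[Path]:
    """Write every entry to its own file under root and return the paths written."""
    written = []
    for (module, book, chapter, verse), text in entries.items():
        target = root / entry_path(module, book, chapter, verse)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    sample = {
        ("ExampleBible", "Gen", 1, 1): "placeholder text for verse 1",
        ("ExampleBible", "Gen", 1, 2): "placeholder text for verse 2",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for path in generate_static(Path(tmp), sample):
            print(path.relative_to(tmp).as_posix())
```

With a layout like this, any plain web server could serve the files by path with no server-side scripting. Whether the files should be per verse or per chapter is a trade-off between the number of requests and the amount of unneeded data transferred.
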

Best regards,
Jaak Ristioja