[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Mon Jul 29 00:08:33 EDT 2024

So looking at this, you need to understand the goal of the libsword library
and its repository system.

The goal for a repository is to support the simplest methods of access.
Especially to support access by people who have no network so that an
entire repository can be loaded directly onto a CD, DVD, USB stick, or
other external media and passed around. Libsword can then install directly
from that. FTP was implemented first because it allowed the same very
simple process: point an FTP server at a working repository, and anyone
who can access that FTP server can also access and install modules from
there.

The goal of repository access and installs is not to create or define a
standard. It is, rather, to have the very simplest and easiest access
possible. Others have come through and implemented some parsing for the
HTML served up over HTTP/HTTPS which can be used if libsword is compiled
with the optional libcurl support. When I first helped contribute to those
code bases, they strove to support both the Apache and Nginx forms of the
HTML served up by the automatic indexing those servers offer. Again,
the goal was not to provide cryptographic security, it was not to sign
files, it was not to define a standardized server process, or use some more
robust standard like WebDAV or what have you. The goal was simplicity:
initially, to allow iPhone users access to remote repositories while on cell
networks where FTP was blocked by many carriers and possibly, to some
extent, by the device itself. Again, the goal was simplicity of access:
pointing the root of a folder at a working repository installation would
allow someone to remotely access all of the resources in that repository.
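
To make the "parse the automatic index" idea concrete, here is a rough
sketch of pulling file and directory names out of a server-generated
listing page. It is not libsword's actual parser; the class and function
names and the sample listing are made up for illustration, and real Apache
and Nginx pages differ in their details.

```python
from html.parser import HTMLParser


class AutoindexLinks(HTMLParser):
    """Collect href targets from a server-generated directory listing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip sort links ("?C=N;O=D"), absolute paths, parent-directory
        # links, and anything pointing at another host.
        if not href or href.startswith(("?", "/", "../")) or "://" in href:
            return
        self.links.append(href)


def list_directory(html_text):
    """Return (directories, files) named in an autoindex page."""
    parser = AutoindexLinks()
    parser.feed(html_text)
    dirs = [h.rstrip("/") for h in parser.links if h.endswith("/")]
    files = [h for h in parser.links if not h.endswith("/")]
    return dirs, files


# Illustrative listing only; not copied from any real repository.
SAMPLE = """<html><body><h1>Index of /mods.d/</h1><pre>
<a href="../">../</a>
<a href="kjv.conf">kjv.conf</a>   01-Jan-2024 00:00   1234
<a href="web.conf">web.conf</a>   01-Jan-2024 00:00   2345
<a href="extra/">extra/</a>       01-Jan-2024 00:00      -
</pre></body></html>"""

print(list_directory(SAMPLE))
```

The point is only that following the links in a plain index page is enough
to walk a repository; no server-side scripting is involved.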

So the rest of the concerns were not addressed because they are not part of
the goal. It's not the goal of the process to specify a strict format in
which the repository must be exposed over the network, nor to ensure that
cryptographically signed files are transmitted. If those are needs for
someone's use cases, then they should be implemented outside of the SWORD
library and its native support. The goal of the library is to be very
small, very fast, and as broadly portable as possible to more or less any
device for which there is a C compiler available, and the goal of its
support for remote repositories is to make it as simple as possible to get
the data onto those devices. Thus, no standardized parser is required
(though anyone using the library is free to extend its code to use one)
because that becomes less portable and more heavyweight. Libcurl isn't even
required, though without it access to HTTP/HTTPS sources vanishes because
the library does not provide its own implementation of those protocols.

Again, small size, speed, and nimbleness are the goals of the library.
Anything else that needs to be implemented for someone's requirements is up
to them to implement above the library's level. Nothing stops someone from
writing an application that connects over WebDAV to a server, fetches the
SWORD files, checks them against cryptographic signatures, and uses
well-known libraries to handle all of that. But it's not the goal of libsword
to offer that. That involves much more friction than the underlying library
aims for.
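
As a small illustration of doing that kind of checking above the library,
the sketch below compares fetched files against SHA-256 digests from a
manifest before anything is handed to libsword. Note the assumptions: this
is a digest check, not a real signature scheme (that would need a proper
crypto library and key handling), and the manifest layout, file names, and
function names are all hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def verify_files(files: dict, manifest: dict) -> list:
    """Return the names of files whose digest is missing or does not match.

    files maps a relative path to the fetched bytes; manifest maps the same
    paths to the expected hex digests (a made-up format for this sketch).
    """
    bad = []
    for name, data in files.items():
        expected = manifest.get(name)
        if expected is None or not hmac.compare_digest(sha256_hex(data), expected):
            bad.append(name)
    return bad


# Placeholder bytes standing in for a downloaded module file.
fetched = {"example/module.dat": b"example module bytes"}
manifest = {"example/module.dat": sha256_hex(b"example module bytes")}

print(verify_files(fetched, manifest))
```

An application would refuse to install any file named in the returned list;
libsword itself never needs to know the check happened.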

------

Now, to switch to the idea of a specialized SWORD protocol to address the
user who does not want to fetch the entirety of a module: why? The library
can already generate HTML documents and document fragments. Just do the
rendering on the server and pass the fragment to the client over HTTP. Wrap
the rendered string into a JSON object if you need to. Why try to pass the
binary blob of some random data to the remote unit when you could already
render it on the server?

A simple REST library written in something like Go could easily be linked
to the libsword C library. It could query libsword to get the list of
modules and expose them, along with query parameters specifying the
requested format, then serve the resulting text over HTTP. So a client
library could hit something like
http://mylibrary.com/texts/KJV/Gen/1/1?format=html and get back
{"osisRef": "Gen.1.1", "text": "<p>In the beginning...</p>"}. You wouldn't
need to write some low level application protocol. You would save the
client device from needing to render the text and have extra knowledge of
the module. You wouldn't have to alter the library in any fashion.
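
A minimal sketch of that request/response shape, written in Python with
only the standard library rather than Go so it runs anywhere. The lookup is
a stand-in: FAKE_TEXTS and render_verse are hypothetical placeholders for
where a real service would call into libsword through its bindings, and
the "plain" format and port number are likewise assumptions.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit, parse_qs

# Stand-in for a real libsword lookup; keys are (module, osisRef).
FAKE_TEXTS = {("KJV", "Gen.1.1"): "In the beginning..."}


def render_verse(module, osis_ref, fmt):
    """Hypothetical renderer: a real service would ask libsword here."""
    text = FAKE_TEXTS.get((module, osis_ref))
    if text is None:
        return None
    return f"<p>{text}</p>" if fmt == "html" else text


def respond(url):
    """Map a request URL to (status, JSON-serializable body)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 5 or segments[0] != "texts":
        return 404, {"error": "expected /texts/<module>/<book>/<chapter>/<verse>"}
    _, module, book, chapter, verse = segments
    fmt = parse_qs(parts.query).get("format", ["plain"])[0]
    if fmt not in ("html", "plain"):
        return 400, {"error": "unsupported format"}
    osis_ref = f"{book}.{chapter}.{verse}"
    text = render_verse(module, osis_ref, fmt)
    if text is None:
        return 404, {"error": "no such text"}
    return 200, {"osisRef": osis_ref, "text": text}


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8080):
    """Run the service on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting /texts/KJV/Gen/1/1?format=html returns the
JSON object shown above. Swapping render_verse for a real libsword call is
the only part that touches the library.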

A simple application like this could be written up, distributed as a static
binary, and anyone would be able to hit it for a REST-accessed, rendered
version of a given text. Going back to the goal of simplicity: this
application could be run by anyone on any computer where a SWORD library
already exists, and it could serve the baseline of those people's needs.

That's just an idea I've had bouncing around in my head for a long time. I
just have no need to access the scripture over REST or I would have already
written it. All the bits are already out there. There are lots of good REST
frameworks, every language that has them can encode JSON, and for most of
the popular languages we either have bindings (Python, PHP, Java) or the
library can easily be integrated directly (cgo).

--Greg

On Fri, Jul 19, 2024 at 4:19 PM Aaron Rainbolt <arraybolt3 at gmail.com> wrote:

> On Fri, 19 Jul 2024 11:26:09 +0300
> Jaak Ristioja <jaak at ristioja.ee> wrote:
>
> ...snip...
>
> > > As for the actual issues themselves, on a scale of 1 to 10 how
> > > difficult do you think it would be to have a repository descriptor
> > > file that is located in a predictable place and that contains data
> > > about what modules exist on the server and how to find them? This is
> > > more-or-less what the apt package manager does and given Debian and
> > > Ubuntu's success it seems to work well.
> >
> > Depends on what you mean by difficulty. I suppose it might only take
> > a few working days for the initial design of a simple file format for
> > your protocol.
>
> I meant how to solve the issue in libsword itself with finding and
> downloading modules. My server shouldn't need a configuration file at
> all I don't think.
>
> > >> Another obstacle to defining a new repository format/protocol is
> > >> that there is no complete and sound formal specification for the
> > >> module configuration file format and its fields. The descriptions
> > >> in the SWORD wiki are incomplete and contain ambiguity.
> > >
> > > This is not something that my server idea overcomes, so I'll think
> > > about that. Perhaps it would be worth digging into just that and
> > > overcoming it by strictly defining the configuration file format?
> >
> > You can certainly use libsword in the backend, but I don't think
> > that the protocol you're suggesting has to be directly tied to the
> > SWORD configuration format. What I mistakenly wrote about was on an
> > entirely lower level, and these SWORD specifics might not concern you
> > at all. If you want something like this:
> >
> >    libsword <-> server <-NEW PROTOCOL-> client
> >
> > then it might make sense to ignore everything below the libsword API
> > and perhaps also the libsword API itself when designing the new
> > protocol. Use the API, but don't inherit it. In this perspective
> > perhaps "SWORD-over-network" is a slight misnomer?
>
> Well it's not really a misnomer because I want a graph more like this:
>
>     SWORD repo -> libsword <-> server <-NEW PROTOCOL-> libsword <->
>     client
>
> i.e., libsword would be able to natively support the new protocol so
> that clients that wished to use it could do so *almost* transparently.
> That way any SWORD client could add network support with minimal
> effort, rather than having to be built specifically to use the new
> protocol. (The system would still work pretty well even if libsword
> doesn't support the protocol and special support is needed, but
> probably no existing SWORD clients would pick up support for it.)
>
> > >> While perhaps not strictly a blocker to creating a new
> > >> repository format/protocol, there are no formal specifications
> > >> for the module content and content index files. I remember these
> > >> formats having been described as internal libsword details which
> > >> don't require specification, because the format and libsword might
> > >> change. However, I think this reasoning is incorrect, because
> > >> files of these formats are exchanged over the wire, used in
> > >> multiple repositories, not all of which are managed by Crosswire, and
> > >> libsword wants to retain backwards compatibility with older
> > >> modules as well.
> > >
> > > I agree with you w.r.t. the shortcomings of this. It also makes me
> > > realize that it means that the libsword on the server would have to
> > > be "close enough" to the libsword of the client in order for my
> > > server idea to work, because otherwise the server's libsword will
> > > send markup data that the client can't process. If backwards
> > > compatibility is still maintained, some way of transferring
> > > versioning information over the wire might be enough.
> >
> > I again apologize for the confusion I caused by my reply. I'll try to
> > untangle this.
> >
> > When accessing modules the libsword API abstracts away the (outer)
> > format of the content and index files, and presents to the user a way
> > to read individual content entries (e.g. verses). So this outer (or
> > container) format might not be of concern to you. The entries
> > themselves are fragments of OSIS, ThML, TEI, GBF, plain text etc, but
> > the exact formats are also somewhat underspecified. Libsword allows
> > you to apply "filters" (transformations) on these entries, including
> > ones which convert the entries to other formats, e.g. to (fragments
> > of) HTML.
> >
> > It is up to the protocol if and where (client or server) any filters
> > are applied.
>
> That makes sense.
>
> > >> In my opinion the repository format should not much depend on the
> > >> underlying transport protocol (HTTP(S), FTP, local filesystem) and
> > >> should not require special handling on the server side. For HTTP
> > >> this means that all repository files may be served statically on a
> > >> regular web server without requiring extra server-side scripting.
> > >> Just files and directories, no parsing of directory indexes, only
> > >> retrieval of regular files by their path.
> > >
> > > Hmm, I don't see how this is really possible in a "retrieve part of
> > > a module" situation. I mean it probably would work if you used HTTP
> > > partial downloads to retrieve the blocks of files you want, but that
> > > sounds like it would probably require quite a lot of HTTP requests
> > > to load one chapter from a module, which would probably put undue
> > > load on the server and slow down the client.
> >
> > Have you thought about the possibility of generating all assets (e.g.
> > entries/verses, and possibly multiple different versions thereof) on
> > the server-side statically?
>
> I guess that would be possible but at that point I'll basically have
> created a new repository format and be porting SWORD modules to it.
>
> Thanks for the feedback!
> Aaron
>
> > Best regards,
> > Jaak Ristioja
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page