[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Mon Jul 29 09:44:54 EDT 2024

On Mon, Jul 29, 2024 at 3:26 AM Aaron Rainbolt <arraybolt3 at gmail.com> wrote:

> On Sun, 28 Jul 2024 23:08:33 -0500
> Greg Hellings <greg.hellings at gmail.com> wrote:
> >
> > Now, to switch to the idea of a specialized SWORD protocol to address
> > the user who does not want to fetch the entirety of a module: why?
> > The library can already generate HTML documents and document
> > fragments. Just do the rendering on the server and pass the fragment
> > to the client over HTTP. Wrap the rendered string into a JSON object
> > if you need to. Why try to pass the binary blob of some random data
> > to the remote unit when you could already render it on the server?
>
> The idea is to make it so that *existing* SWORD clients can be able to
> access data on remote servers without downloading the whole thing. I
> laid out some reasons why this is helpful in certain use cases in my
> first email. Existing SWORD clients are meant to retrieve information
> from libsword and then render it in some way, thus to maximize the
> possibility of adoption, my hope was to implement in libsword the
> ability to fetch "raw" data from a remote server and then pass it
> through to the client, which already has code for rendering it however
> the client chooses. Ideally a client should need to do nothing more
> than point an SWMgr object at the remote server and then use it exactly
> the same way it would use a local repository (perhaps with some extra
> error checks for things like timeouts, interrupted connections, and
> whatnot).
>

A few caveats that might dissuade you from sending the raw data:

The raw data can be in multiple formats, depending on the source module and
fields in use. OSIS, TEI, ThML, RTF, and GBF are the most common formats.
Meanwhile there are also a few major output formats that might be
requested. HTML, RTF, and plain text are the most common, and I think there
might also be support for TeX, though I'm not 100% sure. All of that
metadata is stored in the module configuration file and not at the verse
level in the resulting file. Why force the client to handle this parsing and
consume the extra information when it could just slurp up the requested
format as rendered by the server? On top of that, the configuration file
records whether there are lemmas, footnotes, and other features, all of
which inform how to render a particular blob of text.

Much simpler to just tell the server, "Please send me HTML and strip out
footnotes" than to try and encode all of that, send it to the client, and
then render it there. Every round trip would essentially need the fully
parsed config file to travel with it in your proposed raw form.
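
To make concrete what would have to travel with the raw text, here is a
minimal Python sketch that pulls the relevant fields out of a module
configuration. The key names (SourceType, GlobalOptionFilter, ModDrv,
DataPath) are the ones SWORD .conf files use as far as I know; the sample
values and the parse_module_conf helper are made up for illustration, and the
sketch ignores continuation lines and other details of the real format.

```python
def parse_module_conf(text):
    """Parse a SWORD-style .conf into (module_name, {key: [values]}).

    Keys such as GlobalOptionFilter can repeat, so every value is kept
    in a list instead of overwriting the previous one.
    """
    name = None
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            name = line[1:-1]  # section header carries the module name
            continue
        if "=" in line:
            key, value = line.split("=", 1)
            options.setdefault(key.strip(), []).append(value.strip())
    return name, options


# Illustrative config; the values are assumptions, not copied from a real module.
SAMPLE_CONF = """\
[KJV]
DataPath=./modules/texts/ztext/kjv/
ModDrv=zText
SourceType=OSIS
GlobalOptionFilter=OSISStrongs
GlobalOptionFilter=OSISMorph
GlobalOptionFilter=OSISFootnotes
"""

name, options = parse_module_conf(SAMPLE_CONF)
# The markup type and the available filters live here, not in the verse text.
source_type = options.get("SourceType", ["unknown"])[0]
filters = options.get("GlobalOptionFilter", [])
```

A client receiving raw verse data would need source_type and filters (or
their equivalent) for every module before it could pick a renderer.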

> > A simple REST library written in something like Go could easily be
> > linked to the libsword C library. It could query libsword to get the
> > list of modules and expose them, along with certain query parameters
> > specifying the format request. Then serve the resulting text over
> > HTTP. So a client library could hit something like
> > http://mylibrary.com/texts/KJV/Gen/1/1?format=html and it will get back
> > {"osisRef": "Gen.1.1", "text": "<p>In the beginning...</p>"}. You
> > wouldn't need to write some low level application protocol. You would
> > save the client device from needing to render the text and have extra
> > knowledge of the module. You wouldn't have to alter the library in
> > any fashion.
>
> This is similar to what I was thinking. I wasn't sure if JSON was the
> best wrapper to do it in, but I don't see any reason to use
> anything else, other than SWORD's apparent preference for XML-like
> formats. However my "text" field would probably look more like "text":
> "$$$Revelation of John 22:19\n<w lemma="strong:G2532 lemma.TR:και"
> morph="robinson:CONJ" src="1">And</w>..." or some such (this is what
> mod2imp spit out when I used it to get an example).
>

There are two difficulties with this as I can see. The first I mentioned in
the paragraph above. You and I can look at this and recognize it as OSIS,
but that information is only encoded in the configuration file. Likewise,
telling the renderer that lemmas and morphology are supported is encoded
there. Why not just allow the library to handle that?

Secondly, the library has no knowledge of the import format. Only the
imp2mod utility knows how to parse that block from an import-format file.
That utility is not linked into the library. It is
frequently distributed with the library, but it is not an inherent part of
it. Similarly the mod2imp utility is not a portion of the library. Those
two applications handle the generating and parsing of the input format, and
no attempt is made by either one to preserve the full round-trip
functionality of the underlying encoding format. You would need to
re-implement parsing of that format into your library and you would lose
the extra data that is not held internal to the passage requested.
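
As a rough illustration of what that re-implementation would involve, here
is a small Python sketch that splits import-format text into entries. It
assumes only what the quoted mod2imp output shows: a line starting with $$$
carries the key, and the lines after it are the entry. The
split_imp_entries name and the sample text are hypothetical, and this is not
how imp2mod itself is written.

```python
def split_imp_entries(text):
    """Split import-format text into (key, body) pairs.

    Assumption: every entry starts with a line beginning with "$$$",
    and everything up to the next such line belongs to that entry.
    """
    entries = []
    key, body = None, []
    for line in text.splitlines():
        if line.startswith("$$$"):
            if key is not None:
                entries.append((key, "\n".join(body)))
            key, body = line[3:], []
        elif key is not None:
            body.append(line)
    if key is not None:
        entries.append((key, "\n".join(body)))
    return entries


# Made-up sample in the shape of the example quoted above.
SAMPLE_IMP = (
    "$$$Revelation of John 22:19\n"
    '<w lemma="strong:G2532" morph="robinson:CONJ">And</w>\n'
    "$$$Revelation of John 22:20\n"
    "<w>He</w>\n"
)

entries = split_imp_entries(SAMPLE_IMP)
```

Note that nothing in the resulting pairs says the body is OSIS or which
filters apply; that still has to come from the configuration file.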

I don't see any benefit to transmitting the module in this way when you
could just ask the server for the rendered text. Especially if you are
looking at this from the type of limited device where such a feature would
be helpful, you might as well offload as much of the processing work as
possible to the server. Just ask it for the format you want to render, and
consume what it sends you.
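
A minimal sketch of that server side, in Python with only the standard
library, might look like the following. The URL layout and the JSON shape
come from the example earlier in this thread; render_verse is a hypothetical
stand-in for the real call into libsword (through Swig or similar) and only
returns canned text, and the set of supported formats is an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler
from urllib.parse import urlparse, parse_qs

SUPPORTED_FORMATS = {"html", "plain"}  # assumption: illustrative subset


def render_verse(module, osis_ref, fmt):
    """Hypothetical stand-in for asking libsword to render a verse."""
    canned = {("KJV", "Gen.1.1"): "In the beginning..."}
    text = canned.get((module, osis_ref))
    if text is None:
        return None
    return f"<p>{text}</p>" if fmt == "html" else text


def build_response(url, render=render_verse):
    """Map /texts/<module>/<book>/<chapter>/<verse>?format=... to (status, JSON)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 5 or parts[0] != "texts":
        return 404, json.dumps({"error": "unknown path"})
    _, module, book, chapter, verse = parts
    fmt = parse_qs(parsed.query).get("format", ["html"])[0]
    if fmt not in SUPPORTED_FORMATS:
        return 400, json.dumps({"error": "unsupported format"})
    osis_ref = f"{book}.{chapter}.{verse}"
    text = render(module, osis_ref, fmt)
    if text is None:
        return 404, json.dumps({"error": "no such module or verse"})
    return 200, json.dumps({"osisRef": osis_ref, "text": text})


class TextHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all of the logic lives in build_response."""

    def do_GET(self):
        status, body = build_response(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)
```

Passing TextHandler to http.server.HTTPServer bound to 127.0.0.1 would serve
it locally. The client then needs nothing beyond an HTTP GET and a JSON
parser, and swapping render_verse for a real binding call would not change
the wire format.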

> > A simple application like this could be written up, distributed in a
> > static binary, and anyone would be able to hit it for a REST
> > accessed, rendered format of a given text. Going back to the goal of
> > simplicity: this application could be run by anyone on any computer
> > where a SWORD library already existed, and it could serve the
> > baseline of those peoples' needs.
> >
> > That's just an idea I've had bouncing around in my head for a long
> > time. I just have no need to access the scripture over REST or I
> > would have already written it. All the bits are already out there.
> > There are lots of good REST frameworks, every language with them has
> > the ability to encode JSON, and most of the popular ones we have
> > bindings for the language in (Python, PHP, Java) or it can easily be
> > integrated directly (CGO).
>
> This is a really good idea, and if this is going beyond what libsword
> is designed for, that's probably the route I'll take. I have a
> preference for C# for development tasks along these lines so I'll
> probably try to resurrect that first (the actual VS solution in
> libsword is *old* but the SWIG code should be up-to-date, so I don't
> imagine it will be *too* hard to get it going again - failing that,
> C++/Qt is probably my next choice though Qt is a bit of a strange
> choice for a server application). Then I'll probably implement something
> more-or-less like what you're mentioning here. Might not catch on, but
> if nothing else it will be interesting.
>

I don't know about C# bindings for libsword. And I'm not completely sure
the VS build files ever worked. Maybe they did, but if so it was a long
time ago. When I have compiled for Windows I used the MinGW tools from
Linux to just cross-compile the library and tools. I also have written a
grand total of about 20 lines of C# in my life, so I can't really give you
any specifics on leveraging it. I know it at least used to be very well
regarded in web services, but it has been a long time since I have seen
greenfield web work written in it. But I have not kept my finger on the
pulse of that.

I don't know that Qt is going to give you very good Web Server support.
It's a great utility library but that's not really its goal. Their
documentation page on their web server includes this warning: "Qt HTTP
Server does not have many of the more advanced features and optimizations
that general-purpose HTTP servers have. It also has not seen the same
scrutiny regarding various attack vectors over the network. Use Qt HTTP
Server, therefore, only for local connections or in a trusted network, and
do not expose the ports to the internet."

I've worked in professional web dev at various points in my career. For new
work I would only tackle Python, NodeJS, or Golang right now. We have Swig
support for Python. We don't really have NodeJS support in the library, but
there is an existing application called Ezra which is written in NodeJS and
probably has enough support for the needs here. And of course Golang (and
Rust) can easily tap into an existing C library with no real effort
required.

So yeah - my suggestion would be to push the rendering off into the server
code rather than trying to get the raw format over to the client app, and
to leverage the easiest bindings: either Swig or one of the languages that
is easy to link against the raw library.

--Greg

> Hope you're doing well,
> Aaron
>
> > --Greg
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page
>