[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Mon Jul 29 13:45:45 EDT 2024

On Mon, 29 Jul 2024 08:44:54 -0500
Greg Hellings <greg.hellings at gmail.com> wrote:

> On Mon, Jul 29, 2024 at 3:26 AM Aaron Rainbolt <arraybolt3 at gmail.com>
> wrote:
> 
> > On Sun, 28 Jul 2024 23:08:33 -0500
> > Greg Hellings <greg.hellings at gmail.com> wrote:
> > >
> > > Now, to switch to the idea of a specialized SWORD protocol to
> > > address the user who does not want to fetch the entirety of a
> > > module: why? The library can already generate HTML documents and
> > > document fragments. Just do the rendering on the server and pass
> > > the fragment to the client over HTTP. Wrap the rendered string
> > > into a JSON object if you need to. Why try to pass the binary
> > > blob of some random data to the remote unit when you could
> > > already render it on the server?
> >
> > The idea is to make it so that *existing* SWORD clients can access
> > data on remote servers without downloading the whole
> > thing. I laid out some reasons why this is helpful in certain use
> > cases in my first email. Existing SWORD clients are meant to
> > retrieve information from libsword and then render it in some way,
> > thus to maximize the possibility of adoption, my hope was to
> > implement in libsword the ability to fetch "raw" data from a remote
> > server and then pass it through to the client, which already has
> > code for rendering it however the client chooses. Ideally a client
> > should need to do nothing more than point an SWMgr object at the
> > remote server and then use it exactly the same way it would use a
> > local repository (perhaps with some extra error checks for things
> > like timeouts, interrupted connections, and whatnot).
> >
> 
> A few caveats that might dissuade you from sending the raw data:
> 
> The raw data can be in multiple formats, depending on the source
> module and fields in use. OSIS, TEI, ThML, RTF, and GBF are the most
> common formats. Meanwhile there are also a few major output formats
> that might be requested. HTML, RTF, plain text are the most common.
> But I think there might also be support for TeX? I'm not 100% sure.
> All of that metadata is stored in the module configuration file and
> not at the verse level in the resulting file. Why force the client to
> handle this parsing and consume the extra information when it could
> just slurp up the requested format as rendered by the server? On top
> of that the configuration file will have information on whether there are
> lemmas, footnotes, and other information all of which will inform how
> to render a particular blob of text.

The problem is that nothing but the raw data will work in the way I
hope. What's Xiphos going to do with prerendered data? Or BibleTime?
Or Ezra, or SWORDWeb, or Bishop, or AndBible? These all get raw info
from the library and render it, using the render helpers in libsword
to do so (if I remember correctly, it's been a while since I tried to
write a SWORD client). If you give them prerendered data, they aren't
going to know what to do with that. Again, my hope was for *existing
SWORD clients* to be able to add support for use of remote repositories
with as little effort as possible. I don't expect the developers of any
of these apps to spend their valuable time adding support for an
entirely new input format (prerendered JSON) to their apps. I suspect
they'd be willing to spend thirty minutes or so to add network support
using the approach I'm suggesting though.

Again, this might be out of scope for SWORD. If it is, then sending
prerendered data is probably fine since only custom clients will be
able to use the server anyway. But that would undo the point of the
project somewhat because then only custom clients will be able to use
the server, meaning that the number of clients that would exist for it
will probably be low. My hope was that one day someone would be able to
just point Xiphos or whatever at, say, eBible.org, and have instant
access to the whole library stored there. If the client and server have
to be separate from SWORD, then there probably won't be that many
clients, or users, and therefore no real reason to adopt the server,
which would make the project interesting but not very useful.

(This isn't me griping that "no one likes my great idea" for the
record. I sent a message to the mailing list in the first place to see
if people liked the idea, so I didn't waste my time and theirs trying
to do it without input.)

> Much simpler to just tell the server, "Please send me HTML and strip
> out footnotes" than to try and encode all of that, send it to the
> client, and then render it there. Every round trip would essentially
> need the fully parsed config file to travel with it in your proposed
> raw form.
> 
> 
> > > A simple REST library written in something like Go could easily be
> > > linked to the libsword C library. It could query libsword to get
> > > the list of modules and expose them, along with certain query
> > > parameters specifying the format request. Then serve the
> > > resulting text over HTTP. So a client library could hit something
> > > like http://mylibrary.com/texts/KJV/Gen/1/1?format=html and it will get back
> > > {"osisRef": "Gen.1.1", "text": "<p>In the beginning...</p>"}. You
> > > wouldn't need to write some low level application protocol. You
> > > would save the client device from needing to render the text and
> > > have extra knowledge of the module. You wouldn't have to alter
> > > the library in any fashion.
> >
> > This is similar to what I was thinking. I wasn't sure if JSON was
> > the best wrapper to do it in, but I don't see any reason to use
> > anything else, other than SWORD's apparent preference for XML-like
> > formats. However my "text" field would probably look more like
> > "text": "$$$Revelation of John 22:19\n<w lemma="strong:G2532
> > lemma.TR:και" morph="robinson:CONJ" src="1">And</w>..." or some
> > such (this is what mod2imp spit out when I used it to get an
> > example).
> >
> 
> There are two difficulties with this as I can see. The first I
> mentioned in the paragraph above. You and I can look at this and
> recognize it as OSIS, but that information is only encoded in the
> configuration file. Likewise, telling the renderer that lemmas and
> morphology are supported is encoded there. Why not just allow the
> library to handle that?
> 
> Secondly, the library has no knowledge of the import format.
> Knowledge of parsing that block from an input formatted file exists
> only in the imp2mod utility. That utility is not linked into the
> library. It is frequently distributed with the library, but it is not
> an inherent part of it. Similarly the mod2imp utility is not a
> portion of the library. Those two applications handle the generating
> and parsing of the input format, and no attempt is made by either one
> to preserve the full round-trip functionality of the underlying
> encoding format. You would need to re-implement parsing of that
> format into your library and you would lose the extra data that is
> not held internal to the passage requested.

Hmm, good point. Still, there's *some* sort of "raw" data you can get
out of a module, and it's that data that you then put through a
"filter" or some such, right? And if a JSON blob is being sent anyway,
the server may as well send metadata about a module to a client that
requests it, so it knows how to process the info.
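
To make the "raw text plus metadata" idea concrete, here is a minimal
sketch of what such a JSON blob could look like. It is only an
illustration: the field names (module, key, sourceType,
globalOptionFilters, rawText) and the helper build_raw_response are
made up for this example, and the sample conf values are hypothetical,
modelled on the SourceType and GlobalOptionFilter entries found in a
module's .conf file. Nothing here is an existing libsword API.

```python
import json


def build_raw_response(module, key, raw_text, conf):
    """Bundle one raw entry with the module metadata a client would
    need in order to pick the right filters for it.

    All field names are hypothetical; `conf` stands in for values a
    server would read from the module's .conf file.
    """
    return json.dumps(
        {
            "module": module,
            "key": key,
            # Markup of the raw text (e.g. OSIS, ThML, GBF), or None if unknown.
            "sourceType": conf.get("SourceType"),
            # Optional features (lemmas, morphology, footnotes, ...) the module declares.
            "globalOptionFilters": conf.get("GlobalOptionFilter", []),
            "rawText": raw_text,
        },
        ensure_ascii=False,
    )


# Hypothetical sample data for illustration only.
sample_conf = {
    "SourceType": "OSIS",
    "GlobalOptionFilter": ["OSISStrongs", "OSISMorph"],
}
payload = build_raw_response(
    "KJV",
    "Rev.22.19",
    '<w lemma="strong:G2532" morph="robinson:CONJ">And</w>',
    sample_conf,
)
print(payload)
```

A client receiving this could check sourceType before deciding how to
process rawText, rather than guessing the markup from the text itself.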

> I don't see any benefit to transmitting the module in this way when
> you could just ask the server for the rendered text. Especially if
> you are looking at this from the type of limited device where such a
> feature would be helpful, you might as well offload as much of the
> processing work as possible to the server. Just ask it for the format
> you want to render, and consume what it sends you.
> 
> 
> > > A simple application like this could be written up, distributed
> > > in a static binary, and anyone would be able to hit it for a REST
> > > accessed, rendered format of a given text. Going back to the goal
> > > of simplicity: this application could be run by anyone on any
> > > computer where a SWORD library already existed, and it could
> > > serve the baseline of those peoples' needs.
> > >
> > > That's just an idea I've had bouncing around in my head for a long
> > > time. I just have no need to access the scripture over REST or I
> > > would have already written it. All the bits are already out there.
> > > There are lots of good REST frameworks, every language with them
> > > has the ability to encode JSON, and for most of the popular ones
> > > we either have language bindings (Python, PHP, Java) or can
> > > easily integrate directly (CGO).
> >
> > This is a really good idea, and if this is going beyond what
> > libsword is designed for, that's probably the route I'll take. I
> > have a preference for C# for development tasks along these lines so
> > I'll probably try to resurrect that first (the actual VS solution in
> > libsword is *old* but the SWIG code should be up-to-date, so I don't
> > imagine it will be *too* hard to get it going again - failing that,
> > C++/Qt is probably my next choice though Qt is a bit of a strange
> > choice for a server application). Then I'll probably implement
> > something more-or-less like what you're mentioning here. Might not
> > catch on, but if nothing else it will be interesting.
> >
> 
> I don't know about C# bindings for libsword. And I'm not completely
> sure the VS build files ever worked. Maybe they did, but if so it was
> a long time ago. When I have compiled for Windows I used the MinGW
> tools from Linux to just cross-compile the library and tools. I also
> have written a grand total of about 20 lines of C# in my life, so I
> can't really give you any specifics on leveraging it. I know it at
> least used to be very well regarded in web services, but it has been
> a long time since I have seen greenfield web work written in it. But
> I have not kept my finger on the pulse of that.
> 
> I don't know that Qt is going to give you very good Web Server
> support. It's a great utility library but that's not really its goal.
> Their documentation page on their web server includes this warning:
> "Qt HTTP Server does not have many of the more advanced features and
> optimizations that general-purpose HTTP servers have. It also has not
> seen the same scrutiny regarding various attack vectors over the
> network. Use Qt HTTP Server, therefore, only for local connections or
> in a trusted network, and do not expose the ports to the internet."

Didn't realize that about Qt, thank you for letting me know.

> I've been in professional web dev at various parts of my career. For
> new work I would only tackle either Python, NodeJS, or Golang right
> now. Python we have Swig support for, NodeJS we don't really have
> support for in the library but there is an existing application
> called Ezra which is written in NodeJS and probably has enough
> support for the needs here. And of course Golang (and Rust) can
> easily tap into an existing C library with no real effort required.

I hate both Python and Golang :P and I'm not a big fan of NodeJS
either. Rust I have no experience with, though it looks interesting. I
personally would rather work in C++ (as odd as that may sound).

> So yeah - my suggestion would be to push the rendering off into the
> server code rather than trying to get the raw format over to the
> client app. And to leverage the easiest bindings. Either Swig or one
> of the languages that is easy to link against the raw library.

Makes sense, I'll take that under consideration.
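
For reference, the server-side rendering approach Greg describes could
be sketched roughly as below. This is a hedged illustration only: it
handles the URL shape from his example
(/texts/KJV/Gen/1/1?format=html) and returns the same JSON shape, but
the handle and render_stub functions and the tiny in-memory text table
are assumptions made for this sketch. A real server would call into
libsword (through the Swig bindings or similar) where render_stub is.

```python
import json
from urllib.parse import parse_qs, urlparse

# Stand-in for a SWORD library; a real server would query libsword instead.
_TEXTS = {("KJV", "Gen.1.1"): "In the beginning..."}


def render_stub(module, osis_ref, fmt):
    """Hypothetical renderer: returns the entry in the requested format,
    or None if the module/reference is unknown."""
    text = _TEXTS.get((module, osis_ref))
    if text is None:
        return None
    return f"<p>{text}</p>" if fmt == "html" else text


def handle(url):
    """Map a request URL to (status_code, json_body)."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    # Expect exactly: texts/<module>/<book>/<chapter>/<verse>
    if len(parts) != 5 or parts[0] != "texts":
        return 404, json.dumps({"error": "not found"})
    _, module, book, chapter, verse = parts

    # Default to HTML when no format is given.
    fmt = parse_qs(parsed.query).get("format", ["html"])[0]
    if fmt not in ("html", "plain"):
        return 400, json.dumps({"error": "unsupported format"})

    osis_ref = f"{book}.{chapter}.{verse}"
    text = render_stub(module, osis_ref, fmt)
    if text is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps({"osisRef": osis_ref, "text": text})


status, body = handle("http://mylibrary.com/texts/KJV/Gen/1/1?format=html")
print(status, body)
```

Keeping the routing in a plain function like this means it can be
dropped behind whatever HTTP framework ends up being used.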

Have a blessed day!
Aaron

> --Greg
> 
> 
> > Hope you're doing well,
> > Aaron
> >
> > > --Greg
> > _______________________________________________
> > sword-devel mailing list: sword-devel at crosswire.org
> > http://crosswire.org/mailman/listinfo/sword-devel
> > Instructions to unsubscribe/change your settings at above page
> >