[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Mon Jul 29 15:33:34 EDT 2024

On Mon, Jul 29, 2024 at 12:45 PM Aaron Rainbolt <arraybolt3 at gmail.com>
wrote:

> On Mon, 29 Jul 2024 08:44:54 -0500
> Greg Hellings <greg.hellings at gmail.com> wrote:
>
> > On Mon, Jul 29, 2024 at 3:26 AM Aaron Rainbolt <arraybolt3 at gmail.com>
> > wrote:
> >
> > > On Sun, 28 Jul 2024 23:08:33 -0500
> > > Greg Hellings <greg.hellings at gmail.com> wrote:
> > > >
> > > > Now, to switch to the idea of a specialized SWORD protocol to
> > > > address the user who does not want to fetch the entirety of a
> > > > module: why? The library can already generate HTML documents and
> > > > document fragments. Just do the rendering on the server and pass
> > > > the fragment to the client over HTTP. Wrap the rendered string
> > > > into a JSON object if you need to. Why try to pass the binary
> > > > blob of some random data to the remote unit when you could
> > > > already render it on the server?
> > >
> > > The idea is to make it so that *existing* SWORD clients can
> > > access data on remote servers without downloading the whole
> > > thing. I laid out some reasons why this is helpful in certain use
> > > cases in my first email. Existing SWORD clients are meant to
> > > retrieve information from libsword and then render it in some way,
> > > thus to maximize the possibility of adoption, my hope was to
> > > implement in libsword the ability to fetch "raw" data from a remote
> > > server and then pass it through to the client, which already has
> > > code for rendering it however the client chooses. Ideally a client
> > > should need to do nothing more than point an SWMgr object at the
> > > remote server and then use it exactly the same way it would use a
> > > local repository (perhaps with some extra error checks for things
> > > like timeouts, interrupted connections, and whatnot).
> > >
> >
> > A few caveats that might dissuade you from sending the raw data:
> >
> > The raw data can be in multiple formats, depending on the source
> > module and fields in use. OSIS, TEI, ThML, RTF, and GBF are the most
> > common formats. Meanwhile there are also a few major output formats
> > that might be requested. HTML, RTF, plain text are the most common.
> > But I think there might also be support for TeX? I'm not 100% sure.
> > All of that metadata is stored in the module configuration file and
> > not at the verse level in the resulting file. Why force the client to
> > handle this parsing and consume the extra information when it could
> > just slurp up the requested format as rendered by the server? On top
> > of that, the configuration file will have information on whether
> > there are lemmas, footnotes, and other information, all of which
> > will inform how to render a particular blob of text.
>
> The problem is that nothing but the raw data will work in the way I
> hope. What's Xiphos going to do with pre-rendered data? Or Bibletime?
> Or Ezra, or SWORDWeb, or Bishop, or AndBible? These all get raw info
> from the library and render it, using the render helpers in libsword
> to do so (if I remember correctly, it's been a while since I tried to
> write a SWORD client). If you give them prerendered data, they aren't
> going to know what to do with that. Again, my hope was for *existing
> SWORD clients* to be able to add support for use of remote repositories
> with as little effort as possible. I don't expect the developers of any
> of these apps to spend their valuable time adding support for an
> entirely new input format (prerendered JSON) to their apps. I suspect
> they'd be willing to spend thirty minutes or so to add network support
> using the approach I'm suggesting though.
>

Actually not really. Most of the clients simply ask the library, "Please
give me the HTML for this" or "Please give me the plaintext for this". They
don't get the raw data and then pass it through the filters themselves.
They can do that, but it's rare. More often they'll construct an SWMgr
object with target FOO output format and then just call the renderText
method on that object with no knowledge of what it came from or how it got
into the FOO format. So your idea can still completely work under the hood
of the SWMgr. It just wouldn't be reading the file from local storage but
instead fetching it remotely. It would slot in nicely with the existing
system and would require 0 modification from existing applications other
than a way to specify the source type. And they all already know how to
specify remote repositories.
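
A minimal sketch of what that could look like from the application's side,
in Python for brevity. This is not the real libsword API: RemoteModule,
set_key, render_text, and canned_fetch are hypothetical names, and the URL
layout is borrowed from the REST example quoted further down. The fetch
callable stands in for an HTTP GET so the sketch runs without a network.

```python
import json


class RemoteModule:
    """Hypothetical module whose text comes from a server, not local files."""

    def __init__(self, name, fetch, fmt="html"):
        self.name = name
        self.fmt = fmt
        self._fetch = fetch  # callable(path) -> JSON string (assumed transport)
        self._key = None

    def set_key(self, osis_ref):
        """Select a verse by a reference such as 'Gen.1.1'."""
        self._key = osis_ref

    def render_text(self):
        """Return text already rendered by the server in the requested format."""
        if self._key is None:
            raise ValueError("no key set")
        book, chapter, verse = self._key.split(".")
        path = f"/texts/{self.name}/{book}/{chapter}/{verse}?format={self.fmt}"
        return json.loads(self._fetch(path))["text"]


def canned_fetch(path):
    """Stand-in for an HTTP GET; a real client would call the server here."""
    return json.dumps({"osisRef": "Gen.1.1", "text": "<p>In the beginning...</p>"})


kjv = RemoteModule("KJV", canned_fetch)
kjv.set_key("Gen.1.1")
print(kjv.render_text())
```

The calling code only ever sees set_key and render_text, so whether the
text came from local storage or over the wire stays hidden from it.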

Where it is rendered is a moot point there. I was just suggesting you push
that off to the server in order to have the simplest possible transport
format and the simplest client-side code. Your server could leverage the
existing filter mechanism of creating an SWMgr that asks for HTML output
and then it would just pump the output of renderText across the network.
Meanwhile the client renderText method would call the server, listen for
the results, and return the resulting rendered text. It's a much simpler
format than sending all of the metadata, populating it into an instance of
a module, then rendering it on the client side. Still completely possible,
but far more work than rendering server side.
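
Roughly, the server half could be sketched like this (Python, standard
library only). It is an illustration under assumptions, not a finished
design: handle_request and fake_render are hypothetical names, fake_render
stands in for whatever libsword call produces the rendered text, and the
URL and JSON shapes are the ones from the REST example quoted below.

```python
import json
from urllib.parse import urlsplit, parse_qs


def fake_render(module, osis_ref, fmt):
    """Stand-in for the server-side library call; returns canned text."""
    if fmt == "html":
        return "<p>In the beginning...</p>"
    return "In the beginning..."


def handle_request(url, render=fake_render):
    """Map /texts/<module>/<book>/<chapter>/<verse>?format=<fmt> to JSON."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 5 or segments[0] != "texts":
        return 404, json.dumps({"error": "not found"})
    _, module, book, chapter, verse = segments
    # Default to HTML when the client does not name a format.
    fmt = parse_qs(parts.query).get("format", ["html"])[0]
    osis_ref = f"{book}.{chapter}.{verse}"
    body = {"osisRef": osis_ref, "text": render(module, osis_ref, fmt)}
    return 200, json.dumps(body)


status, body = handle_request("http://mylibrary.com/texts/KJV/Gen/1/1?format=html")
print(status, body)
```

Only the requested format travels in the request, and only the rendered
string travels back, so none of the module configuration has to cross the
network.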

>
> Again, this might be out of scope for SWORD. If it is, then sending
> prerendered data is probably fine since only custom clients will be
> able to use the server anyway. But that would undo the point of the
> project somewhat because then only custom clients will be able to use
> the server, meaning that the number of clients that would exist for it
> will probably be low. My hope was that one day someone would be able to
> just point Xiphos or whatever at, say, eBible.org, and have instant
> access to the whole library stored there. If the client and server have
> to be separate from SWORD, then there probably won't be that many
> clients, or users, and therefore no real reason to adopt the server,
> which would make the project interesting but not very useful.
>
> (This isn't me griping that "no one likes my great idea" for the
> record. I sent a message to the mailing list in the first place to see
> if people liked the idea, so I didn't waste my time and theirs trying
> to do it without input.)
>

I think this is the real problem. The library strives for simplicity,
speed, and extraordinary portability of C/C++ code. This proposal is none
of those things. But, with the library the way it is, it would not at all
be difficult to write up the server portion. I just don't think you'd get
much traction from client apps, and I can't see writing network code into
the library for this purpose. As evidenced by how prevalent the warning
about persecuted people being snooped on by outside actors is in our
tooling, this is yet another way that users might be exposed, and we would
need to think deeply about how to warn them before they engage in network
activity.

> > Much simpler to just tell the server, "Please send me HTML and strip
> > out footnotes" than to try and encode all of that, send it to the
> > client, and then render it there. Every round trip would essentially
> > need the fully parsed config file to travel with it in your proposed
> > raw form.
> >
> >
> > > > A simple REST library written in something like Go could easily be
> > > > linked to the libsword C library. It could query libsword to get
> > > > the list of modules and expose them, along with certain query
> > > > parameters specifying the format request. Then serve the
> > > > resulting text over HTTP. So a client library could hit something
> > > > like http://mylibrary.com/texts/KJV/Gen/1/1?format=html and it
> > > > will get back
> > > > {"osisRef": "Gen.1.1", "text": "<p>In the beginning...</p>"}. You
> > > > wouldn't need to write some low level application protocol. You
> > > > would save the client device from needing to render the text and
> > > > have extra knowledge of the module. You wouldn't have to alter
> > > > the library in any fashion.
> > >
> > > This is similar to what I was thinking. I wasn't sure if JSON was
> > > the best wrapper to do it in, but I don't see any reason to use
> > > anything else, other than SWORD's apparent preference for XML-like
> > > formats. However my "text" field would probably look more like
> > > "text": "$$$Revelation of John 22:19\n<w lemma="strong:G2532
> > > lemma.TR:και" morph="robinson:CONJ" src="1">And</w>..." or some
> > > such (this is what mod2imp spit out when I used it to get an
> > > example).
> > >
> >
> > There are two difficulties with this as I can see. The first I
> > mentioned in the paragraph above. You and I can look at this and
> > recognize it as OSIS, but that information is only encoded in the
> > configuration file. Likewise, telling the renderer that lemmas and
> > morphology are supported is encoded there. Why not just allow the
> > library to handle that?
> >
> > Secondly, the library has no knowledge of the import format.
> > Knowledge of how to parse that block from an input-formatted file
> > lives only in the imp2mod utility. That utility is not linked into the
> > library. It is frequently distributed with the library, but it is not
> > an inherent part of it. Similarly the mod2imp utility is not a
> > portion of the library. Those two applications handle the generating
> > and parsing of the input format, and no attempt is made by either one
> > to preserve the full round-trip functionality of the underlying
> > encoding format. You would need to re-implement parsing of that
> > format into your library and you would lose the extra data that is
> > not held internal to the passage requested.
>
> hmm, good point. Still, there's *some* sort of "raw" data you can get
> out of a module, and it's that data that you then put through a
> "filter" or some such, right? And if a JSON blob is being sent anyway,
> the server may as well send metadata about a module to a client that
> requests it, so it knows how to process the info.
>

Oh, certainly. You can get the raw data out without much issue. The mod2imp
tool does exactly that, spitting out whatever raw data is contained in each
entry, with no output filters applied. That's not the difficulty. It's just
much less effort to let the server do that processing rather than the
client since it's all invisible to most client applications anyway.

> > I don't see any benefit to transmitting the module in this way when
> > you could just ask the server for the rendered text. Especially if
> > you are looking at this from the type of limited device where such a
> > feature would be helpful, you might as well offload as much of the
> > processing work as possible to the server. Just ask it for the format
> > you want to render, and consume what it sends you.
> >
> >
> > > > A simple application like this could be written up, distributed
> > > > in a static binary, and anyone would be able to hit it for a REST
> > > > accessed, rendered format of a given text. Going back to the goal
> > > > of simplicity: this application could be run by anyone on any
> > > > computer where a SWORD library already existed, and it could
> > > > serve the baseline of those peoples' needs.
> > > >
> > > > That's just an idea I've had bouncing around in my head for a long
> > > > time. I just have no need to access the scripture over REST or I
> > > > would have already written it. All the bits are already out there.
> > > > There are lots of good REST frameworks, every language with them
> > > > has the ability to encode JSON, and for most of the popular ones
> > > > we either have language bindings (Python, PHP, Java) or can
> > > > easily integrate directly (CGO).
> > >
> > > This is a really good idea, and if this is going beyond what
> > > libsword is designed for, that's probably the route I'll take. I
> > > have a preference for C# for development tasks along these lines so
> > > I'll probably try to resurrect that first (the actual VS solution in
> > > libsword is *old* but the SWIG code should be up-to-date, so I don't
> > > imagine it will be *too* hard to get it going again - failing that,
> > > C++/Qt is probably my next choice though Qt is a bit of a strange
> > > choice for a server application). Then I'll probably implement
> > > something more-or-less like what you're mentioning here. Might not
> > > catch on, but if nothing else it will be interesting.
> > >
> >
> > I don't know about C# bindings for libsword. And I'm not completely
> > sure the VS build files ever worked. Maybe they did, but if so it was
> > a long time ago. When I have compiled for Windows I used the MinGW
> > tools from Linux to just cross-compile the library and tools. I also
> > have written a grand total of about 20 lines of C# in my life, so I
> > can't really give you any specifics on leveraging it. I know it at
> > least used to be very well regarded in web services, but it has been
> > a long time since I have seen greenfield web work written in it. But
> > I have not kept my finger on the pulse of that.
> >
> > I don't know that Qt is going to give you very good Web Server
> > support. It's a great utility library but that's not really its goal.
> > Their documentation page on their web server includes this warning:
> > "Qt HTTP Server does not have many of the more advanced features and
> > optimizations that general-purpose HTTP servers have. It also has not
> > seen the same scrutiny regarding various attack vectors over the
> > network. Use Qt HTTP Server, therefore, only for local connections or
> > in a trusted network, and do not expose the ports to the internet."
>
> Didn't realize that about Qt, thank you for letting me know.
>
> > I've been in professional web dev at various parts of my career. For
> > new work I would only tackle either Python, NodeJS, or Golang right
> > now. Python we have Swig support for, NodeJS we don't really have
> > support for in the library but there is an existing application
> > called Ezra which is written in NodeJS and probably has enough
> > support for the needs here. And of course Golang (and Rust) can
> > easily tap into an existing C library with no real effort required.
>
> I hate both Python and Golang :P and I'm not a big fan of NodeJS
> either. Rust I have no experience with, though it looks interesting. I
> personally would rather work in C++ (as odd as that may sound).
>

C++ doesn't sound odd at all. It's not common in web programming to
leverage it directly, but it is neither odd nor unheard of.

--Greg