[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?
Jaak Ristioja
jaak at ristioja.ee
Sun Jul 14 14:08:08 EDT 2024
Hello,
+1, however this is not a small feat. Having also considered this, I
would like to share some thoughts on this topic which I hope you find useful.
As far as I understand libsword, it tries to support both FTP and
HTTP(S) repositories.
* Libsword seems to include a hand-written parser to parse the
non-standardized FTP directory listings in order to figure out the
modules present on the remote repository.
* Similarly for HTTP(S), libsword expects the web server to
provide (Apache HTTPD style?) HTML directory indexes, for which it seems
to include an overly-simplistic hand-written parser.
Reliance on these non-standardized server-specific index files/directory
listings is very fragile, as slight deviations of server output might
cause the respective parsing in libsword to be unreliable. The quality
of these (and other[*]) hand-written parsers in libsword is
questionable, and I would not be surprised to find bugs in them which put
users in danger. ;(
Cryptographic signing of Sword modules and/or repository index files
would only marginally alleviate the situation while also introducing
bigger problems such as public key distribution and secure handling of
private keys. This might still be a good optional feature in some later
design, but more important things first...
Another problem is that a single Sword module consists of multiple
files: the configuration file and one or more files with the actual
content or content indexes (e.g. old testament content, old testament
content index, new testament content, new testament content index).
These are distributed in different repository directories and require
multiple client requests to download. The module file and directory
names do not contain a version identifier, nor is there any checksumming
between the files. So if a server updates a module while a client is in
the middle of downloading these files, this might cause the client to
download files pertaining to different versions of the module or
download partially uploaded files, leading to all kinds of nasty
problems. Proper versioning in filenames and checksumming could help
alleviate this.
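
To make the last point a bit more concrete, here is a minimal Python
sketch of what versioned filenames plus checksumming could look like on
the client side. The "<name>-<version>.zip" naming scheme and the helper
names are purely my assumptions for illustration; nothing like this
exists in libsword today.

```python
import hashlib


def archive_name(module: str, version: str) -> str:
    """Build a hypothetical versioned archive filename, e.g. 'KJV-2.10.zip'."""
    return f"{module}-{version}.zip"


def verify_archive(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the downloaded bytes match the published digest."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()


# Stand-in for a downloaded archive and the digest published alongside it.
payload = b"example module archive bytes"
published_digest = hashlib.sha256(payload).hexdigest()

complete_ok = verify_archive(payload, published_digest)
truncated_ok = verify_archive(payload[:-1], published_digest)
print(archive_name("KJV", "2.10"), complete_ok, truncated_ok)
```

With the version in the filename, an update creates a new file instead
of overwriting one that a client may be downloading, and the digest
check rejects a partially transferred file.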
It might be a blocker that libsword does not support having multiple
versions of a single module installed.
It might be a blocker that libsword does not have a namespacing scheme
for modules e.g. there can only be one module named "KJV" and it might
be problematic if two repositories (vendors) provide their own different
"KJV" modules. And it would probably be a bad idea to try to reserve the
use of identifiers like "KJV" to specific vendors e.g. by using some
kind of registry.
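
One possible shape for such namespacing, sketched in Python: identify a
module by the pair (repository, name) instead of by name alone, so no
registry is needed. The ModuleId type and the example repository
identifiers are hypothetical, and this is only one of several ways it
could be done.

```python
from typing import NamedTuple


class ModuleId(NamedTuple):
    """Hypothetical namespaced module identifier."""
    repository: str  # assumption: e.g. the repository's domain or base URL
    name: str        # the module name as used today, e.g. "KJV"


# Two vendors can each ship their own "KJV" without colliding.
installed = {
    ModuleId("repo-a.example", "KJV"): "2.10",
    ModuleId("repo-b.example", "KJV"): "1.0",
}

for module_id, version in installed.items():
    print(f"{module_id.repository}/{module_id.name} {version}")
```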
Another obstacle to defining a new repository format/protocol is that
there is no complete and sound formal specification for the module
configuration file format and its fields. The descriptions in the SWORD
wiki are incomplete and contain ambiguity.
While perhaps not strictly a blocker to creating a new repository
format/protocol, there are also no formal specifications for the module
content and content index files. I remember these formats having been
described as internal libsword details which don't require
specification, because the format and libsword might change. However, I
think this reasoning is incorrect, because files of these formats are
exchanged over the wire, used in multiple repositories not all of which
are managed by Crosswire, and libsword wants to retain backwards
compatibility with older modules as well.
In my opinion the repository format should not much depend on the
underlying transport protocol (HTTP(S), FTP, local filesystem) and
should not require special handling on the server side. For HTTP this
means that all repository files may be served statically on a regular
web server without requiring extra server-side scripting. Just files and
directories, no parsing of directory indexes, only retrieval of regular
files by their path.
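
As a rough illustration of that transport independence, the client
could be written against a tiny "fetch a file by path" interface. This
is only a sketch in Python under my own assumptions: the Transport,
LocalTransport and HttpTransport names and the "index.json" path are
made up for the example.

```python
import tempfile
import urllib.request
from pathlib import Path
from typing import Protocol


class Transport(Protocol):
    """Anything that can return the bytes of a regular file by its path."""

    def fetch(self, path: str) -> bytes: ...


class LocalTransport:
    """Repository on the local filesystem."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def fetch(self, path: str) -> bytes:
        return (self.root / path).read_bytes()


class HttpTransport:
    """Repository served as static files by any ordinary web server."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def fetch(self, path: str) -> bytes:
        with urllib.request.urlopen(f"{self.base_url}/{path}") as response:
            return response.read()


def fetch_index(transport: Transport) -> bytes:
    """The repository logic only ever asks for files by path."""
    return transport.fetch("index.json")  # hypothetical fixed location


# Demo against a throwaway local repository; no network access needed.
with tempfile.TemporaryDirectory() as root:
    Path(root, "index.json").write_bytes(b'{"modules": []}')
    index_bytes = fetch_index(LocalTransport(root))
print(index_bytes)
```

The same fetch_index() call would work unchanged with
HttpTransport("https://repo.example/sword"), where the URL is again just
a placeholder.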
In the simplest case, the client would retrieve the (root) index file
from a fixed location in the repository (e.g. using HTTP GET), parse it,
and proceed to download selected modules, where each module version is a
single archive file in the repository. Various specific repository
(directory) layouts are possible. Since SWORD repositories are
relatively small, it would probably suffice to have only one (root) index
file which would contain all necessary metadata from all the module
archives in the repository. I recommend JSON to be used for index files
(for interoperability), and an extensible versioned JSON schema to be
defined.
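
For illustration only, a client-side sketch in Python of parsing such a
root index. The field names ("schema_version", "modules", "name",
"version", "archive") are invented for this example and are not a
proposal for the actual schema.

```python
import json

SUPPORTED_SCHEMA_VERSION = 1

# Hypothetical root index as it might be served from a fixed location.
INDEX_JSON = """
{
  "schema_version": 1,
  "modules": [
    {"name": "KJV", "version": "2.10",
     "archive": "modules/KJV-2.10.zip", "language": "en"},
    {"name": "KJV", "version": "2.9",
     "archive": "modules/KJV-2.9.zip", "language": "en"}
  ]
}
"""


def parse_index(text: str) -> dict:
    """Map (name, version) to the archive path within the repository.

    Fields the client does not know about (here: "language") are simply
    ignored, which leaves room for extending the schema later.
    """
    index = json.loads(text)
    if index.get("schema_version") != SUPPORTED_SCHEMA_VERSION:
        raise ValueError("unsupported index schema version")
    return {(m["name"], m["version"]): m["archive"] for m in index["modules"]}


archives = parse_index(INDEX_JSON)
print(archives[("KJV", "2.10")])
```

Each module version maps to exactly one archive file, so a client needs
one request for the index and one request per module it installs.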
Best regards,
Jaak
[*] Rewriting just the repository logic would not prevent other libsword
parser bugs from being exploited.
On 13.07.24 07:30, Aaron Rainbolt wrote:
> As it stands, SWORD users face some disadvantages when accessing SWORD
> resources - they have to be downloaded in their entirety, installed
> onto the end user's system, and then stay there for as long as the
> user wishes to access them. While it is possible and even easy to copy
> modules from one system to another in theory, the system that views
> the modules must still *have* the modules in order to view them.
>
> Since SWORD already is basically a universal Bible-related data access
> system, it seems to me like it could be useful to take the concept one
> step further - allowing access to SWORD modules over the network,
> where a viewing device must only request the part of a module it wants
> to view, and simply discards it when it's done with it.
>
> Some advantages of this over what SWORD already does:
>
> * The device used for viewing no longer has to be the device used for
> storing the modules. Individuals can stand up a "SWORD server" and
> then access the modules from any network-capable SWORD client on any
> device.
> * The device used for module storage can be located on the Internet,
> allowing individuals to access a potentially large library of modules
> without installation.
> * Assuming a properly secured, encrypted connection can be established
> and the server is not obvious as a SWORD server, individuals in
> persecuted countries could potentially access SWORD modules over the
> Internet, allowing them to access the Bible without leaving a trace on
> their devices.
> * Organizations with permission to redistribute copyrighted texts
> could provide those texts via a SWORD server, allowing them to be
> accessed by network-capable SWORD clients (i.e., this could
> potentially allow people to legally access texts such as the ESV and
> NIV in their favorite SWORD client rather than being forced to resort
> to clunky websites, proprietary software, or piracy). Server-side,
> open-source DRM measures could be enforced to make downloading entire
> modules for offline use more difficult, providing some level of
> peace-of-mind to copyright owners.
>
> Advantages this would have over existing "access the Bible online" solutions:
>
> * It would provide a standardized interface for accessing the Bible
> and Bible-related resources over the Internet, rather than every
> project coming up with its own storage conventions and network
> protocols.
> * People could theoretically use almost any SWORD client to access the
> modules, allowing access to the Bible using a native desktop or mobile
> application, rather than having to resort to a web browser, clunky
> "cross-platform" (read: doesn't work quite right anywhere) app, or a
> tracker-laden mess like YouVersion.
> * Since the SWORD server itself would essentially be a SWORD client
> that provided access to its modules over the network, one SWORD server
> could daisy-chain to another one, thus acting as a proxy. This way
> small, non-suspicious websites could provide access to major SWORD
> servers via a proxy, making it easier to help individuals in
> persecuted situations to access the Bible.
> * Given the above proxying mechanism, blocking access to SWORD servers
> could become very difficult, for much the same reasons why blocking
> access to Matrix chat is very difficult. (In a world with unlimited
> time and development resources, a full federation system could be
> implemented so that anyone could access any module that anyone else
> hosted... but that's almost without question overkill and impractical.
> Just proxying would be cheap to implement and powerful in use.)
> * Anyone could self-host a SWORD server and provide themselves, their
> family, their community, or even the world easier access to the Bible
> and Bible-related resources.
> * Advanced features like fast Lucene search could be provided
> server-side, giving a much faster search experience than almost any
> modern Web-based Bible application I've used.
> * If used alongside a feature like BibleSync, it could be a powerful
> tool for churches and Bible studies to use. People could simply
> connect to the church's SWORD server and enable BibleSync, then be
> able to follow along perfectly with everyone else, with access to the
> same resources that their pastor, study leader, etc. is actively
> using. No prep work needed (beyond having the proper app installed and
> knowing how to point it to the right server).
>
> In the event this idea is actually worth pursuing, it seems to me like
> there would be four things needed to make it a reality.
>
> * A SWORD network protocol specification. This would probably be the
> hardest thing to get right since it has to be gotten right the first
> time and then only incrementally updated in the future in a
> backwards-compatible manner for best results.
> * An actual SWORD server implementation. Once the specification
> exists, writing this should theoretically be easy.
> * Server access support in the SWORD library itself. This would enable
> existing SWORD clients to adopt network support with little effort.
> * Adoption of the new feature by SWORD frontends. This of course is up
> to (and at the discretion of) each SWORD frontend developer, but if
> the SWORD library made accessing network resources act almost
> identically to accessing local resources, it would hopefully be easy
> to take advantage of the feature, and thus it would hopefully gain
> traction.
>
> So there's my brain-dump of all the reasons I think this is worth
> doing and how I think it should be done :P Let me know what you think
> and if you have any advice or feedback. This whole thing popped into
> my head tonight and I just wanted to share it to see if it's worth
> pursuing, or if maybe something similar to this was tried already in
> the past.
>
> Thanks for reading my wall of text. God bless.
> _______________________________________________
> sword-devel mailing list: sword-devel at crosswire.org
> http://crosswire.org/mailman/listinfo/sword-devel
> Instructions to unsubscribe/change your settings at above page