[sword-devel] Creating a "SWORD-over-network" protocol for remote SWORD repo access?

Sat Aug 3 18:30:09 EDT 2024

On 03.08.24 22:32, Aaron Rainbolt wrote:
> On Sat, Aug 3, 2024 at 5:30 AM Jaak Ristioja <jaak at ristioja.ee> wrote:
>>
>> On 29.07.24 11:10, Aaron Rainbolt wrote:
>>> The idea is to make it so that *existing* SWORD clients can
>>> access data on remote servers without downloading the whole thing. I
>>> laid out some reasons why this is helpful in certain use cases in my
>>> first email. Existing SWORD clients are meant to retrieve information
>>> from libsword and then render it in some way, thus to maximize the
>>> possibility of adoption, my hope was to implement in libsword the
>>> ability to fetch "raw" data from a remote server and then pass it
>>> through to the client, which already has code for rendering it however
>>> the client chooses. Ideally a client should need to do nothing more
>>> than point an SWMgr object at the remote server and then use it exactly
>>> the same way it would use a local repository (perhaps with some extra
>>> error checks for things like timeouts, interrupted connections, and
>>> whatnot).
>>
>> In my opinion, this is not worth the effort. Is it really too much for
>> the client to download the whole module(s) to some temporary storage or RAM?
> 
> Depending on the module, yes. I've worked on Internet connections that
> were so slow that module downloads repeatedly timed out and failed. If
> I am using a SWORD client that has access to remote data, I don't want
> to have to wait thirty seconds to switch modules, and have some
> modules just outright refuse to work. I want to be able to work almost
> as fast and seamlessly as if I had the modules locally. Also for
> someone who used a lot of modules in their study, having these modules
> be temporarily downloaded on the fly could consume a lot of RAM or
> disk space (which may be a problem for users who are stuck with
> underpowered hardware), and they would have to be repeatedly
> downloaded every time they opened the client. For someone working with
> a modern laptop on gigabit WiFi, this might be comfortable, but for
> the person in (for instance) Africa working with a 32-bit Celeron on a
> dialup connection, this is not going to work.

However, a user may well read, browse, or search through the whole
module anyway. The module would then be downloaded verse by verse, each
verse in a separate request to the server with duplicate metadata in
each response, and possibly several times over unless there is
sufficient caching. I suspect the suggested approach would actually
require more bandwidth and be much slower overall. It would also slow
down front-ends in the middle of operations where they expect to be
working on fast local storage without much delay.
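
As a rough back-of-envelope sketch of the concern above: every number
below (verse count, average verse size, per-request overhead, round-trip
time, link speed) is an assumption picked for illustration, not a
measurement, and the function names are hypothetical.

```python
def per_verse_cost(n_verses, verse_bytes, overhead_bytes, rtt_s, bytes_per_s):
    """Total bytes and seconds when every verse is fetched in its own request."""
    total_bytes = n_verses * (verse_bytes + overhead_bytes)
    # One round trip per verse, plus the time to move all the bytes.
    total_seconds = n_verses * rtt_s + total_bytes / bytes_per_s
    return total_bytes, total_seconds


def bulk_cost(module_bytes, overhead_bytes, rtt_s, bytes_per_s):
    """Total bytes and seconds when the whole module is fetched in one request."""
    total_bytes = module_bytes + overhead_bytes
    total_seconds = rtt_s + total_bytes / bytes_per_s
    return total_bytes, total_seconds


if __name__ == "__main__":
    # All values are illustrative assumptions.
    N_VERSES = 31_000        # roughly the size of a full Bible module
    VERSE_BYTES = 150        # assumed average payload per verse
    OVERHEAD_BYTES = 500     # assumed headers/metadata per response
    RTT_S = 0.3              # assumed round-trip time on a slow link
    BYTES_PER_S = 5_000      # assumed dial-up class throughput

    print("per verse:", per_verse_cost(N_VERSES, VERSE_BYTES, OVERHEAD_BYTES,
                                       RTT_S, BYTES_PER_S))
    print("bulk:     ", bulk_cost(N_VERSES * VERSE_BYTES, OVERHEAD_BYTES,
                                  RTT_S, BYTES_PER_S))
```

Under these assumptions, the per-verse total is dominated by the
repeated overhead and round trips rather than by the verse text itself.
It only applies if the user ends up touching most of the module, and
caching or request batching would change the picture.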

There is also an issue of privacy, because the SWORD server might learn 
in too much detail who is reading what and when.

>> Also, how do you envision this to work with existing SWORD clients?
> 
> Ideally when making an SWMgr object, they could just pass a URL to the
> object pointing it to the remote repository, then use it as if it were
> local. Listing all modules would give you a list of all remote
> modules, fetching parts of a module would fetch them from a remote
> server, etc. For the client, it should be totally transparent except
> for the initial connection, and error handling in the event of a
> network issue.

This might be problematic, because it may confuse existing SWORD
front-ends, which might not expect repositories they consider local to
actually be remote. For example, front-ends might attempt to write to
the repository locations on the local filesystem.
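
To make the concern concrete, here is a minimal sketch of the kind of
guard a front-end would need before touching a repository location on
disk. The helper name and the list of schemes are hypothetical; nothing
like this exists in libsword or in any front-end as far as this thread
shows.

```python
from urllib.parse import urlsplit

# Assumed set of schemes a remote repository might use (illustrative only).
REMOTE_SCHEMES = {"http", "https", "ftp", "sftp"}


def is_remote_location(location: str) -> bool:
    """Return True if the repository location looks like a remote URL.

    Plain paths such as /usr/share/sword have no scheme; a Windows path
    such as C:\\sword parses with scheme "c", which is not in the set.
    """
    return urlsplit(location).scheme.lower() in REMOTE_SCHEMES


if __name__ == "__main__":
    for loc in ("https://example.org/sword", "/usr/share/sword"):
        action = "skip local writes" if is_remote_location(loc) else "may write"
        print(loc, "->", action)
```

Existing front-ends contain no such check today, which is exactly why a
transparently remote SWMgr could surprise them.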

Jaak