[sword-devel] Introducing the Bible Scraper
Arnaud Vié
unas.zole+avie at gmail.com
Sat Jun 1 19:09:14 EDT 2024
Hello all,
Cyrille already teased it in some of his previous mails on this list, but
I've been working for several months on a tool to scrape bibles from any
web page into a standard format (OSIS and USFM outputs are supported): the
Bible Scraper.
It mainly serves two purposes:
- *Help convert "loosely formatted" bibles, such as bibles
transcribed from facsimiles on Wikisource, to a standard semantic format.*
These bibles usually have some light formatting that aims to replicate
the visual appearance of the original document, but no strong
semantic markup. With proper configuration, the scraper can convert them
to a fully formed OSIS or USFM document, as long as the formatting is
consistent throughout the bible.
This is the use case Cyrille has been experimenting with a lot recently, and
one where we have been achieving promising results.
- *Allow individual users to convert bibles, which are freely available
on the web but which we don't have the rights to redistribute, into SWORD
modules for their personal use*.
This relies on the right to make a personal copy, which is quite strongly
upheld in French law (and probably in most other European countries, as there
are texts on the topic from the CJEU as well): as long as a user has
legitimate access to the content he wishes to copy, he is allowed to
download and process it for personal use. Since the scraper is just
software that any user can run on his own machine, there is no intermediary
that could be accused of illegitimate "redistribution" in any form.
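To make the first purpose more concrete, here is a minimal sketch of the general idea of turning consistently formatted plain text into USFM. It is not the scraper's actual code or configuration: the `to_usfm` function, the "Chapter N" heading convention and the "number, then text" verse convention are all assumptions made up for this illustration.

```python
import re

# Hypothetical input convention (an assumption for this sketch): a line of
# the form "Chapter N" opens a chapter, and each verse starts with its number.
CHAPTER_RE = re.compile(r"^Chapter\s+(\d+)$")
VERSE_RE = re.compile(r"^(\d+)\s+(.+)$")


def to_usfm(book_code, lines):
    """Convert loosely formatted lines into a list of USFM lines."""
    out = [f"\\id {book_code}"]
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        chapter = CHAPTER_RE.match(line)
        verse = VERSE_RE.match(line)
        if chapter:
            out.append(f"\\c {chapter.group(1)}")
        elif verse:
            out.append(f"\\v {verse.group(1)} {verse.group(2)}")
        elif out[-1].startswith("\\v "):
            # Unnumbered line right after a verse: treat it as a continuation.
            out[-1] += " " + line
        # Any other unrecognised line is silently dropped in this sketch.
    return out


sample = [
    "Chapter 1",
    "1 In the beginning God created the heaven and the earth.",
    "2 And the earth was without form, and void;",
    "and darkness was upon the face of the deep.",
]
print("\n".join(to_usfm("GEN", sample)))
```

This only works because the sample follows one convention from start to finish, which is the same constraint mentioned above: the formatting has to be consistent throughout the bible.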
In its current state, the tool is still mostly targeted at developers (I
don't yet publish a downloadable artifact, so interested users have to
clone the git repo and run a Maven build), but it's becoming mature enough
to be shared with those who want to have a look:
https://github.com/UnasZole/bible-scraper
I'm open to any kind of feedback or suggestions, of course!
In particular:
- If you have a specific website in mind that you would like to be
able to build SWORD modules from, let me know and we can try to add it.
(Currently I have only included a few French websites, but I'm interested in
adding other languages.)
- And if you are knowledgeable about the intellectual property laws in
other countries, I'm interested: currently, I've added a section to the
README explaining why using the scraper on any public website is
allowed in France, with references to the relevant texts, but it would
probably be useful to have similar information for users from other
countries.
Thanks all and best regards,
Arnaud