[sword-devel] Introducing the Bible Scraper

Matěj Cepl mcepl at cepl.eu
Sun Jun 2 02:49:44 EDT 2024


On Sun Jun 2, 2024 at 1:09 AM CEST, Arnaud Vié wrote:
> I'm open to any kind of feedback or suggestions, of course!
> In particular:
>
>    - if you have any specific website in mind that you would like to be
>    able to build sword modules from, let me know, we can try to add it.
>    (Currently I only included a few French websites, but I'm interested in
>    adding some other languages.)

The Sword module CzeBKR is sourced from the Czech WikiSource [1],
and there seems to be an official way [2] to get the source in
some, hopefully more useful, formats (plain text, RTF, HTML,
EPUB). I was using my own home-grown Python script [3], but, as
with all web-scraping scripts, it seems to have rotted away. (The
script is under a very permissive open source license, let’s say
MIT/X11; I am going to add a proper LICENSE file shortly.) It
started from [4] (look at the page source), but that page doesn’t
seem to be very useful anymore.

>    - And if you are knowledgeable about the intellectual property laws in
>    other countries, I'm interested: currently, I've added a section to the
>    README explaining why the usage of the scraper on any public website is
>    allowed in France with references to the related texts, but it would
>    probably be useful to have similar information for users from other
>    countries.

I am absolutely certain there are no problems with CzeBKR:

    1. It is WikiSource, so we have somebody else to blame ;)
    2. The original Bible of Kralice [5] is from the sixteenth
       century, so it is certainly in the public domain.
    3. The source for WikiSource was a scan [6] of the 1918
       edition, which names no authors or editors. The only
       possible editor of that edition I know of is Jan
       Karafiát [7]; he is not named on the title page, but in
       the early 20th century he worked with the International
       Bible Society on the revision of the Bible. He died in
       1929, so under EU copyright law (life of the author + 70
       years, longer than the Bern Convention minimum of 50)
       his works are in the public domain as well.
    4. We are in the EU as well.
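
The term arithmetic in point 3 can be sketched in a few lines of
Python. This is only a minimal illustration, assuming the EU rule
(Directive 2006/116/EC) that the term is counted from 1 January of
the year after the author's death; the function name is made up
for the example.

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """Return the first calendar year in which the works are free.

    Assumes the EU counting rule: the term starts on 1 January of the
    year after death, so protection ends on 31 December of
    death_year + term_years, and the works are free from the next year.
    """
    return death_year + term_years + 1


# Jan Karafiát died in 1929.
print(public_domain_year(1929))      # EU term of 70 years -> 2000
print(public_domain_year(1929, 50))  # Bern Convention minimum -> 1980
```

Either way, the 1918 edition has been free for decades.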

If you want to use CzeBKR as your test case, I am happy to help
with testing, Czech-language issues, or anything else.

Blessed Sunday!

Matěj

[1] https://cs.wikisource.org/wiki/Bible_kralick%C3%A1_(1918) 
[2] https://ws-export.wmcloud.org/?lang=cs&title=Bible_kralick%C3%A1_%281918%29
[3] https://gitlab.com/crosswire-bible-society/CzeBKR/-/blob/master/kralicka.py
[4] https://cs.wikisource.org/wiki/Speci%C3%A1ln%C3%AD:Exportovat_str%C3%A1nky/Bible_kralick%C3%A1_(1918)
[5] https://en.wikipedia.org/wiki/Bible_of_Kralice
[6] http://archive.org/details/biblsvatanebvec00socigoog
[7] https://cs.wikipedia.org/wiki/Jan_Karafi%C3%A1t
-- 
http://matej.ceplovi.cz/blog/, @mcepl at floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5  BC1D 7920 5802 880B C9D8
 
The ratio of literacy to illiteracy is a constant, but nowadays
the illiterates can read.
    -- Alberto Moravia
