<div dir="ltr"><div>Sorry Cyrille, I'll keep the repository in my personal GitHub account for the time being.</div><div><br></div><div>The main reason is that the scraper, which is still evolving, sits in a legal grey area. It allows people to save and convert copyrighted content, and I intend to provide parser configuration YAML files for as many websites as I can, so that more and more bibles eventually become usable in AndBible and the Sword ecosystem.<br></div><div>I've done enough research to be confident that I'm safe under French law. However, as I add parsers for bibles hosted on websites in other countries, there might be complaints. If that happens, it's much better for such complaints to target me alone, since it's my personal project, and not affect CrossWire as a whole. This matters all the more because CrossWire does not really operate under French jurisdiction and thus might not be as protected as I am.<br></div><div>As Donna said, it's perfectly fine if you want to keep a fork elsewhere, but I'd suggest making it private and not publicly affiliated with CrossWire.<br></div><div><br></div><div>In addition, we discussed GitHub vs GitLab a few months ago (cf. <a href="http://crosswire.org/pipermail/sword-devel/2024-February/049943.html">http://crosswire.org/pipermail/sword-devel/2024-February/049943.html</a>). I still believe that having some lively OSIS- and Sword-related projects on GitHub will do more than GitLab to improve the visibility of the Sword ecosystem and attract new developers in the long run.</div><div><br></div><div>(On that topic, my proposal to take over and rejuvenate the crosswire project on GitHub, specifically the jsword repo, and to add a new repo for the OSIS specification, still stands.)</div><div><br></div><div>Cheers,</div><div><br></div><div>Arnaud<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, Jun 2, 2024 at 16:33, Fr Cyrille <<a href="mailto:fr.cyrille@tiberiade.be">fr.cyrille@tiberiade.be</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><u></u>
<div>
Hi Arnaud,<br>
What do you think about moving bible-scraper from the GitHub repo to our
GitLab repo? I already did this, but without the latest commits. I've made you
a developer on it. <a href="https://gitlab.com/crosswire-bible-society/bible-scraper" target="_blank">https://gitlab.com/crosswire-bible-society/bible-scraper</a><br>
<br>
<div>On 02/06/2024 at 11:46, Arnaud Vié
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div>Thank you both for your interest!</div>
<div><br>
</div>
<div>> What about commentary?<br>
<div>> <a href="https://www.awmi.net/reading/online-bible-commentary/" target="_blank">https://www.awmi.net/reading/online-bible-commentary/</a></div>
</div>
<div><br>
</div>
<div>Not yet; I'm focusing on bibles for the time being, and
that's already a lot of work!<br>
</div>
<div>But nothing prevents adapting the solution to commentaries
in the future; I'll keep that idea in mind :-)</div>
<div><br>
</div>
<div>> If you want to use CzeBKR as your test case, I am
ready to help<br>
> you with any testing or Czech issues or whatever </div>
<div><br>
</div>
<div>Thanks a lot!</div>
<div>I've just pushed a scraper configuration for this bible: <a href="https://github.com/UnasZole/bible-scraper/blob/master/src/main/resources/scrapers/GenericHtml/KralickaWikisource.yaml" target="_blank">https://github.com/UnasZole/bible-scraper/blob/master/src/main/resources/scrapers/GenericHtml/KralickaWikisource.yaml</a></div>
<div>The main books were easy to parse; the deuterocanonical books,
extracted from a different manuscript, were a bit messier.</div>
<div>I made a few assumptions (for example, I interpret italics in verses as
translation additions, and side notes in the deuterocanonical
books as section titles).</div>
<div>Feel free to test it: after checking out and building the
repository, you should only need to run, for example:</div>
<div><br>
</div>
<div>> ./run.sh scrape -s GenericHtml -i KralickaWikisource -b Ps -c 1 -w USFM</div>
<div><br>
</div>
<div>Cheers,</div>
<div><br>
</div>
<div>Arnaud<br>
</div>
<div><br>
</div>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Sun, Jun 2, 2024 at 08:50,
Matěj Cepl <<a href="mailto:mcepl@cepl.eu" target="_blank">mcepl@cepl.eu</a>>
wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On
Sun Jun 2, 2024 at 1:09 AM CEST, Arnaud Vié wrote:<br>
> I'm open to any kind of feedback or suggestions, of course!<br>
> In particular:<br>
><br>
> - if you have any specific website in mind that you would like to be<br>
> able to build Sword modules from, let me know and we can try to add it.<br>
> (Currently I have only included a few French websites, but I'm interested in adding<br>
> some other languages.)<br>
<br>
Sword module CzeBKR is sourced from the Czech WikiSource [1],<br>
and there seems to be an official way [2] to get the source<br>
in some hopefully more useful formats (plain text, RTF, HTML,<br>
EPUB). I was using my own home-grown Python script [3], but<br>
as with all web-scraping scripts, it seems to have rotted away (that<br>
script is under some kind of very free open source license,<br>
let’s say MIT/X11 … I am going to add the proper LICENSE file<br>
momentarily). It started at [4] (look at the source view), but it<br>
doesn’t seem to be that useful anymore.<br>
<br>
> - And if you are knowledgeable about intellectual property laws in<br>
> other countries, I'm interested: currently, I've added a section to the<br>
> README explaining why using the scraper on any public website is<br>
> allowed in France, with references to the relevant legal texts, but it would<br>
> probably be useful to have similar information for users from other<br>
> countries.<br>
<br>
I am absolutely certain there are no problems with CzeBKR:<br>
<br>
1. It is WikiSource, so we have somebody else to blame
;)<br>
2. The original Bible of Kralice [5] is from the
sixteenth<br>
century and it is absolutely in the public domain.<br>
3. The source for WikiSource was a scan [6] of the 1918 edition<br>
of the book, with no authors shown. The works of the only<br>
possible editor of that Bible I know about [7] (he is not<br>
shown on the title page, but in the early 20th century he<br>
worked with the International Bible Society on the revision<br>
of the Bible) are in the public domain as well under the Bern<br>
Convention (death in 1929 + 75 years).<br>
4. We are in the EU as well.<br>
<br>
If you want to use CzeBKR as your test case, I am ready to
help<br>
you with any testing or Czech issues or whatever.<br>
<br>
Blessed Sunday!<br>
<br>
Matěj<br>
<br>
[1] <a href="https://cs.wikisource.org/wiki/Bible_kralick%C3%A1_(1918)" rel="noreferrer" target="_blank">https://cs.wikisource.org/wiki/Bible_kralick%C3%A1_(1918)</a>
<br>
[2] <a href="https://ws-export.wmcloud.org/?lang=cs&title=Bible_kralick%C3%A1_%281918%29" rel="noreferrer" target="_blank">https://ws-export.wmcloud.org/?lang=cs&title=Bible_kralick%C3%A1_%281918%29</a><br>
[3] <a href="https://gitlab.com/crosswire-bible-society/CzeBKR/-/blob/master/kralicka.py" rel="noreferrer" target="_blank">https://gitlab.com/crosswire-bible-society/CzeBKR/-/blob/master/kralicka.py</a><br>
[4] <a href="https://cs.wikisource.org/wiki/Speci%C3%A1ln%C3%AD:Exportovat_str%C3%A1nky/Bible_kralick%C3%A1_(1918)" rel="noreferrer" target="_blank">https://cs.wikisource.org/wiki/Speci%C3%A1ln%C3%AD:Exportovat_str%C3%A1nky/Bible_kralick%C3%A1_(1918)</a><br>
[5] <a href="https://en.wikipedia.org/wiki/Bible_of_Kralice" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Bible_of_Kralice</a><br>
[6] <a href="http://archive.org/details/biblsvatanebvec00socigoog" rel="noreferrer" target="_blank">http://archive.org/details/biblsvatanebvec00socigoog</a><br>
[7] <a href="https://cs.wikipedia.org/wiki/Jan_Karafi%C3%A1t" rel="noreferrer" target="_blank">https://cs.wikipedia.org/wiki/Jan_Karafi%C3%A1t</a><br>
-- <br>
<a href="http://matej.ceplovi.cz/blog/" rel="noreferrer" target="_blank">http://matej.ceplovi.cz/blog/</a>,
@mcepl@floss.social<br>
GPG Finger: 3C76 A027 CA45 AD70 98B5 BC1D 7920 5802 880B
C9D8<br>
<br>
The ratio of literacy to illiteracy is a constant, but
nowadays<br>
the illiterates can read.<br>
-- Alberto Moravia<br>
<br>
_______________________________________________<br>
sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org" target="_blank">sword-devel@crosswire.org</a><br>
<a href="http://crosswire.org/mailman/listinfo/sword-devel" rel="noreferrer" target="_blank">http://crosswire.org/mailman/listinfo/sword-devel</a><br>
Instructions to unsubscribe/change your settings at above
page<br>
</blockquote>
</div>
</div>
<br>
</blockquote>
<br>
<div>-- <br>
Do you love the Bible? Are you a theology student? Use
the free application <a href="https://xiphos.org/" target="_blank">Xiphos</a> or <a href="https://andbible.github.io/" target="_blank">Andbible</a> to access the
source texts, commentaries, dictionaries and many
other features... Contact me for translations into
French.</div>
</div>
</blockquote></div>