[sword-devel] Introducing the Bible Scraper

Matěj Cepl mcepl at cepl.eu
Sat Nov 2 14:17:21 EDT 2024


On Sun Jun 2, 2024 at 11:46 AM CEST, Arnaud Vié wrote:
> Thanks a lot !
> I've just pushed a scraper configuration for this bible :
> https://github.com/UnasZole/bible-scraper/blob/master/src/main/resources/scrapers/GenericHtml/KralickaWikisource.yaml
> Main books were easy to parse - deuterocanonical books extracted from a
> different manuscript were a bit messier.
> I made a few assumptions (I interpret italics in verse as translation
> additions, and side notes in deuterocanonical books as section titles, etc.)
> Feel free to test it : after checking out and building the repository, you
> should just need to run for example:
>
>> ./run.sh scrape -s GenericHtml -i KralickaWikisource -b Ps -c 1 -w USFM

Having compared Genesis and Ruth, the output looks perfect; to
be honest, it is much better than what my scripts produce. I haven’t
tested the deuterocanonical books yet, because I didn’t have them
at all before. I will build a new version of the Crosswire module and
let you know what I find.

Thank you very much, it looks awesome!

Blessings,

Matěj

-- 
http://matej.ceplovi.cz/blog/, @mcepl at floss.social
GPG Finger: 3C76 A027 CA45 AD70 98B5  BC1D 7920 5802 880B C9D8
 
All men's miseries derive from not being able to sit in a quiet
room alone.
  -- Blaise Pascal
