[sword-devel] Wanted : Sword utility to convert MarkDown to GenBook

Greg Hellings greg.hellings at gmail.com
Thu Feb 21 12:14:09 MST 2019


On Thu, Feb 21, 2019 at 12:55 PM David Haslam <dfhdfh at protonmail.com> wrote:

> Ah - but though MarkDown is largely used for presentational purposes,
> there’s a sense in which it is almost as well structured as (e.g.) USFM.
> It’s just that the structures are more general than particular to Bibles.
>

Comparison to USFM is completely irrelevant. Also, claiming that MD is
structured is almost laughable. It has syntax, but that syntax is very
loose and requires no sense of hierarchy at all in the way it's being used
in the current example document.


> Take headings, e.g.
>
> # is equivalent to \s1
> ## is equivalent to \s2
> ### is equivalent to \s3
>

Yes, but those markers are not used semantically. They're not even used
hierarchically in the linked text. Just look at the title card which goes
from # to ### to ## to ### again. The ToC and Introduction are set off with
# alone. But subsequent chapters are set off with ## for their numbering, #
for their title, and ### for what I assume is the date they were written
in/concerning.
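
To make that concrete, here is a rough sketch (not part of any Sword tool) that lists the ATX heading levels in a file and flags places where the nesting jumps by more than one level. The function names are made up for illustration, and it does not skip fenced code blocks, so a "# comment" inside one would be miscounted.

```python
import re

# ATX heading: up to three spaces of indent, one to six '#',
# then whitespace or end of line.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+|$)")


def heading_levels(markdown_text):
    """Return the ATX heading levels in document order."""
    levels = []
    for line in markdown_text.splitlines():
        match = ATX_HEADING.match(line)
        if match:
            levels.append(len(match.group(1)))
    return levels


def skipped_levels(levels):
    """Return (previous, current) pairs that descend more than one level at once."""
    return [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]


if __name__ == "__main__":
    # Placeholder text that mimics the # / ### / ## / ### sequence described above.
    sample = "# Title\n### Subtitle\n## Author\n### Note\n"
    levels = heading_levels(sample)
    print(levels)
    print(skipped_levels(levels))
```

A document that used headings hierarchically would produce an empty list from skipped_levels().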

Markdown is not a hierarchical language, and it doesn't even pretend to
impose or encourage a hierarchy. # is not equivalent to a section heading.
It's equivalent to <h1> in HTML. Interestingly, there are exactly 6 levels
of heading defined in both HTML and Markdown. This isn't a coincidence:
Markdown was created so that it could be readily converted into HTML.
Although it's possible to convert it into anything you want (especially
anything presentation-focused), it was created specifically to mimic and
follow HTML's formatting, not its structure, which is why md doesn't have
a hierarchy. It doesn't even allow you to define header and metadata the
way HTML does.


> Likewise some of the other MarkDown features.
>
> It’s much easier to edit than HTML.
>

Agreed, but not important for creating a converter.

> In fact, that’s the whole raison d’être.
>
> IMHO, it ought to be feasible to go straight from an MD file to one of the
> input formats we can use with a Sword utility for module build.
>

Yes, ThML


> Using HTML as an intermediate stage seems rather counterproductive to me.
>

The goal isn't to use HTML as an intermediate stage. It's to use it as the
final destination. ThML is, essentially, a superset of HTML. Rendering the
md into HTML using well-known libraries and spitting out the result into an
IMP file, where each IMP section takes its name from the input file in your
repo that generated it, would result in exactly what you're looking for, I
imagine.
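
A minimal sketch of that pipeline, under a few assumptions: IMP entries are a "$$$" key line followed by the entry body, GenBook keys take the form /Book/Section, and the Markdown renderer is passed in as a plain callable so the sketch runs without third-party packages. The names read_sources, imp_entry, and build_imp are hypothetical.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple


def read_sources(directory: Path) -> Iterator[Tuple[str, str]]:
    """Yield (file stem, Markdown text) for each .md file, sorted by name."""
    for path in sorted(directory.glob("*.md")):
        yield path.stem, path.read_text(encoding="utf-8")


def imp_entry(key: str, body: str) -> str:
    """Format one IMP entry: a '$$$' key line, then the entry body."""
    return "$$$" + key + "\n" + body.strip() + "\n"


def build_imp(
    sources: Iterable[Tuple[str, str]],
    render: Callable[[str], str],
    book: str,
) -> str:
    """Render each source to HTML and key it by the name of its input file."""
    entries = []
    for name, markdown_text in sources:
        # Assumed key layout: /<book title>/<input file name without extension>
        entries.append(imp_entry("/" + book + "/" + name, render(markdown_text)))
    return "".join(entries)
```

With the Python-Markdown package installed, markdown.markdown could be passed as render, for example build_imp(read_sources(Path("repo")), markdown.markdown, "From Death Into Life"). The result would then be written to a file and handed to imp2gbs. The exact key layout that imp2gbs expects should be checked against its documentation.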

Anything more complicated than that would require the converter to be
modified for every input document to understand its own particular
semantics regarding header definitions (there are two styles, btw, of
header definition) and more, because Markdown is a Wild West of
presentation-only markup.
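
For reference, the two heading styles are presumably ATX ("# Title") and Setext (a title line underlined with "===" or "---"). The sketch below shows what a converter would have to recognise before it could even begin to assign meaning to headings. It is illustrative only, find_headings is a made-up name, and it ignores fenced code blocks and other edge cases that a real Markdown parser handles.

```python
import re

# ATX: one to six '#', whitespace, the title, and optional closing '#'s.
ATX = re.compile(r"^ {0,3}(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
# Setext: a line made only of '=' (level 1) or '-' (level 2) under the title.
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def find_headings(markdown_text):
    """Return (level, title, style) tuples for ATX and Setext headings."""
    lines = markdown_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        atx = ATX.match(line)
        if atx:
            found.append((len(atx.group(1)), atx.group(2).strip(), "atx"))
            continue
        underline = SETEXT_UNDERLINE.match(line)
        if not underline or i == 0:
            continue
        previous = lines[i - 1]
        # An underline only makes a heading if it directly follows a text line;
        # after a blank line, '---' is a horizontal rule instead.
        if (
            previous.strip()
            and not ATX.match(previous)
            and not SETEXT_UNDERLINE.match(previous)
        ):
            level = 1 if underline.group(1).startswith("=") else 2
            found.append((level, previous.strip(), "setext"))
    return found
```

Even with both styles detected, the output is only a list of levels and titles. Deciding which of them is a chapter number, a title, or a date still depends on the individual document.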

--Greg

>
> David
>
> Sent from ProtonMail Mobile
>
>
> On Wed, Feb 20, 2019 at 17:30, Greg Hellings <greg.hellings at gmail.com>
> wrote:
>
> As with most such proposals, going from a presentational markup to a
> semantic markup is neither straightforward, nor guaranteed any measure of
> particular success. Even just looking at the first few documents at the URL
> you've provided, the automation into OSIS would be non-trivial. The first
> two documents are completely bespoke, indicating author, title, table of
> contents, etc using completely arbitrary presentational markup with no
> inherent semantic meaning that I can discern.
>
> The individual chapters and so forth would be relatively straightforward,
> even though the markup is, again, completely arbitrary. As such, any
> conversion utility written for this would be completely bespoke to the
> particular documents being translated.
>
> Further, I could very much be mistaken but I thought that GenBooks were
> not available from OSIS. I thought OSIS was only used for
> Scripture/Commentary markup that follows a chapter:verse arrangement. I
> thought we were stuck using IMP format with raw, ThML, or RTF input for
> formatting. If so, conversion from these files into a format like that
> would be very straightforward. MD->HTML renderers already exist, and one
> would just need to iterate the files, read their content into such a
> renderer, and output the resulting IMP format file.
>
> --Greg
>
> On Wed, Feb 20, 2019 at 4:59 AM David Haslam <dfhdfh at protonmail.com>
> wrote:
>
>> Here’s an example of a source text in MarkDown format.
>>
>> https://github.com/DavidHaslam/From-Death-Into-Life
>>
>> FROM DEATH INTO LIFE
>> by William Haslam (1818-1905)
>>
>> [no relation]
>>
>> If we could readily convert this to OSIS by a suitable script, then we
>> could distribute the work as a GenBook module.
>>
>> It’s just one sample of stuff I was involved with earlier this century
>> well before I became a CrossWire volunteer.
>>
>> Who would like to volunteer?
>>
>> Best regards.
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>>
>> On Sat, Feb 16, 2019 at 17:22, David Haslam <dfhdfh at protonmail.com>
>> wrote:
>>
>> We could probably vastly increase our available resources if someone were
>> able to develop a Sword utility to convert MarkDown file[s] to a GenBook
>> module.
>>
>> Anyone up to this?
>>
>> Best regards
>>
>> David
>>
>> Sent from ProtonMail Mobile
>>
>>
>>
>> _______________________________________________
>> sword-devel mailing list: sword-devel at crosswire.org
>> http://www.crosswire.org/mailman/listinfo/sword-devel
>> Instructions to unsubscribe/change your settings at above page
>
>
>