[sword-devel] RTF in conf files
Jaak Ristioja
jaak at ristioja.ee
Thu Apr 25 20:33:42 EDT 2024
When I tried to write a similar parser some years ago (or rewrite the
libsword parser(s) in Sword++), I discovered to my dismay that the wiki
page is quite insufficient. The lack of a formal specification for the
configuration format leaves several serious ambiguities and open
questions for anyone who wants to write a parser. Some examples:
* How should different parsing errors be handled?
* What are the phases of parsing? Should the output of each phase be
a single string, or a list of strings parsed separately by the next
phases (e.g. lines in the case of continuations)?
* Should continuations be handled in a phase before or after parsing
RTF? How should "\\\\\n\n" be parsed?
* How should a literal backslash be included? If it must be escaped,
in which phase of parsing is the escape resolved?
* Should official Microsoft RTF syntax rules be used for RTF control
word tokenization and semantics? Which version(s) of RTF exactly? The
rules on the Crosswire wiki page might differ from RTF specs.
* The wiki page states that "using the actual UTF-8 character is
preferred" to RTF "\u" escapes, but the RTF syntax only allows 7-bit
ASCII characters. Does this mean that all UTF-8 characters should be
converted to "\u"-style RTF escapes before handing off to the RTF
parser? Since a single "\u" escape can only carry a 16-bit value
(U+0000 to U+FFFF), how should code points beyond U+FFFF be handled?
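
To make the last question concrete, here is a minimal Python sketch of one
possible pre-processing step. It is not what libsword or the wiki
prescribes. The function name is hypothetical, and two things are
assumptions on my part: that code points beyond U+FFFF are split into a
UTF-16 surrogate pair (two consecutive "\u" escapes), and that each escape
is followed by "?" as a one-character ASCII fallback.

```python
def to_rtf_u_escapes(text: str) -> str:
    """Replace every non-ASCII character with RTF-style \\uN? escapes.

    Hypothetical helper for illustration only; ASCII characters
    (including backslashes) are passed through unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
            continue
        if cp > 0xFFFF:
            # Assumption: represent the code point as a UTF-16 surrogate pair.
            v = cp - 0x10000
            units = [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]
        else:
            units = [cp]
        for unit in units:
            # \uN takes a signed 16-bit decimal, so values above 32767
            # are written as negative numbers.
            n = unit - 0x10000 if unit > 0x7FFF else unit
            # Assumption: "?" is the single fallback character after \uN.
            out.append("\\u%d?" % n)
    return "".join(out)


if __name__ == "__main__":
    print(to_rtf_u_escapes("caf\u00e9"))
    print(to_rtf_u_escapes("\U0001F600"))
```

Whether a given RTF parser recombines such a surrogate pair into one
character is exactly the kind of thing a formal specification would have
to state.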
The original libsword implementation also seemed to suffer from various
issues and was not of much help to me, so I eventually abandoned the
effort.
J
On 16.04.24 10:20, domcox wrote:
>
> Only a very small, restricted subset of RTF markup is supported, see:
> https://wiki.crosswire.org/DevTools:conf_Files#RTF
>
>
> "David \"Judah's Shadow\" Blue" <yudahsshadow at gmx.com> writes:
>
>> I'm working on an info command to display some basic info about
>> modules, and I ran into the fact that, at least in the About entry,
>> the conf file can contain RTF formatting. As it stands I strip out
>> \pard, replace \par with \n, and strip out the tag portion of any
>> anchor/link tags found. My question is, are there any other tags that
>> are likely to appear in conf entries that I should be either handling
>> or stripping (since my front end does no formatting of text
>> whatsoever)?
>
>
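
For reference, the stripping approach described in the quoted message could
be sketched in Python roughly as follows. This is only an illustration of
those three steps, not the original front end's code. The function name is
hypothetical, and I am assuming that a single space after a control word
belongs to the control word (as in regular RTF) and that links appear as
HTML-style <a ...>...</a> tags.

```python
import re


def strip_about_markup(value: str) -> str:
    """Reduce an About value to plain text (hypothetical helper)."""
    # Drop \pard. The lookahead keeps longer control words that merely
    # start with "pard" from being matched.
    text = re.sub(r"\\pard(?![A-Za-z]) ?", "", value)
    # Turn \par into a newline. \pard is not matched here because the
    # lookahead rejects a following letter.
    text = re.sub(r"\\par(?![A-Za-z]) ?", "\n", text)
    # Remove the tag portion of anchors, keeping the link text.
    text = re.sub(r"</?a\b[^>]*>", "", text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    sample = ('First line\\par Second line\\pard\\par '
              'See <a href="https://example.org">the site</a>.')
    print(strip_about_markup(sample))
```

Any other control word from the wiki's subset would pass through this
sketch untouched, so it does not settle which further tags need handling.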