[sword-devel] RTF in conf files

Jaak Ristioja jaak at ristioja.ee
Thu Apr 25 20:33:42 EDT 2024


When I tried to write a similar parser some years ago (or rather, to
rewrite the libsword parser(s) in Sword++), I discovered to my dismay
that the wiki page is quite insufficient. The lack of a formal
specification for the configuration format leaves serious ambiguities
and open questions for anyone writing a parser. Some examples:

   * How should different parsing errors be handled?
   * What are the phases of parsing? Should the output of each phase be
     a single string, or a list of strings parsed separately by the next
     phases (e.g. lines in the case of continuations)?
   * Should continuations be handled in a phase before or after parsing
     RTF? How should "\\\\\n\n" be parsed?
   * How should a literal backslash be included? If it is escaped, in
     which phase of parsing?
   * Should the official Microsoft RTF syntax rules be used for RTF
     control word tokenization and semantics? Which version(s) of RTF
     exactly? The rules on the Crosswire wiki page might differ from the
     RTF specs.
   * The wiki page states that "using the actual UTF-8 character is
     preferred" to RTF "\u" escapes, but the RTF syntax only allows 7-bit
     ASCII characters. Does this mean that all non-ASCII characters
     should be converted to "\u"-style RTF escapes before handing off to
     the RTF parser? Since the "\u" escapes can only handle code points
     U+0000 to U+FFFF, how should code points beyond U+FFFF be handled?
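
Two of the questions above can be made concrete with a small Python
sketch. Everything in it is an assumption made for illustration, not
libsword behaviour: "\\\\\n\n" is read as a C-style string literal (two
backslashes followed by two line feeds), a continuation is taken to be
a backslash directly before a line feed, and "\\" is taken to be an
escaped literal backslash as in RTF. The hypothetical to_u_escapes()
shows one possible answer for code points beyond U+FFFF: emit a UTF-16
surrogate pair as two "\uN?" escapes with signed 16-bit decimal N,
which is what some RTF writers do. Whether any SWORD front end accepts
that is not established here.

```python
def join_continuations(text: str) -> str:
    # Assumed rule: a backslash right before a line feed marks a
    # continuation; drop the backslash and keep the line feed.
    return text.replace("\\\n", "\n")


def unescape_backslashes(text: str) -> str:
    # Assumed rule: "\\" is an escaped literal backslash, as in RTF.
    return text.replace("\\\\", "\\")


def to_u_escapes(text: str, fallback: str = "?") -> str:
    # Hypothetical helper: rewrite every non-ASCII character as RTF-style
    # \uN escapes, one per UTF-16 code unit, each followed by a fallback
    # character. ASCII input (including backslashes) is left untouched.
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
            continue
        units = ch.encode("utf-16-be")
        for i in range(0, len(units), 2):
            unit = int.from_bytes(units[i:i + 2], "big")
            if unit >= 0x8000:
                unit -= 0x10000  # RTF's \u takes a signed 16-bit value
            out.append(f"\\u{unit}{fallback}")
    return "".join(out)


raw = "\\\\\n\n"  # two backslashes, then two line feeds

# Ordering 1: continuations are resolved before escapes.
continuations_first = unescape_backslashes(join_continuations(raw))
# Ordering 2: escapes are resolved before continuations.
escapes_first = join_continuations(unescape_backslashes(raw))

print(repr(continuations_first))
print(repr(escapes_first))
print(to_u_escapes("\u00e9 \U0001F600"))
```

With these assumed rules the two orderings disagree: the first leaves a
lone backslash followed by two line feeds, the second leaves only the
two line feeds. For the "\u" part, U+00E9 becomes "\u233?" and U+1F600
becomes "\u-10179?\u-8704?".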

The original libsword implementation also seemed to suffer from various
issues and was not of much help to me, so I eventually abandoned the
effort.

J

On 16.04.24 10:20, domcox wrote:
> 
> Only a very small, restricted subset of RTF markup is supported, see:
> https://wiki.crosswire.org/DevTools:conf_Files#RTF
> 
> 
> "David \"Judah's Shadow\" Blue" <yudahsshadow at gmx.com> writes:
> 
>> I'm working on an info command to display some basic info about
>> modules, and I ran into the fact that, at least in the About entry,
>> the conf file can contain RTF formatting. As it stands I strip out
>> \pard, replace \par with \n, and strip out the tag portion of any
>> anchor/link tags found. My question is, are there any other tags that
>> are likely to appear in conf entries that I should be either handling
>> or stripping (since my front end does no formatting of text
>> whatsoever)?
> 
> 
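
For reference, the stripping described in the quoted message could be
sketched roughly as below. This is only an illustration under stated
assumptions: about_to_plain_text() is a hypothetical name, a single
space after a control word is treated as its delimiter and consumed (as
in RTF proper), and links are assumed to be HTML-style <a ...>...</a>
tags. It handles only the markup mentioned in the quoted message and
nothing else.

```python
import re


def about_to_plain_text(value: str) -> str:
    # Drop \pard first. The lookahead ensures the control word ends here,
    # so a longer control word merely starting with "pard" is not touched.
    text = re.sub(r"\\pard(?![a-zA-Z]) ?", "", value)
    # Replace \par with a line feed; the lookahead keeps it from matching
    # the prefix of a longer control word.
    text = re.sub(r"\\par(?![a-zA-Z]) ?", "\n", text)
    # Remove only the tag portion of anchors and keep the link text.
    text = re.sub(r"</?a\b[^>]*>", "", text, flags=re.IGNORECASE)
    return text


sample = (
    r"First line\par Second line\pard\par "
    r'See <a href="https://example.org">example</a>.'
)
print(about_to_plain_text(sample))
```

Anything else, including "\u" escapes and escaped backslashes, passes
through this sketch untouched, which is exactly where the open
questions above start to matter.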


