[sword-devel] RTFHTML filter bugs

Jaak Ristioja jaak at ristioja.ee
Mon May 19 15:09:26 MST 2014


Hi!

1) According to http://www.crosswire.org/wiki/DevTools:conf_Files the
\u control word should be followed by a 16-bit signed integer. The
wiki page doesn't say how the integer is written, but I assume it is
in ASCII in decimal form.

The RTFHTML filter code appears to incorrectly parse the following
strings:

  "\u-999999" -> getUTF8FromUniChar(48577)
  "\u-99999" -> getUTF8FromUniChar(31073)
  "\u-0001" -> getUTF8FromUniChar(65535)
  "\u-00" -> getUTF8FromUniChar(0)
  "\u-0" -> getUTF8FromUniChar(0)
  "\u00" -> getUTF8FromUniChar(0)
  "\u001" -> getUTF8FromUniChar(1)
  "\u99999" -> getUTF8FromUniChar(34463)
  "\u-" -> getUTF8FromUniChar(0)
  "\u--" -> getUTF8FromUniChar(0)
  "\u--2" -> getUTF8FromUniChar(0)
  "\u-a" -> getUTF8FromUniChar(0)

I think all these should instead fail.
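
One strict reading of this rule, as a minimal Python sketch rather than the
filter's actual C++ code: accept only an optional minus sign followed by a
canonical decimal number within the signed 16-bit range. The name
parse_u_param is hypothetical, and rejecting leading zeros and "-0" is an
assumption inferred from the list of inputs above, not something the wiki
page specifies.

```python
import re

# Optional '-', then "0" or a decimal number without leading zeros.
# [0-9] is used instead of \d so that only ASCII digits are accepted.
_CANONICAL_DECIMAL = re.compile(r"-?(?:0|[1-9][0-9]*)")


def parse_u_param(param: str) -> int:
    """Parse the text that follows the u control word (hypothetical helper).

    Returns the value as a signed 16-bit integer or raises ValueError.
    """
    if param == "-0" or not _CANONICAL_DECIMAL.fullmatch(param):
        raise ValueError(f"malformed \\u parameter: {param!r}")
    value = int(param)
    if not -32768 <= value <= 32767:
        raise ValueError(f"\\u parameter out of 16-bit range: {param!r}")
    return value
```

With this, parse_u_param("8364") returns 8364, while every parameter in the
list above (the part after \u) raises ValueError.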

2) If an exception is thrown, text might be left containing a partial
result or the original value.
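
A common way to avoid this, sketched in Python under the assumption that the
filter rewrites a caller-owned buffer in place: build the result in a
temporary and overwrite the buffer only after the conversion has succeeded.
filter_in_place and convert are hypothetical names, not Sword API.

```python
from typing import Callable


def filter_in_place(buf: bytearray, convert: Callable[[bytes], bytes]) -> None:
    """Replace the contents of buf with convert(buf), all or nothing."""
    # convert() works on a copy, so if it raises, buf is left untouched.
    result = convert(bytes(buf))
    # Commit only once the whole conversion has succeeded.
    buf[:] = result
```

If convert raises, the caller still sees the original input and never a
half-converted mix.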

3) For control word \pard (and similarly for \par and \qc) it
incorrectly parses \pardx as \pard and "x", where it should instead
fail due to an invalid control word \pardx.
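
The longest-match rule for control words could look roughly like this. It is
a Python illustration, not the filter's code: read_control_word and SUPPORTED
are hypothetical names, and the supported set is only an assumption covering
the three words mentioned here. The letters are read greedily, so \pardx is
seen as the single word "pardx" and rejected.

```python
import re

# A control word: backslash, ASCII letters, optional numeric parameter,
# and at most one space that acts as a delimiter and is not output.
_CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?[0-9]+)? ?")

# Assumption: only the words discussed in this message are supported.
SUPPORTED = {"par", "pard", "qc"}


def read_control_word(text: str, pos: int):
    """Return (word, parameter or None, next position) or raise ValueError."""
    match = _CONTROL_WORD.match(text, pos)
    if match is None:
        raise ValueError(f"no control word at offset {pos}")
    word, param = match.group(1), match.group(2)
    if word not in SUPPORTED:
        raise ValueError(f"unsupported control word: \\{word}")
    return word, param, match.end()
```

For example, read_control_word(r"a\par b", 1) returns ("par", None, 6): the
delimiter space is consumed as part of the control word, so it never reaches
the output.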

4) \par incorrectly appends a newline.

5) "a\qc b" is converted to "a<center> b", but should instead be
"a<center>b</center>" (the ' ' RTF delimiter is output, and the HTML
</center> tag is missing).

6) "a\par b" is converted to "a<p/> b", but should probably be
"<p>a</p><p>b</p>" (the ' ' RTF delimiter is output, and the HTML <p>
and </p> tags are missing).

7) Weird combinations of \par, \pard and \qc result in broken HTML
fragments or HTML fragments with unbalanced start and end tags.
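
One way to rule out unbalanced output, whatever order the control words
arrive in, is to let a small writer track the open elements and close them
itself. A minimal Python sketch; HtmlWriter and its methods are hypothetical
and not part of Sword, and text is not HTML-escaped here to keep it short.

```python
class HtmlWriter:
    """Collects HTML output and guarantees every opened tag gets closed."""

    def __init__(self) -> None:
        self._parts = []  # output fragments, in order
        self._open = []   # stack of currently open tag names

    def open(self, tag: str) -> None:
        self._parts.append(f"<{tag}>")
        self._open.append(tag)

    def text(self, data: str) -> None:
        self._parts.append(data)

    def close(self) -> None:
        """Close the most recently opened tag, if any."""
        if self._open:
            self._parts.append(f"</{self._open.pop()}>")

    def finish(self) -> str:
        """Close whatever is still open and return the full fragment."""
        while self._open:
            self.close()
        return "".join(self._parts)
```

Calling text("a"), open("center"), text("b") and then finish() yields
"a<center>b</center>", because finish() supplies the missing end tag.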

8) Unsupported control sequences do not cause the function to fail,
but are passed to output as plain text (including the backslash).

9) Unescaped '{', '}' and '\' characters are not handled properly (to
pass these from RTF one would need to use the control symbols "\{",
"\}" and "\\" respectively).

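The escaping rule for plain text runs can be sketched as follows, again in
Python. decode_text_run is a hypothetical helper, and treating every
unescaped '{', '}' or stray backslash as an error is an assumption (full RTF
uses braces for groups), chosen to match the fail-on-invalid-input approach
of the points above.

```python
def decode_text_run(run: str) -> str:
    """Decode the control symbols for '{', '}' and backslash in a text run.

    The run is assumed to contain no control words; anything else that
    starts with a backslash, or a bare brace, raises ValueError.
    """
    out = []
    i = 0
    while i < len(run):
        ch = run[i]
        if ch == "\\":
            # A backslash must introduce one of the three control symbols.
            if i + 1 < len(run) and run[i + 1] in "{}\\":
                out.append(run[i + 1])
                i += 2
                continue
            raise ValueError(f"stray backslash at offset {i}")
        if ch in "{}":
            raise ValueError(f"unescaped {ch!r} at offset {i}")
        out.append(ch)
        i += 1
    return "".join(out)
```

Here decode_text_run(r"a\{b\}c") returns "a{b}c", while "a{b" raises
ValueError.
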
Maybe I'll get around to fixing this someday during daytime. To save me
extra work, I'd appreciate any comments on this before I start any
coding, especially if the Sword library needs to deviate from the RTF
specifications.


Blessings,
Jaak

PS: I'm glad there are no memory errors in this function. :)
PPS: Please forgive me for having studied formal languages.


