[sword-devel] RTFHTML filter bugs

Jaak Ristioja jaak at ristioja.ee
Thu May 22 02:39:01 MST 2014


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 21.05.2014 19:44, Greg Hellings wrote:
> On May 21, 2014 8:00 AM, "Jaak Ristioja" <jaak at ristioja.ee> wrote:
>> So this means that actually we want non-standard RTF (someone
>> should update the wiki). Should we assume UTF-8? Are you sure we
>> don't have any modules with ISO-8859-something encoded values?
> 
> The wiki states that the Unicode character is preferred, at least
> for conf files, over the RTF escaped value. Specifically it must be
> Unicode encoded as UTF-8 or CP1252.
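RTF stores a non-ASCII character as `\uN`, where N is a signed 16-bit decimal number, followed by a fallback character (one by default, per `\uc1`) for readers without Unicode support. The minimal sketch below turns such escapes back into the plain Unicode characters the wiki prefers. It assumes the default `\uc1`, characters inside the Basic Multilingual Plane, and no escaped backslashes before `\u`; `rtf_unicode_to_text` is a hypothetical helper, not part of the SWORD API.

```python
import re

# \u, a signed decimal, an optional space delimiter, then (assuming \uc1)
# one fallback character or \'hh byte escape that must be skipped.
RTF_UNICODE_ESCAPE = re.compile(r"\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|[^\\{}])?")


def rtf_unicode_to_text(value: str) -> str:
    """Replace RTF \\uN escapes (and their fallback character) with Unicode."""
    def replace(match):
        code = int(match.group(1))
        if code < 0:
            code += 65536  # RTF writes values above 32767 as negative numbers
        if not 0 <= code <= 0xFFFF:
            raise ValueError(f"RTF \\u value out of range: {match.group(1)}")
        return chr(code)

    return RTF_UNICODE_ESCAPE.sub(replace, value)
```

For example, `rtf_unicode_to_text(r"caf\u233 e")` yields `"café"`; control words such as `\ul` are left alone because `\u` must be followed by a digit or minus sign.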

Do I understand correctly that, before parsing any (possibly RTF)
configuration field, we must first parse the Encoding= field to
determine the encoding of all the other fields?

IMHO not all (!!!) valid UTF-8 is valid CP1252. For example,
  11000010 10000001
is a valid UTF-8 bytestream (one character, U+0081), but not a valid
CP1252 bytestream, because the last byte (0x81) is not defined in
CP1252. Conversely,
  10000000 00100001
is a valid CP1252 bytestream (euro sign € and exclamation mark !), but
not a valid UTF-8 bytestream, because a UTF-8 character CANNOT begin
with 10xxxxxx. Some sequences are valid in both:
  11010101 10100001
is a valid UTF-8 bytestream (1 character, U+0561) and also a valid
CP1252 bytestream (2 characters, Õ¡), whereas
  10000001 10000001
is neither valid CP1252 nor valid UTF-8.
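These classifications can be checked mechanically with Python's strict codecs, which reject CP1252's undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) and every ill-formed UTF-8 sequence. The list below also includes two near misses that strict UTF-8 decoders reject: an overlong lead byte (0xC0) and a lead byte followed by a byte that is not of the form 10xxxxxx. `is_valid` and `classify` are illustrative helper names.

```python
def is_valid(data: bytes, codec: str) -> bool:
    """True if data decodes under codec with strict error handling."""
    try:
        data.decode(codec)  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def classify(bits: str) -> tuple:
    """Return (valid UTF-8, valid CP1252) for a space-separated bit string."""
    data = bytes(int(octet, 2) for octet in bits.split())
    return is_valid(data, "utf-8"), is_valid(data, "cp1252")


EXAMPLES = [
    "11000010 10000001",  # UTF-8 only: U+0081; 0x81 is undefined in CP1252
    "10000000 00100001",  # CP1252 only: "€!"; 0x80 cannot start UTF-8
    "11010101 10100001",  # both: U+0561 in UTF-8, "Õ¡" in CP1252
    "10000001 10000001",  # neither
    "11000000 10000001",  # neither: 0xC0 is an overlong lead, 0x81 undefined
    "11010101 00100001",  # CP1252 only: 0x21 is not a continuation byte
]

for bits in EXAMPLES:
    utf8_ok, cp1252_ok = classify(bits)
    print(f"{bits}  UTF-8: {utf8_ok}  CP1252: {cp1252_ok}")
```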

> Did you literally split the individual bytes of the euro character 
> around the other bytes?  What possibly valid encoding permits that?
> Is that a valid UTF-8 sequence? If not, then the file fails to be
> UTF-8 encoded and the engine either will error or otherwise behave
> in undefined ways due to invalid input.

Yes, I did literally split that. No valid encoding permits that. But of
course we should not assume that all user input is valid, if we want to
prevent undefined behaviour, crashes, exploits, etc. If the Sword
project wants to allow code with undefined behaviour (with respect to
the C++ standard), I do not want to be part of this project.
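Greg's question, whether a multi-byte character with other bytes spliced into its middle is valid UTF-8, is exactly what a strict validator answers without ever reading past the end of a truncated sequence. The sketch below is written in Python for brevity; a C++ version would use the same table of well-formed byte ranges from RFC 3629 (no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF). `first_invalid_utf8` is a hypothetical helper name.

```python
def first_invalid_utf8(data: bytes):
    """Return the offset of the first malformed UTF-8 sequence, or None if valid."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII
            i += 1
            continue
        # (sequence length, allowed range of the second byte)
        if 0xC2 <= lead <= 0xDF:
            length, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xE0:
            length, lo, hi = 3, 0xA0, 0xBF  # rejects overlong 3-byte forms
        elif lead == 0xED:
            length, lo, hi = 3, 0x80, 0x9F  # rejects surrogates U+D800..U+DFFF
        elif 0xE1 <= lead <= 0xEF:
            length, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF0:
            length, lo, hi = 4, 0x90, 0xBF  # rejects overlong 4-byte forms
        elif 0xF1 <= lead <= 0xF3:
            length, lo, hi = 4, 0x80, 0xBF
        elif lead == 0xF4:
            length, lo, hi = 4, 0x80, 0x8F  # rejects code points above U+10FFFF
        else:
            return i  # continuation byte as lead, 0xC0/0xC1, or 0xF5..0xFF
        end = i + length
        if end > n:
            return i  # sequence truncated by the end of input
        if not lo <= data[i + 1] <= hi:
            return i
        if any(not 0x80 <= b <= 0xBF for b in data[i + 2:end]):
            return i  # e.g. an ASCII byte spliced into a multi-byte character
        i = end
    return None
```

For instance, the euro sign's bytes E2 82 AC with `!` inserted after the second byte (`b"\xe2\x82!\xac"`) are reported as malformed at offset 0.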

I suggest we be strict in all parsing, because lax parsing can lead to
security issues, as I presented in another thread on this list. If we
want to accept non-conforming user input, we should at minimum output a
warning, but still parse it in a secure manner that neither causes
undefined behaviour nor provides an attack vector.
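The policy above (strict by default; if lenient, warn but still decode safely) could look like the following sketch. `decode_field` and its `strict` flag are hypothetical; the lenient path relies on Python's `errors="replace"`, which substitutes U+FFFD for malformed input instead of passing raw bytes on to later stages.

```python
import warnings


def decode_field(raw: bytes, encoding: str, strict: bool = True) -> str:
    """Decode one configuration value.

    strict=True rejects malformed input outright. strict=False accepts it,
    but still decodes safely: malformed bytes are replaced by U+FFFD and a
    warning records where the first problem was found.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as err:
        if strict:
            raise ValueError(
                f"malformed {encoding} at byte {err.start}: {raw!r}"
            ) from err
        warnings.warn(f"malformed {encoding} at byte {err.start}; using U+FFFD")
        return raw.decode(encoding, errors="replace")
```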

Jaak
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQgcBAEBAgAGBQJTfcWyAAoJELozJlbjIn79OnE//2jy91KMUBgxm4JaePhPfTnV
fise05REobJsRiXyqnQO4QjXtZ/Y+c+6i9FGljhTB9irZAU0la9+Jl3xL3vOrDDr
KLD++29WfOx7JxDdkvq1s2Lv4wTBRXAYf6e/WmHpKZZwErcllNgUxV2M/Ztcw/7j
W3AwWOF4fKKzMGnDCzNkQ/+pNOmcxgMj+tNkuODJf1RdvNnZgijmEIaR1pkMszTm
+AUS5zZXKc2EWsf6Yv5osjuhsqa9diNkjcq5tNgVsowjOZVR236yKMjnr4IcazIH
jx1wws4f+J2e9meAqycRjYeT++sMCU0OdsgkDwU92Qkb2EkS31cIRxPWCb6+hu+f
u+IbUgKjK4rsJ9dfu7A9uK80+F9LzBHsJZIlox1aXsWvZC/+nvWDyX6oCI/ZnNoG
/6Cm/85Znxedp5pJPgXDqxvTWKFyhLdHFKsXcB9y2PLbCSopUsGx9Wq7zvkm5p8n
mJeyJybhuXN6X+4peNYzwMSZx6dEPMY9ICIaxfaP8s4l1J7p1TNU/3SdYCVCi7us
zBZt5F7SoyIaXv5OyOtdEIjbzZ/v5tBPXGakoNmaa25rI7/UaSqS75B8EXXa4Xpw
rQprSwBSjkehdG5dMEjUGEPFKgvyUNgJR9B9rQo7BBAkmDhFj5r9PgshTsTR3W4L
vU365lrci90RngYVbjWSA4bpd7AQiN1Yv+a+fBgHLJ+msVaDpgN54cBQaEUVy1G1
vSvpUMQZ1TQkwafTGFgV1g6dTLMLSP/S9L6aOuTclM7LE/A4nYCrPhB2H0m08/Ob
fyapRmF2erIya6AHfPhwU40SdbtM2MrJ7Pr05hudpCwSYyMoVsWA1Nz5A5hp0vpc
beHz70KNAcLbElS+4hnffLxn7n1+FzGrsRSwy0gWsUB0Ib6RwIiIztB2Wv79PdkQ
yl/DBjVrzZ3yZiu/Sssqus+uvzY/UOxGyLptpWyKGPxA5G80K9dwfrbh6cTSZqFk
pD4JziH1hECukGnOX1tr09YViWb09pcxNIQqWv01i5YdgVhbEiRXnOH//f8hH2gd
dDUZBpPd8NKHrhBkPoBH6jvu2fItx/Nn48IGgz5L6h9T3ZcfEN7sPmkkE3V3nWtW
vMoyqCc7/7K2LhD5pjDsvyhjyyNwRaerd3D7jq7/lQEK1Os8MHk7L7c5petUV+j7
tbhUkUTet8y0M2pEcEeNiQRbh4QCN5hDWJMLmEJ2ze7kCuJ0MMdefVB5vPxoCfq1
2XuSZpxrEUhiA8/bL6X2Jc4roS2oqiUnsZHzfGcK2y0/g3Maji1mJKFrlp7tAIJ/
ViJL2Ctom+pNUWLd0mboRgkl82kc7uI5pEj3luqyytbfKTwoY842QFQliRNdpGtx
s3jSgzSGlYf8OdaECutSe1bs0KBlTELzZXRLpNRruVUYoLCy6xFU/3xAInWRewHM
B3O55rf+UyKl2GvEiZwBS1TIzAcyK6xPmrgDlgryhEgzldRyzWKVSqJylsiK/Cd9
sXZ0EH4np0WfkRBh/2RmoAJQqi97ZyV2gZ7MhgYlF/4k+rz8XVjBLAXeOsRj6Zi7
W/SYV9rOpI0H6g8RAm5CLK3zXqxCLeVjb5lY7mhXzIzRAtJv2k+RvlRuaSmDgSAv
nvQOXO4PBTd3DzX4rn4dPH572/lU1zIXQ3nTOGVR7zasx9e21TxJMqNDTrotZHVJ
DiJPlxRbUAsJfJgxzC1w0gOZL5texIqyf7mb9YD93Gj6QDwmU/CATy65hs+Fvy73
XH5ZCSHVra+MgruYe9rc6jzXImqAOLMqC8EPJ6MI22hniNUgb6HE0NhkUatcHa8l
vPwuRV2qYFfB/XgAWdIBFXZ0bU+sHtn1Sh9KFslKN8WdSK9wGmqFrjsVnlwiHM9g
H92zY3x4eQvgbmeTSj/NsMoNGGYF06aeTTa34kGn7ZNZJ/t6lbsNLy1/kLI4ICHd
M/4MuiMhdbg/gW4+LRePQZIp/YKLR+H4YFx+DqAQQsylNIcZ37JN53ZAZRFA6fJd
jTtwBrr17u2gzT6Ire8AfN/q3tp2Q196wlRYn/4W9fQFVNMgvJlcz4TcbNfiz4XF
QSPdIgE8T1lmRW7JEY9jX7lxl9dcS3gVVpu5Bn9fJJB0dAJfq5DmCOh2maPZk1NW
dNPfd9h56k98AacKj984qqlrDreleC5S8IBNqDDD0IuETqAXU76uIOVjQBQ1ecH0
bEb1F/Bf7CBnGObrfLicXIJ8bewcqmtJNNettzjnBzBzV+ULXR3cb87vvAqr5AUR
MhhyW+Ljc7sGmi10d9nzU7y3JppDA/5FQ2EgghhQORBzHCQZrRVYUOzLF04MJrDL
xqDMYi5Ct25LNpXT4FWgZzshN9PkMYjGLi+CHHoxR0ObbGmc0dkNMAsZyNbrlutG
TXj8u+O8VXaHS2Qr+KXSCVkWxaXhLGv0eVZAYtP6Fl0h5vTe5MkOUXxFN1hSJQso
wLQp2XSXNxGb0JPDVfxgNdA/yz86zh1e008HpnlXtEIkuSVzoF7QSkfsaqFv71wb
CwdWu59yeAKc8h1Ur3qbFr/7gzeitxRT07OvZ7P10toaKan5EUbxnomKlNpm9DTz
H09Wc06pUQEkUZssFdxwIu8sdIT6BPeQuCnkpcJBvKNr7CTwvsxD5Aj50c9uwYqI
mR9DRnowbbkZVs4t9qjKFX56AQe0l24QLqAqy5NYpgMRaaTVC5bz2AlD915pjvmN
JzxZFt6xKFezgv+dsP6m
=W0Oi
-----END PGP SIGNATURE-----