<p dir="ltr"><br>
On May 21, 2014 8:00 AM, "Jaak Ristioja" <<a href="mailto:jaak@ristioja.ee">jaak@ristioja.ee</a>> wrote:<br>
><br>
> So this means that actually we want non-standard RTF (someone should<br>
> update the wiki). Should we assume UTF-8? Are you sure we don't have any<br>
> modules with ISO-8859-something encoded values?<br>
></p>
<p dir="ltr">The wiki states that the literal Unicode character is preferred, at least for conf files, over the RTF-escaped value. Specifically, the character must be encoded as UTF-8 or CP1252.</p>
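<p dir="ltr">As a rough illustration of what "UTF-8 or CP1252" could mean for a reader of conf values, here is a minimal Python sketch. The function name <code>decode_conf_value</code> and the "try strict UTF-8 first, then fall back to CP1252" order are assumptions made for this example; nothing in this thread says the engine actually works this way.</p>

```python
def decode_conf_value(raw: bytes) -> str:
    """Decode a conf value: strict UTF-8 first, CP1252 as a fallback.

    Hypothetical helper for illustration only, not the engine's real API.
    """
    try:
        # Strict decoding rejects any malformed UTF-8 byte sequence.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: CP1252 is the only other encoding named above.
        # Python's cp1252 codec still raises on the few bytes it leaves
        # undefined (for example 0x81), so bad input is not silently accepted.
        return raw.decode("cp1252")


euro = "\u20ac"  # the Euro sign

# The same character arrives as three bytes in UTF-8 and one byte in CP1252.
assert decode_conf_value(euro.encode("utf-8")) == euro
assert decode_conf_value(euro.encode("cp1252")) == euro
```

<p dir="ltr">Trying UTF-8 first is the safer order in this sketch, because almost any byte string decodes "successfully" as CP1252, whereas text that is not UTF-8 rarely passes a strict UTF-8 decode by accident.</p>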
<p dir="ltr">> If we choose any ASCII superset encoding we have to consider at least<br>
> the two points:<br>
><br>
> * Since the RTF control words and delimiters are specified in ASCII<br>
> only, we need to decide how the bytes of the superset act as<br>
> delimiters and parts of "RTF" control words. For example, whether the<br>
> Unicode letter, number, spacing, punctuation, control etc. characters<br>
> constitute parts of RTF control words or act as delimiters.<br>
><br>
> * In case of encodings where characters may consist of multiple bytes<br>
> (e.g. the variable-length UTF-8) we must consider the character<br>
> boundaries. We can't just pass through any non-ASCII byte values. For<br>
> example, the following bit sequence wouldn't make sense:<br>
><br>
> 11100010 01011100 10000010 01110001 10101100 01100011<br>
></p>
<p dir="ltr">Did you literally interleave the individual bytes of the euro character with the other bytes? What valid encoding could possibly permit that? Is that even a valid UTF-8 sequence? If not, then the file is not UTF-8 encoded, and the engine will either raise an error or behave in undefined ways because its input is invalid.</p>
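<p dir="ltr">The validity question can be checked mechanically. The sketch below feeds the quoted bit sequence, plus the two orderings quoted as correct, to Python's strict UTF-8 decoder. The decoder is only a stand-in for whatever validation the engine performs (an assumption), and the helper names <code>from_bits</code> and <code>is_valid_utf8</code> are made up for this example.</p>

```python
def from_bits(bits: str) -> bytes:
    """Turn space-separated 8-bit groups into a bytes object."""
    return bytes(int(group, 2) for group in bits.split())


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (strict decoding)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The interleaved sequence from the quoted message: E2 5C 82 71 AC 63.
interleaved = from_bits("11100010 01011100 10000010 01110001 10101100 01100011")

# The two orderings the quoted message gives as correct.
euro_then_qc = from_bits("11100010 10000010 10101100 01011100 01110001 01100011")
qc_then_euro = from_bits("01011100 01110001 01100011 11100010 10000010 10101100")

# 0xE2 announces a three-byte character, but the next byte (0x5C, a
# backslash) is not a continuation byte, so strict decoding fails.
assert not is_valid_utf8(interleaved)

# Both contiguous orderings decode cleanly to the Euro sign plus "\qc".
assert euro_then_qc.decode("utf-8") == "\u20ac\\qc"
assert qc_then_euro.decode("utf-8") == "\\qc\u20ac"
```

<p dir="ltr">So the interleaved bytes are not valid UTF-8 at all: a strict decoder rejects them at the second byte.</p>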
<p dir="ltr">--Greg</p>
<p dir="ltr">> which is a UTF-8 encoded Euro sign, €, interleaved with bytes of the<br>
> ASCII string "\qc". It just doesn't make sense, whereas the following<br>
> sequences would be correct:<br>
><br>
> 11100010 10000010 10101100 01011100 01110001 01100011 (€\qc)<br>
> 01011100 01110001 01100011 11100010 10000010 10101100 (\qc€)<br>
><br>
> So, depending on the encoding, it would be correct to detect such cases;<br>
> otherwise we end up with invalid Unicode output.<br>
><br>
> Blessings,<br>
> Jaak<br>
><br>
> On 21.05.2014 15:19, Chris Burrell wrote:<br>
> > I believe some conf files have direct Unicode (rather than escaped<br>
> > sequences) in them and that is preferred.<br>
> ><br>
> > On 20 May 2014 23:28, "Jaak Ristioja" <<a href="mailto:jaak@ristioja.ee">jaak@ristioja.ee</a><br>
> > <mailto:<a href="mailto:jaak@ristioja.ee">jaak@ristioja.ee</a>>> wrote:<br>
> ><br>
> > I've never done BiDi, but I'm not sure I need to take that into account<br>
> > while fixing the RTF parsing. As I currently understand it, this<br>
> > particular piece of code does not support any part of the RTF spec<br>
> > dealing with bidirectional text handling. Hence all BiDi information<br>
> > contained in the configuration file strings (e.g. About=) is contained<br>
> > either in the plain ASCII text or the \u<num> Unicode escapes which this<br>
> > algorithm should pass through unmodified.<br>
> ><br>
> > ...except for HTML entities which should actually be escaped. This bug<br>
> > in the algorithm I previously failed to notice. Additionally I forgot<br>
> > that non-ASCII characters in the input string should also lead to<br>
> > parsing failure.<br>
> ><br>
> > Jaak<br>
> ><br>
> ><br>
> > On 20.05.2014 21:01, David Haslam wrote:<br>
> > > Take care with Right to Left languages such as Hebrew.<br>
> > ><br>
> > > i.e. After any patches to the filter, please include some testing<br>
> > for BiDi<br>
> > > text in the About= field and others.<br>
> > ><br>
> > > David<br>
> > ><br>
> > ><br>
> > ><br>
> > > --<br>
> > > View this message in context:<br>
> > <a href="http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html">http://sword-dev.350566.n4.nabble.com/RTFHTML-filter-bugs-tp4653969p4653970.html</a><br>
> > > Sent from the SWORD Dev mailing list archive at Nabble.com.<br>
> > ><br>
> > > _______________________________________________<br>
> > > sword-devel mailing list: <a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a><br>
> > <mailto:<a href="mailto:sword-devel@crosswire.org">sword-devel@crosswire.org</a>><br>
> > > <a href="http://www.crosswire.org/mailman/listinfo/sword-devel">http://www.crosswire.org/mailman/listinfo/sword-devel</a><br>
> > > Instructions to unsubscribe/change your settings at above page<br>
> > ><br>
><br>
></p>