[sword-devel] Unicode and newlines

Victor Porton sword-devel@crosswire.org
Sun, 23 Jun 2002 19:02:53 +0600


Why, in non-Unicode mode (when !isUnicode()), do you apply special processing 
to '\n' and '\r' (in preptext()), such as '\n'->' ', while in Unicode mode you 
don't call preptext() and leave '\n' and '\r' as they are?

My guess: maybe even under "broken-line" Windows, a line end in UTF-8 text is 
encoded the same way as in Unix, not as "\r\n". Is that so?
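
For what it is worth, UTF-8 encodes '\n' and '\r' as the single bytes 0x0A 
and 0x0D, exactly as ASCII does, and those byte values never occur inside a 
multi-byte sequence. So the encoding by itself does not decide which line-end 
convention a file uses. Below is a minimal sketch of normalizing line ends at 
the byte level. The helper normalizeLineEnds() is hypothetical and is not 
part of SWORD:

```cpp
#include <string>

// Hypothetical helper (not a SWORD function): converts "\r\n" and a lone
// "\r" to "\n". It works on raw bytes, so it is safe for UTF-8 text, where
// 0x0D and 0x0A never appear inside a multi-byte sequence.
std::string normalizeLineEnds(const std::string &bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (std::string::size_type i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            // Treat "\r\n" as a single line end by skipping the '\n'.
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++i;
            }
            out += '\n';
        } else {
            out += bytes[i];
        }
    }
    return out;
}
```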

I first created the '\n' replacement system described in my earlier 
messages. I recently found that things are different when a module is 
encoded in UTF-8.

Replacing '\n' with ' ' when reading module entries (via getRawEntry()) 
creates problems with editing, because the pretty formatting of the XML is 
lost. There is no such problem with UTF-8. (However, I haven't yet checked 
whether empty lines in entries work even with UTF-8.) Are we going to switch 
completely to UTF-8? What is the long-term solution to the broken formatting 
and related problems?
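
To illustrate the formatting loss, here is a minimal sketch. The function 
flattenLineBreaks() is a hypothetical stand-in for this kind of replacement 
and is not the actual preptext() code. It also assumes that '\r' is mapped 
to ' ' in the same way as '\n'.

```cpp
#include <string>

// Hypothetical stand-in for the '\n'->' ' replacement discussed above;
// this is NOT the real preptext() implementation.
// Assumption: '\r' is replaced by a space as well.
std::string flattenLineBreaks(const std::string &entry) {
    std::string out;
    out.reserve(entry.size());
    for (std::string::size_type i = 0; i < entry.size(); ++i) {
        const char c = entry[i];
        out += (c == '\n' || c == '\r') ? ' ' : c;
    }
    return out;
}
```

With such a replacement, a pretty-printed entry like 
"<entry>\n  <def>word</def>\n</entry>" comes back as a single line, and the 
original line breaks cannot be recovered when the entry is edited and saved.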
-- 
Victor Porton (porton@ex-code.com)