<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=UTF-8" http-equiv="Content-Type">

</head>

<body bgcolor="#ffffff" text="#000000">

<font face="Helvetica, Arial, sans-serif">For what it's worth, speaking

as a newbie who had to work out how to get a module into a state where

it was worth sending in, I think making osis2mod as "general purpose

and common as possible" is a good idea. Newbies still need to test

their source before sending it in, and headaches with osis2mod can

deter them from getting a file ready to submit. Also, if osis2mod

is easier to use, fewer questions about how to use it are likely to pop

up on sword-devel.<br>

<br>

Daniel</font><br>

<br>

Chris Little wrote:

<blockquote cite="mid:47C39D84.6020006@crosswire.org" type="cite">

  <pre wrap="">

DM Smith wrote:

  </pre>

  <pre wrap=""><!---->  &gt; I mostly agree. But once I know that the module is NFC, I'd rather not

  </pre>

  <blockquote type="cite">

    <pre wrap="">take the hit. I must have made the KJV into a module 100 or more times 

before I got it right.

    </pre>

  </blockquote>

  <pre wrap=""><!---->

What would you think of making normalization the default and using a 

switch to turn it off? It doesn't particularly matter for me, but I'm 

thinking of a complete newbie trying to make a module. The defaults 

should be as general purpose and common as possible. Then again, since 

we build from source for our releases (not from submitted compiled 

modules), perhaps it doesn't matter either way.

  </pre>
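
  <pre wrap="">A minimal sketch of the "normalize by default, switch to turn it off"
idea discussed above. It is written in Python purely for illustration and
is not osis2mod code: the function name prepare_text and its normalize
parameter are hypothetical, and no real osis2mod flag is implied.

```python
import unicodedata


def prepare_text(text, normalize=True):
    """Return text ready for import; NFC-normalize unless the caller opts out."""
    if not normalize:
        # Hypothetical opt-out: the caller asserts the source is already NFC,
        # so the extra normalization pass is skipped.
        return text
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
print(len(prepare_text(decomposed)))                   # prints 1 (composed)
print(len(prepare_text(decomposed, normalize=False)))  # prints 2 (untouched)
```

With this shape a newcomer gets normalized output without knowing that
NFC exists, while someone rebuilding a known-good source many times can
skip the cost explicitly.
  </pre>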

  <blockquote type="cite">

    <blockquote type="cite">

      <pre wrap="">Second, your comment about needing UTF-8 input makes me think we should

go ahead and add encoding conversion to the importers as well, possibly

with automatic charset detection.

      </pre>

    </blockquote>

    <pre wrap="">I'd like to see OSIS modules also be UTF-8.

What mechanism were you thinking of for automatic charset detection? I 

have a buggy routine to detect whether something is UTF-8, 7-bit ASCII 

or other. We could use that (once I fix it).

As to automatic charset detection, could we require that every input to 

osis2mod have:

&lt;?xml version="1.0" encoding="UTF-8"?&gt;

or

&lt;?xml version="1.0" encoding="cp1252"?&gt;

and use whatever is the value for the encoding attribute?

    </pre>

  </blockquote>
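
  <pre wrap="">The two ideas in the quoted message (trust the encoding named in the XML
declaration, and otherwise tell 7-bit ASCII, UTF-8 and "other" apart) can
be sketched roughly as follows. This is an illustration in Python, not
the routine mentioned above and not osis2mod code; the names classify,
declared_encoding and choose_encoding are made up for the example.

```python
import re

# \x3c and \x3e are the angle brackets, written as escapes so that this
# listing can sit inside HTML mail unmodified. An optional UTF-8 BOM is
# allowed before the declaration.
_XML_DECL = re.compile(
    rb'^(?:\xef\xbb\xbf)?\x3c\?xml[^\x3e]*?'
    rb'\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def classify(data):
    """Classify raw bytes as 'ascii', 'utf-8', or 'other'."""
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "other"


def declared_encoding(data):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else None


def choose_encoding(data):
    """Prefer the declared encoding; otherwise fall back to byte sniffing."""
    return declared_encoding(data) or classify(data)
```

Note that the declared value is taken on trust here. A file that claims
UTF-8 but contains cp1252 bytes would still need the strict decode in
classify (or a real detector) to catch the mismatch, and "other" is a
label rather than a usable codec name.
  </pre>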

  <pre wrap=""><!---->

I planned to just use ICU's charset detection. It takes a bunch of text, 

runs some heuristic algorithms on it, and uses that to guess the 

charset. It supports most of the common standard charsets.

--Chris


  </pre>

</blockquote>

<br>

<pre class="moz-signature" cols="72">-- 

PMBX license 1502 

</pre>

</body>

</html>