<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="Helvetica, Arial, sans-serif">For what it's worth, speaking
as a newbie who had to work out how to get a module to the point where
it was worth sending in, I think making osis2mod's defaults "as general
purpose and common as possible" is a good idea. Newbies still need to
test their source before sending it in, and headaches with osis2mod can
deter them from getting a file ready to submit. Also, if osis2mod is
easier to use, fewer questions about how to use it are likely to pop
up on sword-devel.<br>
<br>
Daniel</font><br>
<br>
Chris Little wrote:
<blockquote cite="mid:47C39D84.6020006@crosswire.org" type="cite">
<pre wrap="">
DM Smith wrote:
</pre>
<pre wrap=""><!----> > I mostly agree. But once I know that the module is NFC, I'd rather not
</pre>
<blockquote type="cite">
<pre wrap="">take the hit. I must have made the KJV into a module 100 or more times
before I got it right.
</pre>
</blockquote>
<pre wrap=""><!---->
What would you think of making normalization the default and using a
switch to turn it off? It doesn't particularly matter for me, but I'm
thinking of a complete newbie trying to make a module. The defaults
should be as general purpose and common as possible. Then again, since
we build from source for our releases (not from submitted compiled
modules), perhaps it doesn't matter either way.
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">Second, your comment about needing UTF-8 input makes me think we should
go ahead and add encoding conversion to the importers as well, possibly
with automatic charset detection.
</pre>
</blockquote>
<pre wrap="">I'd like to see OSIS modules also be UTF-8.
What mechanism were you thinking of for automatic charset detection? I
have a buggy routine to detect whether something is UTF-8, 7-bit ascii
or other. We could use that (once I fix it).
As to automatic charset detection, could we require that every input to
osis2mod have:
<?xml version="1.0" encoding="UTF-8"?>
or
<?xml version="1.0" encoding="cp1252"?>
and use whatever is the value for the encoding attribute?
</pre>
</blockquote>
<pre wrap=""><!---->
I planned to just use ICU's charset detection. It takes a bunch of text,
runs some heuristic algorithms on it, and uses that to guess the
charset. It supports most of the common standard charsets.
--Chris
</pre>
</blockquote>
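<br>
<font face="Helvetica, Arial, sans-serif">As a rough illustration of the
approach discussed above (trust the encoding declared in the XML header,
fall back to a simple ASCII/UTF-8 check, and normalize to NFC unless told
not to), here is a minimal Python sketch. It is not osis2mod's code and it
does not use ICU; the names guess_encoding and to_utf8 and the normalize
parameter are made up for this example.</font><br>

```python
import re
import unicodedata

# Hypothetical helpers for illustration only; not part of osis2mod or ICU.
# Matches the encoding value inside an XML declaration near the start of the input.
DECL_RE = re.compile(rb'\?xml[^?]*encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')


def guess_encoding(data):
    """Return the declared encoding, else 'ascii' or 'utf-8', else None."""
    m = DECL_RE.search(data[:200])
    if m:
        return m.group(1).decode("ascii").lower()
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # "Other": this is where a heuristic detector would be needed.
        return None


def to_utf8(data, normalize=True):
    """Decode the input, optionally normalize to NFC, and re-encode as UTF-8.

    The encoding value in the XML header is left as it was.
    """
    enc = guess_encoding(data)
    if enc is None:
        raise ValueError("unknown encoding; declare it in the XML header")
    text = data.decode(enc)
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")
```

<font face="Helvetica, Arial, sans-serif">Normalization is on by default
here and normalize=False plays the role of the "turn it off" switch. Input
that is neither declared nor valid UTF-8 is simply rejected; that is the
case where something like ICU's heuristic detection would take over.</font><br>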
<br>
<pre class="moz-signature" cols="72">--
PMBX license 1502
</pre>
</body>
</html>