[sword-devel] GenBook osisID and URIs
Chris Little
chrislit at crosswire.org
Tue May 13 16:35:38 MST 2008
Karl Kleinpaste wrote:
> (As it happens, GnomeSword understands sword:// and bible://
> equivalently, but I suspect we should do away with the latter.)
I had thought that BibleCS handled bible:// too, but when I checked
earlier, it didn't.
I'm open to adding bible://. It's certainly an easy addition. But I
don't know whether we would gain anything from it.
>>> Josephus:The_War_of_the_Jews/.Book_1/.Chapter_2/.Section_3/
>
> That's profoundly icky.
>
>> I think simply
>> sword://Josephus/The War of the Jews/Book 1/Chapter 2/Section 3
>> should work, or
>> sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203
>
> I have URLs like this in actual use...
> sword://Josephus/%2FThe+Antiquities+of+the+Jews%2FBook+17%2FChapter+2%2FSection+4
> ...because embedded `/' makes me nervous and `+' is the URL space character.
I guess this comes down to parsing, which we'll probably want to build
into the Sword API to ensure uniform handling across frontends. In other
words, the application gets "sword://{module(s)}/{key(list)}", calls a
URI parser, which hands back a list of modules and a list of keys, and
the application does whatever it likes with that information. And we'll
want a function to perform the reverse, too, with module + key
list --> URI.
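To make the round trip concrete, here is a rough sketch in Python rather than anything in the Sword API. The function names, the ',' separator between modules, and the ';' separator between keys are all assumptions made for illustration; nothing in this thread has settled how lists are delimited. The '+' question is left aside here.

```python
from urllib.parse import quote, unquote

SCHEME = "sword://"

def parse_sword_uri(uri):
    """Split a sword:// URI into (modules, keys). Hypothetical helper."""
    if not uri.startswith(SCHEME):
        raise ValueError("not a sword:// URI: %r" % uri)
    rest = uri[len(SCHEME):]
    # Everything before the first '/' names the module(s); the rest is the key list.
    module_part, _, key_part = rest.partition("/")
    # Split on the (assumed) separators first, then percent-decode each piece,
    # so an encoded ',' or ';' inside a name survives.
    modules = [unquote(m) for m in module_part.split(",") if m]
    keys = [unquote(k) for k in key_part.split(";") if k]
    return modules, keys

def build_sword_uri(modules, keys):
    """Reverse of parse_sword_uri: module list + key list --> URI."""
    module_part = ",".join(quote(m, safe="") for m in modules)
    # '/' is left readable inside keys; everything else unsafe is encoded.
    key_part = ";".join(quote(k, safe="/") for k in keys)
    return SCHEME + module_part + "/" + key_part

modules, keys = parse_sword_uri(
    "sword://Josephus/The%20War%20of%20the%20Jews/Book%201/Chapter%202/Section%203")
print(modules, keys)
```

The application would then look up the modules and keys however it likes; the parser itself never touches a module.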
The embedded '/' wouldn't cause much of a problem except as the first
character of the key (in GenBooks). So, we could either percent-encode
the '/' characters or just ensure that we don't include the leading '/'.
I don't think it matters which we pick, but stripping the leading '/'
certainly lends greater readability. (A third possibility would be to
just encode the initial '/' when encoding URIs. That would make the
unlikely case of dictionary keys with a leading '/' safe as well.)
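As an illustration of the options for the leading '/', here is a small Python sketch. The helper names are hypothetical, and it assumes GenBook keys are plain strings with '/' between levels.

```python
from urllib.parse import quote, unquote

def strip_leading_slash(key):
    """Option: drop the leading '/' before building the URI."""
    return key[1:] if key.startswith("/") else key

def encode_key(key):
    """Option: encode only an initial '/', leaving interior ones readable."""
    if key.startswith("/"):
        return "%2F" + quote(key[1:], safe="/")
    return quote(key, safe="/")

def decode_key(encoded):
    """Percent-decoding restores the key, leading '/' included."""
    return unquote(encoded)

key = "/The Antiquities of the Jews/Book 17"
print(encode_key(key))
print(decode_key(encode_key(key)) == key)
```

Encoding just the initial '/' keeps the key exact on the round trip, whereas stripping it relies on the decoder knowing to put it back for GenBooks.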
The current URI RFC (3986) actually specifies that spaces are to be
encoded by %20, but we should probably bear in mind that various
applications (like older web browsers) might use the older style '+' to
encode space. So in decoding, we'll want to turn '+' into space, and on
encoding, we'll want to turn space into %20 and '+' into %2B.
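A minimal Python sketch of that rule, using the standard urllib.parse functions. The wrapper names are made up for the example, and keeping '/' unencoded on output is an assumption.

```python
from urllib.parse import quote, unquote_plus

def decode_component(text):
    # unquote_plus turns '+' into a space first, then resolves %XX escapes,
    # so both the older '+' style and %20 decode to a space.
    return unquote_plus(text)

def encode_component(text):
    # quote writes a space as %20 and a literal '+' as %2B.
    return quote(text, safe="/")

old_style = "%2FThe+Antiquities+of+the+Jews%2FBook+17%2FChapter+2%2FSection+4"
print(decode_component(old_style))
print(encode_component("a b+c"))
```

Because a literal '+' always goes out as %2B, decoding '+' as a space cannot corrupt a URI produced by the encoder.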
And since we haven't discussed it yet, we would also need to
percent-encode all other non-safe URI characters. Then we can pass UTF-8
character strings through URIs, albeit completely unreadably.
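For a sense of what that looks like, a short Python sketch follows. It assumes keys are UTF-8 strings; the sample key is invented for the example.

```python
from urllib.parse import quote, unquote

key = "Käse/Münze"  # made-up key containing non-ASCII characters

# quote encodes the string as UTF-8 and percent-escapes each unsafe byte.
encoded = quote(key, safe="/")
print(encoded)

# unquote reverses it, interpreting the escaped bytes as UTF-8.
print(unquote(encoded) == key)
```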
--Chris