[sword-devel] XHTML vs HTML (was: Color in osis)

Nathan Phillip Brink ohnobinki at ohnopublishing.net
Mon Mar 18 10:35:03 MST 2019


On Mon, Mar 18, 2019 at 06:18:31AM -0400, DM Smith wrote:
> > On Mar 17, 2019, at 8:48 PM, Nathan Phillip Brink <ohnobinki at ohnopublishing.net> wrote:
> > 
> > Hi Karl,
> > 
> > On Sun, Mar 17, 2019 at 07:47:22PM -0400, Karl Kleinpaste wrote:
> >> On 3/17/19 2:44 PM, Nathan Phillip Brink wrote:
> >>> It sounds like you’re trying to render XHTML using an HTML parser.
> >> If there is a way to make WebKit /*not*/ operate in an XHTML mode, I'm
> >> not aware of it.
> > 
> > Sorry, I may be going a bit off-topic by pursuing this.
> > 
> > I am quite sure you are running WebKit in HTML mode and not XHTML
> > mode.
> > 
> > I don’t have an easy way to directly test WebKit per se. But I can
> > demonstrate what I am describing and have tested it with Midori-0.5.11
> > which claims to be WebKit. The following also works in any modern
> > browser (IE, Edge, Firefox, Chrome, or Mobile Safari to name a few).
> > 
> > I have defined CSS rules which set anything with a class of div2 to be
> > green and used it for both of the following files. In each file, I
> > have text, an empty div div1, more text, an empty div div2, and more
> > text. Each empty div uses the self-closing syntax “<div/>”:
> > 
> > HTML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/index.html
> > 
> > XHTML/XML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/index.xhtml
> 
> I noticed that not only did you have the xml pre-processor as the first line
> <?xml version="1.0"?>
> But you also had the namespace declaration for xhtml.
> <html xmlns="http://www.w3.org/1999/xhtml">
> 
> Is the latter necessary to put the parser into XML mode?

No. The parser is put into XML mode when Content-Type is set to
something like application/xml or application/xhtml+xml. However, once
the parser is in XML mode, the renderer will require a namespace to be
set to render things properly. See below.

Also, I am not sure about the effect of the XML declaration. I think
that matters more when something is trying to sniff the MIME type
without other information (e.g., you get a bytestream without any HTTP
headers). But if there is a Content-Type of text/html, the XML
declaration will be ignored and the content parsed as HTML like you
are experiencing in Xiphos.
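
As a rough illustration of the rule described above, here is a minimal
Python sketch of how a Content-Type value might map to a parser mode.
The function name parser_mode is hypothetical, and the logic is a
simplification: it ignores MIME sniffing and is not how WebKit or any
other browser actually implements the decision.

```python
def parser_mode(content_type):
    """Guess which parser a browser would pick for a Content-Type value.

    Simplifying assumptions: only the media type matters, parameters
    such as charset are ignored, and no content sniffing is modelled.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "text/html":
        # The XML declaration and xmlns attribute do not change this.
        return "html"
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return "xml"
    return "other"


for value in ("text/html; charset=utf-8",
              "application/xml",
              "application/xhtml+xml"):
    print(value, "->", parser_mode(value))
```

Note that nothing in the document body is consulted here, which is the
point: a document served as text/html goes to the HTML parser even if
it starts with an XML declaration.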

> Are either really needed if you are delivering it with a Content-Type:text/xhtml?

The XML declaration is not needed to put the browser into XML parsing
mode, and the XML spec does not strictly require it. However, the spec
says that an XML document “SHOULD” begin with an XML declaration:
https://www.w3.org/TR/2008/REC-xml-20081126/#sec-prolog-dtd . The
declaration would also help identify the type of document if it is not
packaged in a MIME message or if the HTTP response headers were lost.

My understanding of namespaces is not strong and is based mostly on
experimentation. But I think they are intended to support the
“eXtensibility” of “eXtensible Markup Language”: a namespace allows
different languages to be expressed in a single file at once. For
example, XHTML uses this to embed SVG, and XSLT uses it to allow
writing verbatim output fragments mixed with XSLT directives.

For XHTML, a browser uses the namespace to determine the rendering
mode. If the browser parses the document in XML mode, it seems to
always use the namespace to determine the rendering mode of the parsed
document. This is the case regardless of whether the Content-Type is
set to application/xml or application/xhtml+xml. See the following
examples:

unnamespaced XML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/xml-unnamespaced.xml

namespaced XML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/xml-namespaced.xml

unnamespaced XML served as application/xhtml+xml: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/xml-unnamespaced.xhtml
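
To make the difference between these documents concrete, here is a
small Python sketch using the standard xml.etree.ElementTree module.
It is not a browser, and the two inline documents are made-up
stand-ins for the linked files, but it shows what an XML parser hands
to the next stage: with the xmlns attribute every element name is
qualified by the XHTML namespace, and without it the names are bare.
It also shows that in XML mode “<div/>” is a complete, empty element.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Two hypothetical test documents that differ only in the xmlns attribute.
NAMESPACED = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    'one<div class="div1"/>two<div class="div2"/>three'
    '</body></html>'
)
UNNAMESPACED = NAMESPACED.replace(' xmlns="http://www.w3.org/1999/xhtml"', '')


def tag_names(document):
    """Return the qualified tag of every element, in document order."""
    return [element.tag for element in ET.fromstring(document).iter()]


def is_xhtml(element):
    """True if the element's name lives in the XHTML namespace."""
    return element.tag.startswith("{" + XHTML_NS + "}")


namespaced_tags = tag_names(NAMESPACED)
unnamespaced_tags = tag_names(UNNAMESPACED)
print(namespaced_tags)
print(unnamespaced_tags)

# In XML, <div/> is empty: both divs are siblings under body, and the
# text after each one is stored as that element's tail, not its content.
body = ET.fromstring(NAMESPACED)[0]
div1, div2 = list(body)
print(len(div1), repr(div1.tail), repr(div2.tail))
```

A renderer that keys its default HTML styles on the namespace would
have nothing to match in the second document, even though the tag
names look the same to a human reader.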

So, what would be relevant to your project is this: if you choose to
correct its behavior by switching from an HTML parser to an XHTML/XML
parser, rather than by preprocessing the XHTML into HTML, you must
ensure you do not strip the namespace when producing fragments.
Otherwise WebKit will not apply the default HTML styling rules
specified in
https://www.w3.org/TR/2017/REC-html52-20171214/rendering.html#the-css-user-agent-style-sheet-and-presentational-hints
. For example, those default styles render h1 and h2 with a larger
font and make p a block-level element rather than inline. Normal XML
manipulation libraries will make it hard to accidentally “lose” the
namespace.
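
As an illustration of that last point, here is a minimal sketch with
Python's xml.etree.ElementTree, chosen only as an example of an XML
manipulation library; the sample document is made up and nothing here
reflects how Xiphos or SWORD actually produce fragments. Serializing a
single element out of a namespaced document keeps the namespace
attached to it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Ask ElementTree to serialize XHTML elements with a default namespace
# (xmlns="...") rather than a generated prefix. This is process-global.
ET.register_namespace("", XHTML_NS)

# Hypothetical sample document.
DOCUMENT = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<h1>Title</h1><p>Some text.</p>'
    '</body></html>'
)

root = ET.fromstring(DOCUMENT)
paragraph = root.find(f"{{{XHTML_NS}}}body/{{{XHTML_NS}}}p")

# Serialize just the <p> element as a standalone fragment.
fragment = ET.tostring(paragraph, encoding="unicode")
print(fragment)

# Parsing the fragment again yields an element in the XHTML namespace.
reparsed = ET.fromstring(fragment)
print(reparsed.tag)
```

Losing the namespace generally takes deliberate string manipulation,
such as cutting the fragment out of the serialized text by hand
instead of going through the library.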

-- 
binki

Don’t forget to check for missing or extraneous apostrophes!