[sword-devel] XHTML vs HTML (was: Color in osis)

Sun Mar 17 17:48:12 MST 2019

Hi Karl,

On Sun, Mar 17, 2019 at 07:47:22PM -0400, Karl Kleinpaste wrote:
> On 3/17/19 2:44 PM, Nathan Phillip Brink wrote:
> > It sounds like you’re trying to render XHTML using an HTML parser.
> If there is a way to make WebKit /*not*/ operate in an XHTML mode, I'm
> not aware of it.

Sorry, I may be going a bit off-topic by pursuing this.

I am quite sure you are running WebKit in HTML mode and not XHTML
mode.

I don’t have an easy way to test WebKit directly. But I can
demonstrate what I am describing, and I have tested it with
Midori-0.5.11, which claims to use WebKit. The following also works in
any modern browser (IE, Edge, Firefox, Chrome, or Mobile Safari to
name a few).

I have defined a CSS rule which makes anything with a class of div2
green and used it in both of the following files. In each file, I have
text, an empty div with class div1, more text, an empty div with class
div2, and more text. Each empty div uses the self-closing syntax
“<div/>”:

HTML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/index.html

XHTML/XML: http://cdn.ohnopub.net/cdn/binki/sword-devel/xhtml-vs-html/index.xhtml

If you visit the first link, then WebKit is reading the markup using
an HTML parser. This causes the behavior you describe, where a
self-closed “<div/>” tag is treated as equivalent to an opening
“<div>” tag. The resulting document tree has the trailing text inside
div2, making it green.

If you visit the second link, then WebKit is reading the markup in
XHTML (XML) mode because the Content-Type response header marks it as
XHTML/XML (normally application/xhtml+xml). This causes it to parse
the file as an XML document and then render the resulting DOM as a
second step. The resulting document tree has the trailing text after
div2 as a direct child of body. Thus the text is not part of div2 and
does not become green.
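
To make the XML side of this concrete, here is a minimal sketch using
Python's standard xml.etree.ElementTree. The MARKUP string is a
hypothetical stand-in for the body of my test pages, not a copy of
them, and I left out the XHTML namespace for brevity. An XML parser
reports each “<div/>” as an element with no content, and the text
that follows it as a sibling of the div rather than a child:

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the body of the test pages (assumption: no
# xmlns declaration, to keep the tag names short).
MARKUP = (
    '<body>text one<div class="div1"/>'
    'text two<div class="div2"/>text three</body>'
)


def describe_xml_tree(markup):
    """Parse markup as XML and describe each direct child of the root.

    Returns a list of (tag, class, text inside the element,
    text immediately following the element) tuples.
    """
    root = ET.fromstring(markup)
    return [
        (child.tag, child.get("class"), child.text, child.tail)
        for child in root
    ]


for row in describe_xml_tree(MARKUP):
    print(row)
```

The third field stays None for both divs: the XML parser closed each
one on the spot, so “text three” belongs to body and a rule matching
div2 has nothing to colour.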

The HTML parser is special and you are seeing the effects of using an
HTML parser to parse XHTML instead of using an XML or XHTML parser to
parse XHTML.

HTML does not have a self-closing tag syntax. For elements which would
normally be self-closed in XML/XHTML such as br and link, it relies
instead on a list of void elements which the parser itself knows
about: https://www.w3.org/TR/html/syntax.html#void-elements . Whenever
it encounters one of these elements, it immediately closes it out in
the document tree and doesn’t give it any children.

The HTML parser is required to basically ignore the forward-slash in
“<div/>” or “<br/>”. This was intended to allow websites to slowly
transition to XHTML if proper care was taken. The result is that
“<div/>” will be treated the same as “<div>” when using an HTML
parser to parse the file. See rule 6 regarding SOLIDUS at
https://www.w3.org/TR/html/syntax.html#start-tags and 0x2F under step
4 at https://www.w3.org/TR/html/syntax.html#get-an-attribute .
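
The two rules above (void elements close immediately, and the slash is
ignored everywhere else) can be sketched as a toy tree builder on top
of Python's html.parser. This is only an illustration under
assumptions: it is not a conforming HTML parser, the class and helper
names are made up for this mail, and the VOID_ELEMENTS set is my
reading of the spec's list. Note that html.parser by itself would
treat “<div/>” as an empty element, so the sketch overrides that to
follow the HTML rule:

```python
from html.parser import HTMLParser

# Assumption: my reading of the void element list in the HTML spec.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}


class SketchTreeBuilder(HTMLParser):
    """Toy tree builder (hypothetical), NOT a conforming HTML parser."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_ELEMENTS:
            # Non-void elements stay open until a matching end tag.
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # HTML rule: the trailing "/" is ignored, so "<div/>" is "<div>".
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, plus anything
        # still open inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def build_tree(markup):
    builder = SketchTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def find_by_class(node, css_class):
    """Depth-first search for the first element with the given class."""
    for child in node["children"]:
        if isinstance(child, dict):
            if child["attrs"].get("class") == css_class:
                return child
            found = find_by_class(child, css_class)
            if found is not None:
                return found
    return None


def direct_text(node):
    """Concatenate the text nodes that are direct children of node."""
    return "".join(c for c in node["children"] if isinstance(c, str))


MARKUP = (
    '<body>text one<div class="div1"/>'
    'text two<div class="div2"/>text three</body>'
)
div2 = find_by_class(build_tree(MARKUP), "div2")
print(direct_text(div2))
```

With the slash ignored, neither div is ever closed, so div2 ends up
nested inside div1 and the trailing text ends up inside div2. That is
the tree which makes the text green.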

If you want to get the expected results, you should either process the
XHTML into HTML before feeding it to WebKit or set WebKit to parse the
file as XHTML instead of HTML.
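
For the first option, here is a rough sketch of what “process the
XHTML into HTML” could look like, again with Python's
xml.etree.ElementTree. The function name is made up, and I am
assuming a well-formed fragment without an xmlns declaration; I
believe real namespaced XHTML would need extra handling, since
ElementTree would otherwise emit prefixed tag names. Serializing with
method="html" writes an explicit end tag for every non-void element
and a bare start tag for void ones:

```python
import xml.etree.ElementTree as ET


def xhtml_fragment_to_html(markup):
    """Re-serialize well-formed, namespace-free XHTML as HTML.

    Hypothetical helper: empty non-void elements such as <div/> become
    <div></div>, while void elements such as <br/> become <br>.
    """
    root = ET.fromstring(markup)
    return ET.tostring(root, encoding="unicode", method="html")


print(xhtml_fragment_to_html(
    '<body>a<div class="div1"/>b<div class="div2"/>c<br/></body>'
))
```

An HTML parser reading that output sees each div opened and closed
explicitly, so the trailing text stays outside div2.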

-- 
binki

Don’t forget to check for missing or extraneous apostrophes!