HTML5 new Doctype and Charset
The HTML syntax of HTML5 requires a doctype to be specified to ensure that the browser renders the page in standards mode. The doctype has no other purpose and is therefore optional for XML. Documents with an XML media type are always handled in standards mode. [DOCTYPE]
The doctype declaration is <!DOCTYPE html> and is case-insensitive in the HTML syntax. Doctypes from earlier versions of HTML were longer because the HTML language was SGML-based and therefore required a reference to a DTD. With HTML5 this is no longer the case, and the doctype is only needed to enable standards mode for documents written using the HTML syntax. Browsers already do this for <!DOCTYPE html>.
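The case-insensitivity of the short doctype can be illustrated with a small check. This is only a sketch: the helper name has_html5_doctype and the regular expression are assumptions made for illustration, not the parsing algorithm browsers use, and the check ignores a leading BOM or comments.

```python
import re

# Hypothetical pattern: the short HTML5 doctype, matched without regard to
# case and allowing leading whitespace. A simplification, not the spec's parser.
HTML5_DOCTYPE = re.compile(r"\s*<!doctype\s+html\s*>", re.IGNORECASE)


def has_html5_doctype(markup: str) -> bool:
    """Return True if the markup begins with the short HTML5 doctype."""
    return HTML5_DOCTYPE.match(markup) is not None


if __name__ == "__main__":
    print(has_html5_doctype("<!DOCTYPE html><title>x</title>"))  # True
    print(has_html5_doctype("<!doctype HTML>"))                  # True
    # A longer, DTD-referencing doctype from HTML 4.01 does not match.
    print(has_html5_doctype('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))  # False
```

The third call shows why the older doctypes differ: they carry a public identifier for a DTD, which the short form drops.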
For the HTML syntax of HTML5, authors have three means of setting the character encoding:
- At the transport level, for instance by using the HTTP Content-Type header.
- Using a Unicode Byte Order Mark (BOM) character at the start of the file. This character provides a signature for the encoding used.
- Using a meta element with a charset attribute that specifies the encoding within the first 1024 bytes of the document. For instance, <meta charset="UTF-8"> could be used to specify the UTF-8 encoding. This replaces the need for <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> although that syntax is still allowed.
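The three mechanisms above can be sketched as a single lookup. Everything named here (sniff_encoding, charset_from_content_type, the regular expressions) is hypothetical, and the order in which the sources are consulted (BOM first, then the Content-Type header, then the meta element) is an assumption of this sketch, not something stated in the text above. Real browsers follow a more detailed sniffing algorithm.

```python
import codecs
import re
from typing import Optional

# Only the UTF-8 and UTF-16 signatures are recognised in this sketch.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Simplified pattern covering both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=..."> form.
META_CHARSET = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_content_type(header: Optional[str]) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value."""
    if not header:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header, re.IGNORECASE)
    return match.group(1).lower() if match else None


def sniff_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return a declared encoding name, or None if nothing is declared."""
    # 1. Byte Order Mark at the very start of the file.
    for bom, name in BOMS:
        if body.startswith(bom):
            return name
    # 2. Transport level: the HTTP Content-Type header.
    declared = charset_from_content_type(content_type)
    if declared:
        return declared
    # 3. A meta element, searched only within the first 1024 bytes.
    match = META_CHARSET.search(body[:1024])
    return match.group(1).decode("ascii").lower() if match else None


if __name__ == "__main__":
    print(sniff_encoding(b'<!DOCTYPE html><meta charset="UTF-8">'))  # utf-8
    print(sniff_encoding(b"<!DOCTYPE html>", "text/html; charset=ISO-8859-1"))  # iso-8859-1
```

Slicing the body to 1024 bytes before searching mirrors the requirement that the meta declaration appear early in the document; a declaration placed later is simply not seen by this sketch.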
For the XML syntax, authors have to use the rules as set forth in the XML specifications to set the character encoding.