Is UTF-16 a superset of ASCII? If so, why is UTF-16 incompatible with ASCII according to the HTML Standard?

According to the Wikipedia article on UTF-16, "...[UTF-16] is also the only web-encoding incompatible with ASCII" (at the end of the abstract). This statement refers to the HTML Standard. Is the statement wrong?

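I'm mainly a C#/.NET developer, and both .NET and .NET Core use UTF-16 internally to represent strings. I'm fairly sure that UTF-16 is a superset of ASCII, because I can easily write code that displays all ASCII characters: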

public static void Main()
{
    for (byte currentAsciiCharacter = 0; currentAsciiCharacter < 128; currentAsciiCharacter++)
    {
        Console.WriteLine($"ASCII character {currentAsciiCharacter}: \"{(char) currentAsciiCharacter}\"");
    }
}

Sure, the control characters will mess up the console output, but I think my point is clear: the lower 7 bits of a 16-bit char hold the corresponding ASCII code point, while the upper 9 bits are zero. Thus UTF-16 should be a superset of ASCII in .NET.

I tried to find out why the HTML Standard says that UTF-16 is incompatible with ASCII, but it seems they simply define it that way:

An ASCII-compatible encoding is any encoding that is not a UTF-16 encoding.

I couldn't find any explanation in the spec of why UTF-16 is not compatible.

My detailed questions are:

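  1. Is UTF-16 actually compatible with ASCII? Or am I missing something here?
  2. If it is compatible, why does the HTML Standard say it is not? Maybe because of byte order?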
Comments
xeos

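ASCII is a 7-bit encoding and is stored in a single byte. UTF-16 uses 2-byte units (words), which makes it incompatible right away. UTF-8 uses single-byte units, and for the basic Latin range (U+0000 to U+007F) its bytes match ASCII exactly. IOW, UTF-8 was designed to be backward compatible with the ASCII encoding.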
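To make the byte-level difference concrete, here is a minimal sketch that encodes the same ASCII-only text with several encodings and dumps the resulting bytes. The class and method names (`EncodingComparison`, `Hex`, `MatchesAsciiBytes`) are made up for illustration; the `System.Text.Encoding` members are the standard .NET ones, where `Encoding.Unicode` is UTF-16 little-endian and `Encoding.BigEndianUnicode` is UTF-16 big-endian.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class EncodingComparison
{
    // Hex dump (e.g. "3C-70-3E") of the bytes an encoding produces for a string.
    public static string Hex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text));

    // True when the encoding yields exactly the ASCII bytes for ASCII-only text.
    public static bool MatchesAsciiBytes(Encoding encoding, string asciiText) =>
        encoding.GetBytes(asciiText).SequenceEqual(Encoding.ASCII.GetBytes(asciiText));

    public static void Main()
    {
        const string text = "<p>";

        Console.WriteLine($"ASCII:    {Hex(Encoding.ASCII, text)}");            // 3C-70-3E
        Console.WriteLine($"UTF-8:    {Hex(Encoding.UTF8, text)}");             // 3C-70-3E
        Console.WriteLine($"UTF-16LE: {Hex(Encoding.Unicode, text)}");          // 3C-00-70-00-3E-00
        Console.WriteLine($"UTF-16BE: {Hex(Encoding.BigEndianUnicode, text)}"); // 00-3C-00-70-00-3E

        Console.WriteLine(MatchesAsciiBytes(Encoding.UTF8, text));    // True
        Console.WriteLine(MatchesAsciiBytes(Encoding.Unicode, text)); // False
    }
}
```

The code points are the same in every case (0x3C, 0x70, 0x3E), which is why casting a `byte` to `char` works in the question's loop. What differs is the serialized byte stream: a consumer that scans raw bytes for ASCII values such as `<` would see interleaved 0x00 bytes in UTF-16, and their position depends on the byte order.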
