UTF-8

(UCS transformation format 8) UCS transformation format, 8-bit form; the UTF-8 code. For transmission over a network, each 16-bit Unicode character is converted into a 1-, 2- or 3-byte sequence. Because ASCII characters occupy the codes 00h to 7Fh, UTF-8 encodes each of them as a single byte, and UTF-8 is therefore backward compatible with ASCII. Unicode values from 80h to 7FFh are encoded in UTF-8 as two bytes, and values from 800h to FFFFh as three bytes. See also: wide character
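The 1-, 2- and 3-byte ranges described in this entry can be illustrated with a minimal Python sketch. The function name `utf8_encode_16bit` is hypothetical and introduced only for illustration. The sketch assumes 16-bit input values only, as in the entry, and does not treat surrogate values (D800h to DFFFh) specially.

```python
def utf8_encode_16bit(code: int) -> bytes:
    """Encode one 16-bit Unicode value as a 1-, 2- or 3-byte UTF-8 sequence.

    Hypothetical helper for illustration; surrogates are not rejected here.
    """
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    if code <= 0x7F:
        # 00h..7Fh: one byte, identical to ASCII
        return bytes([code])
    if code <= 0x7FF:
        # 80h..7FFh: two bytes, 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 800h..FFFFh: three bytes, 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


# One sample value from each range, printed as hexadecimal bytes
for value in (0x41, 0xE9, 0x20AC):
    print(f"{value:04X}h -> {utf8_encode_16bit(value).hex(' ')}")
```

For values outside the surrogate range the result should agree with Python's built-in `chr(value).encode("utf-8")`.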

UTF-8

(UCS transformation format 8) An ASCII-compatible multibyte Unicode and UCS encoding, used by Java and Plan 9. The Unicode character set occupies a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of 16-bit words. Such strings can contain bytes like '\0' or '/', which have a special meaning in filenames and other C library function parameters. In addition, the majority of Unix tools expect ASCII files and can't read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc. The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies a 31-bit code space, and the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems. The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and every byte of a non-ASCII character is guaranteed to have the most significant bit set (bit 0x80). This means that normal tools for text searching etc. work as expected. UTF-8 was defined in RFC 2279, which has since been superseded by RFC 3629.
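The filename problem described in this entry can be seen in a small Python sketch. It uses Python's `utf-16-be` codec as a stand-in for UCS-2 (an assumption that holds for the 16-bit characters used here), and the helper name `has_special_bytes` is hypothetical.

```python
def has_special_bytes(data: bytes) -> bool:
    """Return True if data contains a NUL byte (00h) or a '/' byte (2Fh)."""
    return 0x00 in data or 0x2F in data


text = "A\u012f"  # 'A' followed by U+012F, whose low byte is 2Fh ('/')

ucs2 = text.encode("utf-16-be")  # stand-in for UCS-2: two bytes per character
utf8 = text.encode("utf-8")

print(ucs2.hex(" "), has_special_bytes(ucs2))
print(utf8.hex(" "), has_special_bytes(utf8))

# Pure ASCII text is byte-for-byte unchanged by UTF-8
assert "plain.txt".encode("utf-8") == "plain.txt".encode("ascii")

# Every byte of a non-ASCII character has the most significant bit set
assert all(byte & 0x80 for byte in "\u012f".encode("utf-8"))
```

The 16-bit form of this string contains both a NUL byte and a byte equal to '/', while the UTF-8 form contains neither, which is why byte-oriented tools can handle UTF-8 text without modification.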