
utf-8



Definition of utf-8

  • noun Technical meaning of utf-8 (character) (UCS transformation format 8) An ASCII-compatible multibyte encoding of Unicode and UCS, used by Java and Plan 9. The Unicode character set originally occupied a 16-bit code space. The most obvious Unicode encoding (known as UCS-2) consists of a sequence of 16-bit words. Such strings can contain bytes like '\0' or '/', which have a special meaning in filenames and other C library function parameters. In addition, the majority of Unix tools expect ASCII files and can't read 16-bit words as characters without major modifications. For these reasons, UCS-2 is not a suitable external encoding of Unicode in filenames, text files, environment variables, etc. The ISO 10646 Universal Character Set (UCS), a superset of Unicode, occupies a 31-bit code space, and the obvious UCS-4 encoding for it (a sequence of 32-bit words) has the same problems. The UTF-8 encoding of Unicode and UCS avoids the problems of fixed-length Unicode encodings because an ASCII file encoded in UTF-8 is exactly the same as the original ASCII file, and every byte of a non-ASCII character is guaranteed to have the most significant bit set (bit 0x80). This means that normal tools for text searching etc. work as expected. UTF-8 is defined in RFC 2279 (since obsoleted by RFC 3629).
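
The two properties named in the definition can be checked with a short Python sketch. It is an illustration only: the helper names `is_ascii_transparent` and `high_bit_set_on_non_ascii` are made up for this example, and Python's `utf-16-be` codec stands in for UCS-2, which is an assumption that holds for the Basic Multilingual Plane characters used here.

```python
def is_ascii_transparent(text: str) -> bool:
    """True if the UTF-8 bytes of a pure-ASCII string equal its ASCII bytes."""
    return text.encode("utf-8") == text.encode("ascii")


def high_bit_set_on_non_ascii(text: str) -> bool:
    """True if every byte that encodes a non-ASCII character has bit 0x80 set."""
    for ch in text:
        if ord(ch) > 0x7F:
            if any(byte & 0x80 == 0 for byte in ch.encode("utf-8")):
                return False
    return True


# An ASCII file is byte-for-byte unchanged when stored as UTF-8.
ascii_ok = is_ascii_transparent("usr/local/readme.txt")

# Non-ASCII characters never produce bytes below 0x80 in UTF-8.
high_bit_ok = high_bit_set_on_non_ascii("naïve/日本語")

# U+012F is used as an example: as a 16-bit word it is 0x01 0x2F, and 0x2F is '/'.
ogonek = "\u012f"
slash_in_16bit = b"/" in ogonek.encode("utf-16-be")
slash_in_utf8 = b"/" in ogonek.encode("utf-8")

# Plain ASCII text picks up NUL bytes in a 16-bit encoding, but not in UTF-8.
nul_in_16bit = b"\x00" in "A".encode("utf-16-be")
nul_in_utf8 = b"\x00" in "A".encode("utf-8")

print(ascii_ok, high_bit_ok, slash_in_16bit, slash_in_utf8, nul_in_16bit, nul_in_utf8)
```

In the 16-bit encoding, a byte-oriented tool would see a stray '/' or NUL inside an unrelated character, which is the filename problem the definition describes; in UTF-8 those byte values appear only when the text really contains those characters.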

Information block about the term

Parts of speech for Utf-8

noun

