Answer by Phil P for UTF-8 bit representation

UTF-8 is self-synchronising: a parser examining the bytes can tell whether it is at the start of a UTF-8 character or part-way through one.

Let's say you have two two-octet characters in your scheme: 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

If the parser picks up at the second octet, it cannot tell that it should not read the second and third octets as one character. With UTF-8, the parser can tell that it has landed in the middle of a character, skip ahead to the start of the next one, and emit something to flag the corrupted character.
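A minimal sketch of that recovery step in Python, assuming the input is a `bytes` object; the helper names `is_continuation` and `resync` are hypothetical, not from any library. The parser skips octets of the form `10xxxxxx` until it reaches one that can start a character. The last line uses Python's built-in decoder with `errors="replace"`, which recovers the same way and substitutes U+FFFD for the broken fragment.

```python
def is_continuation(octet: int) -> bool:
    """True if the octet has the form 10xxxxxx (a UTF-8 continuation octet)."""
    return (octet & 0b1100_0000) == 0b1000_0000


def resync(data: bytes, offset: int) -> int:
    """Return the index of the first character boundary at or after `offset`.

    A parser that lands part-way through a character skips forward over
    continuation octets until it reaches an octet that can start a character.
    """
    i = offset
    while i < len(data) and is_continuation(data[i]):
        i += 1
    return i


# "\u00e9\u20ac" is "é€": a 2-octet character followed by a 3-octet character.
data = "\u00e9\u20ac".encode("utf-8")  # b'\xc3\xa9\xe2\x82\xac'

start = resync(data, 1)                # offset 1 is the middle of 'é'
print(start)                           # 2
print(data[start:].decode("utf-8") == "\u20ac")  # True: decoding resumes at '€'

# Python's own decoder: the stray continuation octet becomes U+FFFD
# (REPLACEMENT CHARACTER) and decoding continues with '€'.
print(data[1:].decode("utf-8", errors="replace") == "\ufffd\u20ac")  # True
```

Note that `resync` never has to look backwards: the bit pattern of the current octet alone says whether it is a boundary.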

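As for the edit: if the top bit of an octet is clear, a UTF-8 parser knows it is looking at a character represented in one octet. If the top bit is set, the octet is part of a multi-octet character, and the next bit says which part: a leading octet starts with 11 (the number of leading 1 bits gives the character's length in octets), while a continuation octet starts with 10.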
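To make that classification concrete, here is a small Python sketch; the function name `classify` is hypothetical. It looks only at the high-order bits, so it does not reject octets such as 0xC0 or 0xF5 that match a lead pattern but never occur in well-formed UTF-8.

```python
def classify(octet: int) -> str:
    """Classify one octet by its high-order bits (bit pattern only)."""
    if (octet & 0b1000_0000) == 0:            # 0xxxxxxx
        return "single-octet character"
    if (octet & 0b1100_0000) == 0b1000_0000:  # 10xxxxxx
        return "continuation octet"
    if (octet & 0b1110_0000) == 0b1100_0000:  # 110xxxxx
        return "lead octet of a 2-octet character"
    if (octet & 0b1111_0000) == 0b1110_0000:  # 1110xxxx
        return "lead octet of a 3-octet character"
    if (octet & 0b1111_1000) == 0b1111_0000:  # 11110xxx
        return "lead octet of a 4-octet character"
    return "never valid in UTF-8"             # 11111xxx


# 'a' followed by the 3-octet character U+20AC ('€')
for octet in "a\u20ac".encode("utf-8"):
    print(f"{octet:08b}  {classify(octet)}")
```

Running it labels `01100001` as a single-octet character, `11100010` as the lead octet of a 3-octet character, and the two octets after it as continuation octets.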

It's all about error recovery and easy classification of octets.
