3.3.5 Character encoding

What is a character set?

A character set is a collection of characters that a computer can recognise and use. Each character is assigned a unique code.

7-bit ASCII

ASCII stands for “American Standard Code for Information Interchange”.

ASCII has 128 characters (codes 0-127) and uses 7 bits for each character. It was designed primarily for English characters and control codes, and is less common now that Unicode has been introduced.
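To see what "7 bits for each character" looks like in practice, here is a minimal sketch. The helper name `to_7bit` is made up for this example; `ord` and `format` are standard Python built-ins.

```python
def to_7bit(ch):
    """Return the 7-bit binary pattern for a single ASCII character."""
    code = ord(ch)  # decimal character code, e.g. 65 for 'A'
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range 0-127")
    return format(code, "07b")  # pad with leading zeros to exactly 7 bits


print(to_7bit("A"))  # 1000001
print(to_7bit("a"))  # 1100001
```

Every result is exactly 7 digits long, which is why ASCII can only hold codes 0-127.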

You do not need to know the whole table. Just remember the values for uppercase and lowercase A (65 for ‘A’, 97 for ‘a’); if a question comes up, you can work out the rest from there. To get the uppercase form of a lowercase letter, subtract 32 from its decimal character code (97 ‘a’ - 32 = 65 ‘A’).
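The "minus 32" rule and the "work it out from A" trick can be checked with a short sketch. The function names `to_upper` and `nth_uppercase` are hypothetical, chosen only for illustration.

```python
def to_upper(letter):
    """Convert one lowercase ASCII letter to uppercase by subtracting 32."""
    code = ord(letter)
    if ord("a") <= code <= ord("z"):  # only lowercase letters are shifted
        return chr(code - 32)
    return letter  # anything else is returned unchanged


def nth_uppercase(n):
    """Work out an uppercase letter from 'A' = 65 (n = 0 gives 'A')."""
    return chr(65 + n)


print(ord("a"), ord("A"))  # 97 65
print(to_upper("a"))       # A
print(nth_uppercase(3))    # D  (65 + 3 = 68)
```

This works because the letters sit in alphabetical order in the table, so counting on from 65 or 97 gives any other letter.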

8-bit Extended ASCII

Extended ASCII is a version that supports representation of 256 different characters. This is because extended ASCII uses 8 bits to represent a character as opposed to 7 in standard ASCII.
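The link between the number of bits and the number of characters is a power of two. A small sketch, using a made-up helper name `characters_available`:

```python
def characters_available(bits):
    """Number of different codes that fit in the given number of bits."""
    return 2 ** bits  # each extra bit doubles the number of patterns


print(characters_available(7))  # 128 (standard ASCII)
print(characters_available(8))  # 256 (extended ASCII)
```

Adding just one bit doubles the number of characters that can be represented.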

Extended ASCII inherits standard ASCII’s first 128 values (0-127), then adds 128 more (128-255).

Unicode

Unicode is a universal character encoding standard with over 144,000 characters (compared to ASCII’s 128 and Extended ASCII’s 256).

Why? To support characters from all languages (like Japanese, French and so on), including special symbols and emojis.

Unicode was created to ensure consistent encoding across different systems. Before it, when computers arrived in places like Japan, people there had to devise their own character sets for Japanese characters. These separate sets were confusing and difficult to maintain in global communication.

Advantage over ASCII: Can represent a far greater range of characters.

See https://unicodelookup.com/ to look up the value of any particular character. You won’t need to know any specific Unicode values; just remember that Unicode uses the same codes as ASCII for the first 128 characters (0-127).
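Python strings are Unicode, so `ord` gives a character's Unicode value directly. The sketch below prints a few values and checks that the first 128 match ASCII. The helper `same_code_in_ascii` is a hypothetical name for this example, and the characters are written as escape codes so the file itself stays plain ASCII.

```python
import unicodedata

# A few sample characters: two ASCII letters, an accented letter,
# a Japanese hiragana character and an emoji.
samples = ["A", "a", "\u00e9", "\u3042", "\U0001F600"]

for ch in samples:
    # Print the official Unicode name and the decimal code of each character.
    print(unicodedata.name(ch), ord(ch))
# The codes printed are 65, 97, 233, 12354 and 128512.


def same_code_in_ascii(ch):
    """Check that a character's ASCII code equals its Unicode code."""
    return ch.encode("ascii")[0] == ord(ch)


# Every one of the first 128 characters has the same code in both.
print(all(same_code_in_ascii(chr(i)) for i in range(128)))  # True
```

Only the first two samples fit in ASCII; the others have codes far above 127, which is the range Unicode adds.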