HTML Encoding (Character Sets)
To display an HTML page correctly, a web browser must know which character set to use.
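To show why the character set matters, here is a small Python sketch. Python is used only for illustration; browsers do this decoding internally. It encodes a word as UTF-8 bytes, then decodes the same bytes twice: once with the correct character set and once with the wrong one.

```python
# The same text, stored as UTF-8 bytes
text = "café"
data = text.encode("utf-8")          # b'caf\xc3\xa9'

# Decoded with the correct character set, the text survives intact
print(data.decode("utf-8"))          # café

# Decoded with the wrong character set, "é" turns into two unrelated characters
print(data.decode("iso-8859-1"))     # cafÃ©
```

The garbled result ("cafÃ©") is what a reader sees when a page is saved as UTF-8 but displayed as ISO-8859-1.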
The HTML charset Attribute
The character set is specified in the <meta> tag:
Example
<meta charset="UTF-8">
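A browser has to find this declaration in the raw bytes before it can decode the rest of the page. The Python sketch below imitates that step in a deliberately simplified way. `find_meta_charset` is a hypothetical helper, and its regular expression only recognizes the `<meta charset="...">` form. It scans just the first 1024 bytes because the HTML standard expects the declaration to appear there. Real browsers follow a more detailed prescan algorithm that also handles the `http-equiv` form.

```python
import re
from typing import Optional

# Simplified pattern: <meta charset="..."> with optional quotes, case-insensitive.
# It does NOT handle the http-equiv form or other attributes before charset.
META_CHARSET = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE)

def find_meta_charset(document: bytes) -> Optional[str]:
    """Return the charset label declared with <meta charset>, or None if absent."""
    head = document[:1024]  # only the start of the document is examined
    match = META_CHARSET.search(head)
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()

page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Demo</title></head></html>'
print(find_meta_charset(page))  # utf-8
```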
The HTML5 specification encourages web developers to use the UTF-8 character set.
UTF-8 can encode every character in Unicode, which covers almost all of the characters and symbols in the world!
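UTF-8 gets this coverage by using a variable number of bytes per character. The Python sketch below (it relies on `bytes.hex` with a separator, available from Python 3.8) shows how many bytes a few sample characters need.

```python
# A few characters from different scripts, plus an emoji
samples = ["A", "é", "€", "中", "😀"]

# Number of bytes UTF-8 needs for each character
lengths = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} {ch}: {len(encoded)} byte(s), hex {encoded.hex(' ')}")
```

Plain English letters still take a single byte, so ASCII text costs nothing extra in UTF-8.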
data:image/s3,"s3://crabby-images/e1faf/e1faf7e1ee0baf41e257cd32659214b8a10771e5" alt="Unicode Web growth"
The ASCII Character Set
ASCII was the first widely used character encoding standard for computers, long before the web. It defined 128 different characters (codes 0 to 127) that could be used on the internet:
- English letters (A-Z)
- Numbers (0-9)
- Special characters like ! $ + - ( ) @ < >.
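ASCII's limit of 128 codes can be checked directly. The Python sketch below encodes text with the `ascii` codec. It uses a small helper, `is_ascii`, written here for illustration; Python 3.7+ also provides the equivalent built-in `str.isascii()`.

```python
# ASCII defines codes 0-127 only
print("A".encode("ascii"))     # b'A'
print(ord("A"), ord("~"))      # 65 126

def is_ascii(text: str) -> bool:
    """True if every character has a code in the ASCII range 0-127."""
    return all(ord(ch) < 128 for ch in text)

# Characters outside ASCII cannot be encoded with it
try:
    "café".encode("ascii")
except UnicodeEncodeError as err:
    print("Not ASCII:", err.object[err.start:err.end])  # Not ASCII: é
```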
The ANSI Character Set
ANSI (Windows-1252) was the original Windows character set:
- Identical to ASCII for the first 128 characters (0 to 127)
- Special characters from 128 to 159
- Same characters as UTF-8 (Unicode) from 160 to 255
<meta charset="Windows-1252">
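The special range 128 to 159 is what sets Windows-1252 apart. The Python sketch below decodes each byte in that range with Python's `cp1252` codec. `windows1252_char` is a hypothetical helper. It returns `None` for the five codes that Windows-1252 leaves unassigned, because Python's codec rejects them in strict mode.

```python
from typing import Optional

def windows1252_char(code: int) -> Optional[str]:
    """Return the Windows-1252 character for one byte value, or None if the code is unused."""
    try:
        return bytes([code]).decode("cp1252")
    except UnicodeDecodeError:
        return None  # unassigned code in Windows-1252

for code in range(128, 160):
    char = windows1252_char(code)
    print(code, char if char is not None else "NOT USED")
```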
The ISO-8859-1 Character Set
ISO-8859-1 was the default character set for HTML 4. This character set supported 256 different character codes. HTML 4 also supported UTF-8.
- Identical to ASCII for the first 128 characters (0 to 127)
- Does not use the codes from 128 to 159 for printable characters (they are control codes)
- Identical to ANSI and UTF-8 from 160 to 255
HTML 4 Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
HTML 5 Example
<meta charset="ISO-8859-1">
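The Python sketch below compares ISO-8859-1 with Windows-1252 using Python's `iso-8859-1` and `cp1252` codecs. `decode_byte` is a hypothetical helper. Python's `iso-8859-1` codec follows the ISO standard. Browsers that implement the WHATWG Encoding Standard behave differently: they treat the label ISO-8859-1 as an alias for Windows-1252.

```python
def decode_byte(code: int, charset: str) -> str:
    """Decode a single byte value (0-255) using the given Python codec name."""
    return bytes([code]).decode(charset)

# ISO-8859-1 maps byte N directly to the Unicode code point N
latin1_is_identity = all(ord(decode_byte(n, "iso-8859-1")) == n for n in range(256))
print(latin1_is_identity)  # True

# 160-255: both character sets give the same character
print(decode_byte(233, "iso-8859-1"), decode_byte(233, "cp1252"))  # é é

# 128-159: ISO-8859-1 yields an invisible control code, Windows-1252 a printable character
print(repr(decode_byte(128, "iso-8859-1")))  # '\x80'
print(decode_byte(128, "cp1252"))            # €
```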
The UTF-8 Character Set
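- Identical to ASCII for the values from 0 to 127
- Does not use the characters from 128 to 159
- Same characters as ANSI and ISO-8859-1 from 160 to 255, although UTF-8 stores each of them in two bytes
- Continues from the value 256 up to 1,114,111 (U+10FFFF), using up to four bytes per character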
<meta charset="UTF-8">
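The byte ranges above follow from how UTF-8 splits a code point across one to four bytes. The Python sketch below encodes a single code point by hand and checks the result against Python's built-in encoder. `utf8_encode` is a hypothetical teaching helper, not a library function. For brevity it does not reject the surrogate code points U+D800 to U+DFFF, which real UTF-8 forbids.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustration only)."""
    if code_point < 0x80:            # 0-127: one byte, same as ASCII
        return bytes([code_point])
    if code_point < 0x800:           # 128-2047: two bytes (includes 160-255)
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:         # 2048-65535: three bytes
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0x10FFFF:       # 65536-1114111: four bytes
        return bytes([0xF0 | (code_point >> 18),
                      0x80 | ((code_point >> 12) & 0x3F),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    raise ValueError("code point is outside the Unicode range")

for cp in (65, 233, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    print(cp, encoded.hex(" "), encoded == chr(cp).encode("utf-8"))
```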
Differences Between Character Sets
The following table displays the differences between the character sets described above:
Number | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
---|---|---|---|---|---|
32 | | | | | space |
33 | ! | ! | ! | ! | exclamation mark |
34 | " | " | " | " | quotation mark |
35 | # | # | # | # | number sign |
36 | $ | $ | $ | $ | dollar sign |
37 | % | % | % | % | percent sign |
38 | & | & | & | & | ampersand |
39 | ' | ' | ' | ' | apostrophe |
40 | ( | ( | ( | ( | left parenthesis |
41 | ) | ) | ) | ) | right parenthesis |
42 | * | * | * | * | asterisk |
43 | + | + | + | + | plus sign |
44 | , | , | , | , | comma |
45 | - | - | - | - | hyphen-minus |
46 | . | . | . | . | full stop |
47 | / | / | / | / | solidus |
48 | 0 | 0 | 0 | 0 | digit zero |
49 | 1 | 1 | 1 | 1 | digit one |
50 | 2 | 2 | 2 | 2 | digit two |
51 | 3 | 3 | 3 | 3 | digit three |
52 | 4 | 4 | 4 | 4 | digit four |
53 | 5 | 5 | 5 | 5 | digit five |
54 | 6 | 6 | 6 | 6 | digit six |
55 | 7 | 7 | 7 | 7 | digit seven |
56 | 8 | 8 | 8 | 8 | digit eight |
57 | 9 | 9 | 9 | 9 | digit nine |
58 | : | : | : | : | colon |
59 | ; | ; | ; | ; | semicolon |
60 | < | < | < | < | less than |
61 | = | = | = | = | equals sign |
62 | > | > | > | > | greater than |
63 | ? | ? | ? | ? | question mark |
64 | @ | @ | @ | @ | commercial at |
65 | A | A | A | A | Latin A |
66 | B | B | B | B | Latin B |
67 | C | C | C | C | Latin C |
68 | D | D | D | D | Latin D |
69 | E | E | E | E | Latin E |
70 | F | F | F | F | Latin F |
71 | G | G | G | G | Latin G |
72 | H | H | H | H | Latin H |
73 | I | I | I | I | Latin I |
74 | J | J | J | J | Latin J |
75 | K | K | K | K | Latin K |
76 | L | L | L | L | Latin L |
77 | M | M | M | M | Latin M |
78 | N | N | N | N | Latin N |
79 | O | O | O | O | Latin O |
80 | P | P | P | P | Latin P |
81 | Q | Q | Q | Q | Latin Q |
82 | R | R | R | R | Latin R |
83 | S | S | S | S | Latin S |
84 | T | T | T | T | Latin T |
85 | U | U | U | U | Latin U |
86 | V | V | V | V | Latin V |
87 | W | W | W | W | Latin W |
88 | X | X | X | X | Latin X |
89 | Y | Y | Y | Y | Latin Y |
90 | Z | Z | Z | Z | Latin Z |
91 | [ | [ | [ | [ | left square bracket |
92 | \ | \ | \ | \ | reverse solidus |
93 | ] | ] | ] | ] | right square bracket |
94 | ^ | ^ | ^ | ^ | circumflex accent |
95 | _ | _ | _ | _ | low line |
96 | ` | ` | ` | ` | grave accent |
97 | a | a | a | a | Latin small a |
98 | b | b | b | b | Latin small b |
99 | c | c | c | c | Latin small c |
100 | d | d | d | d | Latin small d |
101 | e | e | e | e | Latin small e |
102 | f | f | f | f | Latin small f |
103 | g | g | g | g | Latin small g |
104 | h | h | h | h | Latin small h |
105 | i | i | i | i | Latin small i |
106 | j | j | j | j | Latin small j |
107 | k | k | k | k | Latin small k |
108 | l | l | l | l | Latin small l |
109 | m | m | m | m | Latin small m |
110 | n | n | n | n | Latin small n |
111 | o | o | o | o | Latin small o |
112 | p | p | p | p | Latin small p |
113 | q | q | q | q | Latin small q |
114 | r | r | r | r | Latin small r |
115 | s | s | s | s | Latin small s |
116 | t | t | t | t | Latin small t |
117 | u | u | u | u | Latin small u |
118 | v | v | v | v | Latin small v |
119 | w | w | w | w | Latin small w |
120 | x | x | x | x | Latin small x |
121 | y | y | y | y | Latin small y |
122 | z | z | z | z | Latin small z |
123 | { | { | { | { | left curly bracket |
124 | \| | \| | \| | \| | vertical line |
125 | } | } | } | } | right curly bracket |
126 | ~ | ~ | ~ | ~ | tilde |
127 | DEL | DEL | DEL | DEL | delete (control character) |
128 | | € | | | euro sign |
129 | | | | | NOT USED |
130 | | ‚ | | | single low-9 quotation mark |
131 | | ƒ | | | Latin small f with hook |
132 | | „ | | | double low-9 quotation mark |
133 | | … | | | horizontal ellipsis |
134 | | † | | | dagger |
135 | | ‡ | | | double dagger |
136 | | ˆ | | | modifier letter circumflex accent |
137 | | ‰ | | | per mille sign |
138 | | Š | | | Latin S with caron |
139 | | ‹ | | | single left-pointing angle quotation mark |
140 | | Œ | | | Latin capital ligature OE |
141 | | | | | NOT USED |
142 | | Ž | | | Latin Z with caron |
143 | | | | | NOT USED |
144 | | | | | NOT USED |
145 | | ‘ | | | left single quotation mark |
146 | | ’ | | | right single quotation mark |
147 | | “ | | | left double quotation mark |
148 | | ” | | | right double quotation mark |
149 | | • | | | bullet |
150 | | – | | | en dash |
151 | | — | | | em dash |
152 | | ˜ | | | small tilde |
153 | | ™ | | | trade mark sign |
154 | | š | | | Latin small s with caron |
155 | | › | | | single right-pointing angle quotation mark |
156 | | œ | | | Latin small ligature oe |
157 | | | | | NOT USED |
158 | | ž | | | Latin small z with caron |
159 | | Ÿ | | | Latin Y with diaeresis |
160 | | | | | no-break space |
161 | | ¡ | ¡ | ¡ | inverted exclamation mark |
162 | | ¢ | ¢ | ¢ | cent sign |
163 | | £ | £ | £ | pound sign |
164 | | ¤ | ¤ | ¤ | currency sign |
165 | | ¥ | ¥ | ¥ | yen sign |
166 | | ¦ | ¦ | ¦ | broken bar |
167 | | § | § | § | section sign |
168 | | ¨ | ¨ | ¨ | diaeresis |
169 | | © | © | © | copyright sign |
170 | | ª | ª | ª | feminine ordinal indicator |
171 | | « | « | « | left-pointing double angle quotation mark |
172 | | ¬ | ¬ | ¬ | not sign |
173 | | | | | soft hyphen |
174 | | ® | ® | ® | registered sign |
175 | | ¯ | ¯ | ¯ | macron |
176 | | ° | ° | ° | degree sign |
177 | | ± | ± | ± | plus-minus sign |
178 | | ² | ² | ² | superscript two |
179 | | ³ | ³ | ³ | superscript three |
180 | | ´ | ´ | ´ | acute accent |
181 | | µ | µ | µ | micro sign |
182 | | ¶ | ¶ | ¶ | pilcrow sign |
183 | | · | · | · | middle dot |
184 | | ¸ | ¸ | ¸ | cedilla |
185 | | ¹ | ¹ | ¹ | superscript one |
186 | | º | º | º | masculine ordinal indicator |
187 | | » | » | » | right-pointing double angle quotation mark |
188 | | ¼ | ¼ | ¼ | vulgar fraction one quarter |
189 | | ½ | ½ | ½ | vulgar fraction one half |
190 | | ¾ | ¾ | ¾ | vulgar fraction three quarters |
191 | | ¿ | ¿ | ¿ | inverted question mark |
192 | | À | À | À | Latin A with grave |
193 | | Á | Á | Á | Latin A with acute |
194 | | Â | Â | Â | Latin A with circumflex |
195 | | Ã | Ã | Ã | Latin A with tilde |
196 | | Ä | Ä | Ä | Latin A with diaeresis |
197 | | Å | Å | Å | Latin A with ring above |
198 | | Æ | Æ | Æ | Latin AE |
199 | | Ç | Ç | Ç | Latin C with cedilla |
200 | | È | È | È | Latin E with grave |
201 | | É | É | É | Latin E with acute |
202 | | Ê | Ê | Ê | Latin E with circumflex |
203 | | Ë | Ë | Ë | Latin E with diaeresis |
204 | | Ì | Ì | Ì | Latin I with grave |
205 | | Í | Í | Í | Latin I with acute |
206 | | Î | Î | Î | Latin I with circumflex |
207 | | Ï | Ï | Ï | Latin I with diaeresis |
208 | | Ð | Ð | Ð | Latin Eth |
209 | | Ñ | Ñ | Ñ | Latin N with tilde |
210 | | Ò | Ò | Ò | Latin O with grave |
211 | | Ó | Ó | Ó | Latin O with acute |
212 | | Ô | Ô | Ô | Latin O with circumflex |
213 | | Õ | Õ | Õ | Latin O with tilde |
214 | | Ö | Ö | Ö | Latin O with diaeresis |
215 | | × | × | × | multiplication sign |
216 | | Ø | Ø | Ø | Latin O with stroke |
217 | | Ù | Ù | Ù | Latin U with grave |
218 | | Ú | Ú | Ú | Latin U with acute |
219 | | Û | Û | Û | Latin U with circumflex |
220 | | Ü | Ü | Ü | Latin U with diaeresis |
221 | | Ý | Ý | Ý | Latin Y with acute |
222 | | Þ | Þ | Þ | Latin Thorn |
223 | | ß | ß | ß | Latin small sharp s |
224 | | à | à | à | Latin small a with grave |
225 | | á | á | á | Latin small a with acute |
226 | | â | â | â | Latin small a with circumflex |
227 | | ã | ã | ã | Latin small a with tilde |
228 | | ä | ä | ä | Latin small a with diaeresis |
229 | | å | å | å | Latin small a with ring above |
230 | | æ | æ | æ | Latin small ae |
231 | | ç | ç | ç | Latin small c with cedilla |
232 | | è | è | è | Latin small e with grave |
233 | | é | é | é | Latin small e with acute |
234 | | ê | ê | ê | Latin small e with circumflex |
235 | | ë | ë | ë | Latin small e with diaeresis |
236 | | ì | ì | ì | Latin small i with grave |
237 | | í | í | í | Latin small i with acute |
238 | | î | î | î | Latin small i with circumflex |
239 | | ï | ï | ï | Latin small i with diaeresis |
240 | | ð | ð | ð | Latin small eth |
241 | | ñ | ñ | ñ | Latin small n with tilde |
242 | | ò | ò | ò | Latin small o with grave |
243 | | ó | ó | ó | Latin small o with acute |
244 | | ô | ô | ô | Latin small o with circumflex |
245 | | õ | õ | õ | Latin small o with tilde |
246 | | ö | ö | ö | Latin small o with diaeresis |
247 | | ÷ | ÷ | ÷ | division sign |
248 | | ø | ø | ø | Latin small o with stroke |
249 | | ù | ù | ù | Latin small u with grave |
250 | | ú | ú | ú | Latin small u with acute |
251 | | û | û | û | Latin small u with circumflex |
252 | | ü | ü | ü | Latin small u with diaeresis |
253 | | ý | ý | ý | Latin small y with acute |
254 | | þ | þ | þ | Latin small thorn |
255 | | ÿ | ÿ | ÿ | Latin small y with diaeresis |
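Rows of a comparison like this can be regenerated with Python's standard codecs and the `unicodedata` module. In the sketch below, `visible` and `table_row` are hypothetical helpers. Control characters (Unicode category "Cc") are shown as empty cells. The descriptions are the official Unicode names in lowercase, so they are slightly longer than the wording in the table (for example "latin small letter e with acute").

```python
import unicodedata
from typing import List

def visible(ch: str) -> str:
    """Return ch, or an empty cell if it is a control character (Unicode category Cc)."""
    return "" if unicodedata.category(ch) == "Cc" else ch

def table_row(code: int) -> List[str]:
    """Rebuild one row: number, ASCII, ANSI, ISO-8859-1, UTF-8, description."""
    raw = bytes([code])
    ascii_cell = visible(chr(code)) if code < 128 else ""
    try:
        ansi_char = raw.decode("cp1252")
    except UnicodeDecodeError:          # 129, 141, 143, 144, 157 are unassigned
        ansi_char = None
    ansi_cell = visible(ansi_char) if ansi_char is not None else ""
    latin1_cell = visible(raw.decode("iso-8859-1"))
    utf8_cell = visible(chr(code))      # Unicode code point with the same number
    if ansi_char is None:
        description = "not used"
    else:
        description = unicodedata.name(ansi_char, "control character").lower()
    return [str(code), ascii_cell, ansi_cell, latin1_cell, utf8_cell, description]

for code in (65, 128, 129, 233):
    print(" | ".join(table_row(code)))
```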