All data processed and stored by a digital computer, including character data, must be represented as numerical values in binary. A processor can only work with binary data; it has no concept of symbols or characters. To represent character data, an encoding scheme must be used in which each character is assigned an integer value. When text is displayed or printed, each numerical code is translated using the encoding scheme and the corresponding character is rendered in its graphical form.
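As a minimal sketch of this idea, written in Python purely for illustration (the helper names `char_to_code` and `code_to_bits` are hypothetical), the built-in `ord` function gives the integer code of a character and `chr` converts a code back to its character:

```python
def char_to_code(ch: str) -> int:
    """Return the integer code that represents a single character."""
    return ord(ch)


def code_to_bits(code: int, width: int = 8) -> str:
    """Format an integer code as a fixed-width binary string."""
    return format(code, f"0{width}b")


# Show each character of a short string with its code and bit pattern.
for ch in "Hi!":
    code = char_to_code(ch)
    print(repr(ch), code, code_to_bits(code))
# 'H' 72 01001000
# 'i' 105 01101001
# '!' 33 00100001
```

The character itself never appears in memory; only the integer does, and `chr(72)` simply maps that integer back to the symbol `H` for display.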
The American Standard Code for Information Interchange (ASCII) is widely used to represent character data in modern computers. ASCII was originally developed in the early 1960s and was based on earlier teleprinter encoding systems. It was first published as a standard in 1963 and substantially revised in 1967.
The original version included 128 codes using 7 bits per character, as shown in Table 1. The first 32 codes (0 to 31) are control characters that were used to control devices such as printers or to serve as metadata for information stored on devices such as magnetic tape. Today, several of these are used for special purposes within characters and strings and are written as escape sequences (e.g., \n for line feed). The printable characters, which are those found on the keyboard, are encoded in codes 32 to 126; code 127 (DEL) is one more control character. Notice that the digits are in sequence, as are the upper-case letters and the lower-case letters. This allows us to easily test whether a character is a digit or a letter by checking whether its code falls within a range of integer values.
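The range test described above can be sketched as follows. This is an illustrative Python example, not a prescribed implementation; the helper names (`in_range`, `is_digit`, `is_upper`, `is_lower`, `to_upper`) are hypothetical.

```python
def in_range(ch: str, low: str, high: str) -> bool:
    """True if the code of ch lies between the codes of low and high."""
    return ord(low) <= ord(ch) <= ord(high)


def is_digit(ch: str) -> bool:
    return in_range(ch, "0", "9")   # codes 48 to 57


def is_upper(ch: str) -> bool:
    return in_range(ch, "A", "Z")   # codes 65 to 90


def is_lower(ch: str) -> bool:
    return in_range(ch, "a", "z")   # codes 97 to 122


def to_upper(ch: str) -> str:
    # Lower-case letters sit exactly 32 codes above their upper-case
    # counterparts ('a' is 97, 'A' is 65), so subtracting 32 converts them.
    return chr(ord(ch) - 32) if is_lower(ch) else ch


print(is_digit("7"), is_upper("q"), to_upper("q"))  # True False Q
print(ord("\n"))                                    # 10, the line feed code
```

Because each group is contiguous, two comparisons are enough; no lookup table of valid characters is needed.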
|    | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0  | NUL | SOH | STX | ETX | EOT | ENQ | ACK | BEL | BS | HT |
| 1  | LF | VT | FF | CR | SO | SI | DLE | DC1 | DC2 | DC3 |
| 2  | DC4 | NAK | SYN | ETB | CAN | EM | SUB | ESC | FS | GS |
| 3  | RS | US | sp | ! | " | # | $ | % | & | ' |
| 4  | ( | ) | * | + | , | - | . | / | 0 | 1 |
| 5  | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | : | ; |
| 6  | < | = | > | ? | @ | A | B | C | D | E |
| 7  | F | G | H | I | J | K | L | M | N | O |
| 8  | P | Q | R | S | T | U | V | W | X | Y |
| 9  | Z | [ | \ | ] | ^ | _ | ` | a | b | c |
| 10 | d | e | f | g | h | i | j | k | l | m |
| 11 | n | o | p | q | r | s | t | u | v | w |
| 12 | x | y | z | { | \| | } | ~ | DEL |  |  |
Abbreviations

| Code | Meaning | Code | Meaning | Code | Meaning |
| --- | --- | --- | --- | --- | --- |
| NUL | Null (\0) | VT | Vertical tab (\v) | SYN | Synchronous idle |
| SOH | Start of heading | FF | Form feed (\f) | ETB | End of transmission block |
| STX | Start of text | CR | Carriage return (\r) | CAN | Cancel |
| ETX | End of text | SO | Shift out | EM | End of medium |
| EOT | End of transmission | SI | Shift in | SUB | Substitute |
| ENQ | Enquiry | DLE | Data link escape | ESC | Escape (\e) |
| ACK | Acknowledge | DC1 | Device control 1 | FS | File separator |
| BEL | Bell (\a) | DC2 | Device control 2 | GS | Group separator |
| BS | Backspace (\b) | DC3 | Device control 3 | RS | Record separator |
| HT | Horizontal tab (\t) | DC4 | Device control 4 | US | Unit separator |
| LF | Line feed (\n) | NAK | Negative acknowledge | DEL | Delete/idle |
Table 1. The ASCII Code Chart
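Table 1 is read by combining the row label (the tens part of the code) with the column label (the ones digit). The short Python sketch below illustrates that arithmetic; the function names `chart_position` and `chart_lookup` are hypothetical and exist only for this example.

```python
def chart_position(ch: str) -> tuple[int, int]:
    """Return the (row, column) of a character in a ten-codes-per-row chart."""
    return divmod(ord(ch), 10)


def chart_lookup(row: int, col: int) -> str:
    """Return the character whose code is 10 * row + col."""
    return chr(10 * row + col)


print(chart_position("A"))   # (6, 5): 'A' has code 65
print(chart_lookup(9, 7))    # a: code 97
```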
ASCII itself remains a 7-bit code, but a variety of 8-bit extensions came into wide use during the 1980s. These use a full 8 bits per character, and the additional 128 codes were assigned to mathematical symbols, foreign-language characters, and basic box-drawing shapes such as
¼ ½ ¶ ¬ £ © Â Æ Ë × Ø ║ ├ ╚
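Codes above 127 depend on which 8-bit character set is in use. The Python sketch below illustrates this with two well-known sets, ISO 8859-1 (codec name `latin-1`) and IBM code page 437 (codec name `cp437`). Both are chosen here only as examples, since the text above does not name a specific character set.

```python
raw = bytes([0xC8])  # a single byte with a value above 127

# The same byte stands for different characters in different 8-bit sets.
as_latin1 = raw.decode("latin-1")  # an accented letter
as_cp437 = raw.decode("cp437")     # a box-drawing shape
print(as_latin1, as_cp437)

# Bytes in the 7-bit range decode identically, because both sets keep
# the original ASCII codes for values 0 to 127.
ascii_bytes = b"ASCII"
print(ascii_bytes.decode("latin-1") == ascii_bytes.decode("cp437"))  # True
```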
The ASCII code was built around the Latin alphabet, which limits its use for representing the non-Latin alphabets used by the majority of the world's population. In 1988, a group of hardware and software manufacturers began developing a uniform encoding scheme called Unicode that is capable of encoding text in essentially all written languages of the world. Unicode was originally designed as a 16-bit code, but its code space has since been extended to code points up to U+10FFFF. Its first 128 code points are identical to ASCII, so it is backwards compatible with ASCII and the Latin character set. Today, it defines over 100,000 characters. Many programming languages and assemblers still treat ASCII as the baseline character representation, and most provide some type of special notation for specifying and working with Unicode.
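As one illustration of such notation, Python string literals accept `\u` escapes followed by four hexadecimal digits. The sketch below also prints each character's UTF-8 bytes. UTF-8 is a common Unicode encoding that is not discussed above and is used here only as an assumption for the example.

```python
# 'A' is plain ASCII; the others are written with \u escape notation.
samples = ["A", "\u00e9", "\u03a9", "\u4e2d"]

for ch in samples:
    # Print the code point in U+XXXX form, the character, and its UTF-8 bytes.
    print(f"U+{ord(ch):04X}", ch, ch.encode("utf-8"))

# An ASCII character keeps the same code in Unicode and is stored as
# the same single byte in UTF-8.
print(ord("A"), "A".encode("utf-8"))  # 65 b'A'
```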