Books and References © Rance Necaise
Hexadecimal Representation | Chapter 2. Data Representation | Floating-Point Representations

Character Data

All data processed and stored by a digital computer, including character data, must be represented as numerical values in binary. A processor can only work with binary data; it has no concept of symbols or characters. To represent character data, an encoding scheme must be used in which each character is represented by an integer value. When text is displayed or printed, each numerical code is translated using the encoding scheme and the corresponding character is rendered in its graphical form.
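As a minimal sketch of this idea, the following Python snippet uses the built-in `ord` and `chr` functions to move between a character and its integer code, and shows that code in binary. The helper name `describe` is made up for illustration and is not part of the text.

```python
def describe(ch):
    """Return the integer code of a single character and its 8-bit binary form."""
    code = ord(ch)                      # character -> integer code
    return code, format(code, "08b")    # integer code -> binary string


for ch in "Hi!":
    code, bits = describe(ch)
    print(repr(ch), code, bits)
# 'H' 72 01001000
# 'i' 105 01101001
# '!' 33 00100001

# Going the other way: integer code -> character
print(chr(72) + chr(105))   # Hi
```

The processor only ever sees the integers; the mapping to visible symbols is applied when the text is displayed.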

The American Standard Code for Information Interchange (ASCII) is widely used to represent character data in modern computers. ASCII, which was originally developed in the early 1960s, was based on earlier teleprinter encoding systems. It was first published as a standard in 1963 and underwent a major revision in 1967.

The original version included 128 codes using 7 bits per character, as shown in Table 1. The first 32 codes (0 - 31) are control characters that were used to control devices such as printers or to serve as metadata for information stored on devices such as magnetic tape. Today, several of these are used for special purposes within characters and strings and are written as escape sequences (e.g., \n). The printable characters, including those found on the keyboard, are encoded in codes 32 - 126; code 127 (DEL) is also a control character. You will notice that the digits are in sequence, as are the upper-case letters and the lower-case letters. This allows us to easily test whether a character is a digit or a letter by checking that its code falls within a range of integer values.
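The range tests described above can be sketched in Python as follows. The function names are hypothetical, and the numeric bounds are read directly from the ASCII chart (48 - 57 for the digits, 65 - 90 for the upper-case letters, 97 - 122 for the lower-case letters).

```python
def is_digit(ch):
    return 48 <= ord(ch) <= 57       # '0' .. '9'


def is_upper(ch):
    return 65 <= ord(ch) <= 90       # 'A' .. 'Z'


def is_lower(ch):
    return 97 <= ord(ch) <= 122      # 'a' .. 'z'


def digit_value(ch):
    """Convert a digit character to its numeric value using its offset from '0'."""
    if not is_digit(ch):
        raise ValueError("not an ASCII digit: " + repr(ch))
    return ord(ch) - ord("0")


def to_upper(ch):
    """Upper- and lower-case letters are 32 codes apart in ASCII."""
    return chr(ord(ch) - 32) if is_lower(ch) else ch


print(is_digit("7"), is_digit("a"))   # True False
print(digit_value("7"))               # 7
print(to_upper("q"), to_upper("?"))   # Q ?
```

Because each group is contiguous, a single pair of comparisons is enough; no lookup table is needed.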

    0    1    2    3    4    5    6    7    8    9
 0  NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT
 1  LF   VT   FF   CR   SO   SI   DLE  DC1  DC2  DC3
 2  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS
 3  RS   US   sp   !    "    #    $    %    &    '
 4  (    )    *    +    ,    -    .    /    0    1
 5  2    3    4    5    6    7    8    9    :    ;
 6  <    =    >    ?    @    A    B    C    D    E
 7  F    G    H    I    J    K    L    M    N    O
 8  P    Q    R    S    T    U    V    W    X    Y
 9  Z    [    \    ]    ^    _    `    a    b    c
10  d    e    f    g    h    i    j    k    l    m
11  n    o    p    q    r    s    t    u    v    w
12  x    y    z    {    |    }    ~    DEL
Abbreviations
NUL  Null (\0)            VT   Vertical tab (\v)     SYN  Synchronous idle
SOH  Start of heading     FF   Form feed (\f)        ETB  End of transmission block
STX  Start of text        CR   Carriage return (\r)  CAN  Cancel
ETX  End of text          SO   Shift out             EM   End of medium
EOT  End of transmission  SI   Shift in              SUB  Substitute
ENQ  Enquiry              DLE  Data link escape      ESC  Escape (\e)
ACK  Acknowledge          DC1  Device control 1      FS   File separator
BEL  Bell (\a)            DC2  Device control 2      GS   Group separator
BS   Backspace (\b)       DC3  Device control 3      RS   Record separator
HT   Horizontal tab (\t)  DC4  Device control 4      US   Unit separator
LF   Line feed (\n)       NAK  Negative acknowledge  DEL  Delete/idle
Table 1. The ASCII Code Chart
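Table 1 is laid out in decimal: the row label gives the tens and the column label gives the ones, so a character's code is 10 × row + column. The short Python sketch below illustrates this reading of the chart and checks a few of the escape sequences from the abbreviation list. The function names are hypothetical. Note that Python does not support the \e escape, so ESC is written as \x1b here.

```python
def table_position(ch):
    """Return the (row, column) at which a character appears in Table 1."""
    return divmod(ord(ch), 10)


def code_at(row, col):
    """Return the ASCII code stored at a given row and column of Table 1."""
    return 10 * row + col


print(table_position("A"))    # (6, 5)
print(chr(code_at(9, 7)))     # a

# Escape sequences are just a way of writing control codes in source text.
escapes = [("NUL", "\0"), ("BEL", "\a"), ("BS", "\b"), ("HT", "\t"),
           ("LF", "\n"), ("VT", "\v"), ("FF", "\f"), ("CR", "\r"),
           ("ESC", "\x1b")]
for name, ch in escapes:
    print(name, ord(ch))
```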

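During the 1980s, several 8-bit extensions of ASCII came into use. ASCII itself remained a 7-bit code, but these extensions, often loosely called extended ASCII, use the eighth bit to add another 128 codes for mathematical symbols, characters used in languages other than English, and, in some variants, basic box-drawing shapes. Examples of such characters include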

¼ ½ ¬ £ © Â Æ Ë × Ø
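As an illustration, the sketch below encodes the sample characters above with one particular 8-bit character set, ISO 8859-1 (Latin-1). Choosing Latin-1 is an assumption made for this example, since the text does not name a specific character set, and the helper name `byte_codes` is hypothetical. Each character fits in a single byte whose value is 128 or higher, which means the eighth bit is set.

```python
def byte_codes(text, encoding="latin-1"):
    """Return the single-byte code of each character under an 8-bit encoding."""
    return [ch.encode(encoding)[0] for ch in text]


samples = "¼½¬£©ÂÆË×Ø"
codes = byte_codes(samples)
print(codes)
# [188, 189, 172, 163, 169, 194, 198, 203, 215, 216]

# Every code lies above the 7-bit ASCII range (0 - 127).
print(all(code >= 128 for code in codes))    # True

# Plain ASCII characters keep their original codes.
print(byte_codes("A"))                       # [65]
```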

ASCII was built around the Latin alphabet as used in English, which limits its use for representing the other alphabets and writing systems used by the majority of the world's population. In 1988, a group of hardware and software manufacturers began developing a uniform encoding scheme called Unicode that is capable of encoding text in essentially all written languages of the world. Unicode was originally designed as a 16-bit code, but it has since been expanded beyond 16 bits, with code points ranging from 0 to 0x10FFFF. It is backwards compatible with ASCII: its first 128 code points are identical to the ASCII codes. Today, it defines over 100,000 characters. Many programming languages and assemblers still treat ASCII as the baseline character representation, though most provide some type of special notation for specifying and working with Unicode.
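The following Python sketch shows one way to look at Unicode code points. Python strings are sequences of Unicode characters, and UTF-8 is used here as one common byte encoding of Unicode; the text itself does not name an encoding, so that choice is an assumption of this example, as is the helper name `utf8_length`. The last sample shows that current Unicode code points can exceed 16 bits.

```python
def utf8_length(ch):
    """Return the number of bytes needed to store one character in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", utf8_length(ch))
# U+0041 1
# U+00E9 2
# U+20AC 3
# U+1F600 4

# ASCII text is unchanged when encoded as UTF-8.
print("Hello".encode("utf-8") == "Hello".encode("ascii"))   # True
```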

Page last modified on September 05, 2021, at 12:04 PM