Floating-Point Representations

Character Data

All data processed and stored by a digital computer, including character data, must be represented as numerical values in binary. A processor works only with binary data; it has no concept of symbols or characters. To represent character data, an encoding scheme is used in which each character is assigned an integer value. When text is displayed or printed, each numerical code is translated through the encoding scheme and the corresponding character is rendered in its graphical form.
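As a minimal sketch of this idea, the Python snippet below converts characters to their integer codes and back with the built-in `ord` and `chr` functions. Python is used here only for illustration, and the variable names (`text`, `codes`, `bit_patterns`, `decoded`) are hypothetical.

```python
# Map each character of a string to its integer code and back again.
text = "Hi!"

# ord() returns the integer code assigned to a single character.
codes = [ord(ch) for ch in text]

# The same codes written as 8-bit binary patterns, as they might be stored.
bit_patterns = [format(code, "08b") for code in codes]

# chr() performs the reverse translation, from integer code to character.
decoded = "".join(chr(code) for code in codes)

print(codes)         # [72, 105, 33]
print(bit_patterns)  # ['01001000', '01101001', '00100001']
print(decoded)       # Hi!
```

The processor only ever sees the integers; the translation to a visible glyph happens when the text is displayed or printed.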

The American Standard Code for Information Interchange (ASCII) is the basic encoding used to represent character data in modern computers. ASCII was developed in the early 1960s and was based on earlier teleprinter encoding systems. It was first published as a standard in 1963, and a major revision followed in 1967.

The original version included 128 codes using 7 bits per character, as shown in Table 1. The first 32 codes (0 through 31) are control characters, which were used to control devices such as printers or to serve as metadata for information stored on devices such as magnetic tape. Today, several of these serve special purposes in characters and strings and are written as escape sequences (e.g., `\n`). The printable characters, including those found on the keyboard, occupy codes 32 through 126. Notice that the digits are in sequence, as are the upper-case letters and the lower-case letters. Because each group occupies a contiguous range of codes, a character can be classified by testing whether its code falls within a range of integer values.
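The range tests mentioned above can be sketched as follows. The helper names (`is_digit`, `is_upper`, `is_lower`, `digit_value`, `to_upper`) are hypothetical and written out by hand for illustration; Python's built-in string methods would normally be used instead.

```python
def is_digit(ch):
    """True if ch is one of the characters '0' through '9'."""
    return ord("0") <= ord(ch) <= ord("9")

def is_upper(ch):
    """True if ch is one of the characters 'A' through 'Z'."""
    return ord("A") <= ord(ch) <= ord("Z")

def is_lower(ch):
    """True if ch is one of the characters 'a' through 'z'."""
    return ord("a") <= ord(ch) <= ord("z")

def digit_value(ch):
    """Numeric value of a digit character: its distance from the code for '0'."""
    if not is_digit(ch):
        raise ValueError("not an ASCII digit")
    return ord(ch) - ord("0")

def to_upper(ch):
    """Convert a lower-case letter by subtracting the fixed gap between the cases."""
    if is_lower(ch):
        return chr(ord(ch) - (ord("a") - ord("A")))
    return ch

print(is_digit("7"), is_digit("x"))  # True False
print(digit_value("7"))              # 7
print(to_upper("q"))                 # Q
print(ord("\n"))                     # 10, the line feed control code
```

Each test is a pair of integer comparisons, which is why the contiguous ordering of the digits and letters matters.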

	0	1	2	3	4	5	6	7	8	9
0	NUL	SOH	STX	ETX	EOT	ENQ	ACK	BEL	BS	HT
1	LF	VT	FF	CR	SO	SI	DLE	DC1	DC2	DC3
2	DC4	NAK	SYN	ETB	CAN	EM	SUB	ESC	FS	GS
3	RS	US	sp	!	"	#	$	%	&	'
4	(	)	*	+	,	-	.	/	0	1
5	2	3	4	5	6	7	8	9	:	;
6	<	=	>	?	@	A	B	C	D	E
7	F	G	H	I	J	K	L	M	N	O
8	P	Q	R	S	T	U	V	W	X	Y
9	Z	[	\	]	^	_	`	a	b	c
10	d	e	f	g	h	i	j	k	l	m
11	n	o	p	q	r	s	t	u	v	w
12	x	y	z	{	|	}	~	DEL

**Abbreviations**
NUL	Null (`\0`)	VT	Vertical tab (`\v`)	SYN	Synchronous idle
SOH	Start of heading	FF	Form feed (`\f`)	ETB	End of transmission block
STX	Start of text	CR	Carriage return (`\r`)	CAN	Cancel
ETX	End of text	SO	Shift out	EM	End of medium
EOT	End of transmission	SI	Shift in	SUB	Substitute
ENQ	Enquiry	DLE	Data link escape	ESC	Escape (`\e`)
ACK	Acknowledge	DC1	Device control 1	FS	File separator
BEL	Bell (`\a`)	DC2	Device control 2	GS	Group separator
BS	Backspace (`\b`)	DC3	Device control 3	RS	Record separator
HT	Horizontal tab (`\t`)	DC4	Device control 4	US	Unit separator
LF	Line feed (`\n`)	NAK	Negative acknowledge	DEL	Delete/idle

Table 1. The ASCII Code Chart
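Table 1 lists ten codes per row, so the row label gives the tens and the column label gives the units of a character's code. The sketch below illustrates that lookup. The names `CONTROL_NAMES`, `table_position`, and `table_entry` are hypothetical, and the abbreviations are copied from the list above.

```python
# Abbreviations for the control codes 0 through 31, in code order.
CONTROL_NAMES = [
    "NUL", "SOH", "STX", "ETX", "EOT", "ENQ", "ACK", "BEL", "BS", "HT",
    "LF", "VT", "FF", "CR", "SO", "SI", "DLE", "DC1", "DC2", "DC3",
    "DC4", "NAK", "SYN", "ETB", "CAN", "EM", "SUB", "ESC", "FS", "GS",
    "RS", "US",
]

def table_position(code):
    """Return (row, column) of a 7-bit code in a chart with 10 codes per row."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes range from 0 to 127")
    return divmod(code, 10)

def table_entry(code):
    """Return the text shown in the chart for a given 7-bit code."""
    if not 0 <= code <= 127:
        raise ValueError("ASCII codes range from 0 to 127")
    if code < 32:
        return CONTROL_NAMES[code]
    if code == 32:
        return "sp"   # the space character
    if code == 127:
        return "DEL"
    return chr(code)

print(table_position(ord("A")))  # (6, 5): row 6, column 5
print(table_entry(65))           # A
print(table_entry(13))           # CR
```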

The ASCII code was extended in 1986 to a full 8 bits per character, which added codes for mathematical symbols, accented letters and other characters used in languages other than English, and basic box-drawing shapes such as ║, ├, and ╚.
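To illustrate how an 8-bit extension places new characters in the codes above 127, here is a minimal sketch. The text does not name a particular extension, so the code assumes IBM code page 437 (Python's `cp437` codec), one widely used 8-bit character set that contains these box-drawing shapes. The helper name `extended_code` is hypothetical.

```python
# Assumption: code page 437 stands in for "extended ASCII" in this sketch.
CODEC = "cp437"

def extended_code(symbol):
    """Return the single-byte code of symbol under the assumed 8-bit encoding."""
    encoded = symbol.encode(CODEC)
    if len(encoded) != 1:
        raise ValueError("expected exactly one character")
    return encoded[0]

# The three box-drawing characters shown above, given by their Unicode escapes.
BOX_SYMBOLS = "\u2551\u251c\u255a"

for symbol in BOX_SYMBOLS:
    code = extended_code(symbol)
    # Each code needs the eighth bit, so it lies outside the original 0-127 range.
    print(f"U+{ord(symbol):04X} -> code {code} ({code:08b})")

# The original 7-bit codes keep their values, so plain ASCII text is unaffected.
print(extended_code("A") == ord("A"))  # True
```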

The ASCII code was built around the Latin alphabet as used for English, which limits its use for representing the non-Latin writing systems used by the majority of the world's population. In 1988, a group of hardware and software manufacturers began developing a uniform encoding scheme called Unicode that is capable of encoding text in essentially all written languages of the world. Unicode was originally designed as a 16-bit code, but it has since grown to a code space of 1,114,112 code points (U+0000 through U+10FFFF), which are stored using encodings such as UTF-8 and UTF-16. It is backward compatible with ASCII and the Latin character set: its first 128 code points match ASCII and its first 256 match Latin-1. Today, it defines over 100,000 characters. Many programming languages and assemblers still use ASCII as the default character representation, though most provide some type of special notation for specifying and working with Unicode.
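The relationship between ASCII and Unicode can be checked with a short sketch. It assumes Python 3, whose strings are sequences of Unicode code points, and uses the standard `utf-8` and `utf-16-be` codecs. The function name `describe` and the sample characters are hypothetical choices for illustration.

```python
def describe(ch):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    return ord(ch), len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

# Sample characters written as escapes: A, e-acute, the euro sign, and an emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    code_point, utf8_len, utf16_len = describe(ch)
    print(f"U+{code_point:04X}: {utf8_len} byte(s) in UTF-8, {utf16_len} in UTF-16")

# ASCII text is unchanged when stored as UTF-8: one byte per character,
# with the same numeric value as the ASCII code.
print("A".encode("utf-8") == bytes([65]))  # True
```

The last sample has a code point above U+FFFF, so it does not fit in a single 16-bit unit and takes four bytes in UTF-16.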
