1.3 Data types and variables
1.3.1 Symbols and keywords
From the code listing, we see examples of C syntax. The following table lists some basic symbols.
Basic symbols
Symbol | Description |
---|---|
// | line comment; the compiler ignores the rest of the line |
/* ... */ | block comment; the compiler ignores everything in between |
# | start of a preprocessor directive |
; | statement terminator |
, | list separator |
() | parentheses enclosing a function parameter/argument list or grouping an algebraic expression |
{} | braces delimiting the scope of a program block |
A program block consists of a sequence of statements. A statement can be a declaration, an assignment, a function call, a selection, or a repetition statement.
Keywords (reserved words)
C (as standardized in C89/C90) has 32 keywords. We group them into five categories:
Category | Keyword |
---|---|
Basic data types | char, int, float, double, short, long, signed, unsigned, void |
Define data types | typedef, struct, union, enum |
Modifiers | const, auto, static, extern, volatile, register |
Flow control | if, else, switch, case, default, goto, for, while, do, break, continue |
Function | return, sizeof |
1.3.2 Data types
Data representation in programming rests on four concepts, each building on the previous one: data type -> variable -> data structure -> algorithm. Any programming language has to address these concepts. We will study them in C, starting with data types.
What is a data type?
A data type (or simply type) defines
- how a certain type of data values is represented in programs,
- the number of bytes used to represent data values and how a data value is represented in binary format and stored in memory, and
- what operations can be applied to the data values and how those operations act on them.
How is a data type used?
Programmers use data types to tell the compiler what kind of data is used, how much memory it requires, and what operations can be applied to it. The compiler then generates instructions to allocate memory blocks for the data. At runtime, these instructions instantiate the memory blocks and read and write data values in them.
Data types in C
C provides a set of basic (also called primitive, primary, or fundamental) data types, specified by keywords: char for the character type, int for the integer type, float for the single-precision floating-point type, and double for the double-precision floating-point type. Arithmetic operations can be applied to the basic data types. Some of these operations are supported directly by processor instructions.
C uses the keywords short, long, signed, and unsigned as modifiers for the int type, and signed and unsigned as modifiers for the char type, to form additional basic data types. For example, long int represents a long integer type. The keyword void denotes the absence of a type, for example for a function that returns no value; a pointer to void (void *) can point to data of any type.
C provides methods to define extended (also referred to as non-primitive, secondary, or derived) data types using the keywords typedef, struct, union, and enum, together with pointers and arrays. A hierarchy of extended data types can be built bottom up starting from the basic data types. We will learn the methods of constructing the extended data types in Lesson 3.
Data type size
Each data type has a size, i.e., the number of bytes needed to store a value of the type in memory. Each addressable memory cell holds 1 byte, consisting of 8 bits. A bit is the unit of data storage; a byte (8 bits) is the unit of addressable data storage, i.e., the size of a memory cell. A word is the unit of data transfer between CPU registers and memory, i.e., the size of a register.
A data object of a certain type consists of one or more bytes in a region of data storage in the execution environment. The contents of data objects represent the data values of the type.
The keyword sizeof is an operator that gives the size of a data type in bytes. For example, sizeof(char) gives 1, meaning that the char type uses 1 byte, and sizeof(float) gives 4 on most platforms, meaning that the float type uses 4 bytes.
Because a data type has a fixed size, the number of different values it can represent is limited. For example, the char type has 1 byte (8 bits), so it can represent at most 2^8 = 256 different values. A data type therefore always has a limited range of values.
Memory block
When a data type has more than one byte, it needs contiguous memory cells to store a value of the type. For example, float has 4 bytes, so it uses 4 memory cells with consecutive addresses. A group of contiguous memory cells is referred to as a memory block. A memory block is usually defined by the lowest address of its memory cells and the number of cells (number of bytes).
Basic data types in C
Table 3 shows the commonly used basic data types with their sizes and ranges. Note that the size of int is system/platform dependent: it is typically 2 bytes on 16-bit systems and 4 bytes on 32-bit and 64-bit systems. In this course, we use 4 as the default size of the int type.
Data type / Keyword | Size in bytes | Range |
---|---|---|
char | 1 | -128 to 127 |
unsigned char | 1 | 0 to 255 |
int | 4 | -2^31 to 2^31-1 |
unsigned int | 4 | 0 to 2^32-1 |
float | 4 | 1.2E-38 to 3.4E+38 (magnitude) |
double | 8 | 2.2E-308 to 1.8E+308 (magnitude) |
Let’s look into the char and int types.
char type
The char type has 1 byte; as a signed type, it represents integers from -128 to 127. C uses ASCII (American Standard Code for Information Interchange) to map the integers from 0 to 127 to characters. The following is the ASCII table, in which each row gives the code of a character in Dec (decimal), Oct (octal), Hex (hexadecimal), and Bin (binary), followed by the character and its description.
Dec | Oct | Hex | Bin | Char | Description |
---|---|---|---|---|---|
0 | 000 | 00 | 00000000 | NUL | "null" character |
1 | 001 | 01 | 00000001 | SOH | start of header |
2 | 002 | 02 | 00000010 | STX | start of text |
3 | 003 | 03 | 00000011 | ETX | end of text |
4 | 004 | 04 | 00000100 | EOT | end of transmission |
5 | 005 | 05 | 00000101 | ENQ | enquiry |
6 | 006 | 06 | 00000110 | ACK | acknowledgment |
7 | 007 | 07 | 00000111 | BEL | bell |
8 | 010 | 08 | 00001000 | BS | backspace |
9 | 011 | 09 | 00001001 | HT | horizontal tab |
10 | 012 | 0A | 00001010 | LF | line feed |
11 | 013 | 0B | 00001011 | VT | vertical tab |
12 | 014 | 0C | 00001100 | FF | form feed |
13 | 015 | 0D | 00001101 | CR | carriage return |
14 | 016 | 0E | 00001110 | SO | shift out |
15 | 017 | 0F | 00001111 | SI | shift in |
16 | 020 | 10 | 00010000 | DLE | data link escape |
17 | 021 | 11 | 00010001 | DC1 | device control 1 (XON) |
18 | 022 | 12 | 00010010 | DC2 | device control 2 |
19 | 023 | 13 | 00010011 | DC3 | device control 3 (XOFF) |
20 | 024 | 14 | 00010100 | DC4 | device control 4 |
21 | 025 | 15 | 00010101 | NAK | negative acknowledgement |
22 | 026 | 16 | 00010110 | SYN | synchronous idle |
23 | 027 | 17 | 00010111 | ETB | end of transmission block |
24 | 030 | 18 | 00011000 | CAN | cancel |
25 | 031 | 19 | 00011001 | EM | end of medium |
26 | 032 | 1A | 00011010 | SUB | substitute |
27 | 033 | 1B | 00011011 | ESC | escape |
28 | 034 | 1C | 00011100 | FS | file separator |
29 | 035 | 1D | 00011101 | GS | group separator |
30 | 036 | 1E | 00011110 | RS | request to send/record separator |
31 | 037 | 1F | 00011111 | US | unit separator |
32 | 040 | 20 | 00100000 | SP | space |
33 | 041 | 21 | 00100001 | ! | exclamation mark |
34 | 042 | 22 | 00100010 | " | double quote |
35 | 043 | 23 | 00100011 | # | number sign |
36 | 044 | 24 | 00100100 | $ | dollar sign |
37 | 045 | 25 | 00100101 | % | percent |
38 | 046 | 26 | 00100110 | & | ampersand |
39 | 047 | 27 | 00100111 | ' | single quote |
40 | 050 | 28 | 00101000 | ( | left/opening parenthesis |
41 | 051 | 29 | 00101001 | ) | right/closing parenthesis |
42 | 052 | 2A | 00101010 | * | asterisk |
43 | 053 | 2B | 00101011 | + | plus |
44 | 054 | 2C | 00101100 | , | comma |
45 | 055 | 2D | 00101101 | - | minus or dash |
46 | 056 | 2E | 00101110 | . | dot |
47 | 057 | 2F | 00101111 | / | forward slash |
48 | 060 | 30 | 00110000 | 0 | |
49 | 061 | 31 | 00110001 | 1 | |
50 | 062 | 32 | 00110010 | 2 | |
51 | 063 | 33 | 00110011 | 3 | |
52 | 064 | 34 | 00110100 | 4 | |
53 | 065 | 35 | 00110101 | 5 | |
54 | 066 | 36 | 00110110 | 6 | |
55 | 067 | 37 | 00110111 | 7 | |
56 | 070 | 38 | 00111000 | 8 | |
57 | 071 | 39 | 00111001 | 9 | |
58 | 072 | 3A | 00111010 | : | colon |
59 | 073 | 3B | 00111011 | ; | semi-colon |
60 | 074 | 3C | 00111100 | < | less than |
61 | 075 | 3D | 00111101 | = | equal sign |
62 | 076 | 3E | 00111110 | > | greater than |
63 | 077 | 3F | 00111111 | ? | question mark |
64 | 100 | 40 | 01000000 | @ | "at" symbol |
65 | 101 | 41 | 01000001 | A | |
66 | 102 | 42 | 01000010 | B | |
67 | 103 | 43 | 01000011 | C | |
68 | 104 | 44 | 01000100 | D | |
69 | 105 | 45 | 01000101 | E | |
70 | 106 | 46 | 01000110 | F | |
71 | 107 | 47 | 01000111 | G | |
72 | 110 | 48 | 01001000 | H | |
73 | 111 | 49 | 01001001 | I | |
74 | 112 | 4A | 01001010 | J | |
75 | 113 | 4B | 01001011 | K | |
76 | 114 | 4C | 01001100 | L | |
77 | 115 | 4D | 01001101 | M | |
78 | 116 | 4E | 01001110 | N | |
79 | 117 | 4F | 01001111 | O | |
80 | 120 | 50 | 01010000 | P | |
81 | 121 | 51 | 01010001 | Q | |
82 | 122 | 52 | 01010010 | R | |
83 | 123 | 53 | 01010011 | S | |
84 | 124 | 54 | 01010100 | T | |
85 | 125 | 55 | 01010101 | U | |
86 | 126 | 56 | 01010110 | V | |
87 | 127 | 57 | 01010111 | W | |
88 | 130 | 58 | 01011000 | X | |
89 | 131 | 59 | 01011001 | Y | |
90 | 132 | 5A | 01011010 | Z | |
91 | 133 | 5B | 01011011 | [ | left/opening bracket |
92 | 134 | 5C | 01011100 | \ | back slash |
93 | 135 | 5D | 01011101 | ] | right/closing bracket |
94 | 136 | 5E | 01011110 | ^ | caret/circumflex |
95 | 137 | 5F | 01011111 | _ | underscore |
96 | 140 | 60 | 01100000 | ` | |
97 | 141 | 61 | 01100001 | a | |
98 | 142 | 62 | 01100010 | b | |
99 | 143 | 63 | 01100011 | c | |
100 | 144 | 64 | 01100100 | d | |
101 | 145 | 65 | 01100101 | e | |
102 | 146 | 66 | 01100110 | f | |
103 | 147 | 67 | 01100111 | g | |
104 | 150 | 68 | 01101000 | h | |
105 | 151 | 69 | 01101001 | i | |
106 | 152 | 6A | 01101010 | j | |
107 | 153 | 6B | 01101011 | k | |
108 | 154 | 6C | 01101100 | l | |
109 | 155 | 6D | 01101101 | m | |
110 | 156 | 6E | 01101110 | n | |
111 | 157 | 6F | 01101111 | o | |
112 | 160 | 70 | 01110000 | p | |
113 | 161 | 71 | 01110001 | q | |
114 | 162 | 72 | 01110010 | r | |
115 | 163 | 73 | 01110011 | s | |
116 | 164 | 74 | 01110100 | t | |
117 | 165 | 75 | 01110101 | u | |
118 | 166 | 76 | 01110110 | v | |
119 | 167 | 77 | 01110111 | w | |
120 | 170 | 78 | 01111000 | x | |
121 | 171 | 79 | 01111001 | y | |
122 | 172 | 7A | 01111010 | z | |
123 | 173 | 7B | 01111011 | { | left/opening brace |
124 | 174 | 7C | 01111100 | \| | vertical bar |
125 | 175 | 7D | 01111101 | } | right/closing brace |
126 | 176 | 7E | 01111110 | ~ | tilde |
127 | 177 | 7F | 01111111 | DEL | delete |
For example, character A has ASCII code value 65 in decimal (Dec), 101 in octal (Oct), 41 in hexadecimal (Hex), and 01000001 in binary (Bin).
We see that character a has ASCII code 97. The difference between the ASCII code values of a and A is 32. So we can get the code of a by adding 32 to the code of A, i.e., 97 = 65+32. Similarly, we can get the code of A by subtracting 32 from the code of a, i.e., 65 = 97-32. This is how letter case conversion works.
You should know the conversions between decimal, octal, hexadecimal, and binary representations. For example, decimal 65 is 01000001 in binary, 101 in octal, and 41 in hexadecimal.
The character A can be written in source code programs as follows.
'A' -- character A
65 -- Dec
0101 -- Oct
0x41 -- Hex
The compiler converts each of these representations to the same binary value in machine representation. For example, in C programming, the statements char c = 'A';, char c = 65;, char c = 0101;, and char c = 0x41; are equivalent. Note that the digits of an octal expression have to be 0 to 7. For example, 08 is not a valid octal expression. Similarly, the digits of a hexadecimal expression have to be 0, 1, …, 9, A (or a), B (or b), C (or c), D (or d), E (or e), F (or f). For example, 0x4Aa is a valid hexadecimal expression and 0x4AG is not. Standard C before C23 does not support binary expressions in source code.
int type
The int data type is platform/system dependent. It typically has 2 bytes on 16-bit systems, and 4 bytes on 32-bit and 64-bit systems. We use a 32-bit system as our default, so the int data type has 4 bytes and represents integers from -2^31 to 2^31-1. The leftmost bit is the sign bit: 0 for non-negative and 1 for negative. For example, decimal integer 1890259661 is represented in 4 bytes as 01110000 10101011 00010010 11001101. Negative integers are stored in two's complement form, obtained by inverting every bit of the positive value and adding 1. Decimal integer -1890259661 is therefore represented in 4 bytes as 10001111 01010100 11101101 00110011.
Little-endian and Big-endian
How do you store the 4 bytes of an int value in a memory block of size 4? There are two ways to place the 4 bytes: little-endian and big-endian. Little-endian stores the least significant byte in the memory cell with the lowest address. Big-endian stores the most significant byte in the memory cell with the lowest address. The little-endian method is commonly used.
Figure 1 shows the little-endian and big-endian arrangements of 01110000 10101011 00010010 11001101 in memory, where the lowest memory cell address of the memory block is 1000. In the little-endian arrangement, addresses 1000 to 1003 hold 11001101, 00010010, 10101011, and 01110000; in the big-endian arrangement, they hold 01110000, 10101011, 00010010, and 11001101.

C supports decimal, octal, and hexadecimal expressions for int values in programming. The following example shows the conversions and representations of integer 1890259661.
1890259661 (decimal) C representation: 1890259661
= 01110000 10101011 00010010 11001101 (binary)
= 01 110 000 101 010 110 001 001 011 001 101 (binary)
= 1 6 0 5 2 6 1 1 3 1 5 (octal) C representation: 016052611315
= 0111 0000 1010 1011 0001 0010 1100 1101 (binary)
= 7 0 A B 1 2 C D (hexadecimal) C representation: 0x70AB12CD
In C programming, the statements int a = 1890259661;, int a = 016052611315;, and int a = 0x70AB12CD; are equivalent. That is, the compiler converts the number on the right side to the same binary number and stores it in the allocated memory block.
float and double types
C supports floating-point (real) numbers with two types, float and double. The float type has 4 bytes for single-precision floating-point numbers, and the double type has 8 bytes for double-precision floating-point numbers. Both formats are defined by the IEEE 754 standard. In this course, you are not required to know the detailed bit patterns and operations of the float and double types.
In C programming, a floating-point number can be written in either decimal format or scientific format. For example, 31.4 is in decimal format; it can also be written in scientific format as 0.314e2, meaning 0.314*10^2. The statements float f = 31.4; and float f = 0.314e2; are equivalent.
1.3.3 Variables
The concepts of variables
A variable is a name (also called an identifier) used in programming to represent a data object, which stores data values at runtime. The compiler assigns a variable its memory block at compile time, and the variable is instantiated as an object in memory at runtime.
Specifically, a variable is a name used in source code programming, referring to a data object of a certain type. It tells the compiler to allocate a memory block for the variable and to generate instructions that set and get values in that block. The variable’s memory block is instantiated at an absolute location in computer memory at runtime.
C variable and type
C is a typed programming language. That means a C variable must have a data type. A variable has to be declared with a type and initialized (i.e., assigned a value) before its value is used. The variable declaration tells the compiler to assign a relative memory block to the variable. A relative memory block is represented by an offset from a starting point (the scope of the variable) and the size (number of bytes) of the block. The value assignment statement tells the compiler to generate instructions to write a value to the memory block. At runtime, these instructions write the value to the absolute memory location when they are executed.
For example, to use the integer value 10 in a program, we can declare an int variable and initialize it to 10 with the statement int x = 10;. This statement first declares a variable named x and then sets its value to 10. It tells the compiler to allocate a relative memory block for x and generate instructions to write the value 10 to the memory block. The compiler uses a table to remember the variable name x and its relative memory block. If a later statement uses x, for example printf("%d", x);, the compiler gets the relative memory location of x from the table and generates instructions to read the value from the memory block.
Example:
int a;     // let the compiler allocate 4 bytes of memory for variable a
char c;    // let the compiler allocate 1 byte of memory for variable c
float f;   // let the compiler allocate 4 bytes of memory for variable f
a = 2;     // let the compiler generate instructions to store value 2 in variable a
c = 'a';   // let the compiler generate instructions to store 97 (01100001) in variable c
f = 1.41;  // let the compiler convert 1.41 to a 32-bit single-precision float and store it in f
C allows declaring and/or initializing several variables of the same type in one statement, separated by commas. For example, int a=1,b=2,c; is a valid statement.
Scope
A scope consists of a sequence of statements, in which identifiers (variable/function names) are declared and used. Generally, a scope consists of a sequence of statements enclosed by a pair of block symbols {}, i.e., starting from { and ending at }. In particular, the global scope is the whole program and is not enclosed in {}. Scopes can be separate, i.e., {…}…{…}, or nested, i.e., {…{…}…}.
Each variable has a scope. A variable can only be used/accessed by statements after the variable declaration within the scope, including its nested scopes.
A local variable is a variable declared within a code block enclosed by {}, and can be used in the block after its declaration within the block. For example, variables declared in a function are local variables, which can only be used in the function.
A global variable is a variable declared outside of any function, so it can be used by any function that follows its declaration.
Compilers bind a variable to its scope. The memory location of a variable is relative to the beginning of its scope. Two variables in separate scopes can use the same variable name. For example, in the code listing, variable a is a global variable and is accessed in the main function by assigning value 1 to a. Variable b in the main function is a local variable. Variable names x and y are used for local variables in both the add and minus functions.
Variable name convention
C has restrictions on variable naming. C variable names must start with a letter or an underscore, followed by letters, digits, and underscores, and must not be a keyword. Variable names are case sensitive. For program readability, C programming has two common naming convention styles: underscore_style and camelCaseStyle. The camelCaseStyle is also used in C++ and Java programming. The underscore_style was used in classic C programming. We will use the underscore_style in course code examples.
There are different C programming styles used by C programmers, communities, and organizations. One example is C Style and Coding Standards.
1.3.4 Constants
Constants are fixed data values in programs. For example, 3.1415926 is the constant Pi for computing circumference and area of circles. Constants can be directly used in source code programs. However, if a constant is used many times in source code, it is not convenient to type the constant value every time it is used. It’s better to use a simple name to represent such a constant, and replace it by an actual constant later on.
Define constants by macro
The C preprocessor provides a method to define a constant by name and to replace the name with the constant in the preprocessing step.
Example:
#define PI 3.1415926 // this defines macro PI as 3.1415926, then PI can be used in statements.
float r = 4;
float cf = 2*PI*r;
float area = PI*r*r;
In the preprocessing step of compiling, every appearance of PI in the program is replaced by 3.1415926. In the compilation step, each occurrence of 3.1415926 is treated as a floating-point constant of type double, and the result of each expression is converted to single precision when it is assigned to a float variable.
Define constants by read-only variables
C provides an alternative method to define constants, known as constant variables or read-only variables. It uses the keyword const in variable declarations. Such a constant variable must be declared and initialized in one statement. Its value cannot be changed by assignment later on.
Example:
const float pi = 3.1415926;
float r = 4;
float cf = 2*pi*r;
float area = pi*r*r;
In the above example, variable pi is declared as a constant (read-only) variable. It is not allowed to change the value of pi in the program. For example, if we add the statement pi = 3.14; to the program, the program will fail to compile. With this method, the value 3.1415926 is converted to its single-precision representation only once during compilation.
1.3.5 Exercises
Self-quiz
Take a moment to review what you have read above. When you feel ready, take this self-quiz which does not count towards your grade but will help you to gauge your learning. Answer the questions posed below.