What is a Token?
In programming languages, a token is the smallest meaningful unit of a program that the compiler recognizes. The compiler identifies tokens during lexical analysis, which is the first phase of compilation.
In the lexical analysis phase, the lexical analyzer scans the source code and groups the characters into meaningful sequences called tokens.
The lexical analyzer also discards whitespace and comments from the source code to simplify further processing.
For example, when we write a C program, the compiler breaks it into small pieces called tokens. There are six main types of tokens:
- Keywords.
- Identifiers.
- Constants.
- Strings.
- Operators.
- Special Symbols (Separators) or Punctuation Marks.

| TOKEN TYPE | DESCRIPTION | EXAMPLE |
|---|---|---|
| Keyword | A keyword is a reserved word in C with a fixed meaning. It can be used only for that purpose and cannot be used as a variable name. | int, if, while, return, auto, extern, break |
| Identifier | An identifier is the name given to a variable, function, or array by the programmer. | x, sum, main, demo |
| Constant | A constant is a fixed value that does not change during program execution. | 98, 3.14, 'A', 'b' |
| String | A string is a group of characters written inside double quotes. | "C Language", "Suraku Academy" |
| Operator | An operator is a symbol that performs operations on variables and values. | +, -, *, =, <, !=, && |
| Punctuation (Special Symbol) | A punctuation mark is a symbol used to separate, group, or define the structure of the program. | comma (,), parentheses ( ), braces { }, semicolon (;) |
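
The six token types in the table can be seen together in a short piece of C. The function below is only an illustrative sketch: the name demo and its logic are hypothetical, chosen so that every category from the table appears at least once.

```c
#include <string.h>

/* Hypothetical function, written only to show each token type in context. */
int demo(int x)                        /* int: keyword; demo, x: identifiers */
{
    int sum = 98;                      /* 98: constant; =: operator */
    char grade = 'A';                  /* char: keyword; 'A': character constant */
    char name[] = "C Language";        /* "C Language": string; [ ]: punctuation */

    if (x < sum && grade != 'b') {     /* if: keyword; <, &&, !=: operators */
        sum = sum + x * 2;             /* +, *: operators; 2: constant */
    }
    return sum + (int)strlen(name);    /* return: keyword; ( ) { } ;: punctuation */
}
```

Whitespace and the comments on the right are not tokens. The lexical analyzer discards them before the later phases see the program.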
HOW TO COUNT THE NUMBER OF TOKENS:
To count tokens correctly, follow a few small but important rules.
- Whitespace characters (space, tab, newline) and comments are ignored and not counted as tokens.
- Each keyword, identifier, constant, operator, string, and separator is counted as one token.
- Multi-character operators like ++, --, <=, >=, == are counted as one token.
- Parentheses and brackets are counted separately, and each opening and closing symbol is treated as a different token.
- Semicolon is always one token.
- Comma is one token.
- In a function call, the statement is broken into tokens, so printf("Hi Suresh") is divided into printf, (, "Hi Suresh", and ).
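- Preprocessor statements are also broken into tokens, so in #include<stdio.h>, the parts #include and <stdio.h> are counted separately. (Strictly speaking, the C standard treats # and include as two separate preprocessing tokens, but the counting used here groups them as one.)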
- Each occurrence of a variable is counted separately, so in my+my, the variable my is counted two times.
- The dot (.) and arrow (->) operators used to access structure members are each counted as one token.
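
The rules above can be sketched as a small counting function in C. This is a simplified illustration, not a real lexer: the name count_tokens and the operator list are assumptions made for this example. It also does not apply the #include rule above (it would count #, <, and > as separate tokens), and it does not handle numeric forms such as 1e-5 or .5.

```c
#include <ctype.h>
#include <string.h>

/* Counts tokens in a fragment of C source using the rules listed above.
   Simplified sketch: preprocessor lines and unusual numeric forms are not handled. */
int count_tokens(const char *src)
{
    /* Multi-character operators, longest first, so "<<=" is tried before "<<". */
    static const char *const ops[] = {
        "<<=", ">>=", "...",
        "++", "--", "->", "<=", ">=", "==", "!=", "&&", "||",
        "+=", "-=", "*=", "/=", "%=", "&=", "|=", "^=", "<<", ">>"
    };
    const size_t n_ops = sizeof ops / sizeof ops[0];
    const char *p = src;
    int count = 0;

    while (*p != '\0') {
        if (isspace((unsigned char)*p)) {                 /* whitespace: not a token */
            p++;
        } else if (p[0] == '/' && p[1] == '/') {          /* line comment: not a token */
            while (*p != '\0' && *p != '\n') p++;
        } else if (p[0] == '/' && p[1] == '*') {          /* block comment: not a token */
            p += 2;
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/')) p++;
            if (*p != '\0') p += 2;
        } else if (isalpha((unsigned char)*p) || *p == '_') {  /* keyword or identifier */
            while (isalnum((unsigned char)*p) || *p == '_') p++;
            count++;
        } else if (isdigit((unsigned char)*p)) {          /* numeric constant */
            while (isalnum((unsigned char)*p) || *p == '.') p++;
            count++;
        } else if (*p == '"' || *p == '\'') {             /* string or character constant */
            char quote = *p++;
            while (*p != '\0' && *p != quote) {
                if (*p == '\\' && p[1] != '\0') p++;      /* skip an escaped character */
                p++;
            }
            if (*p == quote) p++;
            count++;
        } else {                                          /* operator or separator */
            size_t len = 1;
            for (size_t i = 0; i < n_ops; i++) {
                size_t l = strlen(ops[i]);
                if (strncmp(p, ops[i], l) == 0) { len = l; break; }
            }
            p += len;
            count++;
        }
    }
    return count;
}
```

Under these assumptions, count_tokens("int num=10;") returns 5 and count_tokens("my+my") returns 3, because each occurrence of my is counted separately. Trying the longest operators first is what makes ++ count as one token instead of two + tokens.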
Example: Count the number of tokens in the statement int num=10;
| Element | Type | Token Count |
|---|---|---|
| int | Keyword | 1 |
| num | Identifier | 1 |
| = | Operator | 1 |
| 10 | Constant | 1 |
| ; | Separator | 1 |
So, the total number of tokens is 5.
Example: Count the number of tokens in the statement printf("Suraku Academy");
| Element | Type | Token Count |
|---|---|---|
| printf | Identifier | 1 |
| ( | Separator | 1 |
| "Suraku Academy" | String | 1 |
| ) | Separator | 1 |
| ; | Separator | 1 |
So, the total number of tokens is 5.
How many tokens are present in the following statement?
int a = b + 5;
(a) 5
(b) 6
(c) 7
(d) 4
Answer : [C]
The tokens in int a = b + 5; are int, a, =, b, +, 5, and ;. So, the total number of tokens is 7.
Match the following C statements with the correct number of tokens:
| List—I (C Statements) | List—II (Number of Tokens) |
|---|---|
| (i) int a = 10; | (P) 5 |
| (ii) printf("Hi Suresh"); | (Q) 6 |
| (iii) a = b + c; | (R) 9 |
| (iv) x = y++ + --z; | (S) 8 |
(a) (i)—(P), (ii)—(Q), (iii)—(S), (iv)—(R)
(b) (i)—(Q), (ii)—(Q), (iii)—(Q), (iv)—(R)
(c) (i)—(P), (ii)—(P), (iii)—(Q), (iv)—(R)
(d) (i)—(P), (ii)—(P), (iii)—(Q), (iv)—(S)
Answer : [D]
The statement “int a = 10;” contains 5 tokens: int, a, =, 10, and ;.
The statement “printf("Hi Suresh");” contains 5 tokens: printf, (, "Hi Suresh", ), and ;.
The statement “a = b + c;” contains 6 tokens: a, =, b, +, c, and ;.
The statement “x = y++ + --z;” contains 8 tokens: x, =, y, ++, +, --, z, and ;.
SURAKU BONUS — 1:
Difference Between Constant and Literal:
This is a very common question, and many students confuse these two.
Some values never change, and some values we just write directly. For example, if we write 5, 10, 3.14 and 99, these are values written directly in the program. These are called literals. Consider the following two statements.
- int my=55;
- const int my=55;
In the statement int my=55, the value 55 is called a literal because it is the actual value written directly in the program. The symbol my is a variable, which means it is a name used to store a value in memory.
Since my is a variable, its value is not fixed and can be changed later in the program if needed.
- my is a variable.
- 55 is a literal.
In the statement const int my=55, the variable my becomes fixed because it is declared with the const keyword. Once my is initialized with the value 55, it cannot be changed by assignment later in the program. Therefore, my is called a constant, as its value remains the same throughout the execution of the program.
- 55 is a literal.
- my is a constant variable.
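
The two statements can be compared side by side in C. The function names variable_demo and constant_demo are hypothetical, used only for this sketch, and the value 60 is an arbitrary second literal.

```c
/* my is an ordinary variable: it starts at the literal 55 and can be reassigned. */
int variable_demo(void)
{
    int my = 55;        /* 55 is a literal, my is a variable */
    my = 60;            /* allowed: my now holds the literal 60 */
    return my;
}

/* my is a constant variable: it keeps the literal 55 for its whole lifetime. */
int constant_demo(void)
{
    const int my = 55;  /* 55 is a literal, my is a constant variable */
    // my = 60;         /* if uncommented, the compiler rejects this assignment */
    return my;
}
```

In both functions 55 is a literal. Only the const keyword decides whether the name my may be assigned a new value afterwards.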
| CONSTANT | LITERAL |
|---|---|
| A constant is a value that cannot be changed during program execution. | A literal is the actual value written directly in the code. |
| A constant is usually given a name, either as a const variable or as a #define macro. | A literal is used directly in the program without being given a name. |
| A constant can be declared using the const keyword or the #define preprocessor directive. | A literal does not need any declaration. |
| A constant gives a name to a fixed value. | A literal is the fixed value itself. |
| const int my=55; | 55, 10, 3.14, 'A', "Suresh", "Suraku Academy" |
SURAKU BONUS — 2:
Punctuation Mark
A Punctuation Mark is a symbol used in writing to clarify meaning, separate ideas, and organize sentences properly.
- Punctuation marks are symbols used in writing.
- They improve clarity, readability, and meaning.
- In programming, they also act as tokens (separators).
- Each punctuation mark is counted as one token.
Important Punctuation Marks in C:
| Symbol | Name | Purpose |
|---|---|---|
| ; | Semicolon | Ends a statement. |
| , | Comma | Separates items such as variables and function arguments. |
| () | Parentheses | Function calls and conditions. |
| {} | Braces | Block of code. |
| [] | Brackets | Array declaration and indexing. |
| . | Dot | Access structure member. |
| -> | Arrow | Access structure member through a pointer. |
| : | Colon | Used in labels and switch cases. |
| # | Hash | Preprocessor directive. |
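
Every symbol in the table can be seen in one short C fragment. This is only a sketch: the names SIZE, point, and punctuation_demo are hypothetical and exist just to put each punctuation mark in context.

```c
#define SIZE 3                          /* # : preprocessor directive */

struct point { int x, y; };             /* { } : block; , : separates x and y */

/* Hypothetical function that uses each punctuation mark from the table. */
int punctuation_demo(int n)             /* ( ) : parameter list */
{
    int values[SIZE] = {1, 2, 3};       /* [ ] : array declaration */
    struct point pt = {4, 5};
    struct point *ptr = &pt;
    int total = pt.x + ptr->y;          /* . : member access; -> : access via pointer */

    switch (n) {                        /* ( ) : condition */
    case 0:                             /* : after a case label */
        total += values[0];
        break;                          /* ; : ends a statement */
    default:                            /* : after the default label */
        total += values[2];
        break;
    }
    return total;
}
```

Each of these symbols counts as one token on every occurrence, so a pair such as ( and ) contributes two tokens.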
