README.md

[//]: # ($NetBSD: README.md,v 1.18 2024/03/31 20:28:45 rillig Exp $)

# Introduction

Lint1 analyzes a single translation unit of C code.

* It reads the output of the C preprocessor, retaining the comments.
* The lexer in `scan.l` and `lex.c` splits the input into tokens.
* The parser in `cgram.y` creates types and expressions from the tokens.
* The checks for declarations are in `decl.c`.
* The checks for initializations are in `init.c`.
* The checks for types and expressions are in `tree.c`.

To see how a specific lint message is triggered, read the corresponding unit
test in `tests/usr.bin/xlint/lint1/msg_???.c`.

# Features

## Type checking

Lint has stricter type checking than most C compilers.

In _strict bool mode_, lint treats `bool` as a type that is incompatible with
other scalar types, as in C#, Go and Java.
See the test `d_c99_bool_strict.c` for details.

Lint warns about type conversions that may result in alignment problems.
See the test `msg_135.c` for examples.

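To illustrate what strict bool mode is about, the following sketch contrasts a
condition that relies on an implicit conversion from `int` with one that spells
out the comparison.
The function names are hypothetical, and the comments only describe the
expected direction of lint's complaint, not its exact messages;
the test `d_c99_bool_strict.c` remains the authoritative reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: relies on an implicit conversion from 'int'. */
bool
has_flags_implicit(int flags)
{
	/*
	 * Plain C accepts this. Strict bool mode is expected to object,
	 * as the controlling expression has type 'int', not 'bool'.
	 */
	if (flags)
		return true;
	return false;
}

/*
 * The same check with an explicit comparison, assuming that strict bool
 * mode treats the result of a comparison as 'bool'.
 */
bool
has_flags_explicit(int flags)
{
	return flags != 0;
}

/* Both variants agree at run time; the difference is purely static. */
void
strict_bool_usage(void)
{
	assert(has_flags_implicit(4) == has_flags_explicit(4));
	assert(has_flags_implicit(0) == has_flags_explicit(0));
}
```
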
## Control flow analysis

Lint roughly tracks the control flow inside a single function.
It doesn't follow `goto` statements precisely though;
instead, it assumes that each label is reachable.
See the test `msg_193.c` for examples.

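As a small illustration of the label assumption, consider the following
hypothetical functions.
In both, the code after the label can only be reached through a `goto`;
a precise analysis would have to trace these jumps, whereas lint, as described
above, simply treats the labeled statement as reachable.

```c
#include <assert.h>

/* Hypothetical example: the label is the only way to reach 'return 0'. */
int
clamp_nonnegative(int value)
{
	if (value < 0)
		goto negative;
	return value;
negative:
	/*
	 * Without the label, this statement would be unreachable, as it
	 * follows a 'return'. Lint does not trace the 'goto' above; it
	 * assumes that the label is reachable.
	 */
	return 0;
}

/* Hypothetical example: a loop built from a backward 'goto'. */
int
sum_up_to(int n)
{
	int sum = 0;
	int i = 1;

again:
	if (i > n)
		return sum;
	sum += i++;
	goto again;
}

void
goto_usage(void)
{
	assert(clamp_nonnegative(-3) == 0);
	assert(clamp_nonnegative(7) == 7);
	assert(sum_up_to(4) == 10);
}
```
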
## Error handling

Lint tries to continue parsing and checking even after seeing errors.
This part of lint is not robust though, so expect some crashes here,
as variables may not be properly initialized or be null pointers.
The cleanup after handling a parse error is often incomplete.

## Configurable diagnostic messages

Whether lint prints a message and whether each message is an error, a warning
or just informational depends on several things:

* The language level, with its possible values:
    * traditional C (`-t`)
    * migration from traditional C to C90 (default)
    * C90 (`-s`)
    * C99 (`-S`)
    * C11 (`-Ac11`)
    * C23 (`-Ac23`)
* In GCC mode (`-g`), lint allows several GNU extensions,
  reducing the number of printed messages.
* In strict bool mode (`-T`), lint issues errors when `bool` is mixed with
  other scalar types, reusing the existing messages 107 and 211, while also
  defining new messages that are specific to strict bool mode.
* The option `-a` performs the check for lossy conversions from large integer
  types, the option `-aa` extends this check to small integer types as well,
  reusing the same message ID.
* The option `-X` suppresses arbitrary messages by their message ID.
* The option `-q` enables additional queries that are not suitable as regular
  warnings but may be interesting to look at on a case-by-case basis.

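To make the difference between `-a` and `-aa` more tangible, here is a sketch
of the kinds of conversions involved.
The function names are hypothetical, and the mapping of each function to an
option is an interpretation of the description above, not a statement about
which message lint prints for it.

```c
#include <assert.h>
#include <limits.h>

/* Narrowing from a large integer type; the kind of check tied to '-a'. */
int
narrow_long(long value)
{
	int narrowed = value;	/* may lose the upper bits of 'value' */
	return narrowed;
}

/* Narrowing to a small integer type; presumably only covered by '-aa'. */
unsigned char
narrow_int(int value)
{
	unsigned char narrowed = value;	/* value modulo UCHAR_MAX + 1 */
	return narrowed;
}

/* An explicit range check makes the intended behavior visible. */
unsigned char
clamp_to_uchar(int value)
{
	if (value < 0)
		return 0;
	if (value > UCHAR_MAX)
		return UCHAR_MAX;
	return (unsigned char)value;
}

void
narrowing_usage(void)
{
	assert(narrow_long(5L) == 5);
	assert(narrow_int(UCHAR_MAX + 2) == 1);	/* silently wraps around */
	assert(clamp_to_uchar(UCHAR_MAX + 2) == UCHAR_MAX);
}
```
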
# Limitations

Lint operates on the level of individual expressions.

* It does not build an AST of the statements of a function, therefore it
  cannot reliably analyze the control flow in a single function.
* It does not store the control flow properties of functions, therefore it
  cannot relate parameter nullability with the return value.
* It does not have information about functions, except for their prototypes,
  therefore it cannot relate them across translation units.
* It does not store detailed information about complex data types, therefore
  it cannot cross-check them across translation units.

# Fundamental types

Lint mainly analyzes expressions (`tnode_t`), which are formed from operators
(`op_t`) and their operands (`tnode_t`).
Each node has a data type (`type_t`) and a few other properties that depend on
the operator.

## type_t

The basic types are `int`, `_Bool`, `unsigned long`, `pointer` and so on,
as defined in `tspec_t`.

Concrete types like `int` or `const char *` are created by `gettyp(INT)`,
or by deriving new types from existing types, using `block_derive_pointer`,
`block_derive_array` and `block_derive_function`.
(See [below](#memory-management) for the meaning of the prefix `block_`.)

After a type has been created, it should not be modified anymore.
Ideally all references to types would be `const`, but that's still on the
to-do list and not trivial.
In the meantime, before modifying a type,
it needs to be copied using `block_dup_type` or `expr_dup_type`.

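The copy-before-modify rule can be sketched as follows.
All names ending in `_sketch` are hypothetical, simplified stand-ins;
they are not lint's actual definitions of `type_t`, `block_derive_pointer` or
`block_dup_type`, and they use plain `malloc` instead of lint's allocators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef enum { SK_INT, SK_PTR } tspec_sketch;

typedef struct type_sketch {
	tspec_sketch tspec;
	bool is_const;
	struct type_sketch *subtype;	/* the pointee, for SK_PTR */
} type_sketch;

/* Derives a new pointer type from an existing type. */
type_sketch *
derive_pointer_sketch(type_sketch *stp)
{
	type_sketch *tp = calloc(1, sizeof(*tp));
	if (tp == NULL)
		abort();
	tp->tspec = SK_PTR;
	tp->subtype = stp;
	return tp;
}

/* Creates a shallow copy that may be modified independently. */
type_sketch *
dup_type_sketch(const type_sketch *tp)
{
	type_sketch *copy = malloc(sizeof(*copy));
	if (copy == NULL)
		abort();
	*copy = *tp;
	return copy;
}

/* Adds 'const' to a copy, leaving the possibly shared original alone. */
type_sketch *
with_const_sketch(const type_sketch *tp)
{
	type_sketch *copy = dup_type_sketch(tp);
	copy->is_const = true;
	return copy;
}

void
type_sketch_usage(void)
{
	type_sketch int_type = { SK_INT, false, NULL };
	type_sketch *const_int = with_const_sketch(&int_type);
	type_sketch *ptr = derive_pointer_sketch(const_int);

	assert(!int_type.is_const);	/* the original is unchanged */
	assert(ptr->tspec == SK_PTR && ptr->subtype->is_const);
	free(ptr);
	free(const_int);
}
```
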
## tnode_t

When lint parses an expression,
it builds a tree of nodes representing the AST.
Each node has an operator that defines which other members may be accessed.
The operators and their properties are defined in `oper.c`.
Some examples for operators:

| Operator | Meaning                                        |
|----------|------------------------------------------------|
| CON      | compile-time constant in `u.value`             |
| NAME     | references the identifier in `u.sym`           |
| UPLUS    | the unary operator `+u.ops.left`               |
| PLUS     | the binary operator `u.ops.left + u.ops.right` |
| CALL     | a function call                                |
| CVT      | an implicit conversion or an explicit cast     |

As an example, the expression `strcmp(names[i], "name")` has this internal
structure:

~~~text
 1: 'call' type 'int'
 2:   '&' type 'pointer to function(pointer to const char, pointer to const char) returning int'
 3:     'name' 'strcmp' with extern 'function(pointer to const char, pointer to const char) returning int'
 4:   'load' type 'pointer to const char'
 5:     '*' type 'pointer to const char', lvalue
 6:       '+' type 'pointer to pointer to const char'
 7:         'load' type 'pointer to pointer to const char'
 8:           'name' 'names' with auto 'pointer to pointer to const char', lvalue
 9:         '*' type 'long'
10:           'convert' type 'long'
11:             'load' type 'int'
12:               'name' 'i' with auto 'int', lvalue
13:           'constant' type 'long', value 8
14:   'convert' type 'pointer to const char'
15:     '&' type 'pointer to char'
16:       'string' type 'array[5] of char', lvalue, "name"
~~~

| Lines      | Notes                                                       |
|------------|-------------------------------------------------------------|
| 1, 2, 4, 14 | A function call consists of the function and its arguments. |
| 4, 14      | The arguments of a call are ordered from left to right.     |
| 5, 6       | Array access is represented as `*(left + right)`.           |
| 9, 13      | Array and struct offsets are in premultiplied form.         |
| 9          | The type `ptrdiff_t` on this platform is `long`, not `int`. |
| 13         | The size of a pointer on this platform is 8 bytes.          |

See `debug_node` for how to interpret the members of `tnode_t`.

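The idea that the operator selects the valid union member, and that array
access becomes `*(left + right)` with a premultiplied offset, can be sketched
like this.
The types and names ending in `_sketch` are hypothetical and much simpler than
the real `tnode_t`; in particular, the sketch leaves out the `load` and
`convert` nodes and all type information.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_CON, SK_NAME, SK_PLUS, SK_MULT, SK_INDIR } op_sketch;

typedef struct tnode_sketch {
	op_sketch op;
	union {
		long value;		/* valid for SK_CON */
		const char *name;	/* valid for SK_NAME */
		struct {		/* valid for all other operators */
			struct tnode_sketch *left;
			struct tnode_sketch *right;	/* NULL if unary */
		} ops;
	} u;
} tnode_sketch;

/* Counts the nodes of a tree; 'op' decides which member may be read. */
size_t
count_nodes(const tnode_sketch *tn)
{
	if (tn == NULL)
		return 0;
	switch (tn->op) {
	case SK_CON:
	case SK_NAME:
		return 1;
	default:
		return 1 + count_nodes(tn->u.ops.left)
		    + count_nodes(tn->u.ops.right);
	}
}

void
tnode_sketch_usage(void)
{
	/* names[i], represented as *(names + i * 8) */
	tnode_sketch names = { .op = SK_NAME, .u.name = "names" };
	tnode_sketch i = { .op = SK_NAME, .u.name = "i" };
	tnode_sketch size = { .op = SK_CON, .u.value = 8 };
	tnode_sketch offset = { .op = SK_MULT, .u.ops = { &i, &size } };
	tnode_sketch sum = { .op = SK_PLUS, .u.ops = { &names, &offset } };
	tnode_sketch deref = { .op = SK_INDIR, .u.ops = { &sum, NULL } };

	assert(count_nodes(&deref) == 6);
}
```
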
## sym_t

There is a single symbol table (`symtab`) for the whole translation unit.
This means that the same identifier may appear multiple times.
To distinguish the identifiers, each symbol has a block level.
Symbols from inner scopes are added to the beginning of the table,
so they are found first when looking for the identifier.

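The lookup rule can be sketched with a single linked list.
This is a deliberately reduced model with hypothetical names;
it only demonstrates how adding symbols at the front makes inner declarations
shadow outer ones, and it says nothing about how the real `symtab` is
organized internally.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a symbol; not lint's actual sym_t. */
typedef struct sym_sketch {
	const char *name;
	int block_level;	/* 0 = file scope, 1 and up = nested blocks */
	struct sym_sketch *next;
} sym_sketch;

/* Adds a symbol at the beginning, so that it shadows older entries. */
void
symtab_push(sym_sketch **table, sym_sketch *sym)
{
	sym->next = *table;
	*table = sym;
}

/* Returns the first, and thereby innermost, symbol of the given name. */
const sym_sketch *
symtab_find(const sym_sketch *table, const char *name)
{
	for (const sym_sketch *sym = table; sym != NULL; sym = sym->next)
		if (strcmp(sym->name, name) == 0)
			return sym;
	return NULL;
}

/* Removes all symbols that belong to the given block level or deeper. */
void
symtab_leave_block(sym_sketch **table, int block_level)
{
	while (*table != NULL && (*table)->block_level >= block_level)
		*table = (*table)->next;
}

void
symtab_sketch_usage(void)
{
	sym_sketch *table = NULL;
	sym_sketch outer = { "x", 0, NULL };
	sym_sketch inner = { "x", 1, NULL };

	symtab_push(&table, &outer);
	symtab_push(&table, &inner);
	assert(symtab_find(table, "x")->block_level == 1);

	symtab_leave_block(&table, 1);
	assert(symtab_find(table, "x")->block_level == 0);
	assert(symtab_find(table, "y") == NULL);
}
```
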
# Memory management

## Block scope

The memory that is allocated by the `block_*_alloc` functions is freed at the
end of analyzing the block, that is, after the closing `}`.
See `compound_statement_rbrace:` in `cgram.y`.

## Expression scope

The memory that is allocated by the `expr_*_alloc` functions is freed at the
end of analyzing the expression.
See `expr_free_all`.

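Both scopes follow the same pattern: allocate freely, then release everything
in one step when the scope ends.
The following sketch shows that pattern with a hypothetical `pool_sketch`;
it is not how `mem1.c` is implemented, and the function names are made up for
the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header in front of each allocation; the union keeps the payload aligned. */
typedef union chunk_sketch {
	union chunk_sketch *next;
	max_align_t align;
} chunk_sketch;

/* One pool per lifetime, for example one for the current expression. */
typedef struct {
	chunk_sketch *chunks;
	size_t count;
} pool_sketch;

/* Allocates zeroed memory that stays valid until pool_free_all. */
void *
pool_alloc(pool_sketch *pool, size_t size)
{
	chunk_sketch *chunk = calloc(1, sizeof(*chunk) + size);
	if (chunk == NULL)
		abort();
	chunk->next = pool->chunks;
	pool->chunks = chunk;
	pool->count++;
	return chunk + 1;	/* the payload follows the header */
}

/* Frees everything at once, as at the end of an expression or block. */
void
pool_free_all(pool_sketch *pool)
{
	while (pool->chunks != NULL) {
		chunk_sketch *next = pool->chunks->next;
		free(pool->chunks);
		pool->chunks = next;
	}
	pool->count = 0;
}

void
pool_sketch_usage(void)
{
	pool_sketch expr_pool = { NULL, 0 };
	int *a = pool_alloc(&expr_pool, sizeof(*a));
	int *b = pool_alloc(&expr_pool, sizeof(*b));

	*a = 1;
	*b = *a + 1;
	assert(expr_pool.count == 2 && *b == 2);

	pool_free_all(&expr_pool);	/* 'a' and 'b' are dangling now */
	assert(expr_pool.count == 0 && expr_pool.chunks == NULL);
}
```

The design choice behind such pools is that individual objects never need to
be freed one by one, at the cost of keeping them alive until the scope ends.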
# Null pointers

* Expressions can be null.
    * This typically happens in case of syntax errors or other errors.
* The subtype of a pointer, array or function is never null.

# Common variable names

| Name | Type      | Meaning                                              |
|------|-----------|------------------------------------------------------|
| t    | `tspec_t` | a simple type such as `INT`, `FUNC`, `PTR`           |
| tp   | `type_t`  | a complete type such as `pointer to array[3] of int` |
| stp  | `type_t`  | the subtype of a pointer, array or function          |
| tn   | `tnode_t` | a tree node, mostly used for expressions             |
| op   | `op_t`    | an operator used in an expression                    |
| ln   | `tnode_t` | the left-hand operand of a binary operator           |
| rn   | `tnode_t` | the right-hand operand of a binary operator          |
| sym  | `sym_t`   | a symbol from the symbol table                       |

# Abbreviations in variable names

| Abbr | Expanded                                     |
|------|----------------------------------------------|
| l    | left                                         |
| r    | right                                        |
| o    | old (during type conversions)                |
| n    | new (during type conversions)                |
| op   | operator                                     |
| arg  | the number of the parameter, for diagnostics |

# Debugging

Useful breakpoints are:

| Function/Code       | File    | Remarks                                              |
|---------------------|---------|------------------------------------------------------|
| build_binary        | tree.c  | Creates an expression for a unary or binary operator |
| initialization_expr | init.c  | Checks a single initializer                          |
| expr                | tree.c  | Checks a full expression                             |
| typeok              | tree.c  | Checks two types for compatibility                   |
| vwarning_at         | err.c   | Prints a warning                                     |
| verror_at           | err.c   | Prints an error                                      |
| assert_failed       | err.c   | Prints the location of a failed assertion            |
| `switch (yyn)`      | cgram.c | Reduction of a grammar rule                          |

# Tests

The tests are in `tests/usr.bin/xlint`.
By default, each test is run with the lint flags `-g` for GNU mode,
`-S` for C99 mode and `-w` to report warnings as errors.

Each test can override the lint flags using comments of the following forms:

* `/* lint1-flags: -tw */` replaces the default flags.
* `/* lint1-extra-flags: -p */` adds to the default flags.

Most tests check the diagnostics that lint generates.
They do this by placing `expect` comments near the location of the diagnostic.
The comment `/* expect+1: ... */` expects a diagnostic to be generated for the
code 1 line below, `/* expect-5: ... */` expects a diagnostic to be generated
for the code 5 lines above.
An `expect` comment cannot span multiple lines.
At the start and the end of the comment, the placeholder `...` stands for an
arbitrary sequence of characters.
There may be other code or comments in the same line of the `.c` file.

Each diagnostic has its own test `msg_???.c` that triggers the corresponding
diagnostic.
Most other tests focus on a single feature.

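The arithmetic behind the relative `expect` comments can be sketched as
follows.
The function `expect_target_line` is hypothetical and is not part of the test
infrastructure; it only handles the signed forms described above and ignores
the message text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns the line that an 'expect+N' or 'expect-N' comment refers to,
 * given the line number of the comment itself, or -1 if the line does
 * not contain such a comment.
 */
int
expect_target_line(const char *line, int lineno)
{
	const char *marker = strstr(line, "/* expect");
	int offset;
	char colon;

	if (marker == NULL)
		return -1;
	/* '%d' accepts an explicit sign, so it parses both '+1' and '-5'. */
	if (sscanf(marker, "/* expect%d%c", &offset, &colon) != 2)
		return -1;
	if (colon != ':')
		return -1;
	return lineno + offset;
}

void
expect_usage(void)
{
	/* A comment in line 10 that refers to the code 1 line below. */
	assert(expect_target_line("/* expect+1: ... */", 10) == 11);
	/* Other code may precede the comment in the same line. */
	assert(expect_target_line("} /* expect-5: ... */", 20) == 15);
	assert(expect_target_line("int x;", 3) == -1);
}
```
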
## Adding a new test

1. Run `make add-test NAME=test_name`.
2. Run `cd ../../../tests/usr.bin/xlint/lint1`.
3. Make the test generate the desired diagnostics.
4. Run `./accept.sh test_name` until it no longer complains.
5. Run `cd ../../../..`.
6. Run `cvs commit distrib/sets/lists/tests/mi tests/usr.bin/xlint`.
