Who Guards the Guardians?

Derek Jones
derek@knosof.co.uk
Knowledge Software Ltd
Farnborough, Hants GU14 9RZ
UK

A compiler is a tool used by software developers to create applications. Compilers exist for many computer languages. The C language is very popular and has now been standardised. Having a standard is one thing, making sure that vendors conform to it is another. In practice vendors have been claiming conformance to ANSI C for some time, some even before a Standard existed (X3.159-1989 became official on 14 December 1989). Some independent means of checking these claims of conformance is obviously needed, if potential purchasers of compilers are interested in making sure that the tools they use are conforming. Recognising this need standards bodies in the US and UK have been going through a procurement exercise to select a test suite to be used for checking claims of conformance to the C Standard, and issue certificates (to those that pass). In the US this body is the National Institute of Standards and Technology (NIST), in Europe the national standards bodies have formed a consortium with the British Standards Institution (BSI) taking the lead role.

The selected validation suite (test programs to check conformance to Standards are not simply called test suites) is very influential. Any compiler vendor failing to correctly process the validation suite is not considered to conform to the Standard and does not obtain a certificate. The lack of a certificate can impact sales. In the US certification of conformance is a requirement for Government contracts. The situation in Europe varies between countries. In the UK it is a requirement in contracts, while in Germany such a requirement would fall foul of their anti-competition rules.

In the US NIST does not check for conformance to ANSI C or ISO C (International Standards Organisation) but to FIPS C (Federal Information Processing Systems). The FIPS for C (FIPS PUB 160) references the ANSI C Standard and includes a few extra requirements on the diagnosis of errors. These diagnostic requirements are not considered to alter the technical content of the ANSI standard.

As a result of a competitive procurement conducted by NIST, Perennial, Inc. (Santa Clara CA) was awarded the contract as the exclusive provider of the C Test System to be used to demonstrate conformance to the US Government's C Standard (FIPS PUB 160).

This article is not about the certification process or the C Standard. Rather it discusses the work that was carried out to insure that the Perennial product, the ACVS (ANSI C Validation Suite) was correct and that it did a thorough job of checking against the C Standard. The validation suite is a collection of programs and like all programs it may contain bugs. Since this particular collection of programs is to be used to check compilers and their libraries for conformance to the Standard it had better be correct.

The Ideal case

Before examining an existing product we should ask ourselves what attributes a validation suite ought to have. My own outlook on this topic is influenced by being a compiler writer. But since it is compiler writers who form the largest portion of the customer base for such suites this view point has merit. Ideally a validation suite should have the following properties:

Check all of the requirements of the language Standard.
Tests should give the same results across all implementations (they should be strictly conforming programs in C terminology).
Should not contain coding bugs.
Should contain a test harness that enables the entire suite to be compiled/linked/executed and a pass/fail result obtained.
The suite should contain a document that explains the process by which the above requirements were checked for correctness.

Item (1) immediately poses the question what are the requirements of the Standard? See box 1 for a discussion of this topic. Having answered this question we then have to consider what form the tests should take. The minimalists would argue that one test should be written for each requirement in the Standard. Others would argue that a validation suite ought to contain sufficient tests to give the compiler and runtime system a good shake out. Either way some means of checking that all of the requirements have been covered is needed.

Item (2) requires a very careful reading of the source of the validation suite looking for non-strictly conforming code. In fact this checking is best carried out automatically. Examples of differences that occur between processors include the size and ranges of scalars (i.e., 16 bit or 32 bit int, near or far pointers), two's compliment versus one's compliment arithmetic and ASCII vs EBCDIC character sets. Differences between environments include record vs stream based I/O and the handling of signals.

Item (3) is best performing by running the tests through a number of implementations and checking the results. Item (4) is easy to state, but creating a test harness that works across many environments is difficult. What is really required to implement this requirement is a simple and consistent interface to the test programs. Tools can then be produced to examine the generated results.

Item (5) is really a confidence measure. Anybody could claim that the first four items had been achieved, but this document shows what work was really done to ensure that practice and theory were consistent. Compiler writers often have a high opinion of their own standards knowledge. So this document should lend credibility to the correctness of the validation suite and thus reduce the number of fault reports raised.

Contents of the ACVS

The ACVS consists of approximately 218,000 lines of source code (70,000 of these are comments and 23,000 are blank lines). This source code is contained in 1,847 files and forms 9,062 test cases.

The files making up the ACVS are organised into directories by section numbering in the Standard. There are three top level directories, Sec2 (Environment), Sec3 (Language) and Sec4 (Library). Section 1 of the Standard is introductory. The files within these directories are further broken down into subdirectories representing subsections of the Standard.

A data driven driver program is supplied to compile, link and run the tests.

The documentation comes in two spiral bound volumes. A description document includes an extensive section by section overview of what is tested. A user guide covers the installation process, what the ACVS driver does and how to use this driver to validate an implementation.

Checking the ACVS

Before the ACVS can be used for certification in anger it has to go through a period of public review. NIST were also aware of the work that had gone on in the UK in the production of a Model Implementation C Checker and as part of the exercise of reviewing the ACVS, NIST commissioned Knowledge Software Ltd. (Farnborough, UK), to check the ACVS for strict conformance to the C Standard and to measure how well it covered the requirements of the Standard.

For the public review vendors were selected as beta test sites, supplied a copy of the ACVS and asked to write a report on what they thought of the suite. By selecting beta test sites spread over a wide range of platforms it was hoped that any Unix bias in the installation and execution of the ACVS would be highlighted. Perennial responded very positively to the feedback received.

Checking the ACVS source code for strict conformance to the Standard was simply a matter of compiling/linking/executing the code with the Model Implementation. Checking for Standards coverage was a more interesting problem and is discussed in the rest of this article.

Model Implementation C Checker

This checker is essentially a C compiler/linker/library and interpreter. It differs from traditional compilers in it's design aims. It only implements those language requirements given in the C Standard, there are no extensions. Any construct that prevents a program from being strictly conforming is flagged. These constructs include constraint and syntax errors, undefined, implementation defined and unspecified behaviour and exceeding minimum limits. This checking occurs at compile, link and runtime (also the library functions do extensive checks on their parameters).

The C Model Implementation is not the first tool of its kind. A Model Implementation was also produced for Pascal (4) and was closely involved in the testing of the Pascal validation suite (3). The NYU Ada compiler was also used in the development of the Ada validation suite.

In theory this Model Implementation is an ideal tool for checking that the validation suite is correct and may be used to help measure coverage of the Standard. But being a program it may also contain bugs. So who says that the Model Implementation is correct?

A lot of work has gone into showing the correctness of the Model Implementation and showing that it implements all of the requirements of the C Standard. But then a lot of work has also gone into showing the correctness and coverage of the ACVS.

Human nature being what it is, we write buggy programs. But different peoples bugs are usually different. The ACVS and Model Implementation were written by different groups of people in different countries, with different perspectives. Hopefully different mistakes will have been made. By using one to check the other there is a reasonable hope that most of the existing bugs, in both, will be found. Of course over time the two groups of people will talk to each other and will start to make common mistakes. But in the initial period there is a good chance that each tool can be used to locate problems in the other.

Measuring coverage

Given a collection of C compiler test programs how is it possible to check how thoroughly they check all aspects of the C language? In the case of the analysis of the ACVS this question was answered indirectly by measuring how well these tests covered the Model Implementation. Various properties of the Model Implementation then enabled a strong argument to be made for showing a direct correlation between coverage of the Model Implementation and the C Standard. For this work one component of the Model Implementation was used; the Model Implementation C Compiler (mcc).

This component provided a method of showing coverage of sections 2 and 3 of the C Standard. Unfortunately the same techniques cannot be applied to section 4, the library. Basically this tends to be data driven rather than code driven.

So what are the properties of mcc that allow it to be used to correlate coverage of it to coverage of the C Standard.

Properties of mcc

No optimisations are performed. Thus there is a one to one mapping between the C source code and the generated intermediate code. It is possible to look at the intermediate code and recreate the original C expressions and statements. Also there is no functionality in the handling of the C language over and above that required by the Standard. This means that there is no source code in mcc that is not required by the C Standard (apart from user interface code). Because it does not do any optimisations the code generation performed by mcc is not context dependent. That is, the code generated for an expression is always the same. So a validation suite does not have to include various permutations on a given test in order to second guess what an optimiser might do.

mcc has been cross referenced to the C Standard. To be exact all if statements in the source of mcc either contain a reference to the Standard (by page and line number) or are marked as refering to an `internal' documentation point. A tool was used to check that all if statements in mcc are adjacent to such a reference. Another tool was used to analyse the cross references and produce a list of page and line numbers in the Standard that were not referenced. This cross referencing enabled us to show that all of the requirements in the Standard were implemented (the first time around we actually found some lines in the Standard that were not referenced, code was duly written to implement these lines). This cross referencing highlighted those modules that were purely housekeeping (few standard references) and thus could be excluded from the analysis. Those modules with a high percentage of references to the Standard were used as the basis for extrapolating the coverage of the C Standard

All of the source of mcc is necessary. Test programs have been written with the aim of causing all source code statements (in fact basic blocks) in mcc to be executed. A basic block is a sequence of statements with one entry and one exit. In practice some basic blocks should never be executed, these include such constructs as internal consistency checks and disc full error messages. The process of writing these tests did uncover basic blocks within mcc that could never be executed. These basic blocks were deleted. Hence the claim that all of the source code of mcc is necessary. Based on the fact that there is no redundant code we can assert that the coverage information is not biased away from the 100% level by unreachable code.

Obtaining the data

Coverage data was obtained by compiling the source of mcc on a Sun 4 with the generation of profiling code switched on in the host C compiler (using the -a option). This profiling information counts how many times each basic block in a program is executed as it is being run. To obtain total coverage information the entire ACVS was compiled with mcc. This gave a count of the number of times that all basic blocks in the source code of mcc were executed.

Having obtained the profiling data we then need to ask which portions of it are important for this analysis. This is where the cross referencing comes in. Those source modules of mcc with a large number of references to the Standard are obviously highly C specific. Those with mostly internal documentation references and few Standard references are not very C specific. In practice there are a collection of source modules with no Standard references (user interface, string handling package, memory management, etc), some modules with a high percentage of Standard references and a few modules in between these. For this analysis we concentrated on those modules that contained a high percentage of Standard references.

Having highlighted those source modules of mcc with a high percentage of Standard references, it is then necessary to look at the actual execution counts. For this analysis the interesting data were those basic blocks that were not executed. These unexecuted basic blocks represent untested portions of mcc. Because mcc does not perform optimisations and has no extensions (it's a minimalist compiler) we can be reasonably certain that these cases also represent potential areas of the C Standard that are not being tested for.

So what were the results?

Compiling the ACVS (version 3.0) with mcc caused 84% of those basic blocks contained in those modules with a high percentage of standard references, to be executed. Does this mean that the ACVS covers 84% of the C Standard? I think that such a statement would be incorrect. The arguments given in this article were designed to show a correlation between coverage of mcc and coverage of the C Standard by the ACVS. An experience with an earlier version of the ACVS highlights the strength of this correlation. On receiving an update to the ACVS some months ago I noted that the coverage of some modules in mcc had decreased. When this was reported to Perennial they found, on checking, that some of the tests had accidently been removed from the distribution.

Mapping statements in the Model Implementation to the C standard and writing tests that cause all of them to be executed shows that one implementation is faithful to its aims. However, the method of implementation is not unique. There are often several ways of designing and coding a given portion of the compiler. By using traditional compiler technology, rather than a high powered mathematical rules based approach, we are at least consistent with the majority of compilers that will be using the ACVS.

Is 84% a meaningful number? Use of the C language has spawned the creation of many C compilers. Sufficient in fact to make it worthwhile for several companies to produce compiler test suites. As fate would have it BSI chose a different supplier (PlumHall Inc, Somers Point, NJ) for the European C validation suite. A similar analysis of the PlumHall suite gave a very similar result to that obtained for the ACVS. So we can at least claim that the two suites are, broadly equally good (or bad).

The 84% figure also compares favourably with the results reported for the Pascal validation suite given in (3).

Postscript

So far I have been discussing the detection of problems in the ACVS. The ACVS was designed to check implementations for conformance to the C Standard. So how did Model Implementation itself do? I have to admit that the ACVS did find bugs in the Model Implementation. It found a total of seven problems, tests that failed but that should have passed. In continuing development work on the ACVS a further two problems have been found in mcc. Prior to this analysis there had been no direct interaction between Perennial and Knowledge Software. We both came to the situation fresh. I would have been happier if no bugs had been found in the Model Implementation. But some consolation can be taken in the fact that many vendors have had the ACVS for a period of time and still fail significantly more than seven tests. Of course all problems located in the Model Implementation were promptly fixed.

The Model Implementation found a number of non-conforming constructs in the ACVS. The majority were very minor in nature and easily missed by human checking. The ACVS now runs cleanly through the Model Implementation. Perennial process new or modified code with the Model Implementation to keep the contents of the ACVS strictly conforming.

The results obtained from coverage analysis provided independent evidence that the ACVS was indeed checking a substantial portion of the language section of the C Standard.

During the last 12 months I have also been impressed by how much effort Perennial has and continue to put into creating a high quality validation suite for C. I think that end users can be confident that any compiler that passes the ACVS is indeed ANSI conformant. On the other hand technically poor compiler vendors had better start worrying.

References

ANSI C Standard, X3J11/89-159
Implementation of ISO/IEC 9899:1990 Programming languages - C.
Ciechanowicz Z.J. & De Weever A.C. "The `Completeness' of the Pascal test suite." Software - Practice & Experience vol 14(5) 463-471 (May 1984).
Welsh J. & Hay A. "A Model Implementation of Standard Pascal." Prentice Hall, 1986.

C Standard requirements

The requirements contained in the C Standard apply either to an implementation or to source code. A validation suite is designed to check the requirements on C implementations. The requirements in the C Standard that apply to implementations can be broken down into three main components (1) the behaviour of legal (conforming in Standards language) constructs, (2) what has to be flagged as illegal (constraint or syntax errors in Standards language) and (3) the gray areas, i.e., those constructs that could fall into either category, depending on the implementation.

The ANSI C Standard itself is divided into four sections (1) Introduction, (2) Environment, (3) Language and (4) Library. A rationale document also accompanies the ANSI Standard giving background information.

In order to be conforming an implementation has to correctly process legal C, flag any occurrences of constraint and syntax errors and document those constructs that are specified as having implementation defined behaviour (one of the gray areas mentioned above). There is no requirement to diagnose any construct other than those that violate constraints or are errors in syntax.

There are actually two kinds of conforming implementations. Conforming hosted implementations target processors running an OS and conforming freestanding implementations target bare boards, i.e., coffee machines. The work described here relates to a conforming hosted implementation.

These requirements provide the framework for the structure of a C validation suite. Such a suite has to contain tests that ensure legal constructs are correctly handled (positive tests). It also has to contain tests with constraint and syntax errors in them (negative tests).

To pass a validation and obtain a certificate an implementation has to correctly compile and execute the positive tests and generate a diagnostic for all of the negative tests.

Terms defined by the C Standard

Constraint. Syntactic and semantic restrictions by which expositions of the language elements is to be interpreted.

Implementation defined behaviour. Behaviour, for a correct program construct and correct data, that depends on the characteristics of the implementation and that each implementation shall document.

Undefined behaviour. Behaviour, upon use of a nonportable or erroneous program construct, of erroneous data, or of indeterminately valued objects, for which the Standard imposes no requirements.

Unspecified behaviour. Behaviour, for a correct program construct and correct data, for which the Standard imposes no requirements.

One subtlety is the occurrence of the word shall. In essence a shall means shall when it occurs in constraint subsections. At other times, when applied to programs, it means should and violating it results in undefined behaviour. This use of the shall word is the cause of much misunderstanding by novice programmers, who are not aware of its implications.

Maximum minimum limits. These are constants that give the maximum minimum value that an object or type might take, i.e., INT_MAX is 32767.

Home