The Open Systems Portability Checker

Derek Jones
derek@knosof.co.uk
Knowledge Software Ltd
Farnborough, Hants GU14 9RZ
UK

ABSTRACT

The Standards for POSIX and C were designed to enable the portability of applications across platforms. A lot of work has gone into checking compilers and environments for conformance to these Standards, but to achieve true applications portability the applications themselves must also conform to the requirements of the standards. This paper discusses a tool that checks applications software for conformance to these Standards at compile time and link time, as well as checking calls to the service library interfaces. Any application that can pass through the Open Systems Portability Checker without producing any warnings is a conforming POSIX program and should have a high degree of portability.

Introduction

The Standards for POSIX and C were designed to enable the portability of applications across platforms. To achieve this portability both platforms and applications must conform to the standards. There is no point in having a platform that conforms to POSIX if the application does not itself conform to POSIX, and vice versa. To date a lot of work has gone into checking environments and language compilers for conformance to these Standards. But almost nothing has been done to check the conformance of applications.

This paper describes a tool set, the Open Systems Portability Checker (OSPC), that redresses this imbalance. OSPC is designed to check applications for conformance to the various Open Systems standards. This checking can be performed at compile, link and runtime. The tool set is `soft' in that users can control its output and add support for additional standards.

Accredited standards against which applications can be checked include POSIX.1, POSIX.2, POSIX.4, POSIX.16, PHIGS, GKS and the ANSI C Standard. Industry standards supported include the X/Open portability guide (XPG3) and the AT&T System V Interface Definition (SVR3 and 4). At the user interface level checks can also be made against the Microsoft Windows, X-windows (releases 3, 4 and 5) and Motif interfaces.

Benefits of Standards conformance

Conforming to standards is rarely seen as an end in itself. The main benefit, to vendors, of conforming to both the C Standard and POSIX is a reduction in the effort of porting to new platforms. Less effort means reduced porting costs and a quicker introduction to market. This reduction in porting costs is also of advantage to the OEM, since it becomes possible to build a wide portfolio of applications on a new platform more quickly than might otherwise have been possible. Without applications software a new platform is dead in the water.

From the marketing perspective, Open Systems are being demanded by users. Use of an independent verification tool to check applications for conformance will add weight to any claims of conformance to Open Systems Standards made by vendors. From the users' perspective, demanding such verification is a useful means of ensuring vendor compliance with any Open Systems agreements that they may have.

As a method of distinguishing conforming products from nonconforming products, many large OEMs are currently investigating the concept of branded software. This idea is still in the early stages of development. The intent is that software products will be checked for conformance to a set of standards. Those that pass will gain the right to use an associated brand (either a name or logo) in connection with the particular product. Provided the creator of the brand spends enough on advertising to convince users that the brand means something, vendors will be tempted to register.

Some means of independently and accurately checking conformance could save a considerable amount of time and money later (studies have shown that the later a problem is discovered the more expensive it is to fix).

Why don't applications conform to standards?

Reading the computing press would lead most people to conclude that existing products do conform to Open Systems standards. In practice these claims are nothing more than marketing hype. One of the main reasons for not conforming is the newness of the standards. Once a standard has been published it takes a while before vendors start to supply software that conforms to it (although some do claim conformance before being validated; in the case of the C standard some vendors were claiming conformance several years before the standard was published).

Another cause of non-conformance is ignorance. Many developers believe that their applications do conform. This belief is partly due to poor knowledge of the standards and partly due to not using tools that might prove otherwise. Training can go some way towards clearing up the ignorance problem. Because of the broad range of services offered it can take some time for developers to think POSIX; old Unix programming habits and know-how are easily transferred to a POSIX development environment. But training in itself is not sufficient. Today's applications programs are very large and complicated pieces of software. Mistakes will be made. What is needed is a tool to double check that the software does indeed conform to the standards.

Compilers don't check. The development compiler is only likely to check for constraint and syntax errors, since it is these constructs that a conforming implementation is required to detect (and must detect in order for a compiler to validate). Also, users want the development compiler to accept their code without complaint; some of it might be very old. Another strong force driving the development compiler is quality of code generation: users want optimal code. Flagging constructs that might cause portability problems with other compilers is rarely given any priority.
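
As an illustration, the short program below (a made-up example, not taken from any particular application) is accepted without complaint by a conforming compiler, yet its behaviour depends on whether plain char is signed or unsigned, something the C Standard leaves to the implementation:

    #include <stdio.h>

    int main(void)
    {
        char c = 0xFF;    /* holds 255 if plain char is unsigned,
                             typically -1 if it is signed          */

        if (c == 0xFF)
            printf("plain char is unsigned here\n");
        else
            printf("plain char is signed here\n");
        return 0;
    }

No diagnostic is required for such code; only a tool looking specifically for portability problems is likely to flag it.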

Portability problems inherent in standards

It is generally believed that standards provide rigid requirements. Standards contain many requirements and many of them allow only a single way of doing things. But other requirements are very flexible. This `grey' behaviour is there for a reason: either it is needed to support existing practice, or the construct in question would be difficult for the compiler or OS to flag.

One of the principles behind the drafting of the C standard was that existing code should not be broken by wording in the standard. This meant that in many cases the behaviour was left undefined or specified as implementation defined. By not specifying what had to be done, compiler implementors were free to make their own decisions, thus preserving the correctness of existing, old code. This freedom means that C programs can behave differently with different ISO validated C compilers, even on the same machine. There is no requirement on conforming compilers to flag occurrences of these non-constraint/syntax errors.
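
A simple, hypothetical example of such behaviour is the unspecified order in which function arguments are evaluated. Two validated compilers on the same machine may legitimately produce different output from the program below, and neither is required to issue a diagnostic:

    #include <stdio.h>

    static int n = 0;

    static int next(void)
    {
        return ++n;
    }

    int main(void)
    {
        /* The order of evaluation of the two arguments is
           unspecified, so "1 2" and "2 1" are both conforming
           results.                                             */
        printf("%d %d\n", next(), next());
        return 0;
    }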

The C Standard committee also recognised that compiler vendors would have to rely on existing tools to link separately compiled units together. Since existing linkers were unlikely to check for cross unit inconsistencies in external variables and functions it was felt that the C Standard should not mandate such checks. The decision to do interface checking is up to the developer. If the appropriate headers are included (and they contain all of the external identifiers referenced) the compiler may do some checking. Standards do not require that this checking be done.
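
The two-file sketch below (the file and identifier names are purely illustrative) shows the kind of cross unit inconsistency that a traditional linker resolves silently:

    /* file1.c */
    int record_count;            /* defined with type int            */

    /* file2.c */
    extern long record_count;    /* declared with type long          */

    long get_count(void)
    {
        return record_count;     /* undefined behaviour, but most
                                    linkers simply match the symbol
                                    name and report nothing          */
    }

Declaring record_count in a common header included by both files would allow the compiler to catch the mismatch, but nothing in the standards forces this discipline.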

How can applications be checked for conformance/portability?

The terms conformance and portability are often used synonymously. The purpose of creating conforming code is to render it portable. This argument is often driven in reverse to `deduce' that portable code must be conforming. A moment's thought shows that one does not necessarily follow from the other. Historically it has been possible to measure software portability by porting to a variety of platforms. It did not make sense to measure conformance because there were so few standards.

Traditionally one measure of portability was the number of different platforms that an application was available on. This measure arose, in part, because of the lack of any documented, well defined standards to act as a yardstick. Now that standards exist and contain requirements on environments and applications software, it ought to be possible to check that these requirements are met. Since programmers cannot be expected to be familiar with all the intricacies of the relevant standards, and the volume of code in a modern application package is very high, manual checking has to be ruled out. This is obviously a job for some form of automatic checking tool.

What needs to be checked

Having shown the benefits of conforming to Open Systems standards we now have to investigate what constructs ought to be flagged and why. There are two main sources of information on constructs that ought to be checked to achieve applications portability:

  1. The text of Standards documents. Here we are interested in applications written in the C language. So the relevant standards are the C language standard (ISO 9899) and the C language bindings provided by the POSIX.1 (ISO 9945-1) standard. The POSIX and C standards define two types of conformance, 1) implementation conformance and 2) application (or source code) conformance. In this paper we are interested in the latter.
  2. Practical experience. The sources for this information tend to be first hand experiences and conversations with developers about problems that they have encountered. Books on software portability are starting to appear, but on the whole these tend to give general guidelines rather than specific details. One problem with specific cases is that they go out of date: as compilers and operating systems evolve, old problems disappear and new ones appear.

No justification, other than appearing in a standards document, is given for flagging those constructs highlighted in standards. Developers familiar with the standards process will know that the contents of standards are sometimes driven by immediate political needs rather than technical merit. Attempting to weed out the political from the technical issues is likely to cause a lot of headaches.

The necessity for checks based on practical experience arises because we live in an imperfect world. Operating systems and compilers do not fully conform to standards and contain bugs. In some cases these bugs are actually features; they are there for compatibility with previous versions of the software. The justification for flagging these constructs goes along the lines of "this construct is not supported/behaves differently on the xyz platform". From this observation we draw the conclusion that truly portable applications have to be written using a subset of the facilities and services described in standards documents.

Overview of the OSPC

It was recognised at an early stage that most existing C programs are a long way from being strictly conforming. The user interface to the OSPC was thus designed to allow a smooth transition from common usage C to conforming standard C. It is possible to tailor the severity of every error message as well as selectively switching them off. This tailoring enables users to convert their code in an incremental fashion. Thus the work load can be spread over a period of time. It is also possible to achieve results quickly, rather than having to wait until all of the work is complete.

Although its function is to check applications, this was not seen as an excuse to execute slowly. Developers do not like to use tools that are slow and cumbersome. Therefore every attempt was made to ensure that the OSPC ran at a reasonable rate.

Support for multiple architectures

As a provider of services POSIX does not concern itself with the underlying computer architecture. On the other hand the C Standard recognises that at their lowest level computers do vary in their implementation. Since OSPC is intended to address all aspects of applications portability, this issue of different architectures had to be handled.

The difference between being able to port an application to any computing platform and being able to port it to a selection of platforms was also appreciated. Creating portable software takes time and effort. The potential diversity of platforms that software may run on is very large, but marketing requirements frequently dictate that the software need only run on a particular range of platforms; putting effort into making software portable to other platforms is then wasted effort. The OSPC has the capacity to complain about everything that could cause portability problems. However, complaining about the worst case situation generally creates a large quantity of output. In order to provide information that is specific to the port at hand, the concept of platform profiles was created. The user selects the source and target platforms, and OSPC uses this information to remove those warnings that do not relate to the port being considered.

Platform profiles provide a means of enabling the user to select the portability and standards conformance attributes that a piece of software ought to have. Platform profiles are themselves created from a hierarchy of subprofiles. These subprofiles are grouped according to the system component that controls them. They include:

By breaking the profile information up into its constituent parts it is possible to reuse information and provide a finer control over the final profile used. Information on the most common platforms is supplied with the OSPC package.

Since OSPC is user configurable it is possible to create new platform profiles and modify existing ones. It is also possible to add information on new standards, or even company standards.

Components of OSPC

The process of checking applications can be broken down into three phases, mimicking the compile/link cycle of software development. OSPC contains a separate tool for each of these phases.

The source code checker

This is a `traditional' compiler front end. It differs from most front ends in that many of its settings are soft; they are read from configuration files at startup time. A significant amount of effort has gone into showing the correctness of this tool (discussed in more detail below). The compiler has no hard-wired internal limits and will handle any size of program, given sufficient memory.

The largest number of different warnings (currently over 1,000) is generated by checking the source code. The volume of generated output is very much dependent on the relative characteristics of the source and target platforms.

The cross unit interface checker

The interface checker is essentially a linker that was tailor-made for handling C programs. Most linkers perform very little interface checking across translation units; they are usually restricted to complaining about missing symbols. The OSPC linker performs full type checking across C translation units, i.e. it checks that the same identifier is declared with a compatible type in every file in which it is used. It also merges its input into a form suitable for execution by the runtime checker.

It was recognised that developers often require the services of libraries not provided as part of POSIX, e.g. the X Window System. The OSPC was thus designed to be user extensible. It is possible to refer to non POSIX library functions, have the interface checked and call them at runtime. There is also a method of specifying what runtime interface checking needs to be performed. It is the interface checker's job to build a runtime system capable of executing the user's program, including the required interfaces.

Checking the checker

A tool set that is going to be used to check applications code needs itself to be correct to a very high degree. Because of its background in being used by BSI as a tool for checking the C validation suite (and later by NIST in the same role), a significant amount of work went into the checking and verification of the software on which the OSPC is based. This included producing tests that caused 99.6% of all basic blocks in the code to be executed, cross referencing the source code to the C standard and passing both the BSI and NIST C validation suites.

Properties of OSPC

The following properties of the source code checking portion of OSPC have been used to argue an informal proof of correctness of that tool.

  1. No optimisations are performed. Thus there is a one-to-one mapping between the C source code and the generated intermediate code. It is possible to look at the intermediate code and recreate the original C expressions and statements. Because it does not do any optimisations, the code generation performed is not context dependent; that is, the code generated for an expression is always the same.
  2. The source code has been cross referenced to the C standard. To be exact, all if statements in the source either contain a reference to the Standard (by page and line number) or are marked as referring to an `internal' documentation point (a sketch of this convention appears after this list). A tool was used to check that all if statements are adjacent to such a reference. Another tool was used to analyse the cross references and produce a list of page and line numbers in the Standard that were not referenced. This cross referencing enabled us to show that all of the requirements in the Standard were implemented (the first time around we actually found some lines in the Standard that were not referenced; code was duly written to implement these lines).
  3. All of the source code is necessary. Test programs have been written with the aim of causing all source code statements (in fact basic blocks) to be executed. A basic block is a sequence of statements with one entry and one exit. In practice some basic blocks should never be executed; these include such constructs as internal consistency checks and disc full error messages. The process of writing these tests did uncover basic blocks that could never be executed. These basic blocks were deleted. Hence the claim that all of the source code is necessary.
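
As an illustration of the cross referencing described in item 2, the sketch below shows the style of annotation involved; the function, the page and line values and the comment layout are hypothetical and are not taken from the OSPC sources:

    #include <stdio.h>

    static int is_decimal_digit(int ch)
    {
        if (ch >= '0' && ch <= '9')     /* ISO 9899 p.XX l.YY (illustrative) */
            return 1;

        if (ch == EOF)                  /* internal: end-of-input handling   */
            return 0;

        return 0;
    }

A separate tool can then verify mechanically that every if statement carries such a comment, and list any page and line of the Standard that no comment refers to.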

Background of the tool set

All of the tools used in the OSPC were derived from the Model Implementation C Checker. This Model Implementation was designed to check C source programs for strict conformance to the C standard. Extending these tools to include POSIX mainly involved the writing of additional runtime interface checks.

Model Implementations have been produced for Pascal and Ada. In March 1989 the British Standards Institution signed an agreement with Knowledge Software to produce one for C. The Model Implementation was formally validated by BSI in August 1990 (it was jointly the world's first validated C compiler).

In October 1990 a static analysis tool, QA-C, was created and licensed to Programming Research. This tool includes software metrics, lint-like features and an X Windows interface.

Using OSPC

OSPC was designed to integrate seamlessly into the user's development environment. As such it follows the compile/link cycle and can thus make use of existing makefiles and other related development tools.

Since the development compiler only checks for constraint and syntax errors, there are usually a large number of constructs flagged on the first occasion the application source code is processed. As developers learn which constructs to avoid the number of warnings found in newly developed code drops.

C source that has only previously been compiled with a Unix or K&R compiler usually contains a number of parameter mismatches on function calls. These mismatches can be caused by an incorrect number of parameters or incorrect type of parameters. This usually comes as a surprise to the developers since the software was previously `working'. Other problems that commonly occur in software that is being ported for the first time include use of host specific functionality (typically library calls) and use of numeric literals rather than symbolic constants in calls to system services.
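
A typical, hedged example of the numeric literal problem is an open() call written against a traditional Unix system; the value 2 happens to coincide with O_RDWR there, but only the symbolic constant is guaranteed by POSIX.1 (the function names below are invented for illustration):

    #include <fcntl.h>

    /* Non-portable: relies on the historical encoding of the
       open() flags on the original development platform.      */
    int open_log_old(const char *path)
    {
        return open(path, 2);
    }

    /* Portable: uses the symbolic constant from <fcntl.h>.    */
    int open_log(const char *path)
    {
        return open(path, O_RDWR);
    }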

The cross unit interface checking tool works on the output produced, for each input file, by the source code checker. Checks are made to ensure that the types of external objects and functions agree. Typical problems flagged include mismatches between function calls and definitions, and objects declared differently across units. Once again, mismatches between function call and definition are a common source of error (often caused by the appropriate headers not being included).
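
A classic instance, sketched below, is a call to sqrt() in a file that does not include <math.h>. Under old-style rules the compiler assumes an int argument and an int return value, while the library defines double sqrt(double); the mismatch only becomes visible when the translation units are brought together:

    /* geometry.c -- note the missing #include <math.h>            */
    double half_diagonal(void)
    {
        return sqrt(2) / 2;    /* int passed and int return assumed;
                                  the library definition is
                                  double sqrt(double)               */
    }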

Most tools have a method of switching off warnings, either by category or for particular cases. In the case of our own tools the concept of source and target platforms was introduced, in order to reduce the number of `uninteresting' messages generated. By telling the tools which platform should act as a reference environment and knowing the target platform, it is possible to filter out those features that are common to both platforms. Profiles for an `unknown' platform, POSIX and ISO C are available for those users who want to create maximally portable applications (giving `unknown' as the source platform maximises the number of warnings generated for a given target platform). Even with the use of platform profiles it is still sometimes necessary to switch off particular warnings. OSPC allows warnings to be switched off by severity level, by message number or by where problems occur (e.g. in a system header). The error reporting mechanism is completely `soft' in that the text of all messages can be altered. The messages can also include references to the standards (optionally switched on via a command line option).

To get the best out of OSPC requires the correct balance in configuring the platform profiles. The configuration of a computing platform is rarely as straightforward as its users have been led to believe. On closer examination many platforms have very different characteristics from those first thought. Some tuning of platform profile configurations is invariably necessary. This initial investment is worth it: developers learn a lot about the platform that they are targeting and about the platform on which the software originally ran, and the warnings from OSPC are much more specific and relevant to the job at hand.

Conclusion

Applications conforming to Open Systems standards offer a reduction in porting costs/time. The only reliable method of verifying that applications software conforms to the requirements of these standards is to use some form of verification tool at all stages of the development and testing of the program. The benefits of such verification include confidence that the software is conforming and will port to other environments and marketing advantages in being able to backup claims of Open Systems conformance.

The latest version of the OSPC is over 150,000 lines of C. It has been used to process itself and has been ported to Linux, Sparc, MC68000, MC88000, RS/6000, HP/UX, Sequent and Intel 386/486 (DOS and Unix) platforms. Source code licensees have also ported to i860, MIPS and VAX.


