Deducing an Application's Use of APIs

Derek Jones
derek@knosof.co.uk
Knowledge Software Ltd
Farnborough, Hants
UK

Introduction

APIs (Application Program Interfaces) have become the method by which vendors define the software interface to their products. The product could be a piece of hardware, a third party library or even an operating system.

Users of applications often need to know which APIs an application relies on (for instance when purchasing hardware and software separately). Managers of development teams would probably like to know that only the defined API is being used and that the interface rules laid down in the specification are being followed (to reduce the likelihood of their product becoming tied to a particular version, or a particular vendor's implementation, of an API).

To date, people not intimately familiar with the source code of an application have had no way of finding out which APIs are being used and whether the defined API is being followed.

A new option, -api, has now been added to OSPC (Open Systems Portability Checker). Switching this option on, prior to processing the source of an application, produces a listing of the APIs used by the application and a summary of API specification violations.

This article discusses the issues involved in checking API usage and the type of information produced for users to examine. We adopt the conventions used in the C language.

What might an API define?

Information can be passed through an interface via function calls or external variables. To hide implementation details, symbolic names (macros) are often used to represent special numeric values, and structures are used to hold a collection of variables in one object.

The names of these functions, objects, macros and types are defined in one or more header files, to be included within the developer's source code.

To be of use the API must define more than the C syntax. It must define the properties of these names. For instance the external xyz represents a status flag and can have any of the values given by the macros A, B or C.

Which APIs are used?

An API checker has to do two things to answer this question.

It needs to scan the application's source looking for all uses of external identifiers.
It needs a database of APIs and the identifiers they define, against which identifiers used in an application can be matched.

All references to identifiers are matched against the contents of the API database and against the other parts of the application (one unit may refer to an identifier defined in another unit rather than in an API). A match against an identifier contained in an API flags that API as being used (cases where different APIs define the same identifier are rare and can usually be resolved by looking at the context, i.e. the included headers and the use made of the identifier).

Identifiers that are defined neither in another unit of the application nor in the API database are regarded as referring to an unknown API (they could equally be vendor extensions to a known API).
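
The matching step can be pictured as a table lookup. The following is only a minimal sketch and is not taken from OSPC: the table contents, the type struct api_entry and the function lookup_api are hypothetical names invented for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical database entry: an identifier and the API that defines it. */
struct api_entry {
    const char *identifier;
    const char *api;
};

/* A tiny illustrative database; a real one would hold far more entries. */
static const struct api_entry api_db[] = {
    { "fseek",  "ISO C" },
    { "printf", "ISO C" },
    { "open",   "POSIX" },
    { "setuid", "POSIX" },
};

/* Return the name of the API that defines the identifier, or NULL when the
   identifier is not in the database (unknown API or vendor extension). */
const char *lookup_api(const char *identifier)
{
    size_t i;

    for (i = 0; i < sizeof api_db / sizeof api_db[0]; i++) {
        if (strcmp(api_db[i].identifier, identifier) == 0)
            return api_db[i].api;
    }
    return NULL;
}
```

A fuller tool would first check whether the identifier is defined in another unit of the application, and only then report a NULL result as an identifier of unknown status.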

Optional components

An API is sometimes broken down into core and optional components. For instance the real time portion of POSIX has 16 optional components. The availability of these components can be tested for using feature test macros within the application source code.

To be useful, any report of API usage has to list those optional components of an API that are used by the application.

Are the interface conventions obeyed?

It is no good making use of the facilities provided by an API if the interface conventions are not followed. The whole purpose of an API is to isolate implementation details from the application. An application that does not follow the specified interface rules is likely to have problems when a new version of a library implementing the API is used, or when the application is moved to a different platform.

So as well as finding out which identifiers are used, it is also necessary to check that they are used correctly.

Interface requirements specified in APIs

APIs specify a number of different requirements for correct usage. Commonly seen requirements include:

Parameters
- Symbolic values must be used as arguments
- Types of arguments must be compatible with a defined type
Function return values
- Value has a property, i.e. is positive, is negative
- Value may only be compared against symbolic values
Feature test macros
- Used to check availability of optional constructs
Variable types
- Fields available in structs
- No requirement on layout of fields
- No requirement that the type be scalar
Not always constant
- Macros need not evaluate to a compile time constant
Headers
- Must be included

Function calls

An API function may accept input arguments, return a result or set status flags (for instance errno).

Functions that can perform several operations usually take an argument specifying which operation is to be performed. For instance in:

   fseek(file_ptr, 4, SEEK_CUR);

the third argument tells fseek that the seek is to be performed relative to the current file position. SEEK_CUR is a macro whose value will be chosen by each implementation. The call:

   fseek(file_ptr, 4, 1);

relies on an implementation choosing a SEEK_CUR value of 1. As such it does not obey the API specification, even though it will work on one or more implementations.

Some APIs have more complicated requirements. For instance the second argument of the POSIX function open must contain exactly one of three values (O_RDONLY, O_WRONLY, O_RDWR) combined with zero or more other values (O_APPEND, O_NONBLOCK, O_NOCTTY, O_TRUNC, O_CREAT, O_EXCL). An API checker must ensure that the argument is created using the bitwise OR of a correct combination of these macros.
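
As an illustration, a conforming flag argument names exactly one access mode and ORs in any optional flags. The sketch below assumes a POSIX system; the helper names log_open_flags and is_write_only are hypothetical, and O_ACCMODE is the POSIX mask for extracting the access mode.

```c
#include <fcntl.h>

/* Conforming: one access mode combined with optional flags using bitwise OR.
   The numeric values behind the macros are never written out. */
int log_open_flags(void)
{
    return O_WRONLY | O_CREAT | O_APPEND;
}

/* The access mode is recovered with the O_ACCMODE mask and compared against
   a symbolic name, never against a literal such as 1. */
int is_write_only(int flags)
{
    return (flags & O_ACCMODE) == O_WRONLY;
}
```

A call such as open(path, 1 | 0x400) might behave identically on one platform, but it depends on values that each implementation is free to choose.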

Checking a variable that is passed as an argument is substantially more difficult. It requires full flow analysis to track the symbols assigned to that variable. The current release of OSPC does not perform such analysis in this context.
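
The following sketch shows why this is hard. The function names choose_whence and skip_bytes are hypothetical. At the call to fseek only a variable is visible; establishing that it can only hold SEEK_SET or SEEK_CUR requires following the value back through the other function.

```c
#include <stdio.h>

/* Both possible results are symbolic names, so the value is conforming. */
int choose_whence(int from_start)
{
    return from_start ? SEEK_SET : SEEK_CUR;
}

/* At this call the third argument is a plain int variable. A checker
   without flow analysis cannot tell which macros produced its value. */
int skip_bytes(FILE *fp, long count, int from_start)
{
    int whence = choose_whence(from_start);

    return fseek(fp, count, whence);
}
```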

APIs also define the types of the function parameters. Provided the host compiler supports function prototypes, the arguments given in calls to API functions will be checked at compile time. It is ok to pass an argument of a different arithmetic type because the compiler will automatically convert it to the required type. For instance if size_t has an unsigned long type, passing an int argument will work because of the implicit conversion inserted by the compiler. Thus the developer does not have to worry about inserting casts to size_t for all appropriate arguments.

Passing an incorrect non-arithmetic type will cause the compiler to generate a compile time error. It is useful for an API checking tool to check that the arguments are compatible with the declared parameters, but not essential.

Some company coding standards require that arguments passed to functions are `strongly compatible' with the parameter type. That is, the named types must match. But this is a coding standards requirement, not an API requirement (because of the implicit conversions inserted by the compiler).

API functions may also return values. These values may represent individual quantities or particular properties, such as positiveness. For instance printf returns the number of characters printed or a negative value if an error occurred.

  if (printf("abc") == 3) /* OK */
     ;
  if (printf("xyz") == -1)
     ;

The first example is checking the number of characters written against the expected value, as allowed by the API. The second is assuming a particular value for the property of negativeness. One implementation may return -1, another -2, another an arbitrary negative value. The correct test would be:

  if (printf("xyz") < 0)
     ;

here the relational operator is testing for the negative property.

A grey area of checking involves functions that return a limited range of values. For instance the tm_sec field of a struct tm may take on values between 0 and 61. Is the following code fragment relying on an implementation defined extension or is it a coding bug?

  if (t.tm_sec > 61)
     ;

OSPC assumes that it is a coding problem and does not flag this code as not conforming to the API.

Like arguments, return values may sometimes be symbolic.

  if (fflush(file_ptr) == EOF) /* OK */
     ;
  if (fflush(file_ptr) == 1)
     ;

The second example is incorrect because it assumes a value for the symbol EOF.

Also relational operators may not be used in those cases where all the values returned by an API function are symbolic.

Status flags

Status flags set by API calls are usually there to provide additional information. For instance errno might be set to some symbolic value to indicate the type of a particular failure. Few APIs require status flags to be checked by the application.

OSPC has the ability to detect that applications are checking status flags after an API call. However, it makes the assumption that such checking must occur in the first conditional statement after the API function call. This issue is regarded as a coding standard issue, not an API specification requirement.

Use of objects defined in APIs

Like function return values, objects often have limits placed on the values they may contain. errno is an example of such an object. It may be reset to zero by the application, or it may be tested (using an equality operator) against a variety of symbolic names.
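
A minimal sketch of both permitted uses of errno follows, using the standard strtol function, which sets errno to ERANGE when the converted value is out of range. The wrapper name parse_long is hypothetical. The status flag is tested in the first conditional after the API call.

```c
#include <errno.h>
#include <stdlib.h>

/* Convert text to a long. Returns 0 on success, -1 if the value does not
   fit in a long. */
int parse_long(const char *text, long *result)
{
    errno = 0;                          /* resetting errno to zero is allowed */
    *result = strtol(text, NULL, 10);
    if (errno == ERANGE)                /* equality test against a symbolic name */
        return -1;
    return 0;
}
```

Comparing errno against a literal number, or using a relational operator such as errno > 0, would rely on values the API does not specify.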

Optional constructs

Use of an optional API construct must be protected by a feature test macro. For instance POSIX specifies that the function setuid is only available if the feature test macro _POSIX_SAVED_IDS is defined. The developer thus has to write the code:

#ifdef _POSIX_SAVED_IDS
   setuid(23);
#else
   do_something_else(23);
#endif

here the code is checking for the availability of setuid and taking alternative action if it is not available.

Optional constructs may be any identifier declared or defined by the API.

Developers who are unaware that they are using optional constructs have set a trap for the future porting of their application. Users of packages also need to be aware of any optional constructs required by an application when specifying hardware or third party libraries.

Use of headers

Headers are the means by which identifiers defined by an API are made visible to the application. In some cases the header must be included because it contains information that cannot be obtained elsewhere (for instance the values chosen by the implementation for symbolic names). Sometimes it is possible for a developer to declare a subset of the API without including the header.

Headers are necessary if symbolic macros and types are referenced from the application source. For instance in the example involving fseek above the header stdio.h needs to be included so that the compiler can obtain the value of the macro SEEK_CUR chosen by the implementation.

An example where it is ok to declare a function explicitly, rather than including the header string.h, is the strerror function, because its API specification only uses C predefined types: char *strerror(int errnum). However, memset could not be declared in this way in the user's source without the string.h header (and if the header is included, there is no reason to declare memset explicitly). This is because the declaration of memset needs a type from that header, size_t (the C API specifies the type void *memset(void *s, int c, size_t n)). The developer may declare memset with a particular predefined type instead of size_t, but that will only work on implementations where that type is used to represent size_t.
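
The difference can be sketched as follows. The wrapper name describe_error is hypothetical; the declaration of strerror is the one given in the C standard.

```c
/* Explicit declaration used in place of #include <string.h>. It mentions
   only predefined C types, so it matches every conforming implementation. */
char *strerror(int errnum);

const char *describe_error(int errnum)
{
    return strerror(errnum);
}

/* By contrast, a hand-written declaration such as
       void *memset(void *s, int c, unsigned int n);
   is correct only on implementations where size_t is unsigned int. */
```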

Incorrect header contents

A problem that sometimes arises with API headers is that they do not accurately reflect the requirements contained in an API. Fortunately the most common problem, incorrectly specified parameter arithmetic types, does not affect the performance of a checking tool. If, for instance, a vendor's version of string.h declared the third parameter of memset to take an unsigned int argument, the interface is not broken from the application's point of view, provided size_t is also declared to have type unsigned int. The compiler vendor is at fault for not upgrading its headers to conform to the C standard (first published in 1989 by ANSI and as an ISO standard in 1990).

Other problems often seen include syntax violations (for example, text after a #endif that is not enclosed in comment delimiters) and incorrect numeric values for macros (floating point values inaccurate in the last digit).

Use of API defined types

An API may define types to allow implementations to adapt themselves to different hardware (usually different sized scalar types) or to combine together similar variables in one place (a struct).

APIs rarely define the ordering of fields within a struct, although implementations are usually given liberty to add additional fields to structs. Applications that rely on ordering of fields or make use of implementation specific fields are going beyond the specification given in the API.

struct fields

Initialisation of struct objects, via an initialiser, is one example where an ordering of fields is implied. So the construct:

   div_t local_var = {1, 2};

must be explicitly expanded out to (assuming the initialiser was written with the order quot, rem in mind):

   div_t local_var;
   local_var.quot=1;
   local_var.rem=2;
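
Where a C99 or later compiler can be assumed (an assumption that goes beyond the C standard discussed in this article), designated initialisers name each field and so remove the dependence on field order. The function name make_div is hypothetical.

```c
#include <stdlib.h>

/* Build a div_t without relying on the order of its fields.
   Requires C99 designated initialisers. */
div_t make_div(int quotient, int remainder)
{
    div_t result = { .quot = quotient, .rem = remainder };

    return result;
}
```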

An example of an implementation adding additional fields to a structure is struct dirent. On a Sun platform the code fragment:

   if (dirent_obj.d_reclen == 3)
      ;

would happily compile. Other platforms are likely to complain that the field d_reclen does not exist (it is not in the POSIX or XPG APIs).
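
A sketch of code that stays within the specification uses only the d_name field, which POSIX does require. It assumes a POSIX system, and the function name count_visible_entries is hypothetical.

```c
#include <dirent.h>
#include <stddef.h>

/* Count directory entries whose names do not start with a dot.
   Returns -1 if the directory cannot be opened. Only d_name is used. */
long count_visible_entries(const char *path)
{
    DIR *dir = opendir(path);
    struct dirent *entry;
    long count = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] != '.')
            count++;
    }
    closedir(dir);
    return count;
}
```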

Type need not be scalar

An API occasionally leaves the specification of a particular type wide open. An example is the fpos_t typedef specified in the C standard, which simply states "... which is an object type capable of recording all the information needed to specify uniquely every position within a file." On many systems this type is a scalar. So the code:

if (fp_1 == fp_2) /* two variables of type fpos_t */
   ;

works. But C does not allow the == operator to be applied to struct types. This code fragment would fail to compile on a platform that defined fpos_t to be a struct (in fact there is no portable way of comparing two objects of arbitrary type for equality).
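
The portable way to use fpos_t is to treat it as opaque: obtain a value with fgetpos and hand it back to fsetpos, without comparing it or doing arithmetic on it. A minimal sketch, with the hypothetical function name peek_char:

```c
#include <stdio.h>

/* Return the next character without consuming it, or EOF on failure.
   The fpos_t value is only stored and passed back, never inspected. */
int peek_char(FILE *fp)
{
    fpos_t saved;
    int c;

    if (fgetpos(fp, &saved) != 0)
        return EOF;
    c = fgetc(fp);
    if (fsetpos(fp, &saved) != 0)
        return EOF;
    return c;
}
```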

Symbolic name need not be constant

APIs use symbolic macro names to represent values that may vary between implementations. Developers sometimes assume that because macros are used the value will be a constant literal. This is not always the case. For instance, of all the macros used in the C standard to describe properties of the floating point representation, only one, FLT_RADIX, is required to be a constant expression. On many implementations the other macros are indeed constant expressions, but they are not required to be.

The code fragment:

   #include <float.h>
   int number[FLT_DIG];

relies on FLT_DIG, the number of decimal digits in a number that can be exactly represented in a float, being a constant expression. If FLT_DIG expands to an expression that must be evaluated at runtime, the compiler will not be able to compile the application.
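
One way to avoid the assumption is to use the macro only where a runtime value is acceptable, for example as an allocation size. This is a sketch of one possible alternative; the function name alloc_digit_slots is hypothetical.

```c
#include <float.h>
#include <stdlib.h>

/* Allocate FLT_DIG zero-initialised ints. FLT_DIG is used as an ordinary
   value here, so it need not be a compile time constant.
   The caller releases the storage with free. */
int *alloc_digit_slots(void)
{
    return calloc((size_t)FLT_DIG, sizeof(int));
}
```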

Implementation specific and future problems

APIs often reserve specific names for future releases of their specification. They also allow implementations to add additional names to headers, provided those names obey a few restrictions.

The presence of these reserved names effectively constrains an application from defining identifiers with those names. An application containing such a definition could fail to compile on certain platforms, or with later versions of the API (because of duplicate or inconsistent definitions).

Some reserved names are easy for the application developer to avoid, for instance those starting with double underscore. Others might be considered more contentious. For instance all macros starting with E (capital E) followed by a digit or another upper case letter are reserved by the C standard if the header errno.h is included, and all external identifiers starting with the three characters str followed by a lower case letter are reserved in all cases by the C standard.

Experience has shown that applications often contain definitions of many identifiers whose names clash with those reserved by APIs. The definitions could be changed to use alternative names, but in many cases the time and effort needed to modify existing code is out of proportion to the benefit gained.

Insisting that no name defined by an application clashes with those reserved by the APIs used is impractical. Instead an API checking tool should list all definitions that do clash, along with a count of the number of references to them. Application vendors might also undertake not to add new definitions to this existing list.
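
A check for a few of the C standard's reserved name patterns might be sketched as follows. The function name looks_reserved is hypothetical and the rule set is deliberately incomplete: it covers a leading double underscore (or underscore plus upper case letter), str followed by a lower case letter, and, when errno.h is included, E followed by a digit or upper case letter.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if name matches one of a few reserved patterns from the
   C standard, 0 otherwise. Not a complete list of reserved names. */
int looks_reserved(const char *name, int errno_h_included)
{
    /* __name or _Name: always reserved for the implementation. */
    if (name[0] == '_' && (name[1] == '_' || isupper((unsigned char)name[1])))
        return 1;
    /* str followed by a lower case letter: reserved for string.h additions. */
    if (strncmp(name, "str", 3) == 0 && islower((unsigned char)name[3]))
        return 1;
    /* E followed by a digit or upper case letter: reserved by errno.h. */
    if (errno_h_included && name[0] == 'E'
        && (isupper((unsigned char)name[1]) || isdigit((unsigned char)name[1])))
        return 1;
    return 0;
}
```

Under these rules an application variable called strength or a macro called EMPTY (in a unit including errno.h) would appear in the list of clashes, which shows how easily ordinary names collide with reserved ones.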

Identifiers specified to have properties

The C standard says that errno "... expands to a modifiable lvalue that has type int ... It is unspecified whether errno is a macro or an identifier declared with external linkage." This is an example of an API defining properties of an interface rather than the C syntax of its implementation. The POSIX standard says "... which is defined as extern int errno;", which is an implementation specification.

This kind of API object specification, using properties rather than C syntax is not very common.

Ideally an API checking tool would know about the different properties defined by an API and flag discrepancies. From the tool's point of view such special cases are just that, special cases. In the example above it was reasoned that few applications rely solely on the C standard; most also include POSIX (or an API based on POSIX). So checks suggested by the specification given in the C standard are not carried out by OSPC.

Use of identifiers of unknown status

Once the entire application has been processed all referenced identifiers are known. Resolving these identifiers against those defined by the application and those defined by the known APIs may leave some unaccounted for.

These unaccounted-for identifiers are assumed to belong either to an unknown API or to extensions to a known API.

In the case of variables and macros there will be a declaration in one of the included headers. The name of the header may give clues to the status of the identifier.

Functions need not be declared prior to use. In this case the compiler will create a default declaration of extern int f(), where f is replaced by the name of the function. So there may not be a header name to refer to for guidance. Once again incorrectly written headers can confuse the analysis. It is not unknown for vendors to supply headers with some function declarations missing, even though code implementing that function is available in a library that can be linked against.

A checking tool can do no more than list identifiers whose status is unknown. This list may contain hints as to their likely status, for instance by giving the name of the header in which any declaration occurred.

Identifier specified by several APIs

Sometimes a newer API will add functionality to an interface defined by an earlier API, or define what was previously undefined behaviour. For instance the C standard says that the rename function may be used to change the name of a file. But C has no concept of directory structure, so it does not include any specification for handling directories; it assumes a flat file system. POSIX defines a directory structure and adds to the specification of rename to describe how directories are to be handled.

It is not always possible to deduce the use being made of an API interface from static analysis of the source. OSPC takes the view that if an identifier from an API is referenced then that API is used, irrespective of the number of APIs involved.

Can all referenced APIs be detected?

No, they cannot. Consider the case of an API that only defines macros and types. Let us assume that information on this API is not available to OSPC. Who is to say that the included header, used to access the defined names, is not part of the application, rather than an API? An example of a header that only contains macros and typedefs is stddef.h, from the C standard.

In the case of objects and functions their definition is contained in a library, not in the source making up an application. Such a library has the opportunity to modify an object and functions may access host specific information.

Does it matter that use of an API may go undetected? Perhaps not. The developer has the option of taking the headers containing the macro and type definitions and making them part of the application source tree (if they are not available on a given platform). Of course the developer then has to take over responsibility for ensuring that the definitions are correct for each new platform.

In the ideal case the OSPC database contains information on all the APIs that an application uses.

Information output by an API checking tool

End users need a list of the APIs used, along with a summary of any discrepancies. Software developers would probably like any discrepancies to be pinpointed, simplifying the job of isolating and fixing them.

Being able to output a list of applicable APIs relies on having a database of information about what each API contains. This in turn requires an API to be documented, which unfortunately is not always the case (X11 being an example, where even the headers provided can vary between platforms, let alone the header contents).

An API database takes time to build. Effort is also required to maintain it. The best approach to maintaining an API database is to make it easy for software vendors to create and maintain information on their own APIs, for distribution to customers. API descriptions contained in standards documents such as C and POSIX would be provided by the company supplying the checking tools.

Information summary

APIs used
- Optional components used
Violations of the defined interface
- Type of violation and number of occurrences
Reserved ids used
- API that reserves them
- Identifier name and number of references to it
Identifiers referenced that are not in a known API
- included header
- functions
- external identifiers
- macros