Checking C code containing embedded SQL

Derek Jones
derek@knosof.co.uk
Knowledge Software Ltd
Farnborough, Hants GU14 9RZ
UK

ABSTRACT

The ISO Standard for SQL supports the embedding of code in 3GLs such as C. These two languages have different design aims and their users come from very different cultures. This paper describes the experience of adding full SQL/2 support to an existing C static analyser, OSPC (Open Systems Portability Checker). Obvious integration issues, such as uninitialised variable checking needed little effort. The enforcement of many coding conventions adopted by customers required implicit information contained in the source to be made explicit. Further complexity is added by developers making use of vendor supplied libraries to make C calls to the database server. These calls are not visible as embedded SQL but can affect the results obtained from the server.

Introduction

The Standards for C, SQL and POSIX were designed to enable the portability of applications across platforms. To achieve this portability both platforms and applications must conform to the standards. The Open Systems Portability Checker (OSPC) is a tool that checks applications, written in C, for possible portability problems. This checking is performed statically.

OSPC has been available for over three years and the latest version has the ability to detect possible portability and general coding problems with a high degree of accuracy. A very important property of any tool is to provide warning messages that pinpoint the problem area and give exactly describe the construct causing the message. Providing good warning messages in code that has been through some form of automatic processing prior to analysis is very hard. Users also want to be able to prioritize warnings. So they can fit the number of problems to be solved into the time available.

The SQL/2 standard supports the embedding of SQL statements in other languages. In the case of C a number of vendors provide translators that take files containing a mixture of C and SQL, converting the SQL into a series of C declarations and statements. The resulting file is then processed by a compiler in the normal fashion.

In March 1994 work started on upgrading OSPC to support the handling of embedded SQL in C. Previously users had had to preprocess the SQL into C, before using OSPC. Such preprocessing resulted in additional warnings being given for the generated code and in placed degraded the quality of warnings given for the hand written C.

Implementation

The existing C tool used traditional compiler techniques and it was decided that the SQL enhancement would reuse as many data structures and code as possible. By far the hardest job was the creation of a LALR(1) grammar for input into a parser generator. Typing in the syntax from the standard resulted in over 4,000 shift/reduce and 3,000 plus reduce/reduce conflicts. A man month of effort cut this down to one shift/reduce conflict (a still to be resolved issue involving two high level constructs and left close parenthesis). With over 800 productions the grammar is large.

Lexical analysis was complicated by many vendors not supporting the full set of keywords, adding their own keywords and even allowing the use of host identifiers with the same spelling as keywords.

The SQL standard specifies the form that host variable declarations must take by providing a subset of the appropriate language standards variable declaration syntax. Rather than duplicate syntax and associate actions OSPC modified the grammar so that the C declarations were processed by the existing C grammar and associated action.

The main problem in testing the software was lack of test cases. Many examples were typed in from books and further tests were suggested by attempts to achieve 100% statement coverage. Potential customers provided some code and a little was obtained from various sites on the internet. There is very little publically available C code containing embedded SQL, although there is plenty that does not contain SQL. There is a validation suite for SQL, however this is currently targeted at Entry level. Work is being carried out by NIST to add tests for Intermediate level (Knowledge Software have checked a beta version of these new Intermediate level tests against OSPC).

Vendor extensions

Vendor extensions are a major problem. Vendors have added many, disparate extensions to the language. Most extensions involve additional syntax rules. These create a burden on an already large grammar. Fortunately for us a small number of vendors have cornered most of the market. The decision was made to concentrate on those frequently used vendor extensions supported by the major vendors. Oracle have the largest market share and were given priority.

Discrepancies between a vendors product documentation and implementation added further confusion

SQL/3

Work is well advanced on creating an object oriented SQL standard. Since most vendors are a long way from fully implementing the SQL/2 standard one might question the wisdom of such a rapid revision of the standard. Several constructs were left out of SQL/2 on the grounds that they were too advanced. However, users wanted them, so vendors implemented them, object oriented came along and who wants to get left behind?

Triggers, the ability to associate an action with the modification of the value of a database column, are the most widely implemented SQL/3 construct. Because these constructs are new they have yet to make a substantial impact on the code that OSPC processes. So we have decided to wait before adding any real checks in these areas.

What to check

The project requirements soon evolved from simply handling embedded SQL as a means of improving the quality of warnings given in the C code to doing portability and coding style checks on the SQL.

Sources of information on constructs that ought to be checked to achieve applications portability included:

Don't compilers and preprocessors do checking?

The development C compiler is only likely to check for constraint and syntax errors, since it is these constructs that a conforming implementation is required to detect (and must detect in order for a compiler to validate).

One of the principles behind the drafting of the C standard was that existing code should not be broken by wording in the standard. This meant that in many cases the behaviour was left undefined or implementation defined. By not specifying what had to be done, compiler implementors were free to make their own decisions. Thus preserving the correctness of existing, old code. So in general compilers are silent on those constructs whose behaviour may vary across implementations. This freedom means that C programs can behave differently with different ISO validated C compilers, even on the same machine. There are no requirements on compilers to flag occurrences of these non constraint/syntax errors.

The same is true of SQL preprocessors to an even greater degree. Most of the development work done by database vendors relates to database specific issues, ie., query performance and administrative support tools. Vendors have added many extensions to SQL, many before the standard had equivalent constructs; they are also still trying to catch up with all of the requirements contained in the latest standard. SQL developers tend to be tied much more firmly to database vendors than C developers to compiler vendors.

Flagging SQL

The first and most obvious constructs to flag were those specified by the standard under leveling rules. Users wishing to port to a SQL preprocessor that only supported a given standards level would obviously like to know in advance if they were making use of constructs beyond that level. In practice most vendors are still struggling to get from Entry level to Intermediate level, let alone to Full level.

The US Government created a FIPS (Federal Information Processing Standard) for SQL. FIPS 127-2 ties down some of the constructs that are left implementation defined by the standard and also sets minimum limits on the availability of some constructs. The FIPS standard has teeth in Government procurement. So this document is a worthwhile source of constructs that ought to be flagged.

OSPC already contains a lot of knowledge about C constructs that although perfectly conforming, are likely to be the result of programmer error (lint like constructs). While studying SQL similar types of constructs were spotted. For instance adding a list of values to a table without specifying the column names, relying on current knowledge of the actual order, is dependent on the column ordering remaining unchanged. Such usage is flagged with the suggestion that the column names be explicitly code into the statement.

SQL queries specify requirements on the items required, ie,. all women over 6 ft, and leaves the job of satisfying these requirements to the database engine. It is possible to make non-sensical queries. OSPC has no built in database knowledge or ability to perform complex set operations (which is what many SQL queries boil down to). Instead it contains some hard coded patterns describing possibly pointless queries, or subqueries. For instance each column name specified in the HAVING clause must occur within a statistical function or must occur in the list of columns named in the GROUP BY clause.

Like all developers SQL programmers want their applications to run quickly. Because they specify requirements via a query rather than by an implementation, scope for clever algorithms is limited. Most of the efficiency savings are to be obtained by proper organisation of the database. Vendors do provide query optimisers. But there is still scope for avoiding inefficient queries. One general rule is to place the most restrictive query first, this reduces the work for subsequent SQL checks. Another is to replace tests using relational operators with the BETWEEN operator. Making such recommendations is strictly outside of the scope of a portability checker. But once SQL support is available such checks are easy to implement and encourage developers to use the tool (and buy it in the first place).

Vendor extensions

Use of all vendor extensions to SQL are flagged. Where possible a standard construct is suggested. Because so many extensions are done via additional syntax some flagging (where the SQL grammar has not be extended to support the vendor extensions) takes the form of a syntax error disguised with as a warning message relating to vendor extensions.

The SQL datatype VARCHAR is used to declare variable sized objects. Vendors implement it differently. Some treat it as a scalar type, others treat it as a struct (with associate fields that users can access in their C code).

Calls to inbuilt library routines were relatively easy to support. The occurrence of calls to these library routines can be help to identify the vendor, or version of vendors product, used to process a particular file (the desire to reuse code and programmer job changes can mean that such information can get lost).

The C/SQL interface

Embedded SQL has no control structures, this is provided by C. It is possible to treat each embedded SQL statement as a standalone entity. Its only interaction with the outside world being via C statements and via a global data area (usually implemented as a struct).

Experience has shown that a static checker needs to have at least a 60% accuracy rate before users will consider it a success. One of the first priorities was to reduce the number of false warnings emitted by the uninitialised variable checking. OSPC takes a worst case approach to this problem, so the number of false warnings was already considered high. Embedded SQL may also read from C variables and checking was added to ensure that values had been assigned to them.

A single SQL statement may return many values. An indicator variable may be associated with each value returning host (C language) variable. This indicator variable (which is itself a C variable) is set to a special error value if no answer was returned in its associated host variable. Developers need to check the indicator variable in the C code before accessing the returned value; rather like checking errno after a library call. Also like errno, many developers fail to do the check. Some companies also have naming conventions that tie together two C variables as indicator and result variable respectively. Such conventions need to be enforced.

A database may contain a variety of datatypes. When writing 'pure' SQL developers do not need to worry about the size and datatype of variables. The SQL interpreter (many vendors provide a basic like interactive query interface) looks after the details. When interfacing to C developers have to make sure that the host variables used to in communicate values have the correct size and datatype. Surprisingly few of the vendors products perform much checking in this area. An obvious candidate for OSPC.

SQL queries may be static or dynamic. Dynamic SQL happens by creating strings and passing them to the SQL server at runtime. There are also a number of API's (ie ODBC) providing similar functionality. Fortunately dynamic SQL is limited to database queries. It cannot be used to open and close databases. Because each query is a stand alone operation OSPC only needs to be aware of which host variables are referenced or assigned to. The SQL statement being treated as having no other effect.

A major difference between C and SQL is that SQL queries can return variable amounts of data; every row in the database that satisfies the query. C has no inbuilt variable sized object. Embedded SQL solves this problem by creating a new type of construct, Cursors. A Cursor is a query that can only return a single row. A loop, written in C, iterates until all values have been returned. Checking that the Cursor has been correctly set up and that the returned values are correctly processed is simply in theory. However, our implementation uses a single pass to collect all information. So we don't always have all of the necessary information on the C constructs when processing the SQL.

Same concept, different implementation

Using two different languages at the same time can result in programmer error caused by the same concepts being specified differently. One such is the string token. In C this is delimited by the double quote, ", character. In SQL it is delimited by the quote, ', character. OSPC already flagged occurrences of multi-character constants (multiple characters appearing between quote characters). But such flagging, in the C, was dependent on the characteristics of the host and target platform (OSPC has a very sophisticated mechanism for working out the differences between porting platforms when deciding whether or not to flag a given construct). Now we needed to be able to flag such constructs if there was the possibility that the user had intended a C string, rather than a C multi-character constant. More sophisticated analysis of context was needed.

Within OSPC users can switch off warnings by giving their associated error number.

It is unlikely that a developer will be equally familiar with the two languages. Which language do we assume the developer is most at home in? Such information could be used to prioritize the warnings given. The current version assumes that the errors will more likely be made in the C. This has as much to do with C statements typically outnumbering SQL by 7 to 1, as it does with our own uncertainty about how SQL developers think. Future versions may provide a command line option to tell OSPC which language the programmar is most familiar with.

int some_global = 4;

void f0()
{
char val1,
     val2;
int i,
    k1,
    k2,
    pno1;

struct {
       int salary;
       int holiday;
       } emp;

EXEC SQL
   GET descriptor 'ext_cur_name' :i = COUNT ;

/* use of undeclared host variables flagged */
EXEC SQL
   GET descriptor 1 :undeclared = COUNT ;

/* k2 used before it is assigned to, flagged */
EXEC SQL
   SET descriptor 'ext_cur_name' COUNT = :k2;

EXEC SQL
   SET descriptor 'ext_cur_name' VALUE desc_id = :k1;

EXEC SQL
   UPDATE employee
   SET salary = :emp;

/* pno1 used before it is assigned to, flagged */
EXEC SQL
      SELECT SP.SNO
         INTO :val1
         FROM P
         WHERE SP.PNO = :pno1
      ;

some_global=val1; /* val1 was assigned to above */
}

OSPC supports the full SQL/2 standard (ISO/IEC 9075:1992). Most vendors have added their own extensions. The -SQLV

Conclusion

The goal of adding support for full embedded SQL/2 to OSPC was achieved. Checking between the two languages is integrated. Once the SQL checking became available the focus shifted away from being just a way of improving the C warnings to include SQL portability checking.

The very large number of vendor extensions to the SQL standard came as a big surprise. An even bigger surprise was the willingness of developers to be tied to specific vendors products. Portability to other products being seen as requiring to much effort.

Further development of the SQL checker is currently hampered by the lack of availability of a large body of user code. There are many other possible checks that could be performed. But experience with C and other languages has shown that users often do unexpected things and that they do not always write the silly constructs that we implement checks against. A large body of code would allow us to verify some of the unsubstantiated claims being made by developers as to what they do. It would also act as a useful test bed for some of our ideas on what we ought to be checking for.

As with all static checking the limiting factor on code improvement is the developers willingness to do something about the warnings generated. Time constraints, user pressure for more features and cost are the usual reasons given to ignoring most warnings.


Home

© Copyright 1995,97. Knowledge Software Ltd. All rights reserved;
Last changed 18 Feb 1997