Efficient Type Equivalence Checking

A.L. Brown

Department of Computer Science,
University of Adelaide,
South Australia 5005,
Australia.
email: fred@cs.adelaide.edu.au

1. Introduction

Persistent programming systems are able to protect all data in a persistent store by ensuring that all operations on data conform to a programming language's type system[1]. This is achieved by a mixture of static and dynamic checks appropriate to the supported programming languages[2]. A graph is commonly used to represent recursive types and to support parameterised types[2]. Thus, type equivalence algorithms are required that will operate over the graphs.

The introduction of concurrency and distribution impose certain requirements on type equivalence checking. A concurrent system may involve multiple type equivalence tests over the same representations. The equivalence algorithm has to ensure that individual checks cannot interfere with each other. When distribution is introduced data may migrate between systems requiring a mechanism for checking the type of imported data. If copies of a type representation are passed with data, the necessary checking can be performed. However, this means that the type equivalence mechanisms must be able to cope with multiple copies of a type's representation.

A type equivalence mechanism will now be presented that is based on a graph representation of type. The representation accommodates recursive type definitions including the full range of data types available in a persistent programming language such as Napier88[7]. The mechanism presented is based on a hybrid of two existing Napier88 type equivalence mechanisms[2,8] that preserves their good points whilst solving their short-comings in the context of a concurrent and distributed system.

2. A Type Equivalence Algorithm

To aid understanding, the following description of the type equivalence algorithm will be based on Napier88. All types in Napier88 can be represented by the following simple structure:

rec type TYPE is structure( label : int ; specificInfo : string ; references : list[ TYPE ] )

This is sufficient to uniquely represent any type describable by the Napier88 type system using some straight forward mapping rules. The label field indicates if the type is a base-type or a structure, procedure, vector, etc. The specificInfo contains information such as structure field names, concatenated together with markers to form a single string. This string has an implicit ordering that is part of the type description. A structural type equivalence algorithm over this structure must check for equality of the label and specificInfo fields before recursively checking any representation found in the references list. This is illustrated in Figure 1.

rec let eqType = proc( a,b : TYPE -> bool )
a = b or in_remembered_set( a,b ) or
(a( label ) = b( label ) and a( specificInfo ) = b( specificInfo )
and eqList( a( references ),b( references )))

& eqList = proc( a,b : list[ TYPE ] -> bool )
(a is tip and b is tip) or
(a isnt tip and b isnt tip and eqType( head( a
),head( b ) ) and eqList( tail( a ),tail( b ) ))
Figure 1: A structural type equivalence algorithm in Napier88.

In the presence of recursive type definitions, the type representation will contain cycles. To avoid this problem and ensure that equivalent recursive types can be detected, a remembered set is included in the algorithm. The purpose of the remembered set is to record all pairs of comparisons that have been made. Immediately after the pointer identity check the remembered set is searched for the pair of types being compared and if they are not found they are entered into the set. If they are found then they can be assumed to be equivalent. This will be true if the previous comparison of the two types had completed. However, the comparison may still be in progress. In this case it is safe to assume they are equivalent. If the types are not equivalent for any reason, then the algorithm will fail whilst following another branch of the recursion.

The time complexity of this algorithm is dependent on the implementation of the remembered set. It is also a potential source of error in a concurrent system since the remembered set may be shared by several concurrent type equivalence checks. To achieve a low time complexity and maximise concurrency the following two techniques are employed by the new algorithm.

The remembered set is implemented as a hash table[2]. In order to obtain suitable hash keys, the compiler annotates all TYPEs it creates with an extra field that contains a random number. Within any compilation unit, the random numbers should be unique and the compiler can arrange for there to be only one copy of each TYPE. Consequently, the type equivalence mechanism performs well when comparing types from the same compilation unit, nearly all comparisons would succeed at the identity check. A new, empty hash table is used for every type equivalence check to avoid interference between concurrent type equivalence checks.

Since, for a given compilation unit and therefore each copy of a TYPE, the random keys should be unique, the hash table can be keyed by always using either the first or second TYPE being compared. The end results should be that few collisions will occur and so inserting pairs into the hash table and searching for pairs will be on average O( 1 ). Given this, the overall complexity of a type equivalence check is O( N ) where N is the number of component types being compared. Thus, comparing very large type representations from different compilation units should be relatively efficient.

An alternative approach to the hash table is to build trees of equivalent types. This idea originates in solving the union-find problem[3], solutions to which are essential in many applications of graph theory[9]. In this approach every TYPE has and extra field which can point to a tree root. Initially this will point to a TYPE's own tree root[8]. Since this technique requires modifying TYPEs, it is only used to remember pairs of types once an initial comparison succeeds. Any potential interference between tree merging operations can be ignored since its sole function is to optimise future checks and cannot cause a check to succeed in error.

3. Conclusions

The new type equivalence checking algorithm is able to accommodate recursive types and can be employed in programming languages with type systems of the complexity of Napier88. The algorithm also exhibits desirable properties in the context of persistence, concurrency and distribution. Structured type representations are supported so fully annotated type representations may be used with browsers[4] and hyper-programming environments with a persistent environment[5,6]. Multiple copies of type representations do not impose severe performance overheads. Furthermore, copies of a type's representation are not unnecessarily retained by a global remembered set. TYPEs point to the tree root so the tree cannot cause a TYPE to be retained unnecessarily. Finally, the algorithm is able to operate in a concurrent environment without the need for synchronisation. The use of a synchronisation mechanism is purely an efficiency issue and does not impact on the correctness of the algorithm.

4. References

1. Atkinson M.P., Bailey P.J., Chisholm K.J. Cockshott W.P. & Morrison R. An Approach to Persistent Programming. The Computer Journal, vol. 26, no. 4, 1983, pp360-365.

2. Connor, R.C.H. Types and Polymorphism in Persistent Programming Systems (PhD Thesis), Department of Computer Science, University of St.Andrews, CS/91/3, 1991.

3. Cormen, Leirson, Rivest. Chapter 22, Introduction to Algorithms, MIT Press, 1990, pp440-464.

4. Dearle, A. & Brown, A.L. Safe Browsing in a Strongly Typed Persistent Environment., Computer Journal, Vol. 31, No. 6, (December 1988), 540-545.

5. Farkas, A. & Dearle, A. The Octopus Model and its Implementation. Technical Report PS-17, Department of Computer Science, University of Adelaide. August 1993.

6. Kirby, G.N.C. (Ph.D. Thesis) Reflection and Hyper-Programming in Persistent Programming Systems, University of St.Andrews, 1992.

7. Morrison R., Brown A.L., Connor R. & Dearle A. The Napier88 Reference Manual. Universities of Glasgow and St.Andrews PPRR-77, Scotland, 1989.

8. Crawley, S. A New Napier88 Type Checker, Technical Report, D.S.T.O. Salisbury, South Australia, Australia.

9. Aho A.V. & Ullman J.D. Spanning Trees, Foundations of Computer Science, C Edition, Chapter 9, Computer Science Press, 1995, pp466-484.