Thursday, May 1, 2008

Tradeoffs for Type Inference

Here's a nice essay by Chris Smith: What to Know Before Debating Type Systems. Smith gives a good introduction to some of the ideas and terminology for programming language type systems, aimed at the perennial static vs. dynamic debate. I quite like that he dismisses the strong vs. weak typing distinction as meaningless. Some terminology I'd like to have seen is that of untyped languages and type safe languages, since so-called dynamic typing is just that: untyped, but type safe through dynamic checking. I suppose that would have caused a lot of, e.g., Python users to stop reading, which would be a shame.

For the most part, I think Smith is right on with his presentation, although there are a few points where I'd quibble. In particular, I think that type inference warrants some further consideration. As Smith states, "Type inference is generally a big win." I fully agree, but there are some details worth making explicit.

First, let's distinguish type inference from what we might call type propagation or local type inference. Type inference is figuring out the types of expressions from the expressions themselves. It is seen mostly in functional programming languages such as Standard ML, Objective Caml, and Haskell. For the most part, you don't need to write down the type signature for a function; you just define it and the compiler works out the types. Despite this, it is common to give the type signatures for functions anyway, since the types provide a great deal of information. The type information is useful for understanding how to call and compose functions, which is obviously important for functional programming. Further, declaring the type for a function can be a useful design aid, allowing a correctly typed function stub to be provided that will just raise an exception when called. In Standard ML, such a stub might look like fun stub x = raise Fail "Not implemented". The inferred type of stub is 'a -> 'b, which is more general than the type of any real function, so it gives the compiler nothing to check against. By manually giving a type to stub, you can use it to define other functions and the compiler will be able to do more useful checks, as in the sketch below.
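Here's a minimal Standard ML sketch of both points; sum and lookup are hypothetical functions invented for illustration:

    (* No annotations needed: the compiler infers sum's full type,
       int list -> int, since 0 and + pin down the element type. *)
    fun sum []        = 0
      | sum (x :: xs) = x + sum xs

    (* An unannotated stub gets the uselessly general type 'a -> 'b. *)
    fun stub x = raise Fail "Not implemented"

    (* Annotating the stub with its intended type turns it into a
       design aid: code written against lookup is checked for real. *)
    fun lookup (key : string) : int option =
        raise Fail "Not implemented"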

Type propagation is using known types to figure out the types of as many expressions as possible. In practice, this might take the form of declaring the types for functions and letting the compiler work out the types for all the expressions within the function bodies. If you would normally write out the types for functions anyway, you aren't really out much. You might have to write a few more type signatures, since you'd probably skip the signatures for little helper functions, but those types should be easy anyway. There can also be gains from this approach, since you can use a more expressive type system for which full type inference might not be decidable. Scala takes this approach with its local type inference: there's no guarantee that types can be inferred, but they usually can be, so you don't need to write type declarations for every variable. In other words, by softening the guarantees for inference, the type system can be made to do more. Type propagation can even fit in with untyped languages for optimization purposes; my understanding is that this was a big part of how Dylan compilers were able to produce fast executables.
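Here's a sketch of the propagation discipline, again in Standard ML for continuity (SML itself has full inference, so the annotations here are a stylistic choice rather than a requirement); the average function is hypothetical:

    (* Only the header is annotated; the types of everything in the
       body then follow from it with no further declarations. *)
    fun average (xs : real list) : real =
        let
          val total = foldl op+ 0.0 xs   (* real, forced by 0.0 and xs *)
          val n     = real (length xs)   (* real, via the int-to-real coercion *)
        in
          total / n
        end

This is the shape of the bargain: you write the one signature you'd probably have written anyway, and every type inside the body still comes for free.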

As a whole, type inference doesn't just provide benefits; it also places restrictions on the type system. Improving the type system may well make it worth giving up a general type inference algorithm in favor of weaker type propagation: the marginal cost of annotating a few extra functions is not a high price to pay. Turning it around, one really has the worst of both worlds with languages like C, C++, and Java: you have to declare the type of every variable, yet unlike with type propagation, the annotations don't buy you a more expressive type system in return.

(Via Lambda the Ultimate.)