Common Genius

The online technical home of David Nelson

Technical Articles

Collection Initializers and Duck Typing

Recently I came across a blog post by Mads Torgersen, the program manager for the C# language team at Microsoft. In it he talks about collection initializers, one of the new language features of the upcoming version 3.0 of C#. Essentially, the feature lets you initialize a new collection instance using set-based syntax instead of functional syntax. Check out the post for a more complete description and examples of the feature.
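As a rough illustration of the difference between the two styles, here is a minimal sketch using List<int>. The class and method names (InitializerDemo, BuildWithAddCalls, BuildWithInitializer) are made up for this example and do not come from Mads' post.

```csharp
using System.Collections.Generic;

public static class InitializerDemo
{
    // "Functional" style: construct the list, then call Add once per element.
    public static List<int> BuildWithAddCalls()
    {
        List<int> primes = new List<int>();
        primes.Add(2);
        primes.Add(3);
        primes.Add(5);
        return primes;
    }

    // "Set-based" style: the collection initializer lists the elements in braces.
    public static List<int> BuildWithInitializer()
    {
        return new List<int> { 2, 3, 5 };
    }
}
```

Both methods should produce a list with the same three elements; the second is simply the more compact spelling.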

What most interested me from the post was the method by which the compiler will determine how to add new elements to the collection. It does not use the ICollection<T>.Add method, which would be the most obvious method for a collection initializer. Mads' post touches on one of the reasons why not, and I have added two more:

1. Classes built prior to .NET 2.0 do not implement ICollection<T>, so they could not use collection initializers, even if they followed the standard collection pattern and implemented a strongly-typed Add method. A large number of the classes in the .NET Base Class Library (BCL) fall into this category (in fact, there are only 14 public instantiable classes which implement ICollection<T>).
2. Many collection classes can have items added to them freely, but do not support removing an arbitrary item. Examples are Queue, Stack, and XmlNameTable. These classes do not implement ICollection<T>, presumably because the interface requires a Remove(T) method (although the method could have been implemented explicitly, and throw a NotSupportedException; I am not a big fan of this pattern, but it has been used elsewhere in the BCL). Since they don't implement the interface, they would be unable to use collection initializers, even though they behave like collections for the purpose of adding items.
3. Classes that implement ICollection<T> might prefer to use a method other than Add(T). Dictionaries are the prime example; Dictionary<TKey, TValue> implements ICollection<KeyValuePair<TKey,TValue>> (including an explicitly implemented Add(KeyValuePair<TKey, TValue>) method), but uses Add(TKey, TValue) as its primary Add method, which is more usable.
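Points 2 and 3 can be checked directly against the BCL types they mention. The following is a minimal sketch; the wrapper class AddMethodDemo and the sample keys and values are invented for illustration.

```csharp
using System.Collections.Generic;

public static class AddMethodDemo
{
    public static Dictionary<string, int> BuildDictionary()
    {
        Dictionary<string, int> ages = new Dictionary<string, int>();

        // The primary public Add method takes the key and the value separately.
        ages.Add("Ada", 36);

        // The ICollection<KeyValuePair<TKey, TValue>>.Add method is implemented
        // explicitly, so it is only reachable through the interface type.
        ICollection<KeyValuePair<string, int>> asCollection = ages;
        asCollection.Add(new KeyValuePair<string, int>("Alan", 41));

        return ages;
    }

    // Queue<T> accepts new items through Enqueue but does not implement ICollection<T>.
    public static bool QueueImplementsGenericCollection()
    {
        return typeof(ICollection<int>).IsAssignableFrom(typeof(Queue<int>));
    }
}
```

BuildDictionary ends up with two entries, and QueueImplementsGenericCollection returns false, which is the situation described in point 2.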

The solution outlined for this problem is an application of "duck typing". Basically, the idea is that a class doesn't have to implement ICollection<T> (or even ICollection for that matter) to be treated like a collection for the purposes of this feature; it just has to look like a collection (hence the term "duck typing": "if it walks like a duck, and talks like a duck..."). This results in a new implementation for collection initializers: instead of implementing ICollection<T>, a class must implement IEnumerable and have a public Add method to use the feature. The compiler essentially expands the collection initializer into a series of Add method calls, and then uses the same method resolution rules used for method calls (it's actually a little more complicated than that; it will also look for explicit implementations of ICollection.Add, and a few other special cases). Using this approach eliminates problems 1 and 3 in the list above; any public Add method can be used in collection initializer syntax. It seems that this is an improvement over the use of ICollection<T>.
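To make the expansion concrete, here is a minimal sketch written against the behavior of the C# 3.0 compiler as it eventually shipped, which accepts any type that implements the non-generic IEnumerable and has an accessible Add method. Playlist and DuckTypingDemo are hypothetical names invented for this example; the special cases mentioned above are not shown.

```csharp
using System.Collections;
using System.Collections.Generic;

// Hypothetical type: implements only IEnumerable and exposes public Add methods.
// It does not implement ICollection or ICollection<T>.
public class Playlist : IEnumerable
{
    private readonly List<string> tracks = new List<string>();

    public int Count { get { return tracks.Count; } }

    public void Add(string title) { tracks.Add(title); }

    // A second overload; normal overload resolution picks it for two-argument elements.
    public void Add(string artist, string title) { tracks.Add(artist + " - " + title); }

    public IEnumerator GetEnumerator() { return tracks.GetEnumerator(); }
}

public static class DuckTypingDemo
{
    // What the developer writes.
    public static Playlist WithInitializer()
    {
        return new Playlist { "Intro", { "Miles Davis", "So What" } };
    }

    // Roughly what the compiler turns the initializer into.
    public static Playlist Expanded()
    {
        Playlist p = new Playlist();
        p.Add("Intro");
        p.Add("Miles Davis", "So What");
        return p;
    }
}
```

The nested braces in the initializer supply multiple arguments to a single Add call, which is how Dictionary<TKey, TValue> gets to use its two-parameter Add method.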

However, there is an underlying assumption to this implementation which is extremely troublesome, and results in significant drawbacks that, in my opinion, outweigh the positives that the feature brings to the language. The assumption is that any class which implements IEnumerable and has a public Add method is a collection.

I will grant that the vast majority of the time, this will be true. But what about when it isn't? There are some cases where such an assumption could produce some highly non-intuitive results. For example, consider a custom wrapper around Delegate called MyMulticastDelegate. It implements IEnumerable<T> (where T is Delegate), and has a public Add method. However, instances of MyMulticastDelegate, like instances of Delegate, are designed to be immutable; attempts to change them actually create new instances. So, although the compiler would allow syntax such as "MyMulticastDelegate mmd = new MyMulticastDelegate { new EventHandler(MyEventHandler) };", the resulting mmd instance would be empty. The delegate for MyEventHandler was added to a new instance which, although it was returned by the Add method, was discarded by the compiler.
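The class itself is not shown in this post, so the following is only a guessed reconstruction of what such an immutable wrapper might look like. The internal array, the Count property, and the ImmutableAddDemo helper are all assumptions added for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical reconstruction of the immutable wrapper described above.
public sealed class MyMulticastDelegate : IEnumerable<Delegate>
{
    private readonly Delegate[] targets;

    public MyMulticastDelegate() : this(new Delegate[0]) { }

    private MyMulticastDelegate(Delegate[] targets) { this.targets = targets; }

    public int Count { get { return targets.Length; } }

    // Returns a NEW instance; the instance the method is called on is never modified.
    public MyMulticastDelegate Add(Delegate handler)
    {
        Delegate[] combined = new Delegate[targets.Length + 1];
        targets.CopyTo(combined, 0);
        combined[targets.Length] = handler;
        return new MyMulticastDelegate(combined);
    }

    public IEnumerator<Delegate> GetEnumerator()
    {
        return ((IEnumerable<Delegate>)targets).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class ImmutableAddDemo
{
    private static void MyEventHandler(object sender, EventArgs e) { }

    // The initializer calls Add and throws away the returned instance.
    public static int CountAfterInitializer()
    {
        MyMulticastDelegate mmd = new MyMulticastDelegate { new EventHandler(MyEventHandler) };
        return mmd.Count; // 0: mmd is still the original empty instance
    }

    // Keeping the return value is the only way to observe the added handler.
    public static int CountAfterChaining()
    {
        MyMulticastDelegate mmd = new MyMulticastDelegate().Add(new EventHandler(MyEventHandler));
        return mmd.Count; // 1
    }
}
```

The compiler accepts an Add method with a non-void return type and silently ignores the result, so nothing warns the caller that the initializer had no effect.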

The situation gets worse when you consider that the class in question may not even have been developed in C#. The developer may be completely unaware of the collection initializer syntax in C# and have no idea of the consequences of his design decision. How could he? It is an entirely language-specific feature, yet it is operating on code developed outside the language. This is one of the consequences of creating a system where languages inter-operate: the languages have to play on common ground, and when one decides to go its own way, problems arise.

Some might say "Just don't call the method Add; call it Combine or something instead." That would indeed avoid the problem. But it also points out the root of the flaw in this assumption: it should never be the compiler's job to assign semantic meaning to identifier names. Some languages do this. However, it has always been a strength of C-style languages that the developer has complete control over his own code, and can create whatever implementation he deems appropriate to the circumstances. What if Add makes the most sense in my situation, but my class isn't actually a collection? Or what if it is a collection, but I don't want to call the method Add, because Enqueue or Push would be more appropriate?

Mads points out that this kind of semantic assignment, or what he calls a "pattern based approach," already exists in C#: the foreach statement does not require the target to implement IEnumerable; any GetEnumerator method which returns an IEnumerator instance will do. However, he also notes that "not everybody realizes it." In fact, in my experience most developers don't realize it, and for good reason. It is a foreign concept to C-style development, and it is a slippery slope. Why require the IDisposable interface for the using keyword? Wouldn't any Dispose method do? This is not a good path to head down. Pattern analysis is what static code analysis tools (such as FxCop) are for; it has no place in a compiler.
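For readers who have not seen the foreach pattern in action, here is a minimal sketch. Countdown, CountdownEnumerator, and ForeachPatternDemo are hypothetical types made up for this example. Neither type implements any interface; in the compiler as shipped, the enumerator only needs a public MoveNext method and a public Current property.

```csharp
using System.Collections.Generic;

// Hypothetical type: implements no interfaces, but has a public GetEnumerator method.
public class Countdown
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    public CountdownEnumerator GetEnumerator() { return new CountdownEnumerator(start); }
}

// Hypothetical enumerator: matched by shape (MoveNext and Current), not by interface.
public class CountdownEnumerator
{
    private int next;

    public CountdownEnumerator(int start) { next = start + 1; }

    public int Current { get { return next; } }

    public bool MoveNext()
    {
        next--;
        return next > 0;
    }
}

public static class ForeachPatternDemo
{
    public static List<int> Collect()
    {
        List<int> seen = new List<int>();
        foreach (int n in new Countdown(3))
        {
            seen.Add(n);
        }
        return seen; // contains 3, 2, 1
    }
}
```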

Is there a better way? Several comments were made on Mads' blog about potential solutions. One potential solution is to create a keyword that denotes a particular method as a collection initializer. This seems appropriate, since keywords are language-specific, and this is a language-specific feature. It would also deal with problem 2 above, which the pattern based approach does not. However, it has a fatal flaw: since there is no support for this feature in the CLR or BCL, the semantics of the keyword would be lost as soon as the class is compiled. Suggestions were also made that attributes be used, or a new interface be created which has an Add method but not a Remove method (ICollector was suggested). These also solve problem 2, although an interface would not solve problem 3. Both would be better than the pattern approach. However, they would also require changes not just to the C# language, but to the BCL as well. Since collection initialization is a language-specific feature, and the C# language team does not have control over the framework as a whole, this could be problematic. All three of these alternatives also suffer from problem number 1: existing classes would not have the necessary characteristics to participate in collection initialization, severely limiting the usefulness of the feature.
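To show what the interface suggestion might amount to, here is a purely hypothetical sketch. No ICollector interface exists in the BCL; its shape here, along with the PrintQueue class and the CollectorDemo.Fill helper that stands in for compiler-generated code, is an assumption made for illustration only.

```csharp
using System.Collections.Generic;

// Hypothetical interface: add-only, with no Remove method. Not part of the BCL.
public interface ICollector<T>
{
    void Add(T item);
}

// Hypothetical add-only collection that opts in by implementing the interface.
public class PrintQueue : ICollector<string>
{
    private readonly Queue<string> jobs = new Queue<string>();

    public int Count { get { return jobs.Count; } }

    public void Add(string job) { jobs.Enqueue(job); }
}

public static class CollectorDemo
{
    // Stand-in for what a compiler could emit if initializers were bound to ICollector<T>.
    public static void Fill<T>(ICollector<T> target, params T[] items)
    {
        foreach (T item in items)
        {
            target.Add(item);
        }
    }
}
```

A call such as CollectorDemo.Fill(queue, "a.pdf", "b.pdf") would add both jobs. Because the interface fixes a single one-parameter Add, a two-parameter method like Dictionary's Add(TKey, TValue) could not take part, which is the limitation noted for problem 3.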

So what should be done? In my opinion, collection initialization does not add significant benefit to the language. Most of the time, I don't initialize my collections from a static list; that's what arrays are for, and they already have set-based initialization syntax. And dynamic lists can't use collection initialization anyway. Since the only architecturally "correct" ways of implementing the feature would either require a massive update of the BCL or result in a nearly useless feature, my vote is to put the whole thing on hold until it can be done right, and focus instead on more beneficial areas of language development. Interestingly, the most recent published revision of the C# 3.0 language specification (May 2006) still states: "The collection object to which a collection initializer is applied must be of a type that implements System.Collections.Generic.ICollection<T> for exactly one T." This indicates that this was a relatively recent design decision, which hopefully means there is still time to reverse it.

Published Thursday, December 07, 2006 11:23 PM by dnelson


Comments

 

Pxtl said:

This sort of behaviour is hardly unique.  Much of DotNet seems unable to decide whether it's a duck-typed language (and thus interfaces are a silly concept) or a static-typed language (and thus such identifier-based nonsense is absurd).

Duck Typing also exists in the XML Serializer library.

If you have an XmlElement field

public int Foo;

then Foo is serialized as an int.

However, if you have a second field

public bool FooSpecified;

then Foo is serialized as an optional int, using the "~Specified" field to determine whether to include Foo in the XML data.  This is the only way to make optional fields/properties in XML (since XmlSerializer was implemented before nullables existed and does not support them).

The fact is that the DotNet designers are not purists - they are extremely pragmatic.  In most other fields of software development, that is an asset... however, in language design, it leads to muddy, complicated, bizarre designs like what we have here.

April 5, 2007 5:09 PM
 

dnelson said:

You're right, it definitely seems like the designers of the C# language, and the .NET platform in general, are "The Next Big Thing" oriented. Dynamic languages have been all the rage lately, so Microsoft decides to try to shove dynamic elements into C#, despite the painfully obvious fact that they don't belong there. If I wanted to use a dynamic language, I would; I use C# specifically because I DON'T want to use a dynamic language. Hopefully this trend reverses itself before C# becomes a completely unusable mess.

November 1, 2007 5:50 PM
 

Fabio said:

I don't know what's the big deal here.

You can't save programmers from doing harm to themselves. This is just syntactic sugar. If the programmer is a moron, he can still copy and paste several lines of Add() method calls and make the same mistake on your MyMulticastDelegate sample.

If I were developing the language, I'd rather assume my users are smart enough to only use that kind of utility if they know what they're doing. Just like operator overloading, auto(un)boxing, by-ref parameters, stack-allocated structures.

But, to be fair, I also think MS sometimes makes a big fuss about some of their new tools, which can end up confusing people. This language feature should not be called "Collection Initializers", but maybe a more humble name, like "syntactic sugar for chained Add() method calls".

June 6, 2008 5:15 PM
