Wednesday, 16 May 2007

Code Content Assistant Efficiency

The purpose of this article is to evaluate the efficiency of the content assistant functionality provided by Eclipse, with a focus on type name completion. The results are of course applicable not just to type names but also to field and method names.

Content assistant functionality
Eclipse content assistant suggests code completions as you write code, taking the current context into account: type names, field names, method names, and so on. It also provides an excellent feature called "camel case matches" (NPE matches NullPointerException), which is also a subject of my evaluation.

Content assistant efficiency definition
Content assistant efficiency is a number that says how many times more you get than what you spend. For type names it is defined as typeNameLength/price, where typeNameLength is simply the number of letters in the type name and price is the lowest number of key presses (letters typed + arrow-down presses in the suggestion list + Enter) needed to generate the type name using content assistant.

For example, the class name NullPointerException has 20 letters, but using content assistant we can generate it with 5 key presses (typing "NPE" + one arrow-down press, because NoPermissionException is also in the list, + Enter). So for NullPointerException the efficiency of content assistant is 20/5=4. In other words, the result is four times longer than the effort needed to generate it.

This example covers just one class, but the same definition applies to a set of classes (e.g. a jar): add up the name lengths of all classes and divide by the sum of their prices.
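The definition can be written down in a few lines. This is only a minimal sketch: the class name EfficiencyExample and both method signatures are made up for illustration, and the price of 4 used for NoPermissionException is the value derived in the example later in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EfficiencyExample {

    /** Efficiency of one type: name length divided by key presses. */
    static double efficiency(String typeName, int price) {
        if (price <= 0) {
            throw new IllegalArgumentException("price must be positive");
        }
        return (double) typeName.length() / price;
    }

    /** Efficiency of a set of types: sum of lengths divided by sum of prices. */
    static double efficiency(Map<String, Integer> priceByTypeName) {
        int totalLength = 0;
        int totalPrice = 0;
        for (Map.Entry<String, Integer> e : priceByTypeName.entrySet()) {
            totalLength += e.getKey().length();
            totalPrice += e.getValue();
        }
        if (totalPrice <= 0) {
            throw new IllegalArgumentException("total price must be positive");
        }
        return (double) totalLength / totalPrice;
    }

    public static void main(String[] args) {
        // "NPE" (3 letters) + 1 arrow down + 1 Enter = 5 key presses
        System.out.println(efficiency("NullPointerException", 3 + 1 + 1)); // prints 4.0

        Map<String, Integer> prices = new LinkedHashMap<>();
        prices.put("NoPermissionException", 4);
        prices.put("NullPointerException", 5);
        // (21 + 20) letters / (4 + 5) key presses
        System.out.println(efficiency(prices));
    }
}
```

Note that the efficiency of a set is not the average of the individual efficiencies; long names with low prices weigh more because lengths and prices are summed first.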

How I evaluated
Of course, the biggest challenge was finding the lowest number of key presses needed to generate each type name. For this purpose I wrote a simple application which, for each of 3905 patterns (at most 5 words, with at most 5 letters per word), distributes classes by name into groups, where the classes in one group match the pattern with the same result. After sorting the classes in a group alphabetically, it computes the price of every class as the length of the group id + the position in the sorted list (first = 0) + 1 (Enter). The lowest price for a class over all groups is the number we are looking for.
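The number 3905 is the count of all ways to pick 1 to 5 letters from each of 1 to 5 words: 5 + 25 + 125 + 625 + 3125. The sketch below shows one way such patterns could be generated. It is not the original application; the class name PatternGenerator is hypothetical, and the regex shape (a captured word prefix followed by `[^A-Z]*` for the rest of the word) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class PatternGenerator {
    static final int MAX_WORDS = 5;
    static final int MAX_LETTERS = 5;

    /** Returns one regex per combination of word count and letters per word. */
    static List<String> generate() {
        List<String> patterns = new ArrayList<>();
        extend(new StringBuilder("^"), 0, patterns);
        return patterns;
    }

    private static void extend(StringBuilder prefix, int words, List<String> out) {
        if (words > 0) {
            // Anything may follow the typed words.
            out.add(prefix + ".*$");
        }
        if (words == MAX_WORDS) {
            return;
        }
        for (int letters = 1; letters <= MAX_LETTERS; letters++) {
            int mark = prefix.length();
            // Capture the capital letter plus (letters - 1) following characters.
            prefix.append("([A-Z]");
            if (letters > 1) {
                prefix.append("[^A-Z]{").append(letters - 1).append("}");
            }
            // Skip the uncaptured remainder of the word.
            prefix.append(")[^A-Z]*");
            extend(prefix, words + 1, out);
            prefix.setLength(mark); // backtrack before trying the next letter count
        }
    }

    public static void main(String[] args) {
        System.out.println(generate().size()); // prints 3905
    }
}
```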

Example:
Pattern: ^([A-Z])[^A-Z]*([A-Z])[^A-Z]*([A-Z])[^A-Z]*.*$
Group Id: NPE
Sorted classes: NoPermissionException, NullPointerException
Prices: 3+0+1=4, 3+1+1=5

Pattern: ^([A-Z][^A-Z]{1})[^A-Z]*([A-Z])[^A-Z]*([A-Z])[^A-Z]*.*$
Group Id: NoPE
Sorted classes: NoPermissionException
Prices: 4+0+1=5

The price of NoPermissionException is lower in the first group (4 versus 5), so 4 is its lowest price.
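The grouping and pricing step from this example can be sketched as follows. This is an illustration under assumptions, not the original application: the class PriceCalculator and its method are hypothetical, the group id is taken to be the concatenation of the regex capture groups, and the patterns use `[^A-Z]*` to skip the rest of each word.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceCalculator {

    /** Lowest price per class name over all groups of all patterns. */
    static Map<String, Integer> lowestPrices(List<String> classNames, List<String> patterns) {
        Map<String, Integer> best = new HashMap<>();
        for (String regex : patterns) {
            Pattern pattern = Pattern.compile(regex);
            // group id -> alphabetically sorted class names (TreeSet also drops duplicates)
            Map<String, TreeSet<String>> groups = new HashMap<>();
            for (String name : classNames) {
                Matcher m = pattern.matcher(name);
                if (!m.matches()) {
                    continue;
                }
                StringBuilder id = new StringBuilder();
                for (int g = 1; g <= m.groupCount(); g++) {
                    id.append(m.group(g));
                }
                groups.computeIfAbsent(id.toString(), k -> new TreeSet<>()).add(name);
            }
            for (Map.Entry<String, TreeSet<String>> group : groups.entrySet()) {
                int position = 0;
                for (String name : group.getValue()) {
                    // typed letters + arrow-down presses + Enter
                    int price = group.getKey().length() + position + 1;
                    best.merge(name, price, Math::min);
                    position++;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("NullPointerException", "NoPermissionException");
        List<String> patterns = Arrays.asList(
                "^([A-Z])[^A-Z]*([A-Z])[^A-Z]*([A-Z])[^A-Z]*.*$",
                "^([A-Z][^A-Z]{1})[^A-Z]*([A-Z])[^A-Z]*([A-Z])[^A-Z]*.*$");
        System.out.println(lowestPrices(classes, patterns));
    }
}
```

With only these two classes and two patterns, NoPermissionException gets price 4 and NullPointerException gets price 5, matching the numbers above.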

For the evaluation I used classes from seven jars (rt.jar, catalina.jar, xercesImpl.jar, derby.jar, xalan.jar, axis.jar, junit.jar), 12707 classes in total.

Results
As we can see in the table below, the content assistant efficiency computed for all seven jars together is 3.20.

This means that using content assistant is about three times more efficient than typing everything. A second benefit of using content assistant is higher code quality, because it is harder to make mistakes.

All this holds only if the user uses content assistant effectively. For how to really use it and what the best practices are, see TBD (an article about effective usage is in progress).

Attachment
The chart in the image below shows histograms of class name length and class price: the number of classes with a given length (blue) and the number of classes with a given price (red). The efficiency of content assistant shows up as the peak of red bars compared with the blue "hill".
