Lots of editors and IDEs have code completion. Some of them are very "intelligent" others are not really. I am interested in the more intelligent type. For example I have seen IDEs that only offer a function if it is a) available in the current scope b) its return value is valid. (For example after "5 + foo[tab]" it only offers functions that return something that can be added to an integer or variable names of the correct type.) I have also seen that they place the more often used or longest option ahead of the list.
I realize you need to parse the code. But usually while editing the current code is invalid there are syntax errors in it. How do you parse something when it is incomplete and contains errors?
There is also a time constraint. The completion is useless if it takes seconds to come up with a list. Sometimes the completion algorithm deals with thousands of classes.
What are the good algorithms and data structures for this?
The IntelliSense engine in my UnrealScript language service product is complicated, but I'll give as best an overview here as I can. The C# language service in VS2008 SP1 is my performance goal (for good reason). It's not there yet, but it's fast/accurate enough that I can safely offer suggestions after a single character is typed, without waiting for ctrl+space or the user typing a .
(dot). The more information people [working on language services] get about this subject, the better end-user experience I get should I ever use their products. There are a number of products I've had the unfortunate experience of working with that didn't pay such close attention to details, and as a result I was fighting with the IDE more than I was coding.
In my language service, it's laid out like the following:
aa.bb.cc
, but can also contain method calls as in aa.bb(3+2).cc
.IDeclarationProvider
, where you can call GetDeclarations()
to get an IEnumerable<IDeclaration>
of all items visible in the scope. In my case, this list contains the locals/parameters (if in a method), members (fields and methods, static only unless in an instance method, and no private members of base types), globals (types and constants for the language I'm working on), and keywords. In this list will be an item with the name aa
. As a first step in evaluating the expression in #1, we select the item from the context enumeration with the name aa
, giving us an IDeclaration
for the next step.IDeclaration
representing aa
to get another IEnumerable<IDeclaration>
containing the "members" (in some sense) of aa
. Since the .
operator is different from the ->
operator, I call declaration.GetMembers(".")
and expect the IDeclaration
object to correctly apply the listed operator.cc
, where the declaration list may or may not contain an object with the name cc
. As I'm sure you're aware, if multiple items begin with cc
, they should appear as well. I solve this by taking the final enumeration and passing it through my documented algorithm to provide the user with the most helpful information possible.Here are some additional notes for the IntelliSense backend:
GetMembers
. Each object in my cache is able to provide a functor that evaluates to its members, so performing complicated actions with the tree is near trivial.List<IDeclaration>
of its members, I keep a List<Name>
, where Name
is a struct containing the hash of a specially-formatted string describing the member. There's an enormous cache that maps names to objects. This way, when I re-parse a file, I can remove all items declared in the file from the cache and repopulate it with the updated members. Due to the way the functors are configured, all expressions immediately evaluate to the new items.IntelliSense "frontend"
As the user types, the file is syntactically incorrect more often than it is correct. As such, I don't want to haphazardly remove sections of the cache when the user types. I have a large number of special-case rules in place to handle incremental updates as quickly as possible. The incremental cache is only kept local to an open file and helps make ensure the user doesn't realize that their typing is causing the backend cache to hold incorrect line/column information for things like each method in the file.
Code snippet for the previous section:
class A
{
int x; // linked to A
void foo() // linked to A
{
int local; // linked to foo()
// foo() ends here because bar() is starting
void bar() // linked to A
{
int local2; // linked to bar()
}
int y; // linked again to A
I figured I'd add a list of the IntelliSense features I've implemented with this layout. Pictures of each are located here.