Problem:
I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.
The search method will occur every time the user change the text of a TextBox
and the result should be the strings that contain the text in TextBox
.
I don't have to change the list, just pull the results and put them in a ListBox
.
What I've tried so far:
I tried with two different collections/containers, which I'm dumping the string entries from an external text file (once, of course):
List<string> allUsers;
HashSet<string> allUsers;
With the following LINQ query:
allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
My search event (fires when user change the search text):
private void textBox_search_TextChanged(object sender, EventArgs e)
{
if (textBox_search.Text.Length > 2)
{
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
}
else
{
listBox_choices.DataSource = null;
}
}
Results:
Both gave me a poor response time (around 1-3 seconds between each key press).
Question:
Where do you think my bottleneck is? The collection I've used? The search method? Both?
How can I get better performance and more fluent functionality?
You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if input is changed.
The general idea is to be able to use it like this:
public partial class YourForm : Form
{
private readonly BackgroundWordFilter _filter;
public YourForm()
{
InitializeComponent();
// setup the background worker to return no more than 10 items,
// and to set ListBox.DataSource when results are ready
_filter = new BackgroundWordFilter
(
items: GetDictionaryItems(),
maxItemsToMatch: 10,
callback: results =>
this.Invoke(new Action(() => listBox_choices.DataSource = results))
);
}
private void textBox_search_TextChanged(object sender, EventArgs e)
{
// this will update the background worker's "current entry"
_filter.SetCurrentEntry(textBox_search.Text);
}
}
A rough sketch would be something like:
public class BackgroundWordFilter : IDisposable
{
private readonly List<string> _items;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
private readonly Thread _workerThread;
private readonly int _maxItemsToMatch;
private readonly Action<List<string>> _callback;
private volatile bool _shouldRun = true;
private volatile string _currentEntry = null;
public BackgroundWordFilter(
List<string> items,
int maxItemsToMatch,
Action<List<string>> callback)
{
_items = items;
_callback = callback;
_maxItemsToMatch = maxItemsToMatch;
// start the long-lived backgroud thread
_workerThread = new Thread(WorkerLoop)
{
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};
_workerThread.Start();
}
public void SetCurrentEntry(string currentEntry)
{
// set the current entry and signal the worker thread
_currentEntry = currentEntry;
_signal.Set();
}
void WorkerLoop()
{
while (_shouldRun)
{
// wait here until there is a new entry
_signal.WaitOne();
if (!_shouldRun)
return;
var entry = _currentEntry;
var results = new List<string>();
// if there is nothing to process,
// return an empty list
if (string.IsNullOrEmpty(entry))
{
_callback(results);
continue;
}
// do the search in a for-loop to
// allow early termination when current entry
// is changed on a different thread
foreach (var i in _items)
{
// if matched, add to the list of results
if (i.Contains(entry))
results.Add(i);
// check if the current entry was updated in the meantime,
// or we found enough items
if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
break;
}
if (entry == _currentEntry)
_callback(results);
}
}
public void Dispose()
{
// we are using AutoResetEvent and a background thread
// and therefore must dispose it explicitly
Dispose(true);
}
private void Dispose(bool disposing)
{
if (!disposing)
return;
// shutdown the thread
if (_workerThread.IsAlive)
{
_shouldRun = false;
_currentEntry = null;
_signal.Set();
_workerThread.Join();
}
// if targetting .NET 3.5 or older, we have to
// use the explicit IDisposable implementation
(_signal as IDisposable).Dispose();
}
}
Also, you should actually dispose the _filter
instance when the parent Form
is disposed. This means you should open and edit your Form
's Dispose
method (inside the YourForm.Designer.cs
file) to look something like:
// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_filter != null)
_filter.Dispose();
// this part is added by Visual Studio designer
if (components != null)
components.Dispose();
}
base.Dispose(disposing);
}
On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.
That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first of last character.