A frequent request I have when consulting is to figure out why the clients system is running slow. Of course there can be many root causes (and usually are ) but a fairly common one I see is in .NET applications are from using Entity Framework and LINQ without really understanding what is actually happening under the hood.
Don’t get me wrong. Entity Framework is a great tool and I use it all the time, but like any tool you can get in trouble if you do not give thought to what is actually happening in your code.
Here is a common example line of code I see
var data = entityDB.MyVeryLargeTable.ToList()
.Where(r => r.GroupID = idForGroupOfThreeRecords);
How many records do you expect from the database? I have to assume the person who wrote this thinks three. But here is what is really happening
- First, the database is being queried for all the rows in the table.
- Then .NET gets all the rows and loads them into memory.
- .NET loops through all the records one at a time filtering out the ones that match.
There are a few problems this creates for us.
- First of all, we are pulling back an extremely large amount of data. If this is being pulled over the network , which it usually is, this can slow us down right there.
- All this data will be stored in memory. This takes up valuable resources but even worse if you run out of memory the system will start writing to disk which is one of the slowest things you can do.
- Looping through records one at a time is slow when dealing with hundreds or thousands of records. Keep in mind that your database is tuned for and therefore much faster and more efficient at filtering records.
Is there a better way? Sure, but first let’s understand why this is happening.
They key to understanding this is to understand the difference between IQueryable and IEnumerable.
IQueryable like it sounds is used to store queries. There is no data, just the information that is needed to describe the data you want. Simply put this is your query. You can add where clauses, joins and filter but at the end of the day it’s only a query and is very small.
IEnumerable contains data that you can enumerate through. A perfect example is the List object. All the data is there along with functions for going through it one step at a time. The amount of memory this object takes is directly proportionate to the size of the data it stores.
OK, knowing that let’s write the query again to make it run faster.
var data = entityDB.MyVeryLargeTable
.Where(r => r.GroupID = idForGroupOfThreeRecords)
So what did I do? All I did was move the ToList() to be the last thing I do. I want my LINQ to stay IQueryable as long as I possibly can.
Here is what is happening in this query now.
- Entity Framework builds a query filled with a where clause.
- ToList() is called. Moving to IEnumberable from IQueryable causes the query to be sent to the database.
- The Database delivers three records.
- And we are done. We have three records in memory. No more work to do.
So how did this help us with our problems.
- We are only pulling back three records. That will go over the network really fast.
- Three records will take up very little space in memory.
- All out iterating is done on the database. The database has been fine tuned to perform these kind of actions and can do it faster than we can.
In conclusion, always be aware of your LINQ to Entities queries and what they are doing behind the scenes. Know the difference between IQueryable and IEnumerable, and always stay IQueryable as long as you possible can.