How to Handle Huge Data Sets in C# and Entity Framework

Praveen Sambu
2 min read · Feb 18, 2020


When we deal with huge data sets (more than 1 million records, say) and process them for logical operations, which is common in domains like health care, e-commerce, and banking, we run into a lot of trouble: Out of Memory exceptions, or the process hangs.

Here are a few tips for solving these kinds of issues, which I worked out while dealing with a huge data set.

  • When I was reading data from another web request, I got the exception below:

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Net.Http.StreamToStreamCopy.BufferReadCallback(IAsyncResult ar)

We get this exception in BufferReadCallback because the response is huge. We need to change this setting in web.config to allow the maximum amount of data in the response; the value below (4,294,967,295 bytes, roughly 4 GB) is the maximum the setting allows.

<system.webServer>
  <security>
    <requestFiltering>
      <requestLimits maxAllowedContentLength="4294967295" />
    </requestFiltering>
  </security>
</system.webServer>

Next, I break the result set down into chunks based on the available memory. The code below splits a list into a collection of smaller lists, which we can then iterate over with the same processing logic.

private static List<List<T>> Split<T>(List<T> collection, int size)
{
    var chunks = new List<List<T>>();

    // Number of full chunks, plus one partial chunk for any remainder.
    var chunkCount = collection.Count / size;
    if (collection.Count % size > 0)
        chunkCount++;

    for (var i = 0; i < chunkCount; i++)
        chunks.Add(collection.Skip(i * size).Take(size).ToList());

    return chunks;
}
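The chunks can then be processed one at a time, so only a single batch's working data is live at any point. A minimal sketch of that loop is below; hugeList and ProcessChunk are hypothetical placeholders for your own data and per-batch logic:

// Split the records into batches of 10,000 and handle them one by one.
var chunks = Split(hugeList, 10000);

foreach (var chunk in chunks)
{
    ProcessChunk(chunk); // hypothetical per-batch operation
}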

The same problem occurred with Entity Framework data sets: when the data set was huge, processing it through Entity Framework threw the exception below, which I resolved with the same split-and-process approach. Sometimes even LINQ queries result in out-of-memory errors.

System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Data.Common.DecimalStorage.SetCapacity(Int32 capacity)
   at System.Data.RecordManager.set_RecordCapacity(Int32 value)
   at System.Data.RecordManager.GrowRecordCapacity()
   at System.Data.RecordManager.NewRecordBase()
   at System.Data.DataTable.NewRecordFromArray(Object[] value)
   at System.Data.DataRowCollection.Add(Object[] values)
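With Entity Framework the same chunking idea can also be pushed down to the query itself, so the whole table is never materialized at once. The sketch below pages through a table with Skip/Take and disables change tracking; MyDbContext, Orders, Id, and ProcessBatch are hypothetical names standing in for your own context, entity, key, and per-batch logic:

private static void ProcessInBatches(MyDbContext context, int batchSize = 10000)
{
    var total = context.Orders.Count();

    for (var skip = 0; skip < total; skip += batchSize)
    {
        var batch = context.Orders
            .AsNoTracking()      // keep entities out of the change tracker
            .OrderBy(o => o.Id)  // a stable order is required for Skip/Take paging
            .Skip(skip)
            .Take(batchSize)
            .ToList();

        ProcessBatch(batch);     // hypothetical per-batch operation
    }
}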

Tips I followed:

  1. Implemented the IDisposable interface on the class, disposed of objects that were no longer needed, and had the destructor call the Dispose() method (see the sketch after this list).
  2. Forcefully reclaimed memory with GC.Collect() and GC.WaitForPendingFinalizers(), which freed a lot of memory and made room for new collections.
  3. Wrapped all persistent connections, such as database connections and file readers, in using blocks, as in the example below.

using (HttpClient client = new HttpClient())
{
    string url = "Your URL";

    // ResponseHeadersRead returns as soon as the headers arrive, so the body is
    // streamed instead of being buffered entirely in memory.
    using (HttpResponseMessage response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
    using (Stream streamToReadFrom = await response.Content.ReadAsStreamAsync())
    {
        var variable = await DeserializeStream(streamToReadFrom, strFileType);
    }
}
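DeserializeStream above is the author's own helper and its implementation is not shown here. Purely as an illustration, a stream-based deserializer (assuming the payload is JSON and Newtonsoft.Json is available) could look roughly like the hypothetical helper below, which reads from the stream directly so the full response text never sits in memory as one giant string:

private static T DeserializeJsonStream<T>(Stream stream)
{
    using (var streamReader = new StreamReader(stream))
    using (var jsonReader = new JsonTextReader(streamReader))
    {
        // Deserialize straight from the stream rather than via ReadAsStringAsync().
        var serializer = new JsonSerializer();
        return serializer.Deserialize<T>(jsonReader);
    }
}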

  4. Stopped allocating new instances on the fly inside foreach loops; instead I reused structs, local variables, or tuples.
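Here is a minimal sketch of tips 1 and 2, assuming a hypothetical DataProcessor class that owns a large in-memory collection (Record is a placeholder entity type):

public class DataProcessor : IDisposable
{
    private List<Record> _records = new List<Record>(); // large working collection
    private bool _disposed;

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Release the large managed collection as soon as processing is done.
            _records.Clear();
            _records = null;
        }

        _disposed = true;
    }

    // Destructor (finalizer) calls Dispose, as in tip 1.
    ~DataProcessor()
    {
        Dispose(false);
    }
}

Between batches, the processor can be disposed and a collection forced before the next chunk is loaded:

using (var processor = new DataProcessor())
{
    // ... process one chunk ...
}

// Tip 2: reclaim memory before the next batch.
GC.Collect();
GC.WaitForPendingFinalizers();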

With this approach I was able to process huge data sets easily.
