Tuesday, September 18, 2018

How may I read from a flat file that is more than 2 Gigs in size in C#?

The way to do this is in chunks, because a single .NET byte array cannot hold a file that large. I would recommend rewriting the flat file as a series of smaller flat files and then working with two adjacent chunks at a time to handle an XML element, or any other record, that ends up split across a chunk boundary. Below is some logic for breaking up a file into 26 smaller files, lettered A through Z.

// Requires: using System; using System.IO; using System.Threading.Tasks;
public async Task DivideAndConjure(InitialDetails initialDetails, Action<Alert> alertAction,
      long length)
{
   try
   {
      // Size of each of the 26 slices, rounded up so that the slices cover the whole file.
      int shortLength = (int)Math.Ceiling((decimal)length / 26);
      // Path copes with file names that have several dots or no extension at all.
      string name = Path.GetFileNameWithoutExtension(initialDetails.FromPath);
      string extension = Path.GetExtension(initialDetails.FromPath);
      string stringFormat = Path.Combine(initialDetails.ToPath, name + "{0}" + extension);
      for (int counter = 0; counter < 26; counter++)
      {
         char character = (char)('A' + counter);
         string toPath = String.Format(stringFormat, character);
         // Copy the loop variable so that the lambda captures a value that never changes.
         int slice = counter;
         await Task.Run(() => { Write(initialDetails.FromPath, alertAction, slice,
               shortLength, stringFormat, toPath); });
         alertAction(new Alert(String.Format("Temp file written to: {0}", toPath),
               TypeOfAlert.Success));
      }
      alertAction(new Alert(String.Format("{0} has been chunked to: {1}",
            initialDetails.FromPath, initialDetails.ToPath), TypeOfAlert.Success));
   }
   catch (Exception exception)
   {
      alertAction(new Alert(exception));
   }
}
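To make the slice arithmetic concrete, here is a minimal sketch of how the 26 slice sizes work out. The class and method names (ChunkPlan, SliceSize, SliceLength) are hypothetical helpers made up for illustration and are not part of the code above; the sketch uses integer arithmetic in place of Math.Ceiling, which gives the same result for non-negative lengths.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original code.
public static class ChunkPlan
{
   public const int SliceCount = 26;

   // Integer ceiling of length / 26: the size of every slice except possibly the last ones.
   public static long SliceSize(long length)
   {
      return (length + SliceCount - 1) / SliceCount;
   }

   // Number of bytes that slice number `counter` (0 for A, 25 for Z) really contains.
   public static long SliceLength(long length, int counter)
   {
      long size = SliceSize(length);
      long offset = size * counter;
      // Clamp to what is left in the file; a slice past the end of the file is empty.
      return Math.Max(0L, Math.Min(size, length - offset));
   }
}
```

For an assumed file of 3,000,000,000 bytes, SliceSize gives 115,384,616, the A through Y slices are that size, and SliceLength for the Z slice (counter 25) gives 115,384,600, which is 16 bytes short of a full slice.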

 
 

I found that I had to break the writing of each individual chunk out onto a separate task, or else the process would crash partway through in the same way it would if you just tried to read a file that is too big. Below is my subtask. Do also note the cast to an Int64 type below. Without the cast, I was getting weird errors about how I should be using a positive number instead of a negative number. This was due to the offset being calculated as an Int32 value: it climbed so high that it passed the upper bound of the type and wrapped around into the negative end of its range. Wacky!

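private void Write(string fromPath, Action<Alert> alertAction, int counter, int shortLength,
      string stringFormat, string toPath)
{
   try
   {
      byte[] bytes = new byte[shortLength];
      int total = 0;
      using (FileStream inStream = new FileStream(fromPath, FileMode.Open,
            FileAccess.Read))
      {
         // Cast before multiplying so that the offset is computed in 64-bit arithmetic.
         inStream.Position = (((Int64)shortLength) * counter);
         // Read may return fewer bytes than requested, so keep reading until the buffer
         // is full or the end of the file is reached (the last slice is usually short).
         int read;
         while (total < shortLength &&
               (read = inStream.Read(bytes, total, shortLength - total)) > 0)
         {
            total += read;
         }
      }
      using (FileStream outStream = new FileStream(toPath, FileMode.Create,
            FileAccess.Write))
      {
         // Write only the bytes that were actually read, not the whole buffer.
         outStream.Write(bytes, 0, total);
      }
   }
   catch (Exception exception)
   {
      alertAction(new Alert(exception));
   }
}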
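The wraparound behind the Int64 cast can be reproduced in isolation. This is only a sketch: OffsetMath, NaiveOffset, and SafeOffset are hypothetical names invented for illustration, and the unchecked keyword is there to make explicit the wrapping that C# applies to non-constant Int32 arithmetic by default.

```csharp
using System;

// Hypothetical helper for illustration only; not part of the original code.
public static class OffsetMath
{
   // Both operands are Int32, so the product is computed in Int32 and can wrap around.
   public static int NaiveOffset(int shortLength, int counter)
   {
      return unchecked(shortLength * counter);
   }

   // Widening one operand first makes the whole multiplication happen in Int64.
   public static long SafeOffset(int shortLength, int counter)
   {
      return (long)shortLength * counter;
   }
}
```

Assuming a slice size of 115,384,616 bytes (what a 3,000,000,000-byte file would give), NaiveOffset(115384616, 19) comes out as -2,102,659,592 while SafeOffset(115384616, 19) gives the intended 2,192,307,704. Slices A through S stay under the Int32 limit, so under that assumption the T slice would be the first one to fail.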

 
 

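You may realize that with the Math.Ceiling trick the numbers will usually not line up so nicely for the Z slice: the bytes array ends up a little bigger than what is left in the file. That's alright on the read side. It's always alright to overshoot the size of a bytes array when you are hydrating it in these file slurping acrobatics; Read won't throw an error, it simply returns the number of bytes it actually read. The thing to watch is the write side. Only that many bytes should go into the Z file, because writing out the whole array pads the file with trailing zero bytes.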
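Here is a small sketch of that short read on the last slice, using any seekable Stream so that it can be tried against a MemoryStream instead of a 2 Gig file. SliceDemo and ReadSlice are hypothetical names for illustration, not part of the code above.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of the original code.
public static class SliceDemo
{
   // Reads slice number `counter` (each slice holds up to `shortLength` bytes) from a
   // seekable stream and returns only the bytes that were actually read.
   public static byte[] ReadSlice(Stream source, int shortLength, int counter)
   {
      byte[] buffer = new byte[shortLength];
      source.Position = (long)shortLength * counter;
      int total = 0;
      int read;
      // Read may return fewer bytes than requested; 0 means the end of the stream.
      while (total < shortLength &&
            (read = source.Read(buffer, total, shortLength - total)) > 0)
      {
         total += read;
      }
      // Trim the buffer so that the unused tail of zero bytes is not carried along.
      byte[] slice = new byte[total];
      Array.Copy(buffer, slice, total);
      return slice;
   }
}
```

With a 10-byte stream cut into 4 slices, the slice size is 3, so slices 0 through 2 come back with 3 bytes each and slice 3 comes back with just the single remaining byte, even though its buffer had room for 3.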
