I’ve recently had a need for users of an intranet application to upload comparatively large text files (1-200MB) to a web server. There are only a couple of ways I can think of to get around the limits imposed by IIS and ASP.NET without writing code : train the users to upload smaller files which could be concatenated at the server, or allow the users to “upload” from a network share which the server has access to. These are obviously inelegant solutions, and after a little research I’ve found that the necessary code to enable large uploads yourself is surprisingly easy to write.
Jon Galloway has written a useful article which gives a more complete and eloquent view of the ASP.NET large file upload problem than I have time for here. Particularly of interest is the discussion he links to, titled “HttpHandler or HttpModule for file upload, large files, progress indicator?”. I’ve adapted or rewritten some of the suggested code from there for my solution. I’m particularly indebted to Travis Whidden, whose code is much more complete than mine (it handles multiple files, for a start). Part of the reason I ended up rewriting it is that I didn’t think his solution handled a particular edge case, but I also needed to understand what it was doing, and the best way for me to do that was to rewrite it from scratch (this took me about a day of thinking, poking around and testing – I don’t claim to be a fast learner).
Essentially the code consists of two classes, UploadModule and RequestProcessor, along with some minor changes in the web.config file :
UploadModule
In order to intercept the HttpRequest and deal with it in a stream-wise fashion, we have to implement an IHttpModule :
using System;
using System.Diagnostics;
using System.Text;
using System.Web;
using System.Reflection;

namespace MyIntranetSite
{
    public class UploadModule : IHttpModule
    {
        #region IHttpModule Members

        void IHttpModule.Dispose()
        {
        }

        void IHttpModule.Init(HttpApplication context)
        {
            context.BeginRequest += new EventHandler(context_BeginRequest);
        }

        void context_BeginRequest(object sender, EventArgs e)
        {
            HttpApplication application = (HttpApplication) sender;
            HttpContext context = application.Context;
            if (context.Request.ContentType.IndexOf("multipart/form-data") == -1)
            {
                //Not our bag, baby.
                return;
            }
            try
            {
                //HttpContext.WorkerRequest is not public, so it has to be fetched via reflection.
                HttpWorkerRequest workerRequest = (HttpWorkerRequest) context.GetType().GetProperty("WorkerRequest", BindingFlags.Instance | BindingFlags.NonPublic).GetValue(context, null);
                if (workerRequest.HasEntityBody())
                {
                    int defaultBuffer = 500000;
                    long contentLength = long.Parse(workerRequest.GetKnownRequestHeader(HttpWorkerRequest.HeaderContentLength));
                    long remaining = contentLength;
                    using (RequestProcessor rp = new RequestProcessor(contentLength))
                    {
                        //The preloaded body can be null or empty, so only hand it over if there is something in it.
                        byte[] preloadedBufferData = workerRequest.GetPreloadedEntityBody();
                        if (preloadedBufferData != null && preloadedBufferData.Length > 0)
                        {
                            rp.ReadBuffer(ref preloadedBufferData);
                            remaining -= preloadedBufferData.Length;
                        }
                        while (remaining > 0)
                        {
                            byte[] bufferData = new byte[Math.Min(remaining, defaultBuffer)];
                            //ReadEntityBody can return fewer bytes than requested, so keep reading until the buffer is full.
                            int filled = 0;
                            while (filled < bufferData.Length)
                            {
                                int read = workerRequest.ReadEntityBody(bufferData, filled, bufferData.Length - filled);
                                if (read <= 0) break; //the client has gone away
                                filled += read;
                            }
                            if (filled == 0) break;
                            if (filled < bufferData.Length) Array.Resize(ref bufferData, filled);
                            remaining -= filled;
                            rp.ReadBuffer(ref bufferData);
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                EventLog.WriteEntry("Custom ASP.NET Upload Module", ex.Message);
            }
            context.Response.Redirect(context.Request.RawUrl);
        }

        #endregion
    }
}
This handles ALL “multipart/form-data” requests at the moment. You would probably want to check the URL of the request and match it against a list of expected pages, otherwise every multipart form post on your site (that is, every postback from a page containing a file input) would get swallowed by this module and redirected, and the code behind those pages would never see the posted data!
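One way to do that URL check is sketched below. This is only an illustration: the class name `UploadPathFilter`, the method `IsUploadPage` and the two page paths are made up for the example and are not part of the module above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the names and paths here are illustrative assumptions.
public static class UploadPathFilter
{
    // Pages that are expected to receive large uploads (made-up paths).
    private static readonly HashSet<string> UploadPages =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "/Upload.aspx",
            "/Admin/ImportData.aspx"
        };

    // Returns true only when the request path is one of the expected upload pages.
    public static bool IsUploadPage(string path)
    {
        return path != null && UploadPages.Contains(path);
    }
}
```

In context_BeginRequest this could sit next to the content type check, along the lines of `if (!UploadPathFilter.IsUploadPage(context.Request.Path)) return;`. If the site runs in a virtual directory the stored paths would need to include that prefix.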
RequestProcessor
This class takes a series of byte array buffers and parses them for a start and end pattern. Hopefully, I’ve commented my code well enough for it to be read and understood. However, I should say that this only deals with single file uploads at the moment and expects to be able to write to “C:\temp\”. It would be possible to improve the code to handle multiple files, and making the upload directory configurable would be fairly trivial, but I think it’s more useful as a learning tool if I keep it simple for now.
using System;
using System.Diagnostics;
using System.Collections.Generic;
using System.IO;
using System.Text;

namespace MyIntranetSite
{
    /// <summary>
    /// Takes byte[] chunks from an HTTP request and processes them looking for (currently) the first file.
    /// Each file will be wrapped by the lines :
    /// (start) "Content-Type: [some content type]\r\n\r\n"
    /// (end) "\r\n-----------------------------[a form post ID]"
    /// (that's 29 "-"s followed by a number; the code reads it from the first line of the request rather than assuming it)
    ///
    /// Problems arise because the start and end patterns could span two buffers.
    /// This means we can't write from the latest buffer - we have to always be writing from the previous buffer,
    /// since we can never know if the latest buffer (assuming there are more bytes to read) contains the start of the
    /// end pattern, but not all of it.
    /// </summary>
    public class RequestProcessor : IDisposable
    {
        public long Length { get; private set; }
        public long BytesRead { get; private set; }
        public List<string> FinishedFiles = new List<string>();

        private BufferChunk previous;
        private bool _startFound = false;
        private bool _endFound = false;
        private List<byte> startPatternBegin;
        private List<byte> startPatternEnd;
        private List<byte> endPattern;
        private FileStream currentFileStream;
        private string currentFileName = Guid.NewGuid() + ".bin";

        public RequestProcessor(long length)
        {
            Length = length;
            BytesRead = 0;
            startPatternBegin = new List<byte>(Encoding.UTF8.GetBytes("Content-Type: "));
            startPatternEnd = new List<byte>(Encoding.UTF8.GetBytes("\r\n\r\n"));
        }

        public void ReadBuffer(ref byte[] buffer)
        {
            if (_endFound) return;
            BufferChunk current = new BufferChunk(ref buffer);
            if (previous == null)
            {
                //first buffer chunk
                //the first line of this will give the form content separator, which is also the endPattern
                int i = 0;
                endPattern = new List<byte>();
                while (i < current.Data.Count && current.Data[i] != (byte) '\r')
                {
                    endPattern.Add(current.Data[i]);
                    i++;
                }
            }

            //Merge the previous and current buffers
            List<byte> mergedBuffers = new List<byte>();
            if (previous != null) mergedBuffers.AddRange(previous.Data);
            mergedBuffers.AddRange(current.Data);
            int previousCount = previous != null ? previous.Data.Count : 0;

            if (!_startFound)
            {
                //Look for start pattern in the current buffer.
                //It could span this buffer and the one before (in which case the start point is in THIS buffer)
                //or it could span this buffer and the next (in which case the start point is in the NEXT buffer)
                //The latter case has to be checked when the next buffer comes in.
                int startBegin;
                if ((startBegin = FindBytePattern(mergedBuffers, startPatternBegin, 0)) != -1)
                {
                    //found a content-type declaration, look for the end of that line :
                    int startEnd;
                    if ((startEnd = FindBytePattern(mergedBuffers, startPatternEnd, startBegin + startPatternBegin.Count)) != -1)
                    {
                        //found the end of the line
                        if (startEnd + startPatternEnd.Count < mergedBuffers.Count - 1)
                        {
                            int startByte = startEnd + startPatternEnd.Count;
                            if (startByte < previousCount)
                            {
                                //the data begins in the last byte of the previous buffer
                                previous.Start = startByte;
                                current.Start = 0;
                            }
                            else
                            {
                                current.Start = startByte - previousCount;
                            }
                            _startFound = true;
                        }
                        // else the start byte is in the next buffer.
                    }
                }
                if (!_startFound) current.Start = current.Data.Count;
            }

            if (_startFound && !_endFound)
            {
                //Look for the end pattern in the current buffer
                //As with the start it could span beginning (in which case the last byte is in the PREVIOUS buffer)
                //Or it could span the end (in which case the last byte is in THIS buffer)
                //The latter case has to be checked when the next buffer comes in.
                //Only search from where the file data begins, so the separator in front of the file's own headers is not matched.
                int searchStart = (previous != null && previous.Start < previousCount) ? previous.Start : previousCount + current.Start;
                int endBegin;
                if ((endBegin = FindBytePattern(mergedBuffers, endPattern, searchStart)) != -1)
                {
                    //End is exclusive, and the "\r\n" in front of the separator is not part of the file.
                    int endByte = Math.Max(searchStart, endBegin - 2);
                    if (endByte < previousCount)
                    {
                        //the file finishes in the previous buffer, so nothing in this one belongs to it
                        previous.End = endByte;
                        current.End = current.Start;
                    }
                    else
                    {
                        current.End = endByte - previousCount;
                    }
                    _endFound = true;
                }
                // else the end byte is in the next buffer.
                if (!_endFound && previous != null) previous.End = previous.Data.Count;
            }

            BytesRead += current.Data.Count;

            //FILE CREATION
            if (previous != null && _startFound && previous.WriteBytes > 0)
            {
                //Write out the previous buffer from Start to End
                if (currentFileStream == null)
                {
                    currentFileStream = File.OpenWrite(@"C:\temp\" + currentFileName);
                }
                currentFileStream.Write(previous.Data.ToArray(), previous.Start, previous.WriteBytes);
            }
            if (_startFound && _endFound || BytesRead == Length)
            {
                //Write out the current buffer from Start to End
                if (currentFileStream == null)
                {
                    currentFileStream = File.OpenWrite(@"C:\temp\" + currentFileName);
                }
                currentFileStream.Write(current.Data.ToArray(), current.Start, current.WriteBytes);
                currentFileStream.Close();
                currentFileStream.Dispose();
            }
            previous = current;
        }

        private static int FindBytePattern(List<byte> container, List<byte> pattern, int startIndex)
        {
            int i, position;
            if (pattern.Count == 0 || pattern.Count > container.Count - startIndex) return -1;
            for (position = startIndex; position < container.Count; position++)
            {
                if (container[position] == pattern[0])
                {
                    for (i = 1; i < pattern.Count; i++)
                    {
                        if (position + i == container.Count || pattern[i] != container[position + i]) break;
                    }
                    if (i == pattern.Count) return position;
                }
            }
            return -1;
        }

        #region IDisposable Members

        void IDisposable.Dispose()
        {
            if (currentFileStream != null)
            {
                currentFileStream.Close();
                currentFileStream.Dispose();
            }
        }

        #endregion
    }

    public class BufferChunk
    {
        public List<byte> Data;
        public int Start;
        public int End;
        public int WriteBytes { get { return End - Start; } }

        public BufferChunk(ref byte[] buffer)
        {
            Data = new List<byte>(buffer);
            Start = 0;
            End = Data.Count;
        }
    }
}
These files can sit within your web project or in a separate assembly if you want. Personally I’d rather have them sitting with the web project since that makes them more straightforward to debug and more obvious as to where they belong.
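The reason RequestProcessor always searches the previous and current buffers together may be easier to see in isolation. The following is a stripped-down sketch, not the class above: `SpanningPatternDemo`, its two methods and the separator value are assumptions made up for the illustration.

```csharp
using System;
using System.Text;

// Illustrative only: shows a separator that neither buffer contains on its own.
public static class SpanningPatternDemo
{
    // Returns the index of the first occurrence of pattern in data, or -1.
    public static int IndexOf(byte[] data, byte[] pattern)
    {
        if (pattern.Length == 0) return -1;
        for (int position = 0; position <= data.Length - pattern.Length; position++)
        {
            int i = 0;
            while (i < pattern.Length && data[position + i] == pattern[i]) i++;
            if (i == pattern.Length) return position;
        }
        return -1;
    }

    // Joins two buffers, in the same spirit as the merge step in ReadBuffer.
    public static byte[] Merge(byte[] first, byte[] second)
    {
        byte[] merged = new byte[first.Length + second.Length];
        Buffer.BlockCopy(first, 0, merged, 0, first.Length);
        Buffer.BlockCopy(second, 0, merged, first.Length, second.Length);
        return merged;
    }
}
```

With a made-up separator of "-----1234", a previous buffer of "file data\r\n---" and a current buffer of "--1234\r\n", `IndexOf` returns -1 for each buffer on its own but 11 for `Merge(previous, current)`, because the separator starts in the last three bytes of the first buffer.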
Web.Config changes
This is almost laughably trivial :
<httpModules>
    <add name="UploadModule" type="MyIntranetSite.UploadModule"/>
</httpModules>
And that’s it! There’s obviously a lot more that could be done (such as the progress indicator Travis incorporated), but this seems like a decent start to me.
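RequestProcessor already tracks Length and BytesRead, which is the raw material for a progress indicator. Below is a minimal sketch of the arithmetic only. The `UploadProgress` class is hypothetical, and how the value would reach the browser (polling, for instance) is not shown here and is not necessarily how Travis did it.

```csharp
using System;

// Hypothetical helper; the name and the clamping rules are illustrative assumptions.
public static class UploadProgress
{
    // Percentage (0 to 100) of the request body consumed so far.
    public static int Percent(long bytesRead, long length)
    {
        if (length <= 0 || bytesRead <= 0) return 0;
        if (bytesRead >= length) return 100;
        return (int) (bytesRead * 100 / length);
    }
}
```

Called as `UploadProgress.Percent(rp.BytesRead, rp.Length)` after each ReadBuffer, this gives a whole-number percentage that rounds down.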
Ideally, ASP.NET 3.0(?) and IIS 7.0 would address this kind of problem once and for all, but I’m not holding my breath. I also suspect a lot of businesses will remain on IE6.0, IIS 5/6 and ASP.NET 2.0 for another few years, so this approach will remain relevant a little while longer.
Update (a warning)
It’s entirely possible that the parser will bomb out with some kind of error on occasion. If this happens when the number of bytes left to process is greater than the ASP.NET maxRequestLength (or the IIS request length) then the site will (seemingly) silently fail and you’ll get the dreaded “Connection was reset” error page!