<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Ad hoc Geek &#187; SqlBulkCopy</title>
	<atom:link href="http://www.adhocgeek.com/tag/sqlbulkcopy/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.adhocgeek.com</link>
	<description>Approaching geekery in an ad hoc and improvisational manner.</description>
	<lastBuildDate>Fri, 30 Sep 2011 09:43:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Playing with IDataReader and SqlBulkCopy</title>
		<link>http://www.adhocgeek.com/2009/08/playing-with-idatareader-and-sqlbulkcopy/</link>
		<comments>http://www.adhocgeek.com/2009/08/playing-with-idatareader-and-sqlbulkcopy/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 13:57:14 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Geek]]></category>
		<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[IDataReader]]></category>
		<category><![CDATA[sql]]></category>
		<category><![CDATA[SqlBulkCopy]]></category>

		<guid isPermaLink="false">http://www.adhocgeek.com/?p=108</guid>
		<description><![CDATA[For importing huge amounts of data into SQL Server, there&#8217;s really nothing quite like SqlBulkCopy. I&#8217;ve recently had a need to manipulate a (roughly) 330,000 line CSV file and import the results of that manipulation into a single table. Doing this record by record can take minutes, but with SqlBulkCopy, importing that many records can [...]]]></description>
			<content:encoded><![CDATA[<p>For importing huge amounts of data into SQL Server, there&#8217;s really nothing quite like <a href="http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx">SqlBulkCopy</a>. I&#8217;ve recently had a need to manipulate a (roughly) 330,000 line CSV file and import the results of that manipulation into a single table. Doing this record by record can take <em>minutes</em>, but with <a href="http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx">SqlBulkCopy</a>, importing that many records can be done in about 4 seconds on my development machine (and it&#8217;s definitely not the fastest PC in the world).</p>
<p><strong>Out of Memory</strong></p>
<p>Originally I was reading in the file, manipulating the data and writing out another CSV file I could use with DTS. However, <a href="http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.writetoserver.aspx">SqlBulkCopy.WriteToServer</a> doesn&#8217;t take a CSV file directly; it accepts only a <a href="http://msdn.microsoft.com/en-us/library/system.data.datatable.aspx">DataTable</a>, a <a href="http://msdn.microsoft.com/en-us/library/system.data.datarow.aspx">DataRow</a>[] or an <a href="http://msdn.microsoft.com/en-us/library/system.data.idatareader.aspx">IDataReader</a>. So at first, while writing out the CSV file, I was also building up a <a href="http://msdn.microsoft.com/en-us/library/system.data.datatable.aspx">DataTable</a> to pass to it. For a file like mine of only a few hundred thousand records, it wasn&#8217;t a big problem to build that <a href="http://msdn.microsoft.com/en-us/library/system.data.datatable.aspx">DataTable</a> in memory &#8211; it was only taking a few hundred MB &#8211; but it occurred to me that there could be a problem if the number of records increased modestly to a million or so. In fact, with a file of only 4 million records, I&#8217;d probably be looking at a <a href="http://msdn.microsoft.com/en-us/library/system.outofmemoryexception.aspx">System.OutOfMemoryException</a>.</p>
<p><strong>IDataReader</strong></p>
<p>The solution to this problem is to write a class which implements <a href="http://msdn.microsoft.com/en-us/library/system.data.idatareader.aspx">IDataReader</a> and pass this to SqlBulkCopy. There are <a href="http://www.google.co.uk/search?hl=en&#038;q=CSVDataReader+C%23">a few implementations out there already</a>, but I couldn&#8217;t find anything both free and in C#. I didn&#8217;t look terribly hard though, and I was curious to try writing a basic implementation myself just to see how difficult it would be.<br />
It turns out it&#8217;s not very difficult at all; it just depends on how much effort you want to put in. For a simple spike like this, I just wanted to see how long it took to implement enough of IDataReader for SqlBulkCopy to work, so that I could then see how much memory was being used. This is (part of) what I ended up with:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;"><span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">class</span> CSVDataReader <span style="color: #008000;">:</span> IDataReader
<span style="color: #008000;">&#123;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> StreamReader stream<span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> Dictionary<span style="color: #008000;">&lt;</span><span style="color: #6666cc; font-weight: bold;">string</span>, <span style="color: #6666cc; font-weight: bold;">int</span><span style="color: #008000;">&gt;</span> columnsByName <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Dictionary<span style="color: #008000;">&lt;</span><span style="color: #6666cc; font-weight: bold;">string</span>,<span style="color: #6666cc; font-weight: bold;">int</span><span style="color: #008000;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> Dictionary<span style="color: #008000;">&lt;</span><span style="color: #6666cc; font-weight: bold;">int</span>, <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&gt;</span> columnsByOrdinal <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> Dictionary<span style="color: #008000;">&lt;</span><span style="color: #6666cc; font-weight: bold;">int</span>,<span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&gt;</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> currentRow<span style="color: #008000;">;</span>
    <span style="color: #0600FF; font-weight: bold;">private</span> <span style="color: #6666cc; font-weight: bold;">bool</span> _isClosed <span style="color: #008000;">=</span> <span style="color: #0600FF; font-weight: bold;">true</span><span style="color: #008000;">;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> CSVDataReader<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> fileName<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span><span style="color: #008000;">!</span>File<span style="color: #008000;">.</span><span style="color: #0000FF;">Exists</span><span style="color: #008000;">&#40;</span>fileName<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span>
            <span style="color: #0600FF; font-weight: bold;">throw</span> <span style="color: #008000;">new</span> Exception<span style="color: #008000;">&#40;</span><span style="color: #666666;">&quot;File [&quot;</span> <span style="color: #008000;">+</span> fileName <span style="color: #008000;">+</span> <span style="color: #666666;">&quot;] does not exist.&quot;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #0600FF; font-weight: bold;">this</span><span style="color: #008000;">.</span><span style="color: #0000FF;">stream</span> <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> StreamReader<span style="color: #008000;">&#40;</span>fileName<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
&nbsp;
        <span style="color: #6666cc; font-weight: bold;">string</span><span style="color: #008000;">&#91;</span><span style="color: #008000;">&#93;</span> headers <span style="color: #008000;">=</span> stream<span style="color: #008000;">.</span><span style="color: #0000FF;">ReadLine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Split</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">','</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF; font-weight: bold;">for</span> <span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> i<span style="color: #008000;">=</span><span style="color: #FF0000;">0</span><span style="color: #008000;">;</span> i <span style="color: #008000;">&lt;</span> headers<span style="color: #008000;">.</span><span style="color: #0000FF;">Length</span><span style="color: #008000;">;</span> i<span style="color: #008000;">++</span><span style="color: #008000;">&#41;</span>
        <span style="color: #008000;">&#123;</span>
            columnsByName<span style="color: #008000;">.</span><span style="color: #0000FF;">Add</span><span style="color: #008000;">&#40;</span>headers<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span>, i<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
            columnsByOrdinal<span style="color: #008000;">.</span><span style="color: #0000FF;">Add</span><span style="color: #008000;">&#40;</span>i, headers<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #008000;">&#125;</span>
&nbsp;
        _isClosed <span style="color: #008000;">=</span> <span style="color: #0600FF; font-weight: bold;">false</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">void</span> Close<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span>stream <span style="color: #008000;">!=</span> <span style="color: #0600FF; font-weight: bold;">null</span><span style="color: #008000;">&#41;</span> stream<span style="color: #008000;">.</span><span style="color: #0000FF;">Close</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        _isClosed <span style="color: #008000;">=</span> <span style="color: #0600FF; font-weight: bold;">true</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">int</span> FieldCount
    <span style="color: #008000;">&#123;</span>
        get <span style="color: #008000;">&#123;</span> <span style="color: #0600FF; font-weight: bold;">return</span> columnsByName<span style="color: #008000;">.</span><span style="color: #0000FF;">Count</span><span style="color: #008000;">;</span> <span style="color: #008000;">&#125;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">/// &lt;summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// Does the main work: reads the next line of the file and splits it into values indexed by column ordinal.</span>
    <span style="color: #008080; font-style: italic;">/// &lt;/summary&gt;</span>
    <span style="color: #008080; font-style: italic;">/// &lt;returns&gt;true if a row was read; false if the end of the file has been reached.&lt;/returns&gt;</span>
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">bool</span> Read<span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span>stream <span style="color: #008000;">==</span> <span style="color: #0600FF; font-weight: bold;">null</span><span style="color: #008000;">&#41;</span> <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #0600FF; font-weight: bold;">false</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF; font-weight: bold;">if</span> <span style="color: #008000;">&#40;</span>stream<span style="color: #008000;">.</span><span style="color: #0000FF;">EndOfStream</span><span style="color: #008000;">&#41;</span> <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #0600FF; font-weight: bold;">false</span><span style="color: #008000;">;</span>
&nbsp;
        currentRow <span style="color: #008000;">=</span> stream<span style="color: #008000;">.</span><span style="color: #0000FF;">ReadLine</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">.</span><span style="color: #0000FF;">Split</span><span style="color: #008000;">&#40;</span><span style="color: #666666;">','</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> <span style="color: #0600FF; font-weight: bold;">true</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">object</span> GetValue<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> i<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> currentRow<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">string</span> GetName<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">int</span> i<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> columnsByOrdinal<span style="color: #008000;">&#91;</span>i<span style="color: #008000;">&#93;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #0600FF; font-weight: bold;">public</span> <span style="color: #6666cc; font-weight: bold;">int</span> GetOrdinal<span style="color: #008000;">&#40;</span><span style="color: #6666cc; font-weight: bold;">string</span> name<span style="color: #008000;">&#41;</span>
    <span style="color: #008000;">&#123;</span>
        <span style="color: #0600FF; font-weight: bold;">return</span> columnsByName<span style="color: #008000;">&#91;</span>name<span style="color: #008000;">&#93;</span><span style="color: #008000;">;</span>
    <span style="color: #008000;">&#125;</span>
&nbsp;
    <span style="color: #008080; font-style: italic;">//Other IDataReader methods/properties here, but all throwing not implemented exceptions.</span>
<span style="color: #008000;">&#125;</span></pre></div></div>

<p>It turns out you only need to implement these few properties and methods for SqlBulkCopy (I&#8217;m not even sure you need to implement this much). Once I had this, it took a mere four lines to import the CSV file into SQL Server:</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">SqlBulkCopy sbc <span style="color: #008000;">=</span> <span style="color: #008000;">new</span> SqlBulkCopy<span style="color: #008000;">&#40;</span>mySqlConnection<span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span>
sbc<span style="color: #008000;">.</span><span style="color: #0000FF;">DestinationTableName</span> <span style="color: #008000;">=</span> <span style="color: #666666;">&quot;MyTable&quot;</span><span style="color: #008000;">;</span>
sbc<span style="color: #008000;">.</span><span style="color: #0000FF;">BulkCopyTimeout</span> <span style="color: #008000;">=</span> <span style="color: #FF0000;">600</span><span style="color: #008000;">;</span> <span style="color: #008080; font-style: italic;">//10 minutes (the value is in seconds)</span>
sbc<span style="color: #008000;">.</span><span style="color: #0000FF;">WriteToServer</span><span style="color: #008000;">&#40;</span><span style="color: #008000;">new</span> CSVDataReader<span style="color: #008000;">&#40;</span>myFileName<span style="color: #008000;">&#41;</span><span style="color: #008000;">&#41;</span><span style="color: #008000;">;</span></pre></div></div>

<p>Of course, this relies on all of MyTable&#8217;s columns being of type varchar, and on the columns in the CSV file being in the same order as the columns in the table (with no explicit column mappings, SqlBulkCopy matches columns by ordinal position rather than by name), but this is supposed to be a simple spike.<br />
The first time I ran this, it used about 12MB of memory for a 12MB file (my original 330,000-line file). While this was an improvement over the hundreds of MB needed to build the DataTable, it didn&#8217;t really tell me anything about how it might scale, so I generated a file with about 35 million rows just to see what would happen. I was pleasantly surprised to find that it still only used about 12MB from start to finish. This is clearly the benefit of the DataReader model: the whole file is never in memory at once, so we&#8217;re not generating enormous data structures to pass around.</p>
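<p>If the file&#8217;s columns aren&#8217;t in the same order as the table&#8217;s, explicit column mappings are one way round it. What follows is a minimal sketch rather than the code from the spike: it uses only the CSVDataReader members shown above, and CsvImporter, Import, connectionString and tableName are placeholder names. It assumes every CSV header has a same-named column in the destination table.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System.Data.SqlClient;

public static class CsvImporter
{
    // Hypothetical helper: the class, method and parameter names are placeholders.
    public static void Import(string connectionString, string tableName, string fileName)
    {
        using (SqlConnection connection = new SqlConnection(connectionString))
        using (SqlBulkCopy sbc = new SqlBulkCopy(connection))
        {
            connection.Open();

            CSVDataReader reader = new CSVDataReader(fileName);
            try
            {
                sbc.DestinationTableName = tableName;
                sbc.BulkCopyTimeout = 600; // in seconds, so 10 minutes

                // Assumption: each CSV header names a column in the destination table.
                for (int i = 0; i &lt; reader.FieldCount; i++)
                {
                    string name = reader.GetName(i);
                    sbc.ColumnMappings.Add(name, name);
                }

                sbc.WriteToServer(reader);
            }
            finally
            {
                // Close() releases the underlying file even if the copy fails.
                reader.Close();
            }
        }
    }
}</pre></div></div>

<p>The try/finally calls Close() directly rather than wrapping the reader in a using block, because Dispose() is one of the members the spike leaves unimplemented.</p>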
<p>If I have to do something similar to this in the future, I&#8217;ll probably tidy up this CSVDataReader and use it again. I may even implement the rest of it&#8230;</p>
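<p>One obvious candidate for tidying is the Split(',') call, which breaks as soon as a quoted field contains a comma. A minimal sketch of a replacement follows, assuming the common convention of double-quoted fields with doubled quotes as escapes. CsvLine and Parse are hypothetical names, and quoted fields that span several lines are not handled.</p>

<div class="wp_syntax"><div class="code"><pre class="csharp" style="font-family:monospace;">using System.Collections.Generic;
using System.Text;

public static class CsvLine
{
    // Splits one line of CSV into fields, honouring double-quoted fields.
    public static string[] Parse(string line)
    {
        List&lt;string&gt; fields = new List&lt;string&gt;();
        StringBuilder current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i &lt; line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '&quot;')
                {
                    current.Append(c);
                }
                else if (i + 1 &lt; line.Length &amp;&amp; line[i + 1] == '&quot;')
                {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.Append('&quot;');
                    i++;
                }
                else
                {
                    // A lone quote closes the quoted section.
                    inQuotes = false;
                }
            }
            else if (c == '&quot;')
            {
                inQuotes = true;
            }
            else if (c == ',')
            {
                // An unquoted comma ends the current field.
                fields.Add(current.ToString());
                current.Length = 0;
            }
            else
            {
                current.Append(c);
            }
        }

        // The last field has no trailing comma.
        fields.Add(current.ToString());
        return fields.ToArray();
    }
}</pre></div></div>

<p>With something like this in place, Read() could call CsvLine.Parse(stream.ReadLine()) instead of splitting on commas directly, and the constructor could do the same for the header line.</p>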
]]></content:encoded>
			<wfw:commentRss>http://www.adhocgeek.com/2009/08/playing-with-idatareader-and-sqlbulkcopy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

