logo
Published on developer.* Blogs (http://www.developerdotstar.com/community)

Lessons Learned Automating Excel from .NET

By Daniel Read
Created 2007-03-07 20:26

Recently I had occasion to write a moderately complex component that used "automation [1]" (using the old fashioned COM term) to communicate with the Microsoft Office Excel application installed on the same computer.

The scenario I was working in was an import scenario--that is, reading a provided Excel spreadsheet file and importing into a database. Some of my tips below may be tied to this assumption, but I think my advice will apply to most Excel automation scenarios. (Also, I'll be a lot of this applies to automating other Office applications such as Word.)

First, a bit of background: keep in mind that the Excel API is implemented in COM, which is a precursor to .NET. Microsoft has gone to great lengths to make .NET backwards compatible with COM (and vice versa), but the fact remains that when you are communicating with Excel from your .NET code you are not communicating with something that is native to .NET.

One more thing: my experiences as documented here are based on using Excel 9.0 (Office 2000) and Excel 11.0 (Office 2003) from .NET 2.0 running on Windows XP SP 2. I can't say for sure whether anything would be different relative to newer versions of Excel, .NET, or Windows. Feel free to add comments to this post to provide more up-to-date information.

The links at the end of this post point to some related information that you may find useful.

On to the tips...

Tip #1: Seriously Consider NOT Automating Excel from .NET

I don't mean to be flippant: the most significant conclusion I have reached from my experience is that if I ever have to do this again, I will twist whatever arms I have to in order to purchase some kind of Excel component so that I do not have to program against the Excel API at all. There are many .NET component libraries on the market that can read from and write to Excel spreadsheet files without Excel even needing to be installed on the machine. I recommend strongly that you seek one of these out. As you read the rest of my tips, you will have a better idea why. A quick Google search should turn up several options.

(Component vendors, please refrain from adding comments to this post to advertise your Excel library; requests to advertise, review, or plug your component will be politely declined. Thanks for understanding.)

Tip #2: Use VB.NET instead of C#

Even if your preferred .NET language is C# (or some other .NET language), even if you absolutely hate VB.NET (it's really not so bad), I strongly recommend that you use VB.NET instead of C#. If the rest of your solution is written in C#, break out your Excel-related functionality into a separate VB.NET DLL. I got about 10% into the initial attempt at my solution before I said "Screw this!", threw out my C# code, and started over again in VB.NET.

Why? In short, two reasons:

First, Office applications make prodigious use of optional arguments--in many cases a single function can take a dozen optional arguments. C# does not know how to handle optional arguments, so you end up passing a ton of useless null or Object values. This might not be so bad if there weren't so many, and if the lack of useful Intellisense in the IDE didn't make it even more difficult to tell which argument is which.

Second, C# is not really hip to the Variant data type, which is used often in Excel's VBA [2]-centric API.

Trust me: just use VB.NET. If you're still not convinced, check out the link to the article "Programming Office Applications Using Visual C#" at the end of this post.

Tip #3: Communicate with Excel in a late bound manner

Late bound? That's not a term you hear much these days since .NET has overtaken COM. This term refers to a technique of not compiling a reference to the Excel library into your assembly. Late binding is otherwise known as IDispatch binding, with early binding otherwise known as vtable binding. Since Excel is still a COM application, these terms still apply in a .NET interop context.

I recommend NOT adding an early bound reference to Excel in your project. Microsoft has not done a good job over the years in managing the compatibility between the various versions of the Excel library; depending on which version of Office you have installed, you will have different version-specific Excel reference options in your COM references list. Don't be tempted by these early bound reference opportunities. Late bound all the way, baby. Protect yourself from version incompatibilities when you find out that you compiled against Excel 11.0 and your client is using 9.0. Similarly, you can protect your solution from breaking when your user or administrator upgrades Office.

In .NET, going late bound requires some syntax and techniques that may be unfamiliar to you, but it's not that bad.

To instantiate an Excel Application object, use the Interaction.CreateObject() method, which is much like the old VB6/VBA/VBScript CreateObject() function. It is part of the Microsoft.VisualBasic namespace (see Tip #4 if you decide to go with C# instead of VB.NET). Here is my LoadExcel() method:

Private Sub loadExcel()
m_excelApp = Interaction.CreateObject("Excel.Application", "")
If m_excelApp Is Nothing Then
Throw New Exception("The Excel libarary could not be loaded. " & _
"You may an incorrect version of Excel installed. " & _
"Please consult your system administrator. The instantiation " & _
"failed without an exception.")
End If
End Sub

Similarly, in order to invoke any properties or methods on the Excel.Application object, you want to use InvokeMember, like so:

workbooks = m_excelApp.GetType().InvokeMember("Workbooks", _
BindingFlags.GetProperty Or BindingFlags.Public, _
Nothing, m_excelApp, Nothing)

If that code looks a little cumbersome to you, see Tip #5. If you wrap your Excel calls, your main application logic won't be burdened with the more verbose code required for late bound automation.

Check the .NET documentation on InvokeMember(). You may also prefer using Interaction.CallByName(), which offers a kind of shorthand version of InvokeMember(), but I prefer the more explicit control offered by InvokeMember().

The original version of my solution used an early bound reference. When I refactored my code to be late bound, I did not notice any significant performance difference. This jibes with my experiences with early vs. late binding performance when I used to write a ton of COM code in VB 6. If performance is a real concern, I return your attention to Tip #1; I think there is a pretty low performance ceiling you'll hit when automating Excel no matter what you do; theoretically, a good Excel component would be faster.

(You may also need an Imports System.Reflection statement at the top of your class file to do late binding. I have this at the top of my class, but as I write this I can't remember why.)

Tip #4: If you use C#, go through VB to get to Excel

While researching my Excel automation solution, I came across a great Dan Appleman article that is linked at the end of this post. It shows how you can reference the VB.NET runtime library from your C# project to make use of the Microsoft.VisualBasic.Runtime namespace, which is where you will find the aforementioned Interaction.CreateObject(). I think it's safe to say that you can't do late bound COM programming from C# without going through the Visual Basic library.

Tip #5: Wrap your Excel communication points

I strongly recommend that you use good old information hiding [3] to protect the main logic of your code from the particulars of communicating with Excel. Here are a few of the functions I developed:

Private Function getRange(ByRef rangeString As String, ByVal worksheet As Object) As Object

Dim range As Object = Nothing

range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})

Return range
End Function

Private Function getCellValString(ByRef columnLetter As String, ByRef rowNum As Integer, _
ByVal workSheet As Object) As String

Dim expectedCell As Object = Nothing

expectedCell = getRange(columnLetter & rowNum.ToString(), workSheet)

Return cellToString(expectedCell).Trim()
End Function

Private Function cellToString(ByVal singleCell As Object) As String

'The Value property is a VBA Variant, which does not
'translate well to .NET. When the cell is blank, the
'Value property will be Nothing
Dim singleCellValue As Object = Nothing

singleCellValue = singleCell.GetType().InvokeMember("Value", _
BindingFlags.GetProperty Or BindingFlags.Public, _
Nothing, singleCell, Nothing)

If singleCellValue Is Nothing Then
Return String.Empty
Else
Return singleCellValue.ToString()
End If
End Function

Note: I have omitted by retry wrapper code from these methods for the sake of clarity; see Tip #7 for an explanation. If you reuse these methods, you may want to add in the retry wrapper--though in my testing so far, it looks like when the "application is busy" error happens, you may be irrevocably screwed.

See Tip #6 for an explanation for why I use cellToString().

Tip #6: Watch out for empty cells

In testing I ran into some annoyances related to empty cells, so I developed the cellToString() wrapper function you see above. The issue is that Range.Value is a Variant, not a String or numeric type. So when a cell is empty you can (but not necessarily always) get Nothing back instead of an empty string.

Tip #7: Watch out interruptions and disconnects

I don't have a lot of information to provide for what causes this problem, but in testing I have run into problems with lost communication between the .NET code and the running instance of Excel. What I mean is, if your Excel automation code is running and the user makes use of the Excel application at the same time, instability may occur. The main scenario we've been able to reproduce is, oddly enough, if the user opens a .XLS file by double clicking on it (as opposed to using the File->Open menu) then your .NET code will suddenly lose its ability to connect with Excel. This is the error we saw in this scenario:

System.Exception: There was an error during the main import process: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Runtime.InteropServices.COMException (0x8001010A): The message filter indicated that the application is busy. (Exception from HRESULT: 0x8001010A (RPC_E_SERVERCALL_RETRYLATER))

It's possible that other Excel usage scenarios could cause this same problem.

If this scares the hell out of you (and it should) see Tip #1.

One solution I tried is wrapping all of my Excel communication calls in some trap-and-retry logic. So a statement like this:

range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})

becomes this:

Do
Try
range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})
Catch ex As Exception
If ex.Message.Contains("application is busy") Then
If m_excelRetryCount <= MAX_EXCEL_RETRIES Then
m_retryExcel = True
m_excelRetryCount += 1
System.Threading.Thread.Sleep(EXCEL_RETRY_DELAY)
Else
m_retryExcel = False
m_excelRetryCount = 0
Throw New Exception("The Excel application is apparently busy, which is " & _
"interfering with the import process. We have retried communication with Excel " & _
MAX_EXCEL_RETRIES & " times with a delay between each retry, but we are not " & _
"able to establish communicatation with Excel.", ex)
End If
Else
m_retryExcel = False
m_excelRetryCount = 0
Throw ex
End If
End Try
Loop While m_retryExcel
m_retryExcel = False
m_excelRetryCount = 0

Note that I declare EXCEL_RETRY_DELAY, m_retryExcel, and m_excelRetryCount at the class level since I had to repeat this retry logic in several of my Excel wrapper functions. That may or may not have been the best technique, but there it is.

Your mileage may vary. So far from testing it appears that the retries cannot ever reestablish the communication link.

Some Potentially Useful Links

Good luck!

Thanks for reading.

Dan

P.S.
If you thought this was helpful and feel compelled to do something nice in return, considering buying one of our high quality software development books [9], published by developer.* Books. You will be doubly enriched. :-)


Source URL:
http://www.developerdotstar.com/community/community/automate_excel_dotnet