Lessons Learned Automating Excel from .NET
Recently I had occasion to write a moderately complex component that used "automation" (using the old fashioned COM term) to communicate with the Microsoft Office Excel application installed on the same computer.
The scenario I was working in was an import scenario--that is, reading a provided Excel spreadsheet file and importing into a database. Some of my tips below may be tied to this assumption, but I think my advice will apply to most Excel automation scenarios. (Also, I'll be a lot of this applies to automating other Office applications such as Word.)
First, a bit of background: keep in mind that the Excel API is implemented in COM, which is a precursor to .NET. Microsoft has gone to great lengths to make .NET backwards compatible with COM (and vice versa), but the fact remains that when you are communicating with Excel from your .NET code you are not communicating with something that is native to .NET.
One more thing: my experiences as documented here are based on using Excel 9.0 (Office 2000) and Excel 11.0 (Office 2003) from .NET 2.0 running on Windows XP SP 2. I can't say for sure whether anything would be different relative to newer versions of Excel, .NET, or Windows. Feel free to add comments to this post to provide more up-to-date information.
The links at the end of this post point to some related information that you may find useful.
On to the tips...
Tip #1: Seriously Consider NOT Automating Excel from .NET
I don't mean to be flippant: the most significant conclusion I have reached from my experience is that if I ever have to do this again, I will twist whatever arms I have to in order to purchase some kind of Excel component so that I do not have to program against the Excel API at all. There are many .NET component libraries on the market that can read from and write to Excel spreadsheet files without Excel even needing to be installed on the machine. I recommend strongly that you seek one of these out. As you read the rest of my tips, you will have a better idea why. A quick Google search should turn up several options.
(Component vendors, please refrain from adding comments to this post to advertise your Excel library; requests to advertise, review, or plug your component will be politely declined. Thanks for understanding.)
Tip #2: Use VB.NET instead of C#
Even if your preferred .NET language is C# (or some other .NET language), even if you absolutely hate VB.NET (it's really not so bad), I strongly recommend that you use VB.NET instead of C#. If the rest of your solution is written in C#, break out your Excel-related functionality into a separate VB.NET DLL. I got about 10% into the initial attempt at my solution before I said "Screw this!", threw out my C# code, and started over again in VB.NET.
Why? In short, two reasons:
First, Office applications make prodigious use of optional arguments--in many cases a single function can take a dozen optional arguments. C# does not know how to handle optional arguments, so you end up passing a ton of useless null or Object values. This might not be so bad if there weren't so many, and if the lack of useful Intellisense in the IDE didn't make it even more difficult to tell which argument is which.
Second, C# is not really hip to the Variant data type, which is used often in Excel's VBA-centric API.
Trust me: just use VB.NET. If you're still not convinced, check out the link to the article "Programming Office Applications Using Visual C#" at the end of this post.
Tip #3: Communicate with Excel in a late bound manner
Late bound? That's not a term you hear much these days since .NET has overtaken COM. This term refers to a technique of not compiling a reference to the Excel library into your assembly. Late binding is otherwise known as IDispatch binding, with early binding otherwise known as vtable binding. Since Excel is still a COM application, these terms still apply in a .NET interop context.
I recommend NOT adding an early bound reference to Excel in your project. Microsoft has not done a good job over the years in managing the compatibility between the various versions of the Excel library; depending on which version of Office you have installed, you will have different version-specific Excel reference options in your COM references list. Don't be tempted by these early bound reference opportunities. Late bound all the way, baby. Protect yourself from version incompatibilities when you find out that you compiled against Excel 11.0 and your client is using 9.0. Similarly, you can protect your solution from breaking when your user or administrator upgrades Office.
In .NET, going late bound requires some syntax and techniques that may be unfamiliar to you, but it's not that bad.
To instantiate an Excel Application object, use the Interaction.CreateObject() method, which is much like the old VB6/VBA/VBScript CreateObject() function. It is part of the Microsoft.VisualBasic namespace (see Tip #4 if you decide to go with C# instead of VB.NET). Here is my LoadExcel() method:
Private Sub loadExcel()
m_excelApp = Interaction.CreateObject("Excel.Application", "")
If m_excelApp Is Nothing Then
Throw New Exception("The Excel libarary could not be loaded. " & _
"You may an incorrect version of Excel installed. " & _
"Please consult your system administrator. The instantiation " & _
"failed without an exception.")
End If
End Sub
Similarly, in order to invoke any properties or methods on the Excel.Application object, you want to use InvokeMember, like so:
workbooks = m_excelApp.GetType().InvokeMember("Workbooks", _
BindingFlags.GetProperty Or BindingFlags.Public, _
Nothing, m_excelApp, Nothing)If that code looks a little cumbersome to you, see Tip #5. If you wrap your Excel calls, your main application logic won't be burdened with the more verbose code required for late bound automation.
Check the .NET documentation on InvokeMember(). You may also prefer using Interaction.CallByName(), which offers a kind of shorthand version of InvokeMember(), but I prefer the more explicit control offered by InvokeMember().
The original version of my solution used an early bound reference. When I refactored my code to be late bound, I did not notice any significant performance difference. This jibes with my experiences with early vs. late binding performance when I used to write a ton of COM code in VB 6. If performance is a real concern, I return your attention to Tip #1; I think there is a pretty low performance ceiling you'll hit when automating Excel no matter what you do; theoretically, a good Excel component would be faster.
(You may also need an Imports System.Reflection statement at the top of your class file to do late binding. I have this at the top of my class, but as I write this I can't remember why.)
Tip #4: If you use C#, go through VB to get to Excel
While researching my Excel automation solution, I came across a great Dan Appleman article that is linked at the end of this post. It shows how you can reference the VB.NET runtime library from your C# project to make use of the Microsoft.VisualBasic.Runtime namespace, which is where you will find the aforementioned Interaction.CreateObject(). I think it's safe to say that you can't do late bound COM programming from C# without going through the Visual Basic library.
Tip #5: Wrap your Excel communication points
I strongly recommend that you use good old information hiding to protect the main logic of your code from the particulars of communicating with Excel. Here are a few of the functions I developed:
Private Function getRange(ByRef rangeString As String, ByVal worksheet As Object) As Object
Dim range As Object = Nothing
range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})
Return range
End Function
Private Function getCellValString(ByRef columnLetter As String, ByRef rowNum As Integer, _
ByVal workSheet As Object) As String
Dim expectedCell As Object = Nothing
expectedCell = getRange(columnLetter & rowNum.ToString(), workSheet)
Return cellToString(expectedCell).Trim()
End Function
Private Function cellToString(ByVal singleCell As Object) As String
'The Value property is a VBA Variant, which does not
'translate well to .NET. When the cell is blank, the
'Value property will be Nothing
Dim singleCellValue As Object = Nothing
singleCellValue = singleCell.GetType().InvokeMember("Value", _
BindingFlags.GetProperty Or BindingFlags.Public, _
Nothing, singleCell, Nothing)
If singleCellValue Is Nothing Then
Return String.Empty
Else
Return singleCellValue.ToString()
End If
End Function
Note: I have omitted by retry wrapper code from these methods for the sake of clarity; see Tip #7 for an explanation. If you reuse these methods, you may want to add in the retry wrapper--though in my testing so far, it looks like when the "application is busy" error happens, you may be irrevocably screwed.
See Tip #6 for an explanation for why I use cellToString().
Tip #6: Watch out for empty cells
In testing I ran into some annoyances related to empty cells, so I developed the cellToString() wrapper function you see above. The issue is that Range.Value is a Variant, not a String or numeric type. So when a cell is empty you can (but not necessarily always) get Nothing back instead of an empty string.
Tip #7: Watch out interruptions and disconnects
I don't have a lot of information to provide for what causes this problem, but in testing I have run into problems with lost communication between the .NET code and the running instance of Excel. What I mean is, if your Excel automation code is running and the user makes use of the Excel application at the same time, instability may occur. The main scenario we've been able to reproduce is, oddly enough, if the user opens a .XLS file by double clicking on it (as opposed to using the File->Open menu) then your .NET code will suddenly lose its ability to connect with Excel. This is the error we saw in this scenario:
System.Exception: There was an error during the main import process: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Runtime.InteropServices.COMException (0x8001010A): The message filter indicated that the application is busy. (Exception from HRESULT: 0x8001010A (RPC_E_SERVERCALL_RETRYLATER))
It's possible that other Excel usage scenarios could cause this same problem.
If this scares the hell out of you (and it should) see Tip #1.
One solution I tried is wrapping all of my Excel communication calls in some trap-and-retry logic. So a statement like this:
range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})becomes this:
Do
Try
range = worksheet.GetType().InvokeMember("Range", _
BindingFlags.GetProperty Or BindingFlags.Public, Nothing, _
worksheet, New Object() {rangeString})
Catch ex As Exception
If ex.Message.Contains("application is busy") Then
If m_excelRetryCount <= MAX_EXCEL_RETRIES Then
m_retryExcel = True
m_excelRetryCount += 1
System.Threading.Thread.Sleep(EXCEL_RETRY_DELAY)
Else
m_retryExcel = False
m_excelRetryCount = 0
Throw New Exception("The Excel application is apparently busy, which is " & _
"interfering with the import process. We have retried communication with Excel " & _
MAX_EXCEL_RETRIES & " times with a delay between each retry, but we are not " & _
"able to establish communicatation with Excel.", ex)
End If
Else
m_retryExcel = False
m_excelRetryCount = 0
Throw ex
End If
End Try
Loop While m_retryExcel
m_retryExcel = False
m_excelRetryCount = 0
Note that I declare EXCEL_RETRY_DELAY, m_retryExcel, and m_excelRetryCount at the class level since I had to repeat this retry logic in several of my Excel wrapper functions. That may or may not have been the best technique, but there it is.
Your mileage may vary. So far from testing it appears that the retries cannot ever reestablish the communication link.
Some Potentially Useful Links
- This MSDN article "Programming Office Applications Using Visual C#" contains some useful information and tips.
- I found a nice tip in this article by Dan Appleman on how you can reference the VB.NET library from C# and use VB.NET's methods to make late-bound calls to Office (or any other IDispatch-able COM library). Unfortunately, this article was published by Fawcette, which requires a paid account to read online. However, you can try searching Google for appleman c# invokemember and try clicking Google's "cached" link. You also might have some luck reading the article via this link.
- This discussion on the Joel on Software board offers some interesting suggestions and insights that could be particularly helpful to someone weighing whether or not to automate Excel or take a different approach
- Methods in Excel, and Excel-oriented blog; lots of links there to other Excel blogs and sites
Good luck!
Thanks for reading.
Dan
P.S.
If you thought this was helpful and feel compelled to do something nice in return, considering buying one of our high quality software development books, published by developer.* Books. You will be doubly enriched. :-)
Well, all this shows how
Well, all this shows how crap the Microsoft binary formats are. If these stupid bollockers just used a nice text based programming interface in Unix spirit, let us say Excel listening on a socket, accepting text "messages", everything would be 100x easier.
Hell to them, piece of Microsoft piss.
Chill
My friends at Microsoft aren't "stupid bollockers". Instead, they are often people who were excluded from those universities who developed the culture of computing elegance by their parents' finances and who, like an older generation of IBM employees, got some dignity out of a full time professional job from a man, Gates or Watson, who didn't care about their having a fancy education.
Gates made a decision to protect his intellectual property in the shark tank which then and now is the American economy, a thieves' economy, by using binary and proprietary formats, because he saw people lose their shirts in the 1970s to men who smiled and stole.
This site is not a football match, mate, nor is real software development, either proprietary or open. You want to fight, see me on usenet.
Not here.
Goes for alll of Office
These are great tips that apply to the whole office suite. I recently had the pleasure of automating Word and PowerPoint. Unfortunately I wrote the whole thing in C#. I firmly agree with Tip #1. Never ever automate Office if you can find another way. With Excel at least you have the option of dumping the whole thing to a CSV and then manipulating the data. The one exception is PowerPoint which should never ever be touched by code, full stop. PowerPoint is the cistern of the Office automation sewer. No third-party is going to help you there.
Other way to automate excel
I havent done any office automation using .net. However, I feel, it will be better to create a COM component using VB6, which does all the work expected from the application
This COM component can be called from c# or vb.net. So, one need not go through reflection & optional parameter thing, variant. Also, VB code will do the cleanup of objects instantiated (e.g. set myrange = nothing)
How about VSTO? Does it do any better?
Hope this is a good suggestion & not a stupid one.
Wrapping COM with COM
Kalpesh, I don't think your suggestion to implement the needed Excel/Office functionality in a VB 6 DLL is stupid at all. I think it would work well, and I still have fond feelings for VB 6. However, I suspect that in a larger organization, or in a consulting situation, there could be managers or others who would not want to introduce VB 6 code into a solution for fear that they would have trouble maintaining it. That would depend on the situation, though...
Thanks,
Dan
VSTO
If you need to couple .NET to Excel or any of the Office suite of applications, I would highly recommend using VSTO (Visual Studio Tools for Office) instead of the above techniques. When you install VSTO, you get a series of project templates that start you off working directly with, say, an Excel spreadsheet as part of your application. You can then use the spreadsheet (or Word doc) as a front-end to do just about anything you want. At my current job I had to create a spreadsheet that allowed a user to provide a customerID, click a button, and return a history of invoice summaries that had to populate a range. I wrote it all in C# just as if I were using a Windows Form. It was quite easy. The only hurdle I see is that if you have to deploy a VSTO application to a client's machine (or throughout an office) then you have to go through extra hoops including VSTO and some dependent assemblies in your MSI installer. Plus you need to set some security policy to grant execution rights on the client machine to the required assemblies.
But in terms of the development itself, I found VSTO to be much easier and quicker because it allowed me to use everything natively in .NET. All of the COM+ interop is wrappered for you.
VSTO Envy
Thank you for providing this additional info, Trevor. It sounds like a worthy alternative. However, I have a strong doubt that this solution would have worked for my client's company, which was still on Office 2000 and not planning to upgrade anytime soon. The first version of my solution was early bound, and I had to rewrite it to be late bound when it did not test well with Office 2000. Converting from early bound to late bound was relatively straightforward, but I shudder to think that I would have had to do a near-total rewrite if I had started with VSTO, which I had not heard of at the time (and which I am only assuming would have backwards compatibility problems with Office 2000--though I would be willing to be money that it does since Microsoft has a history of being obnoxious about driving Office upgrades with their Office development hook-ins).
Anyway, I don't mean to crap on your suggested solution, Trevor. I appreciate it. If the circumstances are right, VSTO sounds like a great option, and one I'd like to explore further for future projects.
Dan


Alternatives to calling Excel
An incredible story of the effort needed to communicate from .Net to an old and very complex COM object, Dan.
I have another suggestion. If you're calling Excel to do trivial computations, to allow the user to work on a spreadsheet in your GUI, consider creating your own version of Excel.
The main problems:
(1) Restricting the "language" of what can be entered in a cell to a subset of what is supported by Excel and which you feel comfortable compiling
(2) Solving the problem of recalculating the linked cells in a spreadsheet by finding the top of the "tree" of computations and working downward. This was solved by Dan Bricklin in the first spreadsheet and his solution was patented, but it's fairly easy to reinvent, at least I think I did so in 1987 for a real estate appraisal system that presented the user with a spreadsheet of linked fields.
If this sounds too complex, consider getting the free Open Office suite: this is written in modern code, it isn't COM, it presents no license or patent problems, and it MAY be easy to work with.
Another solution would be "arms length". Send keystrokes to an Excel application running in a different process entirely, save the result using these keystrokes, and parse it for the answer.
The bottom line: if you define the problem (or it is defined for you) as "I gotta use Excel as a COM object in a .Net application" you're in for an interesting time, and as Dan has done, you need to use tryCatch a LOT.
The need for wait loops to wait while Excel gets its act together implies for me a mini GUI to allow power users to control the timing and possibly progress screens. I say this because time-sensitive code can appear to hang a user's machine when it doesn't indicate what it is waiting for, and often needs to be tuned for different machines.
The object2String() method found in the utilities library shipped with Build Your Own .Net Language and Compiler, btw, handles the problem of serializing (converting to character in my lexicon) any .Net object, and I found it useful in a .Net project that used variants from an Office application.