Software Development
Blogs and Discussion
developer.*
Books Articles Blogs Subscribe d.* Gear About Home

SQL Server Job Success, Failure, and Everything In-Between

If your scheduled SQL Server job fails at 2:00AM when no one's checking, will anyone know, or is it like a tree falling silently in a forest? "No problem," you might reply. You have email notification enabled.

Email notification on SQL Server job status presents at least three dilemmas:

#1 Who receives notification?

If you send notification to an individual, that person could be on vacation or otherwise unable to receive the email. If that concern is addressed by sending notification to a group, who will take action? Each person may assume someone else will respond. If you decide to send notification to a generic email account that can be checked, the timeliness of notification is lost and depends on dutiful employees remembering to check. If you decide to send the notification to a pager (which is rotated among on-call staff), employees may become weary of receiving pages at all hours of the night, when jobs are scheduled to run.

#2 Should you notify on job/step success?

If you set up the job to notify on success, you may be surprised at the results. SQL Server interprets "success" in the broadest sense of the word: something ran and it must be good. No matter that your primary table was truncated but wasn't repopulated. When it comes to interpreting success for jobs that actually *do* something involving data (and not just backups), it seems that SQL Server is a hound dog with its head out the car window and ears flapping in the wind. Plus, sending a notification every time a job succeeds is pretty much crying wolf. Who's going to notice or care that a message *wasn't* received?

#3 Should you notify on job failure?

Notification on job failure seems reasonable, but what happens if something prevents the job from even starting? Is that failure? Not to SQL Server...not unless the condition causing the job not to run (such a the backup job getting hung up, suspending all future operations) is trapped and sends a notification. What if the problem is serious enough to prevent the service responsible for notification to run?

An Approach (or Two)

I understand one approach to this general quandary is to create a single job success/failure report by querying the msdb database for job history information, and email that to the DBA. That at least reduces the number of notifications, but takes us back to dilemma #1, requiring the DBA's email to be forwarded to someone else in her absence, and also depends on a DBA who doesn't mind going blind reviewing what could be a substantial report, trying to pick the needle failure out of a job run haystack.

Yet, as I mentioned before, native definitions of success are often not granular enough to be truly useful or reliable. I haven't looked at SQL Server 2005 enough yet to know if things have improved. Yet, I can imagine a custom solution (that I never seem to have time to work on): Every critical job could conclude with a call to a custom stored procedure that will check the pulse of the primary tables (record counts, etc) affected by the job and update a table with the server, application, job name, date/time, and a health indicator. (This would work conceptually like the exception handling we usually build into our applications.) Then we'd build a simple front-end dashboard application for the Data Center so that the first thing they'd see when they come in is an attractive interface with color-codes based on health indicator flags for all the jobs important enough to track. A red light means something is wrong (for real): call somebody.

I can imagine that these dilemmas can't be uncommon. If someone else has a another solution or a good workaround, I'd love to hear about it.

Unattended Synergy

Hi Donna,

Your post inspired me to get to work on a post of my own on this topic that has been brewing in my head for awhile now. Coincidentally a major responsibility of mine in my day job for the past few months has been the daily monitoring and upkeep of a somewhat complex and loosely connected hive of download/import/export/maintenance processes clustered around a set of operational databases. The system is a loosely connected (I wish I could say loosely coupled) mixture of scripts, DTS packages, SQL Server 2000 databases, .NET programs, plus the "SQL Agent" scheduler and a standalone job scheduling tool called Automate. I feel all of the pains you describe in your post on a daily basis. The people who put this system together clearly did not build it with the qualities I describe in my post in mind.

I'm also in the process of designing and building a major rewrite for this system, and we're centering most of it on SQL Server 2005's new SSIS (SQL Server Integration Services) facility, which is a wholesale replacement for the longstanding good-but-problematic-and-fragile SQL Server ETL tool, DTS (Data Transformation Services). So far I am pretty impressed with SSIS. Microsoft went back to the drawing board and conceived a true enterprise-class ETL framework. The design of SSIS overcomes many shortcomings of DTS. To learn more about it, I suggest checking out this ProjectREAL thing that Microsoft has set up. I found this white paper about a large-scale ETL process they built for Barnes & Noble's BI to be extremely interesting. Of particular interest to the current discussion, it discusses their logging and exception handling framework in detail. For more on SQL Server 2005, I highly recommend this SAMS book by "Salability Experts, Inc." called Changing the Paradigm. It will bring you up to speed on SQL 2005 quickly.

Even with the out-of-the-box benefits of SSIS, we still plan to build the kind of "dashboard" capabilities you describe, with the specific goal of allowing non-technical operators to be able to judge the health of the system. This will involve designing logging and tracking tables that the dashboard can query for status information. As you suggest, this gives a much better view of the system, and a much more effective management tool, than success and failure emails. However, building this kind of thing takes time and resources, and often these kinds of processes are built on a tight budget. In my current case, in which the client is willing to pay for this "dashboard" capability, they have been living with and feeling the pain of their current non-independent, non-transparent, and non-monitorable system for awhile now.

For what it's worth, in cases where we are stuck with emails for monitoring, I try to increase redundancy by using distribution lists for the emails rather than a single email address.

Best,
Dan

SQL Server Job Success, Failure, and Everything In-Between

Do you mind a basic question from a DBA new to Sql Server? I tried to get a job schedule's notifications set to an Outlook distribution list but wasn't able to get it to work. I noticed you said "I try to increase redundancy by using distribution lists for the emails rather than a single email address". Can you tell me what the secret is for creating a distribution list that the job scheduler will recognize as an operator?

Distribution List

Hi. Hopefully I can help: what I meant was that we created a distribution list on the Exchange server itself. This is simply an email account that you create on your Exchange server, like for example sql_errors@yourcompany.com. Then configure this address in Exchange to route any messages to however many email addresses you choose.

Managing this centrally in Exchange is preferrable to trying to keep up with all the different jobs and configuration points that have these kinds of notification email addresses. Even if you only have one person who is going to receive the notifications, I would still suggest creating an "alias" email address so that when employees come and go all you have to do is update the Exchange settings.

Hope that helps,
Dan

SQL Jobs

Have any of you guys tried sqlSentry? www.sqlsentry.net It sounds exactly like what you are talking about. It gives a color coded caldendar view of all the jobs and DTS packages on the server for successes and failures. It even sends emails on more conditions than I could even think up. We tried it at my job and management even loves the look of it. I hope they make the purchase. This thing will save hours of troubleshooting.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Recent comments

User login

About our advertising.

Atom Feed

developer.* Blogs also has an Atom feed, located at this url.

Click here for more information about Atom.

A Jolt Award Finalist
Software Creativity 2.0
Foreword by Tom DeMarco

Recent Posters

Based on most recent 60 days, sorted by # of posts and name.

Google
Web developer.*

Who's online

There are currently 1 user and 17 guests online.

Online users

Syndicate

Syndicate content
All views expressed by authors, bloggers, and commentors are their own and do not necessarily reflect the views of developer.* or its proprietors.
Click to read the Copyright Notice.

All content copyright ©2000-2005 by the individual specified authors (and where not specified, copyright by Read Media, LLC). Reprint or redistribute only with written permission from the author and/or developer.*.

www.developerdotstar.com