Software Development
Blogs and Discussion
developer.*

Independence, Transparency, and Monitorability in the Design of Unattended Software Systems

Donna Davis recently posted regarding system design and maintenance quandaries related to back-end processes that need to run unattended, and which need to be monitored, supported, and maintained by humans. Donna in particular points out some of the types of issues that can arise due to the dependence on a small number of persons who must always be available to support the system, and the difficulties that arise from the need for a highly technical person to judge the "health" of the system at any given time, or to recover from failures. Donna also laments some particular shortcomings that SQL Server 2000 has in supporting unattended systems.

Donna's post actually dovetails nicely with a related topic that has been on my mind a lot lately. In generalizing the kind of issues Donna brings up I have concluded that the essential qualities are independence, transparency, and monitorability. That is, the combination of these three general qualities is essential in the kinds of back-office, unattended, usually data-oriented systems Donna describes. I think these kinds of systems are common in at least three scenarios:

  • operational support processes, feeding into, extracting from, or performing jobs on a transactional data store or other type of operational database
  • BI-related processes, performing ETL processes to feed data warehouses, data marts, staging databases, aggregation tables, and the like
  • enterprise integrations, linking heterogeneous systems together, either for operational or business intelligence purposes

The lines between these categories are probably somewhat blurry in specific cases, but it seems like a useful taxonomy. One key quality these kinds of systems have in common is that they are usually unattended in nature--or at least people would prefer them to be. Robust and reliable unattended operation, though, does not come for free. We must design for it and build it in. So I've generalized this into the concept of "independence," which brings with it the need for "transparency," and the related requirement of "monitorability." I'll try to explain exactly what I mean by these three terms.

Independence to me means the ability of the system to run on its own--indeed, the *requirement* for the system to do this. I think systems of this type often fall short because independence is an implicit requirement, which developers can easily miss. A whole set of techniques and qualities, from exception handling to bootstrapping to "restartability" (not to mention transparency and monitorability), must be designed into the system from the beginning.
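To make "restartability" a little more concrete, here is a minimal sketch in Python. It assumes a job can be split into named steps, and the names `run_steps` and `checkpoint_path` are hypothetical, invented for this illustration. Each completed step is recorded in a small checkpoint file, so a rerun after a failure picks up at the step that failed instead of repeating finished work.

```python
import json
import os


def run_steps(steps, checkpoint_path):
    """Run (name, function) steps in order, skipping steps already recorded as done."""
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)

    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run; do not repeat it
        func()  # an exception here leaves the checkpoint at the last good step
        done.append(name)
        # Write to a temporary file and rename it, so a crash in the middle
        # of the write never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return done
```

A real system would also need to clear the checkpoint once the whole job succeeds, and each step would have to be safe to run again if it failed partway through. Those concerns are left out to keep the sketch short.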

Transparency to me means that the inner workings of the system can be examined, especially after the fact. Transparency is key for support personnel, DBAs, and maintenance programmers who need to be able to trace problems back to the source, understand what data transformations took place, analyze how long processes took to run, trace the cause of failures, ferret out possible data corruption, etc. If the system is not designed to offer these kinds of capabilities, support and maintenance can be difficult.
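One way to get this kind of after-the-fact visibility is to have every step leave a record of what ran, when, how long it took, and how it ended. The following Python sketch is only an illustration under my own assumptions: the `logged_step` helper, the JSON-lines log file, and the field names are all hypothetical choices, not a reference to any particular tool.

```python
import json
import time
from contextlib import contextmanager


@contextmanager
def logged_step(log_path, job, step):
    """Append one JSON line per step: what ran, when, how long, and how it ended."""
    record = {"job": job, "step": step, "started": time.time(), "details": {}}
    start = time.monotonic()
    try:
        # The step can add row counts, file names, and so on to this dict.
        yield record["details"]
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = repr(exc)
        raise  # the job still fails; the log only records what happened
    finally:
        record["seconds"] = round(time.monotonic() - start, 3)
        with open(log_path, "a") as f:
            f.write(json.dumps(record) + "\n")
```

A step would then be wrapped like this, where the job and step names are made up for the example:

```python
with logged_step("run.log", "nightly_load", "stage_orders") as details:
    details["rows_read"] = 1200
```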

Monitorability overlaps with the concept of transparency, but here we get more into the need for management to sleep at night knowing that unattended processes are accounted for, for DBAs to take a vacation and get other work done, and for non-technical operators to be able to assess the health of the system. Exception logs, screen output, operational logs, and status emails offer a basic level of monitorability, but this can be taken much further with dashboards and the like.
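As a rough illustration of what a non-technical operator might need, the Python sketch below boils each job down to a single word, based on when it last succeeded and how stale it is allowed to get. The function names, the status words, and the idea of a per-job "maximum age" are my own assumptions for the example.

```python
from datetime import datetime, timedelta


def job_status(last_success, max_age, now):
    """Return 'OK', 'LATE', or 'NEVER RAN' for a single job."""
    if last_success is None:
        return "NEVER RAN"
    if now - last_success > max_age:
        return "LATE"
    return "OK"


def health_report(jobs, now):
    """jobs maps a job name to (last_success, max_age). Returns (overall, lines)."""
    overall = "OK"
    lines = []
    for name in sorted(jobs):
        last_success, max_age = jobs[name]
        status = job_status(last_success, max_age, now)
        if status != "OK":
            overall = "ATTENTION NEEDED"
        lines.append(f"{name}: {status}")
    return overall, lines
```

The output of `health_report` could feed a status email or a simple dashboard page. The point is that the operator reads one overall answer and does not have to interpret logs.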

One area that Donna raises in her post that I find interesting is that the concepts of "success" and "failure" are domain-specific and difficult to generalize; the capabilities of a generalized/horizontal system like a relational database system or ETL tool to judge the "success" or "failure" of a task (or a set of tasks) are often too coarse to be truly useful to the unattended system designer.

A job management system like SQL Agent, DTS, or the new SQL Server 2005 Integration Services (SSIS, the replacement for DTS) can really only judge whether an error occurred or whether a status return code indicates success or failure. Only the custom logic of the system can be aware that "success" really means, for example, that the rows updated in table A minus the rows inserted into table B must balance with the rows deleted from table C.
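The balancing rule in that example can be written down directly. This Python sketch assumes the job has already collected its three row counts. `RowCounts`, `is_balanced`, `verify_run`, and `RunFailed` are hypothetical names used only for illustration. The return code is checked first, and a "successful" return code is then not trusted until the domain-specific check also passes.

```python
from dataclasses import dataclass


@dataclass
class RowCounts:
    updated_a: int   # rows updated in table A
    inserted_b: int  # rows inserted into table B
    deleted_c: int   # rows deleted from table C


class RunFailed(Exception):
    """Raised when a run fails either the generic or the domain-specific check."""


def is_balanced(counts):
    """Domain rule from the text: updated(A) - inserted(B) must equal deleted(C)."""
    return counts.updated_a - counts.inserted_b == counts.deleted_c


def verify_run(return_code, counts):
    """Treat a run as successful only if the return code AND the balance agree."""
    if return_code != 0:
        raise RunFailed(f"job returned code {return_code}")
    if not is_balanced(counts):
        raise RunFailed(
            f"row counts do not balance: {counts.updated_a} - "
            f"{counts.inserted_b} != {counts.deleted_c}"
        )
```

For example, `verify_run(0, RowCounts(100, 40, 60))` passes quietly, while `verify_run(0, RowCounts(100, 40, 55))` raises `RunFailed` even though the job itself reported success.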

One additional issue of interest, I think, is the personnel called on to design and build unattended systems. From my observation, these kinds of projects are either implemented by amateur technologists who are just trying to do their best under fire to get something working or by less experienced developers who have not had the pleasure of monitoring, supporting, and maintaining production systems of this unique type.

Often these "systems" don't start out as systems at all, but grow to become sprawling systems over time, one script at a time, one hack at a time. With each addition, the enterprise becomes more dependent and the system becomes harder to support. Assuming you have the luxury of foresight, and depending on the complexity and criticality of such a system for your enterprise, it might be more effective to put your best people on the macro-level architecture and design to ensure that the qualities I've described are baked into the system from the ground up.

What are your thoughts? Please post them below. I'm interested in discussing and exploring what we can do as software developers, managers, and QA engineers to achieve "ITM." Over time I'd like to build this informal blog post into a more formal and complete essay on the subject, perhaps for collection into a book of similar essays (more on that topic at a later date). I welcome your input.

Thanks for reading,
Dan

All views expressed by authors, bloggers, and commentors are their own and do not necessarily reflect the views of developer.* or its proprietors.

All content copyright ©2000-2005 by the individual specified authors (and where not specified, copyright by Read Media, LLC). Reprint or redistribute only with written permission from the author and/or developer.*.

www.developerdotstar.com