Informing the IBM Community

Be the Exception. Become an IT Monitoring Hero.



Today’s businesses must be available 24/7, with fewer people managing ever more complex systems and processes.

IT departments are bombarded with information from a wide variety of operating systems, business applications and critical processes, while also supporting a complex array of servers and devices across the entire network. Monitoring every application and system requires the eyes of a hawk and the arms of an octopus!

What five strategies can you deploy to take control of information overload so it doesn’t take control of you?

1. Automation, Automation and Automation

My wife says I am lazy. I say I am an automation evangelist. If I can automate something, I will. If I don’t know how to automate something, I ask somebody who does.

The point is that once you automate a process correctly, you can repeat it consistently. Consistently is the important word here. If a single person is responsible for checking a process, you can’t guarantee consistency: they may be on the telephone, stuck in a traffic jam or simply elsewhere.

If you use an automation tool instead, you can guarantee that the process you’ve defined, or a procedure contained within a template, is repeated again and again without human intervention.

Let’s take the example of ensuring that my ERP system or my High Availability software is active:

  • I know I need to ensure that the jobs, subsystems and processes associated with each application are running at the required times
  • I know that performance response times must be equal to or better than those stated in Service Level Agreements (SLAs)
  • I know that each application or system will generate messages and alerts of different severities, and these will need to be handled in different ways
  • Maybe I need to take one action if an alert appears only once, but a different action, or sequence of actions, if the same alert appears more than once within a certain time frame
  • Additionally, I may need to perform different actions for the same message depending on the time of day or the day of the week
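The conditions in this list boil down to a small rule engine. It counts how often an alert repeats within a sliding time window, then picks an action based on that count and on the time of day and day of the week. The Python sketch below illustrates the idea under stated assumptions. The alert IDs, the action names (`log_to_console`, `restart_job`, `page_operator`, `page_on_call`), the 15-minute window and the business-hours range are all hypothetical. A commercial monitoring product would express such rules through its own configuration rather than code like this.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

# Hypothetical policy values: tune these to your own SLAs and staff rota.
REPEAT_WINDOW = timedelta(minutes=15)  # repeats are counted within this window
BUSINESS_HOURS = range(8, 18)          # 08:00-17:59


class AlertRules:
    """Pick an action for an alert from its repeat count and arrival time.

    Assumes alerts are fed in chronological order.
    """

    def __init__(self, window: timedelta = REPEAT_WINDOW) -> None:
        self.window = window
        # alert id -> timestamps of recent occurrences, oldest first
        self.seen = defaultdict(deque)

    def action_for(self, alert_id: str, at: datetime) -> str:
        history = self.seen[alert_id]
        history.append(at)
        # Forget occurrences that fall outside the sliding window.
        while at - history[0] > self.window:
            history.popleft()

        in_hours = at.weekday() < 5 and at.hour in BUSINESS_HOURS  # Mon-Fri only
        if len(history) == 1:
            # First occurrence in the window: handle it automatically.
            return "log_to_console" if in_hours else "restart_job"
        # Repeated within the window: escalate to a person.
        return "page_operator" if in_hours else "page_on_call"


rules = AlertRules()
monday = datetime(2024, 1, 1, 10, 0)  # 1 January 2024 was a Monday
print(rules.action_for("DISK_FULL", monday))                         # log_to_console
print(rules.action_for("DISK_FULL", monday + timedelta(minutes=5)))  # page_operator
```

Keeping a per-alert queue of timestamps makes the "more than once within a certain time frame" test a simple length check, and the same structure extends to thresholds such as "three times in an hour".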

Here’s the important point: multiply this scenario several hundred times across all my applications and systems, and I am set to fail unless I automate monitoring and operations.

2. Break down your automation project into bite-sized chunks

You have reached an important first milestone when you have installed systems management tools that can monitor and automate the most common scenarios in your organization and also act as an early warning system for the unexpected.

The next step is to identify the “quick fixes” and “easy wins”, tackle those first and then chart your progress.

Start by creating a baseline of everything you should be monitoring.

For example, for any manual tasks, such as checklists completed on a daily, weekly or monthly basis, try to attribute a cost wherever possible (for instance, using the hourly staff rate). This gives management a useful indicator of return on investment when you fully deploy an automated solution.
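A costed baseline can be as simple as a list of manual checks, each with a duration and a frequency. The Python sketch below totals the yearly staff cost of those checks. The task names, durations and the hourly rate are illustrative figures only, not data from any real organization.

```python
from dataclasses import dataclass


@dataclass
class ManualTask:
    name: str
    minutes_per_run: int
    runs_per_year: int  # e.g. 260 for weekdays, 52 for weekly, 12 for monthly


def annual_cost(tasks: list[ManualTask], hourly_rate: float) -> float:
    """Yearly staff cost of carrying out these checks by hand."""
    total_minutes = sum(t.minutes_per_run * t.runs_per_year for t in tasks)
    return total_minutes / 60 * hourly_rate


# Illustrative baseline; substitute your own checklists and rates.
baseline = [
    ManualTask("Daily backup check", 30, 260),
    ManualTask("Weekly disk space review", 60, 52),
    ManualTask("Monthly job log audit", 120, 12),
]
print(annual_cost(baseline, hourly_rate=40.0))  # 8240.0
```

Here 7,800 + 3,120 + 1,440 minutes come to 206 hours a year. At the assumed rate, that is the figure an automated solution would be measured against.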

Remember, Rome wasn’t built in a day. You are embarking on a continuous improvement program that should reap significant business benefits as you reach each milestone or target.

3. Use templates for common and specialized monitoring requirements

If you are going to spend your valuable time automating the monitoring and associated actions for each task, then put that process into a template so it can be deployed to multiple systems.

Templates can be grouped by function or application. Multiple templates can then be applied to existing systems or to each newly installed system.

Some examples of how templates can be applied:

  • Create processor, disk space, and memory monitoring rules as a ‘Standard Performance Template’. Apply it to every system.
  • Create an additional template for disk thresholds and disk arm utilization, memory paging faults, and processor-bound processes and save it as an ‘Advanced Performance Template’. Apply this template to your database servers.
  • Create Exchange Server templates to ensure services are started, outbound queues are active, log files are cleared and database size increases are tracked. Apply these, alongside the performance templates, to your email servers.
  • Create AIX templates based on the above criteria, followed by Linux and IBM i templates, and so on.
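Conceptually, a template is a named, reusable set of monitoring rules. A system’s effective rules are then the combination of the templates assigned to its role. The Python sketch below models this with plain dictionaries. The metric names, thresholds and role names are invented for illustration, and real products store templates in their own formats.

```python
# Hypothetical thresholds: metric name -> value that raises an alert.
Template = dict[str, float]

TEMPLATES: dict[str, Template] = {
    "standard_performance": {"cpu_pct": 90, "disk_used_pct": 85, "memory_pct": 90},
    "advanced_performance": {
        "disk_used_pct": 80,  # tighter disk threshold for database servers
        "disk_arm_busy_pct": 60,
        "page_faults_per_sec": 500,
    },
    "exchange_server": {"outbound_queue_length": 100, "db_growth_mb_per_day": 2048},
}

# Templates applied to each kind of system, in order; later ones win on conflicts.
ROLES: dict[str, list[str]] = {
    "default": ["standard_performance"],
    "database": ["standard_performance", "advanced_performance"],
    "email": ["standard_performance", "exchange_server"],
}


def rules_for(role: str) -> Template:
    """Merge the templates assigned to a role into a single rule set."""
    merged: Template = {}
    for name in ROLES[role]:
        merged.update(TEMPLATES[name])
    return merged


print(rules_for("database")["disk_used_pct"])  # 80
```

Because templates are layered in order, a specialized template only needs to state what differs from the standard one. A new server picks up everything simply by being given a role.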

Or better still, use a product that offers point-and-click configuration and includes best-practice example templates, so you are already one step ahead.

4. Monitor by exception

It seems obvious when you say it out loud, but ask yourself: “Do you monitor by exception?” Do you check every day that all your backups have completed normally? If you answered yes, then you do not monitor by exception.

I do not check each day that my backups have completed, because I do not care. I only care about the backups that have not completed.

I use my automation tools to check whether my backups have completed normally. If they have not, an automated action sends a message to my cell phone, and the alert is also displayed on an Enterprise Management Console so I have full visibility.

It sounds simple, but it means I do not see lots of ‘noise’. I do not see 150 alerts telling me everything is good and 4 alerts telling me something is bad. I only see the 4 alerts I need to act on.
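The difference between checking every result and monitoring by exception comes down to a single filter step before anything reaches a person. In the Python sketch below, `BackupResult` and the `send` callback are hypothetical stand-ins. In practice `send` would be whatever delivers the text message or console alert, and here it is simply `print`.

```python
from collections.abc import Callable
from dataclasses import dataclass


@dataclass
class BackupResult:
    system: str
    completed: bool


def exceptions_only(results: list[BackupResult]) -> list[BackupResult]:
    """Keep only the backups that did not complete; successes stay silent."""
    return [r for r in results if not r.completed]


def notify(failures: list[BackupResult], send: Callable[[str], None]) -> None:
    """Raise one alert per failed backup via the supplied delivery function."""
    for r in failures:
        send(f"Backup did not complete on {r.system}")


# 150 good backups and 4 failures, matching the scenario described.
results = [BackupResult(f"SYS{i:03}", True) for i in range(150)]
results += [BackupResult(f"DB{i}", False) for i in range(4)]
notify(exceptions_only(results), send=print)  # prints 4 lines, not 154
```

Passing the delivery function in as a parameter keeps the exception filter independent of the channel. The same four alerts can go to a phone, a console or both.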

5. Finally, embrace mobile technology so that information comes to you, rather than you having to look for it

You no longer have the luxury of sitting at your desk watching consoles and screens; you have meetings to attend and reports to write. You also have to keep proving the value you add to the business.

You need your automated operations software to come and find you, tap you on the shoulder and say: “Hey, you need to look at this and do something about it before something goes wrong.”

You need to embrace a solution that provides automation ‘on the go’. These days, smartphones, tablets and mobile consoles are a necessity, not a nice-to-have. Look for modern systems management solutions with dedicated ‘apps’ that let you receive alerts on your preferred mobile device.

What’s the alternative?

So that all sounds like common sense, right? Why not just write my own automated scripts and routines? I frequently talk with clients who ask me that very question. You can write your own scripts and routines based on what you know. But what about what you don’t know? How can you monitor for something you don’t know may happen? That is why you need a professional monitoring and automated operations product that embodies best practices and is maintained and updated with new features and monitoring requirements before they become an issue.

Look for an automated solution that, ideally, uses ‘point and click’ technology on all platforms and does not rely on scripts unless you need them for legacy reasons.

What about operating system updates and file structure changes? I have seen many in-house utilities break when changes outside the author’s control were made, such as upgrading to a newer release of the operating system or simply applying an update. I have also seen many clients relying on utilities whose original author no longer works at the company, or worse. Using an automated systems management solution takes those concerns away.

Maybe new technologies are introduced as the result of a company merger, acquisition or a technology refresh. How do you know what needs to be monitored and how do you suddenly become a subject matter expert?

Learn more about how Halcyon can help you become an IT monitoring hero.
