Suppression-Based Event Handlers

Author: Pete Kazmier
Version: suppression.txt,v 1.2 2003/08/04 14:09:31 kaz Exp

Introduction

The LogWrap framework comes with a set of event handlers designed to suppress events generated from matched log messages. Suppression of events can be useful if you do not want your action-based event handlers (mail, paging, SNMP traps) triggering upon each and every event. For example, suppose you have the following rule defined in your XML configuration file:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

As it is currently defined, I will get an email each and every time an error occurs in my application (as seen via the log file). Although this sounds great, in reality, when my application fails it writes several messages to the log file very quickly. This in turn generates several emails to me (flooding my inbox). In this case, it would have been sufficient for me to get a single email for this one application fault. The above example illustrates the reason suppression-based event handlers were created for the LogWrap framework.

This document describes each of the four suppression-based event handlers (WaitToCount, IntervalCount, WaitToFrequency, and LimitToFrequency) in detail so you can start using them in your own environments. Before that, it covers event handler chaining and suppression-based matching, which provide the background needed for those descriptions.

Event Handler Chaining

Before proceeding to the descriptions of the event handlers, we will explore how the event handlers fit into the LogWrap framework. Let's begin with the specification of handlers in the XML configuration file. When you define a <rule> in your configuration file, you must specify a regular expression (used to match incoming log messages) and then one or more event handlers that will be used to process any matching messages.

If you define more than one handler, you have created an event handler chain. The chain is simply a list of handlers to be used when processing an event. When an event arrives that matches a rule, each handler in the chain processes the event. The handlers process the event in the order they were defined. For example:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="Print" file="stdout"/>
  <handler type="Print" file="stderr"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

If a message arrives in the log that matches the pattern specified in the above rule, an event chain consisting of three event handlers will be used to process the event. First, the Print handler will write the message to stdout, then another Print handler will write the message to stderr, followed by the Mail handler which will send an email with details from the message.

Technically speaking, an event can have one and only one event handler associated with it. You may be wondering, then, how we specified three handlers in the above example if only one handler is allowed per event. To answer that question, let's examine how the above rules would be written if we were using the API directly (instead of the XML configuration file). First, we show the original rule where only an email is sent:

# Let's set up our mail handler first.  This just creates the handler,
# it does not associate it with any particular rule yet.  The body is
# a raw string so that \1 and \2 stay as match group references
# instead of being read by Python as octal escapes.
handler = MailEventHandler(toaddrs="pete@example.com",
                           fromaddr="root@example.com",
                           subject="Application Error",
                           body=r"\1: \2")

# Now we are going to associate the handler with a particular rule.
# Recall, a rule contains a regular expression used to match, and a
# SINGLE event handler used to process events matching this rule.
rule = Rule(r"Error Code = (\d+), Error Message = (.*)", handler)

As you can see, a single event handler is passed to the constructor of the Rule object, which associates it with any event that matches the specified regular expression. Now, let's see how we define an event handler chain:

# Define our handlers first.  This does not associate them to any
# specific event, it only creates the handlers.  We define three
# handlers.
handler1 = PrintEventHandler(file="stdout")
handler2 = PrintEventHandler(file="stderr")
handler3 = MailEventHandler(toaddrs="pete@example.com",
                            fromaddr="root@example.com",
                            subject="Application Error",
                            body=r"\1: \2")

# Here comes the magic.  We are going to group all of the above
# handlers in ONE handler called a ChainEventHandler.
handler = ChainEventHandler(handler1, handler2, handler3)

# With all of the handlers now grouped into one handler, we can now
# pass that single ChainEventHandler to the constructor of the Rule
# object which can only take a SINGLE event handler.
rule = Rule(r"Error Code = (\d+), Error Message = (.*)", handler)

The above example illustrates the use of a new event handler called the ChainEventHandler. This handler's sole purpose is to enable users to specify more than one handler to process an event. Going back to our XML configuration, the configuration parser automatically creates a ChainEventHandler whenever more than one event handler has been specified in a <rule> block.

There is one additional piece of information that you must be aware of before we proceed to the discussion of the four suppression-based event handlers. Any handler that is part of a chain may terminate the execution of the chain thus bypassing any handlers specified after the one that terminated the chain. This is how the suppression-based handlers operate. If the event should be suppressed, the suppression-based handler will terminate the chain and prevent any other handlers from executing. This is why you must specify these special event handlers before any of your action-based handlers (those that take real action).
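The termination behavior can be sketched in a few lines of Python. This is only an illustration and not LogWrap's actual code: the class names (Handler, Record, SuppressAll, Chain) and the convention of returning False to stop the chain are assumptions made for this example. How LogWrap itself signals termination is not covered here.

```python
class Handler:
    """Hypothetical base class; not part of LogWrap's API."""

    def handle(self, event):
        # Assumed convention: True lets the chain continue,
        # False terminates it.
        return True


class Record(Handler):
    """Stands in for an action-based handler such as Print or Mail."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def handle(self, event):
        self.log.append((self.name, event))
        return True


class SuppressAll(Handler):
    """Stands in for a suppression-based handler that suppresses every event."""

    def handle(self, event):
        return False


class Chain(Handler):
    """Runs handlers in order and stops at the first one that returns False."""

    def __init__(self, *handlers):
        self.handlers = handlers

    def handle(self, event):
        for handler in self.handlers:
            if not handler.handle(event):
                return False
        return True


log = []
chain = Chain(Record("print", log), SuppressAll(), Record("mail", log))
chain.handle("Error Code = 7")
print(log)  # [('print', 'Error Code = 7')]
```

The handler placed before the suppressor still runs, while the one placed after it never sees the event. This is why the order of handlers in a <rule> block matters.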

Now that you are aware of event handler chains, and how a handler can terminate the chain, you are ready to move on to the actual descriptions of the suppression-based event handlers.

Suppression-Based Matching

Each of the suppression-based event handlers enables you to suppress based on the entire matched message, or only on parts of a matched message. Let's use an example to illustrate this concept. Assume you oversee the management of a centralized syslog server to which all of your organization's network devices are configured to send syslog messages whenever they have been rebooted. We can expect to see the following messages in the log file:

mm/dd/yy hh:mm:ss w.x.y.z: Device has been rebooted

Here, w.x.y.z is the IP address of the device logging the reboot message. Upon receipt of these messages, you want to send an email to yourself notifying you of the event. However, you don't want to overwhelm yourself with emails, so you decide that you only want to get an email when the same device has been rebooted three times.

To specify this rule, I will use the WaitToCount suppression-based event handler, which we haven't discussed yet. As you might imagine, WaitToCount will wait for a particular count of events to occur before allowing the rest of the event chain to proceed. Although we are using WaitToCount as an example, it is important to realize that each of the suppression-based event handlers can be configured with the match_on option.

Your first attempt at the configuration for this rule might look like this:

<rule>
  <regexp>.*? (\d+\.\d+\.\d+\.\d+): .*? rebooted</regexp>
  <handler type="WaitToCount" threshold="3"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Device Rebooted"
           body="\1 was rebooted"/>
</rule>

However, if it did, it would not work as expected. We must be very careful with words and what we expect to happen. Our initial goal was to wait until the same device has been rebooted three times before sending an email. The above rule will only work as expected if the same device was rebooted three times within one second, which is very unlikely. Why is that? The reason is that all of the suppression-based event handlers suppress based on the entire matched message. For example:

08/03/03 12:10:05 10.10.10.10: Device has been rebooted
08/03/03 13:07:36 11.11.11.11: Device has been rebooted
08/03/03 15:10:05 12.12.12.12: Device has been rebooted

If the above three messages arrived in the log file, the event handler "sees" three entirely different events even though they all matched the specified regular expression. In order to generate an email for the above rule, the following messages would have to be logged:

08/03/03 12:10:05 10.10.10.10: Device has been rebooted
08/03/03 12:10:05 10.10.10.10: Device has been rebooted
08/03/03 12:10:05 10.10.10.10: Device has been rebooted

In this case, the three messages match in their entirety and thus trigger an email. Clearly, this is not what we had intended, which is why the match_on option was created. This optional configuration parameter is a comma-separated list of positional parameters identifying the regular expression's match groups (items within the parentheses). When this parameter is specified, only the parts identified by the match groups are used to determine if a message should be considered a match in terms of suppression. Thus, in order to trigger an email when the same device is rebooted three times, we need to match on the IP address instead of the entire message:

<rule>
  <regexp>.*? (\d+\.\d+\.\d+\.\d+): .*? rebooted</regexp>
  <handler type="WaitToCount" threshold="3" match_on="1"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Device Rebooted"
           body="\1 was rebooted"/>
</rule>

In the above configuration block, the match_on parameter was specified as part of the suppression-based event handler. Its value is 1 which corresponds to the IP address in the log message because we have used the regular expression grouping operators (parentheses) surrounding the IP address. The following log messages would trigger an email now:

08/03/03 12:10:05 10.10.10.10: Device has been rebooted
08/03/03 13:07:36 11.11.11.11: Device has been rebooted
08/03/03 15:10:05 12.12.12.12: Device has been rebooted
08/03/03 16:34:11 10.10.10.10: Device has been rebooted
08/03/03 18:52:39 10.10.10.10: Device has been rebooted
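To make the difference concrete, here is a small Python sketch that counts the five messages above twice: once keyed on the whole match and once keyed on match group 1. The helper names (suppression_key, count_by_key) are hypothetical, and treating "the entire matched message" as the text matched by the regular expression is an assumption of this sketch rather than a statement about LogWrap's internals.

```python
import re
from collections import Counter

PATTERN = re.compile(r".*? (\d+\.\d+\.\d+\.\d+): .*? rebooted")

LINES = [
    "08/03/03 12:10:05 10.10.10.10: Device has been rebooted",
    "08/03/03 13:07:36 11.11.11.11: Device has been rebooted",
    "08/03/03 15:10:05 12.12.12.12: Device has been rebooted",
    "08/03/03 16:34:11 10.10.10.10: Device has been rebooted",
    "08/03/03 18:52:39 10.10.10.10: Device has been rebooted",
]


def suppression_key(match, match_on=None):
    """Return the value used to decide whether two events are 'the same'."""
    if match_on is None:
        # No match_on: compare the entire matched text.
        return match.group(0)
    # With match_on: compare only the listed match groups.
    return tuple(match.group(i) for i in match_on)


def count_by_key(lines, match_on=None):
    counts = Counter()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            counts[suppression_key(match, match_on)] += 1
    return counts


whole = count_by_key(LINES)
by_ip = count_by_key(LINES, match_on=(1,))
print(max(whole.values()))      # 1
print(by_ip[("10.10.10.10",)])  # 3
```

Keyed on the whole message, every line is unique because of its timestamp, so no count ever reaches three. Keyed on group 1, the device 10.10.10.10 reaches a count of three on the fifth line.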

What if you wanted to be notified via email when any three devices have been rebooted? In order to write this configuration rule, you need to identify a part of the log message that can be used to match by the suppression handler. Because we want to match on any three devices, we need to use the grouping operators on a part of the log message that would be constant among the log messages:

<rule>
  <regexp>.*? (\d+\.\d+\.\d+\.\d+): .*? (rebooted)</regexp>
  <handler type="WaitToCount" threshold="3" match_on="2"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Device Rebooted"
           body="\1 was rebooted"/>
</rule>

The text rebooted would be constant in all of the log messages and thus would make an ideal candidate to use for our suppression-based matching. You'll notice that parentheses now surround the text, which enables us to identify that part of the regular expression as 2 because it is the second item that has been grouped (the IP address is item 1). Consequently, the parameter to match_on has been adjusted to the value of 2. This will now generate an email whenever any three devices have been rebooted.

Although these examples have only used a single value to match_on, you are free to specify as many parts of the message as you'd like. Just use commas to separate the match group identifiers. For example:

<rule>
  <regexp>.*? (\d+\.\d+\.\d+\.\d+): .*? (rebooted)</regexp>
  <handler type="WaitToCount" threshold="3" match_on="1,2"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Device Rebooted"
           body="\1 was rebooted"/>
</rule>

Even though we are able to specify more than one match group, the above example is a poor one as the additional match group does not change the behavior whatsoever. It will still only send an email when the same device has been rebooted three times. Our earlier solution is more concise.

You've now seen the power of the match_on parameter and how to use it properly when matching for suppression. Understanding it is essential to using the suppression handlers correctly. Again, the match_on parameter can be used in any of the suppression-based event handlers that will be discussed in the next sections.

WaitToCount Event Handler

The WaitToCount event handler suppresses matching events until the number of matched events has reached the configured threshold. In other words, it waits until n matching events arrive before allowing the handler chain to proceed. The following configuration will only send an email when three messages arrive with the same error code:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="WaitToCount" threshold="3" match_on="1"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

WaitToCount can accept the following options:

threshold (optional)
Specifies the number of matched messages that must arrive before allowing the event handler chain to proceed. The value must be a positive integer. The default value is 3.
reset (optional)
Specifies whether the handler should reset its internal count once the threshold has been crossed. If this is not true, any subsequent messages received after the threshold has been exceeded will enable the handler chain to proceed. The value is a boolean which is specified using 1 to indicate true, or 0 to indicate false. The default value is 1.
match_on (optional)
See the earlier section on suppression-based matching.
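The effect of the threshold and reset options can be sketched as follows. This is a minimal illustration written from the description above, not LogWrap's implementation. The class name WaitToCountSketch and its allow method are hypothetical, and the key argument stands for whatever match_on selects.

```python
from collections import defaultdict


class WaitToCountSketch:
    """Hypothetical model of the described behavior; not LogWrap code."""

    def __init__(self, threshold=3, reset=True):
        self.threshold = threshold
        self.reset = reset
        self.counts = defaultdict(int)  # one counter per suppression key

    def allow(self, key):
        """Return True if the chain may proceed for this event."""
        self.counts[key] += 1
        if self.counts[key] < self.threshold:
            return False  # still waiting: suppress
        if self.reset:
            self.counts[key] = 0  # start counting again from zero
        return True


resetting = WaitToCountSketch(threshold=3, reset=True)
print([resetting.allow("42") for _ in range(7)])
# [False, False, True, False, False, True, False]

sticky = WaitToCountSketch(threshold=3, reset=False)
print([sticky.allow("42") for _ in range(5)])
# [False, False, True, True, True]
```

With reset enabled, every third message for a given key gets through. With reset disabled, the third message and every message after it get through.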

IntervalCount Event Handler

The IntervalCount event handler will enable the event handler chain to proceed upon receipt of the first matched message. However, subsequent events are suppressed until the configured number of matched events has intervened. In other words, this is identical to the default behavior of WaitToCount with the exception that the handler chain is permitted to proceed on the first matched event instead of having to wait until the threshold has been crossed. The following configuration will send an email immediately when a message arrives with an error code. Subsequent messages with the same error code are suppressed until every third message:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="IntervalCount" threshold="3" match_on="1"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

IntervalCount can accept the following options:

threshold (optional)
Specifies the number of matched messages that must arrive after allowing the handler chain to proceed. Bear in mind that the chain is permitted to proceed upon receipt of the first matched message. The value must be a positive integer. The default value is 3.
match_on (optional)
See the earlier section on suppression-based matching.
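The sketch below shows one possible reading of this behavior. The description leaves room for more than one interpretation of exactly which later messages get through. This sketch assumes that once a message is allowed, the next message allowed is the one that arrives threshold messages later (messages 1, 4, 7, and so on for a threshold of 3). The class name IntervalCountSketch and its allow method are hypothetical and not part of LogWrap.

```python
class IntervalCountSketch:
    """Hypothetical model of one reading of the description; not LogWrap code."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        # key -> number of messages seen since the last one that was allowed
        self.since_last = {}

    def allow(self, key):
        """Return True if the chain may proceed for this event."""
        if key not in self.since_last:
            # First message for this key always goes through.
            self.since_last[key] = 0
            return True
        self.since_last[key] += 1
        if self.since_last[key] >= self.threshold:
            self.since_last[key] = 0
            return True
        return False


handler = IntervalCountSketch(threshold=3)
print([handler.allow("42") for _ in range(7)])
# [True, False, False, True, False, False, True]
```

Compared with the default WaitToCount behavior, the pattern is the same except that it starts with an allowed message instead of ending with one.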

WaitToFrequency Event Handler

The WaitToFrequency event handler suppresses matching events until the number of matched events has reached the configured threshold within the configured interval. In other words, it waits until n matching events arrive in t seconds before allowing the handler chain to proceed. The following configuration will only send an email when three messages arrive with the same error code within an interval of 10 seconds:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="WaitToFrequency" threshold="3" interval="10" match_on="1"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

WaitToFrequency can accept the following options:

threshold (optional)
Specifies the number of matched messages that must arrive within the interval before allowing the event handler chain to proceed. The value must be a positive integer. The default value is 3.
interval (optional)
Specifies the number of seconds in which the number of matched events must arrive before allowing the event handler chain to proceed. The value must be a positive integer. The default value is 60.
reset (optional)
Specifies whether the handler should reset its internal count once the threshold has been crossed. If this is not true, the previous matched messages are used to compute the frequency of message arrival. The value is a boolean which is specified using 1 to indicate true, or 0 to indicate false. The default value is 1.
match_on (optional)
See the earlier section on suppression-based matching.
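A frequency check of this kind can be modeled with a sliding window of arrival times. The sketch below is an illustration built from the option descriptions above, not LogWrap's implementation. The class name WaitToFrequencySketch, the explicit now argument (used so the example is deterministic), and the choice of a sliding window that excludes arrivals exactly one interval old are all assumptions.

```python
from collections import defaultdict, deque


class WaitToFrequencySketch:
    """Hypothetical sliding-window model; not LogWrap code."""

    def __init__(self, threshold=3, interval=60, reset=True):
        self.threshold = threshold
        self.interval = interval
        self.reset = reset
        self.arrivals = defaultdict(deque)  # key -> recent arrival times

    def allow(self, key, now):
        """Return True if the chain may proceed for an event seen at time now."""
        times = self.arrivals[key]
        times.append(now)
        # Forget arrivals that have fallen out of the window ending at now.
        while times and now - times[0] >= self.interval:
            times.popleft()
        if len(times) < self.threshold:
            return False
        if self.reset:
            times.clear()  # earlier arrivals no longer count
        return True


stamps = [0, 4, 20, 22, 25, 26]

resetting = WaitToFrequencySketch(threshold=3, interval=10, reset=True)
print([resetting.allow("42", t) for t in stamps])
# [False, False, False, False, True, False]

keeping = WaitToFrequencySketch(threshold=3, interval=10, reset=False)
print([keeping.allow("42", t) for t in stamps])
# [False, False, False, False, True, True]
```

In the second run the arrivals at 20, 22, and 25 are kept after the threshold is crossed, so the message at 26 is still part of a burst and is allowed through as well.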

LimitToFrequency Event Handler

The LimitToFrequency event handler limits the number of matched events to the configured threshold within the configured interval. In other words, at most n matching events are allowed through in t seconds. The following configuration will send a maximum of ten emails per hour no matter how many messages contain the same error code:

<rule>
  <regexp>Error Code = (\d+), Error Message = (.*)</regexp>
  <handler type="LimitToFrequency" threshold="10" interval="3600" match_on="1"/>
  <handler type="Mail"
           toaddrs="pete@example.com"
           fromaddr="root@example.com"
           subject="Application Error"
           body="\1: \2"/>
</rule>

LimitToFrequency can accept the following options:

threshold (optional)
Specifies the maximum number of matched messages that can arrive within the interval before suppressing further messages and terminating the event handler chain. The value must be a positive integer. The default value is 3.
interval (optional)
Specifies the number of seconds in which the maximum number of matched events must arrive before suppressing further messages and terminating the event handler chain. The value must be a positive integer. The default value is 60.
reset (optional)
Specifies whether the handler should reset its internal count once the threshold has been crossed. If this is not true, the previous matched messages are used to compute the frequency of message arrival. The value is a boolean which is specified using 1 to indicate true, or 0 to indicate false. The default value is 1.
match_on (optional)
See the earlier section on suppression-based matching.
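Rate limiting of this kind can be sketched with a sliding window over the events that were allowed through. This is an illustration only and not LogWrap's implementation. The class name LimitToFrequencySketch, the explicit now argument, and the use of a sliding window rather than fixed time slots are assumptions, and the reset option is deliberately left out of the sketch.

```python
from collections import defaultdict, deque


class LimitToFrequencySketch:
    """Hypothetical sliding-window rate limiter; not LogWrap code."""

    def __init__(self, threshold=3, interval=60):
        self.threshold = threshold
        self.interval = interval
        self.allowed = defaultdict(deque)  # key -> times of allowed events

    def allow(self, key, now):
        """Return True if the chain may proceed for an event seen at time now."""
        times = self.allowed[key]
        # Forget allowed events that are at least one interval old.
        while times and now - times[0] >= self.interval:
            times.popleft()
        if len(times) >= self.threshold:
            return False  # limit reached within the window: suppress
        times.append(now)
        return True


limiter = LimitToFrequencySketch(threshold=2, interval=60)
stamps = [0, 10, 20, 30, 61, 65, 75]
print([limiter.allow("42", t) for t in stamps])
# [True, True, False, False, True, False, True]
```

The first two events pass, the next two are suppressed, and further events pass only as earlier allowed events age out of the 60-second window.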

Conclusion

You have seen how LogWrap's suppression-based event handlers operate and how they are integrated into the framework. In addition, each of the packaged suppression-based event handlers has been presented. If one of these handlers does not meet your suppression needs, you can implement your own by writing a custom event handler. There is a detailed tutorial on the subject.