Application Server: Health System: Tracking And Reporting Anomalies
From Resin 4.0 Wiki
Latest revision as of 00:00, 28 January 2012
Monitoring Application Server Health Through Statistical Analysis of JMX Attributes
The Resin Application Server health system provides many useful tools to monitor, report, and alert on the health of your application server. Monitoring of typical metrics such as high CPU, low memory, and deadlocked threads is pre-configured for you in health.xml. We also include appropriately conservative remediation actions in health.xml, such as triggering thread dumps, heap dumps, and restarts when necessary. It's up to you to tweak these settings to increase or decrease the aggressiveness of the health system as you see fit.
Resin goes beyond typical metrics monitoring by looking for anomalies in JMX attributes.
Any numeric attribute of any MBean in JMX can be configured as a Meter in Resin, which then enables:
- Persistent historical tracking
- Visual graphing in resin-admin
- Visual graphing in PDF reports
- Cluster wide reporting
- Health monitoring
- Anomaly analysis and logging
- Triggering health actions (heap dump, thread dump, restart, etc)
Creating a Meter
Resin comes pre-configured with a set of common meters in health.xml. When adding new meters or anomaly analyzers, we recommend you create a new file in conf/resin-inf, which Resin imports automatically. This makes future upgrades simpler. Alternatively, you can add meters directly to conf/health.xml.
conf/resin-inf/my-meters.xml:

  <resin xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

    <health:JmxMeter>
      <name>JVM|Thread|JVM Blocked Count</name>
      <objectName>resin:type=JvmThreads</objectName>
      <attribute>BlockedCount</attribute>
    </health:JmxMeter>

  </resin>
In this example we've created a JmxMeter on the attribute BlockedCount of the MBean resin:type=JvmThreads. This is an important attribute to track, since it reports the number of blocked threads, and a significant increase in that value can indicate a serious issue.
We also provide JmxDeltaMeter, which reports the difference between the current and previous attribute values.
  <health:JmxDeltaMeter>
    <name>JVM|Compilation|Compilation Time</name>
    <objectName>java.lang:type=Compilation</objectName>
    <attribute>TotalCompilationTime</attribute>
  </health:JmxDeltaMeter>
Above, a delta meter is created for compilation time, another important metric to monitor.
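To make the idea of a delta meter concrete, here is a minimal Java sketch that reads a numeric JMX attribute from the platform MBean server and reports the change since the previous sample. It is an illustration only, not Resin's JmxDeltaMeter implementation: the class name DeltaMeterSketch and its methods are hypothetical, and returning 0 for the first sample is an assumption. The java.lang:type=Compilation MBean may be missing on JVMs without a JIT compiler.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Hypothetical illustration only; not Resin's JmxDeltaMeter implementation. */
final class DeltaMeterSketch {
    private final MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    private final ObjectName objectName;
    private final String attribute;
    private double previous = Double.NaN; // NaN marks "no sample taken yet"

    DeltaMeterSketch(String objectName, String attribute) throws Exception {
        this.objectName = new ObjectName(objectName);
        this.attribute = attribute;
    }

    /** Reads the current value of the numeric attribute from JMX. */
    double read() throws Exception {
        Object value = server.getAttribute(objectName, attribute);
        return ((Number) value).doubleValue();
    }

    /** Returns current minus previous sample; 0 on the first call (assumption). */
    double sampleDelta() throws Exception {
        double current = read();
        double delta = Double.isNaN(previous) ? 0.0 : current - previous;
        previous = current;
        return delta;
    }

    public static void main(String[] args) throws Exception {
        // Same MBean and attribute as the JmxDeltaMeter configuration above.
        DeltaMeterSketch meter =
            new DeltaMeterSketch("java.lang:type=Compilation", "TotalCompilationTime");
        meter.sampleDelta();          // establish the baseline
        Thread.sleep(1000);
        System.out.println("compilation time delta (ms): " + meter.sampleDelta());
    }
}
```

A plain meter corresponds to read() alone. The delta form suits attributes that only ever grow, such as total compilation time, where the rate of change is more informative than the running total.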
Please refer to the resin-doc section on Health Meters for more information.
Analyzing a Meter
Meters alone are useful for manual inspection in resin-admin, since every meter can be graphed. However, Resin also provides an automatic analysis tool called AnomalyAnalyzer. AnomalyAnalyzer compares the current meter value against the meter's average value and checks for deviations, so unusual changes such as a spike in blocked threads can be detected.
  <health:AnomalyAnalyzer>
    <meter>JVM|Thread|JVM Blocked Count</meter>
    <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
  </health:AnomalyAnalyzer>
In this example we've created an AnomalyAnalyzer on the blocked thread meter we created above, and assigned it to the health event "caucho.thread.anomaly.jvm-blocked". The health-event attribute is optional. Without a health-event, an anomaly analyzer alone will only log anomalies it detects to the resin log at WARNING level. These alerts also show up in PDF reports. An example anomaly log is shown below:
2012-01-20 16:10:00 AnomalyAnalyzer JVM|Thread|JVM Runnable Count WARNING value=3.000, deviation=9.487 sigma mean=2.011 std=0.104 n=92.0
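The fields in this log line suggest that the deviation is expressed in standard deviations (sigma) from the running mean. The Java sketch below shows that calculation under this assumption. It is not Resin's AnomalyAnalyzer code: the class SigmaSketch, its methods, and the use of Welford's running-variance method are all illustrative choices.

```java
/** Hypothetical sketch of sigma-deviation scoring; not Resin's AnomalyAnalyzer code. */
final class SigmaSketch {
    private long n;      // number of samples seen
    private double mean; // running mean
    private double m2;   // running sum of squared differences from the mean

    /** Folds a new sample into the running statistics (Welford's method). */
    void add(double x) {
        n++;
        double d = x - mean;
        mean += d / n;
        m2 += d * (x - mean);
    }

    double mean() {
        return mean;
    }

    /** Sample standard deviation; 0 until at least two samples are seen. */
    double std() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }

    /** Distance of value from the running mean, in standard deviations. */
    double sigma(double value) {
        double s = std();
        return s == 0.0 ? 0.0 : Math.abs(value - mean) / s;
    }

    /** The same calculation from already-known statistics. */
    static double sigma(double value, double mean, double std) {
        return Math.abs(value - mean) / std;
    }

    public static void main(String[] args) {
        // Rounded figures taken from the example log line above.
        System.out.println(sigma(3.000, 2.011, 0.104));
    }
}
```

Plugging in the rounded figures from the log line (value=3.000, mean=2.011, std=0.104) gives roughly 9.5 sigma, close to the logged 9.487. The small gap is consistent with the printed mean and std having been rounded to three decimals, though that explanation is itself an assumption.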
Reacting to Anomalies
Resin's health system provides a set of remediation actions that you can configure to automatically execute in reaction to an anomaly. The <health-event> attribute we configured above allows us to tie health actions to a detected anomaly, as shown below:
  <health:DumpThreads>
    <health:IfHealthEvent regexp="caucho.thread"/>
    <health:IfNotRecent time="15m"/>
  </health:DumpThreads>
In this example we've created a DumpThreads action with two conditions. The first condition, IfHealthEvent, tells the action to execute only if the health event starts with "caucho.thread". The second condition, IfNotRecent, prevents the action from executing more than once every 15 minutes.
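The combined effect of the two conditions can be sketched in Java as a small gate that an action must pass before it runs. This is a hedged illustration, not Resin's implementation: GatedActionSketch and tryRun are hypothetical names, and treating the regexp as a prefix match (via Matcher.lookingAt) is an assumption based on the "starts with" wording above.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

/** Hypothetical sketch of the two conditions; not Resin's implementation. */
final class GatedActionSketch {
    private final Pattern eventPattern;
    private final Duration minInterval;
    private Instant lastRun; // null until the action has run once

    GatedActionSketch(String regexp, Duration minInterval) {
        this.eventPattern = Pattern.compile(regexp);
        this.minInterval = minInterval;
    }

    /** Returns true, and records the run time, only when both conditions hold. */
    boolean tryRun(String healthEvent, Instant now) {
        // Assumption: prefix match, mirroring "starts with" in the text.
        boolean eventMatches = eventPattern.matcher(healthEvent).lookingAt();
        // The action has never run, or the minimum interval has fully elapsed.
        boolean notRecent = lastRun == null
            || Duration.between(lastRun, now).compareTo(minInterval) >= 0;
        if (eventMatches && notRecent) {
            lastRun = now;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        GatedActionSketch dumpThreads =
            new GatedActionSketch("caucho.thread", Duration.ofMinutes(15));
        Instant now = Instant.now();
        System.out.println(dumpThreads.tryRun("caucho.thread.anomaly.jvm-blocked", now));
        System.out.println(dumpThreads.tryRun("caucho.thread.anomaly.jvm-blocked",
                                              now.plusSeconds(60)));
    }
}
```

In this sketch the first call passes both conditions and the second, one minute later, is suppressed by the 15 minute window. Without such a window, a sustained anomaly would trigger a thread dump on every detection.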
Resin provides many other useful conditions that can be applied to any health action.
Here is the example in full, which belongs in conf/resin-inf/my-meters.xml:
  <resin xmlns="http://caucho.com/ns/resin"
         xmlns:resin="urn:java:com.caucho.resin"
         xmlns:health="urn:java:com.caucho.health"
         xmlns:ee="urn:java:ee">

    <health:JmxMeter>
      <name>JVM|Thread|JVM Blocked Count</name>
      <objectName>resin:type=JvmThreads</objectName>
      <attribute>BlockedCount</attribute>
    </health:JmxMeter>

    <health:AnomalyAnalyzer>
      <meter>JVM|Thread|JVM Blocked Count</meter>
      <health-event>caucho.thread.anomaly.jvm-blocked</health-event>
    </health:AnomalyAnalyzer>

    <health:DumpThreads>
      <health:IfHealthEvent regexp="caucho.thread"/>
      <health:IfNotRecent time="15m"/>
    </health:DumpThreads>

  </resin>
Full documentation on Resin's Application Health System is available in the public resin-doc.