Posts Tagged jamon

Jenkins as monitoring platform of the poor

The goal

The goal was to monitor some html page and wsdl availability. I don’t really have access to all the monitoring infrastructure and wanted to check my development servers. I was looking for a lightweight way of monitoring them. I’ve mixed jenkins and groovy and ended up to pretty and low-cost monitoring solution 😉

Install the necessary plugin and tools

Manage Jenkins > Manage Plugins :
Groovy plugin : This plugin adds the ability to directly execute Groovy code.
Green Balls : Changes Hudson to use green balls instead of blue for successful builds
Groovy Postbuild Plugin : This plugin executes a groovy script in the Jenkins JVM. Typically, the script checks some conditions and changes accordingly the build result, puts badges next to the build in the build history and/or displays information on the build summary page.

Manage Jenkins > Configure System : Groovy > Groovy installations or Install automatically
For the groovy plugin you can use the built-in tool installer or just point it to a unzipped binary of groovy
GROOVY_HOME : /opt/groovy/

So let’s create a free-style jenkins job with the following settings

Discard Old Builds : Max # of builds to keep : 100
Build periodically : */10 * * * *
Execute Groovy Script : groovy command :

servers = ['ex1.server.com','ex2.server.com','ex3.server.com']

wdsls=[]
simpleurls=[]
servers.each() {host ->
   wdsls.add("http://${host}/ws/MyWebService?wsdl")
   simpleurls.add("http://${host}/ui/MyConsole.html")
}

def koCount=0;
def slowCount=0;
def checkUrl = { url, check ->
 def status ='KO'
 def host =''
   start= System.currentTimeMillis()
    try {
     myurl = new URL(url)
       host =myurl.getHost()
       def text = myurl.getText(connectTimeout: 10000, readTimeout: 10000)
       def ok = check(url,text)
      status = ok?'OK':'KO';
      if (!ok) {koCount++}
    } catch (Throwable t) {
       koCount++
       }
    end= System.currentTimeMillis()
    if ((end-start)>100)
       slowCount++
    println "$host\t"+status+'\t'+(end-start)+'\t'+' '+url
}
def checkAllUrl =  {urls, check -> urls.each() {url ->checkUrl(url,check)}}
def wsdlCheck = {url,content -> content.contains("wsdl:definitions")}
def pingCheck = {url,content -> content.contains("status=NORMAL")}
def contentCheck = {url,content -> content.contains("login")}

checkAllUrl (wdsls,wsdlCheck )
checkAllUrl (simpleurls,contentCheck)

println "ko.count="+koCount
println "slow.count="+slowCount

if (koCount>0 || slowCount >0) {
    System.exit(-1)
}

Build two lists of urls : wsdls, simpleurls based on a list of servers.
A first closure checkUrl get the content of an url and update counters ok, ko
, it’s also receiving another closure that will check the expected content of the url content.
Now depending on the kind of content call the checkAllUrl with matching check closure wsdlCheck ,contentCheck,….

Add a Groovy Postbuild : Groovy script:

def addShortTextSlow = { comp,shortcomp->
matcher = manager.getMatcher(manager.build.logFile, comp+".count=(.*)\$")
if(matcher?.matches()) {
    manager.addShortText(shortcomp+' '+matcher.group(1), "grey", "white", "0px", "white")
}
}
addShortTextSlow('slow','slow')
addShortTextSlow('ko','ko')

That’s it !

Subscribe to the jenkins “RSS for failures” feed or your preferred jenkins notification tool and benefit from the jenkins built-in ui !

You have an history of the checks :

jenkins-monitoring-history

And trending
jenkins-monitoring-trend

You can easily embed this graph or the green/red ball in jira our your wiki :


<a href="http://myjenkins.com/job/monitoring/lastBuild/consoleText">
    <img src="http://myjenkins.com/job/monitoring/buildTimeGraph/png" alt="200" title="200" border="0"/>
</a>

<a href="http://myjenkins.com/job/monitoring/lastBuild/consoleText">
    <img src="http://myjenkins.com/job/monitoring/lastBuild/buildStatus" border="0">
</a>

The sky is the limit !

Ok now you got the idea… let’s add some checks to gather

— check some open ports :

try {
    s = new Socket(host, port);
    s.withStreams { input, output ->	}
    println "management port ok $host $port"
} catch (Exception e){
    koCount++
    println "management port KO for  $host $port : "+e.getMessage()
}	

— access jmx beans

import javax.management.remote.*
def serverUrl = new JMXServiceURL('service:jmx:rmi:///jndi/rmi://ex1.server.com:9999/jmxrmi')
def server = JMXConnectorFactory.connect(serverUrl).MBeanServerConnection;
def memory = new GroovyMBean(server, 'java.lang:type=Memory')
println memory.listAttributeNames() 
println memory.listOperationNames() 

— some jamon statistics :

jamonurls=[]
jamonurlsuffix='/jamonadmin.jsp?sortCol=2&sortOrder=desc&displayTypeValue=RangeColumns&RangeName=ms.&outputTypeValue=xml&formatterValue=%23%2C%23%23%23&TextSize=0&highlight=&ArraySQL=^WS-|^Fault&'

servers.each() {host ->	jamonurls.add("http://${host}"+jamonurlsuffix)}

def fixJamonXml= {
	xml ->
	if (xml.indexOf("No data was returned") != -1) {
		return '<JAMonXML></JAMonXML>';
	}
	String content = xml.substring(xml.indexOf('<JAMonXML>'));
	rangeLabels = [ "0_10ms", "10_20ms","20_40ms","40_80ms","80_160ms","160_320ms","320_640ms","640_1280ms","1280_2560ms","2560_5120ms","5120_10240ms","10240_20480ms"];
	content = content.replaceAll( '<Label>','<Label><![CDATA[');
	content = content.replaceAll( '</Label>',']]></Label>');
	rangeLabels.each() {
		rangeLabel -> content = content.replaceAll(rangeLabel, "range_" + rangeLabel);
	}
	content = content.replaceAll(  "LessThan_0ms", "range_LessThan_0ms");
	content = content.replaceAll( "GreaterThan", "range_GreaterThan");
	return content
}

def jamonCheck= {
	url,content ->
	monitors = []
	def JAMonXML = new XmlSlurper().parseText(fixJamonXml(content))
    def parseLong =  { t ->  if (t.text().equals("")) return null; Long.valueOf(t.text().replaceAll(',', ''))}
    def parseLongString =  { t ->  if (t.equals("")) return null; Long.valueOf(t.replaceAll(',', ''))}
    def parseRange = {
		rangeText ->	// 15/10.2 (0/0/0)
		// http://docs.codehaus.org/display/GROOVY/Tutorial+5+-+Capturing+regex+groups
		rangeFormat = /(.*)\/(.*) \((.*)\/(.*)\/(.*)\)/
		matched = ( rangeText.text() =~ rangeFormat )
		if (matched.matches()) {
			return [	'label':rangeText.name() , hits : parseLongString(matched[0][1]),average:matched[0][2]]
		}
		return [	'label':rangeText.name() , hits : 0,average:0.0]
	}
	println "************************"+ url
	JAMonXML.children().each() { row ->
		monitors.add( [
			'label' : row.Label,
			'units' : row.Units,
			'hits' : parseLong(row.Hits),
			'avg'  : parseLong(row.Avg),
			'total' : parseLong(row.Total),
			'stddev' : parseLong(row.StdDev),
			'lastvalue': parseLong(row.LastValue),
			'min' : parseLong(row.Min),
			'max' : parseLong(row.Max),
			'active' : parseLong(row.Active),
			'avgActice':parseLong(row.AvgActive),
			'maxActice':parseLong(row.MaxActive),
			'firstAccess':row.FirstAccess,
			'lastAccess' : row.LastAccess,
			'ranges' : [
				'range_LessThan_0ms' :parseRange(row.range_LessThan_0ms),
				'range_0_10ms' : parseRange(row.range_0_10ms),
				'range_10_20ms' : parseRange(row.range_10_20ms) ,
				'range_20_40ms' : parseRange(row.range_20_40ms),
				'range_40_80ms':parseRange(row.range_40_80ms),
				'range_80_160ms' : parseRange(row.range_80_160ms) ,
				'range_160_320ms' : parseRange(row.range_160_320ms),
				'range_320_640ms' : parseRange(row.range_320_640ms),
				'range_640_1280ms' : parseRange(row.range_640_1280ms),
				'range_1280_2560ms' : parseRange(row.range_1280_2560ms) ,
				'range_2560_5120ms' : parseRange(row.range_2560_5120ms),
				'range_5120_10240ms' : parseRange(row.range_5120_10240ms),
				'range_10240_20480ms' : parseRange(row.range_10240_20480ms),
				'range_GreaterThan_20480ms': parseRange(row.range_GreaterThan_20480ms)]
		] )
	}
	/**
	 *  1      0      10ms
		2     10      20ms
		3     20      40ms
		4     40      80ms 
		5     80     160ms
		6    160     320ms
		7    320     640ms
		8    640    1280ms
		9   1280    2560ms
		10  2560    5120ms
		10  5120   10240ms
		12 10240   20480ms
		13 >>      20480ms
	 */
	def getPercentiles = {monitor ->
	    def ps = [0.5,0.8,0.9,0.95,0.98,0.99]
		def ranges = [];
		monitor.ranges.eachWithIndex() {it, i -> ranges.add(it.value.hits) }	
		def rangesCumulative  = [];	 
		(0..13).each() {i -> rangesCumulative.add (monitor.hits>0?ranges[i]/monitor.hits:0)}
		def percentages= (0..13).collect() {i -> rangesCumulative[1..i].sum()}
		def percentiles = ps.collect{ percentile->percentages.findIndexOf{it>=percentile}}
	   return percentiles
    }
	percentileserrors = [];
	monitors.each {
		percentiles = getPercentiles(it)
		println percentiles.join('\t') + "\t"+it.label
		if (percentiles[2]>8) {
			percentileserrors.add(it.label)
		}		
	}	

	return percentileserrors>0;
}
checkAllUrl (jamonurls,jamonCheck)

, , , ,

4 Comments

Performance : when average is not enough ?

Application performance can’t be summarized to an average and a standard deviation. Most performance issues aren’t so clear… jamonapi can help identifying your bottlenecks

Same average, same standard deviation, not same reality

Most application performance solutions are collecting performance data and only keep average and standard deviation (stddev). But application performance rarely follows normal distribution. Two samples with the same average and stddev doesn’t imply happy users.

Let’s suppose you have a first release of your app and see an histogram like this one

Most users are happy with a average of 1.9 seconds and standard deviation of 0.6 seconds

Let’s introduce our version 2.0 of the application. Our monitoring still shows an average of 1.9 seconds and standard deviation of 0.6 seconds.

But you receive a lot of feedback : 50%  of your end-users are complaining about bad performance… what’s going on ?

on the left the happy users… and on the right your unhappy end-users !
Hopefully you can easily instrument your application with jamon and discover this distribution.

Jamon is to System.currentTimeMillis() what log4j is to System.out.println()

jamon

Jamon collects “stop/start” events and aggregates the logarithmic distribution of these events/monitor.

  • 0-10ms. 11-20ms. 21-40ms. 41-80ms. 81-160ms. 161-320ms. 321-640ms.
  • 641-1280ms. 1281-2560ms. 2561-5120ms.
  • 5121-10240ms. 10241-20480ms. >20480ms.

It also keeps for each monitor additional informations like :

  • Hits
  • Avg ms.
  • Total ms.
  • Std Dev ms.
  • Min ms.
  • Max ms.
  • Active
  • Avg Active
  • Max Active
  • First access
  • Last access

The active, avg active and max active shows the degree of concurrency of your monitor.

Jamon feature and advantages :

  • easy installation : drop 3 jars, a bunch of jsp that’s it
  • production ready with low overhead
  • a servlet filter to monitor url time response by just modifying the web.xml
  • datasource wrapper to gather sql statistics (just an extra bean in your application)
  • spring integration via aop JamonPerformanceMonitorInterceptor
  • for non web application like batch processing or junit, you can write the jamon stats to a csv via a jmx console or a the end of the process.

Real life usage

0. mesure don’t guess
1. enable it in production
2. sort by total time
3. detect what can be improved in these use case : db indexes, hibernate batch-size,…
4. fix and rollout in production
5. goto 1 😉

Alternative to jamon

  • codahale metrics looks really promising with implementation for Gauges, Ratio Gauges, Percent Counters, Histogram, Meters,… integration with Guice, Jetty, Log4j, Apache HttpClient, Ehcache, Logback, Spring, Ganglia, Graphite
  • javasimon : quantiles, hierarchichal, nanoseconds,… but jdk 1.6 and I’m stuck with websphere 😦
  • datasource-proxysql no distribution but can summarize sql interactions per http request.But can be linked with other librairies

, ,

4 Comments