A common problem I’ve experienced in past projects is measuring the performance of some part of an application.
Have you ever refactored a routine and needed to measure the performance difference between the original and the new code? Has a customer ever told you your application is too slow? (Seldom does a customer tell you the application is too fast!). As you add new features to your application and you deploy new versions, do you know the changes in your latest version’s performance? Have you experienced performance issues as your application’s user base grows? Can you identify bottlenecks in your application and determine how memory, cpu utilization, or database access are affecting your application’s performance?
I don’t know about you—maybe I’m just unlucky—but almost every large application I’ve dealt with has had to deal with performance problems.
Performance issues may arise from poorly coded algorithms, poor memory management, excessive network calls, excessive disk activity or anemic hardware. Often bottlenecks don’t become apparent until the user load increases. Have you ever worked on a project where a team member said (or you said yourself), “It works great on my desktop! What do you mean there’s a problem when 500 users try to log in when they come into the office in the morning?”
Before you can effectively respond to performance issues in your application, you need to have a baseline measurement. If you don’t have a baseline, you are immediately in reactive mode when dealing with a failing system. Examples of reactive mode behaviors include trying to placate the customer(s) while working overtime to fix a problem, recycling the app pool to flush memory, or investigating the cause of database locks.
You need to be proactive and build instrumentation into your application as you develop it, and acquire performance data during development and testing before deploying to production.
In my experience, when trying to identify performance hot spots, the development team used ad hoc methods to acquire performance data.
I’ve seen a variety of techniques used in past projects to instrument an application. The most common way is adding logging statements using a framework like Log4Net or the Enterprise Application Logging block. Sometimes the development team created their own ‘framework’ using writes to a custom database table or flat file. In addition to the liberal sprinkling of write/trace statements through the code base is frequent usage of the Stopwatch class to capture timings.
Ad hoc solutions like these often create several problems.
First, the ad hoc solution provides no standard tooling available to control the data capture. Often the data capture is controlled by editing a configuration. Performance data is usually targeted to a flat file or a database. Sometimes custom code is needed to control the destination of the data—to a database or flat file, and additional custom code is created for circular buffering.
Often the instrumentation to capture performance metrics doesn’t work well for a production environment. The instrumentation is added as an afterthought and a special hotfix is deployed to production containing the instrumentation. After the performance issue is resolved, the instrumentation is removed and the fixed application is deployed.
Another problem with many ad hoc solutions is called the observer effect. The observer affect is where measurements of a system cannot be made without affecting the system. Many ad hoc solutions, in adding statements to capture data to a flat file or database may cause changes in the application’s performance as the process writes to the disk or performs database inserts.
And finally, many ad hoc solutions provide a narrow view of an application’s performance problems. Ad hoc solutions I’ve seen make it hard to see a holistic view of the environment in which you can examine memory, CPU usage, disk I/O, or network traffic, in relationship to data collected for the application.
A solution I’ve been exploring which avoids the above problems uses the Event Tracing for Windows (ETW) subsystem to instrument your application.
Prior to .NET 4.5, using the ETW subsystem to instrument an application has been difficult .NET developers. Although the ETW subsystem has been part of the Windows operating systems since Windows 2000, the interfaces to connect to ETW were extremely unfriendly to the .NET programmer. Interfacing to ETW required the user to create a manifest file, register the manifest file, use a resource compiler, and interface to several native methods.
With the introduction of the EventSource class in .NET 4.5, interfacing to the ETW subsystem has become as easy as falling off a log. *
The ETW subsystem dramatically simplifies capturing data to measure the performance of your application. ETW and tools using it offer dramatic advantages over ad hoc solutions for controlling event generation and capturing events for analysis in development and production systems.
The following posts in this series will give an overview of the ETW system, using tools to control data collection through ETW, developing your own tools, and present ideas and techniques to effectively instrument your application.
* Although the EventSource class is built into Net 4.5, for projects which are not yet able to update to NET 4.5, you can reference the EventSource.DLL in your NET 4.0 project.