Instrumenting Your Application to Measure Its Performance, Part 2: An Overview of the ETW Subsystem

 

The ETW (Event Tracing for Windows) subsystem has been written about extensively in other blogs and magazine articles. My goal here is to give a brief overview of the subsystem that provides a common context for future articles on this blog about using ETW. To dig deeper, I suggest the following MSDN Magazine article as a starting point:

Improving Debugging and Performance Tuning with ETW
http://msdn.microsoft.com/en-us/magazine/cc163437.aspx

ETW provides a high-speed, low-overhead, system-wide data collection subsystem.

The following is an architectural view of the ETW system copied from the article referenced above:

[Figure: architectural view of the ETW system, from the MSDN Magazine article referenced above]

Let’s simplify what this diagram is about.

An Event is a message to the ETW system containing data for logging. In the context of these articles, an Event is a message generated using the EventSource class.

A Controller is an application that interfaces with the ETW subsystem to enable or disable event collection. A controller is usually an out-of-the-box application such as Logman, PerfView, or the Semantic Logging Application Block (SLAB). A controller can also be your own custom application written to interface with ETW.

A Provider is an application sending event messages to the ETW subsystem. In the context of these articles, a provider is your application, using the EventSource class to send messages to the ETW subsystem.

A Consumer is an application that gets events from the ETW subsystem. The consumer may read and display the events to the user in real time or read events from a file.

A Session is the ‘pipe’ between ETW and the controller and establishes the internal buffers and memory management of event collection to an Event Trace Log (ETL) file.

Often applications are both controllers and consumers. For example, PerfView can be used as a controller to enable and disable one or more event providers. After event collection stops and ETW finishes writing events to an ETL file, PerfView reads (consumes) the events from that file for analysis.

As another example, you can use Logman as a controller to enable and disable event generation, then use PerfView to consume the events from an ETL file.

Similarly, SLAB is often both a controller and consumer. SLAB can be configured to enable (control) events and consume events, directing the captured events to a console, a flat file, a rolling flat file, a database table, or the NT EventLog.

Pub-Sub Pattern

For those familiar with the Publisher Subscriber pattern, you may be thinking “That ETW Subsystem looks a lot like a Pub Sub system”. And I would agree with you.

From Wikipedia Publish-Subscribe pattern
http://en.wikipedia.org/wiki/Publish%E2%80%93subscribe_pattern

In software architecture, publish–subscribe is a messaging pattern where senders of messages, called publishers, do not program the messages to be sent directly to specific receivers, called subscribers. Instead, published messages are characterized into classes, without knowledge of what, if any, subscribers there may be. Similarly, subscribers express interest in one or more classes, and only receive messages that are of interest, without knowledge of what, if any, publishers there are.

Let’s reword the above Wikipedia passage a little to put it into context of ETW with events generated from an application using the EventSource class.

The ETW subsystem uses the publish-subscribe pattern, where senders of events, called providers, do not program the events to be sent directly to specific receivers, called consumers. Instead, published events are characterized into classes (derived from EventSource) without knowledge of what, if any, consumers (such as PerfView or SLAB) there may be. Similarly, controllers express interest in one or more events (defined in classes derived from EventSource), and consumers only receive events that are of interest, without knowledge of what, if any, providers there are.
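To make the roles concrete, here is a minimal Python sketch of the pattern. It is a toy in-memory model, not the ETW API: the `Session`, `Provider`, and `consume` names and the "MyCompany-Sales" provider name are all hypothetical, and a plain list stands in for the ETL file.

```python
class Session:
    """Toy stand-in for an ETW session (hypothetical, not the real API)."""

    def __init__(self):
        self.enabled = set()  # provider names the controller has turned on
        self.buffer = []      # stands in for the ETL file

    # Controller role: choose which providers are collected.
    def enable(self, provider_name):
        self.enabled.add(provider_name)

    def disable(self, provider_name):
        self.enabled.discard(provider_name)

    # Called by providers; events from disabled providers are dropped.
    def write(self, provider_name, event_name, payload):
        if provider_name in self.enabled:
            self.buffer.append((provider_name, event_name, payload))


class Provider:
    """Provider role: publishes events without knowing who consumes them."""

    def __init__(self, name, session):
        self.name = name
        self._session = session

    def write_event(self, event_name, **payload):
        self._session.write(self.name, event_name, payload)


def consume(session):
    """Consumer role: read back whatever the session recorded."""
    return list(session.buffer)


session = Session()
sales = Provider("MyCompany-Sales", session)

sales.write_event("OrderPlaced", order_id=1)  # dropped: provider not enabled yet
session.enable("MyCompany-Sales")             # controller turns the provider on
sales.write_event("OrderPlaced", order_id=2)  # recorded

events = consume(session)
```

Note that the provider code is identical whether or not anyone is listening; only the controller's decision changes what ends up in the buffer.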

Remember that the ETW subsystem has been packaged with Windows since Windows 2000. The beauty of the ETW system is that it has so many providers within the Windows ecosystem. These include the .NET garbage collector, the disk I/O subsystem, and the CPU, where you can actually drill down on activity per core on your machine.

The advantage of ETW over ad hoc solutions is that it handles memory, thread, and buffer management for you, providing a high-speed, low-overhead data collection system that is external to your process.

How fast is ETW, and how much overhead does it add when you use it to instrument your application?

The following snippet is from Vance Morrison’s blog post, Windows high speed logging: ETW in C#/.NET using System.Diagnostics.Tracing.EventSource.

http://blogs.msdn.com/b/vancem/archive/2012/08/13/windows-high-speed-logging-etw-in-c-net-using-system-diagnostics-tracing-eventsource.aspx

How fast is ETW? Well, that is actually super-easy to answer with ETW itself because each event comes with its own high-accuracy timestamp. Thus you can run the EventSource demo and simply compare the timestamps of back-to-back events to see how long an event takes. On my 5 year old 2Ghz laptop I could log 10 events in 16 usec so it take 1.6 usec to log an event. Thus you can log 600K events a second. Now that would take 100% of the CPU, so a more reasonable number to keep in your head is 100K. That is ALOT of events. The implementation does not take locks in the logging path and any file writes are handled by the operating system asynchronously so there is little impact to the program other than the CPU it took to actually perform the logging.

I encourage you to read the complete referenced post.
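The arithmetic in the quoted passage can be checked in a few lines of Python. The only inputs are the figures from the quote (10 events in 16 microseconds, and a budget of 100K events per second). The CPU fraction at the end is my own derivation and assumes the per-event cost stays constant as the event rate changes.

```python
events_logged = 10
elapsed_usec = 16.0

# Cost of one event, in microseconds (about 1.6).
usec_per_event = elapsed_usec / events_logged

# Upper bound if a core did nothing but log (about 625,000, which the
# quote rounds to 600K).
max_events_per_sec = 1_000_000 / usec_per_event

# Share of one core spent logging at the more reasonable 100K events/sec
# (about 0.16, assuming the per-event cost is constant).
budget_events_per_sec = 100_000
cpu_fraction = budget_events_per_sec * usec_per_event / 1_000_000
```

On the machine described in the quote, then, the 100K figure corresponds to roughly a sixth of one core.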

Because you can collect data from multiple providers, you can obtain a more holistic view of your application’s performance than you can with ad hoc tracing or logging solutions.

Consider the story of three blind men asked to describe an elephant. One blind man stood in front of the elephant and ran his hands along the elephant’s trunk. The second blind man stood beside the elephant and ran his hands along the elephant’s side. The third blind man stood behind the elephant and ran his hand along the elephant’s tail. Each blind man tried to describe an elephant. Because each had an incomplete view of the elephant, none of the blind men described the elephant correctly.

Consider a large-scale, enterprise-wide application: one built in multiple layers, using a service-oriented architecture and multithreading, with multiple touch points and multiple user loads. If you use ad hoc approaches to capture performance data in such an application, you may find that your application is the elephant and you are one of the story’s blind men.

The beauty of the ETW system is that once your application becomes a provider, you can use a controller to enable events from your application as well as other providers and consolidate that data to get a holistic system view.

You can partition your application into areas of responsibility and use features of EventSource to filter specific parts of the application.

Think of your application in terms of contexts, where each context becomes an event provider. For example, consider a sales order application. Taking the idea of Bounded Context from domain-driven design, you may view your sales order application as groups of responsibility. One area of the application is responsible for customer service, another for product returns, another for sales, and another for billing. By putting contextual boundaries around parts of your application, you give consideration to which ‘context’ a piece of code belongs to.

In terms of instrumentation, your application is not just a sales order application but a grouping of contexts, for example a sales context and a billing context. By viewing your application as groups of contextual responsibilities, you can provide finer-grained control of application instrumentation.
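One way to picture this finer-grained control is a controller that enables each context's provider separately and, within a provider, asks only for events tagged with certain keyword bits. The Python sketch below is a conceptual model under those assumptions, not EventSource code: the provider names, the `Keywords` flags, and the `is_enabled` helper are all hypothetical.

```python
from enum import IntFlag


class Keywords(IntFlag):
    """Hypothetical keyword bits used to tag events inside a context."""
    DATABASE = 0x1
    VALIDATION = 0x2
    PAYMENT = 0x4


def is_enabled(enabled, provider, keywords):
    """Return True if an event from `provider` tagged with `keywords`
    should be collected.

    `enabled` maps a provider name to the keyword mask the controller asked for.
    """
    mask = enabled.get(provider)
    if mask is None:
        return False  # the controller never enabled this context
    return bool(keywords & mask)  # any overlapping bit is a match


# The controller enables only the billing context, and only its
# payment and database events.
enabled = {"MyCompany-Billing": Keywords.PAYMENT | Keywords.DATABASE}

billing_payment = is_enabled(enabled, "MyCompany-Billing", Keywords.PAYMENT)
billing_validation = is_enabled(enabled, "MyCompany-Billing", Keywords.VALIDATION)
sales_database = is_enabled(enabled, "MyCompany-Sales", Keywords.DATABASE)
```

Here only the billing payment event would be collected; the billing validation event and everything from the sales context are filtered out.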

Given a basic understanding of the ETW subsystem and how using it can help identify performance problems, in the next article of this series I’ll describe the simplest way to make your application an event provider and how to use the Logman and PerfView tools.
