credits: jaegertracing.io
Let’s say your services get thousands of requests per second, you log enormously and you don’t miss any exception in your logging system. Suddenly, the operation team sent you a message pointing to an error for a customer with id X
on production and you couldn’t understand why all the people around you are over-stressed… and then the penny dropped. The customer X
is actually a friend of the boss! (true story)
Ok… Let’s see what you can do:
You searched X
on your logging system and waited forever as there are TBs of logs. Then you tried it again with a more detailed query such asuserId=X not in this and that but in this at time t
, and got the resu… oh, another problem! Error: wrong query syntax.
Your teammate said that you need to replace not in this
with in not this.
After you also fixed that and waited about 10 secs, 24 results appeared on the screen! Now you are ready to analyze them.
… 10 minutes later…
You figured out that ServiceA
called service ServiceB
and as method LongRunningProcess
in ServiceB
took 3 seconds, service A
returned 500.
After all, you are glad to find the bug in about 20 minutes.
Let’s replay the process assuming that you use Open Tracing and Jaeger.
userId=X
to tags and set the date
Example request that Jaeger traced
In couple of seconds, you can understand that LongRunningProcess
took actually 3 seconds, and as a result, ServiceA
returned 500.
Open tracing is an open standard for distributed tracing, and Jaeger is the tool that implements the standard. Please check the Jaeger architecture from it’s official documentation beforehand.
Today, we are gonna create 2 APIs and 1 console client for our demo environment using .net core. I will also use a simple wrapper that I wrote as it brings simplicity in my opinion.
Let’s start!
If you have .net core
and docker
installed on your workspace, you are ready to go.
Get the docker image for Jaeger
and run it:
docker run -d --name jaeger \ -e COLLECTOR_ZIPKIN_HTTP_PORT=9411 \ -p 5775:5775/udp \ -p 6831:6831/udp \ -p 6832:6832/udp \ -p 5778:5778 \ -p 16686:16686 \ -p 14268:14268 \ -p 9411:9411 \ jaegertracing/all-in-one:1.6
Clone the repo I prepared for this article: https://github.com/skynyrd/opentracing-with-jaeger
We have one .net core solution containing 4 projects: ConsoleClient
, ServiceA
, ServiceB
and JaegerWrapper.
We are going to walk thru all of them. Let’s start with JaegerWrapper
to understand the dynamics of the Jaeger client.
In order to use Jaeger client, we first need to understand how traces work in our code.
A trace is a data/execution path through the system, and can be thought of as a directed acyclic graph of spans.
A span represents a logical unit of work in Jaeger that has an operation name, the start time of the operation, and the duration. Spans may be nested and ordered to model causal relationships.
— Official Jaeger docs.
In an application, we need to have a Tracer that manages our spans. I strongly recommend having one tracer in a microservice for simplicity. After creating the Tracer
and registering it, the jaeger client library starts to analyze the controllers in the system by default, even we don’t need to create a singleTrace
and Span
for that, all created by the OpenTracing.Contrib.NetCore
library. However, if we want to analyze a specific method, or add some additional tag/log to our spans, we need to create a Trace
and bind a Span
to it.
Important Note: If you create and activate a span when there is another active span in the system, it becomes a child. You can see it in the diagram above; parent spans are expandable.
I wrote a simple builder for this purpose and JaegerWrapper
is a class library that contains it, you can use/copy if you like.
_traceBuilder.WithSpanName("LongRunningProcess").WithTag(new StringTag("exampleTag"), "exampleValue").TraceIt(() =>{Thread.Sleep(3000);});
To give an example, this code block is for LongRunningProcess
span represented by a yellow horizontal bar in the diagram above. We added exampleTag: exampleValue
to it and we can even add more complex structures using WithLog
method. Time consumed by TraceIt
is published as a span duration in the GUI. In the example, I used a simple Thread.Sleep(3000)
, but you can also return something (Func
instead of Action
) e.g.:
var result = _traceBuilder....TraceIt(() =>{return "something"}
If you are going to HTTP call a service and you want to preserve your span lifetimes, you need to notify the other service somehow.
For the most cases you would like to use the OpenTracing.Contrib.NetCore
package to automate the configurations, but if you are curious, here is the logic behind that:
Jaeger solves it with HTTP headers that are going to append to the request. You can also use JaegerWrapper
for more abstraction:
For ServiceA
and ServiceB
we need to register the services for Jaeger:
// In ConfigureServices method of Startup class:
GlobalTracer.Register(Tracer);services.AddOpenTracing();
If you also want to use the wrapper, you can simply add these:
var serviceProvider = services.BuildServiceProvider();
services.AddScoped<ITraceBuilder>(t =>new TraceBuilder(serviceProvider.GetService<ITracer>()));
Have a look at AWorldController
of ServiceA
, we call ServiceB
there but didn’t use any JaegerClient or JaegerWrapper method. This is because we added OpenTracing.Contrib.NetCore
library and it traces our requests in the black box, magically.
private static async Task<dynamic> GetBObject(string id){var httpClient = new HttpClient{BaseAddress = new Uri("http://localhost:7334")};
**var** result = **await** httpClient.GetAsync($"/bworld/id/{id}");
**if** (result.IsSuccessStatusCode)
{
**return await** result.Content.ReadAsAsync<**dynamic**\>();
}
**throw new** Exception("uncovered area.");
}
And check the Main method of the ConsoleClient. This time, as this is not a WebAPI, we manually traced our call using JaegerWrapper.
traceBuilder.WithSpanName("MainWork").WithHttpCall(client, url, HttpMethod.Get).TraceIt(() =>{var response = client.GetAsync(url).Result;
**if** (!response.IsSuccessStatusCode)
**throw new** Exception("uncovered area for the demo.");
**var** responseBody = response.Content.ReadAsStringAsync().Result;
Console.WriteLine(responseBody);
});
I used Zipkin
instead of Jaeger
on production couple of years ago, thus I don’t want to write anything about the performance. But I can say that Zipkin was quite performant. If you have production experience with Jaeger, please comment below, I’m curious about it!
Thanks for reading.