
Opentracing
===========
Dendrite extensively uses the [opentracing.io](http://opentracing.io) framework
to trace work across its different logical components.

At its most basic, opentracing tracks "spans" of work, recording their start and
end times as well as any parent span that caused the piece of work.

A typical example would be a new span being created on an incoming request that
finishes when the response is sent. When the code needs to call out to a
different component, a new span is created with the initial span as its parent.
This would end up looking roughly like:
```
Received request                Sent response
|<─────────────────────────────────────────>|
        |<────────────────────>|
        RPC call              RPC call returns
```

Spans like these make it easy to see where time is spent while a component
processes a request. However, opentracing also allows spans to be tracked across
components, which makes it possible to see exactly what work goes into
processing a request:
```
Component 1 |<─────────────────── HTTP ────────────────────>|
                |<──────────────── RPC ─────────────────>|
Component 2        |<─ SQL ─>|   |<── RPC ───>|
Component 3                        |<─ SQL ─>|
```

Tracking across components is achieved by serializing span information in all
communication between components. For HTTP requests, the sender serializes the
span into an HTTP header and the receiver deserializes it on receipt.
(Generally, a new span is then immediately created with the deserialized span
as its parent.)

A collection of related spans is called a trace.

Spans are passed through the code implicitly via Go `context.Context` values
rather than as explicit arguments. It is therefore important that every span is
added to the current context as soon as it is created. Thankfully, the
opentracing library provides helper functions for doing this:
```golang
span, ctx := opentracing.StartSpanFromContext(ctx, spanName)
defer span.Finish()
```
This creates a new span whose parent is the span already in `ctx` (if any), and
returns a new `ctx` that carries the new span.

Adding Information
------------------
Opentracing allows adding information to a trace via three mechanisms:

- "tags" ─ A span can be tagged with a key/value pair. This is typically
  information that relates to the span as a whole, e.g. spans created for
  incoming HTTP requests could include the request path and response code as
  tags, and spans for SQL could include the query being executed.
- "logs" ─ Key/value pairs can be logged at a particular instant in a trace.
  This can be useful to log e.g. any errors that happen.
- "baggage" ─ Arbitrary key/value pairs can be added to a span, and all child
  spans have access to them. Baggage isn't saved and so isn't available when
  inspecting the traces, but it can be used to add context to logs or tags in
  child spans.
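
The toy span below (hypothetical, standard library only) shows how the three mechanisms differ: tags describe the span, logs are timestamped, and baggage is copied into child spans. The method names echo opentracing-go's `SetTag`, `SetBaggageItem` and `BaggageItem`, but the signatures are simplified to strings, and real logging goes through `LogFields`. The tag keys `http.method`, `http.status_code` and `db.statement` follow the opentracing semantic conventions; the tag values and the `user` baggage key are made up.

```golang
package main

import (
	"fmt"
	"time"
)

// LogEntry is a key/value pair recorded at a particular instant.
type LogEntry struct {
	At         time.Time
	Key, Value string
}

// Span is a toy span holding the three kinds of information.
type Span struct {
	Name    string
	Tags    map[string]string // describe the span as a whole; saved with the trace
	Logs    []LogEntry        // timestamped events; saved with the trace
	baggage map[string]string // visible to child spans; not saved with the trace
}

// newSpan starts a span, copying the parent's baggage (if any) into it.
func newSpan(name string, parent *Span) *Span {
	s := &Span{Name: name, Tags: map[string]string{}, baggage: map[string]string{}}
	if parent != nil {
		for k, v := range parent.baggage {
			s.baggage[k] = v
		}
	}
	return s
}

func (s *Span) SetTag(k, v string) { s.Tags[k] = v }

func (s *Span) Log(k, v string) {
	s.Logs = append(s.Logs, LogEntry{At: time.Now(), Key: k, Value: v})
}

func (s *Span) SetBaggageItem(k, v string) { s.baggage[k] = v }

func (s *Span) BaggageItem(k string) string { return s.baggage[k] }

func main() {
	req := newSpan("HTTP request", nil)
	req.SetTag("http.method", "GET")
	req.SetTag("http.status_code", "200")
	req.SetBaggageItem("user", "@alice:example.org") // hypothetical value

	sql := newSpan("SQL", req)
	sql.SetTag("db.statement", "SELECT 1")
	sql.Log("event", "error")
	// The child reads the baggage without it being passed explicitly.
	sql.SetTag("user", sql.BaggageItem("user"))
	fmt.Println(sql.Tags["user"]) // prints "@alice:example.org"
}
```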

See
[specification.md](https://github.com/opentracing/specification/blob/master/specification.md)
for some of the common tags and log fields used.

Span Relationships
------------------
Spans can be related to each other. The most common relation is `childOf`, which
indicates that the parent span depends on the child span in some way; typically
the parent span cannot complete until all its child spans have completed.

A second relation type is `followsFrom`, where the parent has no dependence on
the child span. This usually indicates some sort of fire-and-forget behaviour,
e.g. adding a message to a pipeline or inserting into a Kafka topic.

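
With opentracing-go, the relation is chosen by passing `opentracing.ChildOf(parent.Context())` or `opentracing.FollowsFrom(parent.Context())` to `Tracer.StartSpan`. The sketch below does not use that library; its `RefType` constants and `consistent` check are hypothetical and only illustrate the timing difference, using logical timestamps so the example is deterministic: a `childOf` span normally ends before its parent, while a `followsFrom` span may outlive it.

```golang
package main

import "fmt"

// RefType models the two opentracing reference types.
type RefType int

const (
	ChildOf     RefType = iota // the parent depends on the child
	FollowsFrom                // the parent does not depend on the child
)

// Span uses logical timestamps so that the example is deterministic.
type Span struct {
	Name       string
	Ref        RefType // how this span relates to its parent
	Start, End int
}

// consistent reports whether a child's timing fits its reference type: a
// ChildOf child normally finishes before its parent does, while a FollowsFrom
// child (fire and forget) may outlive its parent.
func consistent(parent, child Span) bool {
	return child.Ref == FollowsFrom || child.End <= parent.End
}

func main() {
	request := Span{Name: "request", Start: 0, End: 10}
	children := []Span{
		{Name: "SQL query", Ref: ChildOf, Start: 2, End: 5},
		{Name: "publish to Kafka", Ref: FollowsFrom, Start: 8, End: 25},
		{Name: "slow RPC", Ref: ChildOf, Start: 3, End: 12},
	}
	for _, c := range children {
		fmt.Printf("%-16s consistent=%v\n", c.Name, consistent(request, c))
	}
}
```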
Jaeger
------
Opentracing is just a framework. We use
[jaeger](https://github.com/jaegertracing/jaeger) as the actual implementation.
Jaeger is responsible for recording, sending and saving traces, as well as
providing a UI for viewing and interacting with traces.

To enable jaeger, a `Tracer` object must be instantiated from the config, and a
jaeger server must be running somewhere (usually locally). A `Tracer` does
several things:
- Decides which traces to save and send to the server. There are multiple
  schemes for doing this, a simple example being to save a fixed fraction of
  traces.
- Communicates with the jaeger backend. If no address is explicitly specified,
  it uses the default port on localhost.
- Associates a service name with all spans created by the tracer. This service
  name corresponds to a logical component, e.g. spans created by clientapi will
  have a different service name from those created by syncapi. Database access
  will also typically use a different service name.

This means that there is a tracer per service name/component.
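
The service-name and sampling duties can be sketched with a hypothetical `Tracer` type and a toy probabilistic sampler; this is not jaeger's code. With jaeger-client-go, the real tracer is typically built from its `config.Configuration` (which has `ServiceName` and `Sampler` fields) via `NewTracer`, and the sampling decision is normally made where a trace starts and then propagated to other components along with the span context.

```golang
package main

import "fmt"

// Tracer is a toy stand-in for a jaeger Tracer: it labels spans with a
// service name and decides which traces to keep.
type Tracer struct {
	ServiceName  string
	SamplingRate float64 // fraction of traces to keep, from 0 to 1
}

// Sampled keeps roughly SamplingRate of all traces (exactly that fraction of
// any 10000 consecutive trace IDs). The decision depends only on the trace
// ID, so it is reproducible for a given trace.
func (t Tracer) Sampled(traceID uint64) bool {
	const buckets = 10000
	return traceID%buckets < uint64(t.SamplingRate*buckets)
}

func main() {
	// One tracer per logical component, each with its own service name.
	tracers := []Tracer{
		{ServiceName: "clientapi", SamplingRate: 1},
		{ServiceName: "syncapi", SamplingRate: 0.25},
	}
	for _, t := range tracers {
		kept := 0
		for id := uint64(0); id < 10000; id++ {
			if t.Sampled(id) {
				kept++
			}
		}
		// prints "clientapi keeps 10000 of 10000 traces", then
		// "syncapi keeps 2500 of 10000 traces"
		fmt.Printf("%s keeps %d of 10000 traces\n", t.ServiceName, kept)
	}
}
```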