Distributed Tracing: The Good, the Bad, and the Ugly

A presentation at Open Source North 2022 in May 2022 in St. Paul, MN, USA by Ricardo Ferreira

Slide 1

Distributed Tracing: The Good, the Bad, and the Ugly
Ricardo Ferreira, Senior Developer Advocate, Amazon Web Services
© 2022, Amazon Web Services, Inc. or its affiliates. @riferrei

Slide 2

Before we begin:
• Happy hour starts at 4:00 PM.
• There is a new trailer for Thor: Love and Thunder.
• I welcome your questions at any time.

Slide 3

What is distributed tracing anyway?

Slide 4

It is a type of logging.

Slide 5

Logging 101

Slide 6

Stitching additional data is hard

Slide 7

Slide 8

Because logging in distributed systems is hard.

Slide 9

Single-machine threading
[Diagram: an API serving two requests on Thread 1 and Thread 2, alongside Customer and Database components. Each component's log entries appear in pairs with nearly identical values:
• API: Log{"/api/find", 840, 200, 35} and Log{"/api/find", 840, 200, 45}
• Customer: Log{"/customer/find", 235, 200, 30} and Log{"/customer/find", 235, 200, 42}
• Database: Log{"/db/find", 450, 200, 5} and Log{"/db/find", 450, 200, 3}]
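
With two threads writing near-identical lines, nothing in the log itself says which entries belong to the same request. A minimal sketch of the usual fix, assuming plain Python stdlib logging: keep a per-request ID in a `contextvars.ContextVar` and stamp it onto every record with a `logging.Filter`. The names `current_trace_id`, `TraceIdFilter`, `handle_request`, and the logger name `api-demo` are hypothetical, chosen only for illustration.

```python
import contextvars
import io
import logging
import threading
import uuid

# Hypothetical per-request ID holder; not part of any tracing library.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Stamps every log record with the ID of the request being served."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream):
    logger = logging.getLogger("api-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called more than once
    handler = logging.StreamHandler(stream)
    handler.addFilter(TraceIdFilter())
    handler.setFormatter(logging.Formatter("%(trace_id)s %(message)s"))
    logger.addHandler(handler)
    return logger


def handle_request(logger, trace_id):
    # Each thread works in its own context, so this value is not seen by other threads.
    current_trace_id.set(trace_id)
    logger.info('Log{"/api/find", 840, 200}')
    logger.info('Log{"/customer/find", 235, 200}')
    logger.info('Log{"/db/find", 450, 200}')


def run_demo():
    stream = io.StringIO()
    logger = build_logger(stream)
    ids = [uuid.uuid4().hex for _ in range(2)]
    threads = [threading.Thread(target=handle_request, args=(logger, i)) for i in ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids, stream.getvalue().splitlines()


if __name__ == "__main__":
    _, lines = run_demo()
    # Lines from both threads may interleave, but each one now carries its request ID.
    print("\n".join(lines))
```

This only works inside one process; as soon as the request crosses the network, the ID has to travel with it.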

Slide 10

Correlating threads distributed across a network is hard
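
Across the network, the caller has to hand its identity to the callee explicitly. One standard way is the W3C Trace Context `traceparent` header, formatted as `version-traceid-parentid-flags`. The sketch below is a simplified take under stated assumptions: it accepts only version `00`, treats headers as a plain Python dict (real HTTP header names are case-insensitive), and the helper names `inject`, `extract`, `new_trace_id`, and `new_span_id` are hypothetical.

```python
import re
import secrets

# W3C Trace Context "traceparent": version-traceid-parentid-flags.
# Simplification: this sketch only accepts version "00".
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_trace_id():
    return secrets.token_hex(16)  # 16 random bytes -> 32 lowercase hex chars


def new_span_id():
    return secrets.token_hex(8)  # 8 random bytes -> 16 lowercase hex chars


def inject(headers, trace_id, span_id, sampled=True):
    """Caller side: write the current span's identity into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"
    return headers


def extract(headers):
    """Callee side: recover the caller's identity, or None if absent or invalid."""
    match = TRACEPARENT_RE.fullmatch(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    sampled = (int(flags, 16) & 0x01) == 1
    return {"trace_id": trace_id, "parent_id": parent_id, "sampled": sampled}


if __name__ == "__main__":
    # Service A starts a trace and calls Service B.
    trace_id, span_a = new_trace_id(), new_span_id()
    outgoing = inject({}, trace_id, span_a)
    # Service B keeps the trace ID and opens a new span whose parent is span_a.
    ctx = extract(outgoing)
    span_b = new_span_id()
    print(ctx["trace_id"] == trace_id, ctx["parent_id"] == span_a, span_b)
```

The trace ID stays the same across every hop, while each service mints its own span ID and records the caller's span ID as its parent.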

Slide 11

Why it is called distributed tracing
[Diagram: a distributed transaction spanning Microservice A, Microservice B, and Microservice C.]
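
Each microservice reports its own spans independently; the "trace" only exists once a backend groups spans by trace ID and links them through their parent IDs. A minimal sketch of that reassembly, assuming a hypothetical `Span` record with millisecond timestamps (real span models carry much more, such as attributes, status, and events):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span
    service: str
    name: str
    start_ms: int
    end_ms: int

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def render_trace(spans: List[Span], trace_id: str) -> List[str]:
    """Rebuilds the call tree of one trace from spans reported by many services."""
    children = {}
    for span in spans:
        if span.trace_id == trace_id:
            children.setdefault(span.parent_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_ms):
            lines.append(f"{'  ' * depth}{span.service}: {span.name} ({span.duration_ms} ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


if __name__ == "__main__":
    # Hypothetical spans, arriving out of order as separate services report them.
    spans = [
        Span("t1", "c1", "b1", "Microservice C", "SELECT customer", 20, 60),
        Span("t1", "a1", None, "Microservice A", "GET /checkout", 0, 120),
        Span("t2", "x1", None, "Microservice A", "GET /health", 5, 6),
        Span("t1", "b1", "a1", "Microservice B", "GET /customer", 10, 70),
    ]
    print("\n".join(render_trace(spans, "t1")))
```

Note that a span whose parent never arrives is silently left out of this view, which is one reason broken context propagation shows up as fragmented traces.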

Slide 12

Black-box versus white-box instrumentation
Black-box:
• Code is not changed
• Handled by some runtime
• Minimal execution visibility
White-box:
• Requires code changes
• Handled by the application
• Better execution visibility
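
The difference is easiest to see side by side. In this sketch, white-box means the application opens its own spans; black-box is simulated by wrapping an existing function from the outside, without editing its source, which is similar in spirit to how an agent patches libraries at load time. Everything here is hypothetical: `span`, `auto_instrument`, the `customer_db` module, and the `recorded` list standing in for a tracing backend.

```python
import contextlib
import functools
import time
import types

recorded = []  # (span name, seconds) pairs; stands in for a real tracing backend


@contextlib.contextmanager
def span(name):
    """Records how long the enclosed block took under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded.append((name, time.perf_counter() - start))


def auto_instrument(module, func_name):
    """Black-box (simulated): wraps an existing function without touching its source."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        with span(f"{module.__name__}.{func_name}"):
            return original(*args, **kwargs)

    setattr(module, func_name, wrapper)


# A hypothetical third-party module whose code we do not change.
customer_db = types.ModuleType("customer_db")
customer_db.find = lambda customer_id: {"id": customer_id}


def find_customer(customer_id):
    # White-box: the developer decides what to record, at the cost of editing code.
    with span("validate-input"):
        if customer_id <= 0:
            raise ValueError("customer_id must be positive")
    return customer_db.find(customer_id)


if __name__ == "__main__":
    auto_instrument(customer_db, "find")
    print(find_customer(42), [name for name, _ in recorded])
```

The wrapper can only see the function boundary (name, arguments, timing), while the hand-written span can mark any step the developer cares about, which matches the visibility trade-off in the comparison above.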

Slide 13

Ideally, you should use both of them.

Slide 14

But what about logs and metrics?

Slide 15

The famous trio in the SRE world
• Metrics: "Hey, something is not right."
• Traces: "I can reveal the culprit for you."
• Logs: "Let me show you what happened."

Slide 16

Best way to get started with this?

Slide 17

High-quality, ubiquitous, and portable telemetry to enable effective observability. https://opentelemetry.io

Slide 18

OpenTelemetry: SDKs, protocols, and integration

Slide 19

AWS Distro for OpenTelemetry https://aws-otel.github.io/download

Slide 20

Distributed tracing: the good parts 👍

Slide 21

Decrease time spent on incident management
[Chart: incident-management time split into 10%, 60%, and 30%, covering these activities:]
• Monitoring metric values
• Understanding topologies
• Reading logs and events
• Catching up with alerts
• Isolating the anomalies
• Reproducing exceptions
• Evaluating trends and changes
• Collecting context data
• Creating code patches

Slide 22

Finding bugs between releases: V1.12 vs. V1.13

Slide 23

Understand what the code (really) does.

Slide 24

Distributed tracing: the bad parts 👎

Slide 25

My language. Your language. Their language.

Slide 26

UI and frontends give the vendors goosebumps.

Slide 27

Asynchronous programming has its challenges.
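
The core problem with async code is that one thread interleaves many requests, so a thread-local "current span" points at the wrong request. The current span has to follow the task instead. A minimal sketch using Python's `contextvars`, assuming Python 3.9+ for `asyncio.to_thread`: each task created by `asyncio.gather` runs in its own copy of the context, and `asyncio.to_thread` copies the caller's context into the worker thread. The names `current_span`, `handle`, and `blocking_db_call` are hypothetical.

```python
import asyncio
import contextvars

# Hypothetical "current span" holder, used only for illustration.
current_span = contextvars.ContextVar("current_span", default=None)


def blocking_db_call(log):
    # Runs on a worker thread; asyncio.to_thread copied the caller's context first.
    log.append(("db", current_span.get()))


async def handle(request_id, log):
    current_span.set(f"span-{request_id}")
    await asyncio.sleep(0)  # yield, so both handlers interleave on one thread
    log.append(("handler", current_span.get()))
    await asyncio.to_thread(blocking_db_call, log)


async def main():
    log = []
    # gather() wraps each coroutine in a Task, and each Task gets its own context copy.
    await asyncio.gather(handle(1, log), handle(2, log))
    return log


if __name__ == "__main__":
    for entry in asyncio.run(main()):
        print(entry)
```

Any hop that does not copy the context this way, such as a message queue between processes, needs the trace context carried explicitly alongside the payload.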

Slide 28

To correlate, or not to correlate: that is the question.

Slide 29

eBPF is the future, but not the present.

Slide 30

Distributed tracing: the ugly stuff 🤮

Slide 31

Slide 32

Native binaries may not come in handy for you.

Slide 33

Black-box is great, but not for everybody.

Slide 34

Sampling is a nightmare to get right.
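
Part of the difficulty is that every service in a trace must make the same keep-or-drop decision, or you end up with partial traces. A common head-based approach derives the decision from the trace ID itself, similar in spirit to OpenTelemetry's `TraceIdRatioBased` sampler, though this is a simplified sketch and not that sampler's code. It assumes trace IDs are random hex strings; `should_sample` is a hypothetical name.

```python
import secrets

LOW_64_BITS = (1 << 64) - 1


def should_sample(trace_id_hex, ratio):
    """Head-based sampling decided from the trace ID alone, so every service
    that sees the same trace ID reaches the same keep/drop decision."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = round(ratio * (1 << 64))
    # Keep the trace if its low 64 bits fall below ratio * 2**64.
    return (int(trace_id_hex, 16) & LOW_64_BITS) < bound


if __name__ == "__main__":
    trace_ids = [secrets.token_hex(16) for _ in range(20_000)]
    kept = sum(should_sample(t, 0.10) for t in trace_ids)
    print(f"kept {kept} of {len(trace_ids)} traces")  # roughly 10%, varies per run
```

The catch is that the decision happens before anyone knows how the request will end, so the rare failing trace is dropped just as often as a boring one. Keeping "interesting" traces requires tail-based sampling, which means buffering whole traces before deciding.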

Slide 35

Mixing open standards with proprietary stuff.

Slide 36

Be prepared to become a data engineer.

Slide 37

Thank you! Ricardo Ferreira @riferrei