When Kafka is the source of truth; schemas become your source of headaches

A presentation at Current 2022: The Next Generation of Kafka Summit in October 2022 in Austin, TX, USA by Ricardo Ferreira

Slide 1

When Kafka is the Source of Truth; Schemas Become your Source of Headaches Ricardo Ferreira Senior Developer Advocate Amazon Web Services © 2022, Amazon Web Services, Inc. or its affiliates. @riferrei

Slide 2

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong, it usually turns out to be impossible to get at or repair.” — Douglas Adams, Mostly Harmless

Slide 3

👋 Hi, I’m Ricardo Ferreira
• Developer Advocate at AWS.
• Huge Marvel fan. My favorite characters are Daredevil, Venom, Deadpool, and the Punisher.
• Yes, it’s my birthday today. 😅

Slide 4

Back to basics: data serialization and deserialization

Slide 5

Let’s do some 🧑‍💻 coding
• Scan the bar code on the right for the GitHub repository.
• Each example has a unique number and a use case name.
• In the folder that starts with “00-” you can find a Docker Compose file to spin up a Kafka deployment for testing.

Slide 6

Binary Encoding with ProtoBuf, Thrift, and Avro

Slide 7

Schema using Protocol Buffers

    syntax = "proto3";

    message Person {
      string userName = 1;
      optional int64 favoriteNumber = 2;
      repeated string interests = 3;
    }

Slide 8

Schema using Protocol Buffers

    0a                    field tag = 1 (00001), wire type 010: string
    07                    length = 7
    52 69 63 61 72 64 6f  R i c a r d o
    10                    field tag = 2 (00010), wire type 000: int64
    0e                    value = 14
    1a                    field tag = 3 (00011), wire type 010: string
    06                    length = 6
    4d 61 72 76 65 6c     M a r v e l
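To check the byte layout above, here is a minimal Python sketch that hand-encodes the same `Person` record in the Protocol Buffers wire format. It is not the code from the talk's repository, and the helper names (`encode_varint`, `tag`, `encode_person`) are hypothetical. It also skips things generated code handles, such as negative numbers and proto3's omission of fields left at their default value.

```python
WIRE_VARINT = 0  # int32, int64, bool, enum, ...
WIRE_LEN = 2     # string, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Base-128 varint: 7 bits per byte, least-significant group first,
    high bit set on every byte except the last."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative varints")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    # Field number in the high bits, wire type in the low 3 bits:
    # field 1 + wire type 2 -> 00001 010 -> 0x0a.
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return tag(field_number, WIRE_LEN) + encode_varint(len(data)) + data


def encode_person(user_name, favorite_number=None, interests=()):
    out = encode_string(1, user_name)
    if favorite_number is not None:  # an absent optional field writes no bytes
        out += tag(2, WIRE_VARINT) + encode_varint(favorite_number)
    for interest in interests:       # repeated: one tag/value pair per item
        out += encode_string(3, interest)
    return out


encoded = encode_person("Ricardo", 14, ["Marvel"])
print(encoded.hex(" "))  # 0a 07 52 69 63 61 72 64 6f 10 0e 1a 06 4d 61 72 76 65 6c
```

Note that 14 fits in a single varint byte (`0e`); larger values grow one byte per 7 bits.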

Slide 9

Schema using Thrift

    struct Person {
      1: required string userName,
      2: optional i64 favoriteNumber,
      3: optional list<string> interests
    }

Slide 10

Schema using Thrift (BinaryProtocol)

    0b                       type 11 (string)
    00 01                    field tag = 1
    00 00 00 07              length = 7
    52 69 63 61 72 64 6f     R i c a r d o
    0a                       type 10 (i64)
    00 02                    field tag = 2
    00 00 00 00 00 00 00 0e  value = 14
    0f                       type 15 (list)
    00 03                    field tag = 3
    0b                       type 11 (string) for the list items
    00 00 00 01              list items = 1
    00 00 00 06              length = 6
    4d 61 72 76 65 6c        M a r v e l
    00                       end of struct
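A matching Python sketch for Thrift's BinaryProtocol, built only on the standard `struct` module. The names are hypothetical and this is not the Apache Thrift library API. Every field carries a 1-byte type and a 2-byte field ID, strings and list sizes use 4-byte big-endian integers, and an i64 always takes 8 bytes, which is why this encoding is the largest of the three.

```python
import struct

# Thrift type IDs used by this record.
T_STOP, T_I64, T_STRING, T_LIST = 0, 10, 11, 15


def field_header(ttype: int, field_id: int) -> bytes:
    # 1-byte type followed by a 2-byte big-endian field ID.
    return struct.pack(">bh", ttype, field_id)


def string_body(text: str) -> bytes:
    data = text.encode("utf-8")
    return struct.pack(">i", len(data)) + data  # 4-byte length prefix


def encode_person(user_name, favorite_number=None, interests=None) -> bytes:
    out = field_header(T_STRING, 1) + string_body(user_name)
    if favorite_number is not None:
        # i64 is fixed-width: always 8 bytes, even for small values.
        out += field_header(T_I64, 2) + struct.pack(">q", favorite_number)
    if interests is not None:
        # List header: element type (1 byte) + element count (4 bytes).
        out += field_header(T_LIST, 3) + struct.pack(">bi", T_STRING, len(interests))
        for interest in interests:
            out += string_body(interest)
    return out + struct.pack(">b", T_STOP)  # end of struct


encoded = encode_person("Ricardo", 14, ["Marvel"])
print(encoded.hex(" "))
print(len(encoded))  # 44
```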

Slide 11

Schema using Avro

    {
      "type": "record",
      "name": "Person",
      "fields": [
        {"name": "userName", "type": "string"},
        {"name": "favoriteNumber", "type": ["null", "long"], "default": null},
        {"name": "interests", "type": {"type": "array", "items": "string"}}
      ]
    }

Slide 12

Schema using Avro

    0e                    length = 7 (zigzag varint 00001110)
    52 69 63 61 72 64 6f  R i c a r d o
    02                    union branch 1 ("long" in ["null", "long"])
    1c                    value = 14 (zigzag varint 00011100)
    02                    array items = 1
    0c                    length = 6 (zigzag varint 00001100)
    4d 61 72 76 65 6c     M a r v e l
    00                    end of array
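The Avro bytes follow from two rules: every int or long (including string lengths, union indexes, and array counts) is a zigzag-encoded varint, and fields are written in schema order with no tags at all. A minimal hand-written Python sketch that supports only this one schema; the helper names are hypothetical and this is not the Apache Avro library API.

```python
def zigzag_varint(n: int) -> bytes:
    """Avro int/long: zigzag-map to unsigned (0->0, -1->1, 1->2, -2->3, ...),
    then write as a base-128 varint. Assumes n fits in a signed 64-bit long."""
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string(text: str) -> bytes:
    data = text.encode("utf-8")
    return zigzag_varint(len(data)) + data


def encode_person(user_name, favorite_number=None, interests=()):
    # No field tags or names: the reader must know the writer's schema.
    out = encode_string(user_name)
    # Union ["null", "long"]: branch index first, then the value (if any).
    if favorite_number is None:
        out += zigzag_varint(0)
    else:
        out += zigzag_varint(1) + zigzag_varint(favorite_number)
    # Arrays are written in blocks: item count, the items, then a 0 count.
    if interests:
        out += zigzag_varint(len(interests))
        for interest in interests:
            out += encode_string(interest)
    out += zigzag_varint(0)
    return out


encoded = encode_person("Ricardo", 14, ["Marvel"])
print(encoded.hex(" "))  # 0e 52 69 63 61 72 64 6f 02 1c 02 0c 4d 61 72 76 65 6c 00
```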

Slide 13

Same record; different binary encoding

    Protocol Buffers (19 bytes):
    0a 07 52 69 63 61 72 64 6f 10 0e 1a 06 4d 61 72 76 65 6c

    Thrift BinaryProtocol (44 bytes):
    0b 00 01 00 00 00 07 52 69 63 61 72 64 6f 0a 00 02 00 00 00 00 00 00 00 0e
    0f 00 03 0b 00 00 00 01 00 00 00 06 4d 61 72 76 65 6c 00

    Avro (19 bytes):
    0e 52 69 63 61 72 64 6f 02 1c 02 0c 4d 61 72 76 65 6c 00

Slide 14

Schema compatibility
• Backward compatibility: newer code can read data written by older code.
• Forward compatibility: older code can read data written by newer code.

Slide 16

Required/Optional fields
• Not written into the binary format.
• Used to provide runtime checks.
• Useful for catching bugs in the code that writes data into the streams.
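Because the labels never reach the bytes, the check has to live in code. A minimal Python sketch, assuming the Thrift BinaryProtocol layout shown earlier; `REQUIRED` and `encode_person` are hypothetical names, and `interests` is left out for brevity. The writer rejects a record without `userName`, while an absent optional `favoriteNumber` simply produces no bytes.

```python
import struct

T_STOP, T_I64, T_STRING = 0, 10, 11

# Hypothetical runtime metadata mirroring "1: required string userName".
# Nothing below writes this label into the encoded bytes.
REQUIRED = {"userName"}


def encode_person(record: dict) -> bytes:
    # The runtime check: fail fast in the writer instead of
    # letting some downstream reader discover the bad record.
    missing = sorted(name for name in REQUIRED if record.get(name) is None)
    if missing:
        raise ValueError(f"missing required field(s): {missing}")

    name = record["userName"].encode("utf-8")
    out = struct.pack(">bhi", T_STRING, 1, len(name)) + name
    if record.get("favoriteNumber") is not None:  # optional: omitted when absent
        out += struct.pack(">bhq", T_I64, 2, record["favoriteNumber"])
    return out + struct.pack(">b", T_STOP)


print(encode_person({"userName": "Ricardo"}).hex(" "))
# 0b 00 01 00 00 00 07 52 69 63 61 72 64 6f 00
try:
    encode_person({"favoriteNumber": 14})
except ValueError as err:
    print(err)  # missing required field(s): ['userName']
```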

Slide 17

Schema evolution
• You can change field names, but not the tags.
• New fields must use new tags.
• Old code just ignores unknown tags.
• Datatype annotations tell the parser how many bytes to skip when old code reads data written by new code.
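The last two bullets can be seen in a small decoder. This is a minimal Python sketch, not a real Protocol Buffers library; `decode` and the `known_fields` mapping are hypothetical names. The reader below only knows tags 1 and 2, so tag 3 in newer data is skipped, and the wire type in each tag tells it how many bytes to jump over. Renaming a field is just a different entry in the mapping, because only the tag number is on the wire.

```python
def read_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known_fields: dict) -> dict:
    """Decode only the fields this reader knows (tag number -> field name).
    Unknown tags are skipped using the wire type of each field."""
    record, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # VARINT: read until a byte without the high bit
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # I64: fixed 8 bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:  # LEN: varint length, then that many bytes
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:  # I32: fixed 4 bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        name = known_fields.get(field_number)
        if name is not None:  # unknown tags are read past and dropped
            record[name] = value
    return record


# Bytes written by newer code: userName, favoriteNumber, and interests (tag 3).
new_data = bytes.fromhex("0a 07 52 69 63 61 72 64 6f 10 0e 1a 06 4d 61 72 76 65 6c")
old_reader = {1: "userName", 2: "favoriteNumber"}  # predates tag 3
print(decode(new_data, old_reader))  # {'userName': b'Ricardo', 'favoriteNumber': 14}
print(decode(new_data, {1: "name"}))  # {'name': b'Ricardo'}
```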

Slide 18

Changing data types
• Possible, but values may lose precision or be truncated. Read the documentation.
• Whether a single value can become a list depends on the binary format.
• In Protocol Buffers, a list is just the same field repeated in the binary format, so a single-valued field can become a repeated one. Thrift uses a dedicated list type, so the data must already be encoded as a list to be read as one.
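Both effects can be reproduced with the Protocol Buffers wire format. In this minimal Python sketch (all helper names are hypothetical), `as_int32` mimics what the Protocol Buffers documentation describes for reading a 64-bit varint into an int32 field: the value is truncated as if cast in C++. The second part writes tag 3 twice, the way a `repeated string` field is encoded; a reader that treats the field as a list sees every value, while a reader expecting a single value keeps the last one.

```python
def encode_varint(value: int) -> bytes:
    out = bytearray()
    while True:
        byte, value = value & 0x7F, value >> 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def read_fields(buf: bytes):
    """Yield (field_number, value) pairs; handles VARINT and LEN fields only."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        yield field_number, value


def as_int32(value: int) -> int:
    """Keep the low 32 bits and reinterpret them as signed, like a C++ cast."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


# 1) int64 -> int32: a writer stored 2**32 + 14 in favoriteNumber (tag 2).
wide = bytes([0x10]) + encode_varint(2**32 + 14)
_, raw = next(read_fields(wide))
print(raw, as_int32(raw))  # 4294967310 14  (the high bits are silently lost)

# 2) single value -> list: a newer writer repeats tag 3.
repeated = string_field(3, "Marvel") + string_field(3, "Daredevil")
values = [v.decode("utf-8") for f, v in read_fields(repeated) if f == 3]
print(values)      # new reader (repeated field): ['Marvel', 'Daredevil']
print(values[-1])  # old reader (single field): last value wins -> Daredevil
```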

Slide 19

Can I switch the Schema Registry to another registry?

Slide 20

Time for 🧑‍💻 coding again
• Scan the bar code on the right for the GitHub repository.
• Each example has a unique number and a use case name.
• In the folder that starts with “00-” you can find a Docker Compose file to spin up a Kafka deployment for testing.

Slide 21

Summary
1. Each programming language handles serialization differently.
2. Schema Registry is a database for schemas. It’s not Doctor Strange.
3. A record payload is not just the payload. Mind the Schema IDs.
4. Backward/forward compatibility is different in each binary format.
5. Migrating from one registry to another is possible, but not easy.
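Summary item 3, and the question of switching registries, both come down to the bytes in front of the payload. A minimal Python sketch, assuming the Confluent Schema Registry wire format for Avro: one magic byte `0`, then a 4-byte big-endian schema ID, then the Avro body. Confluent's Protobuf serializer additionally writes message indexes after the ID, which this sketch ignores. `frame`, `unframe`, and the schema ID `42` are made up for illustration. Another registry with a different header would not be able to read these records without translation, which is part of why migration is not easy.

```python
import struct

MAGIC_BYTE = 0  # assumed Confluent wire format marker


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix a serialized record with the magic byte and 4-byte schema ID."""
    return struct.pack(">bi", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed record back into (schema_id, payload)."""
    if len(message) < 5:
        raise ValueError("too short for magic byte + schema ID")
    magic, schema_id = struct.unpack(">bi", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unknown magic byte {magic}")
    return schema_id, message[5:]


avro_payload = bytes.fromhex("0e 52 69 63 61 72 64 6f 02 1c 02 0c 4d 61 72 76 65 6c 00")
message = frame(42, avro_payload)  # 42 is a made-up schema ID
print(message[:5].hex(" "))  # 00 00 00 00 2a
print(unframe(message) == (42, avro_payload))  # True
```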

Slide 22

Chapter 4: Encoding and Evolution (Designing Data-Intensive Applications)
• Many thanks to Martin Kleppmann.
• This chapter explains in depth how code can evolve using schemas.
• This session covers just a small portion of the chapter. I recommend reading the chapter in its entirety.

Slide 23

Thank you! Ricardo Ferreira @riferrei