Down to the Rabbit Hole with Pulsar I/O

A presentation at Pulsar Summit NA 2021 in June 2021 by Ricardo Ferreira

Slide 1

@riferrei

Slide 2

What is Pulsar I/O?

Slide 3

Apache Pulsar

Slide 4

It’s the plumbing part
[Diagram labels: Source, Pulsar I/O, Sink]

Slide 5

“Everything is fine in the backend”

Slide 6

@riferrei

Slide 7

Ricardo Ferreira
Developer Advocate
• Elastic Community Team
• HashiCorp Ambassador
• Learned about data I/O at: Confluent, Oracle, Red Hat
• Distributed Systems, O11y, Streaming Systems, Databases
• riferrei@elastic.co
• riferrei@riferrei.com

Slide 8

Agenda
• Understanding the Pulsar I/O architecture
• Installing and managing Pulsar connectors
• Troubleshooting and debugging techniques

Slide 9

Understanding the Pulsar I/O Architecture

Slide 10

The architecture is like a lasagna 😋
[Diagram labels: Pulsar Connectors, Pulsar Functions, Functions worker]

Slide 11

Pulsar Functions programming model
[Diagram labels: Input Topics (Topic 1, Topic 2, Topic 3), Input Messages, Pulsar Function, Output Message, Output Topic, Log Output, Log Topic, Topic 4, Topic 5]

Slide 12

Anatomy of a source connector
[Diagram labels: Source Connector, Input Topics (Topic 1, Topic 2, Topic 3), Input Messages, Pulsar Function, Output Message, Output Topic, Log Output, Log Topic, Topic 4, Topic 5]

Slide 13

Anatomy of a source connector

Slide 14

Anatomy of a sink connector
[Diagram labels: Input Topics (Topic 1, Topic 2, Topic 3), Input Messages, Pulsar Function, Sink Connector, Output Message, Output Topic, Log Output, Log Topic, Topic 4, Topic 5]

Slide 15

Anatomy of a sink connector

Slide 16

Records are your unit of work
[Panels: Source Connector, Sink Connector]
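
To make the "record as unit of work" idea concrete, here is a minimal conceptual sketch: a source hands records to the runtime one at a time, and a sink acknowledges each record once it has written it. This is a Python stand-in, not the Pulsar I/O API (real connectors implement Java interfaces); `Record`, `ListSource`, `ListSink`, and `run_pipeline` are hypothetical names used only for illustration.

```python
from collections import deque


class Record:
    """Hypothetical unit of work: a value plus an acknowledgement flag."""

    def __init__(self, value):
        self.value = value
        self.acked = False

    def ack(self):
        # Marks the record as processed, so it would not need redelivery.
        self.acked = True


class ListSource:
    """Hypothetical source: pulls rows from an external system (a list here)."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def read(self):
        # Returns the next record, or None when the external system is drained.
        row = next(self._rows, None)
        return None if row is None else Record(row)


class ListSink:
    """Hypothetical sink: writes each record to an external system, then acks it."""

    def __init__(self):
        self.written = []

    def write(self, record):
        self.written.append(record.value)
        record.ack()


def run_pipeline(source, sink):
    """Plays the runtime's role: source -> topic -> sink, one record at a time."""
    topic = deque()
    while (record := source.read()) is not None:
        topic.append(record)
    delivered = []
    while topic:
        record = topic.popleft()
        sink.write(record)
        delivered.append(record)
    return delivered


if __name__ == "__main__":
    sink = ListSink()
    records = run_pipeline(ListSource(["a", "b", "c"]), sink)
    print(sink.written, all(r.acked for r in records))
```

The point of the sketch is the division of labor: connectors only read or write records, while the surrounding runtime moves them through the topic.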

Slide 17

Functions worker is how you deploy
• Running along with brokers
• Running in their own cluster

Slide 18

Functions worker is how you deploy
• Running along with brokers
  ◦ Fewer clusters to manage. Better operational simplicity.
  ◦ No resource isolation. CPU, memory, and network are shared.
• Running in their own cluster
  ◦ Right-sized deployment, as resources are exclusive.
  ◦ More clusters to manage. Hard to operate at scale.

Slide 19

Running along with brokers
1. conf/broker.conf
2. conf/functions_worker.yml

Slide 20

Running along with brokers
Checking if the worker on the broker is correct:
Result should be:
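
One way to run a check like this, sketched under assumptions: ask the admin REST API which function workers are registered. The `/admin/v2/worker/cluster` path and the `http://localhost:8080` address are assumptions based on a default local setup, and `worker_cluster_url` and `list_workers` are hypothetical helper names; verify the endpoint against the REST reference for your Pulsar version.

```python
import json
from urllib.request import urlopen


def worker_cluster_url(admin_url="http://localhost:8080"):
    """URL of the (assumed) admin REST endpoint that lists function workers."""
    return admin_url.rstrip("/") + "/admin/v2/worker/cluster"


def list_workers(admin_url="http://localhost:8080"):
    """Returns the parsed JSON response; needs a reachable Pulsar admin endpoint."""
    with urlopen(worker_cluster_url(admin_url), timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # Prints whatever worker entries the cluster reports.
    for worker in list_workers():
        print(worker)
```

The same check applies to both deployment modes; only the address you point it at changes.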

Slide 21

Running in their own cluster
1. conf/broker.conf
2. conf/functions_worker.yml

Slide 22

Running in their own cluster
Checking if the functions worker is correct:
Result should be:

Slide 23

Fixing the admin REST requests
conf/proxy.conf
https://pulsar.apache.org/docs/en/administration-proxy

Slide 24

Functions runtime configuration
• Process Runtime (Default): Process 1, Process 2, Process 3 (Host Machine)
• Thread Runtime: Thread 1, Thread 2 (JVM)
• Kubernetes Runtime: StatefulSet 1, StatefulSet 2 (K8s Cluster)

Slide 25

Functions runtime configuration

Resource   Specified as      Runtime
CPU        Number of cores   Docker, K8s
RAM        Number of bytes   Docker, K8s
Disk       Number of bytes   Docker, K8s

Slide 26

Installing and Managing Pulsar Connectors

Slide 27

The bag of gold for Pulsar connectors
• Pulsar Connectors
• StreamNative Hub
• Pulsar CLI
• GitHub

Slide 28

Two types of connectors
1. Built-in connectors
2. Custom connectors
[Diagram labels: Custom Source 1, Custom Source 2, Custom Sink 1]

Slide 29

StreamNative Hub: home of connectors
https://hub.streamnative.io

Slide 30

StreamNative Hub: code and examples

Slide 31

Pulsar website: code and examples

Slide 32

Getting started with Pulsar I/O
• For development
  ◦ Use the “pulsar-all” Docker image. Includes all connectors.
  ◦ Run as a thread with the “localrun” option from the admin CLI.
• For production
  ◦ Install the connectors on all brokers/function workers.
  ◦ Connectors will be a list of .nar files in the ./connectors directory.

Slide 33

Verifying your Pulsar I/O setup
Checking which source connectors are available
Result should be:

Slide 34

Verifying your Pulsar I/O setup
Checking which sink connectors are available
Result should be:
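
A hedged sketch of how the available source and sink connectors could be listed through the admin REST API. The `builtinsources` and `builtinsinks` paths, the `name` field in each entry, and the `http://localhost:8080` address are assumptions to verify against your Pulsar version; `builtin_connectors_url` and `list_builtin_connectors` are hypothetical helper names.

```python
import json
from urllib.request import urlopen


def builtin_connectors_url(kind, admin_url="http://localhost:8080"):
    """URL of the (assumed) endpoint listing installed connectors of one kind."""
    if kind not in ("sources", "sinks"):
        raise ValueError("kind must be 'sources' or 'sinks'")
    return f"{admin_url.rstrip('/')}/admin/v3/{kind}/builtin{kind}"


def list_builtin_connectors(kind, admin_url="http://localhost:8080"):
    """Returns the parsed JSON response; needs a reachable Pulsar admin endpoint."""
    with urlopen(builtin_connectors_url(kind, admin_url), timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    for kind in ("sources", "sinks"):
        for entry in list_builtin_connectors(kind):
            # Each entry is assumed to be a JSON object with a "name" field.
            print(kind, entry.get("name"))
```

If a connector you installed is missing from the output, the .nar file is probably not in the connectors directory of the worker that answered the request.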

Slide 35

Testing connectors with localrun:
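
As a rough sketch, a localrun invocation for a source connector can be assembled and launched from a script. The flag names below reflect the `pulsar-admin sources localrun` options as I understand them and should be checked with `--help` on your installation; the archive path, connector name, topic, and config file are hypothetical values, and `source_localrun_cmd` is a hypothetical helper.

```python
import subprocess


def source_localrun_cmd(archive, name, topic, config_file,
                        parallelism=1, admin="bin/pulsar-admin"):
    """Builds the argument list for running a source connector locally."""
    return [
        admin, "sources", "localrun",
        "--archive", archive,                # connector package (.nar file)
        "--name", name,                      # connector name
        "--destination-topic-name", topic,   # topic the source writes to
        "--source-config-file", config_file, # connector-specific settings
        "--parallelism", str(parallelism),   # number of connector instances
    ]


if __name__ == "__main__":
    # All values here are hypothetical placeholders.
    cmd = source_localrun_cmd(
        archive="connectors/my-source.nar",
        name="my-source",
        topic="my-topic",
        config_file="my-source-config.yaml",
    )
    subprocess.run(cmd, check=True)
```

Because localrun keeps the connector in a local process rather than on the cluster, its output lands in your terminal, which shortens the edit-and-retry loop.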

Slide 36

Troubleshooting and Debugging Techniques

Slide 37

How to investigate Pulsar I/O issues
• Metrics
• Logs
• Traces
• Source Code
• Proxy
• Localrun
• Breakpoints
• Stats

Slide 38

Problem: my connector is not running
1. Check the connector configuration

Slide 39

Problem: my connector is not running
2. Check the current connector status
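
Steps 1 and 2 can be sketched as two reads against the admin REST API: one for the connector's stored configuration and one for its runtime status. The `/admin/v3/{sources|sinks}/...` paths and the local address are assumptions to verify for your Pulsar version; `connector_url` and `get_json` are hypothetical helpers, and the tenant, namespace, and connector name are placeholders.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

ADMIN_URL = "http://localhost:8080"  # assumption: default local admin address


def connector_url(kind, tenant, namespace, name, status=False):
    """Builds the admin URL for a connector's config, or its status if requested."""
    if kind not in ("sources", "sinks"):
        raise ValueError("kind must be 'sources' or 'sinks'")
    path = "/".join(quote(part, safe="") for part in (tenant, namespace, name))
    url = f"{ADMIN_URL}/admin/v3/{kind}/{path}"
    return url + "/status" if status else url


def get_json(url):
    """Fetches and parses a JSON response; needs a reachable admin endpoint."""
    with urlopen(url, timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # "public", "default", and "my-source" are hypothetical coordinates.
    print(get_json(connector_url("sources", "public", "default", "my-source")))
    print(get_json(connector_url("sources", "public", "default", "my-source",
                                 status=True)))
```

Comparing the two outputs is often enough to spot a mismatch between what was submitted and what is actually running.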

Slide 40

Problem: my connector is not running
3. Check the status from the topic
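
For step 3, a hedged sketch that reads the topic's stats from the admin REST API to see whether messages are flowing in and out. The endpoint path, the `msgRateIn`, `msgRateOut`, and `subscriptions` field names, and the local address are assumptions to confirm for your Pulsar version; `topic_stats_url` and `topic_stats` are hypothetical helpers, and the topic coordinates are placeholders.

```python
import json
from urllib.request import urlopen


def topic_stats_url(tenant, namespace, topic, admin_url="http://localhost:8080"):
    """URL of the (assumed) stats endpoint for a persistent topic."""
    base = admin_url.rstrip("/")
    return f"{base}/admin/v2/persistent/{tenant}/{namespace}/{topic}/stats"


def topic_stats(tenant, namespace, topic, admin_url="http://localhost:8080"):
    """Returns the parsed stats JSON; needs a reachable admin endpoint."""
    with urlopen(topic_stats_url(tenant, namespace, topic, admin_url),
                 timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    # "public/default/my-topic" is a hypothetical topic.
    stats = topic_stats("public", "default", "my-topic")
    print("in:", stats.get("msgRateIn"), "out:", stats.get("msgRateOut"))
    print("subscriptions:", list(stats.get("subscriptions", {})))
```

A source that is running but publishing nothing shows a zero inbound rate; a sink that is not consuming tends to show no subscription on its input topic.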

Slide 41

Problem: my connector is not running
4. Check the connector logs
[Annotations: Tenant, Namespace, Connector Name]
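
The tenant, namespace, and connector name are the pieces needed to locate a connector's log file. The sketch below assumes the layout `logs/functions/<tenant>/<namespace>/<name>/<name>-<instance>.log` under the Pulsar installation directory, which is my understanding of the process runtime's default; other runtimes may write logs elsewhere. `connector_log_path` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def connector_log_path(tenant, namespace, name, instance_id=0, pulsar_home="."):
    """Builds the assumed log file path for one connector instance."""
    return PurePosixPath(
        pulsar_home, "logs", "functions",
        tenant, namespace, name,
        f"{name}-{instance_id}.log",
    )


if __name__ == "__main__":
    # "public", "default", and "my-sink" are hypothetical coordinates.
    print(connector_log_path("public", "default", "my-sink"))
```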

Slide 42

Problem: my connector is not running
5. Debug with localrun
Play with the number of connector threads

Slide 43

Problem: sink is not receiving any data
Wiretap 🔎 🧐 mitmproxy
-Dhttp.proxyHost=mitmproxyhost -Dhttp.proxyPort=8080
https://mitmproxy.org

Slide 44

Problem: multiple clusters and logs
Meet the Beats 😎
• Logstash: parse and enhance
• Elasticsearch
• Kibana: visualize

Slide 45

Problem: I don’t know what to do
What about… when you have no idea what is going on?

Slide 46

Debug the connector’s code
1. Enable the JDWP protocol on Pulsar
2. Configure the function runtime to thread
3. Attach your IDE to the JVM
4. Set breakpoints and debug

Slide 47

@riferrei

Slide 48

THANK YOU 🙂