Down to the Rabbit Hole with Pulsar I/O

A presentation at Pulsar Summit NA 2021 by Ricardo Ferreira

It’s the plumbing part: Source • Pulsar I/O • Sink @riferrei

“Everything is fine in the backend”

Ricardo Ferreira, Developer Advocate • Elastic Community Team • HashiCorp Ambassador • Learned about data I/O at: Confluent, Oracle, Red Hat • Distributed systems, O11y, streaming systems, databases • riferrei@elastic.co • riferrei@riferrei.com • @riferrei

Agenda • Understanding the Pulsar I/O architecture • Installing and managing Pulsar connectors • Troubleshooting and debugging techniques

Understanding The Pulsar I/O Architecture

The architecture is like a lasagna 😋 Layers: Pulsar Connectors • Pulsar Functions • Functions worker

Pulsar Functions programming model [Diagram: input messages from the input topics flow into a Pulsar Function, which sends output messages to an output topic and log output to a log topic]
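The programming model on this slide can be sketched with a plain Python function. Pulsar Functions accept a native Python function named `process` that takes one input and returns one output. Everything else below (`run_function`, the topic names, the sample messages) is hypothetical scaffolding that stands in for the runtime, so the flow can be followed without a cluster.

```python
def process(input):
    """The function body: one message in, one message out."""
    return f"{input}!"


def run_function(fn, input_topics, output_topic):
    """Hypothetical stand-in for the runtime: pass every message from every
    input topic through fn and collect the results under the output topic."""
    outputs = []
    for messages in input_topics.values():
        for message in messages:
            outputs.append(fn(message))
    return {output_topic: outputs}


inputs = {"topic-1": ["a", "b"], "topic-2": ["c"]}
result = run_function(process, inputs, "topic-4")
print(result)
```

In a real deployment the runtime does the subscribing and publishing, so only `process` would be shipped.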

Anatomy of a source connector [Diagram: the Pulsar Functions programming model with a Source Connector added]

Anatomy of a sink connector [Diagram: the Pulsar Functions programming model with a Sink Connector added]

Records are your unit of work [Diagram: Source Connector, Sink Connector]
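One way to picture "records are your unit of work": a source wraps each external item in a record, a sink consumes records one at a time, and each record is acknowledged or failed on its own. The classes below are hypothetical Python stand-ins, not the Pulsar I/O API (real connectors are written in Java against Pulsar's `Source` and `Sink` interfaces).

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record: a value plus its own acknowledgement state."""
    value: str
    status: str = "pending"

    def ack(self):
        self.status = "acked"

    def fail(self):
        self.status = "failed"


class ListSource:
    """Hypothetical source: turns items from an external system into records."""
    def __init__(self, items):
        self._items = iter(items)

    def read(self):
        item = next(self._items, None)
        return None if item is None else Record(item)


class ListSink:
    """Hypothetical sink: writes each record out, then acks or fails it."""
    def __init__(self):
        self.written = []

    def write(self, record):
        if record.value:  # pretend empty values are rejected downstream
            self.written.append(record.value)
            record.ack()
        else:
            record.fail()


source, sink = ListSource(["order-1", "", "order-2"]), ListSink()
records = []
while (record := source.read()) is not None:
    sink.write(record)
    records.append(record)

print(sink.written)
print([r.status for r in records])
```

Because success and failure are tracked per record, one bad item does not have to block the ones around it.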

Functions worker is how you deploy: running along with brokers, or running in their own cluster

Functions worker is how you deploy
• Running along with brokers
  Ø Fewer clusters to manage. Better operational simplicity.
  Ø No resource isolation. CPU, memory, and network are shared.
• Running in their own cluster
  Ø Right-sized deployment, as resources are exclusive.
  Ø More clusters to manage. Hard to operate at scale.

Running along with brokers 1. conf/broker.conf 2. conf/functions_worker.yml

Running along with brokers. Checking if the worker on the broker is correct: [command screenshot not captured] Result should be: [output screenshot not captured]

Running in their own cluster 1. conf/broker.conf 2. conf/functions_worker.yml

Running in their own cluster. Checking if the functions worker is correct: [command screenshot not captured] Result should be: [output screenshot not captured]

Fixing the admin REST requests: conf/proxy.conf (see https://pulsar.apache.org/docs/en/administration-proxy)

Functions runtime configuration [Diagram: Process Runtime (default): Process 1, Process 2, Process 3 on a host machine • Thread Runtime: Thread 1, Thread 2 in a JVM • Kubernetes Runtime: StatefulSet 1, StatefulSet 2 in a K8s cluster]

Functions runtime configuration

Resource   Specified as      Runtime
CPU        Number of cores   Docker, K8s
RAM        Number of bytes   Docker, K8s
Disk       Number of bytes   Docker, K8s

Installing and Managing Pulsar Connectors

The bag of gold for Pulsar Connectors: StreamNative Hub • Pulsar CLI • GitHub

Two types of connectors 1. Built-in connectors 2. Custom connectors (Custom Source 1, Custom Source 2, Custom Sink 1)

StreamNative Hub: home of connectors https://hub.streamnative.io

StreamNative Hub: code and examples

Pulsar Website: code and examples

Getting started with Pulsar I/O
• For development
  Ø Use the “pulsar-all” Docker image. It includes all connectors.
  Ø Run as a thread with the “localrun” option from the admin CLI.
• For production
  Ø Install the connectors on all brokers/function workers.
  Ø Connectors are a set of .nar files in the ./connectors directory.

Verifying your Pulsar I/O setup. Checking which source connectors are available: [command screenshot not captured] Result should be: [output screenshot not captured]

Verifying your Pulsar I/O setup. Checking which sink connectors are available: [command screenshot not captured] Result should be: [output screenshot not captured]
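The commands shown on these two slides were not captured in the transcript. A minimal sketch of one way to run the check, assuming the `pulsar-admin` CLI from a Pulsar distribution is available: the `sources available-sources` and `sinks available-sinks` subcommands list the connectors the cluster knows about. The `bin/pulsar-admin` path and the helper name are assumptions, and the helper only assembles the command so it can be inspected before it is run.

```python
import shlex

PULSAR_ADMIN = "bin/pulsar-admin"  # assumed path, relative to the Pulsar install directory


def list_connectors_command(kind):
    """Build the pulsar-admin command that lists available sources or sinks."""
    if kind not in ("sources", "sinks"):
        raise ValueError("kind must be 'sources' or 'sinks'")
    return [PULSAR_ADMIN, kind, f"available-{kind}"]


for kind in ("sources", "sinks"):
    print(shlex.join(list_connectors_command(kind)))
```

Passing the returned list to `subprocess.run(..., check=True)` would execute it against a running cluster.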

Testing connectors with localrun:
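The command on this slide was not captured. As a hedged sketch, `pulsar-admin sources localrun` runs a source connector locally instead of deploying it to the cluster. The archive path, connector name, topic, and config file below are hypothetical placeholders, and the helper only builds the argument list.

```python
import shlex


def source_localrun_command(archive, name, topic, config_file,
                            tenant="public", namespace="default", parallelism=1):
    """Assemble a `pulsar-admin sources localrun` invocation as an argv list."""
    return [
        "bin/pulsar-admin", "sources", "localrun",
        "--archive", archive,
        "--tenant", tenant,
        "--namespace", namespace,
        "--name", name,
        "--destination-topic-name", topic,
        "--source-config-file", config_file,
        "--parallelism", str(parallelism),
    ]


cmd = source_localrun_command(
    archive="connectors/my-source.nar",   # hypothetical .nar file
    name="my-source",                     # hypothetical connector name
    topic="my-topic",                     # hypothetical destination topic
    config_file="my-source-config.yaml",  # hypothetical connector config
)
print(shlex.join(cmd))
```

A sink would use `sinks localrun` with its own options (for example an input topic instead of a destination topic).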

Troubleshooting And Debugging Techniques

How to investigate Pulsar I/O issues: Metrics • Logs • Traces • Source code • Proxy • Localrun • Breakpoints • Stats

Problem: my connector is not running 1. Check the connector configuration

Problem: my connector is not running 2. Check the current connector status

Problem: my connector is not running 3. Check the status from the topic

Problem: my connector is not running 4. Check the connector logs [Screenshot labels: Tenant, Namespace, Connector Name]

Problem: my connector is not running 5. Debug with localrun. Play with the number of connector threads.
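Steps 1 to 3 plausibly map onto three `pulsar-admin` calls: `sources get` for the stored configuration, `sources status` for the running instances, and `topics stats` for the topic the connector writes to. This is a sketch for a source connector, not necessarily what the slides showed; the tenant, namespace, connector, and topic names are hypothetical, and sinks have matching `sinks get` and `sinks status` subcommands.

```python
import shlex

ADMIN = "bin/pulsar-admin"  # assumed path, relative to the Pulsar install directory


def troubleshooting_commands(tenant, namespace, name, topic):
    """Return the commands for steps 1-3, in order, as argv lists."""
    ident = ["--tenant", tenant, "--namespace", namespace, "--name", name]
    return [
        [ADMIN, "sources", "get", *ident],     # 1. connector configuration
        [ADMIN, "sources", "status", *ident],  # 2. current connector status
        [ADMIN, "topics", "stats", topic],     # 3. stats from the topic
    ]


cmds = troubleshooting_commands(
    "public", "default", "my-source", "persistent://public/default/my-topic"
)
for cmd in cmds:
    print(shlex.join(cmd))
```

For step 5, the `--parallelism` option of `localrun` is one way to vary the number of connector instances.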

Problem: sink is not receiving any data. Wiretap 🔎 🧐 with mitmproxy: -Dhttp.proxyHost=mitmproxyhost -Dhttp.proxyPort=8080 (https://mitmproxy.org)
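The JVM reads its HTTP proxy from the standard `http.proxyHost` and `http.proxyPort` system properties, with `https.proxyHost` and `https.proxyPort` as the HTTPS counterparts. A small sketch that builds those flags for a mitmproxy instance: `mitmproxyhost` comes from the slide, 8080 is mitmproxy's default listening port, and how the flags reach the connector's JVM depends on the runtime in use and is left open here.

```python
def proxy_jvm_flags(host, port=8080, include_https=True):
    """Build -D flags that route a JVM's HTTP(S) traffic through a proxy."""
    schemes = ["http", "https"] if include_https else ["http"]
    flags = []
    for scheme in schemes:
        flags.append(f"-D{scheme}.proxyHost={host}")
        flags.append(f"-D{scheme}.proxyPort={port}")
    return flags


print(" ".join(proxy_jvm_flags("mitmproxyhost")))
```

For HTTPS targets the JVM would also need to trust mitmproxy's CA certificate, otherwise the intercepted connections fail TLS validation.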

Problem: multiple clusters and logs. Meet the Beats 😎 [Diagram: Beats • Logstash (parse and enhance) • Elasticsearch • Kibana (visualize)]

Problem: I don’t know what to do. What about… when you have no idea about what is going on?

Debug the connector’s code
1. Enable the JDWP protocol on Pulsar
2. Configure the function runtime to thread
3. Attach your IDE to the JVM
4. Set breakpoints and debug
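Step 1 usually comes down to starting the JVM with a JDWP agent. A sketch that builds the standard `-agentlib:jdwp` option: port 5005 is an arbitrary choice, and passing the option through the `PULSAR_EXTRA_OPTS` environment variable is an assumption about how the Pulsar startup scripts are configured in a given install.

```python
def jdwp_option(port=5005, suspend=False, host=None):
    """Build the JVM option that opens a JDWP debug socket for an IDE to attach to.

    host=None gives `address=<port>`; pass host="*" on Java 9+ to accept
    remote debuggers on all interfaces.
    """
    address = f"{host}:{port}" if host else str(port)
    suspend_flag = "y" if suspend else "n"
    return (
        "-agentlib:jdwp=transport=dt_socket,server=y,"
        f"suspend={suspend_flag},address={address}"
    )


print(f'PULSAR_EXTRA_OPTS="{jdwp_option()}"')
```

With the thread runtime from step 2, the connector runs inside the worker's JVM, so a single debug socket is enough for the IDE to reach the connector code.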

Ricardo Ferreira
@riferrei

Apache Pulsar is a distributed messaging and streaming platform that stores messages durably and at scale in its persistent store, making it an attractive technology for storing business data. However, merely storing the data is not enough. To make this data useful, the platform must also provide ways to ingest new data and send existing data elsewhere.

While developers can build applications for this using the client libraries, the reality is that most of them don’t want to spend time writing code for repetitive tasks such as reading data from a database and storing it in Pulsar. This is why Pulsar abstracts such tasks away by providing a connector-based framework called Pulsar I/O.

This talk will provide an overview of how the Pulsar I/O framework works and a deep dive into troubleshooting, from identifying when a connector is not working correctly to more elaborate investigations that may be useful for debugging. It will give you the tools required to ingest data into Pulsar and export data out of it effectively.

Buzz and feedback

Here’s what was said about this presentation on social media.

Join @riferrei from @elastic at 12:15 PM PT on #PulsarSummit! He will present "Down to the Rabbit Hole with Pulsar I/O" and share the required tools to master how to ingest and export data into and out of #ApachePulsar effectively. Join Pulsar Summit ASAP: https://t.co/gEpons6DaF pic.twitter.com/cRglL78bHy
— Pulsar Summit (@PulsarSummit) June 16, 2021
Getting ready for my talk on @PulsarSummit about Pulsar I/O pic.twitter.com/ucTB4KFvC7
— Ricardo Ferreira (@riferrei) June 16, 2021

Resources

Download Pulsar I/O connectors

StreamNative Hub Website

Download Elasticsearch

Download/Install mitmproxy

Download Elastic Beats

Creating Proxy Clusters Docs