ruby-kafka
A Ruby client library for Apache Kafka, a distributed log and message bus. The focus of this library is operational simplicity, with good logging and metrics that make debugging issues easier.
Add this line to your application's Gemfile:
gem 'ruby-kafka'
And then execute:
$ bundle
Or install it yourself as:
$ gem install ruby-kafka
Kafka version | Producer API | Consumer API |
---|---|---|
Kafka 0.8 | Full support in v0.4.x | Unsupported |
Kafka 0.9 | Full support in v0.4.x | Full support in v0.4.x |
Kafka 0.10 | Full support in v0.5.x | Full support in v0.5.x |
Kafka 0.11 | Full support in v0.7.x | Limited support |
Kafka 1.0 | Limited support | Limited support |
Kafka 2.0 | Limited support | Limited support |
Kafka 2.1 | Limited support | Limited support |
Kafka 2.2 | Limited support | Limited support |
Kafka 2.3 | Limited support | Limited support |
Kafka 2.4 | Limited support | Limited support |
Kafka 2.5 | Limited support | Limited support |
Kafka 2.6 | Limited support | Limited support |
Kafka 2.7 | Limited support | Limited support |
This library is targeting Kafka 0.9 with the v0.4.x series and Kafka 0.10 with the v0.5.x series. There's limited support for Kafka 0.8, and things should work with Kafka 0.11, although there may be performance issues due to changes in the protocol.
This library requires Ruby 2.1 or higher.
Please see the documentation site for detailed documentation on the latest release. Note that the documentation on GitHub may not match the version of the library you're using – many changes are still being made to the API.
A client must be initialized with at least one Kafka broker, from which the entire Kafka cluster will be discovered. Each client keeps a separate pool of broker connections. Don't use the same client from more than one thread.
require "kafka"
# The first argument is a list of "seed brokers" that will be queried for the full
# cluster topology. At least one of these *must* be available. `client_id` is
# used to identify this client in logs and metrics. It's optional but recommended.
kafka = Kafka.new(["kafka1:9092", "kafka2:9092"], client_id: "my-application")
You can also pass a single hostname that resolves to the seed brokers' IP addresses by enabling resolve_seed_brokers:
kafka = Kafka.new("seed-brokers:9092", client_id: "my-application", resolve_seed_brokers: true)
The simplest way to write a message to a Kafka topic is to call #deliver_message:
kafka = Kafka.new(...)
kafka.deliver_message("Hello, World!", topic: "greetings")
This will write the message to a random partition in the greetings
topic. If you want to write to a specific partition, pass the partition
parameter:
# Will write to partition 42.
kafka.deliver_message("Hello, World!", topic: "greetings", partition: 42)
If you don't know exactly how many partitions are in the topic, or if you'd rather have some level of indirection, you can pass in partition_key
instead. Two messages with the same partition key will always be assigned to the same partition. This is useful if you want to make sure all messages with a given attribute are always written to the same partition, e.g. all purchase events for a given customer id.
# Partition keys assign a partition deterministically.
kafka.deliver_message("Hello, World!", topic: "greetings", partition_key: "hello")
Kafka also supports message keys. When passed, a message key can be used instead of a partition key. The message key is written alongside the message value and can be read by consumers. Message keys in Kafka can be used for interesting things such as Log Compaction. See Partitioning for more information.
# Set a message key; the key will be used for partitioning since no explicit
# `partition_key` is set.
kafka.deliver_message("Hello, World!", key: "hello", topic: "greetings")
While #deliver_message works fine for infrequent writes, it makes a separate request to Kafka for every message, with no buffering or batching. The Producer API solves these problems and more:
# Instantiate a new producer.
producer = kafka.producer
# Add a message to the producer buffer.
producer.produce("hello1", topic: "test-messages")
# Deliver the messages to Kafka.
producer.deliver_messages
#produce
will buffer the message in the producer but will not actually send it to the Kafka cluster. Buffered messages are only delivered to the Kafka cluster once #deliver_messages
is called. Since messages may be destined for different partitions, this could involve writing to more than one Kafka broker. Note that a failure to send all buffered messages after the configured number of retries will result in Kafka::DeliveryFailed
being raised. This can be rescued and ignored; the messages will be kept in the buffer until the next attempt.
Read the docs for Kafka::Producer for more details.
A normal producer will block while #deliver_messages
is sending messages to Kafka, possibly for tens of seconds or even minutes at a time, depending on your timeout and retry settings. Furthermore, you have to call #deliver_messages
manually, with a frequency that balances batch size with message delay.
In order to avoid blocking during message deliveries you can use the asynchronous producer API. It is mostly similar to the synchronous API, with calls to #produce
and #deliver_messages
. The main difference is that rather than blocking, these calls will return immediately. The actual work will be done in a background thread, with the messages and operations being sent from the caller over a thread safe queue.
# `#async_producer` will create a new asynchronous producer.
producer = kafka.async_producer
# The `#produce` API works as normal.
producer.produce("hello", topic: "greetings")
# `#deliver_messages` will return immediately.
producer.deliver_messages
# Make sure to call `#shutdown` on the producer in order to avoid leaking
# resources. `#shutdown` will wait for any pending messages to be delivered
# before returning.
producer.shutdown
By default, the delivery policy will be the same as for a synchronous producer: only when #deliver_messages
is called will the messages be delivered. However, the asynchronous producer offers two complementary policies for automatic delivery:
- Triggering a delivery once a certain number of messages have been buffered (delivery_threshold).
- Triggering a delivery at a fixed time interval (delivery_interval).
These policies can be used alone or in combination.
# `async_producer` will create a new asynchronous producer.
producer = kafka.async_producer(
# Trigger a delivery once 100 messages have been buffered.
delivery_threshold: 100,
# Trigger a delivery every 30 seconds.
delivery_interval: 30,
)
producer.produce("hello", topic: "greetings")
# ...
When calling #shutdown
, the producer will attempt to deliver the messages and the method call will block until that has happened. Note that there's no guarantee that the messages will be delivered.
Note: if the calling thread produces messages faster than the producer can write them to Kafka, you'll eventually run into problems. The internal queue used for sending messages from the calling thread to the background worker has a size limit; once this limit is reached, a call to #produce will raise Kafka::BufferOverflow.
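One way to guard against this is to rescue the overflow and apply backpressure in the calling thread. Here's a minimal sketch; the one-second pause is an arbitrary placeholder, not a library recommendation:
begin
  producer.produce("hello", topic: "greetings")
rescue Kafka::BufferOverflow
  # The background worker can't keep up; wait a moment and try again.
  sleep 1
  retry
end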
This library is agnostic to which serialization format you prefer. Both the value and key of a message are treated as binary strings of data. This makes it easier to use whatever serialization format you want, since you don't have to do anything special to make it work with ruby-kafka. Here's an example of encoding data with JSON:
require "json"
# ...
event = {
"name" => "pageview",
"url" => "https://example.com/posts/123",
# ...
}
data = JSON.dump(event)
producer.produce(data, topic: "events")
There's also an example of encoding messages with Apache Avro.
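As a rough sketch, assuming the avro_turf gem and a locally running schema registry (neither of which ships with ruby-kafka), you can encode the event against a named schema before producing it:
require "avro_turf/messaging"
# Assumes Avro schemas live in ./schemas and a schema registry listens on localhost.
avro = AvroTurf::Messaging.new(
  registry_url: "http://localhost:8081",
  schemas_path: "schemas"
)
data = avro.encode(event, schema_name: "pageview")
producer.produce(data, topic: "events")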
Kafka topics are partitioned, with messages being assigned to a partition by the client. This allows a great deal of flexibility for the users. This section describes several strategies for partitioning and how they impact performance, data locality, etc.
Load Balanced Partitioning
When optimizing for efficiency, we either distribute messages as evenly as possible to all partitions, or make sure each producer always writes to a single partition. The former ensures an even load for downstream consumers; the latter ensures the highest producer performance, since message batching is done per partition.
If no explicit partition is specified, the producer will look to the partition key or the message key for a value that can be used to deterministically assign the message to a partition. If there is a large number of different keys, the resulting distribution will be pretty even. If no keys are passed, the producer will randomly assign a partition. Random partitioning can be achieved even if you use message keys by passing a random partition key, e.g. partition_key: rand(100).
If you wish to have the producer write all messages to a single partition, simply generate a random value and re-use that as the partition key:
partition_key = rand(100)
producer.produce(msg1, topic: "messages", partition_key: partition_key)
producer.produce(msg2, topic: "messages", partition_key: partition_key)
# ...
You can also base the partition key on some property of the producer, for example the host name.
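For example, here's a small sketch that uses the host name as a per-producer partition key; Socket.gethostname is just one convenient per-host value:
require "socket"
# All messages produced from this host end up in the same partition.
host_key = Socket.gethostname
producer.produce(msg1, topic: "messages", partition_key: host_key)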
Semantic Partitioning
By assigning messages to a partition based on some property of the message, e.g. making sure all events tracked in a user session are assigned to the same partition, downstream consumers can make simplifying assumptions about data locality. In this example, a consumer can keep process-local state pertaining to a user session, knowing that all events for the session will be read from a single partition. This is also called semantic partitioning, since the partition assignment is part of the application behavior.
Typically it's sufficient to simply pass a partition key in order to guarantee that a set of messages will be assigned to the same partition, e.g.
# All messages with the same `session_id` will be assigned to the same partition.
producer.produce(event, topic: "user-events", partition_key: session_id)
However, sometimes it's necessary to select a specific partition. When doing this, make sure that you don't pick a partition number outside the range of partitions for the topic:
partitions = kafka.partitions_for("events")
# Make sure that we don't exceed the partition count!
partition = some_number % partitions
producer.produce(event, topic: "events", partition: partition)
Compatibility with Other Clients
There's no standardized way to assign messages to partitions across different Kafka client implementations. If you have a heterogeneous set of clients producing messages to the same topics it may be important to ensure a consistent partitioning scheme. This library doesn't try to implement all schemes, so you'll have to figure out which scheme the other client is using and replicate it. An example:
partitions = kafka.partitions_for("events")
# Insert your custom partitioning scheme here:
partition = PartitioningScheme.assign(partitions, event)
producer.produce(event, topic: "events", partition: partition)
Another option is to configure a custom client partitioner that implements call(partition_count, message) and uses the same scheme as the other client. For example:
class CustomPartitioner
def call(partition_count, message)
...
end
end
partitioner = CustomPartitioner.new
Kafka.new(partitioner: partitioner, ...)
Or, simply create a Proc handling the partitioning logic instead of having to add a new class. For example:
partitioner = -> (partition_count, message) { ... }
Kafka.new(partitioner: partitioner, ...)
Supported partitioning schemes
In order for semantic partitioning to work a partition_key
must map to the same partition number every time. The general approach, and the one used by this library, is to hash the key and mod it by the number of partitions. There are many different algorithms that can be used to calculate a hash. By default crc32
is used. murmur2
is also supported for compatibility with Java based Kafka producers.
To use murmur2
hashing pass it as an argument to Partitioner
. For example:
Kafka.new(partitioner: Kafka::Partitioner.new(hash_function: :murmur2))
The producer is designed for resilience in the face of temporary network errors, Kafka broker failovers, and other issues that prevent the client from writing messages to the destination topics. It does this by employing local, in-memory buffers. Only when messages are acknowledged by a Kafka broker will they be removed from the buffer.
Typically, you'd configure the producer to retry failed attempts at sending messages, but sometimes all retries are exhausted. In that case, Kafka::DeliveryFailed
is raised from Kafka::Producer#deliver_messages
. If you wish to have your application be resilient to this happening (e.g. if you're logging to Kafka from a web application) you can rescue this exception. The failed messages are still retained in the buffer, so a subsequent call to #deliver_messages
will still attempt to send them.
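For example, a sketch of rescuing the exception so a later pass can retry; the logger call is illustrative:
begin
  producer.deliver_messages
rescue Kafka::DeliveryFailed => e
  # The undelivered messages stay in the buffer; they'll be retried the
  # next time `deliver_messages` is called.
  logger.warn("Delivery failed: #{e.message}")
end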
Note that there's a maximum buffer size; by default, it's set to 1,000 messages and 10MB. It's possible to configure both these numbers:
producer = kafka.producer(
max_buffer_size: 5_000, # Allow at most 5K messages to be buffered.
max_buffer_bytesize: 100_000_000, # Allow at most 100MB to be buffered.
...
)
A final note on buffers: local buffers give resilience against broker and network failures, and allow higher throughput due to message batching, but they also trade off consistency guarantees for higher availability and resilience. If your local process dies while messages are buffered, those messages will be lost. If you require high levels of consistency, you should call #deliver_messages
immediately after #produce
.
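For example:
producer.produce(data, topic: "events")
# Flush right away instead of waiting for a larger batch, trading
# throughput for a smaller window of potential loss.
producer.deliver_messages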
Once the client has delivered a set of messages to a Kafka broker the broker will forward them to its replicas, thus ensuring that a single broker failure will not result in message loss. However, the client can choose when the leader acknowledges the write. At one extreme, the client can choose fire-and-forget delivery, not even bothering to check whether the messages have been acknowledged. At the other end, the client can ask the broker to wait until all its replicas have acknowledged the write before returning. This is the safest option, and the default. It's also possible to have the broker return as soon as it has written the messages to its own log but before the replicas have done so. This leaves a window of time where a failure of the leader will result in the messages being lost, although this should not be a common occurrence.
Write latency and throughput are negatively impacted by having more replicas acknowledge a write, so if you require low-latency, high throughput writes you may want to accept lower durability.
This behavior is controlled by the required_acks
option to #producer
and #async_producer
:
# This is the default: all replicas must acknowledge.
producer = kafka.producer(required_acks: :all)
# This is fire-and-forget: messages can easily be lost.
producer = kafka.producer(required_acks: 0)
# This only waits for the leader to acknowledge.
producer = kafka.producer(required_acks: 1)
Unless you absolutely need lower latency it's highly recommended to use the default setting (:all
).
There are basically two different and incompatible guarantees that can be made in a message delivery system such as Kafka:
- at-most-once delivery: messages may be lost, but they are never delivered more than once;
- at-least-once delivery: messages are never lost, but they may be delivered more than once.
Of these two options, ruby-kafka implements the second one: when in doubt about whether a message has been delivered, a producer will try to deliver it again.
The guarantee is made only for the synchronous producer and boils down to this:
producer = kafka.producer
producer.produce("hello", topic: "greetings")
# If this line fails with Kafka::DeliveryFailed we *may* have succeeded in delivering
# the message to Kafka but won't know for sure.
producer.deliver_messages
# If we get to this line we can be sure that the message has been delivered to Kafka!
That is, once #deliver_messages returns we can be sure that Kafka has received the message. Note that there are some big caveats here:
- If you set required_acks to zero there is no guarantee that the message will ever make it to a Kafka broker.
- With the asynchronous producer there is no guarantee that messages will have been delivered after #deliver_messages returns. A way of blocking until a message has been delivered with the asynchronous producer may be implemented in the future.
It's possible to improve your chances of success when calling #deliver_messages, at the price of a longer max latency:
producer = kafka.producer(
# The number of retries when attempting to deliver messages. The default is
# 2, so 3 attempts in total, but you can configure a higher or lower number:
max_retries: 5,
# The number of seconds to wait between retries. In order to handle longer
# periods of Kafka being unavailable, increase this number. The default is
# 1 second.
retry_backoff: 5,
)
Note that these values affect the max latency of the operation; see Understanding Timeouts for an explanation of the various timeouts and latencies.
If you use the asynchronous producer you typically don't have to worry too much about this, as retries will be done in the background.
Depending on what kind of data you produce, enabling compression may yield improved bandwidth and space usage. Compression in Kafka is done on entire message sets rather than on individual messages. This improves the compression rate and generally means that compression works better the larger your buffers get, since the message sets will be larger by the time they're compressed.
Since many workloads have variations in throughput and distribution across partitions, it's possible to configure a threshold for when to enable compression by setting compression_threshold
. Only if the defined number of messages are buffered for a partition will the messages be compressed.
Compression is enabled by passing the compression_codec parameter to #producer with the name of one of the algorithms allowed by Kafka:
- :snappy for Snappy compression.
- :gzip for gzip compression.
- :lz4 for LZ4 compression.
- :zstd for zstd compression.
By default, all message sets will be compressed if you specify a compression codec. To increase the compression threshold, set compression_threshold to an integer value higher than one.
producer = kafka.producer(
compression_codec: :snappy,
compression_threshold: 10,
)
A typical use case for Kafka is tracking events that occur in web applications. Oftentimes it's advisable to avoid having a hard dependency on Kafka being available, allowing your application to survive a Kafka outage. By using an asynchronous producer, you can avoid doing IO within the individual request/response cycles, instead pushing that to the producer's internal background thread.
In this example, a producer is configured in a Rails initializer:
# config/initializers/kafka_producer.rb
require "kafka"
# Configure the Kafka client with the broker hosts and the Rails
# logger.
$kafka = Kafka.new(["kafka1:9092", "kafka2:9092"], logger: Rails.logger)
# Set up an asynchronous producer that delivers its buffered messages
# every ten seconds:
$kafka_producer = $kafka.async_producer(
delivery_interval: 10,
)
# Make sure to shut down the producer when exiting.
at_exit { $kafka_producer.shutdown }
In your controllers, simply call the producer directly:
# app/controllers/orders_controller.rb
class OrdersController
def create
@order = Order.create!(params[:order])
event = {
order_id: @order.id,
amount: @order.amount,
timestamp: Time.now,
}
$kafka_producer.produce(event.to_json, topic: "order_events")
end
end
Note: If you're just looking to get started with Kafka consumers, you might be interested in visiting the Higher level libraries section that lists ruby-kafka based frameworks. Read on, if you're interested in either rolling your own executable consumers or if you want to learn more about how consumers work in Kafka.
Consuming messages from a Kafka topic with ruby-kafka is simple:
require "kafka"
kafka = Kafka.new(["kafka1:9092", "kafka2:9092"])
kafka.each_message(topic: "greetings") do |message|
puts message.offset, message.key, message.value
end
While this is great for extremely simple use cases, it has a number of downsides: there is no way to coordinate which partitions multiple processes should read from, and no way to resume from a committed position if the process crashes.
The Consumer API solves all of the above issues, and more. It uses the Consumer Groups feature released in Kafka 0.9 to allow multiple consumer processes to coordinate access to a topic, assigning each partition to a single consumer. When a consumer fails, the partitions that were assigned to it are re-assigned to other members of the group.
Using the API is simple:
require "kafka"
kafka = Kafka.new(["kafka1:9092", "kafka2:9092"])
# Consumers with the same group id will form a Consumer Group together.
consumer = kafka.consumer(group_id: "my-consumer")
# It's possible to subscribe to multiple topics by calling `subscribe`
# repeatedly.
consumer.subscribe("greetings")
# Stop the consumer when the SIGTERM signal is sent to the process.
# It's better to shut down gracefully than to kill the process.
trap("TERM") { consumer.stop }
# This will loop indefinitely, yielding each message in turn.
consumer.each_message do |message|
puts message.topic, message.partition
puts message.offset, message.key, message.value
end
Each consumer process will be assigned one or more partitions from each topic that the group subscribes to. In order to handle more messages, simply start more processes.
In order to be able to resume processing after a consumer crashes, each consumer will periodically checkpoint its position within each partition it reads from. Since each partition has a monotonically increasing sequence of message offsets, this works by committing the offset of the last message that was processed in a given partition. Kafka handles these commits and allows another consumer in a group to resume from the last commit when a member crashes or becomes unresponsive.
By default, offsets are committed every 10 seconds. You can increase the frequency, known as the offset commit interval, to limit the duration of double-processing scenarios, at the cost of a lower throughput due to the added coordination. If you want to improve throughput, and double-processing is of less concern to you, then you can decrease the frequency. Set the commit interval to zero in order to disable the timer-based commit trigger entirely.
In addition to the time based trigger it's possible to trigger checkpointing in response to n messages having been processed, known as the offset commit threshold. This puts a bound on the number of messages that can be double-processed before the problem is detected. Setting this to 1 will cause an offset commit to take place every time a message has been processed. By default this trigger is disabled (set to zero).
It is possible to trigger an immediate offset commit by calling Consumer#commit_offsets
. This blocks the caller until the Kafka cluster has acknowledged the commit.
Stale offsets are periodically purged by the broker. The broker setting offsets.retention.minutes
controls the retention window for committed offsets, and defaults to 1 day. The length of the retention window, known as offset retention time, can be changed for the consumer.
Previously committed offsets are re-committed, to reset the retention window, at the first commit and periodically at an interval of half the offset retention time.
consumer = kafka.consumer(
group_id: "some-group",
# Increase offset commit frequency to once every 5 seconds.
offset_commit_interval: 5,
# Commit offsets when 100 messages have been processed.
offset_commit_threshold: 100,
# Increase the length of time that committed offsets are kept.
offset_retention_time: 7 * 60 * 60
)
For some use cases it may be necessary to control when messages are marked as processed. Note that since only the consumer position within each partition can be saved, marking a message as processed implies that all messages in the partition with a lower offset should also be considered as having been processed.
The method Consumer#mark_message_as_processed
marks a message (and all those that precede it in a partition) as having been processed. This is an advanced API that you should only use if you know what you're doing.
# Manually controlling checkpointing:
# Typically you want to use this API in order to buffer messages until some
# special "commit" message is received, e.g. in order to group together
# transactions consisting of several items.
buffer = []
# Messages will not be marked as processed automatically. If you shut down the
# consumer without calling `#mark_message_as_processed` first, the consumer will
# not resume where you left off!
consumer.each_message(automatically_mark_as_processed: false) do |message|
# Our messages are JSON with a `type` field and other stuff.
event = JSON.parse(message.value)
case event.fetch("type")
when "add_to_cart"
buffer << event
when "complete_purchase"
# We've received all the messages we need, time to save the transaction.
save_transaction(buffer)
# Now we can set the checkpoint by marking the last message as processed.
consumer.mark_message_as_processed(message)
# We can optionally trigger an immediate, blocking offset commit in order
# to minimize the risk of crashing before the automatic triggers have
# kicked in.
consumer.commit_offsets
# Make the buffer ready for the next transaction.
buffer.clear
end
end
For each topic subscription it's possible to decide whether to consume messages starting at the beginning of the topic or to just consume new messages that are produced to the topic. This policy is configured by setting the start_from_beginning
argument when calling #subscribe
:
# Consume messages from the very beginning of the topic. This is the default.
consumer.subscribe("users", start_from_beginning: true)
# Only consume new messages.
consumer.subscribe("notifications", start_from_beginning: false)
Once the consumer group has checkpointed its progress in the topic's partitions, the consumers will always start from the checkpointed offsets, regardless of start_from_beginning
. As such, this setting only applies when the consumer initially starts consuming from a topic.
In order to shut down a running consumer process cleanly, call #stop
on it. A common pattern is to trap a process signal and initiate the shutdown from there:
consumer = kafka.consumer(...)
# The consumer can be stopped from the command line by executing
# `kill -s TERM <process-id>`.
trap("TERM") { consumer.stop }
consumer.each_message do |message|
...
end
Sometimes it is easier to deal with messages in batches rather than individually. A batch is a sequence of one or more Kafka messages that all belong to the same topic and partition. One common reason to want to use batches is when some external system has a batch or transactional API.
# A mock search index that we'll be keeping up to date with new Kafka messages.
index = SearchIndex.new
consumer.subscribe("posts")
consumer.each_batch do |batch|
puts "Received batch: #{batch.topic}/#{batch.partition}"
transaction = index.transaction
batch.messages.each do |message|
# Let's assume that adding a document is idempotent.
transaction.add(id: message.key, body: message.value)
end
# Once this method returns, the messages have been successfully written to the
# search index. The consumer will only checkpoint a batch *after* the block
# has completed without an exception.
transaction.commit!
end
One important thing to note is that the client commits the offset of the batch's messages only after the entire batch has been processed.
There are two performance properties that can at times be at odds: throughput and latency. Throughput is the number of messages that can be processed in a given timespan; latency is the time it takes from when a message is written to a topic until it has been processed.
In order to optimize for throughput, you want to make sure to fetch as many messages as possible every time you do a round trip to the Kafka cluster. This minimizes network overhead and allows processing data in big chunks.
In order to optimize for low latency, you want to process a message as soon as possible, even if that means fetching a smaller batch of messages.
There are three values that can be tuned in order to balance these two concerns:
- min_bytes is the minimum number of bytes to return from a single message fetch. By setting this to a high value you can increase the processing throughput. The default value is one byte.
- max_wait_time is the maximum number of seconds to wait before returning data from a single message fetch. By setting this high you also increase the processing throughput – and by setting it low you set a bound on latency. This configuration overrides min_bytes, so you'll always get data back within the time specified. The default value is one second. If you want to have at most five seconds of latency, set max_wait_time to 5. You should make sure max_wait_time * num brokers + heartbeat_interval is less than session_timeout.
- max_bytes_per_partition is the maximum amount of data a broker will return for a single partition when fetching new messages. The default is 1MB, but increasing this number may lead to better throughput since you'll need to fetch less frequently. Setting it to a lower value is not recommended unless you have so many partitions that it's causing network and latency issues to transfer a fetch response from a broker to a client. Setting the number too high may result in instability, so be careful.
The first two settings can be passed to either #each_message or #each_batch, e.g.
# Waits for data for up to 5 seconds on each broker, preferring to fetch at least 5KB at a time.
# This can wait up to num brokers * 5 seconds.
consumer.each_message(min_bytes: 1024 * 5, max_wait_time: 5) do |message|
# ...
end
The last setting is configured when subscribing to a topic, and can vary between topics:
# Fetches up to 5MB per partition at a time for better throughput.
consumer.subscribe("greetings", max_bytes_per_partition: 5 * 1024 * 1024)
consumer.each_message do |message|
# ...
end
In some cases, you might want to assign more partitions to some consumers. For example, in applications that insert records into a database, consumers running on hosts near the database can process more messages than consumers running on other hosts. You can use a custom assignment strategy by passing an object that implements #call as the assignment_strategy argument, as shown below:
class CustomAssignmentStrategy
def initialize(user_data)
@user_data = user_data
end
# Assign the topic partitions to the group members.
#
# @param cluster [Kafka::Cluster]
# @param members [Hash<String, Kafka::Protocol::JoinGroupResponse::Metadata>] a hash
# mapping member ids to metadata
# @param partitions [Array<Kafka::ConsumerGroup::Assignor::Partition>] a list of
# partitions the consumer group processes
# @return [Hash<String, Array<Kafka::ConsumerGroup::Assignor::Partition>>] a hash
# mapping member ids to partitions.
def call(cluster:, members:, partitions:)
...
end
end
strategy = CustomAssignmentStrategy.new("some-host-information")
consumer = kafka.consumer(group_id: "some-group", assignment_strategy: strategy)
members
is a hash mapping member IDs to metadata, and partitions is a list of partitions the consumer group processes. The method call
must return a hash mapping member IDs to partitions. For example, the following strategy assigns partitions randomly:
class RandomAssignmentStrategy
def call(cluster:, members:, partitions:)
member_ids = members.keys
partitions.each_with_object(Hash.new {|h, k| h[k] = [] }) do |partition, partitions_per_member|
partitions_per_member[member_ids[rand(member_ids.count)]] << partition
end
end
end
If the strategy needs user data, you should define the method user_data
that returns user data on each consumer. For example, the following strategy uses the consumers' IP addresses as user data:
class NetworkTopologyAssignmentStrategy
def user_data
Socket.ip_address_list.find(&:ipv4_private?).ip_address
end
def call(cluster:, members:, partitions:)
# Display the pair of the member ID and IP address
members.each do |id, metadata|
puts "#{id}: #{metadata.user_data}"
end
# Assign partitions considering the network topology
...
end
end
Note that the strategy uses the class name as the default protocol name. You can change it by defining the method protocol_name
:
class NetworkTopologyAssignmentStrategy
def protocol_name
"networktopology"
end
def user_data
Socket.ip_address_list.find(&:ipv4_private?).ip_address
end
def call(cluster:, members:, partitions:)
...
end
end
As the method call
might receive different user data from what it expects, you should avoid using the same protocol name as another strategy that uses different user data.
You typically don't want to share a Kafka client object between threads, since the network communication is not synchronized. Furthermore, you should avoid using threads in a consumer unless you're very careful about waiting for all work to complete before returning from the #each_message
or #each_batch
block. This is because checkpointing assumes that returning from the block means that the messages that have been yielded have been successfully processed.
You should also avoid sharing a synchronous producer between threads, as the internal buffers are not thread safe. However, the asynchronous producer should be safe to use in a multi-threaded environment. This is because producers, when instantiated, get their own copy of any non-thread-safe data such as network sockets. Furthermore, the asynchronous producer is designed so that only a single background thread operates on this data, while any foreground thread with a reference to the producer object can only send messages to that background thread over a thread-safe queue. Therefore it is safe to share an async producer object between many threads.
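For instance, here's a sketch of one shared async producer fed by several worker threads; the topic name and thread count are arbitrary:
producer = kafka.async_producer(delivery_interval: 10)
workers = 4.times.map do |i|
  Thread.new do
    # Each worker only enqueues messages onto the producer's internal
    # thread-safe queue; the producer's own background thread performs
    # the actual network writes.
    producer.produce("message from worker #{i}", topic: "greetings")
  end
end
workers.each(&:join)
producer.shutdown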
It's a very good idea to configure the Kafka client with a logger. All important operations and errors are logged. When instantiating your client, simply pass in a valid logger:
logger = Logger.new("log/kafka.log")
kafka = Kafka.new(logger: logger, ...)
By default, nothing is logged.
Most operations are instrumented using Active Support Notifications. In order to subscribe to notifications, make sure to require the notifications library:
require "active_support/notifications"
require "kafka"
The notifications are namespaced based on their origin, with separate namespaces for the producer and the consumer.
In order to receive notifications you can either subscribe to individual notification names or use regular expressions to subscribe to entire namespaces. This example will subscribe to all notifications sent by ruby-kafka:
ActiveSupport::Notifications.subscribe(/.*\.kafka$/) do |*args|
event = ActiveSupport::Notifications::Event.new(*args)
puts "Received notification `#{event.name}` with payload: #{event.payload.inspect}"
end
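You can also subscribe to a single notification by name, for example the deliver_messages.producer.kafka event described below. This sketch assumes the payload keys are symbols:
ActiveSupport::Notifications.subscribe("deliver_messages.producer.kafka") do |*args|
  event = ActiveSupport::Notifications::Event.new(*args)
  delivered = event.payload[:delivered_message_count]
  attempted = event.payload[:message_count]
  puts "Delivered #{delivered}/#{attempted} buffered messages"
end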
All notification events have the client_id
key in the payload, referring to the Kafka client id.
produce_message.producer.kafka is sent whenever a message is produced to a buffer. It includes the following payload:
- value is the message value.
- key is the message key.
- topic is the topic that the message was produced to.
- buffer_size is the size of the producer buffer after adding the message.
- max_buffer_size is the maximum size of the producer buffer.
deliver_messages.producer.kafka is sent whenever a producer attempts to deliver its buffered messages to the Kafka brokers. It includes the following payload:
- attempts is the number of times delivery was attempted.
- message_count is the number of messages for which delivery was attempted.
- delivered_message_count is the number of messages that were acknowledged by the brokers - if this number is smaller than message_count, not all messages were successfully delivered.
All notifications have group_id in the payload, referring to the Kafka consumer group id.
process_message.consumer.kafka is sent whenever a message is processed by a consumer. It includes the following payload:
- value is the message value.
- key is the message key.
- topic is the topic that the message was consumed from.
- partition is the topic partition that the message was consumed from.
- offset is the message's offset within the topic partition.
- offset_lag is the number of messages within the topic partition that have not yet been consumed.
start_process_message.consumer.kafka is sent before process_message.consumer.kafka, and contains the same payload. It is delivered before the message is processed, rather than after.
process_batch.consumer.kafka is sent whenever a message batch is processed by a consumer. It includes the following payload:
- message_count is the number of messages in the batch.
- topic is the topic that the message batch was consumed from.
- partition is the topic partition that the message batch was consumed from.
- highwater_mark_offset is the message batch's highest offset within the topic partition.
- offset_lag is the number of messages within the topic partition that have not yet been consumed.
start_process_batch.consumer.kafka is sent before process_batch.consumer.kafka, and contains the same payload. It is delivered before the batch is processed, rather than after.
join_group.consumer.kafka is sent whenever a consumer joins a consumer group. It includes the following payload:
- group_id is the consumer group id.
sync_group.consumer.kafka is sent whenever a consumer is assigned topic partitions within a consumer group. It includes the following payload:
- group_id is the consumer group id.
leave_group.consumer.kafka is sent whenever a consumer leaves a consumer group. It includes the following payload:
- group_id is the consumer group id.
seek.consumer.kafka is sent when a consumer first seeks to an offset. It includes the following payload:
- group_id is the consumer group id.
- topic is the topic we are seeking in.
- partition is the partition we are seeking in.
- offset is the offset we have sought to.
heartbeat.consumer.kafka is sent when a consumer group completes a heartbeat. It includes the following payload:
- group_id is the consumer group id.
- topic_partitions is a hash of { topic_name => array of assigned partition IDs }.
request.connection.kafka is sent whenever a network request is sent to a Kafka broker. It includes the following payload:
- api is the name of the API that was called, e.g. produce or fetch.
- request_size is the number of bytes in the request.
- response_size is the number of bytes in the response.
It is highly recommended that you monitor your Kafka client applications in production. Typical problems you'll see are:
You can quite easily build monitoring on top of the provided instrumentation hooks. In order to further help with monitoring, prebuilt Statsd and Datadog reporters are included with ruby-kafka.
We recommend monitoring the following:
The Statsd reporter is automatically enabled when the kafka/statsd
library is required. You can optionally change the configuration.
require "kafka/statsd"
# Default is "ruby_kafka".
Kafka::Statsd.namespace = "custom-namespace"
# Default is "127.0.0.1".
Kafka::Statsd.host = "statsd.something.com"
# Default is 8125.
Kafka::Statsd.port = 1234
The Datadog reporter is automatically enabled when the kafka/datadog
library is required. You can optionally change the configuration.
# This enables the reporter:
require "kafka/datadog"
# Default is "ruby_kafka".
Kafka::Datadog.namespace = "custom-namespace"
# Default is "127.0.0.1".
Kafka::Datadog.host = "statsd.something.com"
# Default is 8125.
Kafka::Datadog.port = 1234
It's important to understand how timeouts work if you have a latency sensitive application. This library allows configuring timeouts on different levels:
Network timeouts apply to network connections to individual Kafka brokers. There are two config keys here, each passed to Kafka.new:
- connect_timeout sets the number of seconds to wait while connecting to a broker for the first time. When ruby-kafka initializes, it needs to connect to at least one host in seed_brokers in order to discover the Kafka cluster. Each host is tried until there's one that works. Usually that means the first one, but if your entire cluster is down, or there's a network partition, you could wait up to n * connect_timeout seconds, where n is the number of seed brokers.
- socket_timeout sets the number of seconds to wait when reading from or writing to a socket connection to a broker. After this timeout expires the connection will be killed. Note that some Kafka operations are by definition long-running, such as waiting for new messages to arrive in a partition, so don't set this value too low. When configuring timeouts relating to specific Kafka operations, make sure to make them shorter than this one.
Producer timeouts can be configured when calling #producer on a client instance:
- ack_timeout is a timeout executed by a broker when the client is sending messages to it. It defines the number of seconds the broker should wait for replicas to acknowledge the write before responding to the client with an error. As such, it relates to the required_acks setting. It should be set lower than socket_timeout.
- retry_backoff configures the number of seconds to wait after a failed attempt to send messages to a Kafka broker before retrying. The max_retries setting defines the maximum number of retries to attempt, and so the total duration could be up to max_retries * retry_backoff seconds. The timeout can be arbitrarily long, and shouldn't be too short: if a broker goes down its partitions will be handed off to another broker, and that can take tens of seconds.
When sending many messages, it's likely that the client needs to send some messages to each broker in the cluster. Given n brokers in the cluster, the total wait time when calling Kafka::Producer#deliver_messages can be up to
n * (connect_timeout + socket_timeout + retry_backoff) * max_retries
Make sure your application can survive being blocked for so long.
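Putting it together, here's a sketch of setting these timeouts; the broker addresses and values are illustrative only:
kafka = Kafka.new(
  ["kafka1:9092", "kafka2:9092"],
  connect_timeout: 10,  # seconds to wait when opening a broker connection
  socket_timeout: 30,   # seconds to wait on socket reads and writes
)
producer = kafka.producer(
  ack_timeout: 5,       # how long the broker waits for replica acks
  retry_backoff: 1,     # seconds to wait between delivery retries
  max_retries: 2,
)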
By default, communication between Kafka clients and brokers is unencrypted and unauthenticated. Kafka 0.9 added optional support for encryption and client authentication and authorization. There are two layers of security made possible by this:
Encryption of Communication
By enabling SSL encryption you can have some confidence that messages can be sent to Kafka over an untrusted network without being intercepted.
In this case you just need to pass a valid CA certificate as a string when configuring your Kafka
client:
kafka = Kafka.new(["kafka1:9092"], ssl_ca_cert: File.read('my_ca_cert.pem'))
Without passing the CA certificate to the client it would be impossible to protect against man-in-the-middle attacks.
Using your system's CA cert store
If you want to use the CA certs from your system's default certificate store, you can use:
kafka = Kafka.new(["kafka1:9092"], ssl_ca_certs_from_system: true)
This configures the store to look up CA certificates from the system default certificate store on an as-needed basis. The location of the store can usually be determined by OpenSSL::X509::DEFAULT_CERT_FILE.
Client Authentication
In order to authenticate the client to the cluster, you need to pass in a certificate and key created for the client and trusted by the brokers.
NOTE: You can disable hostname validation by passing ssl_verify_hostname: false
.
kafka = Kafka.new(
["kafka1:9092"],
ssl_ca_cert: File.read('my_ca_cert.pem'),
ssl_client_cert: File.read('my_client_cert.pem'),
ssl_client_cert_key: File.read('my_client_cert_key.pem'),
ssl_client_cert_key_password: 'my_client_cert_key_password',
ssl_verify_hostname: false,
# ...
)
Once client authentication is set up, it is possible to configure the Kafka cluster to authorize client requests.
Using JKS Certificates
Typically, Kafka certificates come in the JKS format, which isn't supported by ruby-kafka. There's a wiki page that describes how to generate valid X509 certificates from JKS certificates.
Kafka has support for using SASL to authenticate clients. The GSSAPI, AWS MSK IAM, PLAIN, SCRAM, and OAUTHBEARER mechanisms described below are supported by ruby-kafka.
NOTE: When using SASL for authentication, it is highly recommended to also use SSL encryption. By default, ruby-kafka requires SSL in this case, so you need to configure it by passing ssl_ca_cert or enabling ssl_ca_certs_from_system. However, this strict SSL check can be disabled by setting sasl_over_ssl to false when initializing the client.
GSSAPI
In order to authenticate using GSSAPI, set your principal and optionally your keytab when initializing the Kafka client:
kafka = Kafka.new(
["kafka1:9092"],
sasl_gssapi_principal: 'kafka/kafka.example.com@EXAMPLE.COM',
sasl_gssapi_keytab: '/etc/keytabs/kafka.keytab',
# ...
)
AWS MSK (IAM)
In order to authenticate using IAM w/ an AWS MSK cluster, set your access key, secret key, and region when initializing the Kafka client:
k = Kafka.new(
["kafka1:9092"],
sasl_aws_msk_iam_access_key_id: 'iam_access_key',
sasl_aws_msk_iam_secret_key_id: 'iam_secret_key',
sasl_aws_msk_iam_aws_region: 'us-west-2',
ssl_ca_certs_from_system: true,
# ...
)
PLAIN
In order to authenticate using PLAIN, you must set your username and password when initializing the Kafka client:
kafka = Kafka.new(
["kafka1:9092"],
ssl_ca_cert: File.read('/etc/openssl/cert.pem'),
sasl_plain_username: 'username',
sasl_plain_password: 'password'
# ...
)
SCRAM
Kafka has supported SCRAM since version 0.11:
kafka = Kafka.new(
["kafka1:9092"],
sasl_scram_username: 'username',
sasl_scram_password: 'password',
sasl_scram_mechanism: 'sha256',
# ...
)
OAUTHBEARER
This mechanism is supported in Kafka >= 2.0.0, as introduced by KIP-255.
In order to authenticate using OAUTHBEARER, you must set the client with an instance of a class that implements a token
method (the interface is described in Kafka::Sasl::OAuth) which returns an ID/Access token.
Optionally, the client may implement an extensions
method that returns a map of key-value pairs. These can be sent with the SASL/OAUTHBEARER initial client response. This is only supported in kafka >= 2.1.0.
class TokenProvider
def token
"some_id_token"
end
end
# ...
client = Kafka.new(
["kafka1:9092"],
sasl_oauth_token_provider: TokenProvider.new
)
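Here's a sketch of a provider that also supplies extensions (requires Kafka >= 2.1.0); the extension key and value are made up:
class TokenProviderWithExtensions
  def token
    "some_id_token"
  end
  # Optional: extra key-value pairs sent with the initial SASL/OAUTHBEARER
  # client response.
  def extensions
    { "traceId" => "123abc" }
  end
end
client = Kafka.new(
  ["kafka1:9092"],
  sasl_oauth_token_provider: TokenProviderWithExtensions.new
)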
In addition to producing and consuming messages, ruby-kafka supports managing Kafka topics and their configurations. See the Kafka documentation for a full list of topic configuration keys.
Return an array of topic names.
kafka = Kafka.new(["kafka:9092"])
kafka.topics
# => ["topic1", "topic2", "topic3"]
kafka = Kafka.new(["kafka:9092"])
kafka.create_topic("topic")
By default, the new topic has 1 partition, replication factor 1 and default configs from the brokers. Those configurations are customizable:
kafka = Kafka.new(["kafka:9092"])
kafka.create_topic("topic",
num_partitions: 3,
replication_factor: 2,
config: {
"max.message.bytes" => 100000
}
)
After a topic is created, you can increase the number of partitions for the topic. The new number of partitions must be greater than the current one.
kafka = Kafka.new(["kafka:9092"])
kafka.create_partitions_for("topic", num_partitions: 10)
kafka = Kafka.new(["kafka:9092"])
kafka.describe_topic("topic", ["max.message.bytes", "retention.ms"])
# => {"max.message.bytes"=>"100000", "retention.ms"=>"604800000"}
Update the topic configurations.
NOTE: This feature is for advanced usage. Only use this if you know what you're doing.
kafka = Kafka.new(["kafka:9092"])
kafka.alter_topic("topic", "max.message.bytes" => 100000, "retention.ms" => 604800000)
kafka = Kafka.new(["kafka:9092"])
kafka.delete_topic("topic")
After a topic is marked as deleted, Kafka only hides it from clients, and it can take a while before the topic is completely deleted.
The library has been designed as a layered system, with each layer having a clear responsibility:
- See Kafka::Connection for details on how connections to individual brokers are handled.
- See Kafka::Protocol for details on how the Kafka protocol is encoded and decoded.
- Some complex operations are made available through Kafka::Cluster, which represents an entire cluster, while simpler ones are only available through Kafka::Broker, which represents a single Kafka broker. In general, Kafka::Cluster is the high-level API, with more polish.
- The Consumer API is implemented in Kafka::Consumer, while the Producer API is implemented in Kafka::Producer and Kafka::AsyncProducer.
- Kafka::Client implements the public APIs. For convenience, the method Kafka.new can instantiate the class for you.
Note that only the API and configuration layers have any backwards compatibility guarantees – the other layers are considered internal and may change without warning. Don't use them directly.
The producer is designed with resilience and operational ease of use in mind, sometimes at the cost of raw performance. For instance, the operation is heavily instrumented, allowing operators to monitor the producer at a very granular level.
The producer has two main internal data structures: a list of pending messages and a message buffer. When the user calls Kafka::Producer#produce
, a message is appended to the pending message list, but no network communication takes place. This means that the call site does not have to handle the broad range of errors that can happen at the network or protocol level. Instead, those errors will only happen once Kafka::Producer#deliver_messages
is called. This method will go through the pending messages one by one, making sure they're assigned a partition. This may fail for some messages, as it could require knowing the current configuration for the message's topic, necessitating API calls to Kafka. Messages that cannot be assigned a partition are kept in the list, while the others are written into the message buffer. The producer then figures out which topic partitions are led by which Kafka brokers so that messages can be sent to the right place – in Kafka, it is the responsibility of the client to do this routing. A separate produce API request will be sent to each broker; the response will be inspected; and messages that were acknowledged by the broker will be removed from the message buffer. Any messages that were not acknowledged will be kept in the buffer.
If there are any messages left in either the pending message list or the message buffer after this operation, Kafka::DeliveryFailed
will be raised. This exception must be rescued and handled by the user, possibly by calling #deliver_messages
at a later time.
The synchronous producer allows the user fine-grained control over when network activity and the possible errors arising from that will take place, but it requires the user to handle the errors nonetheless. The async producer provides a more hands-off approach that trades off control for ease of use and resilience.
Instead of writing directly into the pending message list, Kafka::AsyncProducer
writes the message to an internal thread-safe queue, returning immediately. A background thread reads messages off the queue and passes them to a synchronous producer.
Rather than triggering message deliveries directly, users of the async producer will typically set up automatic triggers, such as a timer.
The Consumer API is designed for flexibility and stability. The first is accomplished by not dictating any high-level object model, instead opting for a simple loop-based approach. The second is accomplished by handling group membership, heartbeats, and checkpointing automatically. Messages are marked as processed as soon as they've been successfully yielded to the user-supplied processing block, minimizing the cost of processing errors.
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
Note: the specs require a working Docker instance, but should work out of the box if you have Docker installed. Please create an issue if that's not the case.
If you would like to contribute to ruby-kafka, please join our Slack team and ask how best to do it.
If you've discovered a bug, please file a Github issue, and make sure to include all the relevant information, including the version of ruby-kafka and Kafka that you're using.
If you have other questions, or would like to discuss best practices, how to contribute to the project, or any other ruby-kafka related topic, join our Slack team!
Version 0.4 will be the last minor release with support for the Kafka 0.9 protocol. It is recommended that you pin your dependency on ruby-kafka to ~> 0.4.0
in order to receive bugfixes and security updates. New features will only target version 0.5 and up, which will be incompatible with the Kafka 0.9 protocol.
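For example, in your Gemfile:
# Stay on the 0.4 series to keep Kafka 0.9 protocol support.
gem 'ruby-kafka', '~> 0.4.0'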
v0.4.x: Last stable release series with support for the Kafka 0.9 protocol. Bug and security fixes will be released in patch updates.
v0.5.x and higher: Latest stable release series, with native support for the Kafka 0.10 protocol and eventually newer protocol versions. Kafka 0.9 is no longer supported by this release series.
Currently, there are three actively developed frameworks based on ruby-kafka that provide a higher-level API for working with Kafka messages, and two libraries for publishing messages.
Racecar - A simple framework that integrates with Ruby on Rails to provide a seamless way to write, test, configure, and run Kafka consumers. It comes with sensible defaults and conventions.
Karafka - A framework that simplifies the development of Apache Kafka based Ruby and Rails applications. It provides higher abstraction layers, including Capistrano, Docker and Heroku support.
Phobos - Micro framework and library for applications dealing with Apache Kafka. It wraps common behaviors needed by consumers and producers in an easy and convenient API.
DeliveryBoy – A library that integrates with Ruby on Rails, making it easy to publish Kafka messages from any Rails application.
WaterDrop – A library for Ruby and Ruby on Rails applications that makes it easy to publish Kafka messages both synchronously and asynchronously.
There are a few existing Kafka clients in Ruby:
We needed a robust client that could be used from our existing Ruby apps, that allowed our Ops team to monitor its operation, and that provided flexible error handling. No such client existed, hence this project.
Bug reports and pull requests are welcome on GitHub at https://github.com/zendesk/ruby-kafka.
Author: Zendesk
Source Code: https://github.com/zendesk/ruby-kafka
License: Apache-2.0 license
A callbag sink (listener) that connects an Observer, à la RxJS.
npm install callbag-subscribe
import pipe from 'callbag-pipe';
import interval from 'callbag-interval';
import subscribe from 'callbag-subscribe';
const source = interval( 10 );
pipe(
source,
subscribe( val => console.log( val ) )
);
// 0
// 1
// 2
// 3
// 4
// 5
// 6
// 7
// 8
// 9
import pipe from 'callbag-pipe';
import interval from 'callbag-interval';
import subscribe from 'callbag-subscribe';
const source = interval( 10 );
pipe(
source,
subscribe({
next: val => console.log( val ),
complete: () => console.log( 'Done!' ),
error: err => console.error( err )
})
);
// 0
// 1
// 2
// 3
// 4
// 5
// 6
// 7
// 8
// 9
// Done!
Use the returned disposal function to terminate the subscription.
import pipe from 'callbag-pipe';
import fromEvent from 'callbag-from-event';
import subscribe from 'callbag-subscribe';
const source = fromEvent( document.body, 'click' );
const dispose = pipe(
source,
subscribe({
next: ev => console.log( 'Click:', ev )
})
);
// Do some stuff...
dispose(); // Terminate the subscription.
Author: zebulonj
Source Code: https://github.com/zebulonj/callbag-subscribe
License: MIT license
Pa11y is your automated accessibility testing pal. It runs accessibility tests on your pages via the command line or Node.js, so you can automate your testing process.
On the command line:
pa11y https://example.com/
In JavaScript:
const pa11y = require('pa11y');
pa11y('https://example.com/').then((results) => {
// Do something with the results
});
If you need a GUI, you can try Koa11y. It's a desktop application for Windows, OSX and Linux that uses Pa11y to run accessibility tests.
Pa11y requires Node.js 12+ to run. If you need support for older versions of Node.js, then please use Pa11y 5.x.
To install Node.js you can use nvm:
nvm install node
You can also install Node.js using a package manager like for example Homebrew:
brew install node
Alternatively, you can also download pre-built packages from the Node.js website for your particular Operating System.
On Windows 10, download a pre-built package from the Node.js website. Pa11y will be usable via the bundled Node.js application as well as the Windows command prompt.
Install Pa11y globally with npm:
npm install -g pa11y
This installs the pa11y
command-line tool:
Usage: pa11y [options] <url>
Options:
-V, --version output the version number
-n, --environment output details about the environment Pa11y will run in
-s, --standard <name> the accessibility standard to use: WCAG2A, WCAG2AA (default), WCAG2AAA – only used by htmlcs runner
-r, --reporter <reporter> the reporter to use: cli (default), csv, json
-e, --runner <runner> the test runners to use: htmlcs (default), axe
-l, --level <level> the level of issue to fail on (exit with code 2): error, warning, notice
-T, --threshold <number> permit this number of errors, warnings, or notices, otherwise fail with exit code 2
-i, --ignore <ignore> types and codes of issues to ignore, a repeatable value or separated by semi-colons
--include-notices Include notices in the report
--include-warnings Include warnings in the report
-R, --root-element <selector> a CSS selector used to limit which part of a page is tested
-E, --hide-elements <hide> a CSS selector to hide elements from testing, selectors can be comma separated
-c, --config <path> a JSON or JavaScript config file
-t, --timeout <ms> the timeout in milliseconds
-w, --wait <ms> the time to wait before running tests in milliseconds
-d, --debug output debug messages
-S, --screen-capture <path> a path to save a screen capture of the page to
-A, --add-rule <rule> WCAG 2.1 rules to include, a repeatable value or separated by semi-colons – only used by htmlcs runner
-h, --help output usage information
Run an accessibility test against a URL:
pa11y https://example.com
Run an accessibility test against a file (absolute paths only, not relative):
pa11y ./path/to/your/file.html
Run a test with CSV reporting and save to a file:
pa11y --reporter csv https://example.com > report.csv
Run Pa11y using aXe as a test runner:
pa11y --runner axe https://example.com
Run Pa11y using aXe and HTML CodeSniffer as test runners:
pa11y --runner axe --runner htmlcs https://example.com
The command-line tool uses the following exit codes:
0: Pa11y ran successfully, and there are no errors
1: Pa11y failed to run due to a technical fault
2: Pa11y ran successfully but there are errors in the page
By default, only accessibility issues with a type of error will exit with a code of 2. This is configurable with the --level flag, which can be set to one of the following:
error: exit with a code of 2 on errors only, exit with a code of 0 on warnings and notices
warning: exit with a code of 2 on errors and warnings, exit with a code of 0 on notices
notice: exit with a code of 2 on errors, warnings, and notices
none: always exit with a code of 0
The command-line tool can be configured with a JSON file as well as arguments. By default it will look for a pa11y.json file in the current directory, but you can change this with the --config flag:
pa11y --config ./path/to/config.json https://example.com
If any configuration is set both in a configuration file and as a command-line option, the value set on the command line will take priority.
For more information on configuring Pa11y, see the configuration documentation.
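For example, a minimal pa11y.json might look like the following sketch; the values are illustrative, but each key corresponds to an option documented later in this guide:
{
  "standard": "WCAG2AA",
  "timeout": 30000,
  "threshold": 2,
  "ignore": [
    "WCAG2AA.Principle3.Guideline3_1.3_1_1.H57.2"
  ]
}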
The ignore flag can be used in several different ways. Separated by semi-colons:
pa11y --ignore "issue-code-1;issue-code-2" https://example.com
or by using the flag multiple times:
pa11y --ignore issue-code-1 --ignore issue-code-2 https://example.com
Pa11y can also ignore notices, warnings, and errors up to a threshold number. This might be useful if you're using CI and don't want to break your build. The following example will return exit code 0 on a page with 9 errors, and return exit code 2 on a page with 10 or more errors.
pa11y --threshold 10 https://example.com
The command-line tool can report test results in a few different ways using the --reporter flag. The built-in reporters are:
cli: output test results in a human-readable format
csv: output test results as comma-separated values
json: output test results as a JSON array
tsv: output test results as tab-separated values
The Pa11y team maintain an additional html reporter that can be installed separately via npm and can be used as an example of how to build more complex reporters.
You can also write and publish your own reporters. Pa11y looks for reporters in your node_modules folder (with a naming pattern), and the current working directory. The first reporter found will be loaded. So with this command:
pa11y --reporter rainbows https://example.com
The following locations will be checked:
<cwd>/node_modules/pa11y-reporter-rainbows
<cwd>/rainbows
A Pa11y reporter must export a property named supports. This is a semver range (as a string) which indicates which versions of Pa11y the reporter supports:
exports.supports = '^5.0.0';
A reporter should export the following methods, which should all return strings. If your reporter needs to perform asynchronous operations, then it may return a promise which resolves to a string:
begin(); // Called when pa11y starts
error(message); // Called when a technical error is reported
debug(message); // Called when a debug message is reported
info(message); // Called when an information message is reported
results(results); // Called with the results of a test run
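As a rough illustration, a minimal reporter module (reusing the hypothetical pa11y-reporter-rainbows name from the example above) might look like this sketch, which prints one line per issue:
// index.js of a hypothetical pa11y-reporter-rainbows package (sketch only)
exports.supports = '^5.0.0';

// Called when Pa11y starts.
exports.begin = () => 'Starting accessibility tests...';

// Called for technical errors, debug output, and information messages.
exports.error = message => `Error: ${message}`;
exports.debug = message => `Debug: ${message}`;
exports.info = message => `Info: ${message}`;

// Called with the results of a test run; returns one line per issue.
exports.results = results =>
  results.issues
    .map(issue => `${issue.type}: ${issue.message} (${issue.selector})`)
    .join('\n');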
Install Pa11y with npm or add to your package.json:
npm install pa11y
Require Pa11y:
const pa11y = require('pa11y');
Run Pa11y against a URL; the pa11y function returns a Promise:
pa11y('https://example.com/').then((results) => {
// Do something with the results
});
Pa11y can also be run with some options:
pa11y('https://example.com/', {
// Options go here
}).then((results) => {
// Do something with the results
});
Pa11y resolves with a results object, containing details about the page and accessibility issues from HTML CodeSniffer. It looks like this:
{
documentTitle: 'The title of the page that was tested',
pageUrl: 'The URL that Pa11y was run against',
issues: [
{
code: 'WCAG2AA.Principle1.Guideline1_1.1_1_1.H30.2',
context: '<a href="https://example.com/"><img src="example.jpg" alt=""/></a>',
message: 'Img element is the only content of the link, but is missing alt text. The alt text should describe the purpose of the link.',
selector: 'html > body > p:nth-child(1) > a',
type: 'error',
typeCode: 1
}
// more issues appear here
]
}
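As a sketch of what "do something with the results" might look like, the snippet below counts the issues from the structure above by type; the exit-code policy at the end is an assumption about your application's needs, not a Pa11y default:
const pa11y = require('pa11y');

async function reportIssueCounts(url) {
  const results = await pa11y(url);

  // Group the issues by their type ("error", "warning", "notice").
  const counts = results.issues.reduce((totals, issue) => {
    totals[issue.type] = (totals[issue.type] || 0) + 1;
    return totals;
  }, {});

  console.log(`Tested: ${results.pageUrl} (${results.documentTitle})`);
  console.log(counts);

  // Illustrative policy: fail the process if any errors were found.
  if (counts.error) {
    process.exitCode = 1;
  }
}

reportIssueCounts('https://example.com/');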
If you wish to transform these results with the command-line reporters, then you can do so in your code by requiring them in. The csv, tsv, html, json, and markdown reporters all expose a process method:
// Assuming you've already run tests, and the results
// are available in a `results` variable:
const htmlReporter = require('pa11y/reporter/html');
const html = await htmlReporter.results(results, url);
Because Pa11y is promise based, you can use async functions and the await keyword:
async function runPa11y() {
try {
const results = await pa11y('https://example.com/');
// Do something with the results
} catch (error) {
// Handle the error
}
}
runPa11y();
If you would rather use callbacks than promises or async/await, then Pa11y supports this. This interface should be considered legacy, however, and may not appear in the next major version of Pa11y:
pa11y('https://example.com/', (error, results) => {
// Do something with the results or handle the error
});
Pa11y exposes a function which allows you to validate action strings before attempting to use them.
This function accepts an action string and returns a boolean indicating whether it matches one of the actions that Pa11y supports:
pa11y.isValidAction('click element #submit'); // true
pa11y.isValidAction('open the pod bay doors'); // false
Pa11y has lots of options you can use to change the way Headless Chrome runs, or the way your page is loaded. Options can be set either as a parameter on the pa11y function or in a JSON configuration file. Some are also available directly as command-line options.
Below is a reference of all the options that are available:
actions (array)
Actions to be run before Pa11y tests the page. There are quite a few different actions available in Pa11y, the Actions documentation outlines each of them.
pa11y('https://example.com/', {
actions: [
'set field #username to exampleUser',
'set field #password to password1234',
'click element #submit',
'wait for path to be /myaccount'
]
});
Defaults to an empty array.
browser (Browser) and page (Page)
A Puppeteer Browser instance which will be used in the test run. Optionally you may also supply a Puppeteer Page instance, but this cannot be used between test runs as event listeners would be bound multiple times.
If either of these options is provided then there are several things you need to consider:
The chromeLaunchConfig option will be ignored; you'll need to pass this configuration in when you create your Browser instance
Make sure the Puppeteer version declared in your package.json is compatible with the version Pa11y uses
Note: This is an advanced option. If you're using this, please mention it in any issues you open on Pa11y and double-check that the Puppeteer version you're using matches Pa11y's.
// This example assumes it runs inside an async function.
const puppeteer = require('puppeteer');
const pa11y = require('pa11y');
const browser = await puppeteer.launch({
ignoreHTTPSErrors: true
});
pa11y('https://example.com/', {
browser: browser
});
browser.close();
A more complete example can be found in the puppeteer examples.
Defaults to null.
chromeLaunchConfig (object)
Launch options for the Headless Chrome instance. See the Puppeteer documentation for more information.
pa11y('https://example.com/', {
chromeLaunchConfig: {
executablePath: '/path/to/Chrome',
ignoreHTTPSErrors: false
}
});
Defaults to:
{
ignoreHTTPSErrors: true
}
headers (object)
A key-value map of request headers to send when testing a web page.
pa11y('https://example.com/', {
headers: {
Cookie: 'foo=bar'
}
});
Defaults to an empty object.
hideElements (string)
A CSS selector to hide elements from testing, selectors can be comma separated. Elements matching this selector will be hidden from testing by styling them with visibility: hidden.
pa11y('https://example.com/', {
hideElements: '.advert, #modal, div[aria-role=presentation]'
});
ignore (array)
An array of result codes and types that you'd like to ignore. You can find the codes for each rule in the console output, and the types are error, warning, and notice. Note: warning and notice messages are ignored by default.
pa11y('https://example.com/', {
ignore: [
'WCAG2AA.Principle3.Guideline3_1.3_1_1.H57.2'
]
});
Defaults to an empty array.
ignoreUrl (boolean)
Whether to use the provided Puppeteer Page instance as is or use the provided URL. Both the Puppeteer Page instance and the Puppeteer Browser instance are required alongside ignoreUrl.
const browser = await puppeteer.launch();
const page = await browser.newPage();
pa11y('https://example.com/', {
ignoreUrl: true,
page: page,
browser: browser
});
Defaults to false.
includeNotices (boolean)
Whether to include results with a type of notice in the Pa11y report. Issues with a type of notice are not directly actionable and so they are excluded by default. You can include them by using this option:
pa11y('https://example.com/', {
includeNotices: true
});
Defaults to false.
includeWarnings (boolean)
Whether to include results with a type of warning in the Pa11y report. Issues with a type of warning are not directly actionable and so they are excluded by default. You can include them by using this option:
pa11y('https://example.com/', {
includeWarnings: true
});
Defaults to false.
level (string)
The level of issue which can fail the test (and cause it to exit with code 2) when running via the CLI. This should be one of error (the default), warning, or notice.
{
"level": "warning"
}
Defaults to error. Note this configuration is only available when using Pa11y on the command line, not via the JavaScript Interface.
log (object)
An object which implements the methods debug, error, and info, which will be used to report errors and test information.
pa11y('https://example.com/', {
log: {
debug: console.log,
error: console.error,
info: console.info
}
});
Each of these defaults to an empty function.
method (string)
The HTTP method to use when running Pa11y.
pa11y('https://example.com/', {
method: 'POST'
});
Defaults to GET.
postData (string)
The HTTP POST data to send when running Pa11y. This should be combined with a Content-Type header. E.g. to send form data:
pa11y('https://example.com/', {
headers: {
'Content-Type': 'application/x-www-form-urlencoded'
},
method: 'POST',
postData: 'foo=bar&bar=baz'
});
Or to send JSON data:
pa11y('https://example.com/', {
headers: {
'Content-Type': 'application/json'
},
method: 'POST',
postData: '{"foo": "bar", "bar": "baz"}'
});
Defaults to null.
reporter (string)
The reporter to use while running the test via the CLI. More about reporters.
{
"reporter": "json"
}
Defaults to cli. Note this configuration is only available when using Pa11y on the command line, not via the JavaScript Interface.
rootElement (element)
The root element for testing a subset of the page, as opposed to the full document.
pa11y('https://example.com/', {
rootElement: '#main'
});
Defaults to null, meaning the full document will be tested. If the specified root element isn't found, the full document will be tested.
runners (array)
An array of runner names which correspond to existing and installed Pa11y runners. If a runner is not found then Pa11y will error.
pa11y('https://example.com/', {
runners: [
'axe',
'htmlcs'
]
});
Defaults to:
[
'htmlcs'
]
rules (array)
An array of WCAG 2.1 guidelines that you'd like to add to the current standard. You can find the codes for each guideline in the HTML CodeSniffer WCAG2AAA ruleset. Note: only used by htmlcs runner.
pa11y('https://example.com/', {
rules: [
'Principle1.Guideline1_3.1_3_1_AAA'
]
});
screenCapture (string)
A file path to save a screen capture of the tested page to. The screen will be captured immediately after the Pa11y tests have run so that you can verify that the expected page was tested.
pa11y('https://example.com/', {
screenCapture: `${__dirname}/my-screen-capture.png`
});
Defaults to null, meaning the screen will not be captured. Note the directory part of this path must be an existing directory in the file system; Pa11y will not create this for you.
standard (string)
The accessibility standard to use when testing pages. This should be one of WCAG2A, WCAG2AA, or WCAG2AAA. Note: only used by htmlcs runner.
pa11y('https://example.com/', {
standard: 'WCAG2A'
});
Defaults to WCAG2AA.
threshold (number)
The number of errors, warnings, or notices to permit before the test is considered to have failed (with exit code 2) when running via the CLI.
{
"threshold": 9
}
Defaults to 0. Note this configuration is only available when using Pa11y on the command line, not via the JavaScript Interface.
timeout (number)
The time in milliseconds that a test should be allowed to run before calling back with a timeout error.
Please note that this is the timeout for the entire test run (including time to initialise Chrome, load the page, and run the tests).
pa11y('https://example.com/', {
timeout: 500
});
Defaults to 30000.
userAgent (string)
The User-Agent header to send with Pa11y requests. This is helpful to identify Pa11y in your logs.
pa11y('https://example.com/', {
userAgent: 'A11Y TESTS'
});
Defaults to pa11y/<version>.
viewport (object)
The viewport configuration. This can have any of the properties supported by the Puppeteer setViewport method.
pa11y('https://example.com/', {
viewport: {
width: 320,
height: 480,
deviceScaleFactor: 2,
isMobile: true
}
});
Defaults to:
{
width: 1280,
height: 1024
}
wait (number)
The time in milliseconds to wait before running HTML CodeSniffer on the page.
pa11y('https://example.com/', {
wait: 500
});
Defaults to 0.
Actions are additional interactions that you can make Pa11y perform before the tests are run. They allow you to do things like click on a button, enter a value in a form, wait for a redirect, or wait for the URL fragment to change:
pa11y('https://example.com/', {
actions: [
'click element #tab-1',
'wait for element #tab-1-content to be visible',
'set field #fullname to John Doe',
'check field #terms-and-conditions',
'uncheck field #subscribe-to-marketing',
'screen capture example.png',
'wait for fragment to be #page-2',
'wait for path to not be /login',
'wait for url to be https://example.com/',
'wait for #my-image to emit load',
'navigate to https://another-example.com/'
]
});
Below is a reference of all the available actions and what they do on the page. Some of these take time to complete so you may need to increase the timeout option if you have a large set of actions.
This allows you to click an element by passing in a CSS selector. This action takes the form click element <selector>. E.g.
pa11y('https://example.com/', {
actions: [
'click element #tab-1'
]
});
You can use any valid query selector, including classes and types.
This allows you to set the value of a text-based input or select box by passing in a CSS selector and value. This action takes the form set field <selector> to <value>. E.g.
pa11y('https://example.com/', {
actions: [
'set field #fullname to John Doe'
]
});
This allows you to check or uncheck checkbox and radio inputs by passing in a CSS selector. This action takes the form check field <selector> or uncheck field <selector>. E.g.
pa11y('https://example.com/', {
actions: [
'check field #terms-and-conditions',
'uncheck field #subscribe-to-marketing'
]
});
This allows you to capture the screen between other actions, useful to verify that the page looks as you expect before the Pa11y test runs. This action takes the form screen capture <file-path>. E.g.
pa11y('https://example.com/', {
actions: [
'screen capture example.png'
]
});
This allows you to pause the test until a condition is met, and the page has either a given fragment, path, or URL. This will wait until Pa11y times out so it should be used after another action that would trigger the change in state. You can also wait until the page does not have a given fragment, path, or URL using the to not be syntax. This action takes one of the forms:
wait for fragment to be <fragment> (including the preceding #)
wait for fragment to not be <fragment> (including the preceding #)
wait for path to be <path> (including the preceding /)
wait for path to not be <path> (including the preceding /)
wait for url to be <url>
wait for url to not be <url>
E.g.
pa11y('https://example.com/', {
actions: [
'click element #login-link',
'wait for path to be /login'
]
});
This allows you to pause the test until an element on the page (matching a CSS selector) is either added, removed, visible, or hidden. This will wait until Pa11y times out so it should be used after another action that would trigger the change in state. This action takes one of the forms:
wait for element <selector> to be added
wait for element <selector> to be removed
wait for element <selector> to be visible
wait for element <selector> to be hidden
E.g.
pa11y('https://example.com/', {
actions: [
'click element #tab-2',
'wait for element #tab-1 to be hidden'
]
});
This allows you to pause the test until an element on the page (matching a CSS selector) emits an event. This will wait until Pa11y times out so it should be used after another action that would trigger the event. This action takes the form wait for element <selector> to emit <event-type>. E.g.
pa11y('https://example.com/', {
actions: [
'click element #tab-2',
'wait for element #tab-panel-to to emit content-loaded'
]
});
This action allows you to navigate to a new URL if, for example, the URL is inaccessible using other methods. This action takes the form navigate to <url>. E.g.
pa11y('https://example.com/', {
actions: [
'navigate to https://another-example.com'
]
});
Pa11y supports multiple test runners which return different results. The built-in test runners are:
axe: run tests using aXe-core.
htmlcs (default): run tests using HTML CodeSniffer
You can also write and publish your own runners. Pa11y looks for runners in your node_modules folder (with a naming pattern), and the current working directory. The first runner found will be loaded. So with this command:
pa11y --runner my-testing-tool https://example.com
The following locations will be checked:
<cwd>/node_modules/pa11y-runner-my-testing-tool
<cwd>/node_modules/my-testing-tool
<cwd>/my-testing-tool
A Pa11y runner must export a property named supports. This is a semver range (as a string) which indicates which versions of Pa11y the runner supports:
exports.supports = '^5.0.0';
A Pa11y runner must export a property named scripts. This is an array of strings which are paths to scripts which need to load before the tests can be run. This may be empty:
exports.scripts = [
`${__dirname}/vendor/example.js`
];
A runner must export a run method, which returns a promise that resolves with test results (it's advisable to use an async function). The run method is evaluated in a browser context and so has access to a global window object.
The run method must not use anything that's been imported using require, as it's run in a browser context. Doing so will error.
The run method is called with two arguments:
options: Options specified in the test runner
pa11y: The Pa11y test runner, which includes some helper methods:
pa11y.getElementContext(element): Get a short HTML context snippet for an element
pa11y.getElementSelector(element): Get a unique selector with which you can select this element in a page
The run method must resolve with an array of Pa11y issues. These follow the format:
{
code: '123', // An ID or code which identifies this error
element: {}, // The HTML element this issue relates to, or null if no element is found
message: 'example', // A descriptive message outlining the issue
type: 'error', // A type of "error", "warning", or "notice"
runnerExtras: {} // Additional data that your runner can provide, but isn't used by Pa11y
}
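Putting the pieces above together, a bare-bones runner module could look like the following sketch; the package name reuses the hypothetical my-testing-tool example from earlier, and the image-alt check it performs is purely illustrative, not a real accessibility ruleset:
// index.js of a hypothetical pa11y-runner-my-testing-tool package (sketch only)
exports.supports = '^5.0.0';

// No vendor scripts need to be loaded into the page for this sketch.
exports.scripts = [];

// Evaluated in the browser context, so only window/DOM APIs and the
// provided pa11y helpers are available here; no require calls.
exports.run = async (options, pa11y) => {
  const issues = [];

  // Illustrative check: flag every image without an alt attribute.
  for (const element of window.document.querySelectorAll('img:not([alt])')) {
    issues.push({
      code: 'my-testing-tool.img-alt',  // An ID or code which identifies this issue
      element: element,                 // The HTML element this issue relates to
      message: 'Image is missing an alt attribute',
      type: 'error',                    // "error", "warning", or "notice"
      runnerExtras: {
        context: pa11y.getElementContext(element)
      }
    });
  }

  return issues;
};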
Run Pa11y on a URL and output the results. See the example.
Run Pa11y on multiple URLs at once and output the results. See the example.
Step through some actions before Pa11y runs. This example logs into a fictional site then waits until the account page has loaded before running Pa11y. See the example.
Pass in pre-created Puppeteer browser and page instances so that you can reuse them between tests. See the example.
See our Troubleshooting guide to get the answers to common questions about Pa11y, along with some ideas to help you troubleshoot any problems.
You can find some useful tutorials and articles in the Tutorials section of pa11y.org.
There are many ways to contribute to Pa11y; we cover these in the contributing guide for this repo.
If you're ready to contribute some code, clone this repo locally and commit your code on a new branch.
Please write unit tests for your code, and check that everything works by running the following before opening a pull request:
npm run lint
npm test
You can also run verifications and tests individually:
npm run lint # Verify all of the code (ESLint)
npm test # Run all tests
npm run test-unit # Run the unit tests
npm run coverage # Run the unit tests with coverage
npm run test-integration # Run the integration tests
To debug a test file you need to ensure that setup.test.js is run before the test file. This adds a before/each to start and stop the integration test server.
Pa11y major versions are normally supported for 6 months after their last minor release. This means that patch-level changes will be added and bugs will be fixed. The table below outlines the end-of-support dates for major versions, and the last minor release for that version.
We also maintain a migration guide to help you migrate.
:grey_question: | Major Version | Last Minor Release | Node.js Versions | Support End Date |
---|---|---|---|---|
:heart: | 6 | N/A | 12+ | N/A |
:warning: | 5 | 5.3 | 8+ | 2021-11-25 |
:skull: | 4 | 4.13 | 4–8 | 2018-08-15 |
:skull: | 3 | 3.8 | 0.12–6 | 2016-12-05 |
:skull: | 2 | 2.4 | 0.10–0.12 | 2016-10-16 |
:skull: | 1 | 1.7 | 0.10 | 2016-06-08 |
If you're opening issues related to these, please mention the version that the issue relates to.
Pa11y is licensed under the Lesser General Public License (LGPL-3.0).
Copyright © 2013–2021, Team Pa11y and contributors
#pa11y
1625848800
Social Buttons Design HTML and CSS Tutorial. Login and Registration Form HTML and CSS Tutorial. Subscribe to Newsletter Design Tutorial. Contact Form Design Using HTML and CSS. Timeline HTML and CSS Tutorial. Navigation Menu Design Tutorial.
How To Create a Comments Box Area in HTML & CSS Tutorial || HTML Tutorial || CSS Tutorial : https://youtu.be/UqO0XvtYUNw
Search Box Design Tutorial Using Only HTML and CSS : https://youtu.be/_F4DdlgQX5U
How To Create Website Preloader in HTML and CSS : https://youtu.be/c9tElJYxyxQ
How to Create Simple Registration Form using only HTML and CSS || Sign up Page Design Tutorial : https://youtu.be/Pcrd_ObbScs
How to Create Simple Login Form using only HTML and CSS || Sign In Page Design Tutorial: https://youtu.be/gyXSKhjZfgU
How To Create Pagination Design using HTML and CSS || UI Tutorial : https://youtu.be/mU--VbRr-hM
How To Create Header Footer Small Media Icons & Links in HTML CSS Tutorial: https://youtu.be/Aylsy4g4wZU
How To Create Countdown Timer in jQuery | How to Use Countdown jQuery Plugin | jQuery Tutorial : https://youtu.be/yuJvbxf2OQQ
Social Share Icon Design in HTML and CSS | Share Button : https://youtu.be/Pl_nlscvHvA
How To Create Custom Radio Button HTML and CSS Tutorial | HTML Tutorial | CSS Tutorial : https://youtu.be/kZV1Bc9AJno
#Tutorial #HTML #CSS #EasyWebCode
#css #html #subscribe
1625845140
How To Create The Subscribe To Newsletter Design Using HTML and CSS. Email Subscription Form in HTML And CSS. Check out how to create the subscribe to newsletter design using html and css.
Code Editor: Visual Studio Code
Like our Facebook Page
https://www.facebook.com/easywebcode7
How To Create a Comments Box Area in HTML & CSS Tutorial || HTML Tutorial || CSS Tutorial : https://youtu.be/UqO0XvtYUNw
Search Box Design Tutorial Using Only HTML and CSS : https://youtu.be/_F4DdlgQX5U
How To Create Website Preloader in HTML and CSS : https://youtu.be/c9tElJYxyxQ
How to Create Simple Registration Form using only HTML and CSS || Sign up Page Design Tutorial : https://youtu.be/Pcrd_ObbScs
How to Create Simple Login Form using only HTML and CSS || Sign In Page Design Tutorial: https://youtu.be/gyXSKhjZfgU
How To Create Pagination Design using HTML and CSS || UI Tutorial : https://youtu.be/mU--VbRr-hM
How To Create Header Footer Small Media Icons & Links in HTML CSS Tutorial: https://youtu.be/Aylsy4g4wZU
How To Create Countdown Timer in jQuery | How to Use Countdown jQuery Plugin | jQuery Tutorial : https://youtu.be/yuJvbxf2OQQ
Social Share Icon Design in HTML and CSS | Share Button : https://youtu.be/Pl_nlscvHvA
How To Create Custom Radio Button HTML and CSS Tutorial | HTML Tutorial | CSS Tutorial : https://youtu.be/kZV1Bc9AJno
Music Credit:
Track: Jensation - Delicious [NCS Release]
Music provided by NoCopyrightSounds.
Free Download / Stream: http://ncs.io/DeliciousYO
#Subscribe #Newsletter #Tutorial #HTML #CSS
#css #html #newsletter #subscribe
1620910620
You have spent a lot of time building your high-quality newsletter and now you want people to sign up. Including the usual <script> element, provided by Mailchimp, on your website is easy. But what if you want to ask people to subscribe as part of signing up as a user on your .NET website? In this post, I'll show you how to manage Mailchimp subscriptions from C#.
For the example code in this post I'll use an ASP.NET Core website. This could be anything .NET really, but asking a user to subscribe to your newsletter as part of signing up for a new user account covers a common scenario.
#.net/c# #subscribe #mailchimp
1617619375
Hello my friends, welcome to my other blog on HTML & CSS. Today we will learn how to create a working subscribe button using HTML, with an attractive CSS design and animation. I have been creating various video tutorials and articles about HTML; if you are a regular viewer, you will already know how many things we can do with HTML. Recently I shared How to Scroll the Page in Navigation Click.
A subscribe button is a control used to follow a particular person's work and get regular updates from them. For example, we subscribe to various people's YouTube channels, and YouTube then notifies us when they upload new videos.
The image above shows the actual design of the subscribe button that we are going to build today. As you can see, there is a logo on the right side and, on the left side, some text, a subscribe button, and a small cross button. When we click the subscribe button, we are redirected to the Coding Lab YouTube channel, and the cross button hides the toast notification.
Subscribe Button ( Source Code)
You can download all source code from the given link. Click Here To Download All Source Code
#subscribe #button #cssbutton #buttondesign #webpushnotification #csstoastnotification
1595488683
Whether for Search Engine Optimization or your passion project, it’s important for your webpages to be responsive. After all, more than 70% of the online population loves to read content on their smart devices. This means that your page needs to render perfectly on these small, hand-held devices. Of course, you have bootstrap and other libraries to help you with this. But what if you want to take things into your own hands?
In this post, you will get a step by step guide on how to create a responsive web page in Angular 2.0.
Let’s get started.
Create a new angular project, or you can use an existing one!
Create an enum, for holding all possible window sizes to be considered.
Create a separate folder, SizeDetector, to carry the files required for building a responsive site
Create a sizeDetector.ts file, which is responsible for identifying any changes to the size of the window
Create a sizeService.ts file, which is an injectable service. This service observes any changes made to the size (using a Subject) and notifies the views that subscribe to it.
Create a view that subscribes to the service and observes for changes. When a change happens, you can define how the view needs to be modified; the latter is totally up to you. A minimal sketch of this pattern is shown below.
Doesn’t this sound simple?
#subscribe #injectable #observables #responsive-website-design #angular2