Contents
1. Introduction
2. Stable version
3. ZooKeeper Deprecation
3.1. Migration
3.2. 3.x and ZooKeeper Support
3.3. ZooKeeper and KRaft timeline
4. Operationalizing ZooKeeper
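1. Introduction
ZooKeeper is a distributed coordination service, used mainly to maintain a cluster's metadata and configuration. A ZooKeeper-based Kafka cluster relies on it to store and manage Kafka's metadata and configuration.
In a Kafka cluster, ZooKeeper is mainly responsible for the following:
- Tracking the state of Kafka brokers: whether each broker is online or offline, topic partition information, replica information, and so on;
- Storing and managing the cluster's metadata and configuration, such as broker IP addresses and ports and the assignment of topic partitions to brokers;
- Monitoring broker state so that the cluster can perform automatic failover (for example, electing a new controller or moving partition leadership off a failed broker) and keep leadership balanced.
Note: all of the above describes older Kafka versions, which depend on ZooKeeper. Newer versions can run in KRaft mode, in which Kafka manages its own metadata and no longer needs ZooKeeper.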
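To make the broker-metadata point concrete: in a ZooKeeper-mode cluster, each live broker registers an ephemeral znode under `/brokers/ids/<id>` whose data is a JSON document that includes the broker's advertised endpoints (its IP/hostname and port). The sketch below parses such a payload, for example one copied from `bin/zookeeper-shell.sh localhost:2181 get /brokers/ids/0`. The sample JSON and its values, and the helper name `broker_endpoints`, are illustrative assumptions; compare against the payload your own cluster returns.

```python
import json

# Illustrative payload only: roughly what a ZooKeeper-mode broker writes to
# /brokers/ids/0. Field values are assumptions, not taken from a real cluster.
SAMPLE = (
    '{"listener_security_protocol_map":{"PLAINTEXT":"PLAINTEXT"},'
    '"endpoints":["PLAINTEXT://broker-0.example:9092"],'
    '"jmx_port":-1,"host":"broker-0.example","timestamp":"1700000000000",'
    '"port":9092}'
)


def broker_endpoints(payload: str) -> list[tuple[str, str, int]]:
    """Return (listener, host, port) for each endpoint in a broker registration."""
    data = json.loads(payload)
    result = []
    for endpoint in data.get("endpoints", []):
        # "LISTENER://host:port" -> listener name, then split host from port
        listener, _, address = endpoint.partition("://")
        host, _, port = address.rpartition(":")
        result.append((listener, host, int(port)))
    return result


if __name__ == "__main__":
    for listener, host, port in broker_endpoints(SAMPLE):
        print(f"{listener}: {host}:{port}")
```

Because the znode is ephemeral, it disappears when the broker's ZooKeeper session expires, which is how a broker going offline becomes visible to the rest of the cluster.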
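2. Stable version
Original text: The current stable branch is 3.5. Kafka is regularly updated to include the latest release in the 3.5 series.
In short: the stable branch here is the ZooKeeper 3.5 release line (not Kafka 3.5), and Kafka is regularly updated to ship with the latest release in that series.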
3. ZooKeeper Deprecation
Original text: With the release of Apache Kafka 3.5, Zookeeper is now marked deprecated. Removal of ZooKeeper is planned in the next major release of Apache Kafka (version 4.0), which is scheduled to happen no sooner than April 2024. During the deprecation phase, ZooKeeper is still supported for metadata management of Kafka clusters, but it is not recommended for new deployments. There is a small subset of features that remain to be implemented in KRaft see current missing features for more information.
In short: as of Kafka 3.5, ZooKeeper mode is deprecated and is planned for removal in Kafka 4.0, due no earlier than April 2024. During the deprecation phase ZooKeeper remains supported for cluster metadata management, but new deployments should not use it. A small number of features are still missing from KRaft; see the list of current missing features.
3.1. Migration
Original text: Migration of an existing ZooKeeper based Kafka cluster to KRaft is currently Preview and we expect it to be ready for production usage in version 3.6. Users are recommended to begin planning for migration to KRaft and also begin testing to provide any feedback. Refer to ZooKeeper to KRaft Migration for details on how to perform a live migration from ZooKeeper to KRaft and current limitations.
In short: migrating an existing ZooKeeper-based cluster to KRaft is a preview feature and is expected to be production-ready in 3.6. Start planning the migration now and test it so you can give feedback. The ZooKeeper to KRaft Migration documentation describes how to migrate a live cluster and lists the current limitations.
3.2. 3.x and ZooKeeper Support
Original text: The final 3.x minor release, that supports ZooKeeper mode, will receive critical bug fixes and security fixes for 12 months after its release.
In short: the last 3.x minor release that supports ZooKeeper mode will receive critical bug fixes and security fixes for 12 months after it ships.
3.3. ZooKeeper and KRaft timeline
Original text: For details and updates on tentative timelines for ZooKeeper removal and planned KRaft feature releases, refer to KIP-833.
In short: KIP-833 tracks the tentative timeline for removing ZooKeeper and for the planned KRaft feature releases.
4. Operationalizing ZooKeeper
Original text: Operationally, we do the following for a healthy ZooKeeper installation:
- Redundancy in the physical/hardware/network layout: try not to put them all in the same rack, decent (but don't go nuts) hardware, try to keep redundant power and network paths, etc. A typical ZooKeeper ensemble has 5 or 7 servers, which tolerates 2 and 3 servers down, respectively. If you have a small deployment, then using 3 servers is acceptable, but keep in mind that you'll only be able to tolerate 1 server down in this case.
- I/O segregation: if you do a lot of write type traffic you'll almost definitely want the transaction logs on a dedicated disk group. Writes to the transaction log are synchronous (but batched for performance), and consequently, concurrent writes can significantly affect performance. ZooKeeper snapshots can be one such a source of concurrent writes, and ideally should be written on a disk group separate from the transaction log. Snapshots are written to disk asynchronously, so it is typically ok to share with the operating system and message log files. You can configure a server to use a separate disk group with the dataLogDir parameter.
- Application segregation: Unless you really understand the application patterns of other apps that you want to install on the same box, it can be a good idea to run ZooKeeper in isolation (though this can be a balancing act with the capabilities of the hardware).
- Use care with virtualization: It can work, depending on your cluster layout and read/write patterns and SLAs, but the tiny overheads introduced by the virtualization layer can add up and throw off ZooKeeper, as it can be very time sensitive
- ZooKeeper configuration: It's java, make sure you give it 'enough' heap space (We usually run them with 3-5G, but that's mostly due to the data set size we have here). Unfortunately we don't have a good formula for it, but keep in mind that allowing for more ZooKeeper state means that snapshots can become large, and large snapshots affect recovery time. In fact, if the snapshot becomes too large (a few gigabytes), then you may need to increase the initLimit parameter to give enough time for servers to recover and join the ensemble.
- Monitoring: Both JMX and the 4 letter words (4lw) commands are very useful, they do overlap in some cases (and in those cases we prefer the 4 letter commands, they seem more predictable, or at the very least, they work better with the LI monitoring infrastructure)
- Don't overbuild the cluster: large clusters, especially in a write heavy usage pattern, means a lot of intracluster communication (quorums on the writes and subsequent cluster member updates), but don't underbuild it (and risk swamping the cluster). Having more servers adds to your read capacity.
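The failure-tolerance figures in the first bullet and the cost of overbuilding in the last one both follow from ZooKeeper's majority quorum: a write commits once a strict majority of the ensemble has acknowledged it. A minimal sketch, assuming plain majority voting (no weighted or hierarchical quorums); `quorum_size` and `tolerated_failures` are hypothetical helper names:

```python
def quorum_size(ensemble: int) -> int:
    """Smallest strict majority: acknowledgements needed to commit a write."""
    if ensemble < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble // 2 + 1


def tolerated_failures(ensemble: int) -> int:
    """Servers that can be down while a majority is still reachable."""
    return ensemble - quorum_size(ensemble)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} down")
```

Running it shows that 3, 5, and 7 servers tolerate 1, 2, and 3 failures, and that an even-sized ensemble adds no fault tolerance over the odd size below it (4 servers still tolerate only 1 failure) while requiring one more acknowledgement per write. That extra write traffic is the overbuilding cost the last bullet warns about.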
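In short, a healthy ZooKeeper installation needs:
- Redundancy: spread servers across racks, use decent (not extravagant) hardware, and keep power and network paths redundant. A typical ensemble has 5 or 7 servers, which tolerate 2 and 3 failed servers respectively; 3 servers is acceptable for a small deployment but tolerates only 1 failure.
- I/O segregation: with write-heavy traffic, put the transaction log on a dedicated disk group (configured with dataLogDir). Transaction-log writes are synchronous (though batched), so competing writes such as snapshots can hurt performance significantly. Snapshots are written asynchronously and can usually share disks with the operating system and message log files.
- Application segregation: unless you understand the access patterns of other applications on the same machine, run ZooKeeper in isolation, within what the hardware allows.
- Care with virtualization: it can work, depending on cluster layout, read/write patterns, and SLAs, but ZooKeeper is very timing-sensitive and small virtualization overheads can add up.
- JVM configuration: give it enough heap (the original authors run 3-5 GB, mostly because of their data-set size). There is no good formula; more ZooKeeper state means larger snapshots and slower recovery, and snapshots of a few gigabytes may require raising initLimit so servers have time to recover and rejoin the ensemble.
- Monitoring: both JMX and the four-letter-word (4lw) commands are useful. Where they overlap, the original authors prefer the 4lw commands because they seem more predictable and fit better with the "LI" (LinkedIn) monitoring infrastructure.
- Sizing: don't overbuild, because large ensembles, especially under write-heavy load, mean a lot of intra-cluster communication (a quorum for every write plus the follow-up member updates); but don't underbuild and risk swamping the cluster either. More servers add read capacity.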
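For the 4lw monitoring route, a client opens a TCP connection to the client port, sends the four letters, and reads until the server closes the connection. `mntr` replies with one tab-separated key/value pair per line (for example `zk_server_state` and `zk_avg_latency`). The sketch below assumes a server at localhost:2181 and that `ruok` and `mntr` are allowed by `4lw.commands.whitelist` (since ZooKeeper 3.5.3, only `srvr` is whitelisted by default). The function names and the address are hypothetical.

```python
import socket


def four_letter_word(host: str, port: int, cmd: str, timeout: float = 5.0) -> str:
    """Send a 4lw command such as 'ruok' or 'mntr' and return the raw reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after answering, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mntr(reply: str) -> dict[str, str]:
    """Turn 'key<TAB>value' lines from `mntr` into a dict, skipping other lines."""
    metrics = {}
    for line in reply.splitlines():
        key, sep, value = line.partition("\t")
        if sep:
            metrics[key.strip()] = value.strip()
    return metrics


if __name__ == "__main__":
    try:
        # Hypothetical address; point this at one of your ensemble members.
        print(four_letter_word("localhost", 2181, "ruok"))  # healthy servers answer "imok"
        stats = parse_mntr(four_letter_word("localhost", 2181, "mntr"))
        print(stats.get("zk_server_state"), stats.get("zk_avg_latency"))
    except OSError as exc:
        print(f"could not reach ZooKeeper: {exc}")
```

Plain sockets keep the check dependency-free, which is part of why 4lw commands are easy to wire into an existing monitoring system; JMX exposes overlapping metrics but needs a JMX-capable client.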
Original text: Overall, we try to keep the ZooKeeper system as small as will handle the load (plus standard growth capacity planning) and as simple as possible. We try not to do anything fancy with the configuration or application layout as compared to the official release as well as keep it as self contained as possible. For these reasons, we tend to skip the OS packaged versions, since it has a tendency to try to put things in the OS standard hierarchy, which can be 'messy', for want of a better way to word it.
In short: keep the ZooKeeper ensemble as small as the load (plus normal growth planning) allows and as simple as possible. Stay close to the official release in configuration and layout, keep the installation self-contained, and skip OS-packaged versions, which tend to scatter files across the operating system's standard directory hierarchy and make things messy.
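In the same keep-it-plain spirit, here is a sketch that renders a small, self-contained `zoo.cfg` reflecting the operational points above: transaction logs on their own disk via `dataLogDir`, snapshots under `dataDir`, and an `initLimit` derived from how long a follower may need to load a large snapshot and sync. `initLimit` is counted in ticks, so the wall-clock budget is `initLimit × tickTime`. The hostnames, paths, sync budgets, and helper names are assumptions; `tickTime=2000`, `syncLimit=5`, and ports 2181/2888/3888 follow ZooKeeper's sample configuration.

```python
import math


def init_limit_ticks(sync_seconds: float, tick_time_ms: int) -> int:
    """initLimit is measured in ticks; round up so initLimit * tickTime >= sync_seconds."""
    return math.ceil(sync_seconds * 1000 / tick_time_ms)


def render_zoo_cfg(
    servers: list[str],
    data_dir: str,
    data_log_dir: str,
    tick_time_ms: int = 2000,
    init_sync_seconds: float = 20,
    sync_limit: int = 5,
    client_port: int = 2181,
) -> str:
    lines = [
        f"tickTime={tick_time_ms}",
        f"initLimit={init_limit_ticks(init_sync_seconds, tick_time_ms)}",
        f"syncLimit={sync_limit}",
        f"clientPort={client_port}",
        # Snapshots (and the myid file) live here; snapshot writes are asynchronous.
        f"dataDir={data_dir}",
        # Transaction log on a dedicated disk group: synchronous, latency-critical writes.
        f"dataLogDir={data_log_dir}",
        "4lw.commands.whitelist=ruok,mntr,srvr",
    ]
    # server.<id>=<host>:<quorum port>:<leader-election port>
    for server_id, host in enumerate(servers, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_zoo_cfg(
        servers=["zk1.example", "zk2.example", "zk3.example"],  # hypothetical hosts
        data_dir="/var/lib/zookeeper/data",                     # hypothetical paths
        data_log_dir="/zk-txlog/zookeeper",
        init_sync_seconds=60,  # assumed budget for a multi-GB snapshot
    ))
```

Heap size is not a `zoo.cfg` setting; it is passed as JVM options (when ZooKeeper is started with Kafka's `zookeeper-server-start.sh`, through the `KAFKA_HEAP_OPTS` environment variable).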
