This text can support installation on any Ubuntu 20.04 server cluster, which is the beauty of well-designed layered software. Furthermore, if you have more nodes, you can distribute the software as you like. The text assumes you are familiar with the Linux command line, including ssh, vim, and nano.
I do not recommend starting with fewer than three Raspberries, since you need to set up communication between the nodes, and both ZooKeeper and Kafka require an odd number of nodes. If you are trying a single node, this guide can still be used, but performance on a Raspberry is likely to be disappointing. For a single node I suggest a virtual machine with a reasonable amount of RAM and processing power.
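
The odd-number advice comes down to majority-quorum arithmetic, which is how a ZooKeeper ensemble decides whether it can keep serving. Here is a minimal sketch in Python. The function names are illustrative only and are not part of any ZooKeeper or Kafka API:

```python
# Majority-quorum arithmetic for an ensemble of n nodes.
# Function names are hypothetical, chosen only for this illustration.

def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(f"nodes={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this arithmetic, three nodes and four nodes both survive only one failure, so the fourth node adds cost without adding fault tolerance. One or two nodes tolerate no failures at all, which is why three is a sensible minimum.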


A Data Science/Big Data Laboratory — part 2 of 4