As industrial Internet-of-Things (IIoT) applications produce a blinding amount of data 24/7, relational and key-value-based databases struggle to keep up.

It’s a problem the team at the Tsinghua University School of Software has been working on for nearly a decade. The result is IoTDB, which recently graduated from the Apache Software Foundation incubator to become a top-level project.

It reached this status “at a time of confluence of database, Internet-of-Things (IoT) and AI technologies in conjunction with a wider adoption of Industry 4.0 and automation approaches to further enable remote work and increased efficiencies,” said C. Mohan, retired IBM Fellow, former chief scientist at IBM India, and a member of the US National Academy of Engineering.

As a Distinguished Visiting Professor working with the team, “I have seen this project reach maturity and build up a vibrant open source community around it,” he said.

From Chinese University

As a PhD student at the Chinese university starting around 2012, Xiangdong Huang, now vice president of the Apache project, was assigned to manage the time-series data being generated by the minute by a large company’s 200,000 machines. Reading the data from Oracle quickly proved to be too slow and buying a more advanced license too expensive.

They decided to try NoSQL — Cassandra — but ran into performance problems with it as well.

“Apache Cassandra is good, and we used five nodes to manage all the data,” he said, explaining, “The user may create more than 5,000 tables in Cassandra, and they do not want to buy more servers to form a larger cluster (as the budget is limited). From then on, I spent about two years to read the source code of Cassandra, and did some modifications on Cassandra, and spent a lot of effort to use the limited server resources to provide better performance, which made us tired. … Even [though] we spent a lot of effort, we find it hard to use five nodes to reach 10 million data points written per second.”

They tried saving a packet of hundreds of data points to Cassandra as key-value pairs. But that meant maintaining everything themselves, and they still ran into performance issues and limitations with the data structure.

They then decided to create a time-series database from scratch. Professor Jianmin Wang came up with the idea of donating the project to the ASF in 2018 as a way to get more people involved.


IoTDB Provides Data Management for Industrial Edge IT