by Brown CS alum Ayush Bhardwaj
Microservices have been transforming the computing landscape, with web-scale companies such as Facebook, Google, and Amazon and telecom providers such as AT&T and Ericsson adopting them. The microservices paradigm has been shown to improve scalability, fault tolerance, and deployability. However, it also greatly enlarges the space of configuration options and potential performance problems, rendering traditional management approaches ineffective.
Recent efforts to address this problem have embraced Artificial Intelligence for IT Operations (AIOps). However, training effective AI models requires large amounts of data and, in some cases, a framework for quickly exploring or analyzing model performance. Digital twins, or simulators, have successfully enabled AI-based management frameworks in other domains, including the manufacturing, industrial, and automotive sectors.
In our recent paper, we proposed the design of KubeKlone, the first comprehensive digital twin for modeling cloud-native microservices applications. KubeKlone is motivated by the need for accurate, efficient, and general model training. It meets these goals by decoupling the simulation of microservices from the training of machine learning (ML) models, while keeping simulation efficient and model design simple.
KubeKlone introduces a queue-based simulator that abstracts away infrastructure details and focuses on modeling queues and resource contention across host and network components. To simplify model design, KubeKlone offers interfaces that hide simulator details, along with wrappers around popular ML packages.
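To make the queue-based idea more concrete, here is a minimal discrete-event sketch in Python. It is not KubeKlone's simulator or API: every name in it (`simulate`, the `chain`, `placement`, and `cores` parameters, the service names) is hypothetical. It assumes Poisson request arrivals, exponential service times, services on the same host competing for a fixed pool of CPU cores, and a fixed network delay whenever a request moves between hosts. Each host keeps a FIFO queue of requests waiting for a core, which is one simple way to express host-level resource contention.

```python
import heapq
import random
from collections import deque


def simulate(chain, placement, cores, mean_service_ms, arrival_rate,
             n_requests, net_delay_ms=0.5, seed=0):
    """Hypothetical sketch: requests flow through a chain of microservices.

    chain: ordered list of service names each request visits.
    placement: service name -> host name.
    cores: host name -> number of CPU cores shared by its services.
    mean_service_ms: service name -> mean of an exponential service time.
    arrival_rate: Poisson request arrivals per millisecond.
    net_delay_ms: assumed fixed delay for a hop between different hosts.
    Returns the end-to-end latency (ms) of every request.
    """
    rng = random.Random(seed)
    free = dict(cores)                       # idle cores per host
    waiting = {h: deque() for h in cores}    # FIFO queue of (req, stage) per host
    events = []                              # heap of (time, seq, kind, req, stage)
    seq = 0                                  # tie-breaker for simultaneous events
    start = {}
    latencies = []

    def push(t, kind, req, stage):
        nonlocal seq
        heapq.heappush(events, (t, seq, kind, req, stage))
        seq += 1

    def begin_service(t, req, stage):
        # Occupy one core on the service's host and schedule completion.
        svc = chain[stage]
        free[placement[svc]] -= 1
        push(t + rng.expovariate(1.0 / mean_service_ms[svc]), "done", req, stage)

    # Pre-generate Poisson arrivals at the first service in the chain.
    t = 0.0
    for req in range(n_requests):
        t += rng.expovariate(arrival_rate)
        start[req] = t
        push(t, "arrive", req, 0)

    while events:
        t, _, kind, req, stage = heapq.heappop(events)
        host = placement[chain[stage]]
        if kind == "arrive":
            if free[host] > 0:
                begin_service(t, req, stage)
            else:
                waiting[host].append((req, stage))  # contend for a core
        else:  # "done": release the core, then forward the request
            free[host] += 1
            if waiting[host]:
                begin_service(t, *waiting[host].popleft())
            if stage + 1 < len(chain):
                next_host = placement[chain[stage + 1]]
                delay = net_delay_ms if next_host != host else 0.0
                push(t + delay, "arrive", req, stage + 1)
            else:
                latencies.append(t - start[req])
    return latencies


# Usage: the same two-service chain, placed on separate hosts vs. one core.
chain = ["frontend", "cart"]
mean = {"frontend": 1.0, "cart": 1.0}
separate = simulate(chain, {"frontend": "h1", "cart": "h2"},
                    {"h1": 1, "h2": 1}, mean, 0.8, 2000)
shared = simulate(chain, {"frontend": "h1", "cart": "h1"},
                  {"h1": 1}, mean, 0.8, 2000)
print(sum(separate) / len(separate), sum(shared) / len(shared))
```

With these parameters, each service on its own single-core host runs at about 80% utilization, while co-locating both on one core asks that core for 160%, so the shared placement's queue grows without bound and its mean latency is far higher. The fixed network delay is the crudest possible stand-in; modeling links as queues of their own would capture network contention in the same way.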
KubeKlone's significance lies in its potential to make AI and ML training for cloud computing faster and more cost-effective. With a digital twin, companies can use AI-based management frameworks to simulate and optimize their microservices infrastructure, reducing the need for expensive, time-consuming trial-and-error testing and improving efficiency. Ultimately, KubeKlone can accelerate the development and deployment of AI and ML models for cloud computing applications, lowering costs and improving performance.
You can read our paper here.