PhD position on Performance Observability in Cloud-Edge Continuum (1.0 FTE)

Updated: almost 2 years ago
Job Type: Temporary
Deadline: 08 Jul 2022

Large-scale software applications (e.g., streaming and gaming services, enterprise applications) now run on distributed systems that are evolving towards an interconnected continuum ranging from high-end cloud servers through mid-range fog gateways to low-end heterogeneous edge devices. On the one hand, such a scalable continuum continues to improve the agility, responsiveness and effectiveness of highly distributed applications. On the other hand, this paradigm shift has introduced significant complexity into the resource and service management layers, including challenges in performance optimisation and maintenance in large-scale distributed systems. Of particular interest are performance variability and degradation issues, which often impact the reliability of the infrastructure and services. Simply put, variability and degradation refer to fluctuations in application performance indicators caused by anomalous performance events (e.g., resource interference, unpredictable workload, or correlated faults) in the underlying infrastructure, which in turn induce undesirable performance behaviour and service outages. Such issues will only worsen as cloud-edge systems grow in scale and heterogeneity. This calls for research into cloud-edge management paradigms that offer more visibility and control over the system parameters, infrastructure dependencies, and application constraints that affect performance.

In this PhD position you will contribute to the theory and design of holistic performance observability solutions (i.e. inferring the internal states of a system by examining its outputs) for cloud-edge systems. You will focus on the diagnosis, prognosis, causal attribution and analysis of anomalous performance events, evaluating their impacts and trade-offs, localising their actual root cause(s), and suggesting or actuating mitigations. You will also contribute to our long-term objective of defining open standards and developing observability benchmarks that enhance experiment reproducibility in cloud-edge systems.

You will:

  • analyse the literature to understand the state of the art in cloud-edge performance observability, diagnosis and prognosis, and their limiting factors in resource management;
  • design a generalised observability theory and understand the ramifications of design choices, restrictions, and trade-offs in large-scale cloud-edge systems;
  • research and implement diagnosis and prognosis algorithms that infer observable and non-observable anomalous events and localise their causes;
  • develop a performance management prototype that benchmarks degradation and variability scenarios in a controlled environment and validates them against the developed observability theory and heuristics, with the aim of providing early detection and mitigation of performance issues.

The position includes both research and teaching. Approximately 30% of your time will be spent on a variety of teaching and teaching-support activities. We offer the opportunity to take significant steps towards acquiring a basic teaching qualification (BKO), which qualifies you as a teacher in the Dutch higher education system.


