Workshop Agenda

Note: The workshop starts at 9:00 am London time (BST, UTC+1) on May 7, 2024.


Session 1: Enhancing Infrastructure Reliability
09:00 - 09:15  Opening
09:15 - 10:30  Keynote: Wahab Hamou-Lhadj (Concordia University)
Improving the Reliability of Software-Intensive Infrastructures Using AIOps

Abstract: Modern software systems rely on recent advances in distributed system architectures, cloud computing, virtualization and containerization, and the Internet of Things to drive new ways of creating value and stimulating growth in diverse sectors of modern society. The fragmented and distributed nature of these systems, combined with the adoption of agile processes, DevOps, and continuous delivery models, calls for advanced system monitoring and analysis techniques. The massive amounts of data generated during system operations make timely detection and prevention of failures and anomalies a challenging task. To address this, many organizations are turning to the application of artificial intelligence (AI) to support IT operations (AIOps). According to the 2021 Gartner Market Guide for AIOps Platforms, "There is no future of IT operations that does not include AIOps." In this talk, I will discuss the enabling technologies behind AIOps and how AIOps can improve the reliability of large digital infrastructures and cloud intelligence. I will give examples of projects and discuss the challenges and future directions in this field.

Bio: Dr. Wahab Hamou-Lhadj is a Professor at Concordia University, Montreal, Canada. He is also an Affiliate Researcher at NASA JPL, Caltech, Pasadena, USA. His research interests include software engineering, AI for software systems, AIOps, software observability, and model-driven engineering. He has been the principal investigator for several projects with various organizations. Several of the tools developed in his lab (e.g., TotalADS and CommitAssistant) have been successfully transferred to industry and are currently used by thousands of developers. His research project with Ubisoft was featured in major media outlets including The Globe and Mail, The Financial Post, Penticton Herald, Wired, and BNN Bloomberg. Dr. Hamou-Lhadj has served on the organization and program committees of major conferences such as ICSE, SANER, ICPC, ICSME, ICEIS, and MODELS. He is currently an Associate Editor of IEEE Transactions on Reliability. Dr. Hamou-Lhadj received his PhD from the University of Ottawa, Canada. He is a Senior Member of IEEE and a long-standing member of ACM. He is also a frequent contributor to the OMG-Certified Expert in BPM (OCEB) and OMG-Certified UML Professional (OCUP) certification programs.
10:30 - 11:00  Coffee Break
Session 2: Innovations in Performance Testing: Strategies and Technologies
11:00 - 11:30  Industry Talk: René Schwietzke (Xceptance)
How to Sell Performance Test Results to a Diverse Crowd

Abstract: Load and performance testing is an essential part of the modern application lifecycle. But sharing test status and talking about goals is extremely difficult, in part because not everyone speaks the same language. On the one hand, there is the business impact; on the other hand, there are the technical implications of design decisions and of what should be done to improve performance. Performance testing generates an enormous amount of data. Most of it is only relevant in the event of problems, but there is still a lot of data that needs to be analyzed and presented. The consumers of the results have different expectations and expertise. Let's find out what data is measured, what data is required, and how much it can and should be condensed. The final result should be a simple message to the consumers of the test results, without the need to reiterate basic testing concepts. This presentation discusses general criteria for a load and performance test, explains them in detail, and shows how a final rating is applied to aid the decision process. This is an industry view, based on daily testing work for international e-commerce vendors and similar online businesses. Xceptance's 20 years of performance consulting and the challenges we have faced have strongly influenced this presentation. It will give an idea of how to set, evaluate, communicate, and ultimately sell performance results to a very diverse audience, ranging from developers to CEOs. It will also shed some light on the massive amount of data that load testing produces if one really wants to cover all the bases. The presentation will use lots of real-world examples and numbers. It will also challenge the audience to come up with better metrics, evaluation strategies, and effective ways to detect patterns and interesting behavior.

11:30 - 12:00  Research Talk: Oleksandr Kachur and Aleksei Vasilevskii
Self-Service Performance Testing Platform for Autonomous Development Teams

Abstract: In modern, fast-paced, and highly autonomous software development teams, it is crucial to maintain a sustainable approach to all performance engineering topics, including performance testing. The high degree of autonomy often results in teams building their own frameworks that are not used consistently and may be abandoned due to lack of support or of integration with existing infrastructure, processes, and tools. To address these challenges, we present a self-service performance testing platform based on open-source software that supports distributed load generation, historical results storage, and a notification system that triggers alerts in Slack. In addition, it integrates with GitHub Actions to enable developers to run load tests as part of their CI/CD pipelines. We would like to share some technical solutions and the details of the decision-making process behind the performance testing platform in a scale-up environment, our experience building this platform and, most importantly, rolling it out to autonomous development teams and onboarding them into the continuous performance improvement process.
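
As a rough illustration of the kind of CI integration and Slack alerting the abstract mentions, here is a minimal, hedged Python sketch (not the authors' platform): a CI step reads a load-test results file, checks a latency budget, and posts to a Slack incoming webhook on regression. The results file, its schema, the threshold, and the environment variable name are illustrative assumptions.

```python
# Minimal sketch (not the authors' platform): a CI step that reads load-test
# results, checks a latency budget, and alerts a Slack channel via an incoming
# webhook. The results file, its schema, the threshold, and the environment
# variable name are illustrative assumptions.
import json
import os
import urllib.request

THRESHOLD_P95_MS = 500  # assumed latency budget for the 95th percentile

def main() -> None:
    with open("load_test_results.json") as f:  # hypothetical output of a load-test run
        results = json.load(f)

    p95 = results["latency_ms"]["p95"]  # assumed schema
    if p95 <= THRESHOLD_P95_MS:
        print(f"p95 latency {p95} ms is within budget")
        return

    payload = {"text": f"Load test regression: p95 latency {p95} ms exceeds {THRESHOLD_P95_MS} ms"}
    req = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],  # supplied as a CI secret
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
    raise SystemExit(1)  # fail the CI job so the pipeline surfaces the regression

if __name__ == "__main__":
    main()
```

In a GitHub Actions workflow, a script like this would typically run as a step right after load generation, with the webhook URL provided as a repository secret.
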
12:00 - 12:30  Research Talk: Konstantinos Chalkias, Jonas Lindstrøm, Deepak Maram, Ben Riva, Arnab Roy, Joy Wang, and Alberto Sonnino
Fastcrypto: Pioneering Cryptography Via Continuous Benchmarking

Abstract: In the rapidly evolving fields of encryption and blockchain technologies, the efficiency and security of cryptographic schemes significantly impact performance. This paper introduces a comprehensive framework for continuous benchmarking in one of the most popular cryptography Rust libraries, fastcrypto. What makes our analysis unique is the realization that automated benchmarking is not just a performance monitoring and optimization tool; it can be used for cryptanalysis and innovation discovery as well. Surprisingly, benchmarks can uncover spectacular security flaws and inconsistencies in various cryptographic implementations and standards, while at the same time they can identify unique opportunities for innovation not previously known to science, such as providing a) hints for novel algorithms, b) indications of mix-and-match library functions that result in world-record speeds, and c) evidence of biased or untested real-world algorithm comparisons in the literature. Our approach transcends traditional benchmarking methods by identifying inconsistencies in multi-threaded code, which previously resulted in unfair comparisons. We demonstrate the effectiveness of our methodology in identifying the fastest algorithms for specific cryptographic operations, such as signing, while revealing hidden performance characteristics and security flaws. The process of continuous benchmarking allowed fastcrypto to break many speed records for crypto operations in the Rust language ecosystem. A notable discovery in our research is the identification of vulnerabilities and unfair speed claims due to missing padding checks in high-performance Base64 encoding libraries. We also uncover insights into algorithmic implementations such as multi-scalar elliptic curve multiplications, which exhibit different performance gains when applied in different schemes and libraries; this was not evident in conventional benchmarking practices. Further, our analysis highlights bottlenecks in cryptographic algorithms where pre-computed tables can be strategically applied, accounting for L1 and L2 CPU cache limitations. Our benchmarking framework also reveals that certain algorithmic implementations incur additional overheads due to serialization processes, necessitating a refined 'apples to apples' comparison approach. We identified unique performance patterns in some schemes, where efficiency scales with input size, aiding blockchain technologies in optimal parameter selection and data compression. Crucially, continuous benchmarking serves as a tool for ongoing audit and security assurance. Variations in performance can signal potential security issues during upgrades, such as kleptography, hardware manipulation, or supply chain attacks. This was evidenced by critical private key leakage vulnerabilities we found in one of the most popular EdDSA Rust libraries. By providing a dynamic and thorough benchmarking approach, our framework empowers stakeholders to make informed decisions, enhance security measures, and optimize cryptographic operations in an ever-changing digital landscape.
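
The core idea of continuous benchmarking (run benchmarks on every change, compare against a stored baseline, and treat unexpected shifts as a signal worth investigating) can be sketched in a few lines. The following Python sketch is only an assumption-laden illustration; the paper's framework benchmarks Rust code in fastcrypto, and the baseline file, threshold, and benchmarked operation below are placeholders.

```python
# Minimal sketch of the continuous-benchmarking idea (an illustration only; the
# fastcrypto framework itself benchmarks Rust code): time an operation, compare
# it against a stored baseline, and flag slowdowns for investigation.
# The baseline file name, threshold, and benchmarked operation are assumptions.
import hashlib
import json
import os
import statistics
import timeit

BASELINE_FILE = "bench_baseline.json"  # hypothetical baseline store kept in CI
REGRESSION_FACTOR = 1.2                # flag slowdowns of more than 20%

def op() -> bytes:
    # Stand-in for a cryptographic operation under benchmark.
    return hashlib.sha256(b"x" * 1024).digest()

def bench(func, repeats: int = 20, number: int = 1000) -> float:
    """Median time per call in microseconds."""
    times = timeit.repeat(func, repeat=repeats, number=number)
    return statistics.median(times) / number * 1e6

def main() -> None:
    current = bench(op)
    if os.path.exists(BASELINE_FILE):
        with open(BASELINE_FILE) as f:
            baseline = json.load(f)["sha256_1kib_us"]
        if current > baseline * REGRESSION_FACTOR:
            print(f"Possible regression: {current:.2f} us vs baseline {baseline:.2f} us")
        else:
            print(f"OK: {current:.2f} us (baseline {baseline:.2f} us)")
    with open(BASELINE_FILE, "w") as f:
        json.dump({"sha256_1kib_us": current}, f)

if __name__ == "__main__":
    main()
```
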
12:30 - 14:00  Lunch Break
Session 3: The Future of Performance Testing: Scaling and Isolation Techniques
14:00 - 15:00  Keynote: David Daly (MongoDB)
Scaling Performance Testing to Millions of Distinct Results

Abstract: We invest a lot of time and effort into performance testing at MongoDB. We want to ensure that each release of MongoDB is faster and better than the previous one. We have learned a lot from our performance testing, and that has driven positive changes into our software. Because the testing has been useful, we have progressively done more of it and measured more things per test. The total number of things we can measure for any version of MongoDB has skyrocketed and is now in the millions. We have much more information, but it has become harder to derive useful meaning due to the sheer volume of information. Over the past year we have invested in a number of efforts focused on deriving more meaning from all of our information. Ultimately, we want to make it easy to answer a number of questions about our software. Does a given change make the software faster? Is a given feature faster today than in our previous release? If we release a new version today, will customer workloads run faster? How can I make this feature faster? Why did my change make things slower? In this talk we will cover our performance infrastructure, how its scale has challenged us, and the work we have done to address those challenges and ultimately derive more meaning from our tests.

Bio: David is a staff engineer at MongoDB focused on server performance. He currently focuses on performance testing infrastructure and tools to increase our understanding of how MongoDB's software performs for its customers. He helped build and design MongoDB's performance testing infrastructure from the ground up. At various times this required focusing on complete end-to-end automation, control of test noise and variability, working around test noise, and building processes to make sure that issues identified by the infrastructure were properly recognized and addressed. At other times, David has focused on asking hard questions about MongoDB performance and then trying to answer them (or having someone else try to answer them); challenging assumptions and commonly accepted wisdom around MongoDB performance; encouraging everyone at MongoDB to think about performance, including adding new performance tests relevant to their ongoing work (e.g., new performance tests for new features or refactors); and explaining the current state of performance to others.
15:00 - 15:30  Research Talk: Simon Volpert, Sascha Winkelhofer, Stefan Wesner, Daniel Seybold, and Jörg Domaschka
Exemplary Determination of Cgroups-Based QoS Isolation for a Database Workload

Abstract: Effective isolation among workloads within a shared and possibly contended compute environment is crucial for industry and academia alike to ensure optimal performance and resource utilization. Modern ecosystems offer a wide range of approaches and solutions to ensure isolation for a multitude of different compute resources. Past experiments have verified the effectiveness of this resource isolation with micro-benchmarks. The effectiveness of QoS isolation for intricate workloads beyond micro-benchmarks, however, remains an open question. This paper addresses this gap by introducing a specific example involving a database workload isolated using Cgroups from a disruptor contending for CPU resources. Despite the even distribution of CPU isolation limits among the workloads, our findings reveal a significant impact of the disruptor on the QoS of the database workload. To illustrate this, we present a methodology for quantifying this isolation, accompanied by an implementation incorporating essential instrumentation through eBPF. This not only highlights the practical challenges in achieving robust QoS isolation but also emphasizes the need for additional instrumentation and realistic scenarios to comprehensively evaluate and address these challenges.
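
For readers unfamiliar with Cgroups-based CPU isolation, the sketch below shows the basic mechanism the paper builds on: writing a quota/period pair to a cgroup v2 cpu.max file and moving a process into that cgroup. This is a hedged, minimal illustration, not the paper's methodology; the cgroup name is hypothetical, root privileges and a cgroup v2 hierarchy are assumed, and controller availability depends on the system configuration.

```python
# Hedged sketch of the basic cgroup v2 CPU-isolation mechanism (not the paper's
# methodology). Assumes root privileges, a cgroup v2 hierarchy mounted at
# /sys/fs/cgroup, and the cpu controller enabled for child cgroups (see
# cgroup.subtree_control). The cgroup name below is hypothetical.
import os
import pathlib

CGROUP = pathlib.Path("/sys/fs/cgroup/ltb-demo")

def limit_cpu(pid: int, quota_us: int = 50_000, period_us: int = 100_000) -> None:
    CGROUP.mkdir(exist_ok=True)
    # "50000 100000" grants 50 ms of CPU time every 100 ms, i.e. half a CPU.
    (CGROUP / "cpu.max").write_text(f"{quota_us} {period_us}")
    # Move the workload (e.g., the database process) into the cgroup.
    (CGROUP / "cgroup.procs").write_text(str(pid))

if __name__ == "__main__":
    limit_cpu(os.getpid())  # demonstration: limit this process itself
```
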
15:30 - 16:00  Coffee Break
Session 4: Evolving Performance Testing: Insights and Technical Challenges
16:00 - 16:30  Industry Talk: Alexander Podelko (Amazon)
Performance Testing Transformation

Abstract: Performance testing is transforming to adjust to industry trends. While we may not see it clearly in more traditional companies, we see drastic changes in organizations using the latest approaches to software development and IT. Integrating into agile development (shift-left / continuous performance testing) is needed when performance risks must be mitigated. Automation and Continuous Integration (CI) become necessary as we get to multiple iterations and shrinking times to verify performance. Integration with other methods of performance risk mitigation (including performance and capacity management – shift right) is important to build performance testing into DevOps. However, all these adjustments make performance testing more integrated with everything else, so the transformation may happen in many different ways and is defined by context. What, how, and when we need to test, and how it is built into larger processes, differ drastically. There are numerous challenges, and there are different ways to address them. In particular, the following challenges appear to be typical, and there are examples of different approaches to them depending on context:
• Integration
• Coverage optimization
• Variability / noise reduction
• Change point detection
• Advanced analysis
• Operations / maintenance

16:30 - 17:00  Research Talk: David Georg Reichelt, Lubomir Bulej, Reiner Jung, and André van Hoorn
Overhead Comparison of Instrumentation Frameworks

Abstract: Application Performance Monitoring (APM) tools are used in industry to gain insights, identify bottlenecks, and alert to issues related to software performance. The available APM tools generally differ in terms of functionality and licensing, but also in monitoring overhead, which should be minimized due to use in production deployments. One notable source of monitoring overhead is the instrumentation technology, which adds code to the system under test to obtain monitoring data. Because there are many ways to instrument applications, we study the overhead of five different instrumentation technologies (AspectJ, ByteBuddy, DiSL, Javassist, and pure source code instrumentation) in the context of the Kieker open-source monitoring framework, using the MooBench benchmark as the system under test. Our experiments reveal that ByteBuddy, DiSL, Javassist, and source instrumentation achieve low monitoring overhead and are therefore most suitable for achieving generally low overhead in the monitoring of production systems. However, the lowest overhead may be achieved by different technologies, depending on the configuration and the execution environment (e.g., the JVM implementation or the processor architecture). The overhead may also change due to modifications of the instrumentation technology. Consequently, if having the lowest possible overhead is crucial, it is best to analyze the overhead in concrete scenarios, with specific fractions of monitored methods and in an execution environment that accurately reflects the deployment environment. To this end, our extensions of the Kieker framework and the MooBench benchmark enable repeated assessment of monitoring overhead in different scenarios.
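
The measurement idea behind MooBench-style overhead comparisons can be illustrated outside the Java ecosystem as well. The Python sketch below is only an analogy, not the study's setup (which instruments Java with AspectJ, ByteBuddy, DiSL, Javassist, and source instrumentation): it times a trivial method with and without a simple monitoring probe and reports the per-call overhead; the call counts and names are illustrative assumptions.

```python
# A Python analogy of the MooBench measurement idea (the study itself instruments
# Java code): time a trivial method with and without a simple monitoring probe
# and report the per-call overhead. Call counts and names are illustrative.
import functools
import time
import timeit

records = []  # stand-in for a monitoring back end

def monitored(func):
    """A very simple stand-in for an instrumentation probe."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter_ns()
        result = func(*args, **kwargs)
        records.append((func.__name__, time.perf_counter_ns() - start))
        return result
    return wrapper

def plain() -> int:
    return 42

instrumented = monitored(plain)

if __name__ == "__main__":
    n = 100_000
    base = timeit.timeit(plain, number=n) / n * 1e9
    inst = timeit.timeit(instrumented, number=n) / n * 1e9
    print(f"baseline: {base:.1f} ns/call, instrumented: {inst:.1f} ns/call, "
          f"overhead: {inst - base:.1f} ns/call")
```
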

Call for Papers

Software systems (e.g., smartphone apps, desktop applications, telecommunication infrastructures, and enterprise systems) have strict requirements on software performance. Failing to meet these requirements may cause business losses, customer defection, brand damage, and other serious consequences. In addition to conventional functional testing, the performance of these systems must be verified through load testing or benchmarking to ensure quality of service.

Load testing examines the behavior of a system by simulating hundreds or thousands of users performing tasks at the same time. Benchmarking compares the system's performance against other similar systems in the domain. The workshop is not limited to traditional load testing; it is open to any ideas for re-inventing and extending load testing, as well as any other way to ensure system performance and resilience under load, including any kind of performance testing, resilience / reliability / high availability / stability testing, operational profile testing, stress testing, A/B and canary testing, volume testing, and chaos engineering.
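
As a concrete, hedged illustration of the load-testing definition above, the following Python sketch simulates a number of concurrent users issuing requests against a target endpoint and summarizes the client-side latency distribution; the target URL, user count, and request counts are placeholder assumptions, not part of the workshop's scope or tooling.

```python
# Hedged illustration of a very small load test: a number of concurrent "users"
# issue requests against a target URL and the client-side latency distribution
# is summarized. The URL, user count, and request counts are placeholders.
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

TARGET_URL = "http://localhost:8080/health"  # hypothetical system under test
CONCURRENT_USERS = 50
REQUESTS_PER_USER = 20

def one_user(_: int) -> list:
    latencies_ms = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urllib.request.urlopen(TARGET_URL) as resp:
            resp.read()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=CONCURRENT_USERS) as pool:
        samples = [ms for user in pool.map(one_user, range(CONCURRENT_USERS)) for ms in user]
    cuts = statistics.quantiles(samples, n=100)  # 99 percentile cut points
    print(f"requests: {len(samples)}, p50: {cuts[49]:.1f} ms, p95: {cuts[94]:.1f} ms")
```
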

Load testing and benchmarking software systems are difficult tasks that require a deep understanding of the system under test and of customer behavior. Practitioners face many challenges such as tooling (choosing and implementing the testing tools), environments (software and hardware setup), and time (limited time to design, run, and analyze tests). Yet little research has been done in the software engineering domain on this topic.

Adjusting load testing to recent industry trends, such as cloud computing, agile / iterative development, continuous integration / delivery, microservices, serverless computing, AI/ML services, and containers, poses major challenges that are not yet fully addressed.

This one-day workshop brings together software testing and software performance researchers, practitioners, and tool developers to discuss the challenges and opportunities of conducting research on load testing and benchmarking software systems. Our ultimate goal is to grow an active community around this important and practical research topic.

We solicit two tracks of submissions:

  1. Research or industry papers
  2. Presentation track for industry or research talks
Research/Industry papers should follow the standard ACM SIG proceedings format and must be submitted electronically via EasyChair (LTB 2024 track). Extended abstracts for the presentation track must also be submitted via EasyChair, as "abstract only" submissions. Accepted papers will be published in the ICPE 2024 Companion Proceedings. Submissions can be research papers, position papers, case studies, or experience reports addressing issues including, but not limited to, the topics described above.


Instructions for Authors from ACM

By submitting your article to an ACM Publication, you are hereby acknowledging that you and your co-authors are subject to all ACM Publications Policies, including ACM's new Publications Policy on Research Involving Human Participants and Subjects. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.

Please ensure that you and your co-authors obtain an ORCID ID, so you can complete the publishing process for your accepted paper. ACM has been involved in ORCID from the start, and we have recently made a commitment to collect ORCID IDs from all of our published authors. The collection process has started and will roll out as a requirement throughout 2022. We are committed to improving author discoverability, ensuring proper attribution, and contributing to ongoing community efforts around name normalization; your ORCID ID will help in these efforts. Please note that double-blind reviewing will not be enforced.

Important Dates

Paper Track (research and industry papers):

Abstract submission (optional): January 19, 2024, AOE (extended to January 22, 2024, AOE)
Paper submission: January 26, 2024, AOE (extended to February 2, 2024, AOE)
Author notification: February 23, 2024
Camera-ready version: March 8, 2024

Presentation Track:

Extended abstract submission: February 2, 2024, AOE
Author notification: February 23, 2024, AOE
Workshop date: May 7, 2024


Organization:

Chairs:

Marios Fokaefs (York University, Canada)
Filipe Oliveira (Redis, USA)
Naser Ezzati-Jivan (Brock University, Canada)


Program Committee:

TBD


Past LTB Workshops: