diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index 78217a9abc..5ee728b00e 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -21,7 +21,7 @@ jobs:
         check_filenames: true
         # MarkDown files in docs/available_software/detail are skipped because they are auto-generated
         skip: '*.pdf,.git,*.json,./docs/available_software/detail/*.md'
-        ignore_words_list: Fram,fram,ND,nd
+        ignore_words_list: Fram,fram,ND,nd,Linz
 #    - name: Markdown Linting Action
 #      uses: avto-dev/markdown-lint@v1.2.0
diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 042012333e..bcad5c6139 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -119,3 +119,8 @@ authors:
     description: European Molecular Biology Laboratory, Germany
     avatar: https://avatars.githubusercontent.com/u/44709261?v=4
     slug: https://github.com/stefanomarangoni495
+  admccartney:
+    name: Adam McCartney
+    description: Austrian Scientific Computing (ASC), TU Wien
+    avatar: https://avatars.githubusercontent.com/u/35410331?v=4
+    slug: https://github.com/adammccartney
diff --git a/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg b/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg
new file mode 100644
index 0000000000..c089c86cd8
Binary files /dev/null and b/docs/blog/posts/2026/03/MUSICA-v2-32-Matthias_Heisler.jpg differ
diff --git a/docs/blog/posts/2026/03/eessi-musica.md b/docs/blog/posts/2026/03/eessi-musica.md
new file mode 100644
index 0000000000..78712e46ba
--- /dev/null
+++ b/docs/blog/posts/2026/03/eessi-musica.md
@@ -0,0 +1,279 @@
---
authors: [admccartney]
date: 2026-03-27
slug: eessi-musica
---

# Choosing EESSI as a base for MUSICA

<figure markdown>
  ![MUSICA](MUSICA-v2-32-Matthias_Heisler.jpg){width=75%}
  <figcaption>(c) Matthias Heisler 2026</figcaption>
</figure>

MUSICA (Multi-Site Computer Austria) is the latest addition to Austria's
national supercomputing infrastructure. The system's compute resources
are distributed across three locations in Austria: Vienna, Innsbruck,
and Linz. We describe the process that led to the adoption of EESSI
as a base for the software stack on the MUSICA system at the Austrian
Scientific Computing (ASC) research center.

<!-- more -->

The background section provides a brief history of how cluster
computing at ASC has evolved, with a particular focus on the various
incarnations of the software stack. We outline our motivations for
redesigning the system that delivers the software stack, initially for
use on the MUSICA HPC system. We describe the timeline of events that
led to our experiments with EESSI and EasyBuild, and give details of
the two complementary approaches to building a software stack that we
compared. Finally, we reflect critically on these experiments and
explain why we ultimately chose EESSI as a base and blueprint for the
software stack.


## Background

The ASC (formerly VSC) is a national center for high-performance
computing and research powered by scientific software. The flagship
cluster VSC-1 was in service from 2009 to 2015 and was succeeded by a
series of clusters (VSC 2-5)[^1]. VSC 4 and 5 are the two clusters that
remain in service as of 2026. In April 2026 they will be joined by
MUSICA, a new GPU-centric cluster run on OpenStack, which has so far
been the main testing ground for our initial experiments with EasyBuild
and EESSI.

The management of the software stack at ASC evolved along the following
lines:

+ VSC 1, 2: Initially catered to small groups of expert users; all
  software was installed manually.

+ VSC 3, 4: Still partially managed by hand, with a set of scripting
  tools for structuring software directory trees.
These tools were initially copied
  from Innsbruck and adapted to work on the VSC. Tcl modules were also
  adopted at this time.

+ VSC 4, 5: Spack was introduced, reducing the need for custom install
  scripts, making it possible to install lots of software quickly, and
  pulling in dependencies automatically.

## Motivation

Internal discussions led to a comprehensive understanding of where
the current software stack was lacking and where it should ideally be.
During these discussions, members of the user support team were able
to clearly articulate the various use cases raised by users. This
led to a number of high-level goals from which requirements were
derived. Some of the more important goals can be summarized as:

 - Improved reproducibility and redeployment.
 - Establishment of clear release cycles.
 - Creation of a more organized and user-friendly presentation of the
   stack for cluster users.

We articulated what an ideal software stack should look like, and we
identified a number of issues with the way the software stack was
being managed.

### Tooling & Presentation

The way we had been using Spack and Tcl modules had led to a
fairly unmanageable situation on our clusters. To meet user requests
for software, we adopted a pragmatic approach. This led to a situation
in which a myriad of software variants were installed into the shared
file system hosting the systems' software, which quickly resulted in a
fairly overwhelming presentation of available modules to the user.
Another major issue was deduplication. We don't know the root cause; it
may simply have been a misconfigured Spack. In any case, we ended up in
an untenable situation where certain dependencies were installed many
times over. For example, there were multiple installs of the same
OpenMPI version on the system, all built slightly differently and most
untested on the systems.
This meant there was no way to indicate to users
which version of a particular package was the one known to work.

### Build procedure hard to reproduce

During the last operating system upgrade, the need for a more automated
build process was painfully felt. Because most software was built ad hoc
in response to user requests, sometimes the only record of the build
procedure was the build artefacts themselves. This meant manually going
over a very large software repository and rebuilding everything more or
less by hand for the new operating system.

### Poor bus factor

The bus factor is a well-known metric from software engineering for the
degree to which specialized knowledge is shared within a team: how many
people would have to be hit by a bus before the team could no longer
carry out its work? In our case, knowledge about the software stack was
concentrated in one or two individuals.

## Searching

As outlined above, the numerous issues with the current stack framed
our search for a set of tools and methods that would ease the
realisation of the high-level goals for the software stack. To
reiterate, manageability and user-friendliness were top of the list.


### Timeline

We formed the Software and Modules (SAM) working group in Q4 2024.
SAM consists of five people who dedicate the majority of their
time to exploring possible alternative ways of building, managing and
presenting the software stack to users. The members draw on expertise
from different areas, notably from their work on the user-support,
sysadmin and platform teams. The goal was to have the new software
stack up and running on the new MUSICA system towards the end of 2025.

+ *Summer 2024*:
  Initial meetings highlighted the need to reform the management
  of software so that it could be easy to use, transparent and logical,
  as well as tested and performant.
These meetings saw the first mention of
  EESSI and EasyBuild as possible alternatives to Spack, and of Lmod as
  an alternative to Tcl modules.

+ *Autumn 2024*:
  The working group was established and a broad set of tools and
  approaches were compared, namely:

    + an installation of Spack with Environment Modules
    + an installation of Guix
    + an installation of EESSI

  These tools were evaluated against a set of high-level user
  requirements that we agreed on. The outcome was to focus on EasyBuild
  and EESSI.

+ *Winter 2024 - Spring 2025*:
  We made the strategic decision to have EESSI installed on the MUSICA
  system, and decided to run a small experiment in which a small
  software stack would be built and installed, in order to compare and
  contrast two approaches: "EESSI on the side" vs. "EESSI as a base".

+ *Summer 2025*:
  In June 2025, the system entered a closed test phase, during which it
  was open to a small number of power users. The core software was
  provided by EESSI. The custom stack was extended during this phase in
  response to user software requests, which centered mostly on
  proprietary software.

+ *Autumn 2025 - Winter 2025/2026*:
  In November 2025 the MUSICA open test phase began. At this stage
  anyone with an existing account at ASC was granted access to the
  system upon request. At the end of the open test phase, users
  participated in a survey. Generally the response to the setup of the
  system was quite positive.

  - Users categorized their usage according to scientific domain; the
    largest groups were:
    Physics (45), AI (41), Chemistry (24), Data Science (15),
    Bioinformatics (11).

  - In response to a question as to whether the module system was used,
    or whether the user relied on individual installations: 32 used the
    module system; 24 preferred an individual installation; 43 used a
    mixture of both.

  - What did users use to build, install or run their software?
Of 99
    respondents:

    + 63 Conda/Pip
    + 21 EESSI-extend
    + 16 None of these
    + 15 Containers
    + 13 buildenv
    + 5 Spack

  - 5 of 77 comments on the experience of compiling software on the
    system explicitly mention using `LD_LIBRARY_PATH`. Despite our
    highlighting the recommendation to use the `buildenv` modules when
    compiling, these users preferred their own approach.
    Generally, the `buildenv` modules and the use of rpath wrappers are
    not that well understood within the SAM team, so it is hard to
    explain to users *why* they should be using this approach.


## Experiments

### Test stack

The following programs were agreed upon as a way to come into contact
with specific workflows, such as writing easyconfig files, writing
custom EasyBuild hooks, installing commercial software, and installing
GPU-specific application software:

+ AOCC 5.0.0
+ Intel compilers
+ VASP 6.5.0
+ One commercial software package (STAR-CCM+, Mathematica)
+ NVHPC
+ VASP 6.5.0 (GPU)
+ Containers (Singularity, Docker, NVIDIA)

### EESSI on the side

This approach in a sense represents the traditional way to build a
software stack: building everything directly on the host (Rocky 9) and
relying on system libraries. It used scripts and wrappers from the sse2
toolkit from the National Supercomputer Centre at Linköping University
as a way to manage and structure the modules and software installations.
The software builds were a mixture of EasyBuild scripts and makefiles.
EESSI was offered as a module in its pure form, and in general users
were discouraged from using EESSI-extend, or did so at their own risk.

### EESSI as a base

With this approach, we leveraged EESSI-extend extensively and aimed to
build the whole stack with the compatibility layer from EESSI as a base.
Our build workflow moved back and forth between three distinct phases,
leveraging the various possible settings of the EESSI-extend module.

+ Phase 0 -> EESSI_USER_INSTALL
+ Phase 1 -> EESSI_SITE_INSTALL
+ Phase 2 -> EESSI_PROJECT_INSTALL=/cvmfs/software.asc.ac.at


## Reflections

### EESSI on the side

By comparison, it was much quicker and easier to build all the software
in the list using this approach. It also offers a lot of control to the
sysadmin who builds the software; things like tweaking or modifying
module files in place were possible. The downsides were reproducibility
and portability: there would be obvious work involved in building the
stack again upon the next OS upgrade. That said, everything worked much
more smoothly than with EESSI-extend; it was possible to build all the
software that was listed and run basic tests with Slurm. We had some
open questions around interoperability between custom modules and
EESSI, and whether it would be possible to mix modules from the two
independent stacks without running into issues (probably not, due to
the different libc versions).


### EESSI as a base

By the end of the closed test phase of MUSICA, the engineering team
chose EESSI as the foundation for the software stack. While this approach
introduced complexity into our build and installation workflows, it
enabled us to meet certain key requirements for the MUSICA software
infrastructure.

Specifically, we leveraged CVMFS to distribute the software stack across
the three sites - Vienna, Linz, and Innsbruck. EESSI offers access
to approximately 1960 modules that are ready to load on the target
architecture. Setting up EESSI was quite straightforward, and although
team members found the many installation options of the EESSI-extend
module complex, adopting this method aligned with modern practices
for managing HPC software. EESSI is open source, well documented, and
maintained by colleagues within Europe's HPC ecosystem.

Engaging with EESSI's documentation, source code, and community proved
valuable.
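To make the three install phases listed under "EESSI as a base" more concrete: they differ mainly in which installation prefix newly built software lands in. The sketch below is our own toy illustration of that selection logic, not EESSI-extend's actual implementation; the site prefix and fallback paths shown are hypothetical placeholders.

```python
import os

# Toy sketch (ours, not EESSI-extend's real code): map the three
# EESSI_*_INSTALL phases described in the post to an install prefix.
def pick_install_prefix(env):
    """Return the installation prefix for an EESSI-extend-style setup."""
    if env.get("EESSI_PROJECT_INSTALL"):
        # Phase 2: project-wide stack, distributed via CVMFS
        return env["EESSI_PROJECT_INSTALL"]
    if env.get("EESSI_SITE_INSTALL"):
        # Phase 1: site-wide installation (hypothetical prefix)
        return "/cvmfs/example.site/host_injections"
    # Phase 0: per-user installation (hypothetical layout)
    return os.path.join(env.get("HOME", "/tmp"), "eessi")

# Phase 2, as used for the stack distributed under software.asc.ac.at:
print(pick_install_prefix({"EESSI_PROJECT_INSTALL": "/cvmfs/software.asc.ac.at"}))
# -> /cvmfs/software.asc.ac.at
```

In practice the real EESSI-extend module does considerably more (rpath wrappers, module tree placement), but this prefix choice is the knob that distinguished our phases.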
We identified a reusable blueprint that we could adapt to fit +our specific needs. Despite the initial learning curve, this approach +provided long-term benefits in terms of maintainability and scalability. + + +--- + +[^1]: