AI Supercomputing Infrastructure Engineer

Updated: about 2 months ago
Location: Bristol, ENGLAND
Job Type: Full Time

The Bristol AI Supercomputing Centre runs the Isambard-AI National Artificial Intelligence Research Resource, and the Isambard3 Tier-2 Supercomputer. Isambard-AI is expected to be the most powerful supercomputer in the UK and amongst the most powerful in Europe.

The AI Supercomputing team owns the entire process of developing and operating the centre’s compute and software infrastructure, which includes:

  • The sourcing of hardware and system design.
  • The deployment of huge software-defined infrastructure using tools such as Kubernetes and Terraform.
  • Building and operating platforms to enable researchers to conduct leading-edge research using the systems.
  • Optimising and refining software to ensure environmentally and economically efficient use.

As one of the largest Open AI Research Resources internationally, we are committed to catalysing an AI transformation in the research and development community.

    In this role, you will work as part of the AI Supercomputing Team, primarily building and operating the infrastructure and compute platforms that researchers use for their work. You do not need to be an ML/DL or computational research domain expert to deliver world-class infrastructure, but you do need to be able to quickly gain a deep technical understanding of new domains. You should enjoy being self-directed and identifying the most important problems to solve as the team matures, with standardised tools and processes around stability, robust service delivery and scaling.

    As a member of the AI Supercomputing Team, you will:

    • Use tools such as bash, Terraform, Kubernetes and Python.
    • Design and operate large, highly available supercomputing services managed as software-defined infrastructures, and integrated as complete computational experiments using tools such as JupyterHub, Kubernetes and CSM (Cray System Management). 
    • Gain experience designing and operating massive-scale GPU and combined CPU/GPU workloads across these services.
    • Design and debug platforms, working closely with researchers to co-design solutions that enable the development and operation of new algorithms and software to solve leading-edge research problems.

    You will find this work exciting if you:  

    • Want to help build and maintain some of the largest, modern software-defined supercomputing systems.
    • Would enjoy working with world-class domain and AI researchers as your primary workload.
    • Have built small to large clusters or dabbled in building your own physical or software-defined systems, and are motivated to scale up to something massive and nationally impactful.
    • Love building large, distributed, highly available systems, and want to see them used for truly open national-scale research.

    For our AI Supercomputing Infrastructure Engineer role, you’ll need:

    • Domain expertise in one or more of SysOps, NetOps, DevOps, SecOps, MLOps or Research Software Engineering.
    • A degree (or equivalent practical experience) in computer science, computational or ML/AI research, or a natural science with a high degree of competence in computer science or computational research.
    • Good organisational skills to manage not just your own workload but also that of less experienced members of the team.

    The available job description provides a full view of the person specification.

    We're investing heavily in our technical and academic AI capability in 2024. For more information on Isambard-AI, and to view other current AI-related opportunities, click here.

    Contract type: Open Ended 

    Work pattern: Monday - Friday, 35 hours per week

    Grade: K

    Salary: The salary range for this role is £48,350 - £54,395, with the potential for an additional skills-based supplement.

    School/Unit: The Faculty of Science and Engineering

    This advert will close at 23:59 on Sunday 10th March.

    For informal enquiries, please contact Simon McIntosh-Smith, Professor in High Performance Computing, School of Computer Science, [email protected].


    We recently launched our strategy to 2030, tying together our mission, vision and values.


    The University of Bristol aims to be a place where everyone feels able to be themselves and do their best in an inclusive working environment where all colleagues can thrive and reach their full potential. We want to attract, develop, and retain individuals with different experiences, backgrounds and perspectives – particularly people of colour, LGBT+ and disabled people – because diversity of people and ideas remains integral to our excellence as a global civic institution.


