HPC Engineer

Updated: 24 days ago
Location: Los Angeles, California

The University of Southern California’s (USC’s) Information Technology Services is seeking a talented High-Performance Computing (HPC) Engineer with an exceptional commitment to service excellence to join its team. As the HPC Engineer, you will be an integral member of the Advanced Research Computing team, collaborating with diverse and talented team members to help solve multidimensional information technology problems, improve customer experience, and generate value for our campus stakeholders across a broad base of departments and constituencies.

THE TEAM

ITS has embarked on a major digital transformation initiative to continually improve services for faculty, staff, and students in support of USC’s ascent as a leading institution of higher education. The ITS vision aligns strategy, business, and services; affirms ITS cultural values; empowers cross-functional teamwork; embraces world-class best practices; and promotes innovation, excellence, agility, and efficiency. To achieve this vision, ITS is committed to providing a modern technology infrastructure that is resilient and delivers the performance necessary to meet the demands of a growing customer base, training in the latest technologies for its highly productive and motivated workforce, outstanding customer experience, and technology services that are aligned with the university’s mission to provide exceptional learning opportunities for students. ITS is creating a workplace where employees can develop cutting-edge skills, take pride in the services they provide, and have access to the roles and career paths that align to their abilities and potential. We are looking for top talent to join us on our journey.

ITS CULTURE

USC’s ITS organization represents a diverse and talented team, committed to supporting a collaborative culture and delivering secure and innovative IT services, core to the mission of USC. ITS values accountability, excellence, and commitment to exceptional customer experience. ITS strives for a supportive and inclusive culture that encourages employees to do their best work every day and where individuals are recognized and celebrated for their contributions.

ABOUT USC

USC is the leading private research university in Los Angeles—a global center for arts, technology, and international business. With more than 47,500 students, we are located primarily in Los Angeles but also in various US and global satellite locations. As the largest private employer in Los Angeles, responsible for $8 billion annually in economic activity in the region, we offer the opportunity to work in a dynamic and diverse environment, in careers that span a broad spectrum of talents and skills across a variety of academic and professional schools and administrative units. As a USC employee and member of the Trojan Family—the faculty, staff, students, and alumni who make USC a great place to work—you will enjoy excellent benefits, including a variety of well-being programs designed to help individuals achieve work-life balance.

Come join the ITS team and work as a trusted partner in shaping an environment of innovation and excellence for the university.

MINIMUM QUALIFICATIONS

The candidate for the position of HPC Engineer must meet the following qualifications:

  • Bachelor’s degree in a relevant field such as computer science, computer information systems, etc., or equivalent combination of education, training, and experience.
  • Two years of experience in one of the following fields: information technology, system administration, or high-performance computing.
  • Familiarity with low-latency/high-bandwidth, interconnected infrastructure (including InfiniBand, 10/100GigE, and others).
  • Expertise with HPC system software, cluster management tools, job schedulers, and other HPC tools, including Slurm and Ansible.
  • Proficiency in fundamental programming (Bash, Python, C/C++, or similar languages).
  • Expertise in administering, monitoring, and maintaining secure Linux/Unix operating systems (CentOS).
  • Knowledge of HPC storage (FC, SAS) principles, file systems (NFS, Lustre, BeeGFS, ZFS, etc.), and compute node storage.
  • Familiarity with shared and distributed memory parallelism (OpenMP, MPI), and accelerators (GPUs).
  • Ability to establish strong, positive working relationships and rapport with diverse groups of team members. Ability to drive technical leadership and management of complex, large-scale computing system projects.
  • Proficiency with multi-vendor management, security and network/Internet protocols.
  • Demonstrated expertise in design, configuration, and planning, with excellent organization skills and the ability to identify and resolve problems and manage performance.
  • Excellent written and oral communication skills, with experience presenting technical topics to nontechnical audiences.
  • Ability to establish processes for maintaining system performance and managing best-in-class standards.

PREFERRED QUALIFICATIONS

The ideal candidate for the position of HPC Engineer has the following qualifications:

  • Bachelor's degree in a relevant field, such as computer science, computer information systems, etc.
  • Four or more years of experience in one of the following fields: information technology, HPC system administration, network engineering, or large-scale HPC file systems.
  • Familiarity with cloud computing and container technologies.

THE WORK YOU WILL DO

The HPC Engineer works with other HPC Engineering Team members and collaborates with technical leadership in the design, development, installation, and maintenance of software for the High-Performance Computing (HPC) systems. The HPC Engineer is responsible for supporting the planning, implementation, availability, performance, security, maintenance, and repair of high-performance computing infrastructure. The HPC Engineer also participates in multi-vendor management and in work involving security and network/Internet protocols for the ITS organization. As a member of ITS, the HPC Engineer demonstrates ITS values in action.

Job Accountabilities

The HPC Engineer:

  • Supports day-to-day operations for the Advanced Research Computing team by monitoring computing resource performance, managing configurations, and addressing security administration. Applies revisions to system firmware and software. Engages and collaborates with vendors to assist with support activities as required.
  • Develops new HPC software deployment plans, custom scripts, and testing procedures to ensure operational reliability for USC researchers. Trains technical ITS staff in the use of new software and hardware, either developed or acquired.
  • Maintains and manages HPC researcher accounts and logins for staff and USC research groups. Installs, modifies, and maintains various research software applications for access on HPC clusters. Provides researcher support and documentation for software applications and programs.
  • Designs, installs, configures, and performs document management for cluster infrastructure, including operating systems, job schedulers, resource managers, provisioning managers, configuration managers, network devices, and other components.
  • Investigates, debugs, and addresses researcher inquiries and requests efficiently through a customer issue ticketing system. Communicates complex technical concepts in simple, straightforward language.
  • Explores emerging technologies and technical developments to address expanding analytical requirements. Identifies new services and develops implementation plans. Stays current with best practices in the HPC field. Maintains collaborative relationships with peer HPC research organizations.
  • Contributes to an inclusive environment that values differences by building and maintaining collaborative relationships with team members, peers, and ITS leaders. Actively embodies ITS values and behaviors including accountability, ethics, and best-in-class customer service. Contributes to a culture of trust and transparency by sharing information broadly, openly, and deliberately.
  • Supports the vision for Advanced Research Computing. Works closely with team members and management to implement and support effective solutions for ARC. Maintains currency with technology, standards, and best practices. Supports process improvement efforts within the team and across ITS.
  • Performs other related duties as assigned or requested. The university reserves the right to add or change duties at any time.

The annual base salary range for this position is $100,289.48 - $115,000. When extending an offer of employment, the University of Southern California considers factors such as (but not limited to) the scope and responsibilities of the position, the candidate’s work experience, education/training, key skills, internal peer equity, federal, state, and local laws, contractual stipulations, grant funding, as well as external market and organizational considerations.


Minimum Education: Bachelor's degree; combined experience/education as substitute for minimum education.

Minimum Experience: 2 years

Minimum Expertise:

  • Familiarity with low-latency/high-bandwidth, interconnected infrastructure (including InfiniBand, Myrinet, 10GigE, and others).
  • Expertise with HPC system software, cluster management tools, job schedulers, and other HPC tools, including Slurm, Salt, xCAT, and more.
  • Proficiency in fundamental programming (Bash, Perl, Python, or similar languages).
  • Expertise in administering, monitoring, and maintaining secure Linux/Unix operating systems (CentOS, Solaris).
  • Knowledge of HPC storage (FC, SAS) principles, file systems (SAM-FS/QFS, BeeGFS, ZFS, etc.), and compute node storage (NFS).
  • Familiarity with shared and distributed memory parallelism (OpenMP, MPI), and accelerators (GPUs).
  • Excellent written and oral communication skills, and the ability to establish strong, positive working relationships and rapport with diverse groups of team members.
  • Ability to drive technical leadership and management of complex, large-scale computing system projects.
  • Proficiency with multi-vendor management, security and network/Internet protocols.
  • Demonstrated expertise in design, configuration, and planning, with excellent organization skills and the ability to identify and resolve problems and manage performance.
  • Excellent written and oral communication skills, with experience presenting technical topics to non-technical audiences.
  • Ability to establish processes for maintaining system performance and managing best-in-class standards.
