Site Reliability Engineer, Digital Transformation

22-Mar-2024

Harvard Business School

65374BR


Position Description

Be a pioneer in business, education, and global impact by joining the Harvard Business School (HBS) Digital Transformation team, a “startup with assets,” where you will have the chance to deploy digital- and emerging-technology education solutions. Where else can you make a difference at the intersection of cutting-edge technology, world-class education, noble purpose, and timeless legacy?

We are building educational and research solutions powered by Generative AI (GenAI) that scale across hundreds of courses and to hundreds of thousands of users. Our products assist educators and students alike with intelligent, adaptive capabilities that make education more accessible, engaging, and effective.

As a Site Reliability Engineer at Harvard Business School (HBS), you will play a crucial role in ensuring the high availability, performance, security, and scalability of our cloud-based solutions and services. You will work closely with our development and operations teams to build and maintain robust, efficient, and reliable systems on the AWS platform. Working at the intersection of software engineering and systems engineering, you will build and run large-scale, fault-tolerant systems that balance speed of deployment with stability, operate at peak efficiency, and keep costs under control.

  • Design, implement, and maintain scalable, reliable, and efficient systems on the AWS platform.
  • Automate the deployment, scaling, and management of applications using AWS services such as EC2, S3, RDS, Lambda, and CloudFormation.
  • Monitor system performance, troubleshoot issues, and implement solutions to ensure optimal operation and uptime.
  • Implement solutions that enable running multiple GenAI workflows using shared infrastructure, while ensuring high throughput, low latency, and speed of deployment.
  • Provide a platform for machine learning (and other exciting workloads) allowing developers to move quickly and experiment.
  • Collaborate with development teams to optimize applications for the cloud and implement best practices for cloud-native development.
  • Implement and manage continuous integration and deployment (CI/CD) pipelines.
  • Develop and maintain disaster recovery plans and conduct regular system backups.
  • Ensure security compliance and apply best practices throughout the AWS infrastructure.
  • Document system configurations, processes, and procedures.
  • Develop runbooks and recipes for on-call support, and participate in an on-call rotation to resolve critical issues outside regular business hours.
  • Adhere to standard practices in architectural design, testing (unit, integration, visual, and regression), and Scrum.
  • Evaluate developer platform designs, technical decisions, and code to ensure they are high quality, efficient, and well documented.
  • Develop and lead all aspects of the Container Orchestration Platform, a diverse ecosystem of multiple applications.
  • Complete other responsibilities as assigned.

Basic Qualifications

  • Minimum of five years’ post-secondary education or relevant work experience

Additional Qualifications and Skills

  • Bachelor’s degree in computer science or a related technical field, or equivalent combination of education and experience is required.
  • Experience developing and operating mission-critical systems as a Site Reliability Engineer, Sr. DevOps Engineer, or related role.

Additional/Desired Qualifications

  • Excellent understanding of Linux configuration and administration.
  • Strong experience with Python.
  • 4+ years of experience in software engineering, with a proven understanding of containerization and Infrastructure as Code.
  • Experience with automation tools (Terraform, Ansible, Puppet) and CI/CD pipelines.
  • Familiarity with monitoring and observability tools (Prometheus, Splunk, Grafana, ELK stack).
  • Familiarity with production-level Generative AI workflows, such as retrieval-augmented generation (RAG), model deployment, versioning, and evaluation pipelines.
  • Strong understanding of network protocols and security.
  • Extensive knowledge of and hands-on experience with AWS cloud infrastructure and services, including CI/CD and IaC provisioning tools such as Jenkins, ArgoCD, Scalr, Terraform, and GitHub Actions.
  • Experience with AWS and familiarity with running containerized services.
  • Knowledge of best practices in observability and monitoring for Docker or Kubernetes clusters at scale, along with experience using cost optimization tools.

Additional Information

This role is offered in a hybrid format (a combination of onsite and remote work), and you will be required to be onsite at our Boston, MA-based campus a set number of days per month. The specific days and schedule will be determined between you and your manager.

We may conduct candidate interviews virtually (phone and/or via Zoom) and/or in-person for this role.

A cover letter is required to be considered for this opportunity.

Harvard Business School will not offer visa sponsorship for this opportunity.

Culture of Inclusion: The work and well-being of HBS are profoundly strengthened by the diversity of our network and our differences in background, culture, national origin, religion, sexual orientation, and life experiences. Learn more about HBS work culture at https://www.hbs.edu/employment.


About Us

Founded in 1908 as part of Harvard University, Harvard Business School (www.hbs.edu) is located on a 40-acre campus in Boston. The School offers full-time MBA and PhD programs, more than 175 Executive Education programs, and certificates and courses through Harvard Business School Online. For more than a century, Harvard Business School faculty have drawn on their research, connection to practice, global expertise, and passion for teaching to educate leaders who make a difference in the world. The School and its curriculum attract the boldest thinkers and the most collaborative learners who will shape the practice of business and entrepreneurship around the globe.


Benefits

We invite you to visit Harvard's Total Rewards website (https://hr.harvard.edu/totalrewards) to learn more about our outstanding benefits package, which may include:

  • Paid Time Off: 3-4 weeks of accrued vacation time per year (3 weeks for support staff and 4 weeks for administrative/professional staff), 12 accrued sick days per year, 12.5 holidays plus a Winter Recess in December/January, 3 personal days per year (prorated based on date of hire), and up to 12 weeks of paid leave for new parents who are primary caregivers.
  • Health and Welfare: Comprehensive medical, dental, and vision benefits, disability and life insurance programs, along with voluntary benefits. Most coverage begins as of your start date.
  • Work/Life and Wellness: Child and elder/adult care resources including on campus childcare centers, Employee Assistance Program, and wellness programs related to stress management, nutrition, meditation, and more.
  • Retirement: University-funded retirement plan with contributions from 5% to 15% of eligible compensation, based on age and earnings, with full vesting after 3 years of service.
  • Tuition Assistance Program: Competitive program including $40 per class at the Harvard Extension School and reduced tuition through other participating Harvard graduate schools.
  • Tuition Reimbursement: Program that provides 75% to 90% reimbursement up to $5,250 per calendar year for eligible courses taken at other accredited institutions.
  • Professional Development: Programs and classes at little or no cost, including through the Harvard Center for Workplace Development and LinkedIn Learning.
  • Commuting and Transportation: Various commuter options handled through the Parking Office, including discounted parking, half-priced public transportation passes and pre-tax transit passes, biking benefits, and more.
  • Harvard Facilities Access, Discounts and Perks: Access to Harvard athletic and fitness facilities, libraries, campus events, credit union, and more, as well as discounts on various types of services (legal, financial, etc.) and cultural and leisure activities throughout metro-Boston.

Job Function

Information Technology


Department Office Location

USA - MA - Boston


Job Code

I0758P Applications Professional IV


Work Format

Hybrid (partially on-site, partially remote)


Department

Digital Transformation


Union

00 - Non Union, Exempt or Temporary


Pre-Employment Screening

Education, Identity


Commitment to Equity, Diversity, Inclusion, and Belonging

Harvard University views equity, diversity, inclusion, and belonging as the pathway to achieving inclusive excellence and fostering a campus culture where everyone can thrive. We strive to create a community that draws upon the widest possible pool of talent to unify excellence and diversity while fully embracing individuals from varied backgrounds, cultures, races, identities, life experiences, perspectives, beliefs, and values.


EEO Statement

We are an equal opportunity employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability status, protected veteran status, gender identity, sexual orientation, pregnancy and pregnancy-related conditions, or any other characteristic protected by law.

