HPC Systems Manager (IT Systems Manager I)

Frederick National Laboratory for Cancer Research
Hood College, MD
Closing date
Sep 26, 2023


Science, Computer Science and IT

Job ID: req3558
Employee Type: exempt full-time
Division: Enterprise Information Technology
Facility: Frederick: Ft Detrick
Location: PO Box B, Frederick, MD 21702 USA

The Frederick National Laboratory is a Federally Funded Research and Development Center (FFRDC) sponsored by the National Cancer Institute (NCI) and operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human clinical trials, applications of nanotechnology in medicine, and rapid response to emerging threats of infectious diseases.

Accountability, Compassion, Collaboration, Dedication, Integrity and Versatility; it's the FNL way.

Position Overview:


Within the Enterprise Information Technology (EIT) group, our mission is to develop an enterprise-level, consolidated information technology infrastructure that provides exceptional IT capabilities to the Frederick National Laboratory for Cancer Research (NCI-Frederick/FNLCR) in support of basic, translational, and clinical cancer and AIDS research.

The Frederick National Laboratory's EIT group is seeking an experienced High-Performance Computing (HPC) Manager/Engineer to lead our talented team, enhance our HPC cluster, and optimize our community workflows and customer outreach. As a part of our team, you'll work with people ready to help you reach higher, grow your potential, and do more. We value every experience, perspective, and skill. Working at FNLCR comes down to engaging, empowering, and inspiring great minds around the world and taking every opportunity to make our work and you better.

KEY ROLES/RESPONSIBILITIES

The EIT HPC Systems Manager will lead and be an integral member of the Frederick National Lab's Research Compute team, which supports a variety of compute needs, from high-bandwidth CryoEM to genomic sequencing. Your team will collaborate with diverse scientific community members to solve multidimensional information technology problems and improve customer experience across a broad base of scientific disciplines.

This is an excellent opportunity for you to apply your technical HPC/Storage/Networking experience to solve our customers' problems today and set the roadmap for our cluster's future. Your team will partner with EIT's storage, networking, and science leadership to tune high-bandwidth workflows for efficient transfer, compute, analysis, and sharing.
  • Work with scientific researchers to architect, implement, and deploy the HPC clusters, high-capacity and high-bandwidth storage, and scientific software applications needed to support scientific research
  • Manage and grow a small and technically strong team of HPC engineers who develop, build, and deploy HPC systems that are part of our product
  • Partner with enterprise storage and networking teams to optimize workflows and workloads needed by scientific labs with large data generators
  • Model, characterize, and tune the performance of HPC systems to achieve the most efficient and cost-effective solution
  • Manage the HPC capacity plan, develop deployment schedules, and identify critical science deliverables
  • Identify and manage risks for the HPC systems and develop mitigation plans
  • Work without considerable direction, and mentor and supervise employees as needed


To be considered for this position, you must minimally meet the knowledge, skills, and abilities listed below:
  • Possession of a Bachelor's degree from an accredited college/university according to the Council for Higher Education Accreditation (CHEA) or four (4) years of relevant experience in lieu of a degree. Foreign degrees must be evaluated for U.S. equivalency
  • In addition to the education requirement, a minimum of four (4) years of progressively responsible experience, including two (2) years of experience in a management capacity
  • Experience in managing Linux and Windows systems in a high-throughput, data-intensive environment
  • Experience as a technical lead and/or managing a technical team
  • Solid knowledge of HPC systems, storage, high-speed interconnect, and GPU architecture
  • Experience with batch control software such as SLURM
  • Strong understanding of Linux internals
  • Broad experience with high performance storage systems, NFS, SMB, POSIX
  • Familiarity with system performance analysis, monitoring, and tuning
  • Ability to obtain and maintain a clearance


Candidates with these desired skills will be given preferential consideration:
  • Five (5) years of experience in managing Linux and Windows systems in a high-throughput, data-intensive environment, including three (3)+ years as a technical lead and/or managing a technical team
  • Experience with programming in a variety of languages, both traditional and nontraditional
  • Experience with SLURM, GPUs, HPC architecture, and Linux
  • Experience with container technologies and associated infrastructure
  • Experience with Cloud and hybrid models
  • Knowledge of emerging computing technologies
  • Knowledge of various microarchitectures and developing firmware
  • Ability to rapidly evaluate scientific research on new and emerging technologies
  • Possession of excellent client-facing or consulting skills
  • Excellent written and verbal communication skills

Equal Opportunity Employer (EOE) | Minority/Female/Disabled/Veteran (M/F/D/V) | Drug Free Workplace (DFW)

