High Performance Computing (HPC) Linux Systems Administrator
- Employer: Santa Clara University
- Location: Santa Clara, California, United States
- Salary: Competitive Salary
- Date posted: Sep 3, 2024
- Position Type: Faculty Positions, Computer Sciences & Technology, Administrative, IT & Technology, Technology Administration/Other
- Employment Type: Full Time
Job Details
Position Type: Regular
Hiring Range:
$112,100 - $131,800 annually; compensation will be based on education, experience, skills relevant to the role, and internal equity.
Pay Frequency: Annual
Company Overview:
Santa Clara University is a prestigious academic institution dedicated to advancing research, innovation, and education. We are seeking a skilled and experienced High Performance Computing (HPC) Systems Administrator to join our dynamic team and contribute to the optimization and management of our HPC infrastructure, supporting groundbreaking research across various disciplines.
Job Description:
As a High Performance Computing (HPC) Systems Administrator on the team supporting the SCU High Performance Computing environment, you will play a pivotal role in the administration, maintenance, and optimization of our HPC systems within the academic environment. Working closely with researchers, faculty, and IT professionals, you will ensure the smooth operation of our computational infrastructure, enabling cutting-edge research and academic excellence. You will work with the SCU architectural leaders and systems administrators to provide systems automation, DevOps, and user support for research computing. You will collaborate with SCU researchers and scientists to advance their research projects by enabling and optimizing their application pipelines for AI, data science, and graphics GPU processing. You will maintain and expand the existing HPC cluster and parallel storage systems as necessary.
The ideal candidate for this position is curious, creative, tenacious, and self-directed, and demonstrates a strong work ethic; is productive working independently as well as collaboratively; is analytical and can identify, define, interpret, and resolve both technical and human issues.
This position requires on-site support on a regular basis. On-campus vs. remote schedule will be hybrid on an as-needed basis depending on current tasks.
Key Responsibilities:
- Install, configure, and maintain HPC hardware and software
components, networking architectures, including InfiniBand fabric,
and parallel file systems.
- Oversee and monitor system performance, troubleshoot issues,
and optimize system configurations to ensure maximum efficiency and
reliability.
- Take responsibility for the HPC facility and incorporate
industry best practices into its operation.
- Provide advocacy and outreach across the university; train and
teach researchers and teams as needed.
- Develop and implement security measures to protect HPC systems
and data from unauthorized access and cyber threats.
- Contribute to the development of the HPC center's strategic
vision, and use this vision to create a common focus.
- Manage user accounts, access permissions, and job scheduling in
accordance with university policies and best practices.
- Plan and execute system upgrades, patches, and maintenance
activities to minimize downtime and ensure system stability.
- Manage and document system configurations, procedures, and
troubleshooting guidelines to facilitate knowledge sharing and
maintain system integrity.
- Stay current with emerging technologies and industry trends in
HPC to recommend and implement innovative solutions that enhance
system performance and capabilities.
- Consult regularly with faculty and staff on their complex
computational requirements.
- Monitor feedback from researchers to identify and address gaps
in services, striving constantly for quality and excellence.
- Collaborate with vendors and external partners to evaluate and
procure HPC hardware and software components as needed.
- Analyze user workflows to identify opportunities for
parallelism or efficiency improvements.
- Interact and collaborate with researchers and faculty to
understand their computational requirements and provide support and
guidance on utilizing HPC resources effectively.
- Design and execute innovative, high-quality programs and
services that meet the current and future needs of SCU
researchers.
- Provide expert computational and data analytic technical
assistance, including complex problem solving and programming
support across different departments
- Provide training and support to users on HPC system usage,
optimization techniques, data organization, storage, and sharing
best practices.
- Provide training in code-management best practices (such as
using Git and GitHub).
- Work with senior leadership to develop strategies and
implement tactics that will successfully ensure the fulfillment of
SCU's research-computing goals, and to enable and amplify the work
of SCU researchers across campus.
- Develop long-term, strategic relationships and partnerships
with providers of national resources (such as ACCESS, Globus, NRP,
OSG) to assist researchers in finding, accessing, and optimizing
their use of those resources; define and maintain gateways to
those resources.
- Facilitate the growth of corporate and foundation giving.
- Support the development, execution and reporting on externally
supported research.
Qualifications:
- Bachelor's degree in computer science, engineering, or a
related field; advanced degree required.
- Four years of experience required.
- Experience providing direct user support and customer service
with demonstrated success.
- Experience installing, monitoring, and optimizing the
performance of scientific applications in an HPC cluster.
- Five years of experience with systems automation scripting in
at least one of the following: Bash, Perl, Python, Puppet,
Ansible.
- Demonstrated experience writing and editing complex scripts
used to perform system maintenance and administration.
- Linux systems administration experience including: automated OS
provisioning, software updates and package management, user
accounts management, filesystems and access management, compiling
software and kernel modules, versioning, environment modules.
- Hands-on experience with networking architectures, including
InfiniBand fabric.
- Experience with containers (e.g., Docker, Singularity).
- Ability to elicit and communicate technical and non-technical
information in a clear and concise manner.
- Self-motivated and works independently and as part of a team.
Demonstrates problem-solving skills. Able to learn effectively and
meet deadlines.
- Ability to write technical documentation in a clear and concise
manner.
- Understanding of system performance monitoring and actions that
can be taken to improve or correct performance.
- General knowledge of other areas of IT. Thorough understanding
of and experience with systems-related issues and actions that can
be taken to improve or correct performance.
- Strong analytical and problem-solving skills with a proactive
approach to identifying and resolving technical issues.
- Excellent communication and interpersonal skills with the
ability to collaborate effectively with researchers, faculty, and
IT professionals.
- Knowledge of cybersecurity principles and best practices for
securing HPC environments.
- Five years of experience in administering and supporting HPC
systems in an academic or research environment.
- Familiarity with open-source HPC technologies such as
OpenHPC.
- Experience with configuring, deploying and managing batch
queueing systems for HPC clusters such as SGE, LSF, or Slurm.
- Experience with distributed file systems for HPC clusters (such
as BeeGFS, Lustre).
- Experience with installation and integration of tools and
software (such as compilers, scientific applications) in a shared
cluster environment (e.g., modules).
- Proficiency with source code version control systems (e.g.,
Git, SVN) for continuous integration and testing methods.
- Experience with MySQL/MariaDB: installation, data extracts and
loads.
- Experience with developing systems monitoring dashboards (e.g.,
Grafana, Prometheus, Tableau) and using monitoring tools (e.g.,
Nagios, Ganglia).
- MS or PhD with adequate understanding of the challenges
associated with the data analytics needed to answer scientific
questions and also the capabilities and limitations of an HPC
cluster and distributed file systems.
Equal Opportunity/Notice of Nondiscrimination
Santa Clara University is an equal opportunity/equal access/affirmative action employer fully committed to achieving a diverse workforce and complies with all Federal and California State laws, regulations, and executive orders regarding non-discrimination and affirmative action. Applications from members of historically underrepresented groups are especially encouraged. For a complete copy of Santa Clara University’s equal opportunity and nondiscrimination policies, see https://www.scu.edu/title-ix/policies-reports/
Telecommute
Santa Clara University is registered to do business in the following states: California, Nevada, Oregon, Washington, Arizona, and Illinois. Employees approved to telecommute are required to perform their work within one of these states.
Title IX of the Education Amendments of 1972
Santa Clara University does not discriminate in its employment practices or in its educational programs or activities on the basis of sex/gender, and prohibits retaliation against any person opposing discrimination or participating in any discrimination investigation or complaint process internally or externally. Information about Title IX can be found at www.scu.edu/title-ix. Information about Section 504 and the ADA Coordinator can be found at https://www.scu.edu/oae/, (408) 554-4109, oae@scu.edu. Inquiries can also be made to the Assistant Secretary of Education within the Office for Civil Rights (OCR).
Clery Notice of Availability
Santa Clara University annually collects information about campus crimes and other reportable incidents in accordance with the federal Jeanne Clery Disclosure of Campus Security Policy and Campus Crime Statistics Act. To view the Santa Clara University report, please go to the Campus Safety Services website. To request a paper copy, please call Campus Safety at (408) 554-4441. The report includes the type of crime, venue, and number of occurrences.
Americans with Disabilities Act
Santa Clara University affirms its commitment to employ qualified individuals with disabilities within the workplace and to comply with the Americans with Disabilities Act. All applicants desiring an accommodation should contact the Department of Human Resources and request to speak to Indu Ahluwalia, by phone at 408-554-5750 or by email at iahluwalia@scu.edu.
Company
Located in the heart of Silicon Valley, Santa Clara University blends high-tech innovation with a social consciousness grounded in the Jesuit educational tradition.
We are committed to leaving the world a better place. We pursue new technology, encourage creativity, engage with our communities, and share an entrepreneurial mindset.
We are a close-knit, friendly, mission-driven campus that celebrates all cultures, lifestyles, perspectives and experiences. Diversity is critical to our mission.
Santa Clara recognizes the immense value that diversity brings to the workplace and the benefit to all when it is celebrated and prioritized.
The Bay Area boasts the quintessential California experience: picturesque beaches and hiking trails, sophisticated, unique dining and historic landmarks–hello, Golden Gate Bridge–perfect for any postcard.
The physical, emotional and mental health of our employees is extremely important to us. We offer a variety of dynamic programs, including meditation and nutritional seminars, to help you take care of your body, mind and soul.
From 401k's to one-on-one consultations with financial advisors, we’ll help you become better informed and more confident in building your retirement plan.
We provide opportunities for professional and educational development for eligible employees and their dependents, including tuition remission and reimbursement.