Systems Administrator III - Systems/Operations
Salary: $56,842.00 - $71,053.00 Annually
Job Type: FT Exempt Salaried Staff
Job Number: FY2300887
Closing: 5/31/2023 11:59 PM Mountain
Location: 800 W University Parkway, Orem
Division: Office of Information Technology
Position Announcement
Utah Valley University is currently taking applications for a
Systems Administrator position! The Systems Administrator will
develop reliable software systems and scalable automated solutions
to support Digital Transformation, IT operations, and on-call
duties. Implements best practices for availability, reliability,
and scalability to improve software systems and workflows. Utilizes
experience in system engineering, system administration and IT
operations to ensure timely achievement of project plans and goals.
Install, configure, and maintain computer systems and associated
peripheral equipment. Maintain operating efficiency and stability,
ensure the security and integrity of all systems and data, and
respond to outages and other issues. Perform application
administration activities such as creating, modifying, and deleting
users, optimizing remote access, and security access for messaging
systems, databases, and web applications. Coordinate with
department leadership to plan, design, and schedule the release of
all software, hardware, and operating system updates. Develop and
enhance processes and technical documentation, create back-up
procedures, test plans, and reports. Prioritize and respond to
requests for service and, as needed, provide escalation support to
helpdesk staff. Oversee maintenance of room environment (e.g.,
cleaning, cooling, and power). Design, implement, and maintain site
reliability processes and systems that increase efficiency,
eliminate downtime, and maintain performance at scale across
platforms. Develop tools to ensure that the organization's services
(internally critical and/or externally visible systems) have
reliability and uptime appropriate to users' needs. Diagnose,
resolve, and escalate service-impacting issues. Develop monitoring
and alerting platforms to detect and resolve performance-impacting
issues.
Summary of Responsibilities
Performs day-to-day administration, maintenance, upgrades, and
operation of existing and recently developed systems including
virtualization infrastructure (on-premises and cloud), Microsoft
Windows system administration, and Linux operating systems as well
as application systems and technologies. Ensure standard operating
procedures, runbooks, disaster recovery, and service catalog
definitions are matured and ready for production. Responsible for
lifecycle of product set maintenance, availability, reliability,
and performance reporting to decision makers and developers.
Perform tasks, as needed, to augment work needed on systems by
their engineers to ensure timely achievement of project plans and
goals. Advocate, contribute, recommend, and facilitate these
ever-improving standards and best practices through successful
adoption of change within UVU's Digital Transformation
department.
Ensures that the underlying infrastructure is running smoothly,
and that systems and tools are working as expected. Analyze
day-to-day functions and the processes of systems and network
management software to ensure they are performing within
predetermined specifications. In support of core systems
availability and reliability, integrate diverse monitoring
solutions for emerging and existing IT infrastructure using
automation and API tools in on-premises and cloud architectures.
Engineer centralized, enterprise-wide alerting and key performance
indicators that give timely, actionable information to subject
matter experts, stakeholders, and leadership. SRE teams conduct
post-incident reviews, documenting findings and acting on lessons
learned. Following the incident resolution, the engineer will
revisit the issue and determine the cause. Build or optimize the
incident lifecycle to bolster reliability of services. Maintain
documentation and runbooks to ensure that teams get information
when they need it.
Develops operational tools and processes, builds reliable
systems, ensures compliance with operational standards, and provides
support to operational staff. This can be anything from adjustments
to monitoring and alerting to code changes in production. An SRE can
be tasked with building a homegrown tool from scratch to help with
weaknesses in software delivery or incident response and
management. SRE responsibilities include writing and developing
code to automate processes, such as analyzing logs, testing
production environments, and responding to any issues. Such
automation allows developers and engineers to focus their attention
on bug fixes and building new features rather than be burdened by
the day-to-day operational requirements needed in their
projects.
Provides leadership, communications, development, engineering,
automation, and feedback necessary for enterprise planning and
architecture. Timely and responsive work is key for providing
feedback on what went well or what went badly during a
change/incident/problem
cycle. Participates in after-hours and weekend on-call rotation and
provides training to other on-call staff. Provide remote hands for
systems and application administrators that need physical and
virtual support within on-premises and cloud facilities. Perform
other job-related duties as assigned.
Minimum Qualifications
Graduation from an accredited institution with a bachelor's degree
in Information Technology or a related field plus three years of
work experience in IT, OR a combination of education and
experience in a related field totaling seven years.
Licenses or Certifications:
SRE Professional Certificate, Azure/AWS associate/practitioner
levels, ITIL/TOGAF, Docker Certified Associate, CKA
Knowledge, Skills, and Abilities
Knowledge
Knowledge of ITIL Change, Incident, and Problem
Management.
Knowledge of TCP/IP, firewall management, and operating system
configuration.
Proficient and current knowledge of industry trends, tools, and
processes.
Knowledge of Agile and iterative development processes (e.g.,
Scrum and Kanban).
Knowledge of automation and containerization technologies such
as Docker, Kubernetes, Ansible, Terraform, and SaltStack.
Knowledge of ITSM platforms such as Jira Service Management,
ServiceNow, or others.
Knowledge of engineering practices: availability, reliability,
and scalability, as well as disaster recovery.
Knowledge of various automation tools, as the role is usually
responsible for building and integrating software tools to enhance
an organizational system's reliability and scalability.
Skills
Recognize key design, implementation, and process issues and
proactively craft and automate solutions.
Skill with system engineering and design for NOC/SOC
purposes.
Skill with scripting languages such as Perl, PowerShell, Bash,
and Python.
Skills with most of the common programming languages including
JavaScript, HTML5, CSS, jQuery, JSON, and PHP.
Skill with the design, implementation, and maintenance of
Active Directory, and/or LDAP directories.
Skills with TCP/IP, application network protocols, firewall
management, operating system configuration, anti-virus software,
and relational databases.
Practical Experience with various Monitoring solutions such as
Prometheus, PRTG, Site24x7, TestCafe, Selenium, Splunk, New Relic,
Azure Monitor, and AWS CloudWatch.
Expertise in the major cloud providers such as Azure, AWS, and
Google Cloud.
Experience with alert management/on-call tools such as
PagerDuty, VictorOps, and Opsgenie.
Experience with instant communication and team collaboration
platforms like MS Teams, Slack, or Jitsi.
Proven IT project planning and development skills.
Abilities
Ability to read, write, and interpret technical documentation,
runbooks, procedure manuals, and knowledge-base articles pertaining
to network systems and application management.
Ability to complete Root Cause Analysis (RCA) investigations
and write post-incident reports.
Ability to improve team practices through code reviews,
handoffs of work and incidents.
Be on an on-call (PagerDuty) rotation to respond to incidents
that impact availability, and provide support for service engineers
with customer incidents.
Ability to debug production issues and build monitoring that
alerts on symptoms rather than on outages.
Ability to turn into repeatable actions and into
automation.
Ability to conduct and direct research into IT issues and
products, as required.
Ability to communicate technical ideas and concepts to a
non-technical audience.
EEO Statement:
UVU employment decisions are made on the basis of an applicant's
qualifications and ability to perform the job without regard to
race, color, religion, national origin, sex, sexual orientation,
gender identity, gender expression, age (40 and over), disability,
veteran status, pregnancy, childbirth, or pregnancy-related
conditions, genetic information, or other bases protected by
applicable federal, state, or local law.
To apply, please visit https://www.schooljobs.com/careers/uvu/jobs/4029866/systems-administrator-iii-systems-operations