Systems Administrator III - Systems/Operations

Employer: Utah Valley University
Location: Utah, United States
Date posted: May 5, 2023

Position Type: Administrative, Business & Administrative Affairs, Computer Services & Information Technology
Employment Level: Administrative
Employment Type: Full Time

Salary: $56,842.00 - $71,053.00 Annually
Job Type: FT Exempt Salaried Staff
Job Number: FY2300887
Closing: 5/31/2023 11:59 PM Mountain
Location: 800 W University Parkway, Orem
Division: Office of Information Technology

Position Announcement
Utah Valley University is currently taking applications for a Systems Administrator position! The Systems Administrator will develop reliable software systems and scalable automated solutions to support Digital Transformation, IT operations, and on-call duties. Implement best practices for availability, reliability, and scalability to improve software systems and workflows. Utilize experience in system engineering, system administration, and IT operations to ensure timely achievement of project plans and goals. Install, configure, and maintain computer systems and associated peripheral equipment. Maintain operating efficiency and stability, ensure the security and integrity of all systems and data, and respond to outages and other issues. Perform application administration activities such as creating, modifying, and deleting users, optimizing remote access, and managing security access for messaging systems, databases, and web applications. Coordinate with department leadership to plan, design, and schedule the release of all software, hardware, and operating system updates. Develop and enhance processes and technical documentation, and create back-up procedures, test plans, and reports. Prioritize and respond to requests for service, and provide escalation support to helpdesk staff as needed. Oversee maintenance of room environment (e.g., cleaning, cooling, and power).

Design, implement, and maintain site reliability processes and systems that increase efficiency, eliminate downtime, and maintain performance at scale across platforms. Develop tools to ensure that the organization's services (internally critical and/or externally visible systems) have reliability and uptime appropriate to users' needs. Diagnose, resolve, and escalate service-impacting issues. Develop monitoring and alerting platforms to detect and resolve performance-impacting issues.

Summary of Responsibilities

Performs day-to-day administration, maintenance, upgrades, and operation of existing and recently developed systems, including virtualization infrastructure (on-premises and cloud), Microsoft Windows system administration, and Linux operating systems, as well as application systems and technologies. Ensures standard operating procedures, runbooks, disaster recovery, and service catalog definitions are matured and ready for production. Responsible for the lifecycle of product set maintenance, availability, reliability, and performance reporting to decision makers and developers. Performs tasks, as needed, to augment the work that engineers do on their systems to ensure timely achievement of project plans and goals. Advocates for, contributes to, recommends, and facilitates ever-improving standards and best practices through successful adoption of change within UVU's Digital Transformation department.

Ensures that the underlying infrastructure is running smoothly and that systems and tools are working as expected. Analyzes day-to-day functions and the processes of systems and network management software to ensure they are performing within predetermined specifications. In support of core systems availability and reliability, integrates diverse monitoring solutions for emerging and existing IT infrastructure using automation and API tools in on-premises and cloud architectures. Engineers centralized, enterprise-wide alerting and key performance indicators that give timely, actionable information to subject matter experts, stakeholders, and leadership. SRE teams conduct post-incident reviews, documenting findings and acting on lessons learned. Following incident resolution, the engineer revisits the issue and determines the cause. Builds or optimizes the incident lifecycle to bolster reliability of services. Maintains documentation and runbooks to ensure that teams get information when they need it.

Develops operational tools and processes, builds reliable systems, ensures compliance with operational standards, and provides support to operational staff. This can be anything from adjustments to monitoring and alerting to code changes in production. An SRE can be tasked with building a homegrown tool from scratch to address weaknesses in software delivery or incident response and management. An SRE's responsibilities include writing and developing code to automate processes, such as analyzing logs, testing production environments, and responding to any issues. Such automation allows developers and engineers to focus their attention on bug fixes and building new features rather than be burdened by the day-to-day operational requirements of their projects.

Provides leadership, communications, development, engineering, automation, and feedback necessary for enterprise planning and architecture. Timely and responsive work is key for reporting what went well or what went badly during a change/incident/problem cycle. Participates in after-hours and weekend on-call rotation and provides training to other on-call staff. Provides remote hands for systems and application administrators who need physical and virtual support within on-premises and cloud facilities. Performs other job-related duties as assigned.

Minimum Qualifications
Graduation from an accredited institution with a bachelor's degree in Information Technology or a related field plus three years of work experience in IT OR a combination of education and experience in a related field totaling seven years.

Licenses or Certifications:

SRE Professional Certificate, Azure/AWS associate/practitioner levels, ITIL/TOGAF, Docker Certified Associate, CKA

Knowledge, Skill, and Abilities

Knowledge

Knowledge of ITIL Change, Incident, and Problem Management.

Knowledge of TCP/IP, firewall management, and operating system configuration.

Proficient and current knowledge of industry trends, tools, and processes.

Knowledge of Agile and iterative development processes (e.g., Scrum and Kanban).

Knowledge of automation and containerization technologies such as Docker, Kubernetes, Ansible, Terraform, and SaltStack.

Knowledge of ITSM platforms such as Jira Service Management, ServiceNow, or other.

Knowledge of engineering practices: availability, reliability, and scalability, as well as disaster recovery.

Knowledge of various automation tools, as the role is usually responsible for building and integrating software tools to enhance the reliability and scalability of organizational systems.

Skills

Recognize key design, implementation, and process issues and proactively craft and automate solutions.

Skill with system engineering and design for NOC/SOC purposes.

Skill with scripting languages such as Perl, PowerShell, Bash, and Python.

Skill with most of the common programming languages, including JavaScript, HTML5, CSS, jQuery, JSON, and PHP.

Skill with the design, implementation, and maintenance of Active Directory and/or LDAP directories.

Skill with TCP/IP, application network protocols, firewall management, operating system configuration, anti-virus software, and relational databases.

Practical experience with various monitoring solutions such as Prometheus, PRTG, Site24x7, TestCafe, Selenium, Splunk, New Relic, Azure Monitor, and AWS CloudWatch.

Expertise in the major cloud providers such as Azure, AWS, and Google Cloud.

Experience with alert management/on-call tools such as PagerDuty, VictorOps, and Opsgenie.

Experience with instant communication and team collaboration platforms like MS Teams, Slack, or Jitsi.

Proven IT project planning and development skills.

Abilities

Ability to read, write, and interpret technical documentation, runbooks, procedures manuals, and knowledge-base articles pertaining to network systems and application management.

Ability to complete Root Cause Analysis (RCA) investigations and write post-incident reports.

Ability to improve team practices through code reviews and handoffs of work and incidents.

Ability to participate in an on-call (PagerDuty) rotation to respond to incidents that impact availability, and to provide support for service engineers with customer incidents.

Ability to debug production issues and build monitoring that alerts on symptoms rather than on outages.

Ability to turn manual tasks into repeatable actions and into automation.

Ability to conduct and direct research into IT issues and products, as required.

Ability to communicate technical ideas and concepts to a non-technical audience.

EEO Statement:

UVU employment decisions are made on the basis of an applicant's qualifications and ability to perform the job without regard to race, color, religion, national origin, sex, sexual orientation, gender identity, gender expression, age (40 and over), disability, veteran status, pregnancy, childbirth, or pregnancy-related conditions, genetic information, or other bases protected by applicable federal, state, or local law.

To apply, please visit https://www.schooljobs.com/careers/uvu/jobs/4029866/systems-administrator-iii-systems-operations
