Service Reliability Engineer II

Make your world more Xait-ing! Our client, Xait, Inc., has a great opportunity for an experienced Service Reliability Engineer who wants to be part of an innovative, fast-growth SaaS team, with strong advancement opportunities.

Position Overview:

The main responsibility of the Service Reliability Engineer II (SRE II) is to take part in and oversee the daily operation, maintenance, and integrity of the infrastructure and platform for products and services, and to participate in technology projects and general information technology operations. Additionally, this position works closely with the SRE I position to address tiered issues and provide technical coaching for junior-level team members.

The employee is responsible for ensuring that Xait adheres to applicable legislation, US Government Contract requirements, and standard procedures within relevant functional areas, demonstrates a high level of integrity and professionalism, and works collaboratively and effectively within all levels of the organization.

This position reports directly to the Xait, Inc. US Operations Manager and works in a matrixed organization to provide direct support to the Xait Operations & Security Director.

Principal duties and responsibilities:

  • Maintain the daily operation of technical infrastructure in accordance with ITIL and ISO 27001 standards. For example, duties may include, but are not limited to:
  • Maintain all externally and internally hosted Xait systems and infrastructure regarding products and services; all internally and externally hosted Xait legacy systems; all internally and externally hosted Xait back office and other business support systems.
  • Provide fulfillment of Xait hosting Service Level Objectives (SLOs), Service Level Agreements (SLAs), and technical support of SLAs.
  • Maintain and manage the Xait general network, communication, and telephony infrastructure when a tiered response is required from SRE I.
  • Develop and maintain automation for platform, infrastructure, and application deployments.
  • Improve and build upon Xait automation tools for systems provisioning, monitoring, trending, and management.
  • Monitor relevant systems to predict, prevent, and react to events and incidents in a timely manner, with minimal lag time.
  • Provide logistics, service, and support for employee client workstations/laptops, related hardware, and infrastructure when a tiered response is required from SRE I.
  • Actively participate and contribute to technical projects within Xait Operations.
  • Participate in 24/7 on-call/duty function when requested by Operations & Security Director and approved by supervisor.
  • Strategize with SRE, other engineering teams, and cross-functional teams on complex problems; analyze possible courses of action; make decisions and recommendations on systems improvements; and keep the supervisor informed of changes and issues.
  • Conduct performance analysis, proactive troubleshooting, continual improvement, and capacity planning for production, virtualized, or container-based environments.
  • Communicate effectively with fellow SREs and other engineering teams, describing problems succinctly and with sufficient detail to hand off a problem to another team or a peer for completion.
  • Manage real-time communications with both technical and non-technical audiences during outages, when the issue is above SRE I skill level or capacity.
  • Develop documentation and procedures, and suggest policies, that help improve overall platform stability.
  • Participate in cross-functional team information sharing, testing, and solutioning, such as but not limited to outage reviews, to improve overall product stability.
  • Build relationships with development and other teams and technology leaders across the company.
  • Other reasonable duties as assigned.

Qualifications Required:

  • Bachelor of Science Degree in computer science, mathematics, engineering, or other related technical field.
  • Minimum of 5 years of systems administration, programming, or closely related experience, OR a minimum of 3 years serving as Service Reliability Engineer I or higher.
  • In lieu of the Bachelor of Science Degree, may consider a minimum of 7 years of systems administration, or 7 years of programming, or a combination of both, AND the following requirements:
  • Minimum of 5 years of experience in troubleshooting and problem resolution.
  • Minimum of 5 years of experience multitasking in a high-stakes, high-stress environment with competing priorities.
  • Intermediate or higher level of knowledge of Unix/Linux systems; Red Hat, Ubuntu knowledge highly desirable.
  • Intermediate or higher level of knowledge of Python.
  • Advanced certifications in areas of responsibility, such as Azure Associate-level certification or equivalent.
  • Legal ability to work for any US Employer on US Government Contracts per contract award standards.

Qualifications Preferred:

  • Bachelor of Science Degree in computer science, mathematics, physics, or engineering.
  • 6 or more years of experience working as a System Administrator or Network Engineer.
  • 4 or more years of experience working as a Service Reliability Engineer I, or higher.
  • 5 or more years of experience with relational database technologies and database administration.
  • Highly skilled in Unix/Linux systems, with experience in Red Hat, Ubuntu.
  • Highly skilled in development tools and languages, such as Python, Ansible, GitLab, Kubernetes, Terraform.
  • Expert level Azure Certification(s) such as DevOps Engineer, Solutions Architect, etc.

Knowledge, Skills & Abilities:

  • Ability to participate in and lead formal or informal group problem solving for immediate issues.
  • Ability to interact with others, employing a communication style appropriate to the person or audience.
  • Ability to follow IT procedures to complete work, while recommending changes to departmental processes for continual improvement.
  • Ability to communicate technical information to internal and external partners at all levels of experience and understanding.
  • Skilled at presenting information to groups both remote and on-site.
  • Ability to work within a global, matrixed organization, with remote and on-site teams.
  • Ability to provide Tier II-level resolution and support, and to provide input and advice for SRE I issues.
  • Ability to provide support as needed in a 24/7/365, matrixed global environment.
  • Intermediate to high level of skill at troubleshooting and problem resolution.
  • Ability to adjust to the needs of the organization with competing deadlines and priorities.
  • Ability to handle stressful IT-related situations and to multitask across roles.
  • Able to travel domestically and internationally as required.
  • Ability to work and manage periodic on-call duty.
  • Legal ability to work for any US Employer on US Government Contracts per contract award standards.

Physical Demand and Work Environment:

  • Walk, talk, sit, stand, touch, hear, see, use hands and fingers, and lift up to 20 lbs.
  • Domestic and International travel as required to meet the training, implementation, and satisfaction needs of the clients, in support of company goals.
  • Contact with internal and external work groups, clients, subordinate team, and management.
  • Occasional on-call rotation to support production goals.
  • Smoke-free, tobacco-free workplace.

This posting description is not designed to cover or contain a comprehensive listing of activities, duties, or responsibilities required by this position. Nothing in this posting description restricts management's right to assign or reassign duties and responsibilities to this job at any time.

Frontrunner Consulting Group, Inc. and Xait, Inc. are Equal Employment Opportunity Employers

Job Type: Full-time

Pay: $85,000.00 - $120,000.00 per year

Benefits:

  • 401(k)
  • 401(k) matching
  • Dental insurance
  • Disability insurance
  • Health insurance
  • Health savings account
  • Life insurance
  • Paid time off
  • Tuition reimbursement
  • Vision insurance

Schedule:

  • 8-hour shift

Education:

  • Bachelor's (Preferred)

Experience:

  • Python: 3 years (Required)
  • System Administration, Programming, or closely related: 5 years (Required)
  • SRE I or higher: 3 years (Required)
  • Unix/Linux, Red Hat, Ubuntu: 3 years (Preferred)

License/Certification:

  • Advanced certifications in areas of responsibility (Required)

Work Location:

  • One location

Work Remotely:

  • Temporarily due to COVID-19

What we believe in

We provide enterprise customers with software for document co-authoring, automation, and collaboration.

We emphasize team productivity and manageability rather than individual creativity. And we offer parallel collaboration as well as facilitation of controlled collaboration. As such, XaitPorter is so much more than a writing tool or standard collaboration tool. XaitPorter is a solution that improves and optimizes both processes and documents. And at Xait, we are committed to seeing our customers succeed.

Our company vision is quite simply: 

Let's make the world more xaiting!

