Department

This position is within FSU’s Department of Information Technology Services (ITS).

The FSU College of Medicine Infrastructure and Operations team designs, builds, and manages infrastructure and servers to support other IT teams, faculty, staff, researchers, and students within the college. The team leverages the latest in automation and observability solutions to make complex work easier to accomplish.

Responsibilities

Design, build, automate, and optimize infrastructure using modern tools and site reliability engineering practices. Drive reliability, observability, and continuous improvement while managing primarily Windows servers in a hybrid cloud environment. Collaborate across teams and leverage automation and data to deliver secure, scalable, and customer-focused solutions.

Provision and manage server infrastructure: Deploy and manage Windows and Linux servers across a hybrid environment that includes Microsoft Azure and over a dozen geographically dispersed on-premises locations. This includes ensuring that all systems are secure by design, follow zero trust principles, and are scalable, observable, and aligned with business needs. Infrastructure is provisioned with reliability, maintainability, and consistency in mind, and observability is implemented prior to production to support proactive monitoring and data-informed decisions. The administrator will work to improve both the security and usability of systems, ensuring they meet both technical and customer needs. They will collaborate with cross-functional teams and stakeholders to deliver infrastructure components for projects, helping prioritize high-value work, assess feasibility, conduct security reviews of new systems and configurations, and ensure timely delivery of a solution that meets customer needs. Collaborate with customers using the IT Service Management system, phone calls, instant messaging, and meetings as appropriate, with a focus on timely response and customer service.

Infrastructure and configuration as code: This role emphasizes defining infrastructure and configurations in code to ensure consistency, repeatability, and auditability. The administrator will use tools such as Terraform, Azure DevOps, Visual Studio Code, and scripting languages like PowerShell and Bash to manage infrastructure as code (IaC) and configuration as code (CaC). Version control, adherence to coding standards, secure coding practices, automated testing, and test-driven development are expected to ensure high-quality, secure, and maintainable code. Use AI-assisted tools to accelerate development, validation, and troubleshooting. The team values iterative improvement and rapid experimentation, using observability data and deployment feedback to guide future enhancements and refine infrastructure practices. Systems and configurations should be secure by design and conform to security policies and standards, including the automated validation of new deployments. Mitigate exploitable vulnerabilities. Observability solutions in Elastic will be used to monitor deployments and support data-informed decisions. Participate in pair programming sessions as appropriate to write code or troubleshoot deployments.

Automation: Automation is not just a task; it is a mindset and a strategic enabler of reliability, consistency, and scalability. The administrator will design and implement solutions that make work easier, reduce manual effort, improve system reliability, and streamline operations across provisioning, configuration, monitoring, and remediation. AI, scripting, workflow automation, or robotic process automation tools will be leveraged to enhance automation and reduce operational overhead. Experimentation is encouraged as part of the automation lifecycle, from testing new tools and workflows to refining existing processes based on real-world data and feedback. Monitor automations with Elastic to ensure continued success and identify data-informed opportunities for improvement. Collaboration with peers and stakeholders will help prioritize the highest-value automation opportunities and ensure that solutions are effective, secure, and aligned with business needs. These efforts directly support the team's ability to deliver reliable, scalable infrastructure with minimal manual intervention.

Network administration: The administrator will manage and troubleshoot enterprise-grade network infrastructure, including wireless access points, switches, routers, load balancers, and next-generation firewalls. A solid understanding of networking is essential to ensure secure, performant, and reliable infrastructure. Troubleshooting network issues will involve using packet captures, OS command outputs, diagnostic consoles, logs, or other means. Use Elastic or other network observability tools to make data-informed decisions. Ensure the security of data, information, and systems. Collaborate with network and security teams to validate new systems and configurations, expand observability, reduce exploitable vulnerabilities, implement new security controls, ensure operational resilience, and improve the usability of systems for customers.

Documentation and process improvement: Clear, concise, and up-to-date documentation is essential for operational continuity and knowledge sharing. The administrator will create and maintain system diagrams, deployment guides, and standard operating procedures (SOPs) that support repeatability, compliance, and reliability. They will identify opportunities for continuous improvement and refine existing processes. They will ensure that procedures align with FSU ITS Security Policies and Standards. Documentation will evolve alongside infrastructure and automation, and collaborative peer reviews will help ensure accuracy, clarity, and usability.

Support and incident response: The administrator will respond to system alerts and outages in accordance with established incident management procedures, collaborating with others as needed to ensure rapid resolution. They will participate in post-incident reviews, highlighting key data points and observability insights, to identify root causes and opportunities for improvements to systems or processes. Improvements will be implemented to prevent recurrence and enhance system reliability. Observability tools such as Elastic will be used to support rapid diagnosis and resolution, including the creation of new monitoring as appropriate. The role includes participation in an on-call rotation, typically one week per month, and may require after-hours support for deployments, changes, or emergency repairs, including on holidays and weekends as directed by IT management. The administrator will work to reduce the need for after-hours support by lowering the risk and complexity of changes, improving system reliability, and leveraging automated deployment solutions. Assist with security investigations as needed. Ensure incident response processes align with technical teams, IT management, and customer expectations.

Professional development: Continuous learning is a key expectation of this role. The administrator will engage in both assigned and self-directed professional development to stay current with evolving technologies, tools, and best practices. They are encouraged to explore and experiment with new ideas, including AI, automation, observability, and other innovations, and to contribute to team knowledge and process evolution. They are given the freedom to explore technical subjects that interest them. This commitment to growth supports both individual advancement and the team's ability to adapt to a rapidly changing technology landscape. Participation in internal communities of practice, knowledge-sharing sessions, and collaborative learning opportunities is encouraged. Our team embraces an environment of rapid experimentation, using data from each iteration to guide decisions and identify the next opportunity for refinement or innovation. Administrators are encouraged to test ideas, measure outcomes, and continuously evolve our systems and practices.

Qualifications

Bachelor's degree in Computer Science, MIS, or other appropriate degree and two years' experience, or a high school diploma or equivalent and six years of experience. (Note: or a combination of appropriate post high school education and experience equal to six years.)

Preferred Qualifications

Proven ability to learn new tools and technologies quickly, with a track record of self-directed learning and adaptability in fast-paced environments.

Demonstrated commitment to continuous learning and professional development.

Proficient in scripting for infrastructure automation using PowerShell, with the ability to write, debug, and maintain scripts independently or with tools like GitHub Copilot; familiarity with Python or Bash is a plus.

Experience using infrastructure and configuration as code tools such as Terraform, Ansible, PowerShell, or similar, with version control practices using Git, and integrated development environments like Visual Studio Code.

Experience creating and troubleshooting CI/CD pipelines using tools such as Azure DevOps, GitHub Actions, or GitLab to automate infrastructure deployment and configuration.

Experience provisioning and managing infrastructure in cloud environments such as Azure, AWS, or Google Cloud, with an understanding of repeatable deployment processes, and troubleshooting network connectivity with next-generation firewalls.

Experience deploying containers and familiarity with container orchestration technologies such as Kubernetes or Docker Swarm.

Proficient using observability tools such as Elastic, Dynatrace, Prometheus, Grafana, Splunk, Datadog, or others, to ingest new types of data, build dashboards and alerts, and derive insights for performance tuning and incident response.

Experience supporting Windows and Linux systems in an Active Directory domain, including deployment, configuration, and troubleshooting, as well as managing virtual infrastructure using platforms such as Hyper-V or VMware.

Experience leveraging AI tools to accelerate task completion and improve operational efficiency.

Demonstrated ability to write and troubleshoot firewall rules and quickly diagnose issues across firewalls, switches, and wireless access points from vendors such as Palo Alto, Juniper, Aruba, Arista, Fortinet, Extreme, Brocade, Cisco, or others, with a focus on identifying root causes across network, OS, and application layers.

Strong understanding of secure-by-design and zero trust principles, with experience applying secure configurations and patching strategies in operational environments.

Demonstrated experience in infrastructure projects by planning and executing technical tasks such as system deployments, launching new remote locations, or automating business processes. This includes prioritizing high-value work, ensuring long-term maintainability through documentation and repeatable processes, leveraging automation where appropriate, and working closely with cross-functional teams to drive project success.

Proficient in creating technical diagrams to communicate infrastructure design or operational workflows.

Helpful

Who is an ideal candidate for this position?

The ideal candidate is someone who is genuinely passionate about IT, someone who has probably built a home lab (or cloud lab) just for fun, and gets excited about scripting, automation, and making systems run better. They love learning new technologies and finding better ways to get things done, whether that is writing a PowerShell script, setting up workflow automation to schedule a calendar appointment, or integrating AI into their workflow to save time and reduce toil. They are the kind of person who starts the day looking forward to pushing code to the git repo, watching it flow through the deployment pipeline, and seeing it make a real impact on clinical, research, and education efforts.

This is a role for someone who loves solving puzzles, thrives in both independent and collaborative settings, and gets excited about working with modern tools to make infrastructure smarter, faster, and more reliable. They know how to communicate clearly, whether it is writing documentation, contributing in team meetings, or breaking down complex technical ideas for someone who just needs the bottom line. They understand how to capture and interpret network traffic when things go sideways, and they know how to turn that data into action. They are curious, self-driven, and motivated by the idea that their work helps support something bigger than just infrastructure: it enables clinicians caring for patients today, researchers working to improve health outcomes in the future, and students who will one day serve in clinical settings after graduation.

What is a typical day in this position?

A typical day begins with hands-on work such as writing scripts, deploying infrastructure, or addressing tasks that have not yet been automated. You may manually configure a Windows Server 2016 system as part of legacy support, or deploy a new Windows Server 2025 instance using the latest automation tools. After completing the manual work, you will often begin designing an automated solution to replace it, write the necessary scripts, and submit a pull request to improve future efficiency for the team.

You may participate in a pair programming session focused on scripting or automation, helping to refine logic, improve readability, or troubleshoot unexpected behavior. Throughout the day, you might respond to an automated alert or assist another IT team with a technical issue. When incidents arise, you will work alongside the rest of the team to investigate and resolve the issue, often using tools like Elastic to gather context and identify root causes. This triggers a post-incident review, where the team reflects on what did not go as planned and then takes action over the following days to prevent similar issues from occurring again.

Toward the end of the day, you will join a team meeting to share updates, align on progress, and discuss priorities. You will also take time to plan and prioritize your work for the next day. Once a week, you will meet one on one with your supervisor to check in, share feedback, and work on removing any roadblocks that may be slowing progress.

What can I expect in the first 60-90 days?

In your first few weeks, you will work through onboarding tasks, get familiar with our environment, and shadow team members to learn how we approach infrastructure, automation, and reliability. You will begin writing scripts in your first week and gradually take on more responsibility as you gain context. By the end of your first month, you will be contributing to deployments, writing automation, and participating in troubleshooting efforts using modern tools and practices.

You will be integrated into broader initiatives and ongoing projects early on, contributing where your skills align and learning from the team along the way. While our project list is always evolving, one constant is the hands-on work of provisioning and configuring new or replacement infrastructure; we always use the latest supported Windows Server or Linux systems, whether on-premises or in Azure. You can expect to be involved in meaningful, high-impact work from the start, with opportunities to shape and automate the infrastructure that powers our organization.

We value the unique experience and perspective each team member brings, and we look forward to learning from you just as much as you learn from us. Our collaborative and innovative environment encourages finding better ways to accomplish work and achieve goals. You will play a key role in helping us improve our processes and remove barriers to success. As you build confidence and context in your first few months, you will be fully supported as you prepare to join our on-call rotation, where your contributions will directly support our reliability goals. We are committed to fostering a goal-oriented, easy-going environment where you can thrive.

University Information

One of the nation's elite research universities, Florida State University preserves, expands, and disseminates knowledge in the sciences, technology, arts, humanities, and professions, while embracing a philosophy of learning strongly rooted in the traditions of the liberal arts and critical thinking. Founded in 1851, Florida State University is the oldest continuous site of higher education in Florida. FSU is a community steeped in tradition that fosters research and encourages creativity. At FSU, there’s the excitement of being part of a vibrant academic and professional community, surrounded by people whose ideas are shaping tomorrow’s news!

Learn more about our university and campuses.

FSU Total Rewards

FSU offers a robust Total Rewards package. Visit our website to learn more about our Compensation, Benefits, Wellness, Recognition, and Employee Development programs.

Use our interactive tool to calculate Total Compensation options based on potential salary, benefits and retirement contributions, earned leave, and other employment-related perks.

How To Apply

If qualified and interested in a specific job opening as advertised, apply to Florida State University at https://jobs.fsu.edu. If you are a current FSU employee, apply via myFSU > Self Service.

Applicants are required to complete the online application with all applicable information. Applications must include all work history up to ten years, and education details even if attaching a resume.

Considerations

This is an A&P position.

This position requires successful completion of a criminal history background check.

This position has been designated as eligible for primarily remote work based on the current position/job functions. Employees are required to live in the Tallahassee area and report to campus as needed.

This position is being advertised as open until filled.

Equal Employment Opportunity

FSU is an Equal Employment Opportunity Employer.

InfraOps Reliability Administrator
