About this role
Our cloud infrastructure platform is self-managed and spans multiple availability zones (AZs) across Europe, ensuring reliability and top-tier performance.
The challenge
We’re at a pivotal stage in the evolution of our cloud platform. To continue scaling efficiently and strengthening reliability, we are expanding our Operations & SRE capabilities. Our infrastructure supports mission-critical services for our customers, and ensuring performance, stability, and continuous improvement is at the core of our vision.
As a Site Reliability Engineer / Systems Administrator, your mission will be to monitor and optimize our cloud systems, automate processes, ensure effective incident management, and help us maintain a robust, scalable and secure infrastructure. You will play a key role in minimizing downtime, improving operational efficiency, and supporting sustainable growth.
You’ll be part of a highly collaborative engineering environment, working closely with DevOps, Product and Development teams to build reliable services from the ground up, enforce good operational practices and contribute to ongoing enhancements that impact thousands of users.
Collaboration will be essential. You will support critical infrastructure decisions, lead incident response, proactively detect risks and ensure that both technology and teams can continue to scale confidently.
What we expect from you
Proven experience managing large-scale cloud or managed service provider (MSP) infrastructure.
Expert-level Linux systems administration (mandatory).
Experience with Windows Server (2012–2025) in production environments.
Strong troubleshooting skills across systems, networking, storage and application layers.
Solid networking knowledge: TCP/IP, DNS, load balancing, firewalling, BGP and network virtualization.
Experience with network storage solutions such as Ceph, NFS or similar technologies.
Familiarity with IaaS orchestration platforms such as CloudStack or similar.
Experience implementing and maintaining monitoring and observability tools such as Zabbix, Prometheus, Grafana and ELK.
Experience with Infrastructure as Code practices and automation using Ansible.
Experience designing or maintaining CI/CD pipelines.
Database knowledge: MySQL, MariaDB or PostgreSQL (advanced troubleshooting is a plus).
Strong understanding of ITIL processes for incident, problem and change management.
Strong documentation practices and commitment to operational excellence.
Analytical mindset focused on reliability, scalability and continuous improvement.
Excellent communication skills in Spanish and an intermediate level of English.
Nice to have
Hands-on experience with CI/CD pipelines.
Experience optimizing distributed systems performance.
Advanced security and system hardening expertise.
Experience with ticketing systems and operational workflow optimization.
Tools & Technologies
Operating Systems: Linux, Windows Server
Automation: Ansible, scripting (Bash, Python, PowerShell)
CI/CD: Modern pipeline implementations
Monitoring & Observability: Zabbix, Prometheus, Grafana, ELK Stack
Storage: Ceph, NFS or similar
Orchestration: CloudStack, OpenStack
Databases: MySQL, MariaDB, PostgreSQL
Collaboration & ITSM: Tools aligned with ITIL practices