CommonAI C.I.C.

Posted 1 month ago

Open

AI Infrastructure Engineer (Storage)

Cambridge, England, United Kingdom · On-site · Full-time

AI Summary

Infrastructure Engineer responsible for designing, deploying, and maintaining high-performance storage systems for AI and data workloads, with emphasis on distributed storage, automation, and cloud integration.

About this role

CommonAI CIC is a non-profit membership organisation founded on a belief in collaborative engineering for the safe and responsible development of foundational AI technologies. It is a place where AI start-ups, enterprises large and small, public sector bodies and academia can share resources and knowledge to co-develop and grow businesses quickly.

We support technology-focused start-ups, each with unique data management challenges, and are seeking an experienced Infrastructure Engineer to help them design, deploy and maintain high-performance storage systems for their AI and data-driven workloads. The successful candidate will combine deep experience architecting and managing distributed, cloud, and tiered storage solutions with strong Linux and automation skills.

In this role you will:

  • Design, implement, and maintain storage platforms that support large-scale AI and data pipelines.
  • Manage distributed storage systems such as Ceph, Lustre, or BeeGFS.
  • Oversee tiered storage architectures, optimising data movement across high-performance, object, and archival tiers.
  • Ensure data integrity, availability, and security across on-premises and cloud environments.
  • Develop automation and monitoring tools using Bash, Python, or similar scripting languages.
  • Manage and secure container images and related storage used for AI and ML workloads.
  • Integrate storage systems with public cloud services (AWS, Azure, GCP) and hybrid environments.
  • Troubleshoot complex storage and data flow issues, collaborating closely with AI platform and infrastructure teams.
  • Contribute to ongoing architecture improvements, performance tuning, and capacity planning.

Requirements

To be considered, candidates should meet most of the following requirements:

  • Strong Linux system administration background.
  • Proven experience installing, configuring, and maintaining Ceph clusters or similar technologies in a production environment.
  • Familiarity with distributed filesystems (e.g. Lustre, BeeGFS) and cloud-based storage services (e.g. Amazon S3).
  • Experience with tiered storage management and lifecycle data policies.
  • Scripting and automation proficiency (e.g. Bash, Python, Terraform/OpenTofu, Ansible).
  • Understanding of data security best practices and compliance considerations.
  • Experience working with container technologies (e.g. Docker, Kubernetes) and image storage registries.
  • Strong analytical, troubleshooting, communication and documentation skills.

We also value:

  • Knowledge of GPU compute environments or AI training infrastructure.
  • Experience with monitoring and observability tools (Prometheus, Grafana, etc.).
  • Contributions to open-source storage, data management, or infrastructure projects.
  • Familiarity with object storage systems (S3, RADOS Gateway, MinIO, etc.).

Benefits

  • A collaborative and supportive work environment.
  • The opportunity to have a high impact in a growing organisation.
  • Competitive salary package and pension.
  • Professional development opportunities.
  • Networking opportunities with influential people from across the tech sector and academia.
  • A vibrant office environment a few minutes' walk from Cambridge train station.

CommonAI CIC is an equal opportunity employer and is committed to creating an inclusive and diverse workplace.

Skills

Ansible, AWS, Azure, Bash, BeeGFS, Ceph, Cloud Storage, Data Integrity, Data Security, Distributed Storage, Docker, GCP, Grafana, Image Storage Registries, Kubernetes, Lustre, MinIO, Monitoring, On-premises Storage, OpenTofu, Prometheus, Python, RADOS Gateway, S3, Terraform
