Posted 1 month ago

Senior Site Reliability Engineer

Australia, Remote, Full-time

AI Summary

Senior Platform Engineer focusing on DevOps and SRE to improve reliability, implement best practices, and manage live production systems across multiple time zones.

About this role

About Megaport

We’re not your typical tech company – and we don’t want to be. Megaport is the global leader in Network as a Service (NaaS), and has transformed the way businesses connect to the cloud, data centers, and each other. We’re publicly listed on the Australian Securities Exchange (ASX) and partnered with the biggest names in tech like Amazon, Microsoft, Google, Oracle, IBM, and more. Headquartered in Brisbane with a crew of over 400 people spread across Asia-Pacific, Europe, and the Americas, our employees enjoy an environment that is collaborative, supportive, and (actually) fun.

Our Team Culture

We’re a team of problem solvers, pixel pushers, code slingers, and cloud fanatics. Culture is more than a poster on the wall here – collaboration beats hierarchy, curiosity fuels our growth, and everyone’s voice matters. We take our work seriously, but not ourselves. We work across time zones to execute on our global vision, trust each other to get things done, and never compromise our values for commercial gain. Most importantly, we place our customers at the center of everything we do.

We’re committed to increasing representation in the tech industry and welcome applicants from all backgrounds. Don’t meet every requirement? That’s okay. If you’re excited about this role, we encourage you to apply.

The Role

As a Senior Platform Engineer, you are a champion for DevOps and SRE culture and industry best practice within Megaport. You will work alongside talented team members in multiple time zones, ensuring that systems are secure, maintainable, and available. Outside the team, you will engage with stakeholders in requirements analysis and demonstrations. Technically, you will be very hands-on, continually evolving your skills through a mix of peer reviews and research. Ultimately, your obsession is customer success and ensuring company goals are met.

What You Will Be Doing

Improving production reliability and system resilience within an SRE-scoped team

Championing high standards of work and industry best practices

Communicating with teams and stakeholders at all stages

Bringing fresh ideas to the table and encouraging others

Diving into complex technical problems with a can-do attitude

Working across numerous technologies in a fast-changing industry

Participating in on-call rotation, incident response, and blameless post-incident reviews

Writing code, handling alerts, improving solutions, and supporting others

Playing a crucial role in the success of your company and team

What We Are Looking For

5+ years administering Linux systems and related infrastructure in production environments

A collaborative SRE mindset, with familiarity around SLIs/SLOs/SLAs, error budgets, blast radius, and blameless postmortems

A focus on automation, reducing toil, and preventing problem recurrence

A track record of writing runbooks that work for the broader team, not just yourself

Strong Kubernetes and broader ecosystem fundamentals

Cloud infrastructure experience; AWS strongly preferred and bare-metal is a bonus

Strong tool development skills - Bash, plus Python, Go, or a similar language preferred

Infrastructure-as-code tooling experience - Terraform preferred

CI/CD and version control, GitHub preferred

Database experience - one of Postgres, Cassandra, or ClickHouse preferred

Experience operating a production observability stack (metrics, logs, traces), with an eye for signal over noise

Comfortable working on live production infrastructure, with strong troubleshooting instincts and ownership of incident response

A history of continual professional development

A self-directed style suited to an async, globally distributed team, and a willingness to pick up adjacent work when the situation calls for it

What We Offer

Flexible working environments

Birthday Leave

Generous study and training allowance + 5 days paid study leave

Creative, fun, and contemporary workspaces

Motivated team of industry experts and new talent

Celebrated success with ‘Legend’ and ‘Kudos’ Awards

Health and wellness program

Skills

Automation, AWS, Bash, Blameless Postmortems, Blast Radius, Cassandra, CI/CD, ClickHouse, Cloud Infrastructure, GitHub, Go, Infrastructure as Code, Kubernetes, Linux, Observability (metrics/logs/traces), On-call Incident Response, PostgreSQL, Production Troubleshooting, Python, Runbooks, SLIs/SLOs/SLAs, SRE Fundamentals, Terraform
