Megaport

Posted 1 month ago

Open

Senior Site Reliability Engineer

Australia · Remote · Full-time

AI Summary

Senior Platform Engineer focusing on DevOps and SRE to improve reliability, implement best practices, and manage live production systems across multiple time zones.

About this role

About Megaport
We’re not your typical tech company – and we don’t want to be. Megaport is the global leader in Network as a Service (NaaS), and has transformed the way businesses connect to the cloud, data centers, and each other. We’re publicly listed on the Australian Stock Exchange and partnered with the biggest names in tech like Amazon, Microsoft, Google, Oracle, IBM, and more. Headquartered in Brisbane with a crew of over 400 people spread across Asia-Pacific, Europe, and the Americas, our employees enjoy an environment that is collaborative, supportive, and (actually) fun.

Our Team Culture
We’re a team of problem solvers, pixel pushers, code slingers, and cloud fanatics. Culture is more than a poster on the wall here – collaboration beats hierarchy, curiosity fuels our growth, and everyone’s voice matters. We take our work seriously, but not ourselves. We work across time zones to execute on our global vision, trust each other to get things done, and never compromise our values for commercial gain. Most importantly, we place our customers at the center of everything we do.

We’re committed to increasing representation in the tech industry and welcome applicants from all backgrounds. Don’t meet every requirement? That’s okay. If you’re excited about this role, we encourage you to apply.

The Role

As a Senior Platform Engineer, you are a champion for DevOps and SRE culture and industry best practice within Megaport. You will work alongside talented team members in multiple time zones, ensuring that systems are secure, maintainable, and available. Outside the team, you will engage with stakeholders in requirements analysis and demonstrations. Technically, you will be very hands-on, continually evolving your skills through a mix of peer reviews and research. Ultimately, your obsession is customer success and ensuring company goals are met.

What You Will Be Doing

  • Improving production reliability and system resilience within an SRE-scoped team
  • Championing high standards of work and industry best practices
  • Communicating with teams and stakeholders at all stages
  • Bringing fresh ideas to the table and encouraging others
  • Diving into complex technical problems with a can-do attitude
  • Working across numerous technologies in a fast-changing industry
  • Participating in on-call rotation, incident response, and blameless post-incident reviews
  • Writing code, handling alerts, improving solutions, and supporting others
  • Playing a crucial role in the success of your company and team

What We Are Looking For

  • 5+ years administering Linux systems and related infrastructure in production environments
  • A collaborative SRE mindset, with familiarity around SLIs/SLOs/SLAs, error budgets, blast radius, and blameless postmortems
  • A focus on automation, reducing toil, and preventing problem recurrence
  • A track record of writing runbooks that work for the broader team, not just yourself
  • Strong Kubernetes and broader ecosystem fundamentals
  • Cloud infrastructure experience; AWS strongly preferred and bare-metal is a bonus
  • Strong tool development skills - Bash, plus Python or Go (preferred) or a similar language
  • Infrastructure-as-code tooling experience - Terraform preferred
  • CI/CD and version control, GitHub preferred
  • Database experience - one of Postgres, Cassandra, or ClickHouse preferred
  • Experience operating a production observability stack (metrics, logs, traces), with an eye for signal over noise
  • Comfortable working on live production infrastructure, with strong troubleshooting instincts and ownership of incident response
  • A history of continual professional development
  • A self-directed style suited to an async, globally distributed team, and comfortable picking up adjacent work when the situation calls for it

What We Offer

  • Flexible working environments
  • Birthday Leave
  • Generous study and training allowance + 5 days paid study leave
  • Creative, fun, and contemporary workspaces
  • Motivated team of industry experts and new talent
  • Celebrated success with ‘Legend’ and ‘Kudos’ Awards
  • Health and wellness program

Skills

Automation, AWS, Bash, Blameless Postmortems, Blast Radius, Cassandra, CI/CD, ClickHouse, Cloud Infrastructure, GitHub, Go, Infrastructure As Code, Kubernetes, Linux, Observability (metrics/logs/traces), On-call Incident Response, PostgreSQL, Production Troubleshooting, Python, Runbooks, SLIs/SLOs/SLAs, SRE Fundamentals, Terraform
