System Development Engineer - Incident Management, IT Services

Amazon, Sydney
Amazon Consumer Tier One Support (C-TOS) is the first line of defense for maintaining high availability of the Amazon Retail Website. We make customer-impacting events shorter, less frequent, and less severe by providing large-scale event and incident management.
The Amazon Retail Website has hundreds of millions of customers globally who can be impacted by these types of incidents; the work we do to mitigate them helps real people at a tremendous scale. Our automated tooling quickly identifies the cause of an issue and helps mitigate the impact, and much of our engineers’ time is spent on projects to improve the tooling, automation, and processes to avoid future occurrences.

We help direct the resolution of an issue to the relevant service teams, and dive deep into those events retrospectively to drive improvements to our process. It's an exciting time to join our team as we are rapidly growing and expanding our offerings globally.

As a System Development Engineer on the team, you will build tooling to automate the detection and resolution of issues within Amazon’s Retail Website infrastructure. You will also spend a portion of your time directing the resolution of high-visibility incidents by leading conference calls and taking notes to collect data that helps improve our processes.
Using data and insights learned from those incidents, you will drive further improvements into our automation, tooling, and processes so that the next event is shorter, less severe, or avoided entirely. You will participate in project teams to expand the use of our tooling to additional areas across Amazon.

This position will be part of a globally distributed team of 20+ engineers across Austin, Dublin, and Sydney, providing 24x7 coverage. Each group works 10-hour shifts, four days a week. If you're looking for a team with great growth potential and an opportunity to make a huge impact, this is the team to join.

Responsibilities
  • Drive the resolution of large-scale, customer-impacting issues as part of a globally rotating team
  • Design, build, and enhance incident detection and management tools
  • Participate in Agile sprints to evolve business processes and technologies
  • Create and review documentation; design new standard operating procedures
  • Identify and troubleshoot recurring platform issues and own projects to drive improvements
  • Mentor peers in your areas of technical and operational strength

Qualifications
  • Experience in automating, deploying, and supporting large-scale infrastructure
  • Experience programming with at least one modern language such as Python, Ruby, Golang, Java, C++, C#, or Rust
  • Experience with Linux/Unix
  • Experience with CI/CD pipelines and build processes
  • Experience with distributed systems at scale

Acknowledgement of Country:

In the spirit of reconciliation Amazon acknowledges the Traditional Custodians of country throughout Australia and their connections to land, sea and community. We pay our respect to their elders past and present and extend that respect to all Aboriginal and Torres Strait Islander peoples today.

IDE statement:

Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer, and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, disability, age, or other legally protected attributes.

Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information.

If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.