Your Impact
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for improving the availability and reliability of some of the firm’s most critical platform services and for ensuring they meet the requirements of our internal and external users. We are looking for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems that can evolve and adapt to changes in our fast-paced, global business environment.
The SRE team develops and maintains platforms and tools that help other engineering teams at Goldman Sachs build and operate reliable and resilient systems. The platforms we offer range from centralized logging and tracing to monitoring and alerting, and we provide tools to drive and improve SLIs/SLOs, capacity planning and management, operational readiness assessments, and deployment automation.
The products and services we provide to our internal customers are used by thousands of engineers every day. We believe that reliability is the most important feature of any system, and we are devoted to giving our engineers the tools they need to build and operate reliable products.
How You Will Fulfil Your Potential
As a developer in the SRE Windows Monitoring team, you will work with internal customers, vendors, product owners, and SREs to design and develop end-to-end monitoring systems for Windows desktops and servers. You will run a production environment spanning cloud and on-prem datacenters. You will work with internal users to define observability features and drive their implementation.
Responsibilities
- Design, develop, and support Windows monitoring
- Collaborate with other teams to onboard them onto SRE-owned platforms
Basic Qualifications
- Degree in computer science or engineering with at least 3 years of industry experience
- Proficiency in C# and .NET, and willingness to learn new languages and programming paradigms
- Excellent programming skills: developing, debugging, testing, and optimizing code
- Experience with the Windows Operating System and libraries
- Experience with algorithms, data structures, and software design
Preferred Experience
- Experience with Microsoft SCOM
- Experience with distributed databases like Microsoft SQL Server, MongoDB and/or Elasticsearch