What We Do
At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.
Engineering, which comprises our Technology Division and global strategists groups, is at the critical centre of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.
Who We Look For
The Runtime Platforms team delivers a highly available and scalable ecosystem for Goldman Sachs engineers worldwide, enabling them to run workloads across various compute platforms. As part of the firm's strategic move to increase the usage of containerization to package, deliver and run software, the runtime team plays a critical role in ensuring that we offer container platforms that are secure, reliable and operationally inexpensive so that business teams can focus on delivering value to shareholders. The platforms offered by the runtime team support some of the firm's most critical businesses and services both on-premises and in public cloud.
At Goldman Sachs, our culture is one of teamwork, innovation and meritocracy. We often say our people are our greatest asset, and we take pride in supporting each colleague both professionally and personally. You will be joining a talented, globally distributed team that is passionate about delivering the best possible experience to our users.
The ecosystem combines proprietary software with Kubernetes and related components to offer services such as:
- Job scheduling
- Event streaming
- Log shipping
- Data warehouses
- Security infrastructure
RESPONSIBILITIES
These will differ depending on the specific team and role:
- Own technical operations for systems that manage hundreds of thousands of compute cores
- Build observability for new deployments to ensure robustness from day one, as well as mature deployments to identify and implement improvements
- Troubleshoot and resolve various operational issues
- Lead real-time outage investigations and present postmortems to senior management
- Design and develop platform enhancements (Kubernetes-specific), monitoring tooling and software, and deployment and upgrade tooling
- Define SLIs and SLOs and partner with development teams to ensure systems are sufficiently well designed and instrumented
- Plan and manage deployments and migrations, including end-of-life programs
- Plan and implement robust business continuity and security programs
- Provide regional coverage for the platform and participate in on-call support
QUALIFICATIONS
- Excellent problem-solving and automation skills
- Strong Linux fundamentals and system administration skills
- Good networking fundamentals (familiarity with TCP/IP, IP routing, firewalls, secure tunneling protocols)
- Experience working with distributed computing systems and cloud computing environments
- Proficiency in at least one programming language; the team uses a mix of Go, Python and Erlang
- Able to operate effectively in a mission-critical, highly regulated financial services environment
- Experience with Kubernetes (only necessary for the Kubernetes team)