Goldman Sachs employs thousands of engineers across many divisions. We write software that powers every aspect of our business. It's no surprise that we collect and analyze vast amounts of data relating to the efficiency of our development processes, the value that we drive through software, and how effectively we use our internal tooling and services to empower development. Our data allows our internal businesses to optimize, focus their attention, and manage costs. It also allows us to assess the impact and performance of developer productivity and cost-saving initiatives, and to manage our huge software inventory.
What We Need
We are seeking a highly skilled Data Software Engineer to design, implement, and maintain robust data systems and pipelines that empower our organization to leverage data effectively. The ideal candidate will come from a software engineering background, with commercial development experience in one or more object-oriented languages (Python, Go, Java, C#). Knowledge of data modeling, pipeline construction, data normalization and sanitization, and data governance would be highly advantageous. They will collaborate with cross-functional teams to ensure the availability, quality, and security of data to support business objectives.
Key Responsibilities
- Data Pipeline Development
- Design, build, and maintain scalable and efficient data pipelines for processing and transforming large datasets.
- Ensure pipelines are optimized for reliability and performance.
- Data Modeling and Architecture
- Develop and maintain logical and physical data models tailored to business needs.
- Optimize data storage solutions for scalability and performance.
- Data Quality and Sanitization
- Implement processes for data normalization, deduplication, and cleaning to ensure high-quality datasets.
- Identify and resolve data inconsistencies, errors, and anomalies.
- Data Governance and Security
- Establish and enforce data governance standards, including policies for data access, compliance, and privacy.
- Implement security measures to protect sensitive data and ensure compliance with regulatory requirements.
- Monitoring and Optimization
- Develop monitoring solutions to ensure the health and reliability of data pipelines and systems.
- Continuously optimize performance, storage, and costs of data infrastructure.
- Education:
- Bachelor’s or Master’s degree in Computer Science, Engineering, Data Science, or a related field.
- Experience:
- 3+ years of experience in software engineering, data engineering, or a related role.
- Proven experience with data modeling, ETL/ELT pipelines, and data architecture design.
- Proficiency in programming languages such as Python, Java, or Scala.
- Strong knowledge of SQL and relational databases (e.g., PostgreSQL, Sybase, SQL Server).
- Desirable Technical Skills:
- Experience with big data technologies (e.g., Hadoop, Spark) and cloud platforms (e.g., AWS, GCP, Azure).
- Familiarity with data integration tools (e.g., Apache Airflow, Talend, Informatica) and streaming technologies (e.g., Kafka, Flink).
- Experience with data warehouse technologies (e.g., Snowflake, Redshift, BigQuery).
- Soft Skills:
- Strong analytical and problem-solving abilities.
- Effective communication and collaboration skills to work with diverse teams.
- Attention to detail and a proactive approach to ensuring data integrity.
- Experience working within data governance frameworks.
- Knowledge of data lake architectures and unstructured data processing.
- Familiarity with machine learning workflows and AI-driven data applications.
- Certifications in relevant technologies (e.g., AWS Certified Data Analytics, Google Professional Data Engineer).