Core Engineering - (Marquee) Site Reliability Engineer - Vice President - Tokyo
MORE ABOUT THIS JOB
The Marquee team at Goldman Sachs is responsible for delivering digital products to our institutional client base. We design and build highly scalable web platforms that provide access to Goldman Sachs content, portfolio analytics, risk, and execution services. These tools help to transform and simplify client experiences while generating new revenue streams and business models for a leader in global financial markets. Marquee is a product-driven team, composed of talented and passionate product managers, designers, and engineers working to change expectations in institutional finance.
As a Site Reliability Engineer on the Marquee team, you will be responsible for managing and supporting highly available distributed systems, ensuring the reliability of the production environment, building tools to reduce toil and increase insight into trouble spots, and implementing effective governance controls. You will work with a number of proprietary configuration management and deployment systems in close collaboration with other GS teams.
YOUR IMPACT
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. At Goldman Sachs, SRE is responsible for the availability and reliability of our firm's most critical platform services, and ensures they meet the requirements of our internal and external users. We look for engineers who are motivated to collaborate with our businesses to build and run sustainable production systems, which can evolve and adapt to changes in our fast-paced, global business environment.
RESPONSIBILITIES AND QUALIFICATIONS
How will you fulfil your potential?
- Balance feature development velocity and reliability with well-defined service level objectives (SLOs).
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Drive the incident management process and support a culture of blameless post-mortems.
- Partner with development teams to improve services via rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- BS degree in Computer Science or a related technical field involving coding and/or systems engineering.
- Proficiency in one or more of the following: Go, Python, C, C++, Java, Perl, Ruby or shell scripting.
- Experience with algorithms, data structures, and software design, and/or experience with UNIX operating system internals and/or networking.
- Experience with distributed systems design, maintenance, and troubleshooting.
- Hands-on experience with debugging and optimizing code, as well as automation.
- Strong interpersonal skills, drive, and ownership.
- Coding beyond simple scripts.
- Solving novel problems from first principles.