Systems Engineer, Google.com
Systems Engineer, Google.com - Los Angeles
This position is based in Atlanta, GA; Los Angeles, CA; Pittsburgh, PA; Mountain View, CA; New York, NY; San Francisco, CA; or Seattle/Kirkland, WA.

The area: Engineering, Google.com Engineering

Google.com Engineering makes Google's services fast and reliable for hundreds of millions of users. This mission-critical team (also known as Site Reliability Engineering) combines software development, networking, and systems engineering expertise to build and run large-scale, massively distributed, fault-tolerant software systems and infrastructure. We hire creative engineers and technology enthusiasts who enjoy being challenged by problems of scale and complexity, with a strong desire to make services better for users. We routinely solve software and systems issues ranging from distributed change propagation on live serving systems to designing and deploying intelligent load balancing systems for the largest user-facing services in the world. Our teams come from diverse backgrounds, and we are actively seeking new team members to bring fresh perspective to solving problems, along with the technical and soft skills needed to keep Google's services growing and reliable.

The role: Systems Engineer, Google.com

As a Systems Engineer working on Google's critical production applications and infrastructure, your mission will be to ensure Google is always fast, available, scalable and engineered to withstand unparalleled demand. You will design and develop the systems that run Google Search, Gmail, YouTube, Maps, Docs, Ads, Blogger, AppEngine, Google+ and more. You'll own the production services that comprise *.google.com, as well as key infrastructure like GFS, BigTable, MapReduce, Chubby and large-scale cloud computing clusters. You will also be driving performance and reliability from software and infrastructure at massive scale, where even the 0.01% case must be considered.
You will encounter challenging, novel situations every day, and work with just about every other engineering and operations team at Google. You will be looked upon as an expert and advocate to fellow engineers on making design and reliability trade-offs in running large-scale services and engineering complex systems that fail gracefully and transparently to users.

The most successful candidates for this role will have strong analytical and troubleshooting skills; fluency in coding, algorithms, and systems design; solid communication skills; and a desire to solve complex problems of scale that are unique to Google. We are particularly interested in software engineers, systems administrators, and Unix programmers familiar with aspects of running web services at scale. Depth in networking technologies and Unix/Linux internals is a strong plus.

Responsibilities:
- Manage availability, latency, scalability and efficiency of Google services by engineering reliability into software and systems
- Respond to and resolve emergent service problems; build tools and automation to prevent problem recurrence
- Review and influence new and evolving design, architecture, standards, and methods for operating services and systems
- Participate in software and system performance analysis and tuning, service capacity planning and demand forecasting
- Perform periodic on-call duty as part of a global team

Minimum Qualifications:
- BA/BS degree in Computer Science or related field (in lieu of degree, 4 years of relevant work experience)
- 3 years of relevant work experience, including with Unix/Linux systems requiring the use of languages like Python, C, C++, Java, Perl, Shell or PHP
- Technical troubleshooting and performance tuning experience

Preferred Qualifications:
- 6 years of relevant work experience, including in a high-volume or critical production service environment, as well as experience leading short projects involving outside teams
- Experience coordinating or leading small cross-team technical projects
- Experience in OSes and systems (e.g. UNIX internals, device drivers, FreeBSD), open source tools (e.g. dtrace, ktrace), web service components (e.g. load balancing, LAMP stack), storage and clustering (e.g. column stores, Hadoop), and scripting and programming languages (e.g. Erlang, Haskell, Scala or Scheme)
- Strong written and spoken English language skills