Growing the Site Reliability Team at LinkedIn: Hiring is Hard • Greg Leffler • Manager, Site Reliability • https://linkedin.com/in/gleffler • gleffler@linkedin.com
Who am I? • Site Reliability Manager (New York) • MS in Industrial/Organizational Psychology • Responsible for interview process for SREs • Took this responsibility as an IC, so the process originated from the bottom up • Team grew 10x from August 2011 to May 2014
Who are SREs at LinkedIn? • 100+ SREs • 5 sites, 2 countries • 1000+ SW Engineers • 8th busiest website in the world • 10k+ prod machines per DC: 2 DCs today, +1 in 2014 • 300+ RESTful services, 300MM+ members • Services with 99th-percentile latencies as low as 10 ms
What matters for a great company? • Funding? • Good idea? • Execution? • Product? • People.
Obligatory LinkedIn culture plug • Talent is our #1 operating priority. • Our culture is what sets us apart. We are committed to supporting the career transformation of our employees. • Transparency is encouraged and emphasized at every company all-hands meeting (these occur every other week) • Our commitment to our employees is emphasized in how we behave • Everyone is encouraged to do interviews! Yes, everyone. • ~60% of SREs participate in the interview process
What do we want from SREs? • Excited about LinkedIn and the SRE role • We have the luxury of being picky • Fit our culture and embody our values • These matter. If you haven’t set them or can’t articulate them, you need to do that first • Have the skills needed to do the job • These also matter. You need to know what these are before you screen for them • AND NOTHING ELSE
These don’t always work • Coding puzzles • “Fermi problems” • Algorithm design questions • If you were a zebra, what pattern would your stripes have? • Homework • Personality tests • Trivia (quick, which signal is #7 in RHEL 6.4 on x86?)
Here’s why • Industrial Psychology has figured this out already • Schmidt & Hunter (1998), “The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings” • Even if they hadn’t, you should collect your own data • And not rely on hunches or cargo cults • Further reading in the notes on this slide.
What does work? • Good funnel at the start • Realistic job previews • Structured interviews • Situational judgment tests
The LinkedIn SRE funnel • Sourcing/screening • Recruiter prescreen • Operationally-focused phone screen (TPS 1) • Code-focused phone screen (TPS 2) • By the time candidates are onsite, we expect they will pass. • Figures shown on the slide (labels not recoverable): 82%, 24%
How do we implement these? • Live Troubleshooting (Realistic job preview) • Systems Internals, Web Architecture (Structured interviews) • Triage & Investigation (Situational judgment test) • Host Manager (structured interview for culture and role fit) • Lunch (not an interview… or is it?)
Live Troubleshooting • Here’s a broken service (in EC2) • Fix it • (As realistic as it gets) • No ‘man voldemort’ • You are probably the 1st person in the world to troubleshoot the exact situation in question
Technical Modules • Added structure and scoring guidelines • Scoring guidelines are what matter • Consistency is the only way you can scientifically prove if these are working • High # of interviewers = need to be able to compare results
Triage & Investigation Module • Situational Judgment and Triage • It’s your first day oncall and the NOC calls to say the site is on fire. Here’s the alert board – what do you look at first? Why? • Assesses standard troubleshooting/investigation ability • “The CEO calls you and says ‘the site is slow’ – what do you do?” • “Disk is full. You delete a file but df still shows the disk being full. What’s wrong?”
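For readers unfamiliar with the last question, one common cause (not necessarily the only answer an interviewer would accept) is that a process still holds the deleted file open, so the kernel cannot free its blocks yet. The sketch below is a minimal illustration, assuming a POSIX system; the function name `unlinked_but_open_demo` and the returned keys are made up for this example.

```python
import os
import tempfile


def unlinked_but_open_demo(size=64 * 1024):
    """Write a file, delete its name while it is still open, and
    report what the open descriptor still refers to (POSIX only)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.unlink(path)             # the directory entry is gone ...
        st = os.fstat(f.fileno())   # ... but the inode is still referenced
        return {
            "path_exists": os.path.exists(path),
            "link_count": st.st_nlink,
            "bytes_still_held": st.st_size,
        }
    # Leaving the with-block closes the file; only then can the
    # filesystem reclaim the space.


if __name__ == "__main__":
    print(unlinked_but_open_demo())
```

On a real host, the equivalent investigation is usually finding which process still has the deleted file open and restarting or signalling it.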
Results of implementing changes • Happier candidates • In fact, no unhappy candidates • “The troubleshooting module was the most fun I’ve ever had in an interview” • “I thought the troubleshooting module was hard but I learned so much” • Happier interviewers • Some hesitation at first • Live Troubleshooting is stressful for the interviewer too! • Solve with training and apprenticeships
Data, data, data • We’re collecting scores from each module • Correlating them to performance ratings • Re-evaluating the utility of each module • If a module doesn’t predict performance, get rid of it • This is hard, especially with things people ‘need’ • However, if there’s no correlation, it is worthless.
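As a rough illustration of the correlation step on this slide, the sketch below computes a Pearson correlation between each module's interview scores and later performance ratings. Everything in it is an assumption for illustration: the module names reuse the slide titles, but the function names (`pearson_r`, `module_validity`), the score scales, and the numbers are made up, and the slides do not say which statistic LinkedIn actually uses.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences.
    Returns None when either sequence has zero variance (r is undefined)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def module_validity(scores_by_module, ratings):
    """Map each module name to the correlation between its interview
    scores and the performance ratings of the same hires (same order)."""
    return {
        module: pearson_r(scores, ratings)
        for module, scores in scores_by_module.items()
    }


# Made-up data: one entry per hire, in the same order in every list.
HYPOTHETICAL_SCORES = {
    "Live Troubleshooting": [3, 4, 2, 5, 4, 3],
    "Systems Internals": [4, 4, 3, 3, 5, 2],
    "Triage & Investigation": [2, 5, 3, 4, 4, 3],
}
HYPOTHETICAL_RATINGS = [3, 5, 2, 5, 4, 3]

if __name__ == "__main__":
    for module, r in module_validity(HYPOTHETICAL_SCORES, HYPOTHETICAL_RATINGS).items():
        print(module, r)
```

A module whose correlation stays near zero as data accumulates is the kind the slide suggests dropping; with only a handful of hires, any single value is too noisy to act on.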
How to make your process better • Make talent your first priority • Implement the good stuff from I/O psych • Realistic job previews • Situational judgment tests • STRUCTURED interviews • Collect data on interview performance (module scores) • Correlate this to job performance! • Re-evaluate your process
Want to experience it for real? • We’re hiring. See me afterwards. • Office hours are at 2 pm • Any hiring or culture related questions are fair game