An HBR issue once said: "Data Scientist is the sexiest job of the 21st century". Not entirely false, in my opinion. The impact an average Data Scientist can create with their work today is much greater than what an average professional in most other fields can. Data Science helps you make efficient decisions, automate slow and tedious processes, and enhance your reach. If you were to scale a business, would you do it by hiring more people and trusting whoever has a fancy degree, or by using data to automate and make decisions? The answer is obvious.
Now that your company board has bought into the power that Data Science brings to the table, the question is what comes next. If you are a lead Data Scientist or part of the management at a company and want to build a Data Science team from scratch, how do you go about it? Frankly, there are no silver bullets for this, but I can share some experiences from my start-up, ParallelDots, that might illuminate a part of the picture.
Data Scientist is not a single job profile.
Let's start with the job postings and CV screening process. What job profile do you put up? How do you screen resumes? Do you look at someone who has been a Data Scientist at XYZ Inc. for five years when looking for senior talent? Much more planning than this is required before you put up job openings and start screening. The organization knows it wants to focus on Data Science, but the person building the Data Science team will need a clearer picture than this. First, the data science team should figure out the focus areas by defining the problems it looks to solve.
Data Scientist is almost as varied a job profile as a Doctor is, for example. Sure, all doctors know some essential aspects of all fields, but they are specialists in one area. So, a surgeon can look into ENT cases and maybe diagnose well too, but there is a reason you go to a surgeon for surgery and an ENT specialist for an ear infection - expertise.
The steps for posting job openings and screening resumes should be:
A) Start with identifying critical problems your organization needs to address using Data Science.
B) Understand different types of Data Science resumes - Tabular Machine Learning experts, Deep Learning experts, Statistical/Mathematical Modellers, Data Engineering experts, Analysts, Dashboard Designers. One can find a detailed description of such profiles and the tools they generally use here ( https://www.kdnuggets.com/2017/03/7-types-data-scientist-job-profiles.html )
C) Once you understand which type of Data Scientist profile is closest to your requirements, create a job description accordingly, highlighting expertise in the tools people in that profile would use.
D) Just as employers tend to lump all Data Science profiles together, applicants often apply to any open Data Scientist position without comparing the job description with their expertise. Many applicants can be filtered very simply by checking whether their expertise is relevant to the company's Data Science use cases. At ParallelDots, we go even more specific by filtering for people who know Image Processing using Deep Learning for our retail Computer Vision platform.
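As a rough illustration of step D, a first-pass relevance filter can be as simple as checking how many of a profile's typical tools show up in a resume. The sketch below is only that, a sketch: the profile names, keyword sets, the `Resume` class, and the 0.3 threshold are all hypothetical choices made for this example, not the criteria we actually use, and a human should still read whatever it shortlists.

```python
from dataclasses import dataclass

# Hypothetical keyword sets per profile, for illustration only.
PROFILE_KEYWORDS = {
    "deep_learning_vision": {
        "pytorch", "tensorflow", "cnn",
        "object detection", "image segmentation", "opencv",
    },
    "tabular_ml": {"xgboost", "lightgbm", "scikit-learn", "feature engineering"},
    "data_engineering": {"spark", "airflow", "kafka", "etl"},
}


@dataclass
class Resume:
    name: str
    text: str


def relevance(resume: Resume, profile: str) -> float:
    """Fraction of the profile's keywords found in the resume text.

    Uses naive, case-insensitive substring matching, so short keywords
    can produce false positives.
    """
    text = resume.text.lower()
    keywords = PROFILE_KEYWORDS[profile]
    hits = sum(1 for keyword in keywords if keyword in text)
    return hits / len(keywords)


def shortlist(resumes: list[Resume], profile: str, threshold: float = 0.3) -> list[str]:
    """Names of resumes scoring at or above the threshold, best first."""
    scored = [(relevance(resume, profile), resume) for resume in resumes]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [resume.name for _, resume in kept]


resumes = [
    Resume("A", "Built CNN models in PyTorch for object detection with OpenCV"),
    Resume("B", "Maintained Airflow and Spark ETL pipelines"),
    Resume("C", "XGBoost models with heavy feature engineering"),
]
print(shortlist(resumes, "deep_learning_vision"))
```

A filter like this only narrows the pile; it says nothing about depth of experience, which is what the interview stage is for.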
Now that you have shortlisted relevant resumes, the next step is to interview candidates.
I think different people would have different views about what the right way to interview is. Whiteboard programming interviews and interviews about Machine Learning theory don't necessarily test the skills a candidate would apply in their day-to-day job. Knowing the theory of 50 Machine Learning algorithms is not that helpful in the real world.
The capabilities a Data Scientist needs every day are a subset of:
A. Figuring out the usage of a new library/algorithm they would need to solve a problem.
B. Processing data into a better workable format.
C. Training a known Machine Learning/Deep Learning algorithm to get a baseline.
D. Hacking and fine-tuning the algorithm.
E. Making insights available on a dashboard.
F. Some research to hone oneself professionally.
I believe in interviewing a person on the day-to-day skills they will need. So, at ParallelDots, we design assignments that take a few hours (4-8 hours depending on the profile) for candidates to work on. We take care of the following:
1. We design new assignments for all open positions every quarter.
2. The assignments are on open data so that applicants can work on their own rigs, laptops or Google Colab and share their work as a link. For candidates who are not selected, the assignment still adds value to their Data Science portfolio and resume.
3. We specifically ask them to demonstrate the capabilities listed above in this assignment. There is no other technical round. We have tried in-person coding rounds too, but we believe that puts too much pressure on candidates, so we give them the assignment to work on at their leisure, using the internet and any open-source code. Naturally, the assignment designer should be clever enough to make sure the task cannot be solved just by copying code from GitHub and needs some more effort. Taking code from public Data Science repositories is perfectly fine even in professional settings, so there is no point in blocking it in an interview.
In my experience, there are two different attributes in people that define what type of tasks they should take up in the team. The attributes are:
* Being very good at open-ended tasks
* Being extremely good at similar or repetitive tasks
Almost all tasks in your Data Science ticket management system will relate to one of the two attributes listed. I call these attributes adventure-seeking and responsibility-seeking, respectively.
Data Scientists who are good at the same tools and skills often tend to be either the former or the latter, and some brilliant ones have both attributes. Why are these attributes important, you ask? In the real world, people good at solving open-ended problems and research are glorified, while most work in a company consists of similar or repetitive tasks. Due to this, people who are good at repetitive tasks, and who are often needed a lot by the company, remain dissatisfied with their work. If you think about it, it is crazy that someone so valuable to the company because of their skills is unhappy with their job. The internet culture makes people hate their strengths in this case. Thus, Data Science leaders will need to constantly reassure everyone about their importance in the team.
Open-ended or exploratory problems are not everyone's cup of tea. It is easy to realize that an open-ended project is going nowhere only after a lot of time has been spent on it, so such projects need to be judged critically from day one. People working on such projects need to work with a great sense of urgency and be willing to break things rather than spend a lot of time without running experiments. At least in exploratory Data Science projects, attention to detail is critical, or one will keep facing failures. So here you have the attributes of people who should work on open-ended Data Science projects: a sense of urgency, not being too uncomfortable outside their comfort zone, and attention to detail.
A lot of people don't have the aptitude for such projects for various reasons. Two traits you will see in people who are not good at open-ended projects while they are working on one:
1. They are perfectionists, running after an "ideal" way/theory. Such people spend too much time on one method or thought. Data Science is empirical, and you cannot just prejudge the right solution. The right way is often many rounds of trial and error and some learning from reading.
2. Their rate of failure is very high. Exploratory projects are often solved with minor incremental improvements and not multiple large failures followed by one success.
Responsibility seekers, on the other hand, want clear goals and will work extremely hard to achieve them. If you look at the task list of a Data Science team, most items are clear, well-defined tasks for which you can set a deadline. Responsibility seekers are really good at these. Adventure seekers, contrary to what most of us would think, are not simply people who want to venture into uncharted waters; they are the ones who can handle time pressure and expectations outside their comfort zone and still deliver.
Allocate tickets accordingly. Remember, most people will want to do open-ended tasks because they are considered hip, even when their personality is a better fit for other tasks. As you might have figured out, Data Scientist may be a sexy job due to the end product it creates, but functionally, it is gritty and hands-on like any other vocation.
There is a famous joke on LinkedIn:
"CFO: What happens if we train them and they leave?
CEO: What happens if we don't, and they stay?"
As a Data Science leader, you have to make sure your employees acquire skills and learn constantly. Data Science is possibly the fastest-evolving field globally, and technical and theoretical skills have a half-life of two years at most. You would want all employees to take the initiative and stay well informed. However, not everyone is going to be equally open to learning. The best way is to have team activities where you make sure people learn something new every week.
At ParallelDots, we organize weekly sessions, where:
* Senior members read about a new and relevant Deep Learning paper or library and give a small talk to the team.
* Junior members take up a small seminar on something they are reading. Senior members help them understand concepts that are not too clear during the seminar.
Some members might not want to give a talk or seminar, but they must be present during seminars. The hope is that they will take away something from these sessions despite being passive participants. We have observed that, over time, more and more people come forward to give talks once such sessions are organized regularly.
That’s it, folks. These are my thoughts on building a team. If you have any questions, you can ping me on my Twitter handle: @muktabh or on my Quora account: quora.com/Muktabh-Mayank. You can follow @paralleldots to get updates about our AI technology.