When interviewing for a position as a data scientist, you can always expect the usual questions that have stereotypically dominated human resources meetings: “So tell me a bit about yourself.”, “What experiences have brought you here?”, “Is there anything that you believe you will bring to the company?”. Whilst those questions serve as a guideline to gauge your background and interviewing skills, the technical questions will always be the ones to look out for, because they will more likely than not determine the success of your application.
Let's start with a simple one: Are there any data scientists or companies that you hold in high regard?
Although this seems like it would still be part of the non-technical aspect of your interview, you may be surprised. Whilst there is no real wrong answer to this question, it does give the interviewer an idea of your tendencies. After all, you are more likely to reflect the people or entities that you are interested in. This question is also one you can take advantage of by bringing in technical aspects. Explain why you hold specific entities in such high regard: is it an algorithm? A paper they’ve published? The improvement of certain facets of the data science field? The more detailed and technical the answer, the better.
This could lead to more technical questions such as:
“How would you be able to discern whether or not statistics published in any form of media have been manipulated to fit a narrative or are indeed presenting the data factually?”
Firstly, looking at the way the data is presented might already give you an indication of its truthfulness: Are the axes labelled? Is the data graphed appropriately? Is the graph clear and concise?
You can go further and research the statistics yourself: Who published it? Are they funded by any particular groups? Why was this data researched in the first place?
From there, it should be fairly clear whether the statistics themselves are faulty.
“How would you link this to the concept of ‘chart junk’?”
If a graph or chart is overly encumbered with figures, extra information, or images, it can distract from the statistics or, worse, confuse the audience as to what they are meant to glean from the figure. Any illustration of data should be clear, concise, and easy to read, as you want the reader to be able to understand what you are conveying. If certain forms of media are misrepresenting data, they are likely to manipulate the illustration of said data in a way that could affect how it is understood.
“When considering data, what is selection bias and how can it be avoided?”
Selection bias can occur when the sample you collect is not a random, representative draw from the population you want to describe. Take, for example, a survey of the distribution of a specific species of tree in a forest. Should your method require random sampling, you would want to ensure you randomly sample grids all across your survey area, and not just specific pockets, as you want your data to be an accurate representation of the whole area. Focussing only on areas where you know the target tree species is found will skew your results, and you will not be able to put forward a viable conclusion.
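To make the random grid sampling concrete, here is a minimal sketch in Python. The function name `sample_grid_cells`, the 10 x 10 grid, and the sample size of 15 are all assumptions made up for illustration; a real survey design would set them from the study area and budget.

```python
import random


def sample_grid_cells(n_rows, n_cols, k, seed=None):
    """Pick k distinct grid cells uniformly at random from the survey area.

    Hypothetical helper for illustration: every cell has the same chance
    of being chosen, so known "good" pockets are not favoured.
    """
    total_cells = n_rows * n_cols
    if k > total_cells:
        raise ValueError("cannot sample more cells than the grid contains")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    # Sample cell indices without replacement, then map each index to (row, col).
    indices = rng.sample(range(total_cells), k)
    return [divmod(index, n_cols) for index in indices]


# Assumed example: a forest divided into a 10 x 10 grid, 15 cells surveyed.
cells = sample_grid_cells(n_rows=10, n_cols=10, k=15, seed=42)
print(cells)
```

Because the cells are drawn from the whole grid rather than hand-picked, the sample should not systematically over-represent the pockets where the species is already known to grow.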
“Within the same scenario, how would you screen for outliers and what would you do with them?”
There are many different methods to screen for outliers, with the most popular being z-scores and box plots. Identifying outliers is important, especially if they skew your data, so it is equally important to point them out whilst interpreting the results and to discuss how they interact with or affect your statistics. In certain cases, showing two versions of the same statistic, with and without the outliers, may help illustrate your analysis.
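As a rough sketch of both screening methods, the Python below flags outliers by z-score and by the box-plot (interquartile range) rule, then reports a mean with and without them. The tree counts are invented for illustration, and the cut-offs (2.5 standard deviations, 1.5 times the IQR) are common conventions assumed here, not fixed rules.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return values whose distance from the mean exceeds `threshold` sample SDs."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside the box-plot whiskers: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented data: trees counted in each of nine sampled grid cells.
counts = [12, 15, 14, 13, 16, 15, 14, 13, 95]

flagged_z = zscore_outliers(counts, threshold=2.5)
flagged_iqr = iqr_outliers(counts)

# Report the same statistic with and without the flagged cells.
cleaned = [v for v in counts if v not in flagged_iqr]
print("z-score outliers:", flagged_z)
print("IQR outliers:", flagged_iqr)
print("mean with outliers:", statistics.fmean(counts))
print("mean without outliers:", statistics.fmean(cleaned))
```

One caveat worth raising in an interview: in a very small sample a single extreme value inflates the standard deviation, so the z-score can never get very large, which is why a lower threshold is assumed here and why the box-plot rule is often the more robust check.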
With all of this in mind, it is important not only to study the technical aspects that may come up during an interview, but also to take time to learn about the company you’re interviewing with. Best of luck…