Good morning. I know it’s been quite a while since I last wrote on the site. Today I would like to share some of the questions I faced during my data science interviews at TCS and other companies. You can print this page out and keep it with you as a handy guide for your preparation.

Statistics

  • Explain outlier detection with a boxplot
  • What is the IQR (interquartile range)?
  • What measures are used to summarize a distribution? (They may also give you a specific distribution as an example.)
  • What is the difference between a point estimate and a confidence interval?
  • What is a p-value, and what is its significance?
  • What are the different types of error in hypothesis testing?
  • What are the assumptions of linear regression?
  • What is the difference between correlation and covariance?
  • What is the bias-variance trade-off?
  • What is a confusion matrix?
  • What is the goal of A/B testing?
  • What is the law of large numbers?
  • What is the central limit theorem?
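
The first two questions go together: a boxplot's whiskers are usually drawn at Tukey's fences, Q1 − 1.5·IQR and Q3 + 1.5·IQR, and points beyond them are plotted as outliers. Here is a minimal sketch of that rule using only the standard library. The function name `iqr_outliers` is just for illustration, and the `method="inclusive"` quartile convention is an assumption on my part; plotting tools use slightly different quartile definitions, so the fences can differ a little.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the boxplot/IQR rule.

    iqr_outliers is a hypothetical helper written for this example.
    Quartiles come from statistics.quantiles with the "inclusive" method,
    which is one of several common conventions.
    """
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # spread of the middle 50% of the data
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return outliers, (lower, upper)


data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(data))
```

With this data, Q1 = 12 and Q3 = 13.75, so the fences are 9.375 and 16.375, and only 102 gets flagged.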

You can find answers to all of these on our website, and you can also check out our data science resources page.

Visualization & EDA

Before we get into this topic, let me introduce the newest part of the website, data viz, so definitely check that out!

  • Univariate vs Bivariate analysis – types of distributions in each
  • Handling missing values and outliers using visualizations
  • Sliding window visualizations
  • What are the different data types supported by Tableau?
  • What are the types of joins in Tableau?
  • What are the different filters in Tableau, and how do they differ?
  • Explain some important features of Power BI.
  • What is the difference between Managed Enterprise BI and Self-service BI?

Machine Learning & Deep Learning

A lot of you may have been waiting for this part, so here goes –

  • Logistic regression basics – cost function, usage
  • Why logistic regression is called regression
  • Why Naive Bayes is called naive
  • Assumptions of Naive Bayes
  • Outlier detection
  • Decision trees
  • Explain how decision trees work
  • Bagging vs boosting
  • Ensemble: random forest
  • What is a logit in deep learning?
  • Kernels in SVM
  • Hinge loss
  • Cross-entropy loss function
  • Categorical vs binary vs multi-label cross-entropy
  • Gradient descent and its variants
  • Activation functions: sigmoid, tanh, ReLU (ranges and uses)
  • LSTM and its three gates
  • Your favourite ML algorithm – what is it and why?
  • Bias vs variance tradeoff
  • Confusion matrix and evaluation metrics (precision/recall)
  • Word2vec in layman's terms
  • Underfitting vs overfitting
  • Regularisation
  • Ridge vs lasso
  • K-means clustering
  • Vanishing gradients
  • Regularisation in neural networks
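
Several of these topics boil down to short formulas that interviewers like to see written out. Below is a rough pure-Python sketch (no ML library assumed) of the sigmoid that turns a logit into a probability, binary cross-entropy, hinge loss, and precision/recall from confusion-matrix counts. All function names are illustrative, not from any particular library.

```python
import math


def sigmoid(z):
    """Map a logit z to a probability in (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean log loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def hinge_loss(y_true, scores):
    """Mean hinge loss for labels in {-1, +1} and raw decision scores (as in a linear SVM)."""
    return sum(max(0.0, 1 - y * s) for y, s in zip(y_true, scores)) / len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp, fp, fn, _tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


print(precision_recall([1, 1, 0, 0, 1], [1, 0, 1, 0, 1]))
```

Note the difference in label conventions: cross-entropy works on probabilities with 0/1 labels, while hinge loss works on unsquashed scores with −1/+1 labels.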

And with that, we move on. Again, if you would like any of these topics covered in more detail, let us know in the comments 😀

Recommendation Systems

Let me add a quick note before we move forward: one question came up quite a few times, so I wanted to mention it.

  • How do we evaluate a recommender system's performance and tell whether it performs better than the old system?
  • What are the main differences between IoT and big data in recommendation systems?
  • Are recommendation systems good for us?
  • How is reinforcement learning involved in a recommendation system?
  • Recommendation engines/recommender systems: what are the software platforms, approaches, and algorithms?
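
For the first question, one common offline approach (my assumption here, not the only valid answer) is to hold out some of each user's real interactions and measure how many of them each system's top-k list contains, i.e. precision@k. Whichever system wins offline would then usually be confirmed with an A/B test. The function names and the toy data below are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the user's held-out set (k > 0)."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(recs_by_user, held_out_by_user, k):
    """Average precision@k over all users that have held-out interactions."""
    scores = [
        precision_at_k(recs_by_user.get(user, []), relevant, k)
        for user, relevant in held_out_by_user.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical held-out interactions and top-3 lists from an old and a new system
held_out = {"u1": {"a", "b"}, "u2": {"c"}}
old_recs = {"u1": ["x", "a", "y"], "u2": ["y", "z", "w"]}
new_recs = {"u1": ["a", "b", "x"], "u2": ["c", "x", "y"]}

print(mean_precision_at_k(old_recs, held_out, 3))  # about 0.167
print(mean_precision_at_k(new_recs, held_out, 3))  # 0.5
```

Recall@k, NDCG, or coverage can be computed over the same held-out split when ranking order or diversity matters.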

Natural language processing

  • TF-IDF
  • Why do we use IDF?
  • Word2vec
  • Stemming
  • Lemmatization
  • RNN
  • The long-sequence problem
  • LSTM
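
To make TF-IDF and "why do we use IDF" concrete, here is a small sketch using the plain textbook formulas, tf = count / document length and idf = log(N / df). This is an assumed variant: libraries such as scikit-learn use a smoothed IDF and normalize the vectors by default, so their numbers will not match exactly. The `tf_idf` name is illustrative.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df = number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term at most once per document

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog sat", "the dog barked"]
for w in tf_idf(docs):
    print(w)
```

Because "the" appears in every document, its IDF is log(3/3) = 0 and its weight vanishes, while a rare word like "cat" keeps a high weight. That down-weighting of ubiquitous words is the point of IDF.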

NLP Advanced

  • Attention mechanism
  • Context vector
  • Attention networks
  • Encoder-decoder
  • Transformers
  • BERT
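
The attention mechanism and the context vector are easier to explain with numbers. Below is a minimal single-query, single-head sketch of scaled dot-product attention in plain Python: scores are query-key dot products divided by √d, a softmax turns them into weights, and the context vector is the weighted sum of the value vectors. Real Transformer layers add learned projections, multiple heads, and batched matrix operations, which this sketch leaves out. Function names are illustrative.

```python
import math


def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    Returns (weights, context): weights[i] says how much the query attends
    to position i, and context is the weighted sum of the value vectors.
    """
    d_k = len(query)
    scores = [dot(query, k) / math.sqrt(d_k) for k in keys]
    weights = softmax(scores)
    context = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return weights, context


weights, context = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
print(weights, context)
```

Here the query matches the first key more closely, so the first value vector gets the larger weight and pulls the context vector toward it.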

Keras

Now, some questions I got on the Keras deep learning framework:

  • Shape of the LSTM input when using word embeddings
  • What is return_state?
  • What is the return_sequences parameter, and how does it affect the output shape?
  • What types of layers are there in Keras?
  • What is sequence preprocessing in Keras?
  • What is an activation function?
  • What are the different types of initializers in Keras?
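
The shape questions come up a lot, so here is a quick reference written as plain Python. It does not call Keras (so it runs without TensorFlow installed); the two helper functions are hypothetical and only encode the shape rules for an `Embedding` layer feeding an `LSTM` with the `return_sequences` and `return_state` arguments. Please double-check against `model.summary()` in your own Keras version.

```python
def embedding_output_shape(batch, seq_len, embedding_dim):
    """Embedding maps (batch, seq_len) integer token IDs to (batch, seq_len, embedding_dim).

    That 3D tensor, (batch, timesteps, features), is what an LSTM expects as input.
    """
    return (batch, seq_len, embedding_dim)


def lstm_output_shapes(batch, timesteps, units, return_sequences=False, return_state=False):
    """Expected output shape(s) of an LSTM layer with `units` units (hypothetical helper)."""
    # return_sequences=True keeps one output per timestep; otherwise only the last one
    output = (batch, timesteps, units) if return_sequences else (batch, units)
    if not return_state:
        return output
    # With return_state=True the layer returns [output, hidden_state, cell_state]
    state = (batch, units)
    return [output, state, state]


# 32 sentences, 100 tokens each, 300-dimensional embeddings, LSTM with 64 units
x_shape = embedding_output_shape(32, 100, 300)
print(x_shape)                                   # (32, 100, 300)
print(lstm_output_shapes(32, 100, 64))           # (32, 64)
print(lstm_output_shapes(32, 100, 64, return_sequences=True, return_state=True))
```

Stacked LSTMs need `return_sequences=True` on every layer except the last, because the next LSTM again expects a 3D (batch, timesteps, features) input.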

Basic Python & coding

This section is also quite important, so please don’t overlook it.

  • List vs tuple
  • Pull requests in Git
  • Fibonacci sequence in Python
  • Intersection/union of two lists in Python
  • Frequency of elements in a list in Python
  • What’s PEP 8?
  • How is a linked list implemented in Python?
  • How is memory management done in Python?
  • Is Python a compiled language or an interpreted language?
  • What are decorators?
  • What is the difference between mutable and immutable data types? Are strings mutable?
  • What is dictionary comprehension? Give an example.
  • What is the difference between the xrange and range functions?
  • What is monkey patching in Python?
  • Define encapsulation in Python.
  • How do you do data abstraction in Python?
  • What is __init__() in Python?
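
Several of these are "write it on the whiteboard" questions, so here is one possible set of short answers for the Fibonacci, intersection/union, frequency, dictionary comprehension, and decorator questions. These are sketches, and the function names are my own illustrative choices. Note that the quick `list(set(a) & set(b))` answer also works for intersection but does not preserve order, which is why the versions below use `dict.fromkeys`.

```python
from collections import Counter
from functools import wraps


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0, 1."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def intersection(xs, ys):
    """Distinct elements present in both lists, in the order they first appear in xs."""
    ys_set = set(ys)
    return [x for x in dict.fromkeys(xs) if x in ys_set]


def union(xs, ys):
    """All distinct elements from both lists, in first-seen order."""
    return list(dict.fromkeys(xs + ys))


def frequencies(items):
    """Count how many times each element occurs."""
    return dict(Counter(items))


# Dictionary comprehension: map each word to its length
word_lengths = {word: len(word) for word in ["data", "science", "tcs"]}


def log_calls(func):
    """Decorator that records the positional arguments of every call in wrapper.calls."""
    @wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    return a + b


print(fibonacci(7))                             # [0, 1, 1, 2, 3, 5, 8]
print(intersection([1, 2, 2, 3, 4], [2, 4, 6]))  # [2, 4]
print(union([1, 2, 2, 3], [3, 4]))               # [1, 2, 3, 4]
print(frequencies(["a", "b", "a"]))             # {'a': 2, 'b': 1}
print(add(2, 3), add.calls)                     # 5 [(2, 3)]
```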

Doubts? WhatsApp me!