Good morning. I know it has been quite a long time since I last wrote on the site. Today I would like to share some questions that I faced during my data science interviews at TCS and other companies. You can print this page and keep it as a handy guide for your preparation.

## Statistics

• Explain outlier detection with boxplot
• What is IQR (interquartile range)?
• What measures are used to summarize a distribution? (They may also give a specific distribution as an example.)
• What is the difference between Point Estimates and Confidence Interval?
• What is p-value and what is its significance?
• What are the different types of error in hypothesis testing?
• What are the assumptions of Linear Regression?
• What is the difference between Correlation and Covariance?
• What is a confusion matrix?
• What is the goal of A/B Testing?
• What is the Law of Large Numbers?
• What is the Central Limit Theorem?

You can find the answer to all of these on our website. You can also check out our data science resources page.
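For the boxplot and IQR questions, a small example helps in an interview. Here is a minimal sketch of the usual boxplot rule, where points beyond 1.5 × IQR from the quartiles are flagged as outliers. The function name `iqr_outliers`, the sample data, and the choice of the "inclusive" quartile method are my own assumptions for illustration; other tools may compute quartiles slightly differently.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the boxplot rule."""
    # Q1 and Q3; method="inclusive" interpolates between the min and max
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range
    lower = q1 - k * iqr               # lower whisker limit
    upper = q3 + k * iqr               # upper whisker limit
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical sample with one obviously extreme value
data = [12, 14, 15, 15, 16, 17, 18, 19, 20, 95]
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

The multiplier `k=1.5` is the conventional default; a larger value such as 3 is sometimes used to flag only extreme outliers.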

## Visualization & EDA

Before moving on to the topic, let me introduce the newest part of our website, data viz. Definitely check it out!

• Univariate vs Bivariate analysis – types of distributions in each
• Handling missing values and outliers using visualizations
• Sliding window visualizations
• What are the different data types supported by Tableau?
• What are the types of joins in Tableau?
• What are the different filters in Tableau? Differentiate.
• Explain some important features of Power BI.
• What is the difference between Managed Enterprise BI and Self-service BI?

## ML & DL

A lot of you may have been waiting for this part, so here goes –

• Logistic regression basics – cost function, usage
• Why logistic regression is called regression
• Why naive bayes is called naive
• Assumptions of naive bayes
• Outlier detection
• Decision trees
• Explain how decision trees work
• Bagging vs boosting
• Ensemble: random forest
• What is a logit in deep learning?
• Kernel in SVM
• Hinge loss
• Cross entropy loss function
• Categorical Cross entropy vs binary cross entropy vs multi label cross entropy
• Gradient descent and its variants
• Activation functions: sigmoid, tanh, relu ranges and uses
• LSTM and the 3 gates
• Your favourite ML algorithm – what is it and why?
• Confusion Matrix and accuracy metrics (precision / recall)
• Word2vec in layman terms
• Underfitting vs overfitting
• Regularisation
• Ridge and lasso
• K means clustering
• Regularisation in neural networks
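
For the activation function and cross entropy questions, it can help to have the formulas at your fingertips. Below is a rough sketch in plain Python (no framework) of sigmoid, ReLU, and binary cross entropy; tanh is available as `math.tanh` and ranges from -1 to 1. The function names, the clipping constant `eps`, and the sample labels are assumptions made for this example, not any library's API.

```python
import math

def sigmoid(x):
    """Squashes any real number into the range 0 to 1."""
    # Branching on the sign keeps math.exp from overflowing for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def relu(x):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, x)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) never happens
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

labels = [1, 0, 1]              # hypothetical ground truth
confident = [0.9, 0.1, 0.8]     # predictions close to the labels
unsure = [0.5, 0.5, 0.5]        # always predicts 0.5

print(binary_cross_entropy(labels, confident))
print(binary_cross_entropy(labels, unsure))  # equals log(2)
```

The loss is lower for the confident, correct predictions than for the constant 0.5 guess, which is the behaviour interviewers usually want you to explain.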

And with that, we move on. Again, if you need these topics in more detail let us know in the comments 😀

## Recommendation Systems

Let me add a quick note before we move forward. There is one question that came up quite a few times, so I wanted to mention that –

• How do we evaluate whether a recommender system is performing better than the old system?
• What are the main differences between IoT and big data in recommendation systems?
• Are recommendation systems good for us?
• How is reinforcement learning involved in a recommendation system?
• Recommendation engines / recommender systems: what are the software platforms, approaches, and algorithms?

## Natural language processing

• TF-IDF
• Why do we use IDF?
• word2vec
• Stemming
• lemmatization
• RNN
• long sequence problem
• LSTM

• attention mechanism
• context vector
• attention networks
• encoder-decoder
• transformers
• BERT
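
For the TF-IDF questions, here is a minimal sketch in plain Python that shows why IDF is used: a word that appears in every document gets a weight of zero, while a rare word is boosted. I am assuming the simplest textbook variant, term frequency times log(N / document frequency); libraries often use smoothed variants, so their numbers will differ. The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = share of the document taken by the term; idf = log(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus, already tokenized by whitespace
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
print(scores[0])
```

In the first document, "the" scores 0 because it occurs in all three documents, while "mat" scores highest because it occurs only there.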

## Keras

Now, some questions I got on the Keras deep learning framework:

• shape of LSTM input in the case of word embeddings
• return_state – what is it?
• What is the return_sequences parameter, and how does it affect the output shape?
• What are the types of layers in Keras?
• What is sequence preprocessing in Keras?
• What is an activation function?
• What are the different types of initializers in Keras?

## Basic Python & coding

This section is also quite important, so please don’t overlook it.

• List vs tuple
• Pull request in git
• Fibonacci sequence in Python
• Intersection / union of 2 lists in Python
• Frequency of elements in a list in Python
• What’s PEP 8?
• How is a linked list implemented in Python?
• How is memory management done in Python?
• Is Python a compiled language or an interpreted language?
• What are Decorators?
• What is the difference between mutable and immutable data types? Is a string mutable?
• What is dictionary comprehension? Give an example.
• What is the difference between the xrange and range functions?
• What is monkey patching in Python?
• Define encapsulation in Python.
• How do you do data abstraction in Python?
• What is __init__() in Python?
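
Several of the coding questions above have short answers worth practising by hand. Here is one possible set of solutions for the Fibonacci sequence, intersection and union of two lists, element frequency, and a dictionary comprehension. These are just sketches of one approach; the function names are my own, and I am assuming the interviewer wants order preserved and duplicates removed for intersection and union.

```python
from collections import Counter

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0."""
    seq, a, b = [], 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq

def intersection(xs, ys):
    """Items present in both lists, in xs order, without duplicates."""
    in_ys = set(ys)
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(x for x in xs if x in in_ys))

def union(xs, ys):
    """Items present in either list, in first-seen order, without duplicates."""
    return list(dict.fromkeys([*xs, *ys]))

def frequencies(items):
    """Map each element to the number of times it occurs."""
    return dict(Counter(items))

# Dictionary comprehension: number -> its square
squares = {n: n * n for n in range(1, 6)}

print(fibonacci(8))                            # [0, 1, 1, 2, 3, 5, 8, 13]
print(intersection([1, 2, 2, 3, 4], [2, 4, 6]))  # [2, 4]
print(union([1, 2, 2, 3, 4], [2, 4, 6]))         # [1, 2, 3, 4, 6]
print(frequencies("a b a c b a".split()))      # {'a': 3, 'b': 2, 'c': 1}
print(squares)                                 # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
```

If order does not matter, `set(xs) & set(ys)` and `set(xs) | set(ys)` are the shortest answers, but it is worth mentioning to the interviewer that sets drop duplicates and ordering.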