Good morning. It has been quite a long time since I last wrote on the site. Today I would like to share some questions that I faced during my data science interviews at TCS and other companies. You can print this webpage and keep it with you as a handy guide for your preparation.

## Statistics

- Explain outlier detection with a boxplot
- What is the IQR (interquartile range)?
- What are the various measures used to summarize a distribution? (They may also give a certain distribution as an example.)
- What is the difference between Point Estimates and Confidence Interval?
- What is p-value and what is its significance?
- What are the different types of error in hypothesis testing?
- What are the assumptions of Linear Regression?
- What is the difference between Correlation and Covariance?
- What is bias-variance trade-off?
- What is a confusion matrix?
- What is the goal of A/B Testing?
- What is the Law of Large Numbers?
- What is the Central Limit Theorem?

You can find the answer to all of these on our website. You can also check out our data science resources page.
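For the boxplot and IQR questions above, here is a minimal sketch of the usual 1.5 × IQR rule in plain Python. The function name `iqr_outliers` and the sample data are my own, made up for illustration. Note that quartile conventions differ between tools; this sketch assumes the "inclusive" method of `statistics.quantiles`.

```python
import statistics


def iqr_outliers(values):
    """Return (iqr, lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # Q1 and Q3 are the 25th and 75th percentiles; the middle value is the median.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Points beyond these fences are what a boxplot draws as individual dots.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return iqr, lower_fence, upper_fence, outliers


# Hypothetical sample: nine ordinary values and one extreme value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
iqr, lower_fence, upper_fence, outliers = iqr_outliers(data)
print(iqr, lower_fence, upper_fence, outliers)
```

In an interview it is worth mentioning that 1.5 is a convention, not a law, and that the fences depend on how the quartiles are computed.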

## Visualization & EDA

Before moving into the topic, let me introduce you to the newest part of our website, data viz, so definitely check that out!

- Univariate vs Bivariate analysis – types of distributions in each
- Handling missing values and outliers using visualizations
- Sliding window visualizations
- What are the different data types supported by Tableau?
- What are the types of joins in Tableau?
- What are the different filters in Tableau? Differentiate.
- Explain some important features of Power BI.
- What is the difference between Managed Enterprise BI and Self-service BI?

## ML & DL

A lot of you may have been waiting for this part, so here goes –

- Logistic regression basics – cost function, usage
- Why logistic regression is called regression
- Why Naive Bayes is called naive
- Assumptions of Naive Bayes
- Outlier detection
- Decision trees
- Explain how decision trees work
- Bagging vs boosting
- Ensemble: random forest
- What is logit in deep learning
- kernel in SVM
- Hinge loss
- Cross entropy loss function
- Categorical Cross entropy vs binary cross entropy vs multi label cross entropy
- Gradient descent and its variants
- Activation functions: sigmoid, tanh, relu ranges and uses
- LSTM and the 3 gates
- Your favourite ML algorithm – what is it and why?
- Bias vs variance tradeoff
- Confusion Matrix and accuracy metrics (precision / recall )
- Word2vec in layman terms
- Underfitting vs overfitting
- Regularisation
- Ridge vs lasso
- K means clustering
- Vanishing gradient
- Regularisation in Neural Networks
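To make the activation function and cross entropy questions above a bit more concrete, here is a small sketch in plain Python. The helper names and the `eps` clipping value are my own choices for illustration, not something from a particular library.

```python
import math


def sigmoid(x):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    """Returns 0 for negative inputs and x otherwise, so the range is [0, inf)."""
    return max(0.0, x)


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs (eps is an assumed value).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# tanh comes from the standard library; its range is (-1, 1).
for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), math.tanh(x), relu(x))

labels = [1, 0, 1, 0]
good_loss = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
poor_loss = binary_cross_entropy(labels, [0.4, 0.6, 0.3, 0.7])
print(good_loss, poor_loss)
```

Confident and correct predictions give a small loss, while confident and wrong predictions are punished heavily, which is the point interviewers usually want to hear.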

And with that, we move on. Again, if you need these topics in more detail let us know in the comments 😀

## Recommendation Systems

Let me add a quick note before we move forward. There is one question that came up quite a few times, so I wanted to mention that –

- How do we evaluate whether a recommender system is performing better than the old system?
- What are the main differences between IoT and big data in recommendation systems?
- Are recommendation systems good for us?
- How is reinforcement learning involved in a recommendation system?
- Recommendation engines / recommender systems: what are the software platforms, approaches, and algorithms?

## Natural language processing

- TF-IDF
- Why do we use IDF?
- word2vec
- Stemming
- lemmatization
- RNN
- long sequence problem
- LSTM

## NLP Advanced

- attention mechanism
- context vector
- attention networks
- encoder-decoder
- transformers
- BERT

## Keras

Now some questions I got on the Keras deep learning framework:

- shape of LSTM input in the case of word embeddings
- return_state – what is it?
- What is the return_sequences parameter, and how does it affect the output shape?
- What are the types of layers in Keras?
- What is sequence preprocessing in Keras?
- What is an activation function?
- What are the different types of initializers in Keras?

## Basic python & coding

This section is also quite important, and please don’t overlook it.

- List vs tuple
- Pull request in git
- Fibonacci sequence in python
- Intersection/ union of 2 lists in python
- Frequency of elements in a list in python
- What’s PEP 8?
- How is a linked list implemented in Python?
- How is memory management done in Python?
- Is Python a compiled language or an interpreted language?
- What are Decorators?
- What is the difference between Mutable datatype and Immutable datatype? Is string mutable?
- What is Dictionary Comprehension? Give an Example
- What is the difference between the xrange and range functions? (xrange exists only in Python 2.)
- What is monkey patching in Python?
- Define encapsulation in Python.
- How do you do data abstraction in Python?
- What is `__init__()` in Python?
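Three of the coding questions above (Fibonacci, intersection/union of two lists, and element frequency) come up again and again, so here is one possible way to answer them. This is only a sketch; the function names are my own, and there are many other valid approaches (recursion, generators, plain `set` operations if order does not matter).

```python
from collections import Counter


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0."""
    seq = []
    a, b = 0, 1
    for _ in range(n):
        seq.append(a)
        a, b = b, a + b
    return seq


def intersection(xs, ys):
    """Elements of xs that also appear in ys, in xs order, without duplicates."""
    ys_set = set(ys)  # set lookup is O(1) on average
    seen = set()
    result = []
    for x in xs:
        if x in ys_set and x not in seen:
            seen.add(x)
            result.append(x)
    return result


def union(xs, ys):
    """All distinct elements of xs and ys, in order of first appearance."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys([*xs, *ys]))


def frequencies(items):
    """Map each element to the number of times it occurs."""
    return dict(Counter(items))


print(fibonacci(8))
print(intersection([1, 2, 2, 3], [2, 3, 4]))
print(union([1, 2, 2, 3], [2, 3, 4]))
print(frequencies(["a", "b", "a"]))
```

If the interviewer does not care about order or duplicates, `set(xs) & set(ys)` and `set(xs) | set(ys)` are shorter answers, so it is worth asking before you start coding.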