Machine Learning Engineer Interview
Most of the questions below are from https://brainstation.io/career-guides/machine-learning-engineer-interview-questions
What is the difference between supervised learning and unsupervised learning?
The biggest difference is that unsupervised learning does not require explicitly labeled data, while supervised learning does – before you can do a classification, you must label the data to train the model to classify data into the correct groups.
- What are the different types of machine learning?
- Supervised Learning, Unsupervised Learning, Reinforcement Learning
- What is deep learning, and how does it contrast with other machine learning algorithms?
- What are the differences between machine learning and deep learning?
- What is the difference between artificial intelligence and machine learning?
- Deep learning is a type of machine learning, which is a subset of artificial intelligence.
- Explain the confusion matrix with respect to machine learning algorithms.
- A confusion matrix is an N x N matrix used for evaluating the performance of a classification model, where N is the number of target classes.
- The matrix compares the actual target values with those predicted by the machine learning model.
- For a binary problem its cells are the counts of true positives (positive instances correctly predicted as positive), false positives, true negatives, and false negatives, and metrics such as precision and recall can be calculated from them.
- This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
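As a rough illustration of the N x N layout described above, here is a minimal sketch in plain Python. The function name `confusion_matrix`, the class labels, and the sample predictions are all made up for this example; rows are taken to be actual classes and columns predicted classes, which is one common convention rather than the only one.

```python
def confusion_matrix(actual, predicted, labels):
    """Build an N x N matrix: rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical three-class example (N = 3).
labels = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird", "cat"]
predicted = ["cat", "dog", "dog", "dog", "bird", "cat", "cat"]

matrix = confusion_matrix(actual, predicted, labels)

# Correct predictions sit on the diagonal; everything else is an error.
correct = sum(matrix[i][i] for i in range(len(labels)))
accuracy = correct / len(actual)

for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy)
```

Reading off-diagonal cells shows which classes get confused with which, for example one actual "bird" predicted as "cat" here.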
- What’s the trade-off between bias and variance?
- Bias is the error introduced by the simplifying assumptions a model makes to make the target function easier to approximate.
- Variance is the amount by which the estimate of the target function will change given different training data.
- The trade-off is the tension between the error introduced by bias and the error introduced by variance: reducing one typically increases the other.
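- From Wikipedia:
In statistics and machine learning, the bias-variance tradeoff (or dilemma) is the problem of simultaneously minimizing two sources of error (bias and variance) that prevent supervised learning algorithms from generalizing beyond their training set.
The bias-variance decomposition is one way of analyzing the expected error of a learning algorithm: it treats the error as the sum of bias, variance, and an irreducible error that is inherent in the data itself and cannot be reduced by any model. The bias-variance tradeoff applies to all forms of supervised learning, including classification, regression, and structured output learning. It has also been invoked to explain the effectiveness of heuristics in human learning.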
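For squared-error loss, the three-term decomposition mentioned above is usually written as follows. The notation is an assumption added here, not taken from the quoted text: the data are assumed to follow $y = f(x) + \varepsilon$ with zero-mean noise of variance $\sigma^2$, and $\hat{f}$ is the model fitted on a random training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The expectation is taken over training sets and noise, so the first two terms depend on the model while the last one does not.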
- Explain the difference between L1 and L2 regularization.
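- L1 regularization (lasso) adds the sum of the absolute values of the weights to the loss, which tends to drive some weights to exactly zero and yields sparse models. L2 regularization (ridge) adds the sum of the squared weights, which shrinks all weights toward zero without eliminating them. Both are used to avoid overfitting. The often-quoted intuition that L1 estimates the median of the data while L2 estimates the mean applies to L1 and L2 loss functions rather than to the penalty terms.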
- Overfitting occurs when the model performs accurately on training data but fails to perform well on test data, producing high error due to factors such as collinearity, the bias-variance trade-off, and over-modeling on the training data.
- For example, overfitting takes place when the model learns the noise in the training data as well as the signal, and therefore performs poorly on new data it was not trained on.
- In short, overfitting means low error on the training dataset and high error on the test dataset.
- Various methods can be adopted to avoid overfitting, such as cross-validation, reducing the number of features, pruning, and regularization.
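To make the L1 versus L2 regularization contrast more concrete, here is a small sketch in plain Python. It applies one shrinkage (proximal) step for each penalty to a hypothetical weight vector; the function names, the weights, and the strength `lam` are assumptions chosen for illustration, not part of any particular library.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)


def l1_shrink(weights, lam):
    """Soft-thresholding: move each weight toward zero by lam, clipping at zero."""
    shrunk = []
    for w in weights:
        if w > lam:
            shrunk.append(w - lam)
        elif w < -lam:
            shrunk.append(w + lam)
        else:
            shrunk.append(0.0)
    return shrunk


def l2_shrink(weights, lam):
    """Shrinkage step for lam * sum(w^2): rescale every weight by the same factor."""
    return [w / (1.0 + 2.0 * lam) for w in weights]


weights = [3.0, -0.2, 0.05, -1.5]  # hypothetical model weights
lam = 0.5

l1_result = l1_shrink(weights, lam)
l2_result = l2_shrink(weights, lam)

print("L1:", l1_result)  # small weights are set exactly to zero
print("L2:", l2_result)  # every weight is scaled down, none becomes zero
```

The L1 step removes the two small weights entirely, which is why L1 is often used for feature selection, while the L2 step keeps all features with reduced magnitude.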
- What’s your favorite algorithm, and can you explain it to me in less than a minute?
- My favorite algorithm is the Naive Bayes classifier, which is based on Bayes’ Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes classifier assumes that, within a class, the presence of a particular feature is unrelated to the presence of any other feature.
- How is KNN different from k-means clustering?
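- K-means is an unsupervised algorithm used for clustering: it groups unlabeled points around k centroids. KNN is a supervised algorithm used for classification (and regression): it predicts the label of a new point from the labels of its k nearest training points.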
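The contrast can be sketched side by side in plain Python. Both functions below are simplified illustrations with made-up data and hypothetical names; the point is that `knn_predict` needs labels while `kmeans` never sees them. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Supervised: majority vote among the labels of the k nearest training points."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate assignment and centroid update; no labels are used."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids


points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
labels = ["a", "a", "a", "b", "b", "b"]

predicted_label = knn_predict(points, labels, query=(2.0, 2.0), k=3)
final_centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])

print(predicted_label)
print(final_centroids)
```

Note that the "k" means different things: the number of neighbors consulted in KNN, and the number of clusters in k-means.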
- What is cross validation and what are different methods of using it?
- Cross-validation, sometimes called rotation estimation or out-of-sample testing, is a set of model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set.
- It is a resampling method that uses different portions of the data to test and train a model on different iterations.
- Common methods include the hold-out split, k-fold, stratified k-fold, and leave-one-out cross-validation.
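The idea of using different portions of the data on different iterations can be sketched with a simple k-fold splitter. This is a minimal, unshuffled version written for illustration; the function name `k_fold_indices` is hypothetical, and in practice one would usually shuffle the data first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))

for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample is used for testing exactly once and for training in the other k - 1 iterations; the model's scores on the k test folds are then averaged.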
- Explain how a ROC curve works.
- The ROC curve shows the trade-off between sensitivity (or TPR) and specificity (1 – FPR). Classifiers that give curves closer to the top-left corner indicate a better performance. As a baseline, a random classifier is expected to give points lying along the diagonal (FPR = TPR). The closer the curve comes to the 45-degree diagonal of the ROC space, the less accurate the test.
- The receiver operating characteristic (ROC) is a measure of classifier performance. It plots the proportion of positive data points that are correctly classified as positive (TPR) against the proportion of negative data points that are mistakenly classified as positive (FPR) as the decision threshold varies.
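A small sketch may help show where the points on the curve come from. The labels and scores below are invented for illustration, and the helper names `roc_points` and `auc` are hypothetical; the code sweeps the threshold over the observed scores and records one (FPR, TPR) pair per threshold.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs obtained by lowering the decision threshold step by step."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


labels = [1, 1, 0, 1, 0, 0]                 # 1 = positive, 0 = negative
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # hypothetical classifier scores

points = roc_points(labels, scores)
area = auc(points)

print(points)
print("AUC:", area)
```

A random classifier would give an area close to 0.5 (the diagonal), while a perfect ranking of positives above negatives gives 1.0.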
- What’s the difference between probability and likelihood?
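- Probability describes how likely a particular outcome is when the model parameters are fixed, whereas likelihood describes how well particular parameter values explain data that has already been observed. Likelihood is the quantity we maximize when fitting a model (maximum likelihood estimation).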
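A coin-flip example can make the distinction tangible. The sketch below uses the same binomial formula in both roles; the scenario (10 flips, 7 heads) and the grid of candidate parameters are assumptions made up for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)


# Probability: the parameter is fixed (fair coin, p = 0.5) and we ask about an outcome.
prob_7_heads = binomial_pmf(7, 10, 0.5)

# Likelihood: the data are fixed (7 heads in 10 flips) and we compare parameter values.
candidates = [i / 100 for i in range(1, 100)]
best_p = max(candidates, key=lambda p: binomial_pmf(7, 10, p))

print("P(7 heads | p = 0.5):", prob_7_heads)
print("p that maximizes the likelihood:", best_p)
```

The same function is read as a probability when `p` is held fixed and as a likelihood when the observed counts are held fixed and `p` varies.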
- What’s the difference between a generative and discriminative model?
- In simple terms, a discriminative model learns the conditional probability of the target given the input and uses it to make predictions on unseen data, for either classification or regression problems. In contrast, a generative model learns the distribution of the data itself, so it can return a probability for a given example.
- How is a decision tree pruned?
- How can you choose a classifier based on a training set size?
- If the training set is small, high bias / low variance models (e.g. Naive Bayes) tend to perform better because they are less likely to overfit.
- If the training set is large, low bias / high variance models (e.g. k-nearest neighbors or decision trees) tend to perform better because they can reflect more complex relationships.
- What methods for dimensionality reduction do you know and how do they compare with each other?
- PCA (Principal Component Analysis) and High Correlation Filter are my favorite methods for dimensionality reduction.
- PCA is one of the most common feature extraction methods.
- PCA is a technique which helps us in extracting a new set of variables from an existing large set of variables. These newly extracted variables are called Principal Components.
- A principal component is a linear combination of the original variables
- Principal components are extracted in such a way that the first principal component explains maximum variance in the dataset
- The second principal component tries to explain the remaining variance in the dataset and is uncorrelated to the first principal component
- The third principal component tries to explain the variance which is not explained by the first two principal components and so on
- High Correlation Filter
- We can calculate the correlation between pairs of independent numerical variables. If the correlation coefficient exceeds a certain threshold value, we can drop one of the two variables, since they carry similar information.
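The high correlation filter described above can be sketched in a few lines of plain Python. The column names, the data, and the 0.9 threshold are assumptions for illustration; the greedy "keep the first, drop later duplicates" rule is one simple choice among several.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def high_correlation_filter(columns, threshold=0.9):
    """Keep a column only if it is not highly correlated with a column already kept."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical dataset: two columns measure the same thing in different units.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [35.0, 22.0, 58.0, 41.0, 29.0],
}

kept = high_correlation_filter(columns)
print(kept)
```

Unlike PCA, this filter keeps original variables rather than building new ones, so the reduced dataset stays directly interpretable.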
- Define precision and recall.
- In pattern recognition, information retrieval and classification (machine learning), precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.
- Precision (also called positive predictive value) is the fraction of relevant instances among the retrieved instances, while recall (also known as sensitivity) is the fraction of relevant instances that were retrieved. Both precision and recall are therefore based on relevance.
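In the retrieval setting used by these definitions, both metrics reduce to set arithmetic. The document IDs below are invented and the function name is hypothetical; this is a sketch of the definitions rather than any library's API.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for a set of retrieved items against a set of relevant items."""
    hits = len(retrieved & relevant)  # relevant items that were actually retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc5", "doc6", "doc7"}

precision, recall = precision_recall(retrieved, relevant)
print("precision:", precision)
print("recall:", recall)
```

Here two of the four retrieved documents are relevant, and two of the five relevant documents were found, so the two numbers answer different questions about the same result set.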
- What’s a Fourier transform?
- What’s the difference between Type I and Type II error?
- When should you use classification over regression?
- How would you evaluate a logistic regression model?
- What is Bayes’ Theorem? How is it useful in a machine learning context?
- Bayes theorem provides a way to calculate the probability of a hypothesis based on its prior probability, the probabilities of observing various data given the hypothesis, and the observed data itself.
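A short worked example may help. The numbers below (a 1% prior, a test that detects the condition 90% of the time and raises a false alarm 5% of the time) are invented for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(hypothesis | positive evidence) via Bayes' theorem for a binary hypothesis."""
    # Total probability of seeing the evidence, whether or not the hypothesis holds.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print("P(condition | positive test):", round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below 50% because the prior is so small. The same update from prior to posterior underlies classifiers such as Naive Bayes.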
- Describe a hash table.
- A hash table is a data structure that stores data in an associative manner, as key-value pairs. The data is stored in an array, and a hash function maps each key to an index in that array. Access becomes very fast because the index of the desired data can be computed directly from its key.
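A minimal sketch of such a structure in Python follows. It resolves collisions by separate chaining, which is an assumption made for this example (open addressing is another common choice), and the class and method names are hypothetical. Python's built-in `dict` already provides this behavior in practice.

```python
class HashTable:
    """Tiny hash table using separate chaining; for illustration only."""

    def __init__(self, capacity=8):
        # Each slot of the array holds a list ("bucket") of (key, value) pairs.
        self.buckets = [[] for _ in range(capacity)]

    def _index(self, key):
        # The hash function maps a key to a position in the array.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default


table = HashTable()
table.put("apple", 3)
table.put("banana", 5)
table.put("apple", 7)  # replaces the earlier value for "apple"

print(table.get("apple"), table.get("banana"), table.get("cherry"))
```

Lookups only scan the one bucket the key hashes to, which is why average access time stays close to constant when the table is not overloaded.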