Machine Learning

Machine Learning and Artificial Intelligence are among the trending technologies of 2019, and Machine Learning (ML) is globally recognized as a key driver of digital transformation. So if you want to become a part of this field, you need sound knowledge, and these interview questions can help you to some extent. Check them out now!

Q.1 Differentiate between supervised and unsupervised machine learning.
Supervised learning requires labelled training data. For instance, in order to do classification (a supervised learning task), we first need to label the data that will be used to train the model to classify data into the labelled groups. Unsupervised learning, on the other hand, does not require labelling the data explicitly.
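The contrast above can be made concrete with a tiny sketch on toy 1-D data (pure Python; the functions and data here are illustrative, not from any library):

```python
# Supervised: each training point carries a label.
labelled = [(1.0, "small"), (1.2, "small"), (8.9, "large"), (9.3, "large")]
# Unsupervised: the same kind of points, but with no labels at all.
unlabelled = [1.1, 9.0, 1.3, 8.8]

# Supervised learning: use the labels to classify a new point
# (a 1-nearest-neighbour rule).
def classify(x, training_data):
    return min(training_data, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised learning: no labels available, so just split the points
# into two groups around the midpoint of the observed range
# (a crude 2-cluster rule).
def cluster(points):
    midpoint = (min(points) + max(points)) / 2
    return [0 if p < midpoint else 1 for p in points]

print(classify(1.1, labelled))   # -> "small"
print(cluster(unlabelled))       # -> [0, 1, 0, 1]
```

The supervised function can name the group ("small") because the labels were given; the unsupervised one can only discover group structure (cluster 0 vs cluster 1).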
Q.2 How does a ROC curve work?
The ROC curve is a graphical representation of the contrast between the true positive rate and the false positive rate at various thresholds. It is often used as a proxy for the trade-off between the sensitivity of the model (true positives) and the fall-out, or the probability that it will trigger a false alarm (false positives).
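A minimal sketch of how each point on the curve is computed, assuming hypothetical binary labels and model scores (no library required; in practice you would use something like scikit-learn's `roc_curve`):

```python
labels = [1, 1, 0, 1, 0, 0]               # ground-truth classes
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # model-predicted probabilities

def roc_point(threshold):
    # Predict positive whenever the score meets the threshold.
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    p = sum(labels)          # total positives
    n = len(labels) - p      # total negatives
    return fp / n, tp / p    # (false positive rate, true positive rate)

# Sweeping the threshold traces out the ROC curve.
for t in (0.85, 0.5, 0.1):
    print(t, roc_point(t))
```

Lowering the threshold catches more true positives but also triggers more false alarms, which is exactly the trade-off the curve visualizes.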
Q.3 What is your favourite algorithm, and can you explain it in a minute?
Such a question is asked to assess the candidate's ability to communicate complex technical nuances with poise, and to summarize quickly and efficiently. It is very important to have a favourite ready, and to be able to explain different algorithms so simply and effectively that a child could grasp the basics quickly.
Q.4 How can you explain a Fourier transform?
The Fourier transform is a generic method to decompose generic functions into a superposition of symmetric functions. As one intuitive tutorial puts it: given a smoothie, it's how we find the recipe. The Fourier transform finds the set of cycle speeds, amplitudes, and phases that match any time signal. It converts a signal from the time domain to the frequency domain, which makes it a very common way to extract features from audio signals or other time series such as sensor data.
Q.5 What do you understand by deep learning, and how does it contrast with other machine learning algorithms?
Deep learning is a subset of machine learning concerned with neural networks: it uses backpropagation and certain principles from neuroscience to more accurately model large sets of unlabelled or semi-structured data. In contrast to many classical machine learning algorithms, which rely on hand-crafted features, deep learning learns representations of the data automatically through the use of neural nets.
Q.6 Differentiate between a generative and discriminative model?
A generative model learns how the data in each category is generated (the distribution of the data), while a discriminative model simply learns the distinction between the different categories. Discriminative models will generally outperform generative models on classification tasks.
Q.7 What do you understand by the F1 score and how can you use it?
The F1 score is a measure of a model's performance. It is the weighted average (harmonic mean) of the precision and recall of the model, with results tending towards 1 being the best and those tending towards 0 being the worst. You would use it in classification tests where true negatives don't matter much.
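The computation can be sketched end-to-end on toy predictions (pure Python; in practice scikit-learn's `f1_score` does this for you):

```python
y_true = [1, 1, 1, 0, 0, 1]   # ground truth
y_pred = [1, 1, 0, 0, 1, 1]   # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

precision = tp / (tp + fp)    # of everything predicted positive, how much was right
recall = tp / (tp + fn)       # of everything actually positive, how much was found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(precision, recall, f1)  # 0.75 0.75 0.75
```

Note that true negatives appear nowhere in the formula, which is why F1 suits tasks where they don't matter much.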
Q.8 Give examples where ensemble techniques might be considered useful.
Ensemble techniques use a combination of learning algorithms to achieve better predictive performance; they typically reduce overfitting and make the model more robust. You can then list some examples of ensemble methods, from bagging to boosting to a "bucket of models" method, and demonstrate how they can increase predictive power.
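As a hedged sketch of bagging on toy 1-D data: train several simple threshold "stumps" on bootstrap samples and combine them by majority vote (pure Python; the data and helper names are illustrative, not from any library):

```python
import random

# Toy labelled data: feature value, class.
data = [(0.5, 0), (1.0, 0), (1.5, 0), (3.5, 1), (4.0, 1), (4.5, 1)]

def train_stump(sample):
    # Decision threshold = midpoint between the two class means.
    zeros = [x for x, y in sample if y == 0]
    ones = [x for x, y in sample if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

random.seed(0)
stumps = []
while len(stumps) < 5:
    # Bootstrap sample: draw with replacement, same size as the data.
    sample = [random.choice(data) for _ in range(len(data))]
    if len({y for _, y in sample}) == 2:   # need both classes present
        stumps.append(train_stump(sample))

def predict(x):
    # Majority vote over the ensemble of stumps.
    votes = [1 if x >= t else 0 for t in stumps]
    return int(sum(votes) > len(votes) / 2)

print(predict(0.8), predict(4.2))   # -> 0 1
```

Each stump sees a slightly different resample of the data, so the vote averages away the noise any single stump picked up, which is the overfitting-reduction effect described above.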
Q.9 How can we ensure that we are not overfitting with a model?
This is a simple restatement of a fundamental problem in machine learning: the possibility of overfitting the training data and carrying the noise of that data through to the test set, thereby producing inaccurate generalizations. There are three main methods to avoid overfitting:
1. Keep the model simpler: reduce variance by taking into account fewer variables and parameters, thereby removing some of the noise in the training data.
2. Use cross-validation techniques such as k-folds cross-validation.
3. Use regularization techniques such as LASSO that penalize certain model parameters if they’re likely to cause overfitting.
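Method 2 above can be sketched as a splitting routine: k-fold cross-validation puts each data point in exactly one validation fold, so every point is used for both training and validation (pure Python; `k_fold_indices` is an illustrative name, real code would use scikit-learn's `KFold`):

```python
def k_fold_indices(n, k):
    # Deal the n indices round-robin into k folds.
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i, val in enumerate(folds):
        # Each fold takes a turn as the validation set;
        # all the other folds form the training set.
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((sorted(train), val))
    return splits

for train, val in k_fold_indices(6, 3):
    print(train, val)
# [1, 2, 4, 5] [0, 3]
# [0, 2, 3, 5] [1, 4]
# [0, 1, 3, 4] [2, 5]
```

Averaging the validation score over all k splits gives an estimate of generalization performance that is much harder to overfit than a single train/test split.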
Q.10 How can we handle missing or corrupted data in a dataset?
We could find missing/corrupted data in a dataset and either drop those rows or columns, or decide to replace the values with another value. In pandas there are two very useful methods, isnull() and dropna(), that help you find columns of data with missing or corrupted values and drop those values. If you want to fill the invalid values with a placeholder value (for example, 0), you can use the fillna() method.
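A short sketch of the three methods named above, assuming pandas is available (the column names are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31],
                   "score": [0.9, 0.7, np.nan]})

missing_per_column = df.isnull().sum()   # count missing values per column
dropped = df.dropna()                    # drop rows containing any NaN
filled = df.fillna(0)                    # replace NaN with a placeholder

print(missing_per_column.to_dict())   # {'age': 1, 'score': 1}
print(len(dropped))                   # 1 complete row survives
```

Whether to drop or fill depends on how much data you can afford to lose and whether a placeholder would bias the model.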
Q.11 What kind of experience do you have with Spark or big data tools for machine learning?
To answer this question, it is very important to be familiar with what big data means to different companies and the different tools they will want. Spark is the big data tool most in demand right now, able to handle immense datasets with speed. But if you do not have experience with the tools demanded, take a look at job descriptions and see what other tools pop up that would be of interest.
Q.12 Differentiate between a linked list and an array?
Firstly, an array is an ordered collection of objects, whereas a linked list is a series of objects with pointers that direct how to process them sequentially. Secondly, an array assumes that every element has the same size, while a linked list can more easily grow organically: an array has to be pre-defined or re-defined for organic growth. Thirdly, shuffling a linked list involves only changing which pointers direct where, whereas shuffling an array is more complex and takes more memory.
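The contrast can be sketched with a toy singly linked list next to a Python list, which is array-like (pure Python; the `Node` class is illustrative):

```python
class Node:
    """One element of a singly linked list: a value plus a pointer."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

# Build 1 -> 2 -> 3 by prepending at the head (O(1) per insert:
# the list grows organically, no resizing needed).
head = None
for v in (3, 2, 1):
    head = Node(v, head)

def to_list(node):
    # Linked lists are processed sequentially by following pointers.
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out

array = [1, 2, 3]
print(array[1])        # array: O(1) random access -> 2
print(to_list(head))   # linked list: sequential traversal -> [1, 2, 3]
```

The array gives constant-time indexed access; the linked list gives cheap growth and reordering by pointer rewiring, matching the trade-offs above.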
Q.13 How can you implement a recommendation system for the company’s users?
In general, there are a lot of machine learning interview questions that involve applying machine learning models to a company's problems. You are required to research the company and its industry in depth, especially the company's revenue drivers and the types of users the company serves in the context of the industry it is in.
Q.14 How should we implement your machine learning skills in order to generate revenue?
This is one of the trickier questions, and your answer will demonstrate knowledge of what drives the business and how your skills could relate. For instance, if you were interviewing for a music-streaming company, you could remark that your skills at developing a better recommendation model would increase user retention, which would in turn increase revenue in the long run.
Q.15 What do you know about our current data process?
This type of question needs a well-researched answer that imparts feedback in a manner that is constructive and insightful. The interviewer is trying to gauge whether you would be a valuable member of their team and whether you can grasp the nuances of why certain things are set up the way they are in the company's data process, based on company- or industry-specific conditions. The interviewer wants to see whether you can be an intellectual peer to them.
Q.16 Where do you usually source datasets?
This kind of question tries to get at the heart of your machine learning interest. Someone who is truly passionate about machine learning will have gone off and done side projects on their own, and will hold a good idea of what great datasets are out there.
Q.17 Illustrate some factors that explain the success and rise of deep learning.
The success of deep learning in the past decade can be explained by three main factors:
1. More data - The availability of massive labelled datasets allows us to train models with more parameters and achieve state-of-the-art scores. Other ML algorithms do not scale as well as deep learning when it comes to dataset size.
2. GPUs - Training models on a GPU can reduce the training time by orders of magnitude compared to training on a CPU. Currently, cutting-edge models are trained on multiple GPUs or even on specialized hardware.
3. Improvements in algorithms - The ReLU activation, dropout, and complex network architectures have also been very significant factors.
Q.18 What do you understand by data augmentation and can you illustrate some examples?
Data augmentation is a technique for synthesizing new data by modifying existing data in such a way that the target is not changed, or is changed in a known way. Computer vision is one of the fields where data augmentation is very useful. There are many modifications we can make to images, such as resizing, horizontal or vertical flips, rotations, adding noise, deformations, modifying colours, etc. Each problem needs a customized data augmentation pipeline. For instance, in OCR, doing flips will change the text and won't be beneficial; however, resizes and small rotations may help.
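Two of the label-preserving modifications above can be sketched on a tiny 2x3 "image" represented as a nested list (pure Python; real pipelines use libraries such as torchvision or albumentations):

```python
image = [[1, 2, 3],
         [4, 5, 6]]

def horizontal_flip(img):
    # Mirror each row left-to-right.
    return [row[::-1] for row in img]

def rotate_90(img):
    # Rotate 90 degrees clockwise: transpose, then reverse each row.
    return [list(row)[::-1] for row in zip(*img)]

print(horizontal_flip(image))   # [[3, 2, 1], [6, 5, 4]]
print(rotate_90(image))         # [[4, 1], [5, 2], [6, 3]]
```

For a classification target like "cat", both transforms leave the label unchanged, which is what makes the synthesized images valid extra training data.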
Q.19 What is a hash table?
We can define a hash table as a data structure that implements an associative array: a key is mapped to certain values through the use of a hash function. Hash tables are often used for tasks such as database indexing.
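A minimal hash table with separate chaining can be sketched as follows (pure Python; the `HashTable` class is illustrative — Python's built-in dict is a production-grade hash table):

```python
class HashTable:
    def __init__(self, n_buckets=8):
        # Each bucket holds a list of (key, value) pairs (a "chain").
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        # The hash function maps a key to one of the buckets.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # overwrite an existing key
                return
        bucket.append((key, value))        # or append a new pair

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default

table = HashTable()
table.put("user_42", "Alice")
print(table.get("user_42"))   # Alice
```

Chaining handles collisions (two keys hashing to the same bucket) by scanning the short per-bucket list, keeping average lookup close to constant time.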