Pandas Interview Questions

Check out Vskills interview questions and answers on Pandas to prepare for your next job role. The questions are submitted by professionals to help you prepare for the interview.

Q.1 What is Pandas?
Pandas is an open-source Python library that provides data structures and data analysis tools for working with structured data, such as tables and time series data.
Q.2 What are the two primary data structures in Pandas?
The two primary data structures in Pandas are Series and DataFrame.
Q.3 What is a Series in Pandas?
A Series is a one-dimensional array-like object in Pandas that can hold data of various types. It is similar to a column in a spreadsheet or a labeled array.
Q.4 What is a DataFrame in Pandas?
A DataFrame is a two-dimensional, tabular data structure in Pandas that consists of rows and columns. It is similar to a spreadsheet or a SQL table.
Q.5 How do you create a Series in Pandas?
You can create a Series by using the pd.Series() constructor and passing a list, array, or dictionary as data along with optional labels (index).
Q.6 How do you create a DataFrame in Pandas?
You can create a DataFrame by using the pd.DataFrame() constructor and passing data, columns, and index labels as arguments.
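As a minimal sketch of the two constructors described above (the labels, column names, and values are made up for illustration):

```python
import pandas as pd

# Series from a list, with explicit index labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Series from a dictionary: the keys become the index labels.
s_from_dict = pd.Series({"x": 1.5, "y": 2.5})

# DataFrame from a dictionary of columns, with an explicit row index.
df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [31, 45]},
    index=["r1", "r2"],
)

print(s["b"])     # value stored under label "b"
print(df.shape)   # (number of rows, number of columns)
```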
Q.7 What is the purpose of an index in a Pandas DataFrame?
The index in a Pandas DataFrame serves as a label for rows, allowing for efficient data access, alignment, and retrieval. It can be used to uniquely identify each row.
Q.8 How can you access the first few rows of a DataFrame?
You can use the df.head(n) method to access the first n rows of a DataFrame df.
Q.9 What is the primary function of the loc and iloc attributes in Pandas?
loc is used for label-based indexing (selecting rows and columns by labels), while iloc is used for integer-based indexing (selecting rows and columns by integer positions).
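The difference is easiest to see side by side. The sketch below uses a small, hypothetical DataFrame with string row labels; note that a loc slice includes its end label while an iloc slice excludes its end position.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [3.0, 4.5, 6.0], "qty": [10, 20, 30]},
    index=["apple", "pear", "plum"],
)

by_label = df.loc["pear", "qty"]      # row label, column label
by_position = df.iloc[1, 1]           # second row, second column

# Slicing: loc is inclusive of the end label, iloc excludes the end position.
loc_slice = df.loc["apple":"pear"]    # two rows
iloc_slice = df.iloc[0:1]             # one row
```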
Q.10 How do you select a single column from a DataFrame?
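You can select a single column by using square brackets (df['column_name']) or dot notation (df.column_name) on the DataFrame df. Dot notation works only when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method.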
Q.11 How do you select multiple columns from a DataFrame?
You can select multiple columns by passing a list of column names as df[['col1', 'col2']].
Q.12 How can you filter rows in a DataFrame based on a condition?
You can filter rows by using a boolean condition inside square brackets, such as df[df['column_name'] > value].
Q.13 How do you drop columns from a DataFrame?
You can drop columns using the df.drop(columns=['col1', 'col2']) method or by using the df.drop('col_name', axis=1) method to specify the column and axis.
Q.14 What is the purpose of the inplace parameter in Pandas methods?
The inplace parameter, when set to True, modifies the DataFrame in place and returns None instead of a new object. When set to False (the default), the method leaves the original unchanged and returns a new DataFrame with the changes.
Q.15 How can you handle missing data (NaN) in a DataFrame?
You can handle missing data by using methods like df.dropna(), df.fillna(value), or df.interpolate() to remove, fill, or interpolate missing values, respectively.
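A short sketch of the three approaches on a made-up DataFrame; which one is appropriate depends on the data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan]})

dropped = df.dropna()            # keep only rows with no missing values
filled = df.fillna(0)            # replace every NaN with 0
interpolated = df.interpolate()  # linear interpolation down each column
```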
Q.16 What is the significance of the groupby operation in Pandas?
The groupby operation allows you to group data based on one or more columns and perform aggregations or calculations within each group, facilitating data analysis and summarization.
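For example, assuming a hypothetical table of team scores, grouping and aggregating might look like this:

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["A", "A", "B", "B", "B"], "points": [10, 20, 5, 15, 25]}
)

# One aggregate per group.
totals = df.groupby("team")["points"].sum()

# Several aggregates per group at once.
summary = df.groupby("team")["points"].agg(["mean", "count"])
```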
Q.17 How do you calculate descriptive statistics for a DataFrame?
You can use the df.describe() method to calculate statistics like mean, standard deviation, min, max, and quartiles for numeric columns in a DataFrame.
Q.18 How can you sort a DataFrame by a specific column?
You can sort a DataFrame by a specific column using the df.sort_values(by='column_name') method, where 'column_name' is the name of the column to sort by.
Q.19 What is the purpose of the apply method in Pandas?
The apply method lets you apply a custom function along an axis of a DataFrame: to each column (axis=0, the default) or to each row (axis=1). On a Series, apply calls the function on each element.
Q.20 How do you merge two DataFrames in Pandas?
You can merge two DataFrames using the pd.merge() function, specifying the left and right DataFrames, keys, and the type of join (inner, outer, left, or right).
Q.21 What is the difference between an inner join and an outer join in Pandas?
In an inner join, only the matching rows from both DataFrames are included in the result. In an outer join, all rows from both DataFrames are included, with NaNs in non-matching areas.
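The following sketch contrasts the two join types on two small, made-up DataFrames that share a key column:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "right_val": ["x", "y", "z"]})

# Inner join: only keys present in both frames (2 and 3).
inner = pd.merge(left, right, on="key", how="inner")

# Outer join: every key from either frame; unmatched cells become NaN.
outer = pd.merge(left, right, on="key", how="outer")
```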
Q.22 How can you pivot data in Pandas using the pivot method?
You can pivot data using the df.pivot(index='row_column', columns='column_column', values='value_column') method, which reshapes the DataFrame based on specified columns.
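As an illustration, the sketch below reshapes a hypothetical long-format table of temperatures into one row per date and one column per city. pivot requires each index/column pair to be unique.

```python
import pandas as pd

long_df = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "temp": [-3, 12, -1, 14],
})

# Rows come from "date", columns from "city", cell values from "temp".
wide = long_df.pivot(index="date", columns="city", values="temp")
```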
Q.23 What is the purpose of the pd.concat() function in Pandas?
The pd.concat() function is used to concatenate multiple DataFrames along a specified axis, either row-wise (axis=0) or column-wise (axis=1), creating a new DataFrame.
Q.24 What do you understand by Pandas?
Pandas is a Python package that provides fast, flexible, and expressive data structures for working with relational or labelled data easily, which helps in data analysis. Pandas is free software released under the three-clause BSD license.
Q.25 How can you rename columns in a Pandas DataFrame?
You can rename columns using the df.rename(columns={'old_name': 'new_name'}) method, where 'old_name' is the current column name, and 'new_name' is the new column name.
Q.26 What is the utility of Python pandas?
Pandas is a Python programming language library which is used for data manipulation and analysis by providing data structures and operations for manipulating numerical tables and time series.
Q.27 How do you create a new column in a DataFrame based on existing columns?
You can create a new column by using the assignment operator (df['new_column'] = ...) and specifying the calculation or expression based on existing columns.
Q.28 What do you understand by a Series in Pandas?
A Pandas Series is a one-dimensional labelled array that can store data of any type (integer, string, float, Python objects, etc.), similar to a column in an Excel sheet.
Q.29 How can you apply a function to each element in a DataFrame column?
You can apply a function to a column using the df['column'].apply(function) method, where function is the custom function to be applied to each element in the column.
Q.30 What is the use of Reindexing in Pandas?
Reindexing in Pandas reorders the existing data to match a new set of labels, inserts missing value (NA) markers in label locations where no data for that label existed, and, if specified, fills data for missing labels using logic such as forward fill (highly relevant to working with time series data).
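A minimal sketch of the behaviours listed above, using made-up labels and dates: reordering, NaN insertion for unknown labels, a constant fill value, and forward filling on a date index.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

# Reorder and add a label ("d") that has no data: it becomes NaN.
reordered = s.reindex(["c", "a", "d"])

# Same, but fill labels without data with a constant.
filled = s.reindex(["c", "a", "d"], fill_value=0.0)

# Time series: observations every two days, reindexed to daily frequency
# with forward fill so each gap takes the last known value.
ts = pd.Series([10, 20, 30], index=pd.date_range("2024-01-01", periods=3, freq="2D"))
daily = ts.reindex(pd.date_range("2024-01-01", "2024-01-05", freq="D"), method="ffill")
```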
Q.31 How can you change the data type of a column in a DataFrame?
You can change the data type of a column using the df['column_name'].astype(new_dtype) method, where new_dtype is the desired data type (e.g., 'int', 'float', 'str').
Q.32 What will list the top rows of the frame in Pandas?
In Pandas, df.head() lists the top rows of the frame (the first five by default).
Q.33 What is the purpose of the pd.to_datetime() function in Pandas?
The pd.to_datetime() function is used to convert a column of date-like or string-like values to a datetime data type, making it suitable for date and time manipulation.
Q.34 List the different Types of Data structures in pandas?
There are two primary data structures in pandas, Series and DataFrame, both built on top of NumPy. Series is the one-dimensional data structure and DataFrame is the two-dimensional data structure.
Q.35 How do you perform arithmetic operations on DataFrame columns?
You can perform arithmetic operations on columns by using operators like +, -, *, /, or by using methods like df['col1'] + df['col2'] or df['col1'].add(df['col2']).
Q.36 How do you create a copy of a Series in Pandas?
The Series.copy() method creates a copy of a Series.
Q.37 What is the role of the pivot_table method in Pandas?
The pivot_table method allows you to create pivot tables, summarizing data by aggregating values in rows and columns, with options for handling missing values and duplicate entries.
Q.38 What do you understand by Time Series in pandas?
A time series in pandas refers to an ordered sequence of data representing changes over time.
Q.39 How can you handle duplicate rows in a DataFrame?
You can handle duplicate rows using the df.drop_duplicates() method to remove duplicates based on specified columns, keeping the first occurrence by default.
Q.40 How does pandas support Time Series data?
Pandas supports time series data by parsing time series information from various sources and formats, generating sequences of fixed-frequency dates and time spans, manipulating and converting date times with timezone information, resampling or converting a time series to a particular frequency, and performing date and time arithmetic with absolute or relative time increments.
Q.41 How do you reset the index of a DataFrame?
You can reset the index of a DataFrame using the df.reset_index() method, which replaces the index with the default integer index (0, 1, 2, ...) and moves the old index into a column. Pass drop=True to discard the old index instead.
Q.42 What do you understand by Categorical Data in Pandas?
Categorical data in Pandas corresponds to categorical variables in statistics, which take on a limited, and usually fixed, number of possible values such as gender, social class, blood type, or country affiliation. Each value of a categorical is either one of the categories or np.nan.
Q.43 How can you select rows in a DataFrame based on multiple conditions?
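You can select rows based on multiple conditions by combining boolean masks with the & (and) and | (or) operators, wrapping each condition in parentheses, such as df[(df['col1'] > value1) & (df['col2'] == value2)].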
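A short sketch with hypothetical columns. Each condition is wrapped in parentheses because & and | bind more tightly than comparison operators in Python.

```python
import pandas as pd

df = pd.DataFrame(
    {"age": [22, 35, 47, 51], "city": ["Oslo", "Rome", "Oslo", "Rome"]}
)

# Both conditions must hold (element-wise "and").
both = df[(df["age"] > 30) & (df["city"] == "Oslo")]

# At least one condition must hold (element-wise "or").
either = df[(df["age"] < 25) | (df["city"] == "Rome")]
```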
Q.44 How is Categorical Data used in Pandas?
Categorical data in Pandas is useful in several cases. Converting a column to a categorical and specifying an order on the categories makes sorting and min/max use the logical order instead of the lexical order. Converting a string variable that has only a few distinct values to a categorical saves memory. The categorical type also signals to other Python libraries that the column should be treated as a categorical variable.
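To illustrate the point about logical versus lexical order, here is a sketch with made-up size labels, using pd.CategoricalDtype to declare the order:

```python
import pandas as pd

sizes = pd.Series(["medium", "small", "large", "small"])

# Plain strings sort alphabetically.
lexical = sizes.sort_values().tolist()

# An ordered categorical sorts by the declared category order.
size_type = pd.CategoricalDtype(["small", "medium", "large"], ordered=True)
ordered = sizes.astype(size_type)
logical = ordered.sort_values().tolist()

smallest, largest = ordered.min(), ordered.max()
```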
Q.45 What is the purpose of the crosstab function in Pandas?
The crosstab function is used to compute a cross-tabulation (frequency table) of two or more factors, providing a summary of the relationships between categorical variables.
Q.46 How to create a series in Pandas from dict in Python?
To create a Series from a dict, pass the dict to pd.Series() without the index parameter; the dictionary keys become the index labels and the values become the data.
Q.47 How do you perform string operations on DataFrame columns?
You can perform string operations on columns containing text data using the .str accessor, which provides a wide range of string methods, such as .str.contains(), .str.split(), etc.
Q.48 What do you understand by a DataFrame in pandas?
A DataFrame in pandas is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labelled axes (rows and columns). A DataFrame has three components: the data, the rows, and the columns.
Q.49 What is the significance of the fillna method in Pandas?
The fillna method is used to fill missing or NaN values in a DataFrame or Series with specified values or strategies, ensuring that the data remains complete for analysis.
Q.50 How to create a Pandas DataFrame?
A Pandas DataFrame can be created by loading data from sources such as a SQL database, a CSV file, or an Excel file, or from Python objects such as lists, dictionaries, and lists of dictionaries.
Q.51 How do you calculate the correlation between columns in a DataFrame?
You can calculate the correlation between columns using the df.corr() method, which computes the Pearson correlation coefficient between numeric columns by default.
Q.52 How to create an empty DataFrame in pandas?
Calling the DataFrame() constructor with no arguments creates an empty DataFrame: pd.DataFrame(). For an empty DataFrame with three empty columns (X, Y, and Z), use pd.DataFrame(columns=['X', 'Y', 'Z']).
Q.53 What is the purpose of the pd.cut() function in Pandas?
The pd.cut() function is used for binning (categorizing) continuous data into discrete intervals, creating a new categorical column that represents data in a more structured way.
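As a sketch, the following bins some made-up ages into three labelled groups. The bin edges and labels are illustrative; by default each interval includes its right edge, so 17 falls into the first bin.

```python
import pandas as pd

ages = pd.Series([5, 17, 18, 40, 70])

# Intervals are (0, 17], (17, 64], (64, 120] with the default right=True.
groups = pd.cut(ages, bins=[0, 17, 64, 120], labels=["child", "adult", "senior"])
```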
Q.54 How to delete columns from a Pandas Data Frame?
The drop() method is used to delete columns from a DataFrame. Its axis argument selects what is dropped: axis=0 drops rows and axis=1 drops columns, as in df.drop('col_name', axis=1).
Q.55 How can you apply a custom function to a DataFrame row-wise?
You can apply a custom function to DataFrame rows using the df.apply(function, axis=1) method, where function operates on each row, receiving a Series representing the row.
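A minimal sketch with a hypothetical area function; each call receives one row as a Series keyed by column name.

```python
import pandas as pd

df = pd.DataFrame({"width": [2, 3], "height": [5, 7]})

def area(row):
    # row is a Series; its index holds the column names.
    return row["width"] * row["height"]

df["area"] = df.apply(area, axis=1)
```

For simple arithmetic like this, the vectorized form df["width"] * df["height"] gives the same result and is usually faster; row-wise apply is more useful when the logic cannot be expressed with column operations.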
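Q.56 How do you delete duplicate values from a column of a Pandas DataFrame?
To delete duplicate values from a column, call drop_duplicates() on that column, as in df['col'].drop_duplicates(), or pass subset=['col'] to df.drop_duplicates() to drop rows that repeat a value in that column.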
Q.57 What is the role of the merge method in Pandas?
The merge method combines two or more DataFrames by aligning and joining rows based on common columns or keys, similar to database joins (e.g., inner, outer, left, right).
Q.58 How do you remove duplicate rows from a Pandas DataFrame?
To remove duplicate rows from a Pandas DataFrame, use the df.drop_duplicates() method.
Q.59 How do you handle datetime data in Pandas?
You can handle datetime data in Pandas by using the pd.to_datetime() function, extracting date components (e.g., year, month) with .dt, and performing date arithmetic.
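The three steps mentioned above can be sketched as follows, with a made-up column of date strings:

```python
import pandas as pd

df = pd.DataFrame({"when": ["2024-03-15", "2024-12-01"]})

# Parse the strings into a datetime64 column.
df["when"] = pd.to_datetime(df["when"])

# Extract components through the .dt accessor.
df["year"] = df["when"].dt.year
df["month"] = df["when"].dt.month

# Date arithmetic: shift every date forward by one week.
df["plus_week"] = df["when"] + pd.Timedelta(days=7)
```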
Q.60 Which pandas function is used to create a scatter plot matrix?
The scatter_matrix function from the pandas.plotting module is used to create a scatter plot matrix.
Q.61 What is the purpose of the stack and unstack methods in Pandas?
stack is used to pivot columns into rows, creating a MultiIndex Series, while unstack pivots rows into columns, reshaping data based on specified levels of the index.
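A small sketch of the round trip, using a made-up table of grades:

```python
import pandas as pd

df = pd.DataFrame({"math": [90, 80], "art": [70, 60]}, index=["ann", "bob"])

# stack: column labels move into an inner index level,
# giving a Series indexed by (name, subject).
stacked = df.stack()

# unstack: the inner index level moves back out to columns.
restored = stacked.unstack()
```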
Q.62 What is pylab?
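PyLab is a module shipped with Matplotlib (matplotlib.pylab) that pulls NumPy and matplotlib.pyplot into a single namespace. The Matplotlib documentation now discourages its use in favor of explicit imports.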
Q.63 How can you apply a function element-wise to all columns in a DataFrame?
You can apply a function to every element using the df.applymap(function) method, which returns a new DataFrame with the results. In pandas 2.1 and later, applymap is deprecated in favor of the equivalent df.map(function).
Q.64 What is used for styling in Pandas?
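Styling in Pandas is accomplished through the df.style accessor, which returns a Styler object that renders the DataFrame as HTML and formats it using CSS.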
Q.65 How do you create a DataFrame from an external data source (e.g., CSV file)?
You can create a DataFrame from an external data source using the pd.read_csv('file.csv') method, specifying the file path and options like delimiter, header, and encoding.
Q.66 How do you iterate over a Pandas DataFrame?
You can iterate over the rows of a DataFrame by using a for loop together with the iterrows() method, which yields (index, row) pairs.
Q.67 How can you calculate the sum, mean, and count of values in a specific column?
You can calculate the sum using df['col'].sum(), the mean using df['col'].mean(), and the count using df['col'].count() for a specific column 'col' in a DataFrame df.
Q.68 What will output the items of series A not present in series B in Pandas?
Use the isin() method with boolean negation to get the items of Series A not present in Series B: A[~A.isin(B)].
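A minimal sketch with two made-up Series; isin() builds a boolean mask and ~ inverts it.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([4, 5, 6])

# True where the value of a also appears in b; ~ flips the mask.
only_in_a = a[~a.isin(b)]
```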
Q.69 How do you handle categorical variables in Pandas?
You can handle categorical variables by converting them to the category data type using df['col'].astype('category'), which can save memory and improve performance for analysis.
Q.70 How do you keep yourself updated on new trends in Pandas?
Pandas and data science see new developments every year, and I keep myself updated by attending industry seminars and conferences, online or offline.
Q.71 What is the purpose of the pd.to_numeric() function in Pandas?
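The pd.to_numeric() function is used to convert a column to a numeric data type. Its errors parameter controls how values that cannot be parsed are treated: 'raise' (the default) raises an exception, 'coerce' replaces them with NaN, and 'ignore' returns the input unchanged (this last option is deprecated in recent pandas releases).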
Q.72 How do you see yourself in the next five years in Pandas?
I foresee a bright future, as I will gain more skills and knowledge in the domain of Pandas and data science by adopting new technologies as needed by my organization.
Q.73 How can you rename index labels in a Pandas DataFrame?
You can rename index labels using the df.rename(index={'old_label': 'new_label'}) method, specifying a dictionary that maps old labels to new labels.
Q.74 What are your strengths as a Pandas professional?
As a Pandas professional, I have extensive experience with new data science technologies as well as with Pandas and Python. I also have the requisite managerial skills for managing a team and achieving the assigned tasks.
Q.75 What is the difference between NaN and None in Pandas?
NaN represents missing or undefined numerical data, while None is a Python object representing missing or undefined data for non-numeric types (e.g., strings or objects).
Q.76 How do you prioritize Pandas related tasks?
Pandas-based data analysis involves many tasks on a day-to-day basis. Tasks also need to be prioritized to accomplish the organizational goals as per the specified KPIs (key performance indicators). Prioritization is done on the basis of various factors such as the task's relevance, urgency, cost involved, and resource availability.
Q.77 How do you drop rows with missing values from a DataFrame?
You can drop rows with missing values using the df.dropna() method, which removes rows containing NaN values, optionally specifying the axis and subset of columns to consider.
Q.78 How do you manage your time for Pandas related development?
Pandas-based data analysis involves lots of tasks which need to be completed in a specific time frame. Hence time management is of utmost importance and is applied by using to-do lists, being aware of time wasters, and optimizing the work environment.
Q.79 What is the purpose of the pivot method in Pandas?
The pivot method reshapes data in a DataFrame by specifying columns for rows, columns, and values, effectively creating a pivot table with rows and columns structured as desired.
Q.80 Why do you want the Pandas job?
I want the Pandas job because I am passionate about making companies more efficient by using new data analysis technologies, especially Pandas, and about taking stock of the present technology portfolio to maximize its utility.
Q.81 How can you change the order of columns in a Pandas DataFrame?
You can change the order of columns by selecting and rearranging them using double square brackets, such as df[['col3', 'col1', 'col2']] to change the column order as desired.
Q.82 What do you think is the most important role of a Pandas professional?
As a Pandas professional, my focus is to provide effective and efficient programs and data analysis using pandas to fulfil data analysis and data science needs, and to adopt technologies that are more efficient and effective for the organization. Reducing costs without losing quality or speed of production is the primary motto.
Q.83 How do you create a copy of a DataFrame in Pandas?
You can create a copy of a DataFrame using the df.copy() method, which duplicates the data and index, allowing you to modify the copy without affecting the original DataFrame.
Q.84 What is the purpose of the to_csv() method in Pandas?
The to_csv() method is called on a DataFrame, as in df.to_csv('file.csv'); there is no top-level pd.to_csv() function. It exports the DataFrame to a CSV file, allowing you to save data for later analysis or share it with others in a standard tabular format.
Q.85 How can you calculate the cumulative sum of a column in a DataFrame?
You can calculate the cumulative sum of a column using the df['col'].cumsum() method, which returns a Series with the cumulative sum of values in the specified column 'col'.
Q.86 What is the role of the merge_asof method in Pandas?
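The merge_asof function performs an as-of merge: it works like a left join, but matches each left row to the nearest key in the right DataFrame rather than requiring an exact match. By default it picks the last right key that is less than or equal to the left key, and both DataFrames must be sorted by the key. It is useful for time-series data where you want to match values by time.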
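As an illustration, the sketch below attaches a quote price to each trade. The trades and quotes tables are hypothetical, and both are already sorted by time, which pd.merge_asof requires.

```python
import pandas as pd

trades = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 09:00:05", "2024-01-01 09:00:12"]),
    "qty": [100, 200],
})
quotes = pd.DataFrame({
    "time": pd.to_datetime([
        "2024-01-01 09:00:00", "2024-01-01 09:00:10", "2024-01-01 09:00:15",
    ]),
    "price": [10.0, 10.5, 11.0],
})

# Default direction="backward": most recent quote at or before each trade.
backward = pd.merge_asof(trades, quotes, on="time")

# direction="forward": first quote at or after each trade.
forward = pd.merge_asof(trades, quotes, on="time", direction="forward")
```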
Q.87 How do you apply a custom function to a specific column in a DataFrame?
You can apply a custom function to a specific column using the df['col'].apply(function) method, where function operates on each element of the specified column 'col'.
Q.88 What is the purpose of the set_index method in Pandas?
The set_index method is used to set a specific column as the DataFrame's index, allowing for faster access and alignment based on that column when performing operations and analysis.
Q.89 How can you perform element-wise operations on two DataFrames?
You can perform element-wise operations on two DataFrames by using operators like +, -, *, /, or by using methods like df1.add(df2) to perform addition element-wise between two DataFrames.
Q.90 What is the significance of the map method in Pandas?
The map method is used to replace values in a Series or DataFrame column with specified values or values from another Series, making it useful for data transformation and mapping tasks.
Q.91 How can you handle time zones in Pandas?
You can handle time zones by using the tz_localize() and tz_convert() methods to assign and convert time zones for datetime data, ensuring that time-based operations are accurate.
Q.92 What is the purpose of the resample method in Pandas?
The resample method is used for time-series data to resample and aggregate data at different time frequencies (e.g., daily to monthly) and perform operations like sum or mean on data points.
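A minimal sketch, assuming a made-up daily series that is aggregated into two-day buckets:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
daily = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Group consecutive days into 2-day bins and sum each bin.
two_day_sum = daily.resample("2D").sum()
```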
Q.93 How do you create a DataFrame from a dictionary of Series?
You can create a DataFrame from a dictionary of Series by using the pd.DataFrame({'col1': series1, 'col2': series2}) constructor, where each Series becomes a column in the DataFrame.
Q.94 How can you handle duplicate column names in a DataFrame?
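You can handle duplicate column names by assigning a list of unique names to df.columns, or by keeping only the first occurrence of each name with df.loc[:, ~df.columns.duplicated()]. Note that df.rename() maps by label, so it renames every column that shares the duplicate name.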
Q.95 What is the role of the nunique method in Pandas?
The nunique method calculates the number of unique values in a Series or DataFrame column, helping to assess the cardinality of data and identify columns with few unique values.
Q.96 How can you calculate the percentage change between consecutive rows in a DataFrame?
You can calculate the percentage change between consecutive rows in a DataFrame using the df['col'].pct_change() method, which computes the relative change from one row to the next.
Q.97 What is the purpose of the shift method in Pandas?
The shift method is used to shift (lag) the values in a Series or DataFrame by a specified number of periods, allowing you to compare current and past values or create time lags in data.
Q.98 How do you create a pivot table in Pandas using the pivot_table method?
You can create a pivot table using the df.pivot_table() method by specifying index columns, columns to pivot, values to aggregate, and aggregation functions (e.g., 'sum', 'mean').
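For example, with a hypothetical sales table in which a region/product pair can repeat, pivot_table aggregates the repeats; fill_value is used here to show 0 for combinations that have no rows.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["N", "N", "S", "S"],
    "product": ["x", "x", "x", "y"],
    "amount": [1, 2, 3, 4],
})

table = sales.pivot_table(
    index="region",      # one row per region
    columns="product",   # one column per product
    values="amount",     # values to aggregate
    aggfunc="sum",       # how to combine repeated pairs
    fill_value=0,        # value for pairs with no data
)
```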
Q.99 How can you merge two DataFrames with different index columns?
You can merge DataFrames with different index columns by using the left_on and right_on parameters in the pd.merge() function to specify the columns to use as keys for merging.
Q.100 What is the purpose of the rolling method in Pandas?
The rolling method is used for rolling window calculations on time-series data, allowing you to perform operations like moving averages or sums over a specified window of consecutive data.
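A short sketch of a three-point moving average on made-up values. The first two results are NaN because a full window is not yet available.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])

# Mean over a sliding window of the current value and the two before it.
moving_avg = s.rolling(window=3).mean()
```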
Q.101 How do you apply a custom function to rows and columns simultaneously in a DataFrame?
You can use the df.applymap(function) method (df.map(function) in pandas 2.1 and later) to apply a function to every element across all rows and columns, and the df.apply(function, axis=1) method to apply a function to each row (or axis=0 for each column).
Q.102 How can you handle hierarchical (MultiIndex) columns in a DataFrame?
You can handle hierarchical columns by creating a MultiIndex using df.columns = pd.MultiIndex.from_tuples(tuples) and accessing a column with a tuple key such as df[('level1', 'level2')]; chained indexing like df['level1']['level2'] also works for reading.
Q.103 What is the purpose of the explode method in Pandas?
The explode method is used to transform columns containing lists or arrays into separate rows, expanding the DataFrame to make individual values in lists or arrays accessible as rows.
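A minimal sketch with a made-up tags column. Each list element gets its own row, and the original index value is repeated for rows that came from the same list.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "tags": [["a", "b"], ["c"]]})

exploded = df.explode("tags")
```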
Q.104 How can you handle outliers in a DataFrame column?
You can handle outliers by using techniques such as winsorization, z-scores, or IQR-based methods to identify and optionally transform or remove outliers from a specific column in a DataFrame.
Q.105 What is the role of the pd.get_dummies() function in Pandas?
The pd.get_dummies() function is used for one-hot encoding categorical variables, converting them into binary columns, which is useful for machine learning models that require numeric input.
Q.106 How can you calculate the cumulative product of a column in a DataFrame?
You can calculate the cumulative product of a column using the df['col'].cumprod() method, which returns a Series with the cumulative product of values in the specified column 'col'.
Q.107 What is the purpose of the where method in Pandas?
The where method is used for conditional replacement of values in a DataFrame or Series: it keeps values where the condition is True and replaces those where it is False (with NaN by default, or with a value you supply). The mask method is its inverse.
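The direction of the condition is easy to get backwards, so here is a small sketch on made-up values: where keeps entries for which the condition is True and replaces the rest, while mask does the opposite.

```python
import pandas as pd

s = pd.Series([-2, -1, 0, 1, 2])

# Keep positive values; replace everything else with 0.
kept = s.where(s > 0, 0)

# Replace positive values with 0; keep everything else.
masked = s.mask(s > 0, 0)
```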
Q.108 How do you calculate the difference between consecutive elements in a DataFrame column?
You can calculate the difference between consecutive elements in a column using the df['col'].diff() method, which computes the arithmetic difference from one row to the next.
Q.109 What is the role of the cumcount method in Pandas?
The cumcount method is a GroupBy method, as in df.groupby('col').cumcount(). It numbers the rows within each group starting from 0, providing a running count of how many times each value has previously appeared in the data.
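A minimal sketch, assuming a made-up user column: cumcount is called on a groupby object and numbers each row within its group, starting at 0.

```python
import pandas as pd

df = pd.DataFrame({"user": ["a", "b", "a", "a", "b"]})

# 0 for a user's first row, 1 for the second, and so on.
df["visit_no"] = df.groupby("user").cumcount()
```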
Q.110 How can you create a cross-tabulation (contingency table) in Pandas?
You can create a cross-tabulation using the pd.crosstab(index=df['col1'], columns=df['col2']) function, specifying two columns to tabulate, resulting in a contingency table of counts.