Python Interview Questions for Data Science in India in 2026 — Complete Guide From Pandas to NumPy for Freshers

Python has become the dominant language for data science in India, and every data analyst and data scientist interview now includes a Python round. If you are a fresher preparing for campus placements or your first job switch, mastering the Python interview questions for data science India 2026 covered in this guide will give you a serious competitive edge.

This guide covers all the Python interview questions for data science in India in 2026 that recruiters at companies like TCS, Infosys, Wipro, Amazon, Flipkart, Swiggy, and analytics-focused startups ask freshers and junior candidates. Every question includes a clear explanation, working code, and interview tips so you know not just the answer but also how to deliver it confidently.

Table of Contents

Why Python Is Now Central to Data Science Interviews in India 2026

Five years ago, many Indian companies still relied primarily on Excel and SAS for data work. In 2026, Python is the default tool for data science and analytics across the Indian tech industry. This has made Python interview questions for data science India 2026 a core screening component in nearly every hiring process.

Recruiters test Python in data science interviews because it reveals:

Logical thinking — can the candidate structure a solution step by step?
Library familiarity—Pandas, NumPy, Matplotlib, Scikit-learn proficiency
Code cleanliness — readable, efficient code shows professional maturity
Problem-solving speed — can the candidate produce working code under time pressure?

Understanding what is expected helps you approach Python interview questions for data science India 2026 with confidence rather than anxiety.

Section 1 — Core Python Basics (Every Fresher Must Know These)

These foundational Python interview questions for data science India 2026 appear in online assessments and first technical rounds.

Q1. What are the main data types in Python?

Answer: Python has the following core data types:

int — whole numbers (10, -5, 1000)
float decimal numbers (3.14, -0.5)
str — text strings (“hello,” “India”)
bool — True or False
list — ordered, mutable collection: [1, 2, 3]
tuple — ordered, immutable collection: (1, 2, 3)
dict — key-value pairs: {"name": "Arjun", "age": 25}
set — unordered unique values: {1, 2, 3}

For Python interview questions for data science in India in 2026, you must be fluent with lists, dictionaries, and tuples—they are used constantly in data manipulation.

Q2. What is the difference between a list and a tuple in Python?

Answer:

List — mutable (can be changed after creation): [1, 2, 3]
Tuple — immutable (cannot be changed): (1, 2, 3)

Use tuples when data should not change (like coordinates or database records). Use lists when you need to add, remove, or modify items dynamically.

python

my_list = [1, 2, 3]
my_list.append(4)    # Works fine

my_tuple = (1, 2, 3)
my_tuple[0] = 10     # Raises TypeError — tuples are immutable

Q3. What is list comprehension, and why is it used in data science?

Answer: List comprehension is a concise, Pythonic way to create a new list by applying an expression to each item in an existing iterable.

python

# Standard loop approach
squares = []
for x in range(10):
    squares.append(x**2)

# List comprehension — same result, cleaner code
squares = [x**2 for x in range(10)]

# With condition — squares of even numbers only
even_squares = [x**2 for x in range(10) if x % 2 == 0]

List comprehension is important in Python interview questions for data science India 2026 because it demonstrates Pythonic coding style that employers value.

Q4. What is the difference between deepcopy an assignment and a variable in Python?

Answer:

= creates a reference—both variables point to the same object in memory. Changes to one affect the other.
deepcopy creates a fully independent copy at all levels.

python

import copy

original = [[1, 2], [3, 4]]
shallow = original            # Same reference
deep = copy.deepcopy(original)

original[0][0] = 99
print(shallow[0][0])  # 99 — changed! (same object)
print(deep[0][0])     # 1 — unchanged (independent copy)

Understanding memory references is a nuanced Python interview question for the Data Science India 2026 topic that separates well-prepared candidates.

Q5. What are Python decorators, and how are they used?

Answer: A decorator is a function that wraps another function to add functionality without modifying it. Common in data science frameworks and APIs.

python

def timer_decorator(func):
    import time
    def wrapper(*args, **kwargs):
        start = time.time()
        result = func(*args, **kwargs)
        print(f"Execution time: {time.time() - start:.4f}s")
        return result
    return wrapper

@timer_decorator
def process_data(df):
    return df.groupby('category').sum()

Section 2 — Pandas Questions (Most Important for Data Science)

Pandas is the core library for data manipulation in Python and generates the most Python interview questions for Data Science India 2026 in any interview.

Q6. How do you load a CSV file and check basic info about a DataFrame?

python

import pandas as pd

df = pd.read_csv('sales_data.csv')

print(df.head())          # First 5 rows
print(df.tail())          # Last 5 rows
print(df.shape)           # (rows, columns)
print(df.info())          # Column names, types, non-null counts
print(df.describe())      # Statistical summary of numeric columns
print(df.columns)         # Column names
print(df.dtypes)          # Data type of each column

Q7. How do you handle missing values in a Pandas DataFrame?

python

# Check missing values
print(df.isnull().sum())
print(df.isnull().mean() * 100)  # As percentage

# Drop rows with any missing values
df_clean = df.dropna()

# Drop columns with more than 50% missing
df_clean = df.dropna(axis=1, thresh=len(df)*0.5)

# Fill numeric columns with mean
df['salary'].fillna(df['salary'].mean(), inplace=True)

# Fill categorical columns with mode
df['city'].fillna(df['city'].mode()[0], inplace=True)

# Forward fill — use previous value
df['price'].fillna(method='ffill', inplace=True)

Missing value handling is one of the highest-frequency Python interview questions for data science India 2026 topics because it is a daily task in real data work.

Q8. How do you filter rows in a Pandas DataFrame based on multiple conditions?

python

# Single condition
high_earners = df[df['salary'] > 100000]

# Multiple conditions — use & and | with parentheses
senior_high_earners = df[
    (df['salary'] > 100000) &
    (df['experience_years'] >= 5)
]

# Using isin for multiple values
metro_customers = df[df['city'].isin(['Mumbai', 'Delhi', 'Bengaluru'])]

# Using query method — more readable
result = df.query("salary > 100000 and department == 'Engineering'")

Q9. What is the difference between apply(), and applymap() in Pandas?

Answer:

map() — works on a Series; applies a function element-by-element
apply() — works on a Series or DataFrame; can apply row-wise or column-wise functions
applymap() (now map() in Pandas 2.x) — applies element-wise on an entire DataFrame

python

# map on Series
df['salary_category'] = df['salary'].map(
    lambda x: 'High' if x > 100000 else 'Medium' if x > 50000 else 'Low'
)

# apply on DataFrame column
df['name_length'] = df['name'].apply(len)

# apply row-wise
df['full_info'] = df.apply(
    lambda row: f"{row['name']} from {row['city']}", axis=1
)

Q10. How do you perform a groupby and aggregate in Pandas?

python

# Single aggregation
category_revenue = df.groupby('category')['revenue'].sum()

# Multiple aggregations
summary = df.groupby('department').agg(
    avg_salary=('salary', 'mean'),
    max_salary=('salary', 'max'),
    employee_count=('employee_id', 'count')
).reset_index()

# Multiple columns
region_product = df.groupby(['region', 'product'])['sales'].sum().reset_index()

GroupBy aggregation is the Pandas equivalent of SQL’s GROUP BY and appears in almost every set of Python interview questions for data science India 2026.

Q11. How do you merge two DataFrames in Pandas?

python

# Inner merge — only matching rows
merged = pd.merge(df_orders, df_customers, on='customer_id', how='inner')

# Left merge — all rows from left
left_merged = pd.merge(df_orders, df_customers, on='customer_id', how='left')

# Merge on different column names
merged = pd.merge(
    df1, df2,
    left_on='cust_id', right_on='customer_id',
    how='inner'
)

# Concatenate DataFrames vertically (stack rows)
combined = pd.concat([df_jan, df_feb, df_mar], ignore_index=True)

Q12. How do you sort a DataFrame and get the top N rows per group?

python

# Sort by single column
df_sorted = df.sort_values('revenue', ascending=False)

# Sort by multiple columns
df_sorted = df.sort_values(['department', 'salary'], ascending=[True, False])

# Top 3 rows per department (groupby + head)
top3_per_dept = (
    df.sort_values('salary', ascending=False)
    .groupby('department')
    .head(3)
)

Section 3 — NumPy Questions

Python interview questions for data science in India in 2026 always include NumPy—the foundation that Pandas is built on.

Q13. What is the difference between a Python list and a NumPy array?

Answer:

NumPy arrays are faster and more memory-efficient than lists for mathematical operations
Arrays are homogeneous (all elements are the same type); lists can mix types
NumPy supports vectorized operations—mathematical operations apply to all elements simultaneously without loops

python

import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Vectorised operation — no loop needed
doubled = arr * 2         # [2, 4, 6, 8, 10]
squared = arr ** 2        # [1, 4, 9, 16, 25]
above_three = arr[arr > 3]  # [4, 5] — boolean indexing

Q14. What are broadcasting and vectorization in NumPy?

Answer: Broadcasting allows NumPy to perform operations on arrays of different shapes by automatically “stretching” smaller arrays to match the shape of larger ones. Vectorization means applying operations to entire arrays without explicit Python loops—making code dramatically faster for large datasets.

python

arr = np.array([1, 2, 3, 4, 5])
print(arr + 10)    # [11, 12, 13, 14, 15] — scalar broadcast to all elements

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row_add = np.array([10, 20, 30])
print(matrix + row_add)  # Row added to each row of matrix

Q15. How do you reshape, flatten, and transpose arrays in NumPy?

python

arr = np.arange(12)               # [0, 1, 2, ..., 11]
matrix = arr.reshape(3, 4)       # 3 rows, 4 columns
flat = matrix.flatten()          # Back to 1D
transposed = matrix.T            # 4 rows, 3 columns

Section 4 — Data Visualisation Questions

Plotting questions are a growing part of Python interview questions for data science in India’s 2026 rounds at product companies.

Q16. How do you plot a histogram in Python to understand data distribution?

python

import matplotlib.pyplot as plt
import seaborn as sns

# Matplotlib
plt.hist(df['salary'], bins=20, color='steelblue', edgecolor='white')
plt.title('Salary Distribution')
plt.xlabel('Salary (₹)')
plt.ylabel('Count')
plt.show()

# Seaborn (more aesthetically pleasing)
sns.histplot(df['salary'], kde=True, bins=20)
plt.title('Salary Distribution with KDE')
plt.show()

Q17. How do you create a correlation heatmap in Python?

python

import seaborn as sns
import matplotlib.pyplot as plt

corr_matrix = df[['salary', 'experience', 'performance_score']].corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Feature Correlation Heatmap')
plt.show()

Section 5—Machine Learning Basics (Freshers and Junior Level)

These Python interview questions for data science India 2026 are asked at companies that expect analysts to have ML exposure.

Q18. What is a train-test split, and why is it important?

python

from sklearn.model_selection import train_test_split

X = df.drop('target', axis=1)
y = df['target']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

The split ensures the model is evaluated on data it has never seen during training—preventing overfitting. Explaining this correctly is critical in Python interview questions for data science India 2026 ML rounds.

Q19. How do you encode categorical variables for a machine learning model?

python

# Label Encoding — for ordinal categories
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['city_encoded'] = le.fit_transform(df['city'])

# One-Hot Encoding — for nominal categories
df_encoded = pd.get_dummies(df, columns=['city'], drop_first=True)

# Using sklearn OneHotEncoder
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(sparse_output=False)
encoded = enc.fit_transform(df[['city']])

Q20. What is the difference between overfitting and underfitting?

Answer:

Overfitting—model learns the training data too well, including noise. High training accuracy, low test accuracy. Fix with regularization, more data, or a simpler model.
Underfitting — model is too simple to capture patterns. Low accuracy on both training and test data. Fix it with a more complex model or better features.

The bias-variance tradeoff that explains overfitting vs. underfitting is a fundamental Python interview question for data science in India in 2026 and a conceptual topic at all levels.

10 Quick-Fire Python Questions for Data Science Freshers in India

Q21. What is the difference between == `and` and or is in Python? → == checks value equality. is checks if both variables point to the exact same object in memory.

Q22. What is *args `*args`? → *args accepts any number of positional arguments as a tuple. **kwargs accepts any number of keyword arguments as a dictionary.

Q23. How do you find the shape of a Pandas DataFrame? → It df.shape returns a tuple of (rows, columns).

Q24. What is the purpose of apply reset_index() in Pandas? → Resets the DataFrame index back to 0, 1, 2… after operations like groupby or filtering that alter the index.

Q25. What is value_counts() in pandas? → Returns the count of unique values in a Series, sorted from most to least frequent. Ideal for quick frequency analysis.

Q26. How do you remove duplicates from a DataFrame? → df.drop_duplicates() or df.drop_duplicates(subset=['column']) from specific columns.

Q27. What is the difference between loc and? →loc uses label-based indexing. iloc uses integer position-based indexing.

Q28. How do you create a new column based on a condition? → df['new_col'] = np.where(df['salary'] > 50000, 'Senior', 'Junior')

Q29. What is lambda Python? → An anonymous single-line function: lambda x: x * 2

Q30. What does it df.pivot_table() do? → Creates a spreadsheet-style pivot table from a DataFrame—summarizing data by two dimensions with an aggregation function.

Comparison: Python Libraries for Data Science in India 2026

Library	Primary Use	Interview Frequency
Pandas	Data manipulation, cleaning	Very High
NumPy	Numerical computing, arrays	High
Matplotlib	Basic plotting	High
Seaborn	Statistical visualisation	Medium–High
Scikit-learn	Machine learning	Medium
Plotly	Interactive charts	Medium
SciPy	Scientific computing, stats	Medium

Image Suggestions

Image 1 — Placement: After the introduction, a young Indian fresher is writing Python code in a Jupyter Notebook on a laptop, with Pandas DataFrame output visible on screen in a well-lit study space. ALT text: “Python interview questions for data science India 2026 — fresher writing Pandas and NumPy code in Jupyter Notebook”

Image 2 — Placement: After the visualization section, a colorful data visualization dashboard with histograms and heatmaps generated using Python Matplotlib and Seaborn on a monitor. ALT text: “Python interview questions for data science India 2026 — Python Matplotlib and Seaborn charts for data analysis”

External Authority Links

Python Official Documentation—authoritative reference for all Python concepts
Pandas Official Documentation — complete Pandas API reference and tutorials
Kaggle — Python Courses for Data Science — free interactive Python data science courses
Analytics Vidhya — Python for Data Science India — India-focused Python learning resources
Real Python Tutorials — high-quality practical Python tutorials for all skill levels

FAQs: Python Interview Questions for Data Science India 2026

Q1. Which Python libraries are most important for data science interviews in India in 2026? Pandas and NumPy are by far the most important libraries for Python interview questions for data science in India in 2026. Matplotlib/Seaborn for visualization and Scikit-learn for ML basics are important at product and analytics companies.

Q2. Do freshers get asked machine learning questions in Python data science interviews? At IT services companies like TCS and Infosys, ML questions are rare at the fresher level. At product companies and startups, ML basics appear in Python interview questions for data science in India 2026 even for fresher roles. Knowing train-test split, linear regression, and basic evaluation metrics is usually sufficient.

Q3. How should freshers prepare Python for data science interviews in India? Work through all Pandas and NumPy basics, complete Kaggle’s free Python and Pandas courses, and practice on real datasets. Solving every question in this Python interview questions for data science India 2026 guide from memory is an excellent preparation strategy.

Q4. Is Jupyter Notebook used in Indian data science interviews? Many Indian companies conduct take-home assignments or live coding rounds in Jupyter notebooks. Being comfortable writing and explaining code in a notebook environment is valuable for Python interview questions for data science India 2026.

Q5. What Python projects should freshers build before interviews in India? Build an exploratory data analysis (EDA) project on a Kaggle dataset, a customer churn prediction model, and a sales dashboard using Matplotlib. These projects demonstrate practical application of all the Python interview questions for data science India 2026 concepts in a way that impresses recruiters.

Conclusion

Python mastery is what separates good data science candidates from great ones in India’s 2026 job market. The Python interview questions for data science in India 2026 in this guide cover every layer—from core Python data types to Pandas data manipulation, NumPy vectorization, visualization, and ML basics.

Do not just read this guide — write every code snippet yourself. Run it. Break it. Fix it. That hands-on practice is what converts theoretical knowledge into the real-world fluency that Python interview questions for data science in India in 2026 interviewers are actually testing for.

Which Python topic is giving you the most trouble? Comment below and we will share targeted practice problems to help you level up!