This project helps you identify which skills to learn to maximise job opportunities and salary. It combines two ideas: skill demand and skill pay. Using job listings data, you will find which skills appear most often, which skills are associated with higher salaries, and which skill combinations offer the best overall value. By the end, you will have a ranked list of “optimal skills” and an evidence-based learning roadmap.
Project Goal
Answer questions like:
- Which skills are most in demand for entry-level analytics roles?
- Which skills are more common in higher-paying listings?
- What skill combination gives the best balance of demand and pay?
Step 1: Build your dataset
Collect 40 to 120 job listings for roles such as Data Analyst, Business Analyst, and Reporting Analyst. Save these fields:
- job_id
- title
- company
- location
- experience_min, experience_max
- salary_min, salary_max (if available)
- salary_text (the raw salary string as posted)
- date_posted (optional)
- description_text
It is fine if only some listings include salary; the salary analysis uses only the listings that do.
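One way to keep the raw data consistent is to define the fields once and write every listing through the same schema. The sketch below uses only the Python standard library; JobListing and write_listings are hypothetical names, and the sample row is placeholder data, not a real listing.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class JobListing:
    """One raw job listing; field names follow the Step 1 list."""
    job_id: str
    title: str
    company: str
    location: str
    experience_min: Optional[float]
    experience_max: Optional[float]
    salary_min: Optional[float]  # None when the listing gives no salary
    salary_max: Optional[float]
    salary_text: str             # raw salary string exactly as posted
    date_posted: Optional[str]   # optional, e.g. an ISO date string
    description_text: str


def write_listings(listings, stream):
    """Write listings as CSV rows; the csv module writes None as an empty cell."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(JobListing)])
    writer.writeheader()
    for listing in listings:
        writer.writerow(asdict(listing))


# Placeholder row for illustration only.
example = JobListing(
    job_id="J001", title="Data Analyst", company="ExampleCo", location="Remote",
    experience_min=0, experience_max=2, salary_min=None, salary_max=None,
    salary_text="Not disclosed", date_posted=None,
    description_text="SQL, Excel and Power BI reporting.",
)
buffer = io.StringIO()
write_listings([example], buffer)
print(buffer.getvalue())
```

Writing to a file instead of StringIO works the same way, using open("cleaned_job_data.csv", "w", newline="") as the stream.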
Step 2: Clean salary and experience
Standardise salary into one unit (for example, annual salary). Create a single salary value from the midpoint of salary_min and salary_max, and a single experience value from the midpoint of experience_min and experience_max. Flag listings with missing salary rather than deleting them: they are excluded from salary comparisons but still count toward skill demand.
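The cleaning rules can be sketched as a small row-level function. This is a minimal illustration under stated assumptions: salary_period is a hypothetical extra field you would derive from salary_text, the period-to-annual factors are assumed values, and falling back to a single bound when only one end of a range is given is one reasonable choice, not a requirement of this project.

```python
from typing import Optional

# Assumed factors for converting a pay period to an annual figure
# (2080 = 40 hours x 52 weeks); adjust them to match your listings.
PERIOD_TO_ANNUAL = {"year": 1, "month": 12, "hour": 2080}


def midpoint(low: Optional[float], high: Optional[float]) -> Optional[float]:
    """Midpoint of a range; falls back to whichever bound exists, else None."""
    if low is not None and high is not None:
        return (low + high) / 2
    return low if low is not None else high


def clean_row(row: dict) -> dict:
    """Return a copy of row with annual salary, experience and a has_salary flag."""
    out = dict(row)
    salary = midpoint(row.get("salary_min"), row.get("salary_max"))
    # salary_period is a hypothetical field; default to annual when absent.
    factor = PERIOD_TO_ANNUAL[row.get("salary_period") or "year"]
    out["salary"] = salary * factor if salary is not None else None
    out["experience"] = midpoint(row.get("experience_min"), row.get("experience_max"))
    # Flag rather than drop: rows without salary still count toward demand.
    out["has_salary"] = out["salary"] is not None
    return out


print(clean_row({"salary_min": 4000, "salary_max": 6000, "salary_period": "month",
                 "experience_min": 1, "experience_max": 3}))
```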
Step 3: Create a skill dictionary and extract skill flags
Create a dictionary that maps each skill to one or more keywords, then generate binary (0/1) columns such as:
- has_python, has_sql, has_excel, has_powerbi, has_tableau
- has_statistics, has_data_cleaning, has_dashboard
- has_cloud, has_etl, has_communication
Keep your matching approach simple and transparent: the aim is to demonstrate analysis logic, not to build a perfect NLP model.
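A transparent approach is plain keyword matching with regular expressions. The sketch below assumes a hypothetical SKILL_KEYWORDS dictionary (the keyword lists are illustrative, so extend them for your listings) and uses case-insensitive matching with word boundaries, so that "sql" does not match inside "nosql" and "excel" does not match inside "excellent". A consequence is that variants such as "mysql" need their own keywords.

```python
import re

# Hypothetical keyword dictionary: each skill maps to the phrases that count
# as a mention. Extend or edit it to suit your listings.
SKILL_KEYWORDS = {
    "python": ["python"],
    "sql": ["sql", "mysql", "postgresql"],
    "excel": ["excel"],
    "powerbi": ["power bi", "powerbi"],
    "tableau": ["tableau"],
    "statistics": ["statistics", "statistical"],
    "data_cleaning": ["data cleaning", "data cleansing"],
    "dashboard": ["dashboard", "dashboards", "dashboarding"],
    "cloud": ["aws", "azure", "gcp"],
    "etl": ["etl"],
    "communication": ["communication", "communications"],
}

# One case-insensitive pattern per skill. The \b word boundaries stop a
# keyword from matching inside a longer word.
PATTERNS = {
    skill: re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
                      re.IGNORECASE)
    for skill, keywords in SKILL_KEYWORDS.items()
}


def skill_flags(description: str) -> dict:
    """Return has_<skill> columns set to 1 if any keyword appears, else 0."""
    return {f"has_{skill}": int(bool(pattern.search(description)))
            for skill, pattern in PATTERNS.items()}


flags = skill_flags("Build Power BI dashboards and write SQL queries.")
print({k: v for k, v in flags.items() if v})
```

Keeping the keyword list in one dictionary also makes it easy to copy into the README, as the deliverables require.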
Step 4: Calculate skill demand score
For each skill, calculate:
- demand_count: number of job posts that mention it
- demand_share: percentage of all job posts that mention it (demand_count ÷ total posts × 100)
Convert demand_share into a demand score, for example by min-max normalising it to the range 0 to 1.
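A minimal sketch of the demand calculation, assuming each row is a dict holding the has_<skill> flags from Step 3 and that min-max scaling is the chosen normalisation. The function names and the toy rows are hypothetical, and returning 0.0 for every skill when all shares are equal is an assumed convention to avoid dividing by zero.

```python
def min_max(values: dict) -> dict:
    """Rescale values to 0-1; if all values are equal, every score is 0.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def demand_scores(rows, skills):
    """rows are dicts holding the has_<skill> 0/1 flags from Step 3."""
    total = len(rows)
    demand_count = {s: sum(r[f"has_{s}"] for r in rows) for s in skills}
    demand_share = {s: 100 * c / total for s, c in demand_count.items()}
    return demand_count, demand_share, min_max(demand_share)


# Toy rows for illustration only.
skills = ["python", "sql", "excel"]
rows = [
    {"has_python": 1, "has_sql": 1, "has_excel": 0},
    {"has_python": 0, "has_sql": 1, "has_excel": 1},
    {"has_python": 0, "has_sql": 1, "has_excel": 1},
    {"has_python": 0, "has_sql": 0, "has_excel": 0},
]
count, share, score = demand_scores(rows, skills)
print(score)
```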
Step 5: Calculate skill pay score
Using only job posts with salary:
- calculate average salary for jobs that mention the skill
- calculate average salary for jobs that do not mention the skill
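- compute the skill's salary uplift, either as the difference between the two averages or as a percentage of the average for jobs without the skill
Convert the uplift into a pay score, for example by min-max normalising it to the range 0 to 1.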
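The pay calculation can be sketched as follows, using percentage uplift and min-max scaling. Rows are assumed to carry the salary value from Step 2 (None when missing) and the has_<skill> flags from Step 3. The min_jobs threshold is a hypothetical safeguard, not part of the method above: with small samples, an average built from one or two salaried posts is noisy, so such skills are skipped. A skill mentioned by every salaried post is also skipped, because there is no "without" group to compare against.

```python
from statistics import mean


def min_max(values: dict) -> dict:
    """Rescale values to 0-1; if all values are equal, every score is 0.0."""
    if not values:
        return {}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def pay_scores(rows, skills, min_jobs=3):
    """Percentage salary uplift per skill, plus its 0-1 pay score.

    min_jobs is an assumed threshold: skills mentioned in fewer salaried
    posts than this are skipped because their average is too noisy.
    """
    salaried = [r for r in rows if r.get("salary") is not None]
    uplift = {}
    for s in skills:
        with_skill = [r["salary"] for r in salaried if r[f"has_{s}"]]
        without_skill = [r["salary"] for r in salaried if not r[f"has_{s}"]]
        if len(with_skill) < min_jobs or not without_skill:
            continue
        base = mean(without_skill)
        uplift[s] = (mean(with_skill) - base) / base * 100
    return uplift, min_max(uplift)


# Toy rows for illustration only; the last row has no salary and is ignored.
rows = [
    {"salary": 60000, "has_python": 1, "has_sql": 1},
    {"salary": 70000, "has_python": 1, "has_sql": 0},
    {"salary": 80000, "has_python": 1, "has_sql": 1},
    {"salary": 50000, "has_python": 0, "has_sql": 1},
    {"salary": 40000, "has_python": 0, "has_sql": 1},
    {"salary": None, "has_python": 1, "has_sql": 0},
]
uplift, pay_score = pay_scores(rows, ["python", "sql"])
print(uplift, pay_score)
```

A negative uplift is possible and still scales correctly: the skill with the lowest uplift simply receives a pay score of 0.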
Step 6: Build an optimal skill score
Create a combined score:
- optimal_score = 0.6 × demand_score + 0.4 × pay_score
Adjust the weights to suit your goal: raise the demand weight if you want to maximise interview opportunities, or raise the pay weight if salary matters more. Keep the two weights summing to 1 so the combined score also stays between 0 and 1.
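The weighted score is a one-line formula once both scores exist. In this sketch (optimal_scores is a hypothetical name), the pay weight is derived as 1 minus the demand weight, so the weights always sum to 1. Ranking only the skills that have both scores is an assumed choice: a skill may lack a pay score if too few salaried posts mention it. The input scores are toy values.

```python
def optimal_scores(demand_score: dict, pay_score: dict, demand_weight: float = 0.6):
    """Rank skills by demand_weight * demand + (1 - demand_weight) * pay.

    Only skills that have both scores are ranked.
    """
    if not 0 <= demand_weight <= 1:
        raise ValueError("demand_weight must be between 0 and 1")
    pay_weight = 1 - demand_weight
    skills = demand_score.keys() & pay_score.keys()
    scores = {s: demand_weight * demand_score[s] + pay_weight * pay_score[s]
              for s in skills}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy scores for illustration only.
demand = {"sql": 1.0, "excel": 0.5, "python": 0.0}
pay = {"sql": 0.0, "excel": 0.8, "python": 1.0}
print(optimal_scores(demand, pay))                     # demand-heavy ranking
print(optimal_scores(demand, pay, demand_weight=0.2))  # pay-heavy ranking
```

Comparing the two calls shows how the weights shift the ranking: a demand-heavy weighting favours widely requested skills, while a pay-heavy weighting favours skills with a larger salary uplift.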
Step 7: Analyse skill combinations
Identify the five most common skill stacks, for example:
- SQL + Excel + Power BI
- Python + SQL + Pandas
- Python + statistics + dashboarding
For each stack, count the jobs that mention every skill in it and calculate the average salary of the salaried jobs that contain the full stack. Rank the stacks by a combined score built the same way as in Step 6.
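A possible sketch of the stack analysis: enumerate every combination of flagged skills with itertools.combinations, keep the most common ones, then score them like individual skills. The function name rank_stacks, its parameters, and the 0.6 default weight (borrowed from Step 6) are assumptions. Any stack you want to test, such as one including Pandas, needs its own has_<skill> flag from Step 3.

```python
from itertools import combinations
from statistics import mean


def min_max(values: dict) -> dict:
    """Rescale values to 0-1; if all values are equal, every score is 0.0."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def has_stack(row: dict, stack) -> bool:
    """True if the job mentions every skill in the stack."""
    return all(row[f"has_{s}"] for s in stack)


def rank_stacks(rows, skills, size=3, top_n=5, demand_weight=0.6):
    # 1. Count jobs containing each full stack; keep the top_n most common.
    counts = {stack: sum(has_stack(r, stack) for r in rows)
              for stack in combinations(sorted(skills), size)}
    common = sorted((s for s in counts if counts[s] > 0),
                    key=lambda s: counts[s], reverse=True)[:top_n]
    # 2. Average salary over salaried jobs that contain the full stack.
    avg_salary = {}
    for stack in common:
        salaries = [r["salary"] for r in rows
                    if r.get("salary") is not None and has_stack(r, stack)]
        if salaries:
            avg_salary[stack] = mean(salaries)
    ranked = [s for s in common if s in avg_salary]
    if not ranked:
        return []
    # 3. Combined score, weighted as in Step 6.
    demand = min_max({s: counts[s] for s in ranked})
    pay = min_max({s: avg_salary[s] for s in ranked})
    scores = {s: demand_weight * demand[s] + (1 - demand_weight) * pay[s]
              for s in ranked}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy rows for illustration only; pairs (size=2) keep the example small.
skills = ["sql", "excel", "powerbi", "python"]


def make_row(present, salary):
    row = {f"has_{s}": int(s in present) for s in skills}
    row["salary"] = salary
    return row


rows = [
    make_row({"sql", "excel", "powerbi"}, 50000),
    make_row({"sql", "excel"}, 45000),
    make_row({"sql", "powerbi"}, None),
    make_row({"python", "sql"}, 70000),
    make_row({"python", "sql"}, 80000),
    make_row({"sql", "excel", "powerbi"}, 55000),
]
print(rank_stacks(rows, skills, size=2))
```

With the eleven skills from Step 3, size=3 produces 165 candidate stacks, which is small enough to enumerate exhaustively.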
Step 8: Visualise and present results
Create charts such as:
- scatter plot: demand vs pay uplift for each skill
- bar chart: top 10 optimal skills by combined score
- bar chart: top skill stacks by combined score
- table: recommended learning roadmap (foundation → intermediate → advanced)
Deliverables
- cleaned_job_data.csv
- notebook.ipynb with analysis
- 4–6 charts saved as images
- README explaining dataset, keyword list, scoring method, and key insights
This project works well in a portfolio because it demonstrates data cleaning, feature engineering, scoring logic, and evidence-based decision-making.


