End-to-end data analysis project exploring job market trends, skills demand, and salary insights in the data industry.
This project performs an end-to-end analysis of real-world data job postings to uncover:
- In-demand skills across roles
- Role-based skill requirements
- Salary trends and growth
- Job market structure (remote work, platforms, hiring patterns)
The goal is to help aspiring data professionals understand the current job market and make informed career decisions.
This project uses the Data Jobs Dataset created by Luke Barousse.
This dataset contains real-world data analytics job postings collected from multiple sources using automated scraping methods.
The full dataset (~230MB+) is not included in this repository due to size limitations.
Instead:
- A sample dataset is provided for demonstration
- All notebooks and scripts are designed to:
  - Use the full dataset if available
  - Automatically fall back to the sample dataset
This ensures the project runs smoothly for anyone cloning the repository.
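The fallback described above could look roughly like the sketch below. It is not taken from the repository's scripts: the helper name `resolve_dataset_path` and the full-dataset file name `data/raw/data_jobs.csv` are assumptions made for illustration; only the sample path comes from the project tree.

```python
from pathlib import Path

# Sample path as listed in the project tree.
SAMPLE_CSV = Path("data/raw/data_jobs_sample.csv")
# Hypothetical location for a locally saved copy of the full dataset.
FULL_CSV = Path("data/raw/data_jobs.csv")


def resolve_dataset_path(full_csv=FULL_CSV, sample_csv=SAMPLE_CSV):
    """Return (path, label): the full dataset if present, otherwise the sample."""
    full_csv, sample_csv = Path(full_csv), Path(sample_csv)
    if full_csv.is_file():
        return full_csv, "full"
    if sample_csv.is_file():
        return sample_csv, "sample"
    raise FileNotFoundError(
        f"Neither {full_csv} nor {sample_csv} exists; run from the repository root."
    )
```

A notebook would then call `path, label = resolve_dataset_path()` and pass `path` to `pd.read_csv`, printing `label` so it is obvious which dataset the results are based on.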
To reproduce full results, load the dataset using:
import pandas as pd
from datasets import load_dataset
dataset = load_dataset('lukebarousse/data_jobs')
df = dataset['train'].to_pandas()

python-data-jobs-analysis/
│
├── data/
│   ├── raw/
│   │   └── data_jobs_sample.csv
│   └── processed/
│
├── notebooks/
│   ├── 1.data_cleaning.ipynb
│   ├── 2.skills_analysis.ipynb
│   ├── 3.role_analysis.ipynb
│   ├── 4.salary_analysis.ipynb
│   └── 5.job_market_analysis.ipynb
│
├── src/
│   ├── data_cleaning.py
│   └── skills_processing.py
│
└── README.md
The analysis is divided into four key components:

Skills analysis
- Identifies the most in-demand skills globally
- Groups skills into categories (programming, cloud, tools)
- Highlights learning priorities

Role analysis
- Compares Data Analyst, Data Engineer, and Data Scientist roles
- Examines skill distribution and role expectations
- Analyzes salary benchmarks and growth

Salary analysis
- Links skills with salary outcomes
- Identifies high-paying vs low-paying skills
- Compares salary distributions and geography

Job market analysis
- Explores hiring patterns (full-time vs contract)
- Analyzes job platforms and remote trends
- Tracks demand over time
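As an illustration of the skills and role components, the sketch below counts skill mentions overall and per role. It is a minimal example on made-up rows, not the notebooks' actual code. The column names `job_title_short` and `job_skills`, and the idea that skills are stored as a stringified list, are assumptions about the dataset layout; adjust them if the real columns differ.

```python
import ast

import pandas as pd

# Toy rows standing in for real postings (values are invented for illustration).
df = pd.DataFrame({
    "job_title_short": ["Data Analyst", "Data Engineer", "Data Scientist", "Data Analyst"],
    "job_skills": ["['sql', 'excel']", "['python', 'sql', 'aws']", "['python', 'sql']", None],
})


def parse_skills(value):
    """Turn a stringified list such as "['sql', 'excel']" into a Python list."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return []  # missing skills become an empty list


# One row per (posting, skill); postings without skills drop out.
skills = (
    df.assign(job_skills=df["job_skills"].apply(parse_skills))
    .explode("job_skills")
    .dropna(subset=["job_skills"])
)

# Overall demand: number of mentions and share of all mentions, in percent.
skill_counts = skills["job_skills"].value_counts()
skill_share = (skill_counts / skill_counts.sum() * 100).round(1)

# Role-based requirements: mentions of each skill per role.
role_skill_counts = (
    skills.groupby(["job_title_short", "job_skills"]).size().unstack(fill_value=0)
)
```

Note that `skill_share` here is a share of skill mentions, not of postings. Dividing by the number of postings instead gives the fraction of jobs that ask for each skill, which is a different figure.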
Key Insights
- Python and SQL dominate the market (~19% each)
- Cloud skills (AWS, Azure) significantly increase salary potential
- Specialized tools (Spark, Kafka) are niche but high-paying
- Data Scientists earn the highest salaries (~$155K senior level)
- Data Engineers offer a strong balance of demand and salary
- Data Analysts have the highest job volume but lower pay
- High-paying roles are strongly linked to:
  - Cloud technologies
  - Big data tools
- Remote roles offer higher average salaries (~10-15% more)
- ~90% of roles are full-time
- LinkedIn dominates job postings
- Remote work is increasing, especially in higher-paying roles
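Salary comparisons like the ones above can be derived with a few group-by operations. The sketch below uses invented numbers purely to show the mechanics, so its outputs do not reproduce the figures in this README. The column names `job_title_short`, `job_work_from_home`, and `salary_year_avg` are assumptions about the dataset layout.

```python
import pandas as pd

# Toy data: salary values are made up and only demonstrate the calculation.
df = pd.DataFrame({
    "job_title_short": [
        "Data Analyst", "Data Analyst",
        "Data Engineer", "Data Engineer",
        "Data Scientist", "Data Scientist",
    ],
    "job_work_from_home": [False, True, False, True, False, True],
    "salary_year_avg": [80000.0, 90000.0, 120000.0, 130000.0, None, 150000.0],
})

# Many postings do not list a salary, so drop those rows first.
salaried = df.dropna(subset=["salary_year_avg"])

# Median yearly salary per role, highest first.
median_by_role = (
    salaried.groupby("job_title_short")["salary_year_avg"]
    .median()
    .sort_values(ascending=False)
)

# Average salary for remote vs non-remote postings, and the premium in percent.
is_remote = salaried["job_work_from_home"]
remote_mean = salaried.loc[is_remote, "salary_year_avg"].mean()
onsite_mean = salaried.loc[~is_remote, "salary_year_avg"].mean()
remote_premium_pct = (remote_mean / onsite_mean - 1) * 100
```

The median is used for the role comparison because salary data tends to be skewed by a few very high postings. A raw remote premium like this one also mixes in role effects (remote postings may skew toward senior or engineering roles), so comparing within each role is a useful follow-up.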
This project provides a data-driven roadmap for:
- What to learn first
- Which roles to target
- How to increase earning potential
- Which platforms to focus on
- How remote work impacts salary
- Where opportunities are growing
- Understanding talent demand
- Identifying high-value skill gaps
- End-to-end data analysis workflow
- Data cleaning and preprocessing of real-world datasets
- Exploratory data analysis (EDA)
- Feature engineering (skills extraction, salary mapping)
- Business insight generation from data
- Python (Pandas, NumPy)
- Data Visualization (Matplotlib, Seaborn)
- Jupyter Notebook
- Data Cleaning & Transformation
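To make the feature-engineering step more concrete, here is a small sketch of grouping extracted skills into the categories mentioned earlier (programming, cloud, tools). The mapping `SKILL_CATEGORIES` and the function `count_categories` are hypothetical and are not taken from `src/skills_processing.py`. The cloud and tools entries follow the examples in this README, while placing Python and SQL under "programming" is an assumption.

```python
from collections import Counter

# Hypothetical skill-to-category mapping; extend it to match the real skill list.
SKILL_CATEGORIES = {
    "python": "programming",
    "sql": "programming",
    "aws": "cloud",
    "azure": "cloud",
    "spark": "tools",
    "kafka": "tools",
}


def count_categories(postings):
    """Count how many postings mention at least one skill from each category."""
    counts = Counter()
    for skills in postings:
        # A set ensures each category is counted once per posting.
        categories = {SKILL_CATEGORIES.get(skill.lower(), "other") for skill in skills}
        counts.update(categories)
    return counts


# Each inner list is the already-parsed skill list of one posting.
postings = [["Python", "SQL"], ["python", "aws"], ["spark", "kafka", "tableau"]]
category_counts = count_categories(postings)
```

Unmapped skills fall into an "other" bucket instead of being dropped, which makes gaps in the mapping visible when reviewing the counts.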
- Clone the repository
- Install dependencies:
  pip install pandas matplotlib seaborn datasets
- Run the notebooks in order
The data job market rewards:
- Strong foundations (Python + SQL)
- Strategic specialization (Cloud, Big Data)
- Smart role selection
Dataset provided by Luke Barousse: https://huggingface.co/datasets/lukebarousse/data_jobs
- GitHub: https://github.com/BushraaR5
- LinkedIn: (to be updated)
- Upwork: (to be updated)
Feel free to reach out for data analysis or Power BI projects.
If you found this project useful or have feedback, feel free to connect!