
BushraaR5/python-data-jobs-analysis


📊 Python Data Jobs Analysis

🚀 End-to-end data analysis project exploring job market trends, skills demand, and salary insights in the data industry.

📌 Project Overview

This project performs an end-to-end analysis of real-world data job postings to uncover:

  • 🛠️ In-demand skills across roles
  • 👀 Role-based skill requirements
  • 💰 Salary trends and growth
  • 🌍 Job market structure (remote work, platforms, hiring patterns)

The goal is to help aspiring data professionals understand the current job market and make informed career decisions.


📊 Dataset

This project uses the Data Jobs Dataset created by Luke Barousse.

This dataset contains real-world data analytics job postings collected from multiple sources using automated scraping methods.

⚠️ Data Availability Note

The full dataset (~230MB+) is not included in this repository due to size limitations.

Instead:

  • A sample dataset is provided for demonstration
  • All notebooks and scripts are designed to:
    • Use the full dataset if available
    • Automatically fall back to the sample dataset

This ensures the project runs smoothly for anyone cloning the repository.
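A minimal sketch of that fallback logic is shown here. It assumes the sample file sits at data/raw/data_jobs_sample.csv (as in the folder structure below), and load_jobs is a hypothetical helper name; the project's notebooks may implement the check differently.

```python
from pathlib import Path

import pandas as pd

# Assumed location of the bundled sample, based on this repository's folder layout
SAMPLE_PATH = Path("data/raw/data_jobs_sample.csv")


def load_jobs(sample_path=SAMPLE_PATH, try_full=True):
    """Hypothetical helper: return the full dataset if possible, else the sample."""
    if try_full:
        try:
            # Imported lazily so the sample still loads without the datasets package
            from datasets import load_dataset

            return load_dataset("lukebarousse/data_jobs")["train"].to_pandas()
        except Exception as exc:  # e.g. package missing or no network access
            print(f"Full dataset unavailable ({exc}); falling back to the sample.")
    return pd.read_csv(sample_path)
```

Calling df = load_jobs() tries the Hugging Face download first; try_full=False goes straight to the sample. Catching a broad Exception keeps the notebooks running offline, at the cost of reducing the failure reason to the printed message.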


📥 Load Full Dataset

To reproduce full results, load the dataset using:

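import pandas as pd  # to_pandas() below requires pandas to be installed
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub (requires internet access)
dataset = load_dataset('lukebarousse/data_jobs')
# Convert the 'train' split into a pandas DataFrame
df = dataset['train'].to_pandas()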

📁 Folder Structure

python-data-jobs-analysis/
│
├── data/
│   ├── raw/
│   │   └── data_jobs_sample.csv
│   └── processed/
│
├── notebooks/
│   ├── 1.data_cleaning.ipynb
│   ├── 2.skills_analysis.ipynb
│   ├── 3.role_analysis.ipynb
│   ├── 4.salary_analysis.ipynb
│   └── 5.job_market_analysis.ipynb
│
├── src/
│   ├── data_cleaning.py
│   └── skills_processing.py
│
└── README.md

🔍 Analysis Breakdown

The analysis is divided into four key components:

1️⃣ Skills Analysis

  • Identifies the most in-demand skills globally
  • Groups skills into categories (programming, cloud, tools)
  • Highlights learning priorities
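As a rough illustration of how skill demand and skill categories can be computed with pandas, here is a sketch. It assumes the job_skills column stores each posting's skills as a string-encoded Python list (for example "['python', 'sql']"); SKILL_CATEGORIES, parse_skills and skill_demand are hypothetical names, not the project's actual groupings.

```python
import ast

import pandas as pd

# Hypothetical skill -> category mapping; extend it for the skills you care about
SKILL_CATEGORIES = {
    "python": "programming",
    "sql": "programming",
    "r": "programming",
    "aws": "cloud",
    "azure": "cloud",
    "tableau": "tools",
    "excel": "tools",
}


def parse_skills(value):
    """Return a posting's skills as a list; string-encoded lists are parsed."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        return ast.literal_eval(value)  # "['python', 'sql']" -> ['python', 'sql']
    return []  # missing value: no skills listed


def skill_demand(df, column="job_skills"):
    """Count each skill's mentions, its share of all mentions, and its category."""
    # One row per skill mention; postings without skills become NaN and are dropped
    skills = df[column].apply(parse_skills).explode().dropna()
    result = skills.value_counts().rename("count").to_frame()
    result["share"] = result["count"] / result["count"].sum()
    result["category"] = [SKILL_CATEGORIES.get(s, "other") for s in result.index]
    return result
```

Here share is each skill's fraction of all skill mentions. Dividing the counts by the number of postings instead gives the percentage of postings that mention each skill, which answers a slightly different question.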

2️⃣ Role Analysis

  • Compares Data Analyst, Data Engineer, and Data Scientist roles
  • Examines skill distribution and role expectations
  • Analyzes salary benchmarks and growth
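The role comparison largely comes down to grouping postings by role. A minimal sketch, assuming the dataset's job_title_short and salary_year_avg columns (worth checking against df.columns); role_summary is a hypothetical helper.

```python
import pandas as pd

# The three roles compared in this README
ROLES = ["Data Analyst", "Data Engineer", "Data Scientist"]


def role_summary(df, role_col="job_title_short", salary_col="salary_year_avg"):
    """Postings, postings with a salary, and median yearly salary per role."""
    subset = df[df[role_col].isin(ROLES)]
    summary = subset.groupby(role_col).agg(
        postings=(salary_col, "size"),       # all rows, including missing salaries
        with_salary=(salary_col, "count"),   # non-null salaries only
        median_salary=(salary_col, "median"),
    )
    return summary.sort_values("median_salary", ascending=False)
```

Medians are used because salary distributions tend to be right-skewed. The with_salary column is worth checking alongside them, since many postings may not list a salary at all.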

3️⃣ Salary Analysis

  • Links skills with salary outcomes
  • Identifies high-paying vs low-paying skills
  • Compares salary distributions and geography
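One way to link skills with salary outcomes is to explode each posting's skill list and take the median salary per skill. The sketch below assumes job_skills holds string-encoded lists and salary_year_avg holds yearly salaries; parse_skills, salary_by_skill and min_postings are hypothetical names.

```python
import ast

import pandas as pd


def parse_skills(value):
    """Return a posting's skills as a list; string-encoded lists are parsed."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        return ast.literal_eval(value)
    return []


def salary_by_skill(df, skills_col="job_skills", salary_col="salary_year_avg",
                    min_postings=1):
    """Median yearly salary and number of salaried postings for each skill."""
    paid = df.dropna(subset=[salary_col]).copy()  # keep postings that list a salary
    paid[skills_col] = paid[skills_col].apply(parse_skills)
    # One row per (posting, skill) pair; postings without skills are dropped
    exploded = paid.explode(skills_col).dropna(subset=[skills_col])
    stats = exploded.groupby(skills_col)[salary_col].agg(["median", "count"])
    stats = stats[stats["count"] >= min_postings]
    return stats.sort_values("median", ascending=False)
```

Raising min_postings filters out rare skills whose median rests on a handful of postings; without such a cutoff they tend to crowd the top of a "highest-paying skills" list.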

4️⃣ Job Market Analysis

  • Explores hiring patterns (full-time vs contract)
  • Analyzes job platforms and remote trends
  • Tracks demand over time
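Each of these market views maps onto a simple count or share. A sketch, assuming a boolean job_work_from_home column, parseable job_posted_date values, and the job_schedule_type and job_via columns; market_overview is a hypothetical helper.

```python
import pandas as pd


def market_overview(df, top_n=5):
    """Summarise schedule types, platforms, remote share and monthly demand."""
    return {
        # Fraction of postings per schedule type (e.g. full-time vs contract)
        "schedule_share": df["job_schedule_type"].value_counts(normalize=True),
        # Top platforms by share of postings
        "platform_share": df["job_via"].value_counts(normalize=True).head(top_n),
        # The mean of a boolean column is the fraction of remote postings
        "remote_share": df["job_work_from_home"].mean(),
        # Postings per calendar month, oldest first
        "monthly_postings": (
            pd.to_datetime(df["job_posted_date"])
            .dt.to_period("M")
            .value_counts()
            .sort_index()
        ),
    }
```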

📌 Key Insights

🛠️ Skills

  • Python and SQL dominate the market (~19% each)
  • Cloud skills (AWS, Azure) significantly increase salary potential
  • Specialized tools (Spark, Kafka) are niche but high-paying

👀 Roles

  • Data Scientists earn the highest salaries (~$155K senior level)
  • Data Engineers offer a strong balance of demand and salary
  • Data Analysts have the highest job volume but lower pay

💰 Salary

  • High-paying roles are strongly linked to:
    • Cloud technologies
    • Big data tools
  • Remote roles offer higher average salaries (~10–15% more)
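A remote salary premium is the relative difference between mean salaries: premium (%) = (mean remote salary / mean non-remote salary - 1) × 100. The sketch below computes it assuming job_work_from_home is a boolean flag and salary_year_avg holds yearly salaries; remote_premium is a hypothetical name, and the project's own figure may have been computed differently (for example per role).

```python
import pandas as pd


def remote_premium(df, salary_col="salary_year_avg", remote_col="job_work_from_home"):
    """Percent by which the mean remote salary exceeds the mean non-remote salary."""
    paid = df.dropna(subset=[salary_col])  # only postings that list a salary
    # String labels avoid ambiguity when indexing by boolean keys
    labels = paid[remote_col].map({True: "remote", False: "onsite"})
    means = paid.groupby(labels)[salary_col].mean()
    return (means["remote"] / means["onsite"] - 1) * 100
```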

🌐 Job Market

  • ~90% of roles are full-time
  • LinkedIn dominates job postings
  • Remote work is increasing, especially in higher-paying roles

🎯 Why This Project Matters

This project provides a data-driven roadmap for:

👨‍💻 Aspiring Data Professionals

  • What to learn first
  • Which roles to target
  • How to increase earning potential

💼 Job Seekers

  • Which platforms to focus on
  • How remote work impacts salary
  • Where opportunities are growing

🏒 Recruiters / Businesses

  • Understanding talent demand
  • Identifying high-value skill gaps

💡 What This Project Demonstrates

  • End-to-end data analysis workflow
  • Data cleaning and preprocessing of real-world datasets
  • Exploratory data analysis (EDA)
  • Feature engineering (skills extraction, salary mapping)
  • Business insight generation from data

🛠 Tools & Technologies

  • Python (Pandas, NumPy)
  • Data Visualization (Matplotlib, Seaborn)
  • Jupyter Notebook
  • Data Cleaning & Transformation

▢️ How to Run

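  1. Clone the repository:

    git clone https://github.com/BushraaR5/python-data-jobs-analysis.git

  2. Install dependencies:

    pip install pandas matplotlib seaborn datasets jupyter
    
  3. Run the notebooks in the notebooks/ folder in numerical order, starting with 1.data_cleaning.ipynb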


📌 Conclusion

The data job market rewards:

  • Strong foundations (Python + SQL)
  • Strategic specialization (Cloud, Big Data)
  • Smart role selection

🙌 Acknowledgment

Dataset provided by Luke Barousse: https://huggingface.co/datasets/lukebarousse/data_jobs


📬 Contact

Feel free to reach out for data analysis or Power BI projects.

If you found this project useful or have feedback, let's connect!

About

End-to-end analysis of data job market trends using Python (EDA, salary insights, skills demand)
