SoundBridge: Music Similarity Search & Recommendation System

SoundBridge is a pipeline for exploring music similarity search and recommendation using audio analysis and machine learning. The system supports both audio-to-audio search and text-to-audio search.

The project compares two retrieval approaches:

  1. Handcrafted audio feature baseline using librosa features and cosine similarity.
  2. CLAP semantic retrieval using pretrained audio-text embeddings.
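
Both approaches come down to ranking tracks by the similarity of their vectors. The sketch below shows a cosine-similarity top-k lookup on toy data. It is an illustration only, not the code in src/: the function names and the toy vectors are made up, and it uses plain Python lists so it runs without the project's dependencies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(query_id, vectors, k=5):
    """Rank every other track by cosine similarity to the query track.

    `vectors` maps track_id -> feature (or embedding) vector.
    """
    query = vectors[query_id]
    scored = [
        (track_id, cosine_similarity(query, vec))
        for track_id, vec in vectors.items()
        if track_id != query_id  # never return the query itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional vectors; the real ones are much larger.
toy_vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
    "d": [-1.0, 0.0, 0.0],
}
results = top_k_similar("a", toy_vectors, k=2)
print(results)
```

The same ranking step would apply to either system; only the source of the vectors changes (handcrafted features or CLAP embeddings).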

The project also includes:

  • a Streamlit frontend demo
  • a lightweight FastAPI backend
  • a SQLite metadata/search-log layer
  • retrieval evaluation and qualitative analysis

Project Repository

https://github.com/DannyGetGrammy/SoundBridge

Dataset

This project uses the Free Music Archive (FMA) dataset:

https://github.com/mdeff/fma

For this course project, I used a balanced subset of FMA Small:

  • 8 genres
  • 10 tracks per genre
  • 80 tracks total

The full raw dataset and processed audio files are not included in this GitHub repository because they are large and should be downloaded from the original FMA source.

What Is Included in This Repository

This repository includes source code, scripts, demo code, backend code, tests, configuration files, and small metadata/artifact manifest files needed to inspect the project structure.

api/          FastAPI backend
demo/         Streamlit demo apps
scripts/      dataset subset construction script
src/          preprocessing, feature extraction, retrieval, evaluation scripts
tests/        API tests
README.md     setup and usage instructions
requirements.txt
.gitignore

Large generated files are intentionally excluded from Git, including:

data/raw/
data/external/
data/processed/
data/app/
models/**/*.npy
models/**/*.joblib
models/**/*.index
outputs/waveforms/
outputs/spectrograms/
outputs/search_results/
outputs/logs/
outputs/analysis/
docs/qualitative_analysis.md
*.wav
*.mp3
*.flac
*.m4a

These files can be regenerated by running the pipeline.

Environment Setup

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies:

pip install --upgrade pip
pip install -r requirements.txt

Dataset Setup

Download the FMA Small audio files and metadata from the official FMA repository:

https://github.com/mdeff/fma

Expected local file structure:

data/raw/fma_small/
data/raw/fma_metadata/

The metadata folder should contain the FMA metadata files, especially:

data/raw/fma_metadata/tracks.csv

The audio folder should contain the FMA Small audio files, organized into subfolders named after the first three digits of the zero-padded track ID:

data/raw/fma_small/000/000002.mp3
data/raw/fma_small/001/001486.mp3
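
The two example paths suggest a simple rule: the track ID is zero-padded to six digits, and the first three digits name the subfolder. A small helper that follows this rule might look like the sketch below. The function name is hypothetical and is not taken from this repository.

```python
from pathlib import PurePosixPath


def fma_audio_path(audio_dir, track_id):
    """Build the expected MP3 path for an FMA track ID.

    The ID is zero-padded to six digits; the first three digits
    name the subfolder.
    """
    tid = f"{int(track_id):06d}"
    return str(PurePosixPath(audio_dir) / tid[:3] / f"{tid}.mp3")


print(fma_audio_path("data/raw/fma_small", 2))
print(fma_audio_path("data/raw/fma_small", 1486))
```

Checking a few generated paths against the files on disk is a quick way to confirm the dataset was unpacked into the expected location.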

Because FMA audio files are large, they are not included in this repository.

Reproduce the Full Pipeline

Run all commands from the project root.

1. Build Balanced FMA Subset

python3 scripts/build_fma_subset.py --tracks_per_genre 10

Expected output:

data/subsets/fma_small_subset.csv
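
For readers who want the idea behind this step without opening the script, here is a minimal sketch of balanced sampling: group tracks by genre, then draw the same number from each group. It is an assumption about the general approach, not a copy of scripts/build_fma_subset.py. The function name, the seed parameter, and the toy catalogue are all hypothetical.

```python
import random
from collections import defaultdict


def build_balanced_subset(tracks, tracks_per_genre, seed=0):
    """Pick the same number of tracks from every genre.

    `tracks` is a list of (track_id, genre) pairs. Genres with fewer than
    `tracks_per_genre` tracks are skipped so the result stays balanced.
    """
    by_genre = defaultdict(list)
    for track_id, genre in tracks:
        by_genre[genre].append(track_id)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for genre in sorted(by_genre):
        ids = by_genre[genre]
        if len(ids) < tracks_per_genre:
            continue
        for track_id in sorted(rng.sample(ids, tracks_per_genre)):
            subset.append((track_id, genre))
    return subset


# Made-up catalogue: 8 placeholder genres with 25 tracks each.
catalogue = [(g * 100 + i, f"genre_{g}") for g in range(8) for i in range(25)]
subset = build_balanced_subset(catalogue, tracks_per_genre=10)
print(len(subset))
```

With 8 genres and 10 tracks per genre, the subset has 80 rows, which matches the dataset size described above.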

2. Preprocess Audio

python3 src/preprocess.py

Expected outputs:

data/metadata_processed.csv
data/processed/*.wav
outputs/logs/preprocessing_report.json

Each selected track is converted to mono, resampled to 22050 Hz, peak-normalized, and trimmed or padded to 30 seconds.
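
The sketch below illustrates three of these steps on a tiny made-up signal: mono conversion, peak normalization, and trimming or padding to a fixed length. Resampling is left out because it needs an audio library. The function names are hypothetical, and src/preprocess.py may implement these steps differently.

```python
def to_mono(channels):
    """Average equal-length channels sample by sample."""
    return [sum(frame) / len(frame) for frame in zip(*channels)]


def peak_normalize(samples):
    """Scale so the largest absolute sample is 1.0 (silence is left as is)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def fix_length(samples, target_len):
    """Trim long clips and zero-pad short ones to exactly `target_len`."""
    if len(samples) >= target_len:
        return list(samples[:target_len])
    return list(samples) + [0.0] * (target_len - len(samples))


SAMPLE_RATE = 22050
TARGET_LEN = SAMPLE_RATE * 30  # 661,500 samples for a 30-second clip

# Made-up stereo signal with three samples per channel.
stereo = [[0.5, -0.25, 0.125], [0.5, -0.75, 0.375]]
mono = to_mono(stereo)
clip = fix_length(peak_normalize(mono), TARGET_LEN)
print(len(clip), clip[:4])
```

Fixing every clip to the same length means each track contributes the same number of samples to feature extraction.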

3. Extract Handcrafted Audio Features

python3 src/extract_features.py

Expected outputs:

data/features_audio.csv
models/baseline/feature_matrix.npy
models/baseline/track_ids.json
outputs/logs/feature_extraction_report.json

The handcrafted feature matrix should have shape:

(80, 64)

4. Generate Waveform and Mel-Spectrogram Images

python3 src/visualize_audio.py --num_tracks 16

Expected outputs:

outputs/waveforms/
outputs/spectrograms/
outputs/logs/visualization_report.json

5. Build Baseline Retrieval Index

python3 src/build_baseline_index.py

Expected outputs:

models/baseline/scaler.joblib
models/baseline/feature_matrix_scaled.npy
models/baseline/feature_matrix_normalized.npy
outputs/logs/baseline_index_report.json
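
The output file names suggest two transformations: feature scaling followed by row normalization. The sketch below assumes the scaling is a per-column z-score and the normalization is L2 per row. That is a guess based on the file names, not a description of src/build_baseline_index.py. After L2 normalization, the cosine similarity of two rows equals their dot product.

```python
import math


def standardize_columns(matrix):
    """Z-score each feature column: subtract its mean, divide by its std."""
    n_rows = len(matrix)
    columns = list(zip(*matrix))
    means = [sum(col) / n_rows for col in columns]
    # A constant column gets std 1.0 to avoid dividing by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / n_rows) or 1.0
        for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in matrix
    ]


def l2_normalize_rows(matrix):
    """Scale each row to unit length so dot product equals cosine similarity."""
    out = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        out.append([x / norm for x in row])
    return out


# Made-up features on very different scales.
features = [
    [1.0, 200.0],
    [2.0, 400.0],
    [3.0, 900.0],
]
scaled = standardize_columns(features)
index = l2_normalize_rows(scaled)
for row in index:
    print(row)
```

Scaling first keeps a feature with a large numeric range from dominating the similarity score.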

6. Run Baseline Search

python3 src/search_baseline.py --track_id 1482 --top_k 5

7. Run Baseline Example Queries and Evaluation

python3 src/run_baseline_examples.py --top_k 5
python3 src/evaluate_baseline_retrieval.py --top_k 5

Expected baseline evaluation result:

Mean Precision@5 ~= 0.320

8. Generate CLAP Audio Embeddings

python3 src/embed_clap_audio.py

Expected outputs:

models/clap/clap_audio_embeddings.npy
models/clap/clap_track_ids.json
models/clap/clap_embedding_metadata.csv
outputs/logs/clap_audio_embedding_report.json

The CLAP embedding matrix should have shape:

(80, 512)

The first run may take time because the pretrained CLAP model needs to be downloaded and loaded.

9. Run CLAP Audio-to-Audio Search

python3 src/search_clap_audio.py --track_id 1482 --top_k 5

10. Run CLAP Text-to-Audio Search

python3 src/search_clap_text.py --query "dreamy ambient electronic music" --top_k 5

11. Run CLAP Examples and Evaluation

python3 src/run_clap_examples.py --top_k 5
python3 src/evaluate_clap_retrieval.py --top_k 5
python3 src/compare_retrieval_systems.py

Expected CLAP evaluation result:

Mean Precision@5 ~= 0.375
CLAP - Baseline ~= +0.055

12. Generate Qualitative Analysis

python3 src/generate_qualitative_analysis.py

Expected outputs:

outputs/analysis/qualitative_examples.csv
outputs/analysis/qualitative_summary.json
outputs/analysis/precision_comparison_table.csv
docs/qualitative_analysis.md

These qualitative analysis outputs are generated locally and are not committed to Git.

Streamlit Demo

After running the pipeline and generating the required artifacts, start the Streamlit frontend:

python3 -m streamlit run demo/streamlit_app.py

The Streamlit demo supports:

  • audio-to-audio search using handcrafted baseline features
  • audio-to-audio search using CLAP audio embeddings
  • text-to-audio search using CLAP text embeddings
  • audio playback
  • waveform and mel-spectrogram display
  • dataset browser
  • evaluation summary

Example text query:

dreamy ambient electronic music

Note: the first CLAP text query may take several seconds or longer on CPU because the model is loaded lazily.
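
Lazy loading means the model is created on first use rather than at startup. One common way to get this behaviour in Python is to cache the loader function, as in the sketch below. The loader and its return value are placeholders. This is not the demo's actual code, and the demo may use a different caching mechanism.

```python
from functools import lru_cache

LOAD_COUNT = 0  # only here to show how often the loader actually runs


@lru_cache(maxsize=1)
def get_text_model():
    """Load the model on first use and reuse it afterwards."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    # Placeholder for the slow step (downloading and loading model weights).
    return {"name": "placeholder-model"}


def embed_text(query):
    """Return a placeholder embedding; a real app would call the model."""
    model = get_text_model()  # slow on the first call only
    return [float(len(query)), float(len(model["name"]))]


embed_text("dreamy ambient electronic music")
embed_text("upbeat rock")
print(LOAD_COUNT)
```

The first query pays the loading cost, and later queries reuse the cached model.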

FastAPI Backend

Initialize the SQLite database:

python3 api/init_db.py
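
As a rough picture of what a metadata and search-log layer can look like, the sketch below creates two tables in an in-memory SQLite database and logs two searches. The table names, columns, and function names are hypothetical. The schema created by api/init_db.py may differ.

```python
import sqlite3


def init_db(conn):
    """Create hypothetical track-metadata and search-log tables."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS tracks (
            track_id TEXT PRIMARY KEY,
            title    TEXT,
            genre    TEXT
        );
        CREATE TABLE IF NOT EXISTS search_logs (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            kind       TEXT NOT NULL,      -- 'audio' or 'text'
            query      TEXT NOT NULL,      -- track ID or text prompt
            method     TEXT,               -- e.g. 'baseline' or 'clap'
            top_k      INTEGER NOT NULL,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        """
    )


def log_search(conn, kind, query, method, top_k):
    """Record one search request using a parameterized INSERT."""
    conn.execute(
        "INSERT INTO search_logs (kind, query, method, top_k) "
        "VALUES (?, ?, ?, ?)",
        (kind, query, method, top_k),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database for the example
init_db(conn)
log_search(conn, "audio", "1482", "clap", 5)
log_search(conn, "text", "dreamy ambient electronic music", "clap", 5)
print(conn.execute("SELECT COUNT(*) FROM search_logs").fetchone()[0])
```

Using `CREATE TABLE IF NOT EXISTS` makes the initialization safe to run more than once.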

Run the backend:

python3 -m uvicorn api.main:app --reload

If port 8000 is already in use:

python3 -m uvicorn api.main:app --reload --port 8001

Open API documentation:

http://127.0.0.1:8000/docs

If you use port 8001, open:

http://127.0.0.1:8001/docs

Example API calls:

curl http://127.0.0.1:8000/health

curl "http://127.0.0.1:8000/tracks?limit=5"

curl http://127.0.0.1:8000/genres

curl http://127.0.0.1:8000/metrics

curl -X POST http://127.0.0.1:8000/search/audio \
  -H "Content-Type: application/json" \
  -d '{"track_id":"1482","method":"clap","top_k":5}'

curl -X POST http://127.0.0.1:8000/search/text \
  -H "Content-Type: application/json" \
  -d '{"query":"dreamy ambient electronic music","top_k":5}'
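
The same POST requests can be issued from Python with the standard library. The sketch below builds the two search requests shown above and sends them only when asked. The helper names are hypothetical, and the response format is not documented here, so the example just decodes whatever JSON the backend returns.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"


def build_search_request(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for a search endpoint."""
    return urllib.request.Request(
        url=f"{base_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


audio_request = build_search_request(
    "/search/audio", {"track_id": "1482", "method": "clap", "top_k": 5}
)
text_request = build_search_request(
    "/search/text", {"query": "dreamy ambient electronic music", "top_k": 5}
)
print(audio_request.get_method(), audio_request.full_url)
# With the backend running: results = send(audio_request)
```

If the backend runs on port 8001, pass `base_url="http://127.0.0.1:8001"` when building the request.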

Run Tests

python3 -m pytest tests/test_api.py

Expected result:

5 passed

Evaluation Summary

The project uses genre-overlap Precision@5 as a lightweight sanity check for retrieval quality.

System                          Mean Precision@5
Handcrafted Feature Baseline    0.320
CLAP Embeddings                 0.375
Difference (CLAP - Baseline)    +0.055
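
To make the metric concrete, here is a minimal sketch of Precision@k. It assumes that "genre overlap" means a retrieved track has the same genre label as the query track. The function names and the toy queries are illustrative and are not taken from the evaluation scripts.

```python
def precision_at_k(query_genre, retrieved_genres, k=5):
    """Fraction of the top-k retrieved tracks that share the query's genre."""
    top = retrieved_genres[:k]
    hits = sum(1 for genre in top if genre == query_genre)
    return hits / k


def mean_precision_at_k(queries, k=5):
    """Average precision@k over (query_genre, retrieved_genres) pairs."""
    scores = [precision_at_k(genre, retrieved, k) for genre, retrieved in queries]
    return sum(scores) / len(scores)


# Made-up results for two queries.
toy_queries = [
    ("Rock", ["Rock", "Pop", "Rock", "Folk", "Pop"]),      # 2 of 5 match
    ("Folk", ["Folk", "Folk", "Folk", "Rock", "Folk"]),    # 4 of 5 match
]
print(mean_precision_at_k(toy_queries))
```

In this toy case the per-query scores are 2/5 and 4/5, so the mean is 0.6 up to floating-point rounding.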

This repository is intended as a reproducible codebase for the SoundBridge final project.