SoundBridge: Music Similarity Search & Recommendation System

SoundBridge is a pipeline for music similarity search and recommendation built on audio analysis and machine learning. The system supports both audio-to-audio search and text-to-audio search.

The project compares two retrieval approaches:

Handcrafted audio feature baseline using librosa features and cosine similarity.
CLAP semantic retrieval using pretrained audio-text embeddings.

The project also includes:

a Streamlit frontend demo
a lightweight FastAPI backend
a SQLite metadata/search-log layer
retrieval evaluation and qualitative analysis

Project Repository

https://github.com/DannyGetGrammy/SoundBridge

Dataset

This project uses the Free Music Archive (FMA) dataset:

https://github.com/mdeff/fma

For this course project, I used a balanced subset of FMA Small:

8 genres
10 tracks per genre
80 tracks total

The full raw dataset and processed audio files are not included in this GitHub repository because they are large and should be downloaded from the original FMA source.

What Is Included in This Repository

This repository includes source code, scripts, demo code, backend code, tests, configuration files, and small metadata/artifact manifest files needed to inspect the project structure.

api/          FastAPI backend
demo/         Streamlit demo apps
scripts/      dataset subset construction script
src/          preprocessing, feature extraction, retrieval, evaluation scripts
tests/        API tests
README.md     setup and usage instructions
requirements.txt
.gitignore

Large generated files are intentionally excluded from Git, including:

data/raw/
data/external/
data/processed/
data/app/
models/**/*.npy
models/**/*.joblib
models/**/*.index
outputs/waveforms/
outputs/spectrograms/
outputs/search_results/
outputs/logs/
outputs/analysis/
docs/qualitative_analysis.md
*.wav
*.mp3
*.flac
*.m4a

These files can be regenerated by running the pipeline.

Environment Setup

Create and activate a virtual environment:

python3 -m venv venv
source venv/bin/activate

Install dependencies:

pip install --upgrade pip
pip install -r requirements.txt

Dataset Setup

Download the FMA Small audio files and metadata from the official FMA repository:

https://github.com/mdeff/fma

Expected local file structure:

data/raw/fma_small/
data/raw/fma_metadata/

The metadata folder should contain the FMA metadata files, especially:

data/raw/fma_metadata/tracks.csv

The audio folder should contain the FMA Small audio files. FMA Small audio files are organized like:

data/raw/fma_small/000/000002.mp3
data/raw/fma_small/001/001486.mp3

Because FMA audio files are large, they are not included in this repository.

Reproduce the Full Pipeline

Run all commands from the project root.

1. Build Balanced FMA Subset

python3 scripts/build_fma_subset.py --tracks_per_genre 10

Expected output:

data/subsets/fma_small_subset.csv
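
As a rough illustration of what "balanced" means here, the sketch below draws the same number of tracks from every genre. It is not the code in scripts/build_fma_subset.py: the function name, the (track_id, genre) input shape, the fixed seed, and the toy catalogue are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_balanced_subset(tracks, tracks_per_genre=10, seed=0):
    """Pick the same number of tracks from every genre.

    `tracks` is a list of (track_id, genre) pairs. This shape is an
    assumption for illustration, not the layout of tracks.csv.
    """
    by_genre = defaultdict(list)
    for track_id, genre in tracks:
        by_genre[genre].append(track_id)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    subset = []
    for genre in sorted(by_genre):
        candidates = sorted(by_genre[genre])
        if len(candidates) < tracks_per_genre:
            raise ValueError(f"genre {genre!r} has only {len(candidates)} tracks")
        for track_id in sorted(rng.sample(candidates, tracks_per_genre)):
            subset.append((track_id, genre))
    return subset


# Toy catalogue: 8 hypothetical genres with 25 tracks each.
toy_tracks = [(g * 1000 + i, f"genre_{g}") for g in range(8) for i in range(25)]
subset = build_balanced_subset(toy_tracks, tracks_per_genre=10)
print(len(subset))  # 80 = 8 genres x 10 tracks
```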

2. Preprocess Audio

python3 src/preprocess.py

Expected outputs:

data/metadata_processed.csv
data/processed/*.wav
outputs/logs/preprocessing_report.json

Each selected track is converted to mono, resampled to 22050 Hz, peak-normalized, and trimmed or padded to 30 seconds.
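
The preprocessing steps above can be sketched with NumPy alone. This is a minimal illustration, not the contents of src/preprocess.py; the helper names are hypothetical. Decoding and resampling are left out: one common option, assumed here rather than confirmed by the project, is librosa.load(path, sr=22050, mono=True), which decodes, downmixes, and resamples in one call.

```python
import numpy as np

TARGET_SR = 22050     # sample rate stated in this README
TARGET_SECONDS = 30   # clip length stated in this README


def to_mono(audio):
    """Average channels; expects shape (samples,) or (channels, samples)."""
    audio = np.asarray(audio, dtype=np.float32)
    return audio if audio.ndim == 1 else audio.mean(axis=0)


def peak_normalize(audio, eps=1e-9):
    """Scale so the largest absolute sample is 1.0; leave silence untouched."""
    peak = np.max(np.abs(audio))
    return audio if peak < eps else audio / peak


def fix_length(audio, sr=TARGET_SR, seconds=TARGET_SECONDS):
    """Trim long clips and zero-pad short ones to exactly sr * seconds samples."""
    target = sr * seconds
    if len(audio) >= target:
        return audio[:target]
    return np.pad(audio, (0, target - len(audio)))


# Toy input: 10 seconds of quiet stereo audio, already at 22050 Hz.
stereo = 0.25 * np.ones((2, TARGET_SR * 10), dtype=np.float32)
clip = fix_length(peak_normalize(to_mono(stereo)))
print(clip.shape)  # (661500,) = 22050 * 30 samples
```

The order follows the sentence above: normalize first, then trim or pad, so the padding stays at exact silence.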

3. Extract Handcrafted Audio Features

python3 src/extract_features.py

Expected outputs:

data/features_audio.csv
models/baseline/feature_matrix.npy
models/baseline/track_ids.json
outputs/logs/feature_extraction_report.json

The handcrafted feature matrix should have shape:

(80, 64)

4. Generate Waveform and Mel-Spectrogram Images

python3 src/visualize_audio.py --num_tracks 16

Expected outputs:

outputs/waveforms/
outputs/spectrograms/
outputs/logs/visualization_report.json

5. Build Baseline Retrieval Index

python3 src/build_baseline_index.py

Expected outputs:

models/baseline/scaler.joblib
models/baseline/feature_matrix_scaled.npy
models/baseline/feature_matrix_normalized.npy
outputs/logs/baseline_index_report.json
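
The artifact names suggest a scale-then-normalize index queried by cosine similarity. The sketch below shows that idea under stated assumptions: plain NumPy standardization stands in for whatever scaler.joblib holds, the random matrix stands in for feature_matrix.npy, and the track IDs and function names are hypothetical.

```python
import numpy as np


def build_index(features, eps=1e-12):
    """Standardize each feature column, then L2-normalize each row."""
    features = np.asarray(features, dtype=np.float64)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std < eps] = 1.0  # constant columns become zeros instead of dividing by 0
    scaled = (features - mean) / std
    norms = np.linalg.norm(scaled, axis=1, keepdims=True)
    return scaled / np.maximum(norms, eps)


def search(index, track_ids, query_track_id, top_k=5):
    """Return (track_id, cosine similarity) pairs for the top_k nearest tracks."""
    q = track_ids.index(query_track_id)
    sims = index @ index[q]   # dot product of unit vectors equals cosine similarity
    sims[q] = -np.inf         # never return the query itself
    best = np.argsort(-sims)[:top_k]
    return [(track_ids[i], float(sims[i])) for i in best]


rng = np.random.default_rng(0)
features = rng.normal(size=(80, 64))   # stand-in with the README's (80, 64) shape
track_ids = list(range(1000, 1080))    # hypothetical track IDs
index = build_index(features)
results = search(index, track_ids, query_track_id=1000, top_k=5)
for track_id, score in results:
    print(track_id, round(score, 3))
```

Normalizing rows once at index time means each query is a single matrix-vector product, which is more than fast enough for 80 tracks.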

6. Run Baseline Search

python3 src/search_baseline.py --track_id 1482 --top_k 5

7. Run Baseline Example Queries and Evaluation

python3 src/run_baseline_examples.py --top_k 5
python3 src/evaluate_baseline_retrieval.py --top_k 5

Expected baseline evaluation result:

Mean Precision@5 ~= 0.320

8. Generate CLAP Audio Embeddings

python3 src/embed_clap_audio.py

Expected outputs:

models/clap/clap_audio_embeddings.npy
models/clap/clap_track_ids.json
models/clap/clap_embedding_metadata.csv
outputs/logs/clap_audio_embedding_report.json

The CLAP embedding matrix should have shape:

(80, 512)

The first run may take time because the pretrained CLAP model needs to be downloaded and loaded.

9. Run CLAP Audio-to-Audio Search

python3 src/search_clap_audio.py --track_id 1482 --top_k 5

10. Run CLAP Text-to-Audio Search

python3 src/search_clap_text.py --query "dreamy ambient electronic music" --top_k 5

11. Run CLAP Examples and Evaluation

python3 src/run_clap_examples.py --top_k 5
python3 src/evaluate_clap_retrieval.py --top_k 5
python3 src/compare_retrieval_systems.py

Expected CLAP evaluation result:

Mean Precision@5 ~= 0.375
CLAP - Baseline ~= +0.055

12. Generate Qualitative Analysis

python3 src/generate_qualitative_analysis.py

Expected outputs:

outputs/analysis/qualitative_examples.csv
outputs/analysis/qualitative_summary.json
outputs/analysis/precision_comparison_table.csv
docs/qualitative_analysis.md

These qualitative analysis outputs are generated locally and are not committed to Git.

Streamlit Demo

After running the pipeline and generating the required artifacts, start the Streamlit frontend:

python3 -m streamlit run demo/streamlit_app.py

The Streamlit demo supports:

audio-to-audio search using handcrafted baseline features
audio-to-audio search using CLAP audio embeddings
text-to-audio search using CLAP text embeddings
audio playback
waveform and mel-spectrogram display
dataset browser
evaluation summary

Example text query:

dreamy ambient electronic music

Note: the first CLAP text query may take several seconds or longer on CPU because the model is loaded lazily.

FastAPI Backend

Initialize the SQLite database:

python3 api/init_db.py
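
For readers unfamiliar with a metadata/search-log layer, here is a minimal sketch of the general pattern using the standard-library sqlite3 module. The table names, columns, and helper functions are purely hypothetical and are not taken from api/init_db.py; inspect that script for the actual schema.

```python
import sqlite3

# Hypothetical schema: one table for track metadata, one for logged searches.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tracks (
    track_id TEXT PRIMARY KEY,
    title    TEXT,
    genre    TEXT
);
CREATE TABLE IF NOT EXISTS search_log (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    query      TEXT NOT NULL,
    method     TEXT NOT NULL,
    top_k      INTEGER NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def init_db(path=":memory:"):
    """Create the tables if they do not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def log_search(conn, query, method, top_k):
    """Append one search to the log table."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO search_log (query, method, top_k) VALUES (?, ?, ?)",
            (query, method, top_k),
        )


conn = init_db()  # in-memory database, so nothing is written to disk
log_search(conn, "dreamy ambient electronic music", "clap", 5)
rows = conn.execute("SELECT query, method, top_k FROM search_log").fetchall()
print(rows)
```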

Run the backend:

python3 -m uvicorn api.main:app --reload

If port 8000 is already in use:

python3 -m uvicorn api.main:app --reload --port 8001

Open API documentation:

http://127.0.0.1:8000/docs

If you use port 8001, open:

http://127.0.0.1:8001/docs

Example API calls:

curl http://127.0.0.1:8000/health

curl "http://127.0.0.1:8000/tracks?limit=5"

curl http://127.0.0.1:8000/genres

curl http://127.0.0.1:8000/metrics

curl -X POST http://127.0.0.1:8000/search/audio \
  -H "Content-Type: application/json" \
  -d '{"track_id":"1482","method":"clap","top_k":5}'

curl -X POST http://127.0.0.1:8000/search/text \
  -H "Content-Type: application/json" \
  -d '{"query":"dreamy ambient electronic music","top_k":5}'
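
The two POST calls above can also be issued from Python. The sketch below uses only the standard library and mirrors the curl payloads exactly; the helper names are hypothetical, and the response format is not assumed, so the reply is simply decoded as JSON.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # use port 8001 if the backend was started there


def build_post(path, payload, base_url=BASE_URL):
    """Build a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON reply (needs the backend running)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


audio_request = build_post(
    "/search/audio", {"track_id": "1482", "method": "clap", "top_k": 5}
)
text_request = build_post(
    "/search/text", {"query": "dreamy ambient electronic music", "top_k": 5}
)
# results = send(audio_request)  # uncomment once uvicorn is serving api.main:app
```

Separating request construction from sending makes the payloads easy to inspect or unit-test without a live server.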

Run Tests

python3 -m pytest tests/test_api.py

Expected result:

5 passed

Evaluation Summary

The project uses genre-overlap Precision@5 as a lightweight sanity check for retrieval quality.

System	Mean Precision@5
Handcrafted Feature Baseline	0.320
CLAP Embeddings	0.375
Difference	+0.055
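
To make the metric concrete, here is a small sketch of genre-overlap Precision@k, assuming each track carries a single genre label and that a result counts as relevant when its genre matches the query's. The function names and toy data are illustrative and are not taken from the evaluation scripts.

```python
def precision_at_k(query_genre, retrieved_genres, k=5):
    """Fraction of the top-k results whose genre matches the query's genre."""
    top = retrieved_genres[:k]
    return sum(genre == query_genre for genre in top) / k


def mean_precision_at_k(results, k=5):
    """Average Precision@k over (query_genre, retrieved_genres) pairs."""
    scores = [precision_at_k(query, retrieved, k) for query, retrieved in results]
    return sum(scores) / len(scores)


# Toy example with two queries: 2 of 5 and 3 of 5 results share the query genre.
toy_results = [
    ("Rock", ["Rock", "Pop", "Rock", "Folk", "Jazz"]),
    ("Jazz", ["Jazz", "Jazz", "Jazz", "Rock", "Pop"]),
]
mean_p5 = mean_precision_at_k(toy_results, k=5)
print(round(mean_p5, 3))  # 0.5
```

For context, if the query track is excluded from its own results, a random ranking over this balanced 80-track subset would average about 9/79, roughly 0.114, since 9 of the remaining 79 tracks share the query's genre.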

This repository is intended as a reproducible codebase for the SoundBridge final project.
