SoundBridge is a pipeline that explores music similarity search and recommendation using audio analysis and machine learning. The system supports both audio-to-audio search and text-to-audio search.
The project compares two retrieval approaches:
- Handcrafted audio feature baseline using librosa features and cosine similarity.
- CLAP semantic retrieval using pretrained audio-text embeddings.
The project also includes:
- a Streamlit frontend demo
- a lightweight FastAPI backend
- a SQLite metadata/search-log layer
- retrieval evaluation and qualitative analysis
https://github.com/DannyGetGrammy/SoundBridge
This project uses the Free Music Archive (FMA) dataset.
For this course project, I used a balanced subset of FMA Small:
- 8 genres
- 10 tracks per genre
- 80 tracks total
The full raw dataset and processed audio files are not included in this GitHub repository because they are large and should be downloaded from the original FMA source.
This repository includes source code, scripts, demo code, backend code, tests, configuration files, and small metadata/artifact manifest files needed to inspect the project structure.
api/ FastAPI backend
demo/ Streamlit demo apps
scripts/ dataset subset construction script
src/ preprocessing, feature extraction, retrieval, evaluation scripts
tests/ API tests
README.md setup and usage instructions
requirements.txt
.gitignore
Large generated files are intentionally excluded from Git, including:
data/raw/
data/external/
data/processed/
data/app/
models/**/*.npy
models/**/*.joblib
models/**/*.index
outputs/waveforms/
outputs/spectrograms/
outputs/search_results/
outputs/logs/
outputs/analysis/
docs/qualitative_analysis.md
*.wav
*.mp3
*.flac
*.m4a
These files can be regenerated by running the pipeline.
Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate
Install dependencies:
pip install --upgrade pip
pip install -r requirements.txt
Download the FMA Small audio files and metadata from the official FMA repository.
Expected local file structure:
data/raw/fma_small/
data/raw/fma_metadata/
The metadata folder should contain the FMA metadata files, especially:
data/raw/fma_metadata/tracks.csv
The audio folder should contain the FMA Small audio files. FMA Small audio files are organized like:
data/raw/fma_small/000/000002.mp3
data/raw/fma_small/001/001486.mp3
Because FMA audio files are large, they are not included in this repository.
Run all commands from the project root.
python3 scripts/build_fma_subset.py --tracks_per_genre 10
Expected output:
data/subsets/fma_small_subset.csv
python3 src/preprocess.py
Expected outputs:
data/metadata_processed.csv
data/processed/*.wav
outputs/logs/preprocessing_report.json
Each selected track is converted to mono, resampled to 22050 Hz, peak-normalized, and trimmed or padded to 30 seconds.
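The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration and not the contents of `src/preprocess.py`: the helper names `to_mono`, `peak_normalize`, and `fix_length` are hypothetical, and zero-padding at the end of short clips is an assumption. Resampling is left out because it is usually delegated to an audio library (for example, `librosa.load(path, sr=22050, mono=True)` decodes, downmixes, and resamples in one call).

```python
import numpy as np

TARGET_SR = 22050      # sample rate stated in this README
TARGET_SECONDS = 30    # clip length stated in this README


def to_mono(audio: np.ndarray) -> np.ndarray:
    """Average channels; accepts shape (samples,) or (channels, samples)."""
    return audio if audio.ndim == 1 else audio.mean(axis=0)


def peak_normalize(audio: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """Scale so the largest absolute sample is 1.0; near-silent input is returned unchanged."""
    peak = np.max(np.abs(audio))
    return audio if peak < eps else audio / peak


def fix_length(audio: np.ndarray, sr: int = TARGET_SR, seconds: int = TARGET_SECONDS) -> np.ndarray:
    """Trim, or zero-pad at the end (an assumption), to exactly sr * seconds samples."""
    target = sr * seconds
    if len(audio) >= target:
        return audio[:target]
    return np.pad(audio, (0, target - len(audio)))
```

Applied in the order listed above, an already-resampled clip would go through `fix_length(peak_normalize(to_mono(audio)))` and come out with `22050 * 30 = 661500` samples.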
python3 src/extract_features.py
Expected outputs:
data/features_audio.csv
models/baseline/feature_matrix.npy
models/baseline/track_ids.json
outputs/logs/feature_extraction_report.json
The handcrafted feature matrix should have shape:
(80, 64)
python3 src/visualize_audio.py --num_tracks 16
Expected outputs:
outputs/waveforms/
outputs/spectrograms/
outputs/logs/visualization_report.json
python3 src/build_baseline_index.py
Expected outputs:
models/baseline/scaler.joblib
models/baseline/feature_matrix_scaled.npy
models/baseline/feature_matrix_normalized.npy
outputs/logs/baseline_index_report.json
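The artifact names above suggest a standardize-then-normalize index, which the sketch below illustrates with NumPy alone. It is an assumption about how the index is built, not a copy of `src/build_baseline_index.py`: the functions `build_index` and `top_k_similar` are hypothetical, and the per-column standardization is written by hand where the project may use a saved scaler object instead. Once each row has unit length, cosine similarity reduces to a dot product.

```python
import numpy as np


def build_index(features: np.ndarray) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
    """Standardize each feature column, then L2-normalize each track row."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0                      # keep constant columns from dividing by zero
    scaled = (features - mean) / std
    norms = np.linalg.norm(scaled, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # keep all-zero rows from dividing by zero
    return scaled / norms, mean, std


def top_k_similar(index: np.ndarray, query_row: int, k: int = 5) -> list[tuple[int, float]]:
    """Rank rows by cosine similarity to the query row, excluding the query itself."""
    scores = index @ index[query_row]        # unit-length rows: dot product equals cosine
    scores[query_row] = -np.inf              # never return the query as its own neighbour
    best = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in best]
```

The returned row indices would still need to be mapped back to track IDs, for example through the order stored in `models/baseline/track_ids.json`.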
python3 src/search_baseline.py --track_id 1482 --top_k 5
python3 src/run_baseline_examples.py --top_k 5
python3 src/evaluate_baseline_retrieval.py --top_k 5
Expected baseline evaluation result:
Mean Precision@5 ~= 0.320
python3 src/embed_clap_audio.py
Expected outputs:
models/clap/clap_audio_embeddings.npy
models/clap/clap_track_ids.json
models/clap/clap_embedding_metadata.csv
outputs/logs/clap_audio_embedding_report.json
The CLAP embedding matrix should have shape:
(80, 512)
The first run may take time because the pretrained CLAP model needs to be downloaded and loaded.
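A quick way to confirm the expected shape after the embedding step is a small check like the one below. It is only a sketch: `check_embedding_artifacts` is a hypothetical helper, and it assumes the track-ID file is a flat JSON list with one entry per embedding row, which this README does not state.

```python
import json
from pathlib import Path

import numpy as np


def check_embedding_artifacts(embeddings_path, ids_path, expected_shape=(80, 512)):
    """Load the embedding matrix and ID list, raising ValueError if they do not line up."""
    embeddings = np.load(embeddings_path)
    # Assumption: the JSON file holds a flat list, one track ID per embedding row.
    track_ids = json.loads(Path(ids_path).read_text(encoding="utf-8"))
    if embeddings.shape != expected_shape:
        raise ValueError(f"expected shape {expected_shape}, got {embeddings.shape}")
    if len(track_ids) != embeddings.shape[0]:
        raise ValueError("number of track IDs does not match number of embedding rows")
    if not np.isfinite(embeddings).all():
        raise ValueError("embeddings contain NaN or infinite values")
    return embeddings, track_ids
```

For the CLAP artifacts this would be called with `models/clap/clap_audio_embeddings.npy` and `models/clap/clap_track_ids.json`. Passing `expected_shape=(80, 64)` should allow the same check on the baseline feature matrix.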
python3 src/search_clap_audio.py --track_id 1482 --top_k 5
python3 src/search_clap_text.py --query "dreamy ambient electronic music" --top_k 5
python3 src/run_clap_examples.py --top_k 5
python3 src/evaluate_clap_retrieval.py --top_k 5
python3 src/compare_retrieval_systems.py
Expected CLAP evaluation result:
Mean Precision@5 ~= 0.375
CLAP - Baseline ~= +0.055
python3 src/generate_qualitative_analysis.py
Expected outputs:
outputs/analysis/qualitative_examples.csv
outputs/analysis/qualitative_summary.json
outputs/analysis/precision_comparison_table.csv
docs/qualitative_analysis.md
These qualitative analysis outputs are generated locally and are not committed to Git.
After running the pipeline and generating the required artifacts, start the Streamlit frontend:
python3 -m streamlit run demo/streamlit_app.py
The Streamlit demo supports:
- audio-to-audio search using handcrafted baseline features
- audio-to-audio search using CLAP audio embeddings
- text-to-audio search using CLAP text embeddings
- audio playback
- waveform and mel-spectrogram display
- dataset browser
- evaluation summary
Example text query:
dreamy ambient electronic music
Note: the first CLAP text query may take several seconds or longer on CPU because the model is loaded lazily.
Initialize the SQLite database:
python3 api/init_db.py
Run the backend:
python3 -m uvicorn api.main:app --reload
If port 8000 is already in use:
python3 -m uvicorn api.main:app --reload --port 8001
Open API documentation:
http://127.0.0.1:8000/docs
If you use port 8001, open:
http://127.0.0.1:8001/docs
Example API calls:
curl http://127.0.0.1:8000/health
curl "http://127.0.0.1:8000/tracks?limit=5"
curl http://127.0.0.1:8000/genres
curl http://127.0.0.1:8000/metrics
curl -X POST http://127.0.0.1:8000/search/audio \
-H "Content-Type: application/json" \
-d '{"track_id":"1482","method":"clap","top_k":5}'
curl -X POST http://127.0.0.1:8000/search/text \
-H "Content-Type: application/json" \
-d '{"query":"dreamy ambient electronic music","top_k":5}'
python3 -m pytest tests/test_api.py
Expected result:
5 passed
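The `/search/audio` curl call from the API examples can also be issued from Python using only the standard library. The sketch below mirrors that request body; the helper names `build_audio_search_request` and `send` are hypothetical, and the response is assumed to be JSON, since its exact fields are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"   # use port 8001 if the backend was started there


def build_audio_search_request(track_id: str, method: str = "clap", top_k: int = 5,
                               base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {"track_id": track_id, "method": method, "top_k": top_k}
    return urllib.request.Request(
        url=f"{base_url}/search/audio",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# With the backend running:
#     results = send(build_audio_search_request("1482"))
```

Separating request construction from sending keeps the payload easy to inspect or test without a running server.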
The project uses genre-overlap Precision@5 as a lightweight sanity check for retrieval quality.
| System | Mean Precision@5 |
|---|---|
| Handcrafted Feature Baseline | 0.320 |
| CLAP Embeddings | 0.375 |
| Difference | +0.055 |
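For reference, genre-based Precision@k can be computed as in the sketch below. It assumes each track carries a single genre label and that a retrieved track counts as a hit only when its label equals the query's; the function names are hypothetical and the evaluation scripts in `src/` may define overlap differently.

```python
def precision_at_k(query_genre: str, retrieved_genres: list[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved tracks whose genre equals the query's genre."""
    hits = sum(1 for genre in retrieved_genres[:k] if genre == query_genre)
    return hits / k


def mean_precision_at_k(queries: list[tuple[str, list[str]]], k: int = 5) -> float:
    """Average precision_at_k over (query_genre, retrieved_genres) pairs."""
    scores = [precision_at_k(genre, retrieved, k) for genre, retrieved in queries]
    return sum(scores) / len(scores)
```

Under the same assumptions, and if the query track is excluded from its own results, ranking the 80-track subset at random would give an expected Precision@5 of 9/79, or about 0.114, because 9 of the other 79 tracks share the query's genre.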
This repository is intended as a reproducible codebase for the SoundBridge final project.