sense-music

Turn audio into structured analysis and annotated visualizations for AI perception. Liner notes for an AI.

Built by humanjava.com — find this and other tools for the agentic age at huje.tools.

Install

pip install sense-music              # core: sections, loops, key, energy, spectrogram, lyrics
pip install "sense-music[full]"      # + deep perception (CLAP, madmom, Demucs, Qwen2-Audio)

Deep-perception layers are optional extras (embedding, rhythm, stems, caption, loudness); each degrades gracefully if its dependency is absent. ⚠️ On Python 3.12, madmom must be installed from git (pip install git+https://github.com/CPJKU/madmom.git), because the PyPI release 0.16.1 does not build on 3.12.

Quick Start

from sense_music import analyze

result = analyze("song.mp3")
print(result.summary)
result.save("output/")

Full Example

from sense_music import analyze

result = analyze("song.mp3")

# Structured data
print(f"{result.bpm.tempo} BPM, {result.key.key} {result.key.mode}")
print(f"Genre: {result.genre}, Mood: {', '.join(result.mood)}")

# Sections
for s in result.sections:
    print(f"  {s.label}: {s.start:.1f}s — {s.end:.1f}s")

# Lyrics (requires whisper)
for line in result.lyrics:
    print(f"  [{line.start:.1f}s] {line.text}")

# Save everything
result.save("output/")           # spectrogram.png, waveform.png, analysis.json, analysis.html
result.render_page("song.html")  # self-contained HTML report

Skip Lyrics

If you don't have Whisper installed or want faster analysis:

result = analyze("song.mp3", lyrics=False)
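
If the same script has to run on machines with and without Whisper, one option is to set the flag from an import check. The sketch below assumes only that openai-whisper installs a module named `whisper`; the helper name `whisper_available` is hypothetical and not part of sense-music.

```python
import importlib.util


def whisper_available() -> bool:
    """Return True if the `whisper` module (openai-whisper) can be imported."""
    return importlib.util.find_spec("whisper") is not None


# Usage (assumes sense-music is installed):
#   from sense_music import analyze
#   result = analyze("song.mp3", lyrics=whisper_available())
```

This keeps the decision explicit in the calling code instead of relying on a failure inside the analysis.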

What You Get

Output	Description
`result.spectrogram`	PIL Image — annotated mel spectrogram with section markers and energy curve
`result.waveform`	PIL Image — waveform with colored section regions
`result.bpm`	BPMInfo(tempo, confidence)
`result.key`	KeyInfo(key, mode, confidence)
`result.sections`	List of Section(label, start, end)
`result.lyrics`	List of LyricLine(start, end, text)
`result.energy_curve`	Per-second normalized energy values
`result.genre`	Simple genre classification
`result.mood`	List of mood descriptors
`result.summary`	Natural language track description
`result.motifs`	Recurring LOOPS (Motif label, count, occurrences) — which sections reprise
`result.structure`	Motif sequence, e.g. `"A-B-A-A-C-A"`
`result.key_changes`	Modulation timeline (per-section key changes)
`result.rhythm`	madmom beats/downbeats/tempo + bar grid (`rhythm=True`)
`result.chords`	Chord progression + timeline (`chords=True`)
`result.loudness`	LUFS + crest factor
`result.clap_tags`	CLAP zero-shot semantic tags (`clap_tags=True`)
`result.embedding`	CLAP 512-d audio embedding — a similarity metric ("does this sound like X") (`embedding=True`)
`result.arrangement`	Demucs stem activity + element in/out timeline (`stems=True`)
`result.caption`	Qwen2-Audio free-text liner notes (`caption=True`)
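
These outputs can be combined. As an illustration, the sketch below averages the per-second energy values inside each section. It assumes that `energy_curve[i]` covers the interval from second i to second i+1, which the table above does not state explicitly, and it uses a stand-in `Section` tuple and made-up numbers rather than real analysis output. The helper name `section_energy` is hypothetical.

```python
from collections import namedtuple
from statistics import mean

# Stand-in for the library's Section objects; only label/start/end are used.
Section = namedtuple("Section", ["label", "start", "end"])


def section_energy(sections, energy_curve):
    """Mean energy per section, assuming energy_curve[i] covers second i..i+1."""
    out = []
    for s in sections:
        lo = int(s.start)
        hi = max(lo + 1, int(s.end))      # always include at least one second
        window = list(energy_curve[lo:hi])
        out.append((s.label, mean(window) if len(window) > 0 else 0.0))
    return out


# Made-up data for illustration only
sections = [Section("intro", 0.0, 2.0), Section("chorus", 2.0, 5.0)]
energy = [0.25, 0.75, 0.5, 1.0, 0.75]
print(section_energy(sections, energy))   # [('intro', 0.5), ('chorus', 0.75)]
```

With a real result, the call would be `section_energy(result.sections, result.energy_curve)`, provided the assumption about the curve's time base holds.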

Deep perception (v0.3)

Each layer is enabled by an analyze() flag and fails soft if its dependency is missing:

result = analyze("song.mp3", rhythm=True, embedding=True, clap_tags=True,
                 chords=True, stems=True, caption=False)

rhythm (madmom) — SOTA beat/downbeat tracking → the BAR grid (the thing video editors cut on).
embedding + clap_tags (CLAP) — a 512-d audio embedding (the similarity metric) + zero-shot tags.
stems (Demucs) — source separation → an arrangement timeline (which element enters/exits when).
chords (madmom) — chord-progression recognition.
caption (Qwen2-Audio) — natural-language "liner notes" (heavy; loads a 7B model).
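
To use the embedding as a similarity metric, two tracks' vectors need to be compared. Cosine similarity is a common choice for embeddings of this kind; the sketch below is a plain-Python version and is not taken from sense-music. It assumes `result.embedding` can be treated as a flat sequence of floats, which is not confirmed above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                        # undefined for a zero vector; report 0
    return dot / (norm_a * norm_b)


# Usage (assumes two results analyzed with embedding=True):
#   score = cosine_similarity(result_a.embedding, result_b.embedding)
```

Values near 1 suggest the two tracks sound alike under the embedding model; values near 0 suggest they are unrelated.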

Cut grid (for video editing)

from sense_music.cutgrid import edit_points, match_reference
pts = edit_points(result, snap=True)             # bar-aligned, ranked edit points
hits = match_reference([4.1, 8.0, 12.2], result) # what song event each reference cut lands on
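
For intuition about what "bar-aligned" means, the sketch below snaps arbitrary cut times to the nearest entry in a sorted list of bar start times. It illustrates the idea only and is not how `edit_points` or `match_reference` are implemented; the function name `snap_to_grid` and the bar times are hypothetical.

```python
from bisect import bisect_left


def snap_to_grid(t, grid):
    """Return the grid time closest to t; grid must be sorted ascending."""
    if not grid:
        raise ValueError("grid is empty")
    i = bisect_left(grid, t)
    if i == 0:
        return grid[0]
    if i == len(grid):
        return grid[-1]
    before, after = grid[i - 1], grid[i]
    # Ties go to the earlier bar.
    return before if t - before <= after - t else after


# Hypothetical bar start times in seconds (120 BPM in 4/4: one bar every 2 s)
bars = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
cuts = [4.1, 8.0, 12.2]                        # the reference cuts used above
print([snap_to_grid(c, bars) for c in cuts])   # [4.0, 8.0, 12.0]
```

The binary search keeps each lookup logarithmic in the number of bars, which matters little for one song but helps when matching many reference cuts.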

Dependencies

librosa — audio analysis · matplotlib — visualization · Pillow — image handling
openai-whisper — lyrics (optional via lyrics=False)
Deep-perception extras: transformers (CLAP + Qwen2-Audio), madmom (beat/downbeat/chords), demucs (stems), pyloudnorm (LUFS)

Usage & Copyright

You are responsible for ensuring you have the legal right to analyze any audio you submit to this tool, whether running locally or via the hosted service at huje.tools. sense-music provides compute and analysis only — it does not store, host, or redistribute audio content. By using this tool, you accept full responsibility for the content you process and how you use the results.

For details, see huje.tools/support.

License

MIT — Humanjava Enterprises Inc.
