Podcast Episode Guide Generator

Generates PDF "magazine-style" episode guides for the following podcasts: TWIR, ZTTP, RA (Retro Asylum), 10P (Ten Pence Arcade), and RGDS.

Each run produces:

A cover page
A clickable Table of Contents
One A4 page per episode with links
Podcast-specific feature pages (TWIR QoW list, ZTTP game list)
TWIR also writes a companion CSV with episode metadata

Prerequisites

Python 3.9 or later (tested on 3.9 and 3.10)
A Google API key with the YouTube Data API v3 enabled (TWIR only)
Optional: a Gemini API key for Ten Pence next-month-game extraction
A Reddit account with a registered script application (TWIR QoW only)
Internet access to retroasylum.com for the RA guide (no API keys required)
Spotify app credentials for RGDS (RGDS_CLIENT_ID, RGDS_CLIENT_SECRET, RGDS_REDIRECT_URI)

Quick Start (Beginner)

If you want the shortest path from zero to running, follow exactly one of these flows.

macOS / Linux

# 1) Open a terminal and go to the project folder
cd /path/to/PodcastEpisoideGuideGenerator

# 2) Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3) Install dependencies
pip install -r requirements.txt

# 4) Create .env from the example in this README

# 5) Run the app
python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

Windows (PowerShell)

# 1) Open PowerShell and go to the project folder
cd C:\path\to\PodcastEpisoideGuideGenerator

# 2) Create and activate a virtual environment
py -3 -m venv .venv
.\.venv\Scripts\Activate.ps1

# 3) Install dependencies
pip install -r requirements.txt

# 4) Create .env from the example in this README

# 5) Run the app
python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

Output files are written to /mnt/ssd/podcast-episodes (podcast dependent):

TWiR Episode Guide.pdf
TWiR_Data.csv
ZTTP Episode Guide.pdf
RA Episode Guide.pdf
Ten Pence Arcade Episode Guide.pdf
RGDS Episode Guide.pdf

Installation

Verify your Python version first:

python3 --version      # macOS/Linux
py -3 --version        # Windows

venv uses whichever Python interpreter you run it with.

python3 -m venv .venv creates a venv from your default python3.
python3.10 -m venv .venv310 creates a Python 3.10 venv specifically.

Recommended default setup (works with your installed Python 3.x):

macOS / Linux

python3 -m venv .venv
source .venv/bin/activate
python --version
pip install -r requirements.txt

Windows (PowerShell)

py -3 -m venv .venv
.\.venv\Scripts\Activate.ps1
python --version
pip install -r requirements.txt

Optional: if you specifically want to pin to Python 3.10 (known-good version):

macOS / Linux (Python 3.10 pinned)

python3.10 -m venv .venv310
source .venv310/bin/activate
python --version                      # Should show Python 3.10.x
pip install -r requirements.txt

Windows (PowerShell, Python 3.10 pinned)

py -3.10 -m venv .venv310
.\.venv310\Scripts\Activate.ps1
python --version
pip install -r requirements.txt

If python3.10 is not available, you can still create a venv with your installed Python 3.x version.

Note: The YouTube client library is imported with `from pyyoutube import Api`, but its pip package name is `python-youtube`.

Configuration

1. `.env` file

Create a .env file in the project root by copying .env.example.

macOS / Linux

cp .env.example .env

Windows (PowerShell)

Copy-Item .env.example .env

Template contents (grouped by podcast/provider):

Global Variables (All Podcasts)

| Variable | Required | Description |
| --- | --- | --- |
| `LOG_LEVEL` | No | Logging verbosity. Valid values are Python logging levels such as `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. Defaults to `INFO` if omitted or invalid. |

TWIR Variables (Required for `--podcasts twir` and `--podcasts all`)

| Variable | Required | Description |
| --- | --- | --- |
| `YOUTUBE_API_KEY` | Yes | YouTube Data API key for TWIR episode retrieval. |
| `YOUTUBE_PLAYLIST_ID` | Yes | ID of the TWIR YouTube playlist. |
| `PODBEAN_RSS_FEED` | Yes | Full URL of the Podbean RSS feed for TWIR. |
| `REDDIT_CLIENT_ID` | Yes | Reddit app client ID (TWIR QoW), from reddit.com/prefs/apps. |
| `REDDIT_CLIENT_SECRET` | Yes | Reddit app client secret (TWIR QoW). |
| `REDDIT_USERNAME` | Yes | Reddit account username (TWIR QoW). |
| `REDDIT_PASSWORD` | Yes | Reddit account password (TWIR QoW). |
| `REDDIT_USER_AGENT` | Yes | User-agent string identifying the script to Reddit. |

10P Variables (Optional for `--podcasts 10p` and `--podcasts all`)

| Variable | Required | Description |
| --- | --- | --- |
| `TEN_P_GEMINI_API_KEY` | No | Optional Gemini API key used for Ten Pence next-month-game extraction. If omitted, the provider falls back to cache/overrides and `No Game`. |

RGDS Variables (Required for `--podcasts rgds` and `--podcasts all`)

| Variable | Required | Description |
| --- | --- | --- |
| `RGDS_CLIENT_ID` | Yes | Spotify app client ID. |
| `RGDS_CLIENT_SECRET` | Yes | Spotify app client secret. |
| `RGDS_REDIRECT_URI` | Yes | Spotify app redirect URI (must match your Spotify app settings). |
| `RGDS_SHOW_ID` | No | Optional Spotify show ID for RGDS. Defaults to the current RGDS show if omitted. |
| `RGDS_REFRESH_TOKEN` | No | Optional Spotify refresh token for non-interactive RGDS runs. If omitted, the first RGDS run uses a browser OAuth bootstrap and caches the refresh token under `.cache/RGDS/auth.json`. |

RA and ZTTP Variables

No dedicated environment variables are required by these providers in the current implementation.

Compatibility note:

YOUTUBE_API_KEY is the canonical TWIR key name.
GOOGLE_API_KEY is still accepted as a legacy fallback for older local .env files.

Example .env:

YOUTUBE_API_KEY=your_youtube_api_key_here
YOUTUBE_PLAYLIST_ID=PLPVR2wA1dpHZR7p2GL5rgB7ybTMxtgPPB
PODBEAN_RSS_FEED=https://feed.podbean.com/TWIR/feed.xml
REDDIT_CLIENT_ID=your_reddit_client_id
REDDIT_CLIENT_SECRET=your_reddit_client_secret
REDDIT_USERNAME=your_reddit_username
REDDIT_PASSWORD=your_reddit_password
REDDIT_USER_AGENT=script:question_search:v1.0 (by u/your_username)
LOG_LEVEL=INFO
TEN_P_GEMINI_API_KEY=your_gemini_api_key_here
RGDS_CLIENT_ID=your_spotify_client_id
RGDS_CLIENT_SECRET=your_spotify_client_secret
RGDS_REDIRECT_URI=http://127.0.0.1:8888/callback
RGDS_SHOW_ID=00sL9tgDezr0PRSzd3C7H6
RGDS_REFRESH_TOKEN=

How to get the values:

YOUTUBE_API_KEY: Create in Google Cloud Console and enable YouTube Data API v3 for that project.
YOUTUBE_PLAYLIST_ID: Use the playlist ID from the TWiR YouTube URL (already provided in the example).
PODBEAN_RSS_FEED: Use the RSS feed URL (already provided in the example).
REDDIT_*: Create a Reddit app at https://www.reddit.com/prefs/apps and choose script.

RGDS Spotify bootstrap notes:

Create a Spotify app in the Spotify Developer Dashboard.
Add your redirect URI (for example http://127.0.0.1:8888/callback) to the app settings.
On the first rgds run without RGDS_REFRESH_TOKEN, a browser OAuth flow is launched.
On success, the refresh token is saved to .cache/RGDS/auth.json and reused on subsequent runs.
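
The token lookup described in these notes could look roughly like the sketch below. It is not the project's code: the function name `load_refresh_token` and the `refresh_token` JSON key are assumptions made for illustration, and only the `RGDS_REFRESH_TOKEN` variable and the `.cache/RGDS/auth.json` path come from this README.

```python
import json
from pathlib import Path


def load_refresh_token(env, auth_path):
    """Return a refresh token from the environment or the cache file, else None.

    Hypothetical helper: the JSON key "refresh_token" is an assumption.
    """
    token = env.get("RGDS_REFRESH_TOKEN", "").strip()
    if token:
        return token  # an explicit value in .env wins
    try:
        data = json.loads(Path(auth_path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # no cache file, or unreadable JSON: caller starts browser OAuth
    if isinstance(data, dict):
        return data.get("refresh_token") or None
    return None
```

A caller would pass `os.environ` and the `.cache/RGDS/auth.json` path, and fall back to the browser flow when the result is `None`.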

2. Centralized env var handling (`env_var_utils.py`)

All environment variables are loaded, validated, and logged in one place via EnvVarUtils.

Runtime behavior:

Required variables are validated at startup; missing required values abort the run.
Startup logs include all loaded env vars.
Sensitive values (keys containing PASSWORD, SECRET, API_KEY, TOKEN) are masked in logs.
LOG_LEVEL controls application verbosity.
For TWIR, YOUTUBE_API_KEY is preferred and GOOGLE_API_KEY is accepted as a fallback alias.
For Ten Pence AI extraction, invalid/missing TEN_P_GEMINI_API_KEY does not fail the run; extraction is disabled for that run and cache/overrides/No Game are used.
For RGDS, RGDS_CLIENT_ID, RGDS_CLIENT_SECRET, and RGDS_REDIRECT_URI are required only when running the RGDS provider.
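
As an illustration of the masking and alias rules above, here is a minimal sketch. It does not reproduce EnvVarUtils; the function names and the `****` placeholder are assumptions, while the marker substrings and variable names are the ones listed in this README.

```python
# Substrings that mark a variable as sensitive (taken from the list above).
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "API_KEY", "TOKEN")


def mask_env_value(name, value):
    """Return the value as it would appear in a log line (hypothetical helper)."""
    if any(marker in name.upper() for marker in SENSITIVE_MARKERS):
        return "****" if value else "(empty)"
    return value


def describe_env(env):
    """Build log-safe NAME=value strings, sorted by variable name."""
    return [f"{name}={mask_env_value(name, value)}" for name, value in sorted(env.items())]


def resolve_youtube_key(env):
    """Prefer YOUTUBE_API_KEY and fall back to the legacy GOOGLE_API_KEY."""
    return env.get("YOUTUBE_API_KEY") or env.get("GOOGLE_API_KEY")
```

For example, `describe_env({"LOG_LEVEL": "DEBUG", "YOUTUBE_API_KEY": "abc"})` yields `LOG_LEVEL=DEBUG` and `YOUTUBE_API_KEY=****`.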

To create a Reddit app:

Go to https://www.reddit.com/prefs/apps
Click Create another app
Select script as the type
Set the redirect URI to http://localhost:8080
Copy the client ID (shown under the app name) and secret

Running

macOS / Linux

python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

If your venv is not active, run with the venv Python directly:

./.venv/bin/python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'
./.venv310/bin/python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

Windows (PowerShell)

python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

If your venv is not active:

.\.venv\Scripts\python.exe run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'
.\.venv310\Scripts\python.exe run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'

Output files are written to /mnt/ssd/podcast-episodes:

/mnt/ssd/podcast-episodes/TWiR Episode Guide.pdf — the full episode guide PDF
/mnt/ssd/podcast-episodes/TWiR_Data.csv — CSV of episode data including questions
/mnt/ssd/podcast-episodes/ZTTP Episode Guide.pdf — the ZTTP episode guide PDF
/mnt/ssd/podcast-episodes/RA Episode Guide.pdf — the Retro Asylum episode guide PDF
/mnt/ssd/podcast-episodes/Ten Pence Arcade Episode Guide.pdf — the Ten Pence Arcade episode guide PDF
/mnt/ssd/podcast-episodes/RGDS Episode Guide.pdf — the RGDS episode guide PDF

Utility Scripts

Convenience scripts are available in scripts/ for common runs:

./scripts/zttp.sh   # Runs: python run_guides.py --podcasts zttp
./scripts/twir.sh   # Runs: python run_guides.py --podcasts twir
./scripts/ra.sh     # Runs: python run_guides.py --podcasts ra
./scripts/10p.sh    # Runs: python run_guides.py --podcasts 10p
./scripts/all.sh    # Runs: python run_guides.py --podcasts all

If needed, make scripts executable first:

chmod +x scripts/*.sh

Unified Multi-Podcast Run

Use the shared runner to select one or more podcast guides in a single command. Each selected podcast generates its own PDF output file (no combined PDF).

CLI selection tokens are lower-case:

twir
zttp
ra
10p
rgds
all

Internally, provider IDs are centralized in cache_paths.py as upper-case keys (TWIR, ZTTP, RA, 10P, RGDS) and the runner derives CLI tokens from those constants.

python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'
python run_guides.py --podcasts zttp
python run_guides.py --podcasts ra
python run_guides.py --podcasts 10p
python run_guides.py --podcasts rgds
python run_guides.py --podcasts twir,zttp
python run_guides.py --podcasts twir,ra
python run_guides.py --podcasts twir,10p
python run_guides.py --podcasts twir,rgds
python run_guides.py --podcasts all
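
For readers curious how the `--podcasts` value might be turned into a provider list, the following is a rough sketch under stated assumptions. The names `PROVIDER_IDS`, `CLI_TOKENS`, and `parse_podcasts` are hypothetical, and the handling of duplicates and unknown tokens is a guess, not the behaviour of run_guides.py.

```python
# Upper-case provider IDs as listed in this README; the tuple name is hypothetical.
PROVIDER_IDS = ("TWIR", "ZTTP", "RA", "10P", "RGDS")
# CLI tokens are derived from the IDs by lower-casing them.
CLI_TOKENS = tuple(pid.lower() for pid in PROVIDER_IDS)


def parse_podcasts(arg):
    """Expand a comma-separated --podcasts value into a list of CLI tokens."""
    tokens = [t.strip() for t in arg.split(",") if t.strip()]
    if not tokens:
        raise ValueError("no podcasts selected")
    if "all" in tokens:
        return list(CLI_TOKENS)
    unknown = [t for t in tokens if t not in CLI_TOKENS]
    if unknown:
        raise ValueError("unknown podcast token(s): " + ", ".join(unknown))
    # Drop duplicates while keeping the order the user typed.
    return list(dict.fromkeys(tokens))
```

Deriving the tokens from one tuple keeps the CLI and the cache directory names from drifting apart.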

To continue with the remaining selections when one podcast fails:

python run_guides.py --podcasts all --continue-on-error
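
The effect of `--continue-on-error` can be pictured with the sketch below. It is an assumption-laden illustration: `run_selected` and the `builders` mapping are hypothetical names, and the real runner may report failures differently.

```python
import logging

logger = logging.getLogger("run_guides_sketch")


def run_selected(selected, builders, continue_on_error=False):
    """Run each selected builder in order and return the tokens that failed.

    `builders` maps a CLI token to a zero-argument callable (hypothetical shape).
    """
    failed = []
    for token in selected:
        try:
            builders[token]()
        except Exception:
            logger.exception("Guide generation failed for %s", token)
            if not continue_on_error:
                raise  # default: the first failure stops the whole run
            failed.append(token)
    return failed
```

Returning the failed tokens lets the caller choose a non-zero exit status even when the run was allowed to continue.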

Project Structure

.
├── run_guides.py            # Unified entry point for TWIR, ZTTP, RA, 10P, and RGDS guide generation
├── data_retriever.py        # Fetches episodes from YouTube API and Podbean RSS feed
├── cache_paths.py           # Centralized cache locations under .cache/<PROVIDER>
├── env_var_utils.py         # Loads and validates environment variables from .env
├── .env.example             # Safe starter template for local configuration
├── requirements.txt         # Runtime dependency list for local installation
├── constants/               # Typed constants registry for provider selection
├── renderers/               # Shared and provider-specific renderers
├── tests/                   # Unit tests
├── podcasts/
│   ├── common/              # Shared base classes, runtime helpers, and constants
│   ├── twir/                # TWIR-specific modules (including qow/)
│   ├── zttp/                # ZTTP-specific modules
│   ├── ra/                  # Retro Asylum–specific modules (scrapes retroasylum.com)
│   │   └── assets/          # Local RA static assets (e.g., RACover.png)
│   ├── tenp/                # Ten Pence Arcade–specific modules
│   └── rgds/                # RGDS-specific modules (Spotify API + OAuth)
├── scripts/                 # Convenience shell scripts (twir.sh, zttp.sh, ra.sh, 10p.sh, all.sh)
├── .env                     # Environment variables (do not commit to source control)
├── .cache/
│   ├── _SHARED/
│   │   └── images/          # Shared image cache reused across providers
│   ├── TWIR/
│   │   ├── images/          # TWIR image cache
│   │   ├── qow_cache.pkl    # TWIR QoW cache file
│   │   └── episodes.json    # TWIR episode metadata cache
│   ├── ZTTP/
│   │   ├── images/          # ZTTP image cache
│   │   ├── episode_cache.pkl
│   │   ├── zzap_cache.pkl
│   │   └── crapverts_cache.pkl
│   ├── RA/
│   │   ├── images/          # RA image cache
│   │   └── episodes_cache.pkl
│   ├── 10P/
│   │   ├── images/          # Ten Pence image cache
│   │   ├── episode_cache.pkl
│   │   └── next_month_game_cache.pkl
│   └── RGDS/
│       ├── images/          # RGDS image cache
│       ├── episodes.json
│       └── auth.json        # Spotify refresh-token cache
└── image_cache/             # Optional manual image staging area (not read automatically)

Logging

The app uses Python's standard logging module throughout (instead of print).

Configure verbosity with LOG_LEVEL in .env (or by exporting it in your shell).
Severity levels are used consistently:
- INFO for normal progress/status
- WARNING for retries and recoverable issues
- ERROR for failures (logger.exception logs at ERROR and adds the traceback)

Examples:

LOG_LEVEL=INFO python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'
LOG_LEVEL=DEBUG python run_guides.py --podcasts twir,zttp,ra,10p,rgds  # any combination, or 'all'
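
The "defaults to INFO if omitted or invalid" rule for LOG_LEVEL could be implemented along these lines. This is a sketch, not the project's code: `resolve_log_level` is a hypothetical name, and accepting lower-case input is an assumption.

```python
import logging

# Level names accepted in LOG_LEVEL, as listed in this README.
VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def resolve_log_level(raw):
    """Map a LOG_LEVEL string to a logging constant, defaulting to INFO."""
    name = (raw or "").strip().upper()  # case-insensitivity is an assumption
    if name in VALID_LEVELS:
        return getattr(logging, name)
    return logging.INFO
```

A typical call site would be `logging.basicConfig(level=resolve_log_level(os.environ.get("LOG_LEVEL")))`.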

Image Cache

Images (episode thumbnails, cover, listen button) are cached on first download and reused on subsequent runs.

Cache locations:

Provider-local cache: .cache/<PROVIDER>/images/
Shared cross-provider cache: .cache/_SHARED/images/

The runtime checks caches in this order:

Active provider cache (.cache/<PROVIDER>/images/)
Shared cache (.cache/_SHARED/images/)
Other provider caches (cross-provider fallback)
Network download

When an image is found in another provider cache, it is copied into shared cache and reused.

This behavior:

Significantly speeds up generation
Reduces duplicate downloads across providers
Allows blocked or unavailable images to be substituted manually
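
The lookup order described above can be sketched as follows. The function name `find_cached_image` is hypothetical and the real helpers in cache_paths.py may be organised differently; only the directory layout (`.cache/<PROVIDER>/images/` and `.cache/_SHARED/images/`) comes from this README.

```python
import shutil
from pathlib import Path


def find_cached_image(cache_root, provider, filename):
    """Return a cached copy of filename, or None when a download is needed."""
    cache_root = Path(cache_root)
    local = cache_root / provider / "images" / filename
    if local.exists():
        return local  # 1) active provider cache
    shared = cache_root / "_SHARED" / "images" / filename
    if shared.exists():
        return shared  # 2) shared cache
    if not cache_root.is_dir():
        return None
    for other in sorted(cache_root.iterdir()):
        if not other.is_dir() or other.name in (provider, "_SHARED"):
            continue
        candidate = other / "images" / filename
        if candidate.exists():
            # 3) found in another provider cache: copy into the shared cache
            shared.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(candidate, shared)
            return shared
    return None  # 4) caller falls through to a network download
```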

Current standardized cache locations are:

.cache/_SHARED/images/
.cache/TWIR/images/
.cache/ZTTP/images/
.cache/RA/images/
.cache/10P/images/
.cache/RGDS/images/

Cache policy:

Cache paths are built via shared helpers in cache_paths.py.
Cache directory names are centralized in cache_paths.py (CACHE_DIRNAME, IMAGE_CACHE_DIRNAME).
Cache filenames are deterministic and URL-derived, so the same image URL maps to the same cache filename across providers.

Cache filenames are derived from the full URL to ensure uniqueness across episodes:

https://i.ytimg.com/vi/kE8c463aibw/hqdefault.jpg
  → i-ytimg-com-kE8c463aibw-hqdefault.jpg
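
The exact naming rule is not spelled out here. One rule that reproduces the filenames shown in this README is "host with dots replaced by hyphens, followed by the last two path segments"; the sketch below implements that guess. Treat `cache_filename_for` as a hypothetical helper, since the real implementation may use more of the URL.

```python
from urllib.parse import urlparse


def cache_filename_for(url):
    """Derive a deterministic cache filename from an image URL (assumed rule)."""
    parsed = urlparse(url)
    host = parsed.netloc.replace(".", "-")
    segments = [s for s in parsed.path.split("/") if s]
    # Assumption: only the last two path segments take part in the name.
    return "-".join([host] + segments[-2:])
```

Because the name depends only on the URL, the same image maps to the same file regardless of which provider requested it.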

Adding images manually

If a download fails, the log will show:

Image cache MISS: i-ibb-co-ccL0XZPJ-TWIR-Reddit-logo.jpg - downloading
Image download FAILED for URL: https://i.ibb.co/ccL0XZPJ/TWIR-Reddit-logo.jpg

Download the image manually and copy it to the provider image cache directory using the filename from the Image cache MISS: line.

Preferred manual-copy locations are:

.cache/_SHARED/images/ (recommended for reuse by all providers)
.cache/TWIR/images/
.cache/10P/images/
.cache/ZTTP/images/
.cache/RA/images/

The two static images hosted on i.ibb.co (which may be blocked in some environments) are:

| Cache filename | Source URL |
| --- | --- |
| `i-ibb-co-ccL0XZPJ-TWIR-Reddit-logo.jpg` | `https://i.ibb.co/ccL0XZPJ/TWIR-Reddit-logo.jpg` |
| `i-ibb-co-NWmMHcH-Listen-Now.jpg` | `https://i.ibb.co/NWmMHcH/Listen-Now.jpg` |

ZTTP Cover Images

For each magazine issue, the ZTTP guide inserts a full-page magazine cover before the first episode page that covers that issue.

Zzap!64 covers (issues 1–91, May 1985 – December 1992)

Cover images are scraped from zzap64.co.uk and cached in .cache/ZTTP/zzap_cache.pkl. The scraper runs automatically on the first run and on any run where the cache is absent or invalid. Cover pages are keyed by Month YYYY extracted from the episode title.
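
Extracting a Month YYYY key from an episode title might look like the sketch below. The regular expression, the function name `cover_key_from_title`, and the sample titles in the usage note are all assumptions; the actual ZTTP title format is not documented here.

```python
import re

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)
# Matches a full month name followed by a four-digit year.
MONTH_YEAR_RE = re.compile(r"\b(" + "|".join(MONTHS) + r")\s+(\d{4})\b", re.IGNORECASE)


def cover_key_from_title(title):
    """Return a 'Month YYYY' key found in the title, or None if there is none."""
    match = MONTH_YEAR_RE.search(title)
    if match is None:
        return None
    return f"{match.group(1).capitalize()} {match.group(2)}"
```

With a made-up title such as "Issue 1 - May 1985" this returns "May 1985"; a title without a month and year returns None, in which case no cover page would be looked up.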

Ten Pence AI Extraction

Ten Pence uses AI only for next-month-game extraction, and only when there is no override/cached value.

Resolution order per episode:

Existing next-month-game cache value
Hardcoded override in podcasts/tenp/page_constants.py
Gemini extraction (if TEN_P_GEMINI_API_KEY is configured and valid)
Fallback value No Game

Failure handling:

Missing key: AI is skipped, run continues.
Invalid key/API errors: AI is disabled for the rest of the run and processing continues.
Any extraction exception: warning is logged and fallback is used.

The run never fails solely because Gemini extraction fails.
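
The resolution order and failure handling above can be summarised in a short sketch. Everything named here is hypothetical: the class `NextMonthGameResolver`, the `extractor` callable standing in for the Gemini call, and the use of `PermissionError` to model an invalid key are illustrative choices, not the project's API.

```python
import logging

logger = logging.getLogger("tenp_sketch")
NO_GAME = "No Game"


class NextMonthGameResolver:
    """Resolve a next-month game: cache, then override, then AI, then fallback."""

    def __init__(self, cache, overrides, extractor=None):
        self.cache = cache          # episode id -> cached game name
        self.overrides = overrides  # episode id -> hardcoded game name
        self.extractor = extractor  # None models a missing API key

    def resolve(self, episode_id, description):
        if episode_id in self.cache:
            return self.cache[episode_id]
        if episode_id in self.overrides:
            return self.overrides[episode_id]
        if self.extractor is not None:
            try:
                game = self.extractor(description)
                if game:
                    return game
            except PermissionError:
                # Stand-in for an invalid key: disable AI for the rest of the run.
                logger.warning("AI extraction disabled for this run")
                self.extractor = None
            except Exception as exc:
                logger.warning("AI extraction failed for %s: %s", episode_id, exc)
        return NO_GAME
```

No branch lets an extraction error escape, which mirrors the guarantee that a Gemini failure alone never fails the run.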

Developer Notes

Test mode

Use environment variables to run a short test selection (default count is 5):

GUIDE_TEST_RUN=true GUIDE_TEST_COUNT=5 python run_guides.py --podcasts [twir | zttp | ra | 10p | all]
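
A possible way to read these two variables is sketched below. The variable names and the default of 5 come from this README; the function name `parse_test_run` and the accepted truthy spellings are assumptions.

```python
DEFAULT_TEST_COUNT = 5  # default count stated in this README


def parse_test_run(env):
    """Return the number of episodes to build in test mode, or None if disabled."""
    flag = env.get("GUIDE_TEST_RUN", "").strip().lower()
    if flag not in ("1", "true", "yes"):  # accepted spellings are an assumption
        return None
    try:
        count = int(env.get("GUIDE_TEST_COUNT", ""))
    except ValueError:
        return DEFAULT_TEST_COUNT  # missing or non-numeric count
    return count if count > 0 else DEFAULT_TEST_COUNT
```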

QoW cache

Reddit QoW data is cached in .cache/TWIR/qow_cache.pkl. Deleting it will cause the next run to re-scrape all posts from Reddit. The cache contains no credentials.

Episode retry logic

If an exception occurs while building an episode page, the TWIR builder retries based on RETRY_NUMBER (currently 1, so one retry after the first failure) before aborting. This helps with transient network/image errors.
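
The retry rule can be illustrated with a small sketch. `RETRY_NUMBER` and its value of 1 come from this README; the function `build_with_retry` and its `build_page` argument are hypothetical and do not mirror the TWIR builder's actual signature.

```python
import logging

logger = logging.getLogger("retry_sketch")
RETRY_NUMBER = 1  # value quoted in this README


def build_with_retry(build_page, retries=RETRY_NUMBER):
    """Call build_page(); on failure retry up to `retries` more times, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return build_page()
        except Exception as exc:
            if attempt == retries:
                raise  # retries exhausted: abort
            logger.warning("Episode page failed (%s); retrying", exc)
```

With `RETRY_NUMBER = 1` a page is attempted at most twice.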

Shared main/runtime flow

The TWIR, ZTTP, RA, and 10P provider entrypoints use shared helpers from podcasts/common/:

runtime.py for logging bootstrap and test-run env parsing
guide_main_base.py for common create/write/save orchestration

Retro Asylum (RA) specifics

Data source: RA episodes are scraped from retroasylum.com (no API keys required). An active internet connection is needed on the first run; subsequent runs use the local cache.
Cover asset: The RA cover image is stored locally at podcasts/ra/assets/RACover.png.
Episode filtering: Episodes whose cover image URL ends with RA_error.png are suppressed in both the TOC and episode pages. The filter list is configurable via TEXT_TO_REMOVE in podcasts/ra/page_constants.py.
Cache behavior: The generator reads and writes only provider-local cache paths under .cache/<PROVIDER>/.
Network resilience: If retroasylum.com is unreachable during page discovery, the generator falls back to the existing episode cache rather than aborting.
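
The episode filter described above amounts to a suffix check, sketched below. `TEXT_TO_REMOVE` is the constant named in this README, but its contents, the `cover_url` dictionary key, and the function `visible_episodes` are assumptions for illustration.

```python
# Assumed contents; the real list lives in podcasts/ra/page_constants.py.
TEXT_TO_REMOVE = ("RA_error.png",)


def visible_episodes(episodes):
    """Keep episodes whose cover URL does not end with a filtered suffix."""
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [ep for ep in episodes if not ep["cover_url"].endswith(TEXT_TO_REMOVE)]
```

Applying the same function to both the TOC and the page list keeps the two in step.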

Unit Tests

The project uses Python's built-in unittest framework with tests in the tests/ directory.

Run all tests:

macOS / Linux

python -m unittest discover -s tests -v

If using the local venv directly:

./.venv/bin/python -m unittest discover -s tests -v
./.venv310/bin/python -m unittest discover -s tests -v

Run a single test module:

python -m unittest tests.test_twir_utils -v
./.venv310/bin/python -m unittest tests.test_twir_utils -v

Windows (PowerShell)

python -m unittest discover -s tests -v
.\.venv\Scripts\python.exe -m unittest discover -s tests -v
.\.venv310\Scripts\python.exe -m unittest discover -s tests -v

Run a single test case or method:

python -m unittest tests.test_twir_utils.TestExtractEpisodeNumber -v
python -m unittest tests.test_twir_utils.TestExtractEpisodeNumber.test_twir_ep_format -v
./.venv310/bin/python -m unittest tests.test_twir_utils.TestExtractEpisodeNumber -v
./.venv310/bin/python -m unittest tests.test_twir_utils.TestExtractEpisodeNumber.test_twir_ep_format -v

Current test coverage includes:

podcasts.twir.twir_utils parsing helpers
podcasts.twir.pdf_writer cache key/sanitization and text splitting helpers
podcasts.twir.qow model and cache loading behavior
podcasts.ra.episode episode number extraction (standard and Bytesize formats)
podcasts.ra.pdf_writer episode filtering logic
constants registry and ZTTP caching flows

Dependencies

| Package | Purpose |
| --- | --- |
| `python-dotenv` | Load `.env` file into the environment |
| `python-youtube` | YouTube Data API v3 client (imported as `pyyoutube`) |
| `feedparser` | Parse the Podbean RSS feed |
| `requests` | HTTP image downloads |
| `Pillow` | Image processing and resizing |
| `reportlab` | PDF generation |
| `numpy` | Image array conversion for JPEG→PNG normalisation |
| `praw` | Reddit API client for QoW scraping |
| `beautifulsoup4` | HTML parsing for ZTTP page and content extraction |
| `lxml` | Parser backend used by BeautifulSoup in ZTTP flows |
