fix(medcat-trainer): Fix Spaces in model pack names breaking CDB import #533

Merged
alhendrickson merged 4 commits into main from fix/medcat-trainer/fix-spaces-in-model-pack-name on Jun 11, 2026

Conversation

@alhendrickson

Collaborator

Solr collection names can't contain spaces.

Right now any model pack I save with a space in its name can never get imported, because the Solr collection name is built from the model pack name.

Seems like it's always been like this, but anyway, hopefully this explains some of the inconsistencies with the import I'd been seeing.

Current behavior

  1. Save a new model pack with name "my model pack" in the new admin UI
  2. Create a project and save
  3. Everything looks OK, but the CDB import isn't done

The background process logs show this error:

2026-06-11 12:10:07.454 | [bg-process] INFO 2026-06-11 11:10:07,453 fixes.py l:41:CDB addl_info contains legacy data: 'cui2original_names' . Moving it to cui2info
2026-06-11 12:10:07.454 | [bg-process] INFO 2026-06-11 11:10:07,454 fixes.py l:21:Used 5 out of 5 CUIs in the 'cui2original_names' map
2026-06-11 12:10:07.454 | [bg-process] INFO 2026-06-11 11:10:07,454 utils.py l:310:Clearing addons for CDB upon load: 5
2026-06-11 12:10:07.463 | [bg-process] ERROR 2026-06-11 11:10:07,462 solr_utils.py l:233:Failure creating collection: solr error: {'metadata': ['error-class', 'org.apache.solr.common.SolrException', 'root-error-class', 'org.apache.solr.common.SolrException'], 'msg': 'Invalid collection: [test pack with space_CDB_id_5]. collection names must consist entirely of periods, underscores, hyphens, and alphanumerics as well not start with a hyphen', 'code': 400}
2026-06-11 12:10:07.466 | [bg-process] ERROR 2026-06-11 11:10:07,462 tasks.py l:57:Rescheduling api.admin.actions.import_concepts_from_cdb
2026-06-11 12:10:07.466 | [bg-process] Traceback (most recent call last):
2026-06-11 12:10:07.466 | [bg-process]   File "/home/.venv/lib/python3.12/site-packages/background_task/tasks.py", line 43, in bg_runner
2026-06-11 12:10:07.466 | [bg-process]     func(*args, **kwargs)
2026-06-11 12:10:07.466 | [bg-process]   File "/home/api/api/admin/actions.py", line 374, in import_concepts_from_cdb
2026-06-11 12:10:07.466 | [bg-process]     import_all_concepts(cdb, cdb_model)
2026-06-11 12:10:07.466 | [bg-process]   File "/usr/local/lib/python3.12/contextlib.py", line 81, in inner
2026-06-11 12:10:07.466 | [bg-process]     return func(*args, **kwds)
2026-06-11 12:10:07.466 | [bg-process]            ^^^^^^^^^^^^^^^^^^^
2026-06-11 12:10:07.466 | [bg-process]   File "/home/api/api/solr_utils.py", line 149, in import_all_concepts
2026-06-11 12:10:07.466 | [bg-process]     "collection_name": collection_name, "cdb_id": str(cdb_model.id), "cdb_name": str(cdb_model.name)
2026-06-11 12:10:07.466 | [bg-process]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026-06-11 12:10:07.466 | [bg-process]   File "/home/api/api/solr_utils.py", line 234, in _solr_error_response
2026-06-11 12:10:07.466 | [bg-process]     concept_dct = {
2026-06-11 12:10:07.466 | [bg-process] Exception: Failure creating collection: solr error: {'metadata': ['error-class', 'org.apache.solr.common.SolrException', 'root-error-class', 'org.apache.solr.common.SolrException'], 'msg': 'Invalid collection: [test pack with space_CDB_id_5]. collection names must consist entirely of periods, underscores, hyphens, and alphanumerics as well not start with a hyphen', 'code': 400}
2026-06-11 12:10:07.482 | [bg-process] WARNING 2026-06-11 11:10:07,482 models.py l:254:Marking task api.admin.actions.import_concepts_from_cdb as failed

New behavior

Just replace any characters that aren't allowed in a Solr collection name with underscores.

I think this is backwards compatible: any name from a previous version that only used valid characters shouldn't get touched, and any name that wasn't valid would never have been able to create a Solr collection in the first place.
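For illustration, here is a minimal sketch of that replacement, not the actual code from this PR. The function names are hypothetical, the `<name>_CDB_id_<id>` format is inferred from the collection name in the error log above, and the allowed character set is taken from the Solr error message. Handling a leading hyphen is an extra assumption based on that same message, not something the PR description mentions.

```python
import re

# Characters Solr rejects, per the error message above: anything other than
# periods, underscores, hyphens, and (ASCII) alphanumerics.
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9._-]")


def sanitize_collection_name(name: str) -> str:
    """Hypothetical helper: replace disallowed characters with underscores."""
    cleaned = _INVALID_CHARS.sub("_", name)
    # Assumption: also guard against a leading hyphen, which Solr rejects.
    if cleaned.startswith("-"):
        cleaned = "_" + cleaned[1:]
    return cleaned


def collection_name_for(cdb_name: str, cdb_id: int) -> str:
    """Hypothetical helper: name format inferred from the log, not from the code."""
    return sanitize_collection_name(f"{cdb_name}_CDB_id_{cdb_id}")


# The failing name from the log becomes a valid collection name.
print(collection_name_for("test pack with space", 5))
# A name that was already valid is left unchanged.
print(sanitize_collection_name("valid-name_1.0"))
```

Names that were already valid pass through unchanged, which is what the backwards-compatibility argument relies on.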

@tomolopolis left a comment

Member

lgtm - makes sense. This used to make more sense when model upload was all done by humans who could rename the model...

@alhendrickson merged commit 225b65e into main on Jun 11, 2026
10 checks passed
@alhendrickson deleted the fix/medcat-trainer/fix-spaces-in-model-pack-name branch on June 11, 2026 13:51

2 participants