(Draft) New Data Model #480

Draft: SKernchen wants to merge 281 commits into develop from refactor/data-model

Conversation

@SKernchen

This pull request is not ready yet. It is only meant to show the process.

led02 and others added 30 commits August 1, 2025 21:11
cont.active_ctx is very special with regards to the implementation of a dependency.
This is ugly and tests will be added to test_pyld_utils.
The conversion tests (`_to_python` and `_to_expanded_json`) are pretty long now...
Pull in relevant changes from `develop`
Disable tests that fail due to data model
…or/381-test-ld_container

# Conflicts:
#	poetry.lock

@zyzzyxdonta zyzzyxdonta left a comment

Some first very high-level comments. Will continue the review another day.

Comment thread conftest.py

Should a top-level conftest exist? I think this belongs in test/ or test/hermes_test/.

Comment thread pyproject.toml
"pynacl>=1.5.0, <2.0.0",
"rdflib (>=7.1.4,<8.0.0)",
"schemaorg (>=0.1.1,<0.2.0)",
"tomlkit (>=0.14.0,<0.15.0)",

This is our second TOML dependency (we already had toml), and there is also tomli as a transitive dependency. It might make sense to consolidate this into one implementation at some point, if possible.

Comment thread pyproject.toml
codemeta = "hermes.commands.process.standard_merge:CodemetaProcessPlugin"

[project.entry-points."hermes.curate"]
pass_curate = "hermes.commands.curate.pass_curate:DoNothingCuratePlugin"

"do nothing" and "pass" are two different names for the same thing. We should make up our minds and choose one of the two.

@zyzzyxdonta zyzzyxdonta Jun 3, 2026

(Without having read any of the docs,) I find the jargon strange in this module. The module is called context_manager. It contains HermesCache which is a context manager, and HermesContext which is not a context manager but returns context managers. Maybe this could be cleared up by renaming the module. Maybe nothing with context in it as this term is already used for the context of the JSON-LD object.
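
To illustrate the distinction, here is a minimal sketch with hypothetical stand-in classes. `Cache` and `CacheProvider` are invented for this comment and are not the actual `HermesCache`/`HermesContext` implementations. One class is a context manager, the other merely hands them out:

```python
class Cache:
    """Hypothetical stand-in for a class that *is* a context manager."""

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __enter__(self):
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.is_open = False
        return False  # do not swallow exceptions raised inside the block


class CacheProvider:
    """Hypothetical stand-in for a class that only *returns* context managers."""

    def open(self, name):
        return Cache(name)


provider = CacheProvider()
with provider.open("harvest") as cache:
    opened_inside = cache.is_open  # True while inside the with-block
closed_after = not cache.is_open   # True once the block has exited
```

Only `Cache` implements `__enter__`/`__exit__`, so a module name built around "context manager" describes just one of the two classes.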

from .error import HermesContextError


class HermesCache:

It seems this class is only ever instantiated in HermesContext. So it might be a good idea to add a docstring to the module that explains how to get such a cache object (i.e. via HermesContext).

Comment thread src/hermes/model/error.py

class HermesValidationError(Exception):
"""
This exception should be thrown when input validation (e.g., during harvest) occurs.

Suggested change
This exception should be thrown when input validation (e.g., during harvest) occurs.
This exception should be raised when an error occurs during input validation (e.g., during harvest).

Comment thread src/hermes/model/error.py

class HermesContextError(Exception):
"""
This exception should be thrown when interacting with the model context.

Suggested change
This exception should be thrown when interacting with the model context.
This exception should be raised when interacting with the model context.

) -> None:
"""
Create a new ld_merge_list.
For further information on this function and the errors it throws see :meth:`ld_list.__init__`.

Suggested change
For further information on this function and the errors it throws see :meth:`ld_list.__init__`.
For further information on this function and the exceptions it raises see :meth:`ld_list.__init__`.

"""

@classmethod
def vocabulary(cls, base_url: str = "http://spam.eggs/") -> dict:

Only reserved example domains such as example.org and example.com are safe test domains.
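
A minimal sketch of what this could look like, assuming the method only needs some syntactically valid base URL. The class name and the method body are hypothetical and only there to make the example runnable; `example.org` is one of the domains reserved for documentation and testing by RFC 2606.

```python
class ExampleSchema:
    """Hypothetical class; only the default argument matters here."""

    @classmethod
    def vocabulary(cls, base_url: str = "http://example.org/") -> dict:
        # Hypothetical body: map two made-up terms to IRIs below base_url.
        return {term: f"{base_url}{term}" for term in ("spam", "eggs")}


vocab = ExampleSchema.vocabulary()
```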

dict[str, Union["JSON_LD_VALUE", BASIC_TYPE, TIME_TYPE, "ld_dict", "ld_list"]],
]
""" Type description of valid JSON_LD objects that are partially represented by ld_containers """
PYTHONIZED_LD_CONTAINER: TypeAlias = Union[

What does pythonized mean? 😅 The Python representation of the thing?

Comment on lines +107 to +109
for mapping in active_ctx["mappings"].values():
if "@container" in mapping and long_iri:
value = {x: "none" for x in mapping["@container"]}

I don't understand what this is for.

The `and long_iri` I guess saves work if no IRI is given. But this could be expressed more clearly with `if long_iri is None: return long_iri` at the beginning of the function, just like `_compact_iri()` does.

Is `value` overwritten in each iteration because `@container` can only exist once? In that case, breaking out of the loop would make this clearer.

But what does "none" do?

I think this whole module goes deep into pyld internals and a couple of comments would be helpful to understand what is going on.
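
Roughly what I have in mind, as a sketch only: `container_defaults` is a hypothetical helper name, and the structure of `active_ctx` is assumed from the three lines quoted above. Note that `is None` is stricter than the current truthiness check (an empty string would pass it), and that `break` keeps the first match whereas the current loop keeps the last one.

```python
def container_defaults(active_ctx, long_iri):
    """Hypothetical helper mirroring the loop under review."""
    # Guard clause: without an IRI there is nothing to compute.
    if long_iri is None:
        return None

    value = None
    for mapping in active_ctx["mappings"].values():
        if "@container" in mapping:
            value = {x: "none" for x in mapping["@container"]}
            break  # only valid if at most one mapping is expected to match
    return value


# Made-up context data, shaped like the snippet above suggests.
ctx = {
    "mappings": {
        "a": {"@id": "http://example.org/a"},
        "b": {"@id": "http://example.org/b", "@container": ["@list"]},
    }
}
result = container_defaults(ctx, "http://example.org/b")
```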


You should not need to interact with this data directly.
Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way.
Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in

valid JSON-LD, serialized as JSON

Obviously 😁

Suggested change
Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in
The outputs of the different `hermes` subcommands are consequently valid JSON-LD files that are cached in

You should not need to interact with this data directly.
Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way.
Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in
subdirectories of the `.hermes/` directory that is created in the root of the project directory.

Maybe add a little directory tree like this (but updated with current file names):

.hermes
├── harvest
│   ├── cff_contexts.json
│   ├── cff.json
│   ├── file_exists_contexts.json
│   └── file_exists.json
└── process
    ├── hermes.json
    └── tags.json

(`tree .hermes` on Linux or `tree /f .hermes` in PowerShell on Windows)

Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in
subdirectories of the `.hermes/` directory that is created in the root of the project directory.

The cache is purely for internal purposes, its data should not be interacted with.

Maybe the other way around:

Suggested change
The cache is purely for internal purposes, its data should not be interacted with.
The cache should only be interacted with via the `hermes` libraries.


The following sections show how this class works.

##### Creating a data model instance

h5 is too many levels down. It's absolutely fine to have multiple paragraphs with multiple sentences each in a subdivision.

Comment thread poetry.lock

@zyzzyxdonta zyzzyxdonta Jun 4, 2026

With the given poetry.lock, the docs don't compile. A lot of our packages are old and incompatible with the ones that aren't pinned. I got it to work like this:

diff --git i/pyproject.toml w/pyproject.toml
index 17ea508..932427b 100644
--- i/pyproject.toml
+++ w/pyproject.toml
@@ -92,13 +92,13 @@ pytest-httpserver = "^1.1.5"
 optional = true
 
 [tool.poetry.group.docs.dependencies]
-Sphinx = "^6.2.1"
+Sphinx = "^8.0.0"
 # Sphinx - Additional modules
-myst-parser = "^2.0.0"
+myst-parser = "^4.0.0"
 sphinx-book-theme = "^1.0.1"
 sphinx-favicon = "^0.2"
 sphinxcontrib-contentui = "^0.2.5"
-sphinxcontrib-images = "^0.9.4"
+sphinxcontrib-images = "^1.0.1"
 sphinx-icon = "^0.1.2"
 sphinx-autobuild = "^2021.3.14"
 sphinx-autoapi = "^3.0.0"

:caption: Injecting additional schemas
from hermes.model import SoftwareMetadata

# Contents served at https://bar.net/schema.jsonld:

Please only use reserved example domains! https://en.wikipedia.org/wiki/Example.com

it will always be returned in a **list**-like object!
```

The reason for providing data in list-like objects is that JSON-LD treats all property values as arrays.

A link to the related section in the JSON-LD spec would be nice.

Comment on lines +236 to +240
As mentioned in the [introduction to the data model](#data-model),
`hermes` uses a JSON-LD-like internal data model.
The API class {class}`hermes.model.SoftwareMetadata` hides many
of the more complex aspects of JSON-LD and makes it easy to work
with the data model.

This feels like a misplaced summary. I would just remove it.

Python data:

```{code-block} python
:caption: Naive containment assertion that raises

It doesn't though. The mock output is from the path where the assertion holds.


## See Also

- API reference: {class}`hermes.model.SoftwareMetadata`

This link never works throughout the whole docs. For some reason, the class is only exposed in the docs as hermes.model.api.SoftwareMetadata

The full code and structure is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git).
This tutorial will present the basic steps for writing additional plugins.

The full code and structure of a harvest plugin is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git).

Emphasize that this is an example of a "standalone" (as in out-of-hermes-source) harvest plugin.


The full code and structure of a harvest plugin is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git).
This plugin extracts information from the local git history.
The hermes-plugin-git will help to gather contributing and branch metadata.

Suggested change
The hermes-plugin-git will help to gather contributing and branch metadata.
The hermes-plugin-git will help to gather contribution and branch metadata.

The hermes-plugin-git will help to gather contributing and branch metadata.

```{note}
For this tutorial you should be familiar with HERMES.

Suggested change
For this tutorial you should be familiar with HERMES.
To follow this tutorial you should be familiar with HERMES.

If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](https://docs.software-metadata.pub/en/latest/tutorials/automated-publication-with-ci.html).
If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](./automated-publication-with-ci).

Also all metadata directly handled by HERMES is [JSON-LD](https://json-ld.org/) so you should be familiar with that when writing a plugin.

Link to the "JSON-LD for plugin developers" guide instead.

If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](./automated-publication-with-ci).

Also all metadata directly handled by HERMES is [JSON-LD](https://json-ld.org/) so you should be familiar with that when writing a plugin.
And uses the [schmea.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context.

Suggested change
And uses the [schmea.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context.
And uses the [schema.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context.

}
```

HERMES would use the {py:class}`~hermes.model.merge.action.Reject` strategy for merging values of the key `full_property_iri` in objects of type `full_type_iri`. (A key in strategies being `None` instead of a string indicates to HERMES that its value is to be used as a default [i.e. if no more specific entry exists].)

This feels hacky. Have you considered collections.defaultdict?
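
A rough sketch of the `defaultdict` idea, assuming the strategies table is a two-level mapping (type IRI, then property IRI) as the paragraph above describes. Both classes are stand-ins: `Reject` here is not the real `hermes.model.merge.action.Reject`, and `Collect` is a default strategy invented purely for this example.

```python
from collections import defaultdict


class Reject:
    """Stand-in for hermes.model.merge.action.Reject (not the real class)."""


class Collect:
    """Hypothetical default strategy, invented for this sketch."""


def make_strategies(default):
    # Outer level: type IRI -> per-property table.
    # Inner level: property IRI -> strategy, falling back to `default`.
    return defaultdict(lambda: defaultdict(lambda: default))


strategies = make_strategies(Collect)
strategies["full_type_iri"]["full_property_iri"] = Reject

specific = strategies["full_type_iri"]["full_property_iri"]    # explicit entry
fallback = strategies["full_type_iri"]["some_other_property"]  # default applies
unknown_type = strategies["unknown_type"]["anything"]          # default applies
```

One trade-off to keep in mind: a `defaultdict` inserts the missing key on lookup, so the table grows as it is queried. If that matters, a plain `dict` with an explicit `.get(key, default)` avoids the `None` key just as well.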

This whole file feels more like a reference than a tutorial. A tutorial should be "learning by doing". An example could be having the reader re-build the git-plugin from scratch.

See also: https://diataxis.fr/tutorials/


## Configure HERMES to use your plugin

To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`.

I would prefer calling the thing in the pyproject.toml "entry point". That's also what it's called in the PEP: https://peps.python.org/pep-0621/#entry-points

Suggested change
To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`.
To integrate your plugin, you have to register it as an entry point in the `pyproject.toml`.

As https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ also uses the word plugin, I think it would be best to also mention it here, but then use the term "entry point" for consistency and to differentiate the HERMES plugin and the line in the pyproject.toml.

## Configure HERMES to use your plugin

To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`.
To learn more about the `pyproject.toml` check https://python-poetry.org/docs/pyproject/ or refer to [PEP621](https://peps.python.org/pep-0621/).

I would remove the Poetry mention. Instead we can link to https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ which gives an agnostic overview.

Comment on lines 293 to 337
@@ -113,59 +313,76 @@ This variant is used to contribute to the HERMES community or adapt the HERMES w
If you want to contribute, see the [Contribution Guidelines](https://docs.software-metadata.pub/en/latest/dev/contribute.html).
After cloning the HERMES workflow repository you can adapt the pyproject.toml.
In the code below you see the parts with the important lines.
```{code-block} toml
```{code-block}
...
[tool.poetry.dependencies]
...
pydantic-settings = "^2.1.0"
hermes-plugin-git = { git = "https://github.com/softwarepub/hermes-plugin-git.git", branch = "main" }
{plugin_package} = { {plugin_name} = "{link_to_your_repo}", branch = "main" }
...
...
[tool.poetry.plugins."hermes.harvest"]
cff = "hermes.commands.harvest.cff:CffHarvestPlugin"
codemeta = "hermes.commands.harvest.codemeta:CodeMetaHarvestPlugin"
git = "hermes_plugin_git.harvest:GitHarvestPlugin"
[tool.poetry.plugins."hermes.{plugin_step}"]
{plugin_name} = "{plugin_package}.{plugin_module}:{plugin_class}"
...
```
In the dependencies you have to install your plugin. If your Plugin is pip installable than you can just give the name and the version.
If your plugin is in a buildable git repository, you can install it with the given expression.
Note that this differs with the accessibility and your wishes, check [Explicit Package Sources](https://python-poetry.org/docs/repositories/#explicit-package-sources).

The second thing to adapt is to declare the access point for the plugin.
You can do that with `git = "hermes_plugin_git.harvest:GitHarvestPlugin"`.
This expression makes the `GitHarvestPlugin` from the `hermes_plugin_git` package, a `hermes.harvest` plugin named `git`.
You can do that with `{plugin_name} = "{plugin_package}.{plugin_module}:{plugin_class}"`.
This expression makes the `plugin_class` from the `plugin_package` package, a `hermes.{plugin_step}` plugin named `plugin_name`.
So you need to configure this line with your plugin properties.

Now you just need to add the plugin to the `hermes.toml` and reinstall the adapted poetry package.

Same here. Poetry mentions should be removed. Instead we can use the generic version [project.entry-points."hermes..."] now. All of the Poetry stuff was written here because pyproject.toml support was poor in other tools years ago, when this was written.

5 participants