(Draft) New Data Model #480
cont.active_ctx is very special with regards to the implementation of a dependency. This is ugly and tests will be added to test_pyld_utils.
The conversion tests (`_to_python` and `_to_expanded_json`) are pretty long now...
This reverts commit 4fe4ffb.
This reverts commit df5b0ef.
This reverts commit d825217.
This reverts commit 8b4f9a3.
Pull in relevant changes from `develop`
Disable tests that fail due to data model
Resolve #383: Test `ld_context`
…or/381-test-ld_container # Conflicts: # poetry.lock
Fix #454: Add end-to-end tests for the plugin-related `SoftwareMetadata` API
…c-api Refactor/423 implement public api
zyzzyxdonta left a comment
Some first very high-level comments. Will continue the review another day.
Should a top-level conftest exist? I think this belongs in `test/` or `test/hermes_test/`.
| "pynacl>=1.5.0, <2.0.0", | ||
| "rdflib (>=7.1.4,<8.0.0)", | ||
| "schemaorg (>=0.1.1,<0.2.0)", | ||
| "tomlkit (>=0.14.0,<0.15.0)", |
This is our second TOML dependency (we already had `toml`), and there is also `tomli` as a transitive dependency. It might make sense to consolidate this into one implementation at some point, if possible.
| codemeta = "hermes.commands.process.standard_merge:CodemetaProcessPlugin" | ||
|
|
||
| [project.entry-points."hermes.curate"] | ||
| pass_curate = "hermes.commands.curate.pass_curate:DoNothingCuratePlugin" |
"do nothing" and "pass" are two different names for the same thing. We should make up our minds and choose one of the two.
(Without having read any of the docs,) I find the jargon strange in this module. The module is called `context_manager`. It contains `HermesCache`, which is a context manager, and `HermesContext`, which is not a context manager but returns context managers. Maybe this could be cleared up by renaming the module, ideally to something without "context" in it, as this term is already used for the context of the JSON-LD object.
| from .error import HermesContextError | ||
|
|
||
|
|
||
| class HermesCache: |
It seems this class is only ever instantiated in HermesContext. So it might be a good idea to add a docstring to the module that explains how to get such a cache object (i.e. via HermesContext).
|
|
||
| class HermesValidationError(Exception): | ||
| """ | ||
| This exception should be thrown when input validation (e.g., during harvest) occurs. |
| This exception should be thrown when input validation (e.g., during harvest) occurs. | |
| This exception should be raised when an error occurs during input validation (e.g., during harvest). |
|
|
||
| class HermesContextError(Exception): | ||
| """ | ||
| This exception should be thrown when interacting with the model context. |
| This exception should be thrown when interacting with the model context. | |
| This exception should be raised when interacting with the model context. |
| ) -> None: | ||
| """ | ||
| Create a new ld_merge_list. | ||
| For further information on this function and the errors it throws see :meth:`ld_list.__init__`. |
| For further information on this function and the errors it throws see :meth:`ld_list.__init__`. | |
| For further information on this function and the exceptions it raises see :meth:`ld_list.__init__`. |
| """ | ||
|
|
||
| @classmethod | ||
| def vocabulary(cls, base_url: str = "http://spam.eggs/") -> dict: |
Only reserved example domains such as example.org and example.com are safe test domains.
| dict[str, Union["JSON_LD_VALUE", BASIC_TYPE, TIME_TYPE, "ld_dict", "ld_list"]], | ||
| ] | ||
| """ Type description of valid JSON_LD objects that are partially represented by ld_containers """ | ||
| PYTHONIZED_LD_CONTAINER: TypeAlias = Union[ |
What does pythonized mean? 😅 The Python representation of the thing?
| for mapping in active_ctx["mappings"].values(): | ||
| if "@container" in mapping and long_iri: | ||
| value = {x: "none" for x in mapping["@container"]} |
I don't understand what this is for.
The `and long_iri` I guess saves work if no IRI is given. But this could be expressed more clearly by doing `if long_iri is None: return long_iri` at the beginning of the function, just like `_compact_iri()` does it.
Is `value` overwritten in each iteration because `@container` can only exist once? In that case, this would be clearer by breaking out of the loop.
But what does `"none"` do?
I think this whole module goes deep into pyld internals and a couple of comments would be helpful to understand what is going on.
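The two structural suggestions above (an early return and leaving the loop at the first hit) could look roughly like the sketch below. It is only an illustration: `container_defaults` is a hypothetical stand-in, not the function under review, and it assumes that at most one mapping with `@container` is relevant. It does not answer what `"none"` means to pyld.

```python
from typing import Optional


def container_defaults(active_ctx: dict, long_iri: Optional[str]) -> Optional[dict]:
    """Hypothetical stand-in for the reviewed helper (names are assumptions)."""
    # Guard clause: nothing to do without an IRI, so return right away.
    if long_iri is None:
        return None

    for mapping in active_ctx["mappings"].values():
        if "@container" in mapping:
            # Assumption: only one mapping matters, so stop at the first hit
            # instead of overwriting the value on every iteration.
            return {x: "none" for x in mapping["@container"]}

    # No mapping declares a container.
    return None


# Hand-written toy context, not real pyld output.
ctx = {
    "mappings": {
        "name": {"@id": "http://schema.org/name"},
        "author": {"@id": "http://schema.org/author", "@container": ["@list"]},
    }
}
result = container_defaults(ctx, "http://schema.org/author")
```

If several mappings can legitimately carry `@container`, returning at the first hit changes behavior compared to the overwriting loop, so this shape only applies if the assumption holds.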
|
|
||
| You should not need to interact with this data directly. | ||
| Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way. | ||
| Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in |
valid JSON-LD, serialized as JSON
Obviously 😁
| Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in | |
| Outputs of the different `hermes` subcommands are consequently valid JSON-LD files that are cached in |
| You should not need to interact with this data directly. | ||
| Instead, use {class}`hermes.model.context.HermesContext` and respective subclasses to access the data in a consistent way. | ||
| Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in | ||
| subdirectories of the `.hermes/` directory that is created in the root of the project directory. |
Maybe add a little directory tree like this (but updated with current file names):
.hermes
├── harvest
│   ├── cff_contexts.json
│   ├── cff.json
│   ├── file_exists_contexts.json
│   └── file_exists.json
└── process
    ├── hermes.json
    └── tags.json
(`tree .hermes` on Linux or `tree /f .hermes` in PowerShell on Windows)
| Output of the different `hermes` commands consequently is valid JSON-LD, serialized as JSON, that is cached in | ||
| subdirectories of the `.hermes/` directory that is created in the root of the project directory. | ||
|
|
||
| The cache is purely for internal purposes, its data should not be interacted with. |
Maybe the other way around:
| The cache is purely for internal purposes, its data should not be interacted with. | |
| The cache should only be interacted with via the `hermes` libraries. |
|
|
||
| The following sections show how this class works. | ||
|
|
||
| ##### Creating a data model instance |
h5 is too many levels down. It's absolutely fine to have multiple paragraphs with multiple sentences each in a subdivision.
With the given poetry.lock, the docs don't compile. A lot of our packages are old and incompatible with the ones that aren't pinned. I got it to work like this:
diff --git i/pyproject.toml w/pyproject.toml
index 17ea508..932427b 100644
--- i/pyproject.toml
+++ w/pyproject.toml
@@ -92,13 +92,13 @@ pytest-httpserver = "^1.1.5"
optional = true
[tool.poetry.group.docs.dependencies]
-Sphinx = "^6.2.1"
+Sphinx = "^8.0.0"
# Sphinx - Additional modules
-myst-parser = "^2.0.0"
+myst-parser = "^4.0.0"
sphinx-book-theme = "^1.0.1"
sphinx-favicon = "^0.2"
sphinxcontrib-contentui = "^0.2.5"
-sphinxcontrib-images = "^0.9.4"
+sphinxcontrib-images = "^1.0.1"
sphinx-icon = "^0.1.2"
sphinx-autobuild = "^2021.3.14"
sphinx-autoapi = "^3.0.0"
| :caption: Injecting additional schemas | ||
| from hermes.model import SoftwareMetadata | ||
|
|
||
| # Contents served at https://bar.net/schema.jsonld: |
Please only use reserved example domains! https://en.wikipedia.org/wiki/Example.com
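A check like the following could be used in a test to catch non-reserved domains in examples and fixtures. It is a sketch, not part of the PR: the helper name is made up, and the domain lists follow the reserved names from RFC 2606 and RFC 6761 as far as I am aware.

```python
from urllib.parse import urlsplit

# Second-level example domains and special-use TLDs reserved for
# documentation and testing (RFC 2606 / RFC 6761).
RESERVED_DOMAINS = {"example.com", "example.net", "example.org"}
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def uses_reserved_domain(url: str) -> bool:
    """Return True if the URL's host is a reserved example or test domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    if not host:
        return False
    labels = host.split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    # Compare the registrable part, so sub.example.org is accepted as well.
    return ".".join(labels[-2:]) in RESERVED_DOMAINS


flagged = [
    url
    for url in ("http://spam.eggs/", "https://bar.net/schema.jsonld")
    if not uses_reserved_domain(url)
]
```

Both URLs quoted in this review end up in `flagged`, while something like `https://example.org/schema.jsonld` would pass.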
| it will always be returned in a **list**-like object! | ||
| ``` | ||
|
|
||
| The reason for providing data in list-like objects is that JSON-LD treats all property values as arrays. |
Link to the related section in the JSON-LD spec would be nice
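For readers of the docs, a small illustration next to that link might help. The sketch below uses a hand-written compact document and its expanded form (written out by hand, not produced by `hermes` or pyld) to show that every property value is wrapped in an array after expansion. `property_values` is a hypothetical helper used only for this illustration.

```python
# Compact form: the property holds a single plain value.
compact = {
    "@context": {"name": "http://schema.org/name"},
    "name": "HERMES",
}

# Expanded form of the same document (hand-written): the property is keyed
# by its full IRI and its value is an array of value objects.
expanded = [
    {"http://schema.org/name": [{"@value": "HERMES"}]},
]


def property_values(node: dict, iri: str) -> list:
    """Return the plain values of one property of an expanded node."""
    # A missing property yields an empty list, so callers never have to
    # distinguish "no value", "one value" and "many values".
    return [item.get("@value", item) for item in node.get(iri, [])]


names = property_values(expanded[0], "http://schema.org/name")
missing = property_values(expanded[0], "http://schema.org/version")
```

This uniform list shape is presumably why the data model hands back list-like objects even for single values.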
| As mentioned in the [introduction to the data model](#data-model), | ||
| `hermes` uses a JSON-LD-like internal data model. | ||
| The API class {class}`hermes.model.SoftwareMetadata` hides many | ||
| of the more complex aspects of JSON-LD and makes it easy to work | ||
| with the data model. |
This feels like a misplaced summary. I would just remove it.
| Python data: | ||
|
|
||
| ```{code-block} python | ||
| :caption: Naive containment assertion that raises |
It doesn't though. The mock output is from the path where the assertion holds.
|
|
||
| ## See Also | ||
|
|
||
| - API reference: {class}`hermes.model.SoftwareMetadata` |
This link never works throughout the whole docs. For some reason, the class is only exposed in the docs as hermes.model.api.SoftwareMetadata
| The full code and structure is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git). | ||
| This tutorial will present the basic steps for writing additional plugins. | ||
|
|
||
| The full code and structure of a harvest plugin is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git). |
Emphasize that this is an example of a "standalone" (as in out-of-hermes-source) harvest plugin.
|
|
||
| The full code and structure of a harvest plugin is available at [hermes-plugin-git](https://github.com/softwarepub/hermes-plugin-git). | ||
| This plugin extracts information from the local git history. | ||
| The hermes-plugin-git will help to gather contributing and branch metadata. |
| The hermes-plugin-git will help to gather contributing and branch metadata. | |
| The hermes-plugin-git will help to gather contribution and branch metadata. |
| The hermes-plugin-git will help to gather contributing and branch metadata. | ||
|
|
||
| ```{note} | ||
| For this tutorial you should be familiar with HERMES. |
| For this tutorial you should be familiar with HERMES. | |
| To follow this tutorial you should be familiar with HERMES. |
| If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](https://docs.software-metadata.pub/en/latest/tutorials/automated-publication-with-ci.html). | ||
| If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](./automated-publication-with-ci). | ||
|
|
||
| Also all metadata directly handled by HERMES is [JSON-LD](https://json-ld.org/) so you should be familiar with that when writing a plugin. |
Link to the "JSON-LD for plugin developers" guide instead.
| If you never used HERMES before, you might want to check the tutorial: [Automated Publication with HERMES](./automated-publication-with-ci). | ||
|
|
||
| Also all metadata directly handled by HERMES is [JSON-LD](https://json-ld.org/) so you should be familiar with that when writing a plugin. | ||
| And uses the [schmea.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context. |
| And uses the [schmea.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context. | |
| And uses the [schema.org](https://schema.org/) (with prefix "schema") and the [CodeMeta](https://codemeta.github.io/) (without prefix) context. |
| } | ||
| ``` | ||
|
|
||
| HERMES would use the {py:class}`~hermes.model.merge.action.Reject` strategy for merging values of the key `full_property_iri` in objects of type `full_type_iri`. (A key in strategies being `None` instead of a string indicates to HERMES that its value is to be used as a default [i.e. if no more specific entry exists].) |
This feels hacky. Have you considered collections.defaultdict?
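A rough sketch of what the `collections.defaultdict` variant could look like, assuming the strategies are a two-level mapping from type IRI to property IRI to merge action. That shape is inferred from the paragraph under review. The action names below are plain string stand-ins, not the real classes from `hermes.model.merge.action`.

```python
from collections import defaultdict

# Stand-ins for merge actions; the real ones would be action objects.
REJECT = "Reject"
APPEND = "Append"  # hypothetical default action


def make_strategies(default_action):
    """Build a type IRI -> property IRI -> action mapping with a fallback."""
    # Unknown types get a fresh inner mapping, and unknown properties get
    # the default action, so no `None` key is needed to mark the default.
    return defaultdict(lambda: defaultdict(lambda: default_action))


strategies = make_strategies(default_action=APPEND)
strategies["https://example.org/Type"]["https://example.org/prop"] = REJECT

specific = strategies["https://example.org/Type"]["https://example.org/prop"]
fallback = strategies["https://example.org/Type"]["https://example.org/other"]
```

One caveat: looking up a missing key in a `defaultdict` inserts it as a side effect, which matters if the mapping is later serialized or compared.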
This whole file feels more like a reference than a tutorial. A tutorial should be "learning by doing". An example could be having the reader re-build the git-plugin from scratch.
See also: https://diataxis.fr/tutorials/
|
|
||
| ## Configure HERMES to use your plugin | ||
|
|
||
| To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`. |
I would prefer calling the thing in the pyproject.toml "entry point". That's also what it's called in the PEP: https://peps.python.org/pep-0621/#entry-points
| To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`. | |
| To integrate your plugin, you have to register it as an entry point in the `pyproject.toml`. |
As https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ also uses the word plugin, I think it would be best to also mention it here, but then use the term "entry point" for consistency and to differentiate the HERMES plugin and the line in the pyproject.toml.
| ## Configure HERMES to use your plugin | ||
|
|
||
| To integrate your plugin, you have to register it as a plugin in the `pyproject.toml`. | ||
| To learn more about the `pyproject.toml` check https://python-poetry.org/docs/pyproject/ or refer to [PEP621](https://peps.python.org/pep-0621/). |
I would remove the Poetry mention. Instead we can link to https://packaging.python.org/en/latest/guides/writing-pyproject-toml/ which gives an agnostic overview.
| @@ -113,59 +313,76 @@ This variant is used to contribute to the HERMES community or adapt the HERMES w | |||
| If you want to contribute, see the [Contribution Guidelines](https://docs.software-metadata.pub/en/latest/dev/contribute.html). | |||
| After cloning the HERMES workflow repository you can adapt the pyproject.toml. | |||
| In the code below you see the parts with the important lines. | |||
| ```{code-block} toml | |||
| ```{code-block} | |||
| ... | |||
| [tool.poetry.dependencies] | |||
| ... | |||
| pydantic-settings = "^2.1.0" | |||
| hermes-plugin-git = { git = "https://github.com/softwarepub/hermes-plugin-git.git", branch = "main" } | |||
| {plugin_package} = { {plugin_name} = "{link_to_your_repo}", branch = "main" } | |||
| ... | |||
| ... | |||
| [tool.poetry.plugins."hermes.harvest"] | |||
| cff = "hermes.commands.harvest.cff:CffHarvestPlugin" | |||
| codemeta = "hermes.commands.harvest.codemeta:CodeMetaHarvestPlugin" | |||
| git = "hermes_plugin_git.harvest:GitHarvestPlugin" | |||
| [tool.poetry.plugins."hermes.{plugin_step}"] | |||
| {plugin_name} = "{plugin_package}.{plugin_module}:{plugin_class}" | |||
| ... | |||
| ``` | |||
| In the dependencies you have to install your plugin. If your Plugin is pip installable than you can just give the name and the version. | |||
| If your plugin is in a buildable git repository, you can install it with the given expression. | |||
| Note that this differs with the accessibility and your wishes, check [Explicit Package Sources](https://python-poetry.org/docs/repositories/#explicit-package-sources). | |||
|
|
|||
| The second thing to adapt is to declare the access point for the plugin. | |||
| You can do that with `git = "hermes_plugin_git.harvest:GitHarvestPlugin"`. | |||
| This expression makes the `GitHarvestPlugin` from the `hermes_plugin_git` package, a `hermes.harvest` plugin named `git`. | |||
| You can do that with `{plugin_name} = "{plugin_package}.{plugin_module}:{plugin_class}"`. | |||
| This expression makes the `plugin_class` from the `plugin_package` package, a `hermes.{plugin_step}` plugin named `plugin_name`. | |||
| So you need to configure this line with your plugin properties. | |||
|
|
|||
| Now you just need to add the plugin to the `hermes.toml` and reinstall the adapted poetry package. | |||
Same here. Poetry mentions should be removed. Instead we can use the generic version [project.entry-points."hermes..."] now. All of the Poetry stuff was written here because pyproject.toml support was poor in other tools years ago, when this was written.
This pull request is not ready yet. It is only for showing the process.