BiocPy · jkanche · Jun 22, 2026 · Jun 21, 2026 · Jun 21, 2026 · Jun 21, 2026
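
For orientation, the extension pattern that the documentation in this diff describes (a `functools.singledispatch` generic on the write side, paired with a registry keyed by class name on the read side) can be sketched with the standard library alone. This is a minimal illustration and not `rds2py` code: `to_repr`, `from_repr`, `READERS`, `Feature`, and `"FeatureR"` are hypothetical names, and the dictionary layout only loosely mirrors the `type`/`class_name`/`attributes`/`data` keys mentioned in the docs below.

```python
from functools import singledispatch

# Hypothetical stand-ins for illustration; none of these names come from rds2py.
READERS = {}  # maps a class name to a function that rebuilds the Python object


@singledispatch
def to_repr(x):
    """Fallback for types that have no registered serializer."""
    raise NotImplementedError(f"no serializer registered for {type(x).__name__}")


@to_repr.register(str)
def _(x):
    return {"type": "string", "data": [x]}


@to_repr.register(list)
def _(x):
    # Assumes a list of integers, to keep the sketch short.
    return {"type": "integer", "data": list(x)}


class Feature:
    def __init__(self, name, values):
        self.name = name
        self.values = values


@to_repr.register(Feature)
def _(x):
    # Nested fields are serialized by dispatching on their own types.
    return {
        "type": "S4",
        "class_name": "FeatureR",
        "attributes": {"name": to_repr(x.name), "values": to_repr(x.values)},
    }


def read_feature(robj):
    attrs = robj["attributes"]
    return Feature(attrs["name"]["data"][0], list(attrs["values"]["data"]))


# Read side: look up the reader by class name, fall back to the raw dictionary.
READERS["FeatureR"] = read_feature


def from_repr(robj):
    reader = READERS.get(robj.get("class_name"))
    return reader(robj) if reader is not None else robj


restored = from_repr(to_repr(Feature("expression_level", [10, 20, 30])))
```

Dispatching on type for writing and on class name for reading keeps the two directions independent, so a class can gain a writer before it has a reader; unknown classes simply come back as the raw dictionary.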
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,9 +1,11 @@
 # Changelog
 
-## Version 0.10.0
+## Version 0.10.0 - 0.10.1
 
 - Added methods to write to RDS/RData files.
 - Supports atomic types, generic dictionaries/lists, and **BiocPy objects**.
+- Read `symbols` registered in RDS objects.
+- Fixed an issue where S4 classes were not saved properly as RDS files.
 
 ## Version 0.9.0 - 0.9.1
 

diff --git a/README.md b/README.md
@@ -4,101 +4,141 @@
 
 # rds2py
 
-Parse and save Python objects as **RDS or RData** files. `rds2py` supports various base classes from R, and Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` S4 classes. **_For more details, check out [rds2cpp library](https://github.com/LTLA/rds2cpp)._**
+`rds2py` allows you to read and write R's native **RDS** and **RData** files directly in Python. Beyond standard R types, it provides integration with the [BiocPy](https://github.com/biocpy) ecosystem, allowing you to easily roundtrip complex S4 data structures like `SummarizedExperiment`, `SingleCellExperiment`, and `GenomicRanges`. **_For more details, check out [rds2cpp library](https://github.com/LTLA/rds2cpp)._**
 
 ## Installation
 
 Package is published to [PyPI](https://pypi.org/project/rds2py/)
 
 ```shell
 pip install rds2py
+```
+
+To enable automatic conversion to Bioconductor/BiocPy classes, make sure to install the optional dependencies:
 
-# or install optional dependencies
+```shell
 pip install rds2py[optional]
 ```
 
-By default, the package does not install packages to convert python representations to BiocPy classes. Please consider installing all optional dependencies.
 
-## Usage
+## Quickstart
 
-> [!NOTE]
->
-> If you do not have an RDS object handy, feel free to download one from [single-cell-test-files](https://github.com/jkanche/random-test-files/releases).
+### 1. Reading RDS and RData files
+
+Reading an RDS or RData file is as simple as a single function call. `rds2py` automatically detects and maps known R/Bioconductor classes to their Python equivalents:
 
 ```python
 from rds2py import read_rds, read_rda
-r_obj = read_rds("path/to/file.rds") # or read_rda("path/to/file.rda")
+
+# Read an RDS file (returns a Python/BiocPy object or dict)
+data = read_rds("path/to/file.rds")
+
+# Read objects from an RData workspace file (returns a dictionary of objects)
+workspace = read_rda("path/to/workspace.rda")
 ```
 
-The returned `r_obj` either returns an appropriate Python class if a parser is already implemented or returns the dictionary containing the data from the RDS file.
+If `rds2py` encounters an S4 class or complex R structure it doesn't have a parser registered for, it falls back to returning a dictionary so you don't lose any data.
 
-### Save RDS/RData files
+### 2. Saving to RDS and RData files
 
-You can also construct RDS or RData files from Python objects. `rds2py` supports writing atomic types, generic dictionaries/lists, and **BiocPy objects**.
+You can serialize Python objects back to RDS or RData formats. This includes NumPy arrays, SciPy sparse matrices, standard dictionaries/lists, and BiocPy objects:
 
 ```python
-from rds2py import write_rds, write_rda
 import numpy as np
-
-# Write atomic types
-write_rds(np.array([1, 2, 3], dtype=np.int32), "path/to/file.rds")
-
-# Write complex objects
+from rds2py import write_rds, write_rda
 from genomicranges import GenomicRanges
 from iranges import IRanges
 
-gr = GenomicRanges(
-    seqnames=["chr1", "chr2"],
-    ranges=IRanges(start=[1, 2], width=[10, 20]),
-    strand=["+", "-"]
-)
-write_rds(gr, "path/to/granges.rds")
+# 1. Write an atomic NumPy array
+write_rds(np.array([10, 20, 30], dtype=np.int32), "array.rds")
+
+# 2. Write a complex Bioconductor GenomicRanges object
+gr = GenomicRanges(seqnames=["chr1", "chr2"], ranges=IRanges(start=[1, 100], width=[10, 50]), strand=["+", "-"])
+write_rds(gr, "genomic_ranges.rds")
+
+# 3. Write multiple Python objects into a single RData workspace
+objects = {"my_array": np.array([1.1, 2.2, 3.3]), "my_granges": gr}
+write_rda(objects, "workspace.rda")
 ```
 
-### Write-your-own-reader
+### 3. Custom Extensions
 
-Reading RDS or RData files as dictionary representations allows users to write their own custom readers into appropriate Python representations.
+If you have custom S4 representations or class mapping needs, you can parse the raw RDS structure into Python dictionary representations using `parse_rds`/`parse_rda` and apply your custom deserializers:
 
 ```python
-from rds2py import parse_rds, parse_rda
+from rds2py import parse_rds
+from rds2py.read_granges import read_genomic_ranges
+
+# 1. Parse into a raw dictionary representation of the RDS tree
+raw_dict = parse_rds("path/to/file.rds")
+print(list(raw_dict.keys()))  # e.g. ['type', 'class_name', 'attributes', 'data', ...]
 
-robject = parse_rds("path/to/file.rds") # or use parse_rda for rdata files
-print(robject)
+# 2. Build or invoke custom parser logic
+if raw_dict.get("class_name") == "GRanges":
+    gr = read_genomic_ranges(raw_dict)
+    print(gr)
 ```
 
-If you know this RDS file contains an `GenomicRanges` object, you can use the built-in reader or write your own reader to convert this dictionary.
+To write custom objects, register your classes with `rds2py`'s serialization registry using the `save_rds` singledispatch generic:
 
 ```python
-from rds2py.read_granges import read_genomic_ranges
+from rds2py.generics import save_rds
+
 
-gr = read_genomic_ranges(robject)
-print(gr)
+class MyCustomClass:
+    def __init__(self, value):
+        self.value = value
+
+
+@save_rds.register(MyCustomClass)
+def _serialize_custom(x: MyCustomClass, path=None):
+    # Construct the raw RDS dictionary representation expected by rds2cpp
+    converted = {
+        "type": "integer",
+        "data": [x.value],
+        "attributes": {"class": {"type": "string", "data": ["MyCustomRClass"]}},
+    }
+
+    # Write to disk if a path is provided; the representation is returned either way
+    if path is not None:
+        from rds2py.lib_rds_parser import write_rds as write_rds_native
+
+        write_rds_native(converted, path)
+    return converted
 ```
 
+
 ## Type Conversion Reference
 
-| R Type     | Python/NumPy Type                    |
-| ---------- | ------------------------------------ |
-| numeric    | numpy.ndarray (float64)              |
-| integer    | numpy.ndarray (int32)                |
-| character  | list of str                          |
-| logical    | numpy.ndarray (bool)                 |
-| factor     | list                                 |
-| data.frame | BiocFrame                            |
-| matrix     | numpy.ndarray or scipy.sparse matrix |
-| dgCMatrix  | scipy.sparse.csc_matrix              |
-| dgRMatrix  | scipy.sparse.csr_matrix              |
-
-and integration with BiocPy ecosystem for Bioconductor classes
-  - SummarizedExperiment
-  - RangedSummarizedExperiment
-  - SingleCellExperiment
-  - GenomicRanges
-  - MultiAssayExperiment
+The table below describes how core R types are mapped to Python/NumPy/SciPy counterparts:
+
+| R Type / Class | Python / NumPy / SciPy Counterpart |
+| :--- | :--- |
+| **numeric** | `numpy.ndarray` (`float64`) |
+| **integer** | `numpy.ndarray` (`int32`) |
+| **logical** | `numpy.ndarray` (`bool`) |
+| **character** | `list` of `str` |
+| **factor** | `list` / representation levels |
+| **matrix (dense)** | `numpy.ndarray` |
+| **dgCMatrix** (Column-sparse) | `scipy.sparse.csc_matrix` |
+| **dgRMatrix** (Row-sparse) | `scipy.sparse.csr_matrix` |
+| **data.frame** / **DFrame** | `biocframe.BiocFrame` |
+
+### Supported Bioconductor Classes
+When `rds2py[optional]` is installed, the package fully translates R/S4 classes to their BiocPy equivalents:
+- **GenomicRanges** / **GRanges** <-> `genomicranges.GenomicRanges`
+- **GenomicRangesList** / **GRangesList** <-> `genomicranges.CompressedGenomicRangesList`
+- **SummarizedExperiment** <-> `summarizedexperiment.SummarizedExperiment`
+- **RangedSummarizedExperiment** <-> `summarizedexperiment.RangedSummarizedExperiment`
+- **SingleCellExperiment** <-> `singlecellexperiment.SingleCellExperiment`
+- **MultiAssayExperiment** <-> `multiassayexperiment.MultiAssayExperiment`
+
+---
 
 ## Developer Notes
 
-This project uses pybind11 to provide bindings to the rds2cpp library. Please make sure necessary C++ compiler is installed on your system.
+- `rds2py` uses `pybind11` to bind the core C++ `rds2cpp` library. Compiling from source requires a compatible C++ compiler.
+- Tests can be run via `tox` or directly using `pytest`.
 
 <!-- pyscaffold-notes -->
 

diff --git a/docs/custom_serialization.md b/docs/custom_serialization.md
@@ -0,0 +1,138 @@
+# Custom Serialization and Deserialization Guide
+
+This guide shows you how to extend `rds2py` to support custom Python classes. By implementing custom readers and writers, you can serialize your custom Python representations directly into native R RDS/RData structures, and read them back seamlessly.
+
+`rds2py` achieves this two-way extensibility using:
+1. Python's `functools.singledispatch` mechanism for writing/serialization (`save_rds`).
+2. A global class mapping registry for reading/deserialization (`read_rds`).
+
+---
+
+## 1. Custom Serialization (Python -> RDS)
+
+To serialize a custom Python class, you register it with the `save_rds` generic dispatcher. Your custom function needs to take your object and convert it into a structured dictionary that matches R's internal representation format.
+
+### The Structured RDS Representation Format
+R objects are represented in Python as nested dictionaries containing the following keys:
+- `"type"`: The R type descriptor (e.g., `"S4"`, `"vector"`, `"integer"`, `"double"`, `"string"`, `"logical"`, or `"null"`).
+- `"class_name"`: The target R class name (e.g., `"MyCustomRClass"`).
+- `"package_name"`: *(Optional, for S4 classes)* The name of the R package where the class is defined.
+- `"attributes"`: A dictionary representing R attributes or S4 slots. Each slot value must also be a structured representation dictionary.
+- `"data"`: The flat list or array of values for vector/atomic types.
+
+### Example: Implementing a Custom Serializer
+
+Let's say we have a custom Python class named `MyFeature`:
+
+```python
+class MyFeature:
+    def __init__(self, name: str, values: list):
+        self.name = name
+        self.values = values
+```
+
+To serialize `MyFeature` as a native R S4 class called `"MyCustomRClass"` from package `"MyRPackage"`, we register it using `@save_rds.register`:
+
+```python
+from typing import Optional
+from rds2py import save_rds
+
+
+@save_rds.register(MyFeature)
+def _save_rds_myfeature(x: MyFeature, path: Optional[str] = None):
+    # Import the native C++ writer (done lazily, inside the function)
+    from rds2py.lib_rds_parser import write_rds as write_rds_native
+
+    # 1. Structure the Python object into the expected R dictionary format
+    converted = {
+        "type": "S4",
+        "class_name": "MyCustomRClass",
+        "package_name": "MyRPackage",
+        "attributes": {
+            # Recursively call save_rds to serialize internal elements
+            "featureName": save_rds(x.name),
+            "featureValues": save_rds(x.values),
+        },
+    }
+
+    # 2. If a save path is specified, write directly using the native writer
+    if path is not None:
+        write_rds_native(converted, path)
+
+    return converted
+```
+
+---
+
+## 2. Custom Deserialization (RDS -> Python)
+
+To read custom S4 objects back into Python classes via `read_rds`, you need to:
+1. Write a deserialization function that constructs your Python class from the raw parsed dictionary.
+2. Register your deserializer function in `rds2py`'s global class mapping registry.
+
+### Example: Implementing the Reader
+
+```python
+from rds2py.generics import _dispatcher
+from rds2py.rdsutils import get_class
+
+
+def read_my_custom_class(robject: dict, **kwargs) -> MyFeature:
+    # 1. Verify the incoming R class name
+    cls_name = get_class(robject)
+    if cls_name != "MyCustomRClass":
+        raise ValueError(f"Expected class 'MyCustomRClass', but received '{cls_name}'")
+
+    # 2. Extract and parse the slots recursively
+    # We call the internal _dispatcher helper to parse child structures
+    feature_name = _dispatcher(robject["attributes"]["featureName"], **kwargs)
+    feature_values = _dispatcher(robject["attributes"]["featureValues"], **kwargs)
+
+    # 3. Instantiate and return your custom Python class
+    return MyFeature(name=feature_name, values=list(feature_values))
+```
+
+### Registering the Reader
+Map your class name to the reader function in the global class registry (`REGISTRY` from `rds2py.generics`):
+
+```python
+from rds2py.generics import REGISTRY
+
+# Register our custom deserializer in the global map
+REGISTRY["MyCustomRClass"] = read_my_custom_class
+```
+
+---
+
+## 3. Full Roundtrip
+
+Here is how custom serialization and deserialization fit together in a full roundtrip:
+
+```python
+import tempfile
+import os
+from rds2py import write_rds, read_rds
+
+# 1. Create a custom instance
+feature = MyFeature(name="expression_level", values=[10, 20, 30])
+
+# 2. Serialize to a temporary RDS file
+with tempfile.NamedTemporaryFile(suffix=".rds", delete=False) as tmp:
+    path = tmp.name
+
+try:
+    # Write custom class to RDS format
+    write_rds(feature, path)
+
+    # Read the RDS file back into Python
+    recreated = read_rds(path)
+
+    # 3. Verify that the roundtrip correctly recreated the custom class
+    assert isinstance(recreated, MyFeature)
+    assert recreated.name == "expression_level"
+    assert recreated.values == [10, 20, 30]
+    print("Roundtrip validation successful!")
+finally:
+    if os.path.exists(path):
+        os.unlink(path)
+```
diff --git a/docs/index.md b/docs/index.md
@@ -1,24 +1,31 @@
-# rds2py
+# rds2py: R Serialization Formats in Python
 
-Parse, extract and create Python representations for datasets stored in RDS files. It supports Bioconductor's `SummarizedExperiment` and `SingleCellExperiment` objects. This is possible because of [Aaron's rds2cpp library](https://github.com/LTLA/rds2cpp).
+`rds2py` is designed to parse, extract, and write R data formats (RDS and RData) directly in Python. It provides native, out-of-the-box integration with the [BiocPy](https://github.com/biocpy) ecosystem, allowing seamless roundtripping of complex S4 datasets like `SummarizedExperiment`, `SingleCellExperiment`, and `GenomicRanges`.
 
-The package uses memory views (except for strings) so that we can access the same memory from C++ space in Python (through Cython of course). This is especially useful for large datasets so we don't make copies of data.
+This library is built on top of [Aaron Lun's rds2cpp library](https://github.com/LTLA/rds2cpp).
 
-## Install
+## Installation
 
-Package is published to [PyPI](https://pypi.org/project/rds2py/)
+`rds2py` is available on [PyPI](https://pypi.org/project/rds2py/):
 
 ```shell
 pip install rds2py
 ```
 
-## Contents
+To enable full conversion support for Bioconductor/BiocPy classes, consider installing the optional dependencies:
+
+```shell
+pip install rds2py[optional]
+```
+
+## Table of Contents
 
 ```{toctree}
 :maxdepth: 2
 
 Overview <readme>
 Tutorial <tutorial>
+Custom Serialization Guide <custom_serialization>
 Contributions & Help <contributing>
 License <license>
 Authors <authors>