Module names containing null bytes bypass the sys.modules cache #150633

@KowalskiThomas

Bug report

Bug description:

find_frozen in Python/import.c converts the module name to a C string using PyUnicode_AsUTF8, which returns a \0-terminated pointer into the Unicode object's internal buffer. When the name contains an embedded null byte -- for example "codecs\x00junk" -- the C string passed to look_up_frozen is silently truncated to "codecs", so the frozen table lookup succeeds.

The module is then created and registered in sys.modules under the full key "codecs\x00junk" rather than "codecs", producing a second, independent frozen codecs module that is not the cached one. Every subsequent import under a new name of this form bypasses the cache and allocates a fresh copy of the module's functions and other objects.
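The mismatch between the two keys can be modelled in pure Python. This is only a sketch of the truncation effect described above, not CPython's actual code: FROZEN_NAMES, c_string_view and frozen_lookup_succeeds are hypothetical names introduced for illustration.

```python
# Hypothetical stand-in for the frozen module table; not CPython's real data.
FROZEN_NAMES = {"codecs", "zipimport"}


def c_string_view(name: str) -> bytes:
    """Return the bytes a C caller sees when it stops at the first NUL."""
    return name.encode("utf-8").split(b"\x00", 1)[0]


def frozen_lookup_succeeds(name: str) -> bool:
    """Model a table lookup keyed on the truncated C string."""
    return c_string_view(name).decode("utf-8") in FROZEN_NAMES


name = "codecs\x00junk"
print(c_string_view(name))           # b'codecs'
print(frozen_lookup_succeeds(name))  # True: the lookup sees "codecs"
print(name == "codecs")              # False: the Python-level key differs
```

The lookup succeeds on the truncated name while the Python-level string, which is what ends up as the sys.modules key, still compares unequal to "codecs".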

The following reproducer shows the inconsistency:

import sys

before = set(sys.modules.keys())
m = __import__('codecs\x00junk')
new_keys = set(sys.modules.keys()) - before
print("Added to sys.modules.keys:", new_keys)
print("Name of the newly added module:", m.__name__)
print("Modules are the same?", m is sys.modules.get('codecs'))

... which shows:

Added to sys.modules.keys: {'codecs\x00junk'}
Name of the newly added module: codecsjunk
Modules are the same? False
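Until the lookup itself is fixed, code that imports modules from untrusted names (for example, names read from a pickle stream) could reject embedded null characters up front. The helper below is a minimal caller-side sketch under that assumption; checked_import is a hypothetical name, not an existing API, and this is not a proposed patch to import.c.

```python
import importlib


def checked_import(name: str):
    """Import a module by name, refusing names with embedded NUL characters."""
    if "\x00" in name:
        # Reject before the name reaches any C-level lookup that may truncate it.
        raise ValueError("embedded null character in module name")
    return importlib.import_module(name)
```

With this guard, checked_import('codecs') returns the cached module, while checked_import('codecs\x00junk') raises ValueError instead of creating a second copy.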

This bug was discovered by fuzzing pickle.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Labels: interpreter-core, topic-importlib, type-bug