fix: webchat file attachment renamed with UUID instead of original name#8486
fix: webchat file attachment renamed with UUID instead of original name#8486march-7th-mini wants to merge 5 commits into
Conversation
Replace UUID-based file naming with original filenames and add dedup counter.
There was a problem hiding this comment.
Code Review
This pull request changes how files are saved locally in the webchat platform source, switching from using a random UUID to preserving the original filename and appending a counter to handle duplicates. The review highlights a high-severity path traversal vulnerability because the filename is not sanitized before being joined to the destination path, and identifies a redundant splitext computation inside the deduplication loop. A code suggestion is provided to resolve both issues.
| original_name = comp.name or os.path.basename(file_path) | ||
| ext = os.path.splitext(original_name)[1] or "" | ||
| filename = f"{uuid.uuid4()!s}{ext}" | ||
| dest_path = os.path.join(attachments_dir, filename) | ||
| dest_path = os.path.join(attachments_dir, original_name) | ||
| # dedup: if file with same name exists, append a counter | ||
| counter = 1 | ||
| while os.path.exists(dest_path): | ||
| name_part, ext_part = os.path.splitext(original_name) | ||
| dest_path = os.path.join(attachments_dir, f"{name_part}_{counter}{ext_part}") | ||
| counter += 1 |
There was a problem hiding this comment.
Security & Efficiency Review
- Path Traversal Vulnerability (High Severity): Using
comp.namedirectly inos.path.joinallows potential path traversal attacks if the filename contains directory traversal sequences (e.g.,../or..\). This could allow an attacker to write files outside the intendedattachments_dirdirectory. Wrapping it inos.path.basenamesanitizes the filename. - Redundant Computation: Calling
os.path.splitext(original_name)inside thewhileloop is redundant becauseoriginal_namedoes not change. Moving it outside the loop improves performance.
| original_name = comp.name or os.path.basename(file_path) | |
| ext = os.path.splitext(original_name)[1] or "" | |
| filename = f"{uuid.uuid4()!s}{ext}" | |
| dest_path = os.path.join(attachments_dir, filename) | |
| dest_path = os.path.join(attachments_dir, original_name) | |
| # dedup: if file with same name exists, append a counter | |
| counter = 1 | |
| while os.path.exists(dest_path): | |
| name_part, ext_part = os.path.splitext(original_name) | |
| dest_path = os.path.join(attachments_dir, f"{name_part}_{counter}{ext_part}") | |
| counter += 1 | |
| original_name = os.path.basename(comp.name or file_path) | |
| name_part, ext_part = os.path.splitext(original_name) | |
| dest_path = os.path.join(attachments_dir, original_name) | |
| # dedup: if file with same name exists, append a counter | |
| counter = 1 | |
| while os.path.exists(dest_path): | |
| dest_path = os.path.join(attachments_dir, f"{name_part}_{counter}{ext_part}") | |
| counter += 1 |
There was a problem hiding this comment.
Hey - I've found 1 issue, and left some high level feedback:
- When using the original filename, consider normalizing/sanitizing
original_name(e.g., stripping directory components, disallowing..or path separators, enforcing a max length) beforeos.path.jointo avoid potential path traversal and filesystem issues from untrusted filenames. - To make the dedup logic clearer and slightly more efficient, you could precompute
name_partandext_partonce before thewhileloop and update only the suffix inside the loop rather than callingos.path.splitexton every iteration.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- When using the original filename, consider normalizing/sanitizing `original_name` (e.g., stripping directory components, disallowing `..` or path separators, enforcing a max length) before `os.path.join` to avoid potential path traversal and filesystem issues from untrusted filenames.
- To make the dedup logic clearer and slightly more efficient, you could precompute `name_part` and `ext_part` once before the `while` loop and update only the suffix inside the loop rather than calling `os.path.splitext` on every iteration.
## Individual Comments
### Comment 1
<location path="astrbot/core/platform/sources/webchat/webchat_event.py" line_range="114" />
<code_context>
- # save file to local
+ # save file to local with original name
file_path = await comp.get_file()
original_name = comp.name or os.path.basename(file_path)
- ext = os.path.splitext(original_name)[1] or ""
- filename = f"{uuid.uuid4()!s}{ext}"
</code_context>
<issue_to_address>
**🚨 issue (security):** Consider sanitizing `original_name` to avoid unsafe or unexpected path components.
`comp.name` might include path separators or `..` segments, which could create unintended paths when joined with `attachments_dir`. Normalize `original_name` before use (e.g., wrap it with `os.path.basename` and optionally filter out unsafe characters) to avoid path traversal or malformed paths.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
优化版代码:加了 os.path.basename 防路径穿越 + splitext 提到循环外。本地测试一切正常。
| name_part, ext_part = os.path.splitext(safe_name) | ||
| # dedup: if file with same name exists, append a counter | ||
| counter = 1 | ||
| while os.path.exists(dest_path): |
There was a problem hiding this comment.
这里可以限制循环上限(比如 100_000),防止一些奇怪的问题
There was a problem hiding this comment.
谢谢大佬!空了麻烦您再帮忙看看。
起初我只是觉得AI每次回复的文件都被重命名,就很不方便,
于是我问AstrBot,帮忙修改实现的这个效果,
因为我完全没有任何编程经验,我判断不了这个修改是否符合项目的规范,可能会带来哪些风险。
fix: 限制防重名循环上限为 10000
fix: 降低防重名循环上限至 1000,正常情况下使用足够了,修复 ruff 格式检查。
Replace UUID-based file naming with original filenames and add dedup counter.
修复 WebChat 文件被 UUID 重命名的问题
Fixes #8470
AstrBot WebChat sends files via SSE stream. The
_send()method inwebchat_event.pyuseduuid.uuid4()to rename files, causing them to appear as UUID strings (e.g.c697f0da-198b-49df-abaa-977b1a59e2df.html) instead of their original names. This fix preserves the original filename and adds a counter suffix (_1,_2...) when conflicts occur.AstrBot WebChat 通过 SSE 流发送文件时,
webchat_event.py的_send()方法用uuid.uuid4()重命名文件,导致用户收到的文件名是一串 UUID 乱码而非原始名称。本次修改保留原始文件名,同名文件自动添加_1、_2计数器后缀防冲突。Modifications / 改动点
File modified / 修改的文件:
astrbot/core/platform/sources/webchat/webchat_event.pyChanges / 改动内容:
Replace
uuid.uuid4()with original filename in_send()method's File handlerAdd dedup counter (
_1,_2...) to avoid filename conflicts移除
uuid.uuid4()重命名,改用原始文件名保存添加
_1、_2计数器后缀防同名冲突This is NOT a breaking change. / 这不是一个破坏性变更。
Screenshots or Test Results / 运行截图或测试结果
Test Scenarios / 测试场景:
c697f0da-...htmlfinal_test.htmloffline_test.htmlDisk attachment directory comparison / 附件目录对比:
Checklist / 检查清单
😊 If there are new features added in the PR, I have discussed it with the authors through issues/emails, etc.
/ 如果 PR 中有新加入的功能,已经通过 Issue / 邮件等方式和作者讨论过。
👀 My changes have been well-tested, and "Verification Steps" and "Screenshots" have been provided above.
/ 我的更改经过了良好的测试,并已在上方提供了“验证步骤”和“运行截图”。
🤓 I have ensured that no new dependencies are introduced, OR if new dependencies are introduced, they have been added to the appropriate locations in
requirements.txtandpyproject.toml./ 我确保没有引入新依赖库,或者引入了新依赖库的同时将其添加到
requirements.txt和pyproject.toml文件相应位置。😮 My changes do not introduce malicious code.
/ 我的更改没有引入恶意代码。
Summary by Sourcery
Preserve original filenames for WebChat file attachments while ensuring name uniqueness on disk.
Bug Fixes:
Enhancements: