Fix: lecture_014 mask on offs #48

Open
yaohwang wants to merge 107 commits into gpu-mode:main from yaohwang:main

@yaohwang

The mask in the lecture_014 pseudocode should work; this fixes the mask on offs.
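
The diff itself is not shown in this conversation, so the following is only a general illustration of what masking block offsets means, not the change made in this PR. It is a plain-Python sketch; the names `masked_copy`, `block_size`, `offs`, and `mask` are hypothetical and chosen for the example.

```python
def masked_copy(src, block_size):
    """Copy src block by block, guarding out-of-range offsets with a mask."""
    n = len(src)
    dst = [None] * n
    n_blocks = -(-n // block_size)  # ceiling division
    for pid in range(n_blocks):
        # Offsets handled by this block; the last block may run past n.
        offs = [pid * block_size + i for i in range(block_size)]
        # The mask is computed on the offsets themselves (assumed convention).
        mask = [o < n for o in offs]
        for o, keep in zip(offs, mask):
            if keep:
                dst[o] = src[o]
    return dst


result = masked_copy([1, 2, 3, 4, 5], block_size=4)
```

With five elements and a block size of four, the second block covers offsets 4 to 7, and the mask keeps only offset 4.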

lancerts and others added 30 commits January 27, 2024 17:50
Fix the indices typo.
If tpb.x = tpb.y and blocks.x = blocks.y, the original indexing gives the correct result; otherwise it does not.
```
    tpb = ns(x=16,y=16)
    blocks = ns(x=math.ceil(w/tpb.x), y=math.ceil(h/tpb.y))
```
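
To see why such a typo can go unnoticed on square inputs, here is a minimal sketch that simulates a 2D launch in plain Python. It assumes the typo amounts to swapping the x and y loop bounds, which this page does not confirm; `covered_pixels` and its `swapped` flag are hypothetical names used only for this illustration.

```python
import math
from types import SimpleNamespace as ns


def covered_pixels(h, w, tpb, blocks, swapped=False):
    """Return the set of (row, col) pixels visited by a simulated 2D launch."""
    # Correct version: y loops use the .y bounds, x loops use the .x bounds.
    by, bx = (blocks.x, blocks.y) if swapped else (blocks.y, blocks.x)
    ty, tx = (tpb.x, tpb.y) if swapped else (tpb.y, tpb.x)
    seen = set()
    for i0 in range(by):
        for i1 in range(bx):
            for j0 in range(ty):
                for j1 in range(tx):
                    r = i0 * tpb.y + j0
                    c = i1 * tpb.x + j1
                    if r < h and c < w:  # bounds guard
                        seen.add((r, c))
    return seen


h, w = 20, 50  # non-square image
tpb = ns(x=16, y=16)
blocks = ns(x=math.ceil(w / tpb.x), y=math.ceil(h / tpb.y))

full = covered_pixels(h, w, tpb, blocks)
bad = covered_pixels(h, w, tpb, blocks, swapped=True)
```

Here `blocks.x` is 4 and `blocks.y` is 2, so the swapped version only reaches the first 32 columns. With a square image and equal block counts, both versions visit the same pixels.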
…etup

fix: update environment setup for session 4
rectangle matmul with shared memory python
Mark Saroufim and others added 29 commits September 28, 2024 12:52
Triton Internals Slides and Code
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
[refactor] replace hardcoded conda env, remove deprecated FindCUDA.cmake, added logic regarding libtorch download
* Slides/materials for Lecture 31

* Update README.md

---------

Co-authored-by: Mark Saroufim <marksaroufim@meta.com>
* add lecture 33 slides and tutorial links

* add README for folder_033
…de#39)

* docs: add SGLang Performance Optimization GPU MODE talk slide

* upd
add the slide_001 offered in google doc