Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make batch matrix multiplication on GPU tunable #5752
Make batch matrix multiplication on GPU tunable #5752
Changes from all commits
e4011ee
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing
unroll_explicit
in fallback schedule. Based on your comment, this might result in a problem at LLVM-based backends.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon looking at this, I should use unroll explicit, not just define it.
What I'm unsure about is whether I need to define it in the fallback - conv2d_direct doesn't define it and it seems to work well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm it's interesting. It means this parameter is never effective in those schedules...others like conv2d_direct uses it like:
s[output].pragma(kernel_scope, 'unroll_explicit', cfg['unroll_explicit'].val)
I think you can also fix them if you prefer to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be OK now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry just found that the
unroll_explicit
is still missing in fallback. Also would you mind moving L103 (definetile_k
) up together with other tuning parameters?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As much fun as geeking out over this is
I don't think I need the fallback because
cfg._entity_map
which is what is queried by__getitem__
.I cannot move the definition of
tile_k
to before k is defined.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the first two points matter, we should just remove that parameter to make the tuning space more efficient. Meanwhile, I appreciate the third point that I didn't realize before.
For
tile_k
, I think you should be able to move the definition ofk
up as well. It should be safe because the tuning parameter must be static so it won't depend on other parameters. In this way, we can also put the fallback configs together to make it clearer.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the k and
tile_k
it appears to work and looks much prettier indeed. Thank you for insisting on it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else can I do to make it prettier?