Add a combine batch_matmul pass #5791

Merged (2 commits, Jun 17, 2020)
Conversation

t-vi (Contributor) commented Jun 12, 2020

Contrary to what you might expect, this doesn't share as much code with the combine dense pass as it does with the combine 2d conv pass. This is because it concatenates the "output feature" dimensions, just like the 2d conv pass concatenates output channels, whereas combine dense stacks the various matmul arguments.

I'm not sure if there is a deeper reason not to concatenate for dense, too, but maybe there is.
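The concatenation strategy the PR describes can be illustrated in plain numpy (this is a hypothetical sketch, not TVM code; it assumes TVM's batch_matmul semantics of computing A @ B^T per batch, with B laid out as (batch, out_features, k)):

```python
import numpy as np

# Illustrative sketch: two parallel batch_matmuls sharing the same LHS,
# combined by concatenating the RHS tensors along the output-feature axis.
rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8, 16))    # shared LHS: (batch, m, k)
b1 = rng.standard_normal((4, 5, 16))   # first RHS: 5 output features
b2 = rng.standard_normal((4, 7, 16))   # second RHS: 7 output features

def batch_matmul(a, b):
    # mimics relay.nn.batch_matmul: out[i] = a[i] @ b[i].T
    return np.einsum("bmk,bnk->bmn", a, b)

# two separate ops
out1 = batch_matmul(a, b1)
out2 = batch_matmul(a, b2)

# combined: concatenate, run one batch_matmul, split the result
b_comb = np.concatenate([b1, b2], axis=1)   # (4, 12, 16)
out = batch_matmul(a, b_comb)               # (4, 8, 12)
c1, c2 = out[:, :, :5], out[:, :, 5:]

assert np.allclose(out1, c1) and np.allclose(out2, c2)
```

One kernel launch replaces several, which is where the speedup for transformer-style models comes from.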

t-vi closed this Jun 12, 2020
t-vi reopened this Jun 14, 2020
comaniac (Contributor) commented
This might be off topic, but I think it's possible to use the pattern language to replace all the "combine parallel XX op" passes. Maybe we could create an issue to track it.

cc @mbrookhart @tqchen

mbrookhart (Contributor) commented

@t-vi Interesting PR! Thank you for submitting it! I presume the use case for this is in Transformer-like models? Do you see a perf benefit from the rewrite?

> This might be off the topic. I think it's possible to use the pattern language to replace all "combine parallel XX op" passes. Maybe we could create an issue to check it.
>
> cc @mbrookhart @tqchen

I would imagine yes; it would be pretty easy to implement this as a pattern and a rewrite. A pattern-based solution might have one complication, though: I don't think we currently have a way to match something with 2, 3, or 4 branches in the same pattern, so it would require a number of separate patterns. I'll think about that.

t-vi (Contributor, Author) commented Jun 15, 2020

@mbrookhart Yes, my use case is transformers. The PyTorch frontend translates the matmul used in HuggingFace Transformers' BERT into batch_matmul. The speedup is 1.5x-2x-ish on ROCm (gfx906), and there is also some speedup on a GTX 1080 Ti, even though it currently hits a reshape right after batch_matmul. I don't quite reach the speed of ONNX Runtime yet.
I'm currently preparing a detailed writeup (and that's the theme of my recent PRs: tunable batch_matmul, this, and support for integers and other non-float32 types in the PyTorch frontend).

I imagine it would be cool to move the pass to pattern matching. I would expect it would replace the code shared by the combine passes of batch_matmul and conv2d (and, to some extent, the dense combiner) rather than the parts that are separate. I have been wondering about the efficiency of the dense combiner, by the way: the code comments mention BERT as a use case, but it is unclear to me whether dense -> batch_matmul with a "duplicated" (possibly stride-0) input is better than dense -> dense with non-contiguous results (the columns would still be contiguous; only the rows would be interleaved between the ops). But then I haven't looked much at how TVM deals with strides (which is relatively significant, because self-attention typically has some reshapes that would be nice to fuse).
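The contrast between the two combining strategies discussed here (stacking, as in the dense combiner, versus output-feature concatenation, as in this PR and the conv2d combiner) can be sketched in plain numpy. This is an illustrative toy, not TVM code; the shapes and names are invented for the example:

```python
import numpy as np

# Two parallel dense ops sharing one input x; weights are (units, in_features).
rng = np.random.default_rng(1)
x = rng.standard_normal((8, 16))
w1 = rng.standard_normal((5, 16))
w2 = rng.standard_normal((5, 16))

ref1, ref2 = x @ w1.T, x @ w2.T

# Strategy A (stacking, like the dense combiner): stack the weights into a
# batch and run one batch_matmul with the input reused for every batch entry.
w_stacked = np.stack([w1, w2])                  # (2, 5, 16)
out_a = np.einsum("mk,bnk->bmn", x, w_stacked)  # (2, 8, 5)
assert np.allclose(out_a[0], ref1) and np.allclose(out_a[1], ref2)

# Strategy B (concatenation, like this PR and the conv2d combiner):
# concatenate along the output-feature axis, run one dense, split the result.
w_cat = np.concatenate([w1, w2], axis=0)        # (10, 16)
out_b = x @ w_cat.T                             # (8, 10)
assert np.allclose(out_b[:, :5], ref1) and np.allclose(out_b[:, 5:], ref2)
```

Note that stacking requires the parallel ops to have equal output widths, while concatenation does not; on the other hand, concatenation interleaves the outputs, which is the non-contiguity question raised above.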

mbrookhart (Contributor) left a review:
LGTM

tqchen (Member) commented Jun 17, 2020

cc @vinx13 please also help to take a look

Review comments on python/tvm/relay/transform/transform.py (outdated, resolved)
vinx13 merged commit 052ea4d into apache:master Jun 17, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Jun 30, 2020
* Add a combine batch_matmul pass

Contrary to what you might expect, this doesn't share as much code with
the combine dense pass as it does with the combine 2d conv pass.
This is because it concatenates the "output feature" dimensions.

* fix docstring
zhiics pushed a commit to neo-ai/tvm that referenced this pull request Jul 2, 2020
5 participants