Compile To Target cuda -libs=cudnn,cublas

home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 7) CUDNN_CONVOLUTION_FWD_ALGO_DIRECT - time: -1 ms, Memory: 0
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: 6.98608 ms, Memory: 9564160
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 4) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD_NONFUSED - time: 0.651232 ms, Memory: 626688
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 3) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: 0.413408 ms, Memory: 4608
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: 0.36752 ms, Memory: 4608
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 1) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: 0.35104 ms, Memory: 0
home/nvidia/src/tvm/src/contrib/cudnn/conv_:246: 0) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: 0.289728 ms, Memory: 262144
home/nvidia/src/tvm/src/contrib/cudnn/conv_:243: CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD
A fallback configuration is used, which may bring great performance regression.

I tried setting MXNET_ENGINE_TYPE=NaiveEngine, but it made no difference.

NVProf Log