Jun 5, 2019

nvcc & PTX: its internal operation and how to generate it


Go to the directory of any of the CUDA samples (dxtc is used here) and edit its Makefile to change two things: NVCCFLAGS and the GPU architecture codes (SMS).

First, NVCCFLAGS: append "-ptx" to the end.


Second, the SMS variable, which enumerates all the supported GPU architectures: reduce it to a single GPU architecture (for example, 35, which is the value that appears in the build output in this post).
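The two Makefile edits can also be scripted. The following is only a sketch: it works on a two-line stand-in file (Makefile.demo, a made-up name) whose contents are assumed to resemble the defaults in a CUDA 10.0 sample Makefile, and it writes the result to a second made-up file, Makefile.ptx. Check the actual variable definitions in your own Makefile before applying anything like this.

```shell
# Stand-in for the two relevant Makefile lines (assumed defaults, not copied from the post).
cat > Makefile.demo <<'EOF'
NVCCFLAGS   := -m${TARGET_SIZE}
SMS ?= 30 35 37 50 52 60 61 70 75
EOF

# 1. Append -ptx to the NVCCFLAGS assignment.
# 2. Reduce SMS to a single architecture (35 here).
sed -e 's/^\(NVCCFLAGS[[:space:]]*:=.*\)$/\1 -ptx/' \
    -e 's/^SMS ?=.*$/SMS ?= 35/' \
    Makefile.demo > Makefile.ptx

# Show the edited lines.
cat Makefile.ptx
```

Editing by hand in a text editor gives the same result; the script form is mainly useful when repeating the change across several sample directories.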




Running "make" then produces:

"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc  -m64 -ptx     -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc.o -c dxtc.cu
"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++   -m64 -ptx       -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc dxtc.o
mkdir -p ../../bin/x86_64/linux/release
cp dxtc ../../bin/x86_64/linux/release

Then open the dxtc.o file in a text editor. It consists of all the generated PTX instructions in text form.
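Besides opening the file in an editor, a quick way to confirm that it really holds PTX is to look for the header directives (.version, .target, .address_size) and the kernel entry points (.entry). The helper below is a hypothetical sketch; ptx_summary is not an NVIDIA tool, just a name chosen here.

```shell
# Print the PTX header directives and kernel entry points of a file,
# each prefixed with its line number.
# ptx_summary is a hypothetical helper name, not part of the CUDA toolkit.
ptx_summary() {
  # Header directives normally sit at the top of a PTX module.
  grep -nE '^[[:space:]]*\.(version|target|address_size)' "$1"
  # Each kernel is declared with an .entry directive.
  grep -nE '\.entry[[:space:]]' "$1"
}
```

For the build in this post it would be called as: ptx_summary dxtc.o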





PTX has some unique features in its memory model.
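One such feature, assuming it is the kind of thing meant here, is that PTX names an explicit state space (.reg, .sreg, .const, .global, .local, .param, .shared, .tex) on variable declarations and memory instructions. A rough way to see which spaces a generated file uses is to count those qualifiers. This is a sketch only: ptx_state_spaces is a hypothetical helper name, and the \b word-boundary match relies on GNU grep.

```shell
# Count how often each PTX state-space qualifier appears in a file,
# most frequent first.
# ptx_state_spaces is a hypothetical helper name; \b assumes GNU grep.
ptx_state_spaces() {
  grep -oE '\.(reg|sreg|const|global|local|param|shared|tex)\b' "$1" \
    | sort | uniq -c | sort -rn
}
```

For the build in this post it would be called as: ptx_state_spaces dxtc.o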

