My Explorations Into Deep Learning: nvcc & PTX: its internal operation and how to generate it

Go to any of the CUDA examples:

https://docs.nvidia.com/cuda/cuda-samples/index.html

Then edit its Makefile to change two things: NVCCFLAGS and the GPU architecture codes (the SMS variable).

First, NVCCFLAGS: append "-ptx" to the end.

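Second, the SMS variable, which enumerates all the supported GPU architectures: reduce it to a single architecture (for example 35, which is what the make output below was built with). This is needed because nvcc refuses -ptx when several GPU architectures are requested at once.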

The details are explained here: (https://docs.nvidia.com/cuda/cuda-samples/index.html#getting-cuda-samples)
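
The two Makefile edits can also be scripted. What follows is only a minimal sketch, run against a small stand-in file rather than a real sample Makefile. The two variable lines in the stand-in (the NVCCFLAGS assignment and the SMS list) are assumptions about what a sample Makefile looks like, so check your own copy before reusing the sed expressions.

```shell
#!/bin/sh
# Stand-in for the relevant lines of a CUDA sample Makefile.
# ASSUMPTION: the layout below is illustrative, not copied from a real sample.
mk=$(mktemp)
cat > "$mk" <<'EOF'
NVCCFLAGS := -m64
SMS ?= 30 35 37 50 52 60 61 70 75
EOF

# 1) Append -ptx to the end of the NVCCFLAGS line.
# 2) Collapse the SMS list to a single architecture (35 here, matching the
#    -gencode flags seen in the make output).
edited=$(sed -e '/^NVCCFLAGS[[:space:]]*:=/ s/$/ -ptx/' \
             -e 's/^SMS[[:space:]]*?=.*/SMS ?= 35/' "$mk")

printf '%s\n' "$edited"
rm -f "$mk"
```

On a real Makefile the same expressions could be applied in place, but it is safer to diff the result first, since a sample may set these variables on more than one line.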

Running "make" then prints the following:

"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++ -I../../common/inc -m64 -ptx -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc.o -c dxtc.cu
"/home/tthtlc/cuda-10.0"/bin/nvcc -ccbin g++ -m64 -ptx -gencode arch=compute_35,code=sm_35 -gencode arch=compute_35,code=compute_35 -o dxtc dxtc.o
mkdir -p ../../bin/x86_64/linux/release
cp dxtc ../../bin/x86_64/linux/release
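
The -gencode flags in those command lines follow from the SMS value. As a rough sketch of that expansion (written in shell rather than make, and using the hypothetical variable names sms_list, gencode and highest), each SMS entry yields an arch=compute_XX,code=sm_XX pair, and the highest entry additionally yields a code=compute_XX pair. The assumption here is that the sample Makefile does something equivalent; the exact make code may differ between CUDA releases.

```shell
#!/bin/sh
# ASSUMPTION: a single-architecture setting, as in the edited Makefile.
sms_list="35"

gencode=""
highest=""
for sm in $sms_list; do
    # One real-architecture (sm_XX) target per entry.
    gencode="$gencode -gencode arch=compute_${sm},code=sm_${sm}"
    highest=$sm   # the list is assumed to be in ascending order
done

# One extra virtual-architecture (compute_XX) target for the highest entry.
gencode="$gencode -gencode arch=compute_${highest},code=compute_${highest}"
gencode=${gencode# }   # drop the leading space

echo "$gencode"
```

With sms_list set to "35", the printed flags are the same two -gencode options that appear in the nvcc command lines.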

Then open the dxtc.o file in a text editor. Despite the .o extension, with -ptx it is a plain-text file containing all the PTX instructions:

https://pastebin.com/McnJ4vYx
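
Because the generated file is plain text, ordinary text tools are enough for a first look. The sketch below runs on a tiny hand-written stand-in, not on the real dxtc output; the kernel name demo_kernel and its parameter are hypothetical. It pulls out the header directives and counts the kernel entry points.

```shell
#!/bin/sh
# Hand-written stand-in for a PTX file. A real one would come from nvcc,
# e.g. something like: nvcc -ptx -arch=sm_35 kernel.cu -o kernel.ptx
# (kernel.cu is a hypothetical file name).
ptx=$(mktemp)
cat > "$ptx" <<'EOF'
.version 6.3
.target sm_35
.address_size 64

.visible .entry demo_kernel(
    .param .u64 demo_kernel_param_0
)
{
    ret;
}
EOF

# Header directives: PTX ISA version, target architecture, pointer size.
header=$(grep -E '^\.(version|target|address_size)' "$ptx")
# Kernel entry points are declared with the .entry directive.
kernels=$(grep -c '\.entry' "$ptx")

printf '%s\n' "$header"
echo "kernels: $kernels"
rm -f "$ptx"
```

Pointing the same two grep commands at dxtc.o instead of the stand-in should show which PTX version and target the build actually produced.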

Further reading on nvcc and PTX:

https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/nvcc.pdf

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html

https://en.wikipedia.org/wiki/Parallel_Thread_Execution

and the latest ISA spec is here:

https://docs.nvidia.com/cuda/pdf/ptx_isa_6.4.pdf

For example: