Why is cuMemAddressReserve() failing with CUDA_ERROR_INVALID_VALUE?

tl;dr: Your reserved region size is not a multiple of (some device’s) allocation granularity.

As @AbatorAbetor suggested, cuMemAddressReserve() implicitly requires the size of the memory region to be a multiple of some granularity value. And although 0x20000 seems like a generous enough value for that (2^17 bytes = 128 KiB, while system memory pages are typically 4 KiB = 2^12 bytes), NVIDIA GPUs are very demanding here.

For example, a Pascal GTX 1050 Ti GPU with ~4 GB of memory has a granularity of 0x200000, or 2 MiB: 16 times the size you were trying to reserve.

Now, what would happen if we had two devices with different granularity values? Would we need to use the least common multiple of the two? Who knows.
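If one wanted to play it safe with two devices, the arithmetic at least is simple: any multiple of the least common multiple is a multiple of each granularity. Here is a minimal sketch of that, assuming C++17; common_granularity() is a name I made up for illustration, and I have not confirmed that the driver actually requires this.

```cpp
#include <cstddef>
#include <numeric>

// Hypothetical helper (not part of CUDA): the smallest value that is a
// multiple of both granularities. Both arguments are assumed to be non-zero.
constexpr std::size_t common_granularity(std::size_t granularity_a,
                                         std::size_t granularity_b)
{
    return std::lcm(granularity_a, granularity_b);
}
```

If both granularities happen to be powers of two, this is simply the larger of the two values.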

Anyway, bottom line: Always check the granularity, and make your size a multiple of it, both before allocating and before reserving.
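The check itself is just integer arithmetic once you have the granularity, which you would obtain from cuMemGetAllocationGranularity() with a CUmemAllocationProp describing the target device. The following is a sketch of the rounding only, not of the driver calls; the helper names are my own and not part of any CUDA API.

```cpp
#include <cstddef>

// Hypothetical helpers (not part of CUDA). The granularity value is assumed
// to have been queried from the driver beforehand and to be non-zero.

// True if `size` is a non-zero multiple of `granularity`.
constexpr bool is_granularity_multiple(std::size_t size, std::size_t granularity)
{
    return size != 0 && granularity != 0 && size % granularity == 0;
}

// Smallest multiple of `granularity` that is >= `size`.
// A `size` of 0 is returned unchanged.
constexpr std::size_t round_up_to_granularity(std::size_t size, std::size_t granularity)
{
    const std::size_t remainder = size % granularity;
    return remainder == 0 ? size : size + (granularity - remainder);
}
```

With the values from the question, is_granularity_multiple(0x20000, 0x200000) is false, and round_up_to_granularity(0x20000, 0x200000) gives 0x200000, which is the size to pass to cuMemAddressReserve() instead.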

I have filed this as a documentation bug with NVIDIA, bug 3486420 (but you may not be able to follow the link, because NVIDIA hide their bugs from their users).

