NVidia Driver 295.20 and the RT-Preempt Patch
The RT-Preempt patches enable full real-time preemption support in the Linux kernel. There is already plenty of information on the web about the purpose of these patches, e.g., the OSADL (Open Source Automation Development Lab) project page, including instructions on how to build and install your own real-time kernel, so I will focus on the problems that arise when using such a kernel with the latest NVidia drivers.
Following the instructions, you'll end up with a patched kernel 18.104.22.168.2-rt30. But when you try to build the kernel module of the latest NVidia driver 295.20, it won't compile. The reason for this incompatibility is that the type atomic_spinlock_t is no longer available and has to be replaced by raw_spinlock_t. In addition, the associated functions were renamed as well and have to be changed accordingly. After these simple changes the module compiles, but once it is loaded and used by the X server, you get a lot of scheduling error messages in the kernel log. After some research and experiments, I replaced the semaphores used in the kernel driver with mutexes, which finally silenced the kernel log. All necessary changes are collected in nvidia-rt-compat.patch, which should be applied to the kernel driver:
patch -p1 < nvidia-rt-compat.patch
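The rename part of that patch is mechanical. As a rough illustration only (the real changes, including the semaphore-to-mutex replacement, are in nvidia-rt-compat.patch), the sketch below rewrites the old type and function prefix with sed. The helper name rename_rt_spinlocks is hypothetical, and it assumes the renamed functions follow the atomic_spin_* to raw_spin_* prefix convention:

```shell
#!/bin/sh
# Hypothetical helper, not part of the driver or the patch.
# Rewrites the old -rt spinlock names to the raw_spinlock names.
# Reads the files given as arguments, or stdin if none are given,
# and writes the result to stdout (the input is not modified).
rename_rt_spinlocks() {
    sed -e 's/atomic_spinlock_t/raw_spinlock_t/g' \
        -e 's/atomic_spin_/raw_spin_/g' "$@"
}

# Example: preview the rename on a single declaration.
printf 'atomic_spinlock_t lock;\n' | rename_rt_spinlocks
```

Because the helper only prints to stdout, its output can be diffed against the original file before anything is changed.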
When analyzing the scheduling performance, one oddity caused by the NVidia driver stood out: whenever an OpenGL or CUDA application was started or quit, there was a high latency of up to several milliseconds. Looking through the kernel module source again, I found the reason: the kernel module executes the wbinvd instruction, which writes back and invalidates the caches of all CPUs, forcing them to flush their caches and reload everything from memory. As caches get larger and the number of CPU cores increases, this has a huge impact on performance, since all CPUs effectively stall during the operation. To remove this instruction, apply the attached nvidia-rt-no-wbinvd.patch:
patch -p1 < nvidia-rt-no-wbinvd.patch
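To double-check that the patch took effect, one can search the driver's kernel module sources for remaining uses of the instruction. This is a minimal sketch; the function name find_wbinvd and the directory argument are assumptions, and it only inspects source files, not any precompiled part of the driver:

```shell
#!/bin/sh
# Hypothetical check, not part of the driver or the patch.
# Lists every line under the given directory (default: current directory)
# that still mentions wbinvd, with file name and line number.
# Exit status follows grep: 0 if matches were found, 1 if none were found.
find_wbinvd() {
    grep -rn 'wbinvd' "${1:-.}"
}
```

Calling find_wbinvd with the directory that holds the patched module sources should print nothing and return a non-zero status once the instruction has been removed.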
I have tested these patches on Ubuntu 10.04 LTS (32- and 64-bit) on different hardware (mostly Core 2 Duo and Core i7) and haven't found any other problems regarding real-time performance.