
OPERATIONAL DEFECT DATABASE

The nvidia-smi command fails to run, and NVIDIA GPU information is not displayed. The command returns the error:

    nvidia-smi has failed because it could not communicate with the NVIDIA driver

The following NVRM message may also be logged:

    NVRM: nvidia_ctl_session_announce failed as driver unload is in progress
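As a quick first check, the following commands confirm whether the GPU is visible and whether the driver module is loaded. This is a minimal sketch for a Linux host or guest; exact output varies by distribution:

    # Is the NVIDIA kernel module currently loaded?
    lsmod | grep nvidia

    # Is the GPU visible on the PCI bus?
    lspci | grep -i nvidia

    # Any recent NVRM driver messages in the kernel log?
    dmesg | grep -i nvrm

If lspci shows the GPU but lsmod returns nothing, the driver is installed but its module is not loaded, which points to one of the causes below.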
The error " nvidia-smi has failed because it could not communicate with the NVIDIA driver " can be caused by several factors: NVIDIA Driver Not Installed or Corrupted: The NVIDIA driver may not be installed on the system, or the installation could be corrupted, causing the nvidia-smi tool to fail when trying to interact with the GPU. Driver Incompatibility: The version of the NVIDIA driver installed may not be compatible with the GPU or the operating system, leading to communication issues. NVIDIA Kernel Module Not Loaded: The required NVIDIA kernel module ( nvidia.ko ) may not be loaded into the system, preventing proper communication between the nvidia-smi tool and the GPU. GPU Initialization Failure: The GPU might not have been initialized properly during boot or due to a hardware failure, which means nvidia-smi cannot establish communication with it. Conflicting Driver Versions: Conflicting or multiple GPU drivers (for example, Nouveau open-source driver or older NVIDIA driver versions) may be installed, causing the system to fail to load the correct NVIDIA driver. Faulty Hardware: There could be a hardware issue with the GPU itself, such as a physical malfunction, overheating, or improper connection, preventing the system from accessing it. Missing or Expired NVIDIA License (for vGPU setups): In virtualized environments, a missing or expired NVIDIA vGPU license can prevent the driver from functioning properly, leading to communication failures. System Updates or Kernel Changes: Recent updates to the operating system or kernel changes may have affected the compatibility or functionality of the NVIDIA driver, causing it to fail. To resolve this, check the driver installation, verify that the correct driver is loaded, and ensure that the hardware and software are compatible.
Step-by-Step Guide to Enable vGPU in ESXi 7.0 and Later:

1. Install the NVIDIA vGPU Manager: Download the latest NVIDIA vGPU Manager for VMware ESXi from the NVIDIA website. Use SSH or the ESXi Shell to access the host and install the vGPU Manager package.

2. Install the NVIDIA vGPU drivers in the virtual machines (VMs): For each VM that uses vGPU, download the NVIDIA GPU driver for the guest operating system (for example, Windows or Linux) from the NVIDIA website and install it inside the VM as you would on a physical machine.

3. Reboot the ESXi host: After installing the NVIDIA vGPU Manager, reboot the ESXi host for the changes to take effect.

4. Check whether the NVIDIA driver is loaded: Run the following command to check whether the NVIDIA kernel module is loaded:

    esxcli system module list | grep nvidia

5. Manually load the NVIDIA driver (if not loaded): If the NVIDIA module is not loaded, load it manually:

    esxcli system module load --module=nvidia

6. Enable hardware virtualization (if not enabled): Log in to the ESXi host through the ESXi Host Client or vSphere Client, and verify that Intel VT-x or AMD-V is enabled in the BIOS/UEFI of the physical server. These options are required for virtualization.

7. Check whether the NVIDIA GPU is detected: Run the following command to check whether ESXi detects the NVIDIA GPU:

    lspci | grep -i nvidia

8. Check system logs for errors: Use the following command to find error messages related to the NVIDIA driver:

    tail -f /var/log/vmkernel.log

9. Check NVIDIA-specific logs: Review the NVIDIA-specific log located at:

    /var/log/nvidia-installer.log

10. Configure vGPU in vSphere: Open the vSphere Client and navigate to the ESXi host. Right-click the VM that uses vGPU and select Edit Settings. On the VM Hardware tab, click Add New Device and select PCI Device, then choose the NVIDIA vGPU to assign to the VM. Select the desired vGPU profile (for example, GRID, vComputeServer, and so on) depending on the available GPU resources and licensing.

11. Assign a vGPU profile: The vGPU profile assigned to the VM determines how much of the physical GPU's resources the VM receives. The available profile options depend on the GPU model.

12. Configure the NVIDIA license: Ensure that the correct NVIDIA vGPU license is installed on the ESXi host. To install or update the vGPU license, use the vGPU licensing utility included in the NVIDIA vGPU package. The license is required for vGPU functionality to work properly, and it can be applied to the ESXi host from the command line.

13. Verify that vGPU is enabled: Log in to the VM and run the following command. It should display the status of the virtual GPU, similar to how it would appear on a physical machine (see also the verification sketches after this list):

    nvidia-smi
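After installing the vGPU Manager and rebooting, the host-side checks from steps 4 through 8 can be run together from an SSH session on the ESXi host. This is a minimal sketch; it assumes the vGPU Manager VIB installed cleanly:

    # Confirm the NVIDIA vGPU Manager VIB is installed on the host
    esxcli software vib list | grep -i nvidia

    # Confirm the NVIDIA kernel module is loaded
    esxcli system module list | grep nvidia

    # Query the physical GPU directly from the host; this works
    # once the vGPU Manager is installed and its module is loaded
    nvidia-smi

If the VIB is listed but the module is absent, load it manually as in step 5 and watch /var/log/vmkernel.log for errors while it loads.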
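On the guest side, the nvidia-smi check in step 13 can be extended to confirm the vGPU license state, since a missing or expired license is one of the causes listed above. A minimal sketch for a Linux guest:

    # Confirm the vGPU is visible and the driver responds
    nvidia-smi

    # Inspect the license status reported by the driver
    nvidia-smi -q | grep -i license

A missing or expired license typically shows as unlicensed in the query output, in which case the licensing configuration from step 12 should be revisited.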