[spark-rapids] Update spark rapids version to 24.06.0 #1187

Merged
merged 2 commits into from
Jun 27, 2024

@SurajAralihalli SurajAralihalli commented Jun 7, 2024

This PR:

  • Fixes the Rocky 8 and 9 driver install steps
  • Updates the RAPIDS Accelerator to 24.06.0

signed-off-by: Suraj Aralihalli suraj.ara16@gmail.com

Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>
Signed-off-by: Suraj Aralihalli <suraj.ara16@gmail.com>

cjac commented Jun 27, 2024

Have you tested this on Ubuntu or Debian?

cjac commented Jun 27, 2024

On 2.0-debian10 I receive this error when I pass cuda-version="12.4":

Passing it as 12.4.1 or leaving it unset works, though, so this is not a blocker. If symlinks could be put in place on the NVIDIA filesystem, the installer would be more robust.

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.0
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 10 (buster)
Release:        10
Codename:       buster
Thu Jun 27 18:30:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
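
Besides symlinks, one way to tolerate a short value such as cuda-version="12.4" would be to normalize it inside the init script before it is used. The sketch below is only an illustration and is not taken from this PR: `normalize_cuda_version` is a hypothetical helper, it assumes the failure comes from looking things up by the full version string, and the only mapping it contains (12.4 to 12.4.1) is the one reported to work in this comment.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the init script in this PR): expand a
# major.minor CUDA version to the full major.minor.patch form.
normalize_cuda_version() {
  local requested="$1"
  case "${requested}" in
    # Already a full version such as 12.4.1: pass it through unchanged.
    [0-9]*.[0-9]*.[0-9]*) printf '%s\n' "${requested}" ;;
    # 12.4 -> 12.4.1 is the only mapping backed by this thread.
    12.4) printf '%s\n' "12.4.1" ;;
    # Reject anything else instead of guessing a patch level.
    *) echo "unsupported cuda-version: ${requested}" >&2; return 1 ;;
  esac
}

normalize_cuda_version "12.4"    # prints 12.4.1
normalize_cuda_version "12.4.1"  # prints 12.4.1
```

Rejecting unknown short versions keeps the failure explicit; any further mappings would need to be tested per image before being added.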

cjac commented Jun 27, 2024

2.1-debian11 also works:

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.1
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye
Thu Jun 27 18:41:26 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   64C    P0             34W /   72W |       0MiB /  23034MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

cjac commented Jun 27, 2024

2.2-debian12 also LGTM.

cjac@cluster-1718310842-w-0:~$ echo $DATAPROC_IMAGE_VERSION ; mokutil --sb-state ; lsb_release -a ; nvidia-smi
2.2
SecureBoot disabled
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 12 (bookworm)
Release:        12
Codename:       bookworm
Thu Jun 27 18:49:48 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L4                      Off |   00000000:00:03.0 Off |                    0 |
| N/A   65C    P0             31W /   72W |       0MiB /  23034MiB |      4%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+
@cjac cjac self-requested a review June 27, 2024 18:50

@cjac cjac left a comment

These look good to me. I've also tested on Debian.

@cjac cjac merged commit f28f0c9 into GoogleCloudDataproc:master Jun 27, 2024
1 of 2 checks passed

SurajAralihalli commented Jun 27, 2024

Thank you! In our docs, we generally recommend that users not set cuda-version or driver-version in the metadata. Different CUDA versions may require different installation steps, and supporting every version in the init script may not be feasible. However, I've left this feature available for advanced users who can test and run it themselves.
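
The "leave it unset and take the default" behaviour described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `get_metadata_attribute` and the `METADATA_CMD` override are names chosen here for the example, `/usr/share/google/get_metadata_value` is assumed to be the metadata helper available on the node, and the 12.4.1 default is simply the value reported to work earlier in this thread.

```shell
#!/usr/bin/env bash
# Assumption: the metadata helper lives at this path on the cluster node.
# METADATA_CMD is a hypothetical override so the sketch can run elsewhere.
METADATA_CMD="${METADATA_CMD:-/usr/share/google/get_metadata_value}"

# Hypothetical helper: print the metadata attribute if it is set,
# otherwise print the supplied default.
get_metadata_attribute() {
  local name="$1"
  local default_value="${2:-}"
  local value
  # A failed lookup or an empty result both mean "attribute not set".
  if value="$("${METADATA_CMD}" "attributes/${name}" 2>/dev/null)" && [ -n "${value}" ]; then
    printf '%s\n' "${value}"
  else
    printf '%s\n' "${default_value}"
  fi
}

# Users who leave cuda-version unset get the default chosen by the script.
CUDA_VERSION="$(get_metadata_attribute 'cuda-version' '12.4.1')"
echo "Using CUDA ${CUDA_VERSION}"
```

Keeping the default in one place means only the tested combination runs unless someone deliberately overrides it through metadata.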
