ML Workloads on WSL2 with RTX 4060¶
PyTorch¶
Verify:
python3 -c "
import torch
print('CUDA available:', torch.cuda.is_available())
print('Device:', torch.cuda.get_device_name(0))
print('VRAM:', torch.cuda.get_device_properties(0).total_memory / 1024**3, 'GB')
"
# Expected: CUDA available: True
# Device: NVIDIA GeForce RTX 4060
# VRAM: ~16.0 GB
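Once the check above passes, a minimal smoke test (a sketch, not part of any install step) confirms that kernels actually execute on the GPU; it falls back to CPU when CUDA is unavailable:

```python
import torch

# Pick the GPU if the CUDA runtime is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A small matmul forces a real kernel launch on the chosen device.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)
c = a @ b

print(device.type, tuple(c.shape))
```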
TensorFlow¶
Verify:
python3 -c "
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
"
# Expected: [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
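By default TensorFlow reserves nearly all VRAM at process start. A common configuration fragment (standard `tf.config` API, shown here as a sketch) switches it to incremental allocation, which plays better with a 16 GB card shared with the Windows desktop:

```python
import tensorflow as tf

# Enable memory growth so TF allocates VRAM on demand instead of
# grabbing the whole 16 GB up front. Must run before any GPU op.
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```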
Docker + NVIDIA Container Toolkit¶
Install Docker Engine¶
curl -fsSL https://get.docker.com | sh
sudo usermod -aG docker $USER
# Log out and back in for group membership
Install NVIDIA Container Toolkit¶
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \
| sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \
| sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \
| sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
Configure Docker Runtime¶
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Verify¶
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
# Expected: the usual nvidia-smi table listing the RTX 4060
Limitation¶
Only --gpus all is supported in WSL2; you cannot select specific GPUs by
index (e.g., --gpus '"device=0"'). This is irrelevant for a single-GPU
system like the RTX 4060.
RTX 4060 16GB -- ML Capacity¶
| Workload | Fits in 16GB VRAM? | Notes |
|---|---|---|
| 7B models (LLaMA 2, Mistral) at FP16 | Yes | ~14 GB, comfortable |
| 13B models at 8-bit (GPTQ/AWQ) | Yes | ~13 GB quantized |
| 13B models at 4-bit | Yes | ~7 GB, room to spare |
| LoRA/QLoRA fine-tuning of 7B | Yes | Practical for research |
| Full fine-tuning of 7B | Tight | May need gradient checkpointing |
| Training from scratch >7B | No | Need multi-GPU or larger card |
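The weight-memory figures in the table follow from a simple rule of thumb: parameter count times bytes per parameter. This sketch covers weights only; real usage adds KV cache, activations, and framework overhead, which is why 7B at FP16 is listed as "comfortable" rather than leaving 2 GB truly free:

```python
def weight_vram_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate VRAM for model weights alone, in GB (decimal).

    1e9 params * (bits / 8) bytes per param / 1e9 bytes per GB
    simplifies to params_billions * bits / 8.
    """
    return params_billions * bits_per_param / 8

print(weight_vram_gb(7, 16))   # 7B at FP16   -> 14.0 GB
print(weight_vram_gb(13, 8))   # 13B at 8-bit -> 13.0 GB
print(weight_vram_gb(13, 4))   # 13B at 4-bit -> 6.5 GB
```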
Performance Notes¶
- Near-native for compute-bound workloads (matrix ops, training loops).
- ~5-33% overhead vs bare-metal Linux, depending on workload type.
- I/O-bound workloads (data loading) see more overhead due to WSL2 filesystem.
- Store training data on the Linux filesystem (/home/...), not on /mnt/c/ -- the Windows filesystem mount is significantly slower.
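To see the gap on your own machine, a quick throughput check can time sequential writes under both filesystems. This is a sketch; the two paths in the commented comparison are examples and depend on your setup:

```python
import os
import tempfile
import time

def write_throughput_mb_s(directory: str, size_mb: int = 64) -> float:
    """Write size_mb of data to a temp file in `directory`, return MB/s."""
    data = os.urandom(1024 * 1024)  # 1 MiB buffer
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(data)
        f.flush()
        os.fsync(f.fileno())  # include the actual flush to disk
        elapsed = time.perf_counter() - start
    return size_mb / elapsed

# Example comparison (paths are illustrative):
# print(write_throughput_mb_s(os.path.expanduser("~")))  # Linux fs
# print(write_throughput_mb_s("/mnt/c/Temp"))            # Windows mount
```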