Parallel training and inference on the GPU in PyTorch

I have an RL agent, and I want to use my GPU for training and inference at the same time. However, using CUDA together with torch.multiprocessing doesn't seem to work well, and I get this error when initializing the other processes:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
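For context, this is the pattern the error message points at: with the default `fork` start method the child inherits an already-initialized CUDA context, so any child that touches the GPU must be created with `spawn` instead. Below is a minimal sketch of that setup (the `act` worker, the `nn.Linear` toy model, and the queue are my own illustrative names, not from my real code; it falls back to CPU when no GPU is present):

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def act(model, out_q):
    # Inference worker: if CUDA is used, it is initialized fresh
    # inside this process rather than inherited from the parent.
    with torch.no_grad():
        obs = torch.randn(1, 4, device=next(model.parameters()).device)
        out_q.put(model(obs).cpu())

if __name__ == "__main__":
    # 'spawn' is mandatory when child processes touch CUDA;
    # 'fork' would inherit an already-initialized CUDA context.
    mp.set_start_method("spawn", force=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(4, 2).to(device)
    model.share_memory()  # make parameters visible to child processes

    q = mp.Queue()
    p = mp.Process(target=act, args=(model, q))
    p.start()
    print(q.get().shape)  # torch.Size([1, 2])
    p.join()
```

This runs the inference step in a separate process while the parent stays free to train; it is only a sketch of the start-method fix, not my full agent.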

However, using the "spawn" start method gives me a different problem, because it tries to serialize everything with pickle, which leads to another cryptic error on zero_grad(). I have already detached everything in the replay buffer...

  File "env.py", line 60, in <module>
    main()
  File "env.py", line 55, in main
    process_two.start()
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/home/zero/anaconda3/lib/python3.7/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/home/zero/anaconda3/lib/python3.7/site-packages/torch/multiprocessing/reductions.py", line 136, in reduce_tensor
    raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries.  If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).
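As far as I can tell, this error fires when *any* tensor reachable from the pickled object graph is a non-leaf that still requires grad — so even with a detached replay buffer, a stray reference (e.g. a cached forward output hanging off the agent object passed to the Process) can trigger it. A minimal reproduction of the distinction, with made-up variable names:

```python
import torch

# A leaf tensor that requires grad: fine to serialize.
x = torch.ones(3, requires_grad=True)

# A non-leaf produced by an op: this is what the pickler refuses
# to send across process boundaries.
y = x * 2
assert y.requires_grad and not y.is_leaf

# Detaching yields a grad-free leaf that can safely go on a queue
# or into a replay buffer.
y_safe = y.detach()
assert not y_safe.requires_grad and y_safe.is_leaf
```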

So, my question is: is there any way to set up asynchronous training/inference in parallel on the GPU? The default start method of torch.multiprocessing seems to work fine on my multi-core CPU, but I'd like to use my GPU for faster matmul operations...