RStudio crashes when running keras

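When I try to run functions from the keras R package, my RStudio session crashes. A window shows the error message "R Session Aborted", but I cannot recover any further information about what caused it.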


I cannot reproduce the error when I run the same set of commands in an R session from the terminal. There, the following script runs through just fine (after activating the conda environment in the terminal with conda activate r-reticulate; the example is taken from Bradley Boehmke's awesome book Hands-On Machine Learning with R):

## installing keras and tensorflow
library(keras)
reticulate::use_condaenv()
install_keras(method = "conda", conda = reticulate::conda_binary())

library(tensorflow)
reticulate::use_condaenv()
install_tensorflow(method = "conda", conda = reticulate::conda_binary())


# Helper packages
library(dplyr)         # for basic data wrangling

# Modeling packages
library(keras)         # for fitting DNNs
library(tfruns)        # for additional grid search & model training functions

# Modeling helper package - not necessary for reproducibility
library(tfestimators)  # provides grid search & model training interface

# Import MNIST training data
mnist <- dslabs::read_mnist()
mnist_x <- mnist$train$images
mnist_y <- mnist$train$labels

# Rename columns and standardize feature values
colnames(mnist_x) <- paste0("V", 1:ncol(mnist_x))
mnist_x <- mnist_x / 255

# One-hot encode response
mnist_y <- to_categorical(mnist_y, 10)

# Specify the model
model <- keras_model_sequential() %>%
  
  # Network architecture
  layer_dense(units = 128, activation = "relu", input_shape = ncol(mnist_x)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax") %>%
  
  # Backpropagation
  compile(
    loss = 'categorical_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy')
  )

# ## Messages printed by TensorFlow at this step (the script still continues):
# 2020-07-31 17:35:21.328581: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /cluster/apps/hdf5/1.8.13/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/python/2.7.14/x86_64/tpl/gdal2.4.1/lib:/cluster/apps/python/2.7.14/x86_64/tpl/geos3.7.2/lib:/cluster/apps/python/2.7.14/x86_64/tpl/proj5.0.0/lib:/cluster/apps/r/3.6.0_openblas/x86_64/lib64/R/lib:/cluster/apps/r/3.6.0_openblas/x86_64/lib64:/cluster/apps/curl/7.49.1/x86_64/lib:/cluster/apps/pcre/8.38/x86_64/lib:/cluster/apps/xz/5.2.2/x86_64/lib:/cluster/apps/zlib/1.2.8/x86_64/lib:/cluster/apps/bzip2/1.0.6/x86_64/lib:/cluster/apps/netcdf/4.3.2/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64/server:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64/xawt:/cluster/apps/qt/5.5.0/x86_64/lib:/cluster/apps/boost/1.55.0/x86_64/serial/gcc_4.8.2/lib64:/cluster/apps/r/3.2.2_openblas/x86_64/lib64/R/lib:/cluster/apps/r/3.2.2_openblas/x86_64/lib64:/cluster/apps/openblas/0.2.13_seq/x86_64/gcc_4.8.2/lib:/cluster/apps/mesa/12.0.6/x86_64/lib:/cluster/apps/nco/4.4.8/x86_64/gcc_4.8.2/lib:/cluster/apps/udunits/2.2.18/x86_64/gcc_4.8.2/lib:/cluster/apps/gsl/1.16/x86_64/gcc_4.8.2/lib64:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/fortran/lib:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/cxx/lib:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/hdf5/1.8.12/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/szip/2.1/x86_64/gcc_4.8.2/lib:/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib:/cluster/apps/gcc/4.8.2/lib64:/cluster/apps/centos/6.7/lib64:/cluster/apps/centos/6.7/usr/lib64::/cluster/apps/r/3.6.0_openblas/x86_64/lib64/R/lib:/lib:/cluster/apps/zlib/1.2.8/x86_64/lib:/cluster/apps/bzip2/1.0.6/x86_64/lib:/cluster/apps/xz/5.2.2/x86_64/lib:/cluster/apps/pcre/8.38/x86_64/lib:/cluster/apps/curl/7.49.1/x86_64/lib:/cluster/apps/hdf5/1.8.13/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/python/2.7.14/x86_64/tpl/gdal2.4.1/lib:/cluster/apps/python/2.7.14/x86_64/tpl/geos3.7.2/lib:/cluster/apps/python/2.7.14/x86_64/tpl/proj5.0.0/lib:/cluster/apps/r/3.6.0_openblas/x86_64/lib64/R/lib:/cluster/apps/r/3.6.0_openblas/x86_64/lib64:/cluster/apps/curl/7.49.1/x86_64/lib:/cluster/apps/pcre/8.38/x86_64/lib:/cluster/apps/xz/5.2.2/x86_64/lib:/cluster/apps/zlib/1.2.8/x86_64/lib:/cluster/apps/bzip2/1.0.6/x86_64/lib:/cluster/apps/netcdf/4.3.2/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64/server:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64:/cluster/apps/java/1.7.0_51/x86_64/jre/lib/amd64/xawt:/cluster/apps/qt/5.5.0/x86_64/lib:/cluster/apps/boost/1.55.0/x86_64/serial/gcc_4.8.2/lib64:/cluster/apps/r/3.2.2_openblas/x86_64/lib64/R/lib:/cluster/apps/r/3.2.2_openblas/x86_64/lib64:/cluster/apps/openblas/0.2.13_seq/x86_64/gcc_4.8.2/lib:/cluster/apps/mesa/12.0.6/x86_64/lib:/cluster/apps/nco/4.4.8/x86_64/gcc_4.8.2/lib:/cluster/apps/udunits/2.2.18/x86_64/gcc_4.8.2/lib:/cluster/apps/gsl/1.16/x86_64/gcc_4.8.2/lib64:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/fortran/lib:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/cxx/lib:/cluster/apps/netcdf/4.3.1/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/hdf5/1.8.12/x86_64/gcc_4.8.2/serial/lib:/cluster/apps/szip/2.1/x86_64/gcc_4.8.2/lib:/cluster/apps/lsf/10.1/linux2.6-glibc2.3-x86_64/lib:/cluster/apps/gcc/4.8.2/lib64:/cluster/apps/centos/6.7/lib64:/cluster/apps/centos/6.7/usr/lib64
# 2020-07-31 17:35:21.328626: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
# 2020-07-31 17:35:21.328653: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (eu-login-22): /proc/driver/nvidia/version does not exist
# 2020-07-31 17:35:21.328909: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
# 2020-07-31 17:35:21.339286: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2900250000 Hz
# 2020-07-31 17:35:21.339480: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xe8825e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
# 2020-07-31 17:35:21.339497: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version

# Train the model
fit1 <- model %>%
  fit(
    x = mnist_x,
    y = mnist_y,
    epochs = 25,
    batch_size = 128,
    validation_split = 0.2,
    verbose = FALSE
  )

# Display output
fit1
## Trained on 48,000 samples, validated on 12,000 samples (batch_size=128, epochs=25)
## Final epoch (plot to see history):
## val_loss: 0.1512
##  val_acc: 0.9773
##     loss: 0.002308
##      acc: 0.9994
plot(fit1)
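Because the "R Session Aborted" dialog gives no details, one way to at least find out which call in the script above takes the RStudio session down is to write a progress line to a file before each step. This is a minimal sketch; log_step() and the default file name keras_debug.log are hypothetical and not part of keras, reticulate or RStudio:

```r
# Hypothetical helper (not part of keras or RStudio): append a timestamped
# line to a log file. cat(..., append = TRUE) opens and closes the file on
# every call, so lines already written stay on disk even if the R session
# is killed by a crash afterwards.
log_step <- function(msg, file = "keras_debug.log") {
  line <- paste(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), msg)
  cat(line, "\n", file = file, append = TRUE, sep = "")
  invisible(line)
}
```

For example, call log_step("before library(keras)"), log_step("before compile") and log_step("before fit") in front of the corresponding lines of the script. After an abort, tail(readLines("keras_debug.log"), 1) in a fresh session shows the last step that was reached.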

My questions are:

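  • What causes RStudio to crash when the R session in the terminal does not?
  • How can I recover the error message from an aborted R(Studio) session?
  • Does RStudio load any additional environment variables, not present in the terminal R session, that could be related to the crash? I had problems before I installed the correct Python environment, and when installing keras I received the following message about a scipy version conflict: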
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts (full output pasted in at the bottom).

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

tensorflow 2.2.0 requires scipy==1.4.1; python_version >= "3", but you'll have scipy 1.5.2 which is incompatible.
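To check the environment-variable question empirically, one option is to save a snapshot of Sys.getenv() in each session and list the variables that differ. An RStudio session may not inherit variables that conda activate r-reticulate sets in a shell, so entries such as PATH, LD_LIBRARY_PATH or RETICULATE_PYTHON are worth looking at first; comparing the output of reticulate::py_config() in both sessions may also help. A minimal sketch; snapshot_env(), diff_env() and the file names are hypothetical:

```r
# Hypothetical helpers, not part of any package.
# Save all environment variables of the current R session as a named list.
snapshot_env <- function(file) {
  env <- as.list(Sys.getenv())
  saveRDS(env, file)
  invisible(env)
}

# Return the names of variables that are missing in one snapshot or whose
# values differ between the two snapshots. For a list, a[["missing"]] is NULL,
# so variables present on only one side are reported as well.
diff_env <- function(a, b) {
  keys <- union(names(a), names(b))
  differs <- vapply(keys, function(k) !identical(a[[k]], b[[k]]), logical(1))
  keys[differs]
}
```

Run snapshot_env("env_rstudio.rds") inside RStudio and snapshot_env("env_terminal.rds") in the terminal session, then diff_env(readRDS("env_rstudio.rds"), readRDS("env_terminal.rds")) lists the candidates.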

Thanks for your help.