Strange fs.readFile results: how does IO work in the Node.js thread pool?

I generated a number of files with the same content, each 150 MB in size. I read them with the async fs.readFile API like this:

const fs = require('fs');
const COUNT = 16;

// kick off all reads at once and time each callback
for (let i = 1; i <= COUNT; ++i) {
    console.time(i);
    console.log(process.hrtime());
    fs.readFile(`a${i}`, (err, data) => {
        console.log(process.hrtime());
        console.timeEnd(i);
    });
}

I set the environment variable UV_THREADPOOL_SIZE to 1 and then changed COUNT to 8, 16, and even 128. But the callbacks all seem to fire at almost the same time. For 128 files, the time is more than 4 s.
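
For reference, the pool size has to be set before libuv creates its threads. Here is a minimal sketch of how I set it (the script name read.js is just a placeholder; passing the variable on the shell command line is the reliable way, and setting process.env at the very top of the script, before the first fs call, also seems to work because the pool is created lazily):

// In the shell:  UV_THREADPOOL_SIZE=1 node read.js
// Or, assuming the pool really is created lazily on the first fs request,
// set it at the very top of the script before touching fs:
process.env.UV_THREADPOOL_SIZE = '1';

const fs = require('fs');
fs.readFile(__filename, () => {
    // by the time this request was queued, the pool had been created with size 1
    console.log('read finished with a single-thread pool');
});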

I also tested with just 1 file, which takes about 60 ms. This screenshot shows the result for 8 files:

[screenshot: console timing output for 8 files]

As I recall, the async fs.readFile API is handled by the thread pool, which is why I set the pool size to 1.

In the Node.js event loop, the poll phase processes IO events and executes their callbacks. I forget how long the poll phase can block the event loop, but I would guess it is not 4 s.

So for the code above, we want to read the files asynchronously. They are all started at the same time and queued to be read. Since the pool size is 1, I would expect the files to be read one by one, right? Once a file has been read, its callback should run in the next poll phase (for 128 files the total time is over 4 s, so I assume there will be more than one poll phase). Then we would see the times in the console.
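
For comparison, a strictly serial baseline can be timed by awaiting each read before starting the next. This is just a sketch using fs.promises with the same a1..aN file names (the serial- labels are only for this test):

const fs = require('fs').promises;
const COUNT = 8;

// Read the files one after another and time each read, to compare
// against the concurrent version above.
(async () => {
    for (let i = 1; i <= COUNT; ++i) {
        const label = `serial-${i}`;
        console.time(label);
        await fs.readFile(`a${i}`);
        console.timeEnd(label);
    }
})();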

But I don't understand the output. The callbacks all seem to fire at almost the same time.

Is there something I've misunderstood about the poll phase of the event loop, or about the thread pool?

Update: I know I can use streams to optimize reading large files. But the question is why, when I set the thread pool size to 1, the async API still appears to run in parallel.

Comments
  • lest:

    Use the util package and try "var util = require('util');"

  • oalias:

    150 megabytes is a lot of data, so transferring it from disk or SSD into RAM takes time. Your disk or SSD probably also has some kind of internal read-request queue; when you issue several nearly simultaneous reads, they go into that queue and are processed one after another.

    Reads of large files are broken up into smaller block reads. It looks like node interleaves those block reads, so multiple readFile operations proceed roughly in parallel.
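
    The interleaving can be sketched with plain fs.open/fs.read calls: each chunk is its own thread-pool request, so chunks from files that were started together end up alternating even with a single thread. This is only an illustration (readFileInChunks and CHUNK are made up here; Node's real fs.readFile uses its own internal chunking), not Node's actual implementation:

    const fs = require('fs');

    const CHUNK = 512 * 1024; // illustrative chunk size, not Node's internal constant

    // Read one file as a series of chunked fs.read requests. Each fs.read is a
    // separate thread-pool job, so the chunks of files started together get
    // interleaved even when UV_THREADPOOL_SIZE is 1.
    function readFileInChunks(path, cb) {
        fs.open(path, 'r', (err, fd) => {
            if (err) return cb(err);
            const parts = [];
            const readNext = () => {
                const buf = Buffer.alloc(CHUNK);
                fs.read(fd, buf, 0, CHUNK, null, (err, bytesRead) => {
                    if (err) return fs.close(fd, () => cb(err));
                    if (bytesRead === 0) {
                        // EOF: hand back everything read so far
                        return fs.close(fd, () => cb(null, Buffer.concat(parts)));
                    }
                    parts.push(buf.slice(0, bytesRead));
                    readNext(); // queue the next chunk as a new thread-pool job
                });
            };
            readNext();
        });
    }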

    In practice it's best to use streams to read files of that size. If you don't need all the data in RAM at once, streams are a good fit: they emit a 'data' event for each chunk and a 'close' event when done. See https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options
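
    A minimal stream version might look like this (file name a1 taken from the question; chunk size left at the default):

    const fs = require('fs');

    // Stream the file in chunks instead of buffering all 150 MB at once.
    const stream = fs.createReadStream('a1');
    let total = 0;

    stream.on('data', (chunk) => {
        total += chunk.length; // handle each chunk as it arrives
    });

    stream.on('close', () => {
        console.log(`finished, ${total} bytes read`);
    });

    stream.on('error', (err) => {
        console.error(err);
    });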