I have a file called foo.txt with 1B rows.
I want to apply an operation to each row that produces 10 new rows. The expected output is around 10B rows.
To improve speed and IO throughput, foo.txt is on DiskA and bar.txt is on DiskB (physically different drives).
DiskB will be the limiting factor. Because there are so many rows to write, I added a large buffer for the writes to DiskB.
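My problem is: when I call flush() on the file on DiskB, the file handle's buffer is flushed to the hard drive. It appears to be a non-blocking call, since the command returns, yet I can still see the disk writing with its busy indicator at 100%. After a few seconds, the indicator drops back to 0%. Is there a way in Python to wait for the disk to finish? Ideally, I would like flush() to be a blocking call. The only solution I can see right now is to add an arbitrary sleep() and hope the disk is ready.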
Here's a snippet to illustrate (it's a bit more complicated in practice, as bar.txt is not just one file but thousands of files, so the IO efficiency is very poor):
import io

with open('bar.txt', 'w', buffering=100 * io.DEFAULT_BUFFER_SIZE) as w:
    with open('foo.txt') as r:
        for line in r:
            # writes each line of foo 10 times in bar.
            for i in range(10):
                w.write(line)
            # w.flush()
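
For reference, here is a minimal sketch of what a blocking flush might look like. It assumes os.fsync() is the relevant call: flush() only hands Python's buffer to the OS, while os.fsync() blocks until the OS reports that the data has been written out. Whether the drive's own cache has been emptied by then depends on the OS and the hardware, so I'm not sure this fully answers the question. The names flush_to_disk, expand_lines and sync_every are made up for illustration.

```python
import io
import os


def flush_to_disk(f):
    """Flush Python's buffer, then block until the OS has written the file out."""
    f.flush()             # hand Python's userspace buffer to the OS
    os.fsync(f.fileno())  # block until the OS has pushed its cached data to the device


def expand_lines(src_path, dst_path, copies=10, sync_every=1_000_000):
    """Write each line of src_path `copies` times to dst_path.

    Every `sync_every` input lines (a hypothetical tuning knob), the output
    is flushed and synced so that the call only returns once the OS is done.
    Returns the number of lines written.
    """
    written = 0
    with open(dst_path, 'w', buffering=100 * io.DEFAULT_BUFFER_SIZE) as w:
        with open(src_path) as r:
            for n, line in enumerate(r, start=1):
                for _ in range(copies):
                    w.write(line)
                    written += 1
                if n % sync_every == 0:
                    flush_to_disk(w)
        # Final sync before the file is closed.
        flush_to_disk(w)
    return written
```

Calling expand_lines('foo.txt', 'bar.txt') would reproduce the snippet above, with a blocking sync every million input lines. Syncing too often will probably hurt throughput, so the interval is something to experiment with.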