Django filesystem / file-based cache failing to write data 5-10% of the time

We are using Django Celery for background data processing: we take a CSV file (up to 15 MB), convert it into a list of dicts of data (which also includes some Django objects), break it up into chunks, and process each chunk as a separate task:

from celery import task
from django.core.cache import cache

FIVE_HOURS = 5 * 60 * 60

@task
def main_task(data):
    for i, chunk in enumerate(chunk_up(data)):
        chunk_id = "chunk_id_{}".format(i)
        # Cache the chunk so the sub-task can retrieve it by id.
        cache.set(chunk_id, chunk, timeout=FIVE_HOURS)
        sub_task.delay(chunk_id)

@task
def sub_task(chunk_id):
    # Look up the chunk written by main_task.
    data_chunk = cache.get(chunk_id)
    ...  # do processing
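
chunk_up just splits the list into fixed-size slices, roughly like this (the chunk size here is simplified for illustration):

def chunk_up(data, chunk_size=1000):
    """Yield successive fixed-size slices of the data list."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]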

main_task runs in concurrent processes in the background, managed by Celery. We originally used the Redis backend, but found it would routinely run out of memory during peak load and high concurrency. So we switched to Django's file-based cache backend. Although that fixed the memory issue, we saw that 20-30% of the cache entries never got written. No error is thrown, just silent failure. When we go back and inspect the cache from the CLI, we see that, for example, chunk_id_7 and chunk_id_9 exist but chunk_id_8 does not. So, intermittently, some cache entries are failing to be saved.
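
For context, the file-based cache is configured along these lines (the location is a placeholder, not our real path):

# settings.py -- illustrative values only
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",  # hypothetical directory
        "TIMEOUT": 5 * 60 * 60,
    },
}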

We swapped in the diskcache backend and are observing the same thing, though the failure rate seems to drop to roughly 5-10% (a very rough estimate).
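
The diskcache swap was just a settings change along these lines (again, the directory is a placeholder):

# settings.py -- sketch of the diskcache swap, illustrative values only
CACHES = {
    "default": {
        "BACKEND": "diskcache.DjangoCache",
        "LOCATION": "/var/tmp/diskcache",  # hypothetical directory
        "TIMEOUT": 5 * 60 * 60,
    },
}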

We noticed that in the past there were concurrent-process issues with Django's file-based cache, but they seem to have been fixed many years ago (we are on v1.11). One comment suggests this cache backend is more of a proof of concept, though again we're not sure whether that has changed since.

Is file-based caching a production-quality caching solution? If so, what is causing our writes to fail? If not, what is a better solution for our use case?
