We are using Django with Celery for background data processing: we take a CSV file (up to 15MB), convert it into a list of dicts (which also includes some Django objects), and break it into chunks that are processed as separate tasks:

from django.core.cache import cache

def main_task(data):
    for i, chunk in enumerate(chunk_up(data)):
        chunk_id = "chunk_id_{}".format(i)
        cache.set(chunk_id, chunk, timeout=FIVE_HOURS)
        sub_task.delay(chunk_id)  # dispatch each chunk as its own Celery task

def sub_task(chunk_id):
    data_chunk = cache.get(chunk_id)
    ... # do processing
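(`chunk_up` is our own helper, omitted above. A minimal sketch of what it does, assuming a fixed chunk size, would be something like:)

```python
CHUNK_SIZE = 1000  # hypothetical value; the real size depends on the workload

def chunk_up(data):
    """Yield successive CHUNK_SIZE-sized slices of a list."""
    for start in range(0, len(data), CHUNK_SIZE):
        yield data[start:start + CHUNK_SIZE]
```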

main_task runs in concurrent processes in the background, managed by Celery. We originally used the Redis backend, but found it would routinely run out of memory during peak load and high concurrency, so we switched to Django's file-based cache backend. That fixed the memory issue, but then we saw that 20-30% of the cache entries never got written. No error was thrown, just silent failure. When we inspect the cache afterwards from the CLI, we see that, e.g., chunk_id_7 and chunk_id_9 exist but chunk_id_8 does not. So intermittently, some cache entries are failing to get saved.
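To at least make the failure non-silent, we have considered wrapping the write in a read-back check. A rough sketch (the helper name, the retry count, and the presence-based check are our own invention; it assumes a Django-style cache object with `set`/`get`):

```python
def set_and_verify(cache, key, value, timeout, retries=3):
    """Write to the cache, then read back to confirm the entry landed.

    Works with any object exposing Django-style set(key, value, timeout)
    and get(key). We only check presence (our chunks are never None),
    since comparing large chunk values on every write would be wasteful.
    Returns True once the value is readable, False after all retries fail.
    """
    for _ in range(retries):
        cache.set(key, value, timeout)
        if cache.get(key) is not None:
            return True
    return False
```

This doesn't explain the root cause, but it would turn the silent drops into something we can log and retry.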

We swapped in the diskcache backend and are observing the same thing, though cache failures seem to be reduced to 5-10% (a very rough estimate).

We noticed that in the past there were concurrent-process issues with Django's file-based cache, but they seem to have been fixed many years ago (we are on v1.11). One comment says this cache backend is more of a proof of concept, though again we're not sure whether that has changed since then.