I have a cuDF dataframe with the following columns:
columns = ["col1", "col2", "dt"]
The dt column is of type datetime64[ns].
I would like to write a UDF to apply to each group in this dataframe and get the max of dt for each group. Here is what I am trying, but it seems that numba doesn't support datetime64[ns] values in UDFs.
import numpy
from numba import cuda

def f1(dt, out):
    l = len(dt)
    maxvalue = dt[0]
    # Each thread strides over the rows of its group
    for i in range(cuda.threadIdx.x, l, cuda.blockDim.x):
        if dt[i] > maxvalue:
            maxvalue = dt[i]
    out[:0] = maxvalue

gdf = df.groupby(["col1", "col2"], method="cudf")
df = gdf.apply_grouped(f1, incols={"dt": "dt"}, outcols=dict(out=numpy.datetime64))
Here is the error I get:
This error is usually caused by passing an argument of a type that is unsupported by the named function.
[1] During: resolving callee type: Function(<numba.cuda.compiler.DeviceFunctionTemplate object at 0x7effda063510>)
[2] During: typing of call at <string> (10)
I have similar functions that work fine with integers and floats. Does this mean numba doesn't support datetimes?
apply_grouped won't give you what I think you're after, which is a groupby with the max of dt. You need to use agg with max on dt; cuDF's groupby functions will do the rest. To get your values in datetime64[ms], use astype() and save the result back to the dataframe (very fast). See my example:
The dt column values would be formatted between 1-4 seconds from Jan 1st, 1970, giving you a print out of
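
A minimal sketch of the approach described in this answer (groupby, agg with max on dt, then astype()), not the original example. The sample values, the name xdf, and the pandas fallback are assumptions added so the sketch can also run on a machine without a GPU; with cuDF installed the same calls are used on the GPU.

```python
# Sketch: per-group max of a datetime column with groupby().agg(), no UDF.
try:
    import cudf as xdf  # the GPU dataframe library from the question
except ImportError:
    import pandas as xdf  # assumption: CPU fallback with the same calls

# Hypothetical sample data: dt holds 1-4 seconds after Jan 1st, 1970.
df = xdf.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "y"],
    "dt": [1, 2, 3, 4],
})
# Interpret the integers as seconds since the epoch.
df["dt"] = xdf.to_datetime(df["dt"], unit="s")

# Built-in aggregation: max of dt for each (col1, col2) group.
result = df.groupby(["col1", "col2"]).agg({"dt": "max"}).reset_index()
# Sort so the row order is the same on both backends.
result = result.sort_values("col1").reset_index(drop=True)

# Convert to millisecond resolution and save it back to the dataframe.
result["dt"] = result["dt"].astype("datetime64[ms]")
print(result)
```

Because max is a built-in aggregation, nothing is compiled through numba here, so the typing error from the UDF does not come up.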