I am taking part in a Kaggle competition (M5 Forecasting - Accuracy). The code below confuses me, because pandas.groupby returns IDs that are not in the DataFrame.
X_train.sort_values('date').head()
df_temp = X_train.loc[X_train['id']=='FOODS_3_752_WI_2', :]
df_temp.groupby('id').apply(lambda x: x.index)
id
FOODS_1_001_CA_1 Int64Index([], dtype='int64')
FOODS_1_001_CA_2 Int64Index([], dtype='int64')
FOODS_1_001_CA_3 Int64Index([], dtype='int64')
FOODS_1_001_CA_4 Int64Index([], dtype='int64')
FOODS_1_001_TX_1 Int64Index([], dtype='int64')
...
HOUSEHOLD_2_516_TX_2 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_TX_3 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_1 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_2 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_3 Int64Index([], dtype='int64')
Length: 30490, dtype: object
But
df1 = pd.DataFrame({'a': ['asdf','zxcv'], 'b': [1,2]})
df1
a b
0 asdf 1
1 zxcv 2
df2 = df1.loc[df1['a']=='asdf', :]
np.array(df2.groupby('a').apply(lambda x: x.index))
gives
array([Int64Index([0], dtype='int64')], dtype=object)
So why does
df_temp.groupby('id').apply(lambda x: x.index)
return every ID, even though df_temp contains only one?
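Note: X_train is itself a slice of another DataFrame, df_train_val_test. But even when I filter that original DataFrame directly, I get the same result: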
df_temp2 = df_train_val_test.loc[df_train_val_test['id']=='FOODS_3_752_WI_2', :]
df_temp2.groupby('id').apply(lambda x: x.index)
id
FOODS_1_001_CA_1 Int64Index([], dtype='int64')
FOODS_1_001_CA_2 Int64Index([], dtype='int64')
FOODS_1_001_CA_3 Int64Index([], dtype='int64')
FOODS_1_001_CA_4 Int64Index([], dtype='int64')
FOODS_1_001_TX_1 Int64Index([], dtype='int64')
...
HOUSEHOLD_2_516_TX_2 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_TX_3 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_1 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_2 Int64Index([], dtype='int64')
HOUSEHOLD_2_516_WI_3 Int64Index([], dtype='int64')
Length: 30490, dtype: object
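A minimal reproduction that may be relevant here, assuming the 'id' column has a categorical dtype (the output above does not confirm this, and the toy frame and its column names are hypothetical). With a categorical key, a filtered frame still remembers every category, and groupby with observed=False produces one group per category, including empty ones. In df1 the 'a' column is a plain object column, which would explain why only one group comes back there.

```python
import pandas as pd

# Hypothetical toy frame; 'id' is deliberately categorical (an assumption
# about the M5 data, not something shown in the question).
df = pd.DataFrame({
    "id": pd.Categorical(["A", "B", "C"]),
    "sales": [1, 2, 3],
})

# Filtering keeps the full category list on the column.
sub = df.loc[df["id"] == "A", :]

# observed=False: one group per category, even those with no rows.
all_groups = sub.groupby("id", observed=False).size()

# observed=True: only categories that actually occur in the data.
seen_groups = sub.groupby("id", observed=True).size()

# Alternative: drop the unused categories from the filtered column.
trimmed = sub["id"].cat.remove_unused_categories()

print(all_groups)
print(seen_groups)
print(list(trimmed.cat.categories))
```

If df_temp['id'].dtype prints as category, passing observed=True to groupby (or calling cat.remove_unused_categories() on the filtered column) should limit the result to the one ID that is present.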