如何使用两个单独的数据帧在熊猫中执行SumProduct()

 收藏

Objective: Create an output with a comparable SUMPRODUCT method within pandas

说明:我需要使用两个数据帧(df和df_2_copy)。我试图在将每张CD乘以各自在df中的价格(2000,3000,5000)后再添加1个月CD,3个月CD,6个月CD。

import pandas as pd

data = [['1-mo CDs', 1.0, 1,2000, '1, 2, 3, 4, 5, and 6'],
        ['3-mo CDs', 4.0 ,3 ,3000,'1 and 4'],
        ['6-mo CDs',9.0 ,6, 5000,'1']]
df = pd.DataFrame(data,columns=['Scenario','Yield', 'Term','Price', 'Purchase CDs in months'])
df

data_2 = [['Init Cash', 400000, 325000,335000,355000,275000,225000,240000],
          ['Matur CDs',0,0,0,0,0,0,0],
          ['Interest',0,0,0,0,0,0,0],
          ['1-mo CDs',0,0,0,0,0,0,0],
          ['3-mo CDs',0,0,0,0,0,0,0],
          ['6-mo CDs',0,0,0,0,0,0,0],
          ['Cash Uses',75000,-10000,-20000,80000,50000,-15000,60000],
          ['End Cash', 0,0,0,0,0,0,0]]

# set table
df_2 = pd.DataFrame(data_2,columns=['Month', 'Month 1', 'Month 2', 'Month 3', 'Month 4', 'Month 5', 'Month 6', 'End'])
df_2_copy = df_2.copy()

Ultimately, I would like to place the output of the SUMPRODUCT at the df_2_copy.iloc[7] location.

任何帮助,将不胜感激。

回复
  • et_id 回复

    您可以通过以下方式进行操作:

    Generate df3 - values from df_2 for particular months with Month column changed to index, for rows which have coresponding rows in df:

    df3 = df_2.drop(columns='End').set_index('Month')\
        .query('index in @df.Scenario')
    

    对于我的测试数据,当月份n的值更改时,它是:

              Month 1  Month 2  Month 3  Month 4  Month 5  Month 6
    Month                                                         
    1-mo CDs        1        2        0        2        2        0
    3-mo CDs        1        0        3        0        4        0
    6-mo CDs        1        1        0        2        0        0
    

    Then generate df4 - df with Scenario changed to index, limited to Price column, but still as a DataFrame:

    df4 = df.set_index('Scenario').Price.to_frame()
    

    结果是:

              Price
    Scenario       
    1-mo CDs   2000
    3-mo CDs   3000
    6-mo CDs   5000
    

    然后计算总和:

    sums = (df3.values * df4.values).sum(axis=0)
    

    结果是:

    [10000  9000  9000 14000 16000     0]
    

    最后一步是将这些数字写入目标位置:

    df_2.iloc[7, 1:7] = sums