How to rename columns in a Spark DataFrame when using explode

My code applies explode to an arrays_zip of columns, as shown below:

 df_map_transformation.select(col("_name") , explode(arrays_zip(col("instances.Instance._name"), col("instances.Instance._id") ))).select(col("_name"), col("col.*")).printSchema()

Output:

root
 |-- _name: string (nullable = true)
 |-- 0: string (nullable = true)
 |-- 1: string (nullable = true)

Selecting the "_name" column works:

df_map_transformation.select(col("_name") , explode(arrays_zip(col("instances.Instance._name"), col("instances.Instance._id") ))).select(col("_name"), col("col.*")).select(col("_name")).show(50,False)

However, trying to access column "0" or "1" does not work and fails with this error:

  File "/usr/local/spark/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o1614.showString.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_696#696

Is there a way to rename columns "0" and "1", or to extract them by selecting them from the DataFrame?

Comments
  • 月牙儿 replied:

    Try casting the col column to struct<cola:string,colb:string>. You can choose your own field names inside the struct; here I used cola and colb.

    See the code below:

    df_map_transformation.select(col("_name") , explode(arrays_zip(col("instances.Instance._name"), col("instances.Instance._id") ))).select(col("_name"), col("col").cast("struct<cola:string,colb:string>")).select(col("_name"),col("col.cola"),col("col.colb")).printSchema()
    
    root
     |-- _name: string (nullable = true)
     |-- cola: string (nullable = true)
     |-- colb: string (nullable = true)
    
    

    You can also use withColumnRenamed:

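    from pyspark.sql.functions import col, explode, arrays_zip

    # Wrap the chain in parentheses so Python accepts the line breaks
    df_renamed = (
        df_map_transformation
        .select(
            col("_name"),
            explode(arrays_zip(col("instances.Instance._name"),
                               col("instances.Instance._id"))),
        )
        .select(col("_name"), col("col.*"))
        .withColumnRenamed("0", "cola")
        .withColumnRenamed("1", "colb")
    )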