My code explodes an arrays_zip of two columns, as shown below:
df_map_transformation.select(col("_name") , explode(arrays_zip(col("instances.Instance._name"), col("instances.Instance._id") ))).select(col("_name"), col("col.*")).printSchema()
Output:
root
|-- _name: string (nullable = true)
|-- 0: string (nullable = true)
|-- 1: string (nullable = true)
When I select only the "_name" column, it works:
df_map_transformation.select(col("_name") , explode(arrays_zip(col("instances.Instance._name"), col("instances.Instance._id") ))).select(col("_name"), col("col.*")).select(col("_name")).show(50,False)
However, trying to access the "0" or "1" column does not work. Error:
File "/usr/local/spark/spark/python/lib/py4j-0.10.8.1-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o1614.showString.
: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_696#696
Is there any way to rename the columns "0" and "1", or to extract them by selecting from the DataFrame?
Try casting the col column to struct<cola:string,colb:string>. You can choose your own field names inside the struct; for example, I have used cola and colb.
Check the code below.
You can also use withColumnRenamed.