Select an array of structs in Spark

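This question is part of a more complex problem I am working on. To keep the problem statement minimal, assume I have a DataFrame created from JSON with the simplified structure below.

printSchema() gives: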

root
 |-- person: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- email: string (nullable = true)
 |    |

The raw data looks something like this:

{"person":[{"name":"david", "email":"david@gmail.com"}, {"name":"steve", "email":"steve@gmail.com"}]}

You can save it as person.json and create the dataset with:

Dataset<Row> df = spark.read().json("person.json");

df.show(false);

+------------------------------------------------------------+
|       person                                               |
+------------------------------------------------------------+
|[[david, david@gmail.com],[steve, steve@gmail.com]]         |
+------------------------------------------------------------+

Now the problem. As part of my code, I have to do:

// assumes: import static org.apache.spark.sql.functions.*;
df.select(array(struct(col("person.name"), reverse(col("person.email")))));

This gives output like:

+------------------------------------------------------------+
|       array(named_struct(person.name as `name`, person.e...|
+------------------------------------------------------------+
|[[[david, steve],[david@gmail.com, steve@gmail.com]]]       |
+------------------------------------------------------------+

The schema changes to:

root
 |-- array(named_struct(name, person.name as `name`, email, person.email as `email`)): array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)
 |    |    |-- email: array (nullable = true)
 |    |    |    |-- element: string (containsNull = true)

I do not want the schema or the shape of the data to change. What should I change in df.select?
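To make the difference concrete, here is a minimal plain-Java sketch (no Spark involved) of the two shapes. It assumes that selecting person.name on an array column pulls that field out of every element, which is what the output above suggests. The class and method names (ArrayOfStructsSketch, Person, extractFields, mapElements) are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayOfStructsSketch {
    // Hypothetical stand-in for one element of the `person` array.
    static final class Person {
        final String name;
        final String email;
        Person(String name, String email) {
            this.name = name;
            this.email = email;
        }
        @Override
        public String toString() {
            return "[" + name + ", " + email + "]";
        }
    }

    // Field-wise extraction: yields one list per field (a "struct of arrays"),
    // which matches the unwanted result shown in the question.
    static List<List<String>> extractFields(List<Person> people) {
        List<String> names = new ArrayList<>();
        List<String> emails = new ArrayList<>();
        for (Person p : people) {
            names.add(p.name);
            emails.add(p.email);
        }
        return Arrays.asList(names, emails);
    }

    // Element-wise mapping: one output struct per input struct, so the
    // "array of structs" shape is kept. The email string is reversed here
    // only to mirror the reverse(...) call in the question.
    static List<Person> mapElements(List<Person> people) {
        List<Person> out = new ArrayList<>();
        for (Person p : people) {
            String reversed = new StringBuilder(p.email).reverse().toString();
            out.add(new Person(p.name, reversed));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
            new Person("david", "david@gmail.com"),
            new Person("steve", "steve@gmail.com"));
        System.out.println(extractFields(people)); // struct-of-arrays shape
        System.out.println(mapElements(people));   // array-of-structs shape
    }
}
```

What I seem to need in Spark is the second, element-wise form. On Spark 2.4 or later this can probably be expressed with the SQL higher-order function transform, for example df.select(expr("transform(person, p -> named_struct('name', p.name, 'email', reverse(p.email)))").as("person")). This has not been run against the data above, and the nullability flags in the resulting schema may differ from the original.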