Spark: function returns a wrong value when comparing with integer values

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

def takeArray(c: Column): Column = {
  when(size(c) === lit(4), c.getItem(2)).
    when(size(c) === lit(2), c.getItem(1))
  when(size(c) === lit(0), lit(0)).otherwise(lit(-100))
}

df.withColumn("new_col", takeArray(col("id")))
  .select(col("id"), size(col("id")), col("new_col")).show()

The function takeArray looks at the length of the array, picks an index based on that length, and returns the value at that index.

+------------+--------+-------+
|          id|size(id)|new_col|
+------------+--------+-------+
|[1, 2, 3, 4]|       4|   -100|
|      [3, 4]|       2|   -100|
|          []|       0|      0|
+------------+--------+-------+
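For reference, the mapping described above can be sketched in plain Scala on an ordinary Seq[Int], without Spark. The object ExpectedValue and its method expectedValue are hypothetical helpers for illustration, not part of the original code. Because indexing is zero-based, as it is for getItem, an array of size 4 should map to its third element (3), an array of size 2 to its second element (4), and an empty array to 0.

```scala
object ExpectedValue {
  // Hypothetical local helper: the same size -> index mapping that takeArray
  // is meant to express, applied to a plain Seq[Int] instead of a Spark Column.
  def expectedValue(xs: Seq[Int]): Int = xs.size match {
    case 4 => xs(2) // zero-based: the third element
    case 2 => xs(1) // zero-based: the second element
    case 0 => 0
    case _ => -100  // any other size falls through to the default
  }

  def main(args: Array[String]): Unit = {
    // Same three inputs as the DataFrame in the question.
    Seq(Seq(1, 2, 3, 4), Seq(3, 4), Seq.empty[Int]).foreach { xs =>
      println(s"$xs -> ${expectedValue(xs)}")
    }
  }
}
```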

Update:

Added the schema:

root
 |-- id: array (nullable = false)
 |    |-- element: integer (containsNull = false)

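I got the output above, which is wrong. In the first row the id column has size 4, so it should match the first when clause and return the element at index 2 (that is, 3, since getItem is zero-based), but it returns -100. Any idea why I am getting this strange result?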

Comments
  • lquia
    lquia replied:

    Please check the code below.

    scala> val df = Seq(Seq(1,2,3,4),Seq(3,4),Seq()).toDF("id")
    df: org.apache.spark.sql.DataFrame = [id: array<int>]
    
    scala> :paste
    // Entering paste mode (ctrl-D to finish)
    
    df
    .withColumn("length",size($"id")) // Just to check Array Length.
    .orderBy($"length".desc)
    .withColumn("new_col",when(size($"id") === 4,$"id".getItem(2))
                          .when(size($"id") === 2,$"id".getItem(1))
                          .when(size($"id") === 0,lit(0)).otherwise(lit(100))
                          ).show(false)
    
    // Exiting paste mode, now interpreting.
    
    +------------+------+-------+
    |id          |length|new_col|
    +------------+------+-------+
    |[1, 2, 3, 4]|4     |3      |
    |[3, 4]      |2     |4      |
    |[]          |0     |0      |
    +------------+------+-------+
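The difference between the two versions appears to be how the clauses are chained. In the answer every clause is attached with .when(...), so the whole CASE WHEN is one expression. In the question's takeArray, the line when(size(c) === lit(2), c.getItem(1)) does not end with a dot and the next line does not start with one, so Scala's semicolon inference ends the statement there. The function body then contains two separate expressions, and a Scala block evaluates to its last expression only: when(size(c) === lit(0), lit(0)).otherwise(lit(-100)). That explains the output in the question, where every non-empty array gets -100 and only the empty array gets 0. Below is a minimal sketch of the fixed function, assuming spark-sql is on the classpath; the object name TakeArrayFixed and the local master("local[*]") session are illustrative choices, not from the original.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, lit, size, when}

object TakeArrayFixed {
  // Every clause is chained with a leading ".when" / ".otherwise", so the
  // compiler cannot end the statement early and the whole chain is the
  // value returned by the function.
  def takeArray(c: Column): Column =
    when(size(c) === 4, c.getItem(2))
      .when(size(c) === 2, c.getItem(1))
      .when(size(c) === 0, lit(0))
      .otherwise(lit(-100))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session just for trying the function out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("takeArray")
      .getOrCreate()
    import spark.implicits._

    // Same input data as in the comment above.
    val df = Seq(Seq(1, 2, 3, 4), Seq(3, 4), Seq.empty[Int]).toDF("id")
    df.withColumn("new_col", takeArray(col("id")))
      .select(col("id"), size(col("id")), col("new_col"))
      .show()

    spark.stop()
  }
}
```

An alternative that keeps the original line layout is to end the second line with a trailing dot as well, like the first one; the leading-dot style is used here because a line starting with "." can never be mistaken for the start of a new statement.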