I have a large data file whose values are separated by the ASCII character æ (hex E6). My code for parsing the file is below, but the parser does not seem to split the values correctly (I'm using Spark 2.4.1):
import org.apache.spark.sql.{DataFrame, DataFrameReader}
import org.apache.spark.sql.types.StructType

implicit class DataFrameReadImplicits(dataFrameReader: DataFrameReader) {
  def readTeradataCSV(schema: StructType, path: String): DataFrame = {
    dataFrameReader.option("delimiter", "\u00E6")
      .option("header", "false")
      .option("inferSchema", "false")
      .option("multiLine", "true")
      .option("encoding", "UTF-8")
      .schema(schema)
      .csv(path)
  }
}
Any hints on how to fix this?
Based on your sample data from the screenshot, the delimiter is being treated as more than one character, i.e. "æ", which is why the parse fails with:

java.lang.IllegalArgumentException: Delimiter cannot be more than one character: "æ"

Spark's CSV source does not allow a multi-character delimiter to be specified, so passing it as option("delimiter", """"\u00E6"""") will not work either.
Please check the two-step process in the code below to parse the data.
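The answer's original code is not included above, but a minimal sketch of such a two-step read (step 1: read each line as plain text, step 2: split on the æ character and project onto the schema) could look like the following. The readAeDelimited name and the per-field cast projection are illustrative assumptions, not the answer's exact code:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, split}
import org.apache.spark.sql.types.StructType

def readAeDelimited(spark: SparkSession, schema: StructType, path: String): DataFrame = {
  // Step 1: read each line into a single string column ("value");
  // the single-character delimiter restriction of the CSV source does not apply here.
  val raw = spark.read.text(path)

  // Step 2: split every line on the æ character (U+00E6). split() takes a regex,
  // but æ has no special meaning in a regex, so the literal character is safe.
  val withParts = raw.withColumn("parts", split(col("value"), "\u00E6"))

  // Project each token onto the matching schema field, casting to its declared type.
  val columns = schema.fields.zipWithIndex.map { case (field, i) =>
    col("parts").getItem(i).cast(field.dataType).as(field.name)
  }
  withParts.select(columns: _*)
}

Called as readAeDelimited(spark, schema, path), this should return the same columns the CSV reader was meant to produce, with each field cast to the type declared in the schema. Note that the text source reads the file as UTF-8; if the delimiter is stored as the single Latin-1 byte 0xE6 rather than UTF-8, the file may need to be re-encoded first.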