linux - How to format the date field of a .CSV file that has multiple commas in its string fields

Question

I have a .CSV file (file.csv) in which every data value is enclosed in double quotes. A sample of the file looks like this:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","13-OCT-11","232"

The 9th field is a date field in "DD-MMM-YY" format. I have to convert it to YYYY/MM/DD format. I am trying the following code, but it does not work.

awk -F, '
 BEGIN {
 split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
 for (i=1; i<=12; i++) mdigit[month[i]]=i
 }
 { m=substr($9,4,3)
 $9 = sprintf("%02d/%02d/"20"%02d",mdigit[m],substr($9,1,2),substr($9,8,20))
 print
 }' OFS="," file.csv > temp_file.csv

After running the code above, the output in temp_file.csv looks like this:

column1,column2,column3,column4,column5,column6,column7,Column8,00/00/2000,Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1,00/00/2000,"890","88","11-OCT-11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455",00/00/2002, name","12","455","12-OCT-11","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3,00/00/2000,"333","22","13-OCT-11","232"

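As far as I understand, the problem is the commas inside the double-quoted strings, because my code counts them as field separators too. Please advise on the following:

1) Does it make any difference that all values in all fields are double-quoted? If it does, how can I remove the quotes from all values (except the strings that contain commas)?
2) How can I modify my code so that the 9th field, which is in "DD-MMM-YY" format, is converted to YYYY/MM/DD format?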

Best answer

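You could try the following awk script. It addresses the date as $(NF-1), counting from the end of the line, so the commas inside the earlier quoted fields do not shift its position: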

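awk '
BEGIN {
    FS = OFS = ","
    # Map month abbreviations to numbers: mm["JAN"] = 1 ... mm["DEC"] = 12
    split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
    for (i = 1; i <= 12; i++) {
        mm[month[i]] = i
    }
}
NR > 1 {
    # Strip the quotes from the date field, then split it into day, month, year
    gsub(/"/, "", $(NF-1))
    split($(NF-1), d, "-")
    # Rebuild as "YYYY/MM/DD", zero-padding the month (q holds a double quote)
    $(NF-1) = q sprintf("20%s/%02d/%s", d[3], mm[d[2]], d[1]) q
}
1' q='"' file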

Output:

column1,column2,column3,column4,column5,column6, column7, Column8, Column9, Column10
"12","B000QRIGJ4","4432","string with quotes, and with a comma, and colon: in between","4432","author1, name","890","88","2011/10/11","12"
"4432","B000QRIGJ4","890","another, string with quotes, and with more than, two commas: in between","455","author2, name","12","455","2011/10/12","55"
"11","B000QRIGJ4","77","string with, commas and (paranthesis) and : colans, in between","12","author3, name","333","22","2011/10/13","232"
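If you would rather refer to the date as field 9, as in the question, one possible variation is to use the three-character sequence "," as the field separator instead of a bare comma. This is only a sketch, and it rests on two assumptions: every data field is double-quoted, and no field contains the sequence "," itself. The function name convert_dates is made up for illustration.

```shell
#!/bin/sh
# Sketch: split each data row on the sequence "," (quote, comma, quote)
# so that commas inside quoted strings are not treated as separators.
# Assumes every data field is quoted and none contains "," itself.
convert_dates() {
    awk '
    BEGIN {
        FS = OFS = "\",\""
        # Map month abbreviations to numbers: mm["JAN"] = 1 ... mm["DEC"] = 12
        n = split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
        for (i = 1; i <= n; i++) mm[month[i]] = i
    }
    NR > 1 {
        # With this separator, $9 holds the bare date, e.g. 11-OCT-11
        split($9, d, "-")
        $9 = sprintf("20%s/%02d/%s", d[3], mm[d[2]], d[1])
    }
    { print }   # the header row (NR == 1) is printed unchanged
    ' "$1"
}
```

It could be called as convert_dates file.csv > temp_file.csv. The first field keeps its leading quote and the last field keeps its trailing quote, so the rebuilt rows are still fully quoted.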