How to parse substrings of columns and rows using AWK

C1      C2      C3      C1%     C2%     C3%                                                         
1       1       220     0.00    0.00    99.77                                                         
0       0       69      0.00    0.00    99.78                                                         
1       0       48      0.01    0.00    99.80                                                         
0       0       50      0.00    0.00    99.80                                                         
0       1       53      0.00    0.01    99.71                                                         
C1      C2      C3      C1%     C2%     C3%                                                           
1       2       229     0.00    2.32    97.46                                                         
1       0       71      0.00    0.00    99.77                                                         
0       2       52      0.00    9.29    90.52                                                         
0       0       50      0.00    0.00    99.81                                                         
0       0       56      0.00    0.00    99.74                                                         
C1      C2      C3      C1%     C2%     C3%                                                           
0       0       237     0.00    0.00    99.77                                                         
0       0       75      0.00    0.00    99.75                                                                                                                                                            
0       0       58      0.00    0.00    99.79                                                                                                                                                            
0       0       51      0.00    0.00    99.80                                                                                                                                                          
0       0       53      0.00    0.00    99.73  

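I want to extract the columns whose header contains a "%". The number of such columns varies. From those columns, I only want the values on the first line after each header line ("C1% C2% C3% ..."), not every row. The number of lines between header lines also varies. In this case, the final output would be a file with three columns, because three columns have a percent sign, and three rows, because the header line occurs only three times.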

0.00    0.00    99.77
0.00    2.32    97.46
0.00    0.00    99.77
  • USA replied:

    This assumes that only header lines will contain a % sign.

    # If printing is turned on, print the value for each column discovered to
    # contain % and set the flag to turn off printing
    f { sep="";
        for (n=1; n<=NF; ++n)
            if (n in a) { printf "%s%0.2f", sep, $n; sep = OFS };
        print "";
        f=0}
    
    # This is a header line. Create an array with indices corresponding to
    # the desired column numbers and set the flag to turn on printing
    /%/ { delete a;
          for (n=1; n<=NF; ++n)
            if ($n ~ /%/) a[n];
          f=1}
    

    Output:

    $ awk -f a.awk file
    0.00 0.00 99.77
    0.00 2.32 97.46
    0.00 0.00 99.77
    

    If you want the output to be tab-separated:

    $ awk -v OFS='\t' -f a.awk file
    0.00    0.00    99.77
    0.00    2.32    97.46
    0.00    0.00    99.77
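
A variant of the script above, offered only as a minimal sketch: it prints the selected fields verbatim with %s instead of reformatting them with %0.2f, and it clears the array with split("", a), an idiom that should also work in awk implementations that do not accept `delete a` on a whole array. The file name sample.txt and the shortened sample data are assumptions made for illustration. It still assumes that only header lines contain a % sign.

```shell
# Hypothetical sample input: two header blocks, shortened from the question.
printf '%s\n' \
    'C1      C2      C3      C1%     C2%     C3%' \
    '1       1       220     0.00    0.00    99.77' \
    '0       0       69      0.00    0.00    99.78' \
    'C1      C2      C3      C1%     C2%     C3%' \
    '1       2       229     0.00    2.32    97.46' \
    > sample.txt

result=$(awk '
    # The previous line was a header: print the remembered columns as-is.
    f { sep = ""
        for (n = 1; n <= NF; ++n)
            if (n in a) { printf "%s%s", sep, $n; sep = OFS }
        print ""
        f = 0 }

    # Header line: remember which column numbers contain a % sign.
    /%/ { split("", a)          # empty the array portably
          for (n = 1; n <= NF; ++n)
              if ($n ~ /%/) a[n]
          f = 1 }
' sample.txt)

printf '%s\n' "$result"
# prints:
# 0.00 0.00 99.77
# 0.00 2.32 97.46
```

Printing with %s keeps whatever precision the input had, whereas %0.2f rounds every value to two decimals; which one is preferable depends on what the downstream consumer of the file expects.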