Trying to get the byte and character count of each line in a file

I am using awk (symlinked to gawk on my machine) to read through a file and get a character count per line, to test whether the file is fixed width. I can then re-use the following script with the -b (--characters-as-bytes) option to see whether the file is fixed width by byte.

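#!/usr/bin/awk -f

# Exit 1 at the first non-empty line whose length differs from that of the
# first non-empty line; exit 0 if all non-empty lines have the same length.
BEGIN {
    width = -1;    # -1 means no reference width has been recorded yet
}

{
    len = length($0);

    if (len == 0) {
        next;           # empty lines are ignored
    } else if (width == -1) {
        width = len;    # first non-empty line sets the reference width
    } else if (len != width) {
        exit 1;         # mismatch: the file is not fixed width
    }
}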

I want to do something similar to test whether each line in a file has the same number of bytes and characters, so that I can assume all characters are a single byte (I do realize this is subject to false negatives). The challenge is that I would like to run through the file one time and break out at the first mismatch. Is there a way to set the -b option from within an awk script, similar to how you can adjust FS? If this isn't possible, I'm open to options outside of awk. I can always just write this in C if I have to, but I wanted to make sure there isn't something already available.

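Efficiency is my goal here. Having this information will help me skip a costly process, so I don't want this check to be costly in itself. The files I'm working with can be over 100 million lines.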

Clarification

I want something like the above. Something like this:

#!/usr/bin/awk -f
{
    # bytelength() is a placeholder; awk has no such function
    if (length($0) != bytelength($0))
        exit 1;
}

I don't need any output. I will just trigger off the return code ($? in bash). So exit 1 if this fails. Obviously bytelength is not a function. I'm just looking for a way to achieve this without running awk twice.
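
To make the goal concrete, here is a minimal sketch of the kind of single-pass check I'm after. It assumes the file is UTF-8 (or another ASCII-compatible encoding), in which case a line has more bytes than characters exactly when it contains a byte of 0x80 or above, so it sidesteps length() rather than emulating bytelength. The function name all_single_byte is made up for illustration, and I have not checked the behaviour on every awk implementation.

```shell
#!/bin/sh
# Hypothetical helper (not part of awk): returns 0 if no line of the file
# contains a multi-byte character, 1 at the first line that does.
# Assumption: the file is UTF-8 or another ASCII-compatible encoding.
all_single_byte() {
    # LC_ALL=C makes awk treat the input as raw bytes, so the bracket
    # expression matches any byte outside 0x01-0x7F, i.e. a byte that can
    # only belong to a multi-byte sequence (a NUL byte would also match).
    LC_ALL=C awk '/[^\001-\177]/ { exit 1 }' "$1"
}
```

Called as all_single_byte somefile, it leaves the result in $? like the scripts above, and awk stops reading at the first offending line.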
