This is just another quick and dirty installment in the Little Things Doth Crabby Make series. Consider the man page for the colrm(1) command:
That looks pretty straightforward to me. If, for example, I have a 6-column text file and I only want to keep, say, columns 1 through 3, I should be able to execute colrm(1) with a single argument: 4. colrm(1) does not behave according to my reading of the man page, so that qualifies as a little thing that doth crabby make.
Consider the following screenshot showing a simple 6-column text file. To make sure there are no unprintable characters that might somehow interfere with colrm(1), I also listed the contents with od(1):
Next, I executed a series of colrm(1) commands in an attempt to see which columns get plucked from the file based on different single-argument invocations:
Would that make anyone else crabby? The behavior appears very indeterminate to me, and that makes me crabby.
Thoughts? Leave a comment!
Hiya Kevin,
There is certainly something to be crabby about here…
What this depends on is your interpretation of a column. With no info other than what is presented in the man page above, it is reasonable to think of a column as whitespace-delimited.
colrm, on the other hand, defines a column as a single character within a line. This becomes more obvious if you remove the whitespace from your test file, as shown below:
tydnar> cat cols.txt
1.3.5.7.9
.2.4.6.8.
tydnar> colrm 1 colrm 2 colrm 3 colrm 5 colrm 2 7 < cols.txt
1.9
.8.
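To make the two readings concrete, here is a small sketch that uses standard POSIX tools rather than colrm itself. The sample line `a b c d e f` and the function names are hypothetical stand-ins for the six-column file in the screenshot. awk splits on whitespace and returns the first three fields, which is what the man page seemed to promise; `cut -c 1-3` keeps character columns 1 through 3, which should match what `colrm 4` leaves behind for plain ASCII text with no tabs.

```shell
#!/bin/sh
# Hypothetical sample standing in for the six-column file in the screenshot.
sample_line() { printf 'a b c d e f\n'; }

# Reading 1: a column is a whitespace-delimited field.
first_three_fields() { sample_line | awk '{ print $1, $2, $3 }'; }

# Reading 2: a column is a single character, which is how colrm counts.
# For plain ASCII without tabs this should match `colrm 4`.
first_three_chars() { sample_line | cut -c 1-3; }

first_three_fields   # a b c
first_three_chars    # a b
```

The second result is only three characters wide (a, a space, and b), which is why a character-based tool seems to keep fewer "columns" than expected from a space-separated file.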
Looking at the man page on my Mac, however, I see the following:
– – – – – – – – – – – –
COLRM(1) BSD General Commands Manual
NAME
colrm — remove columns from a file
SYNOPSIS
colrm [start [stop]]
DESCRIPTION
The colrm utility removes selected columns from the lines of a file. A
column is defined as a single character in a line. Input is read from
the standard input. Output is written to the standard output.
– – – – – – – – – – – –
Something to be crabby about indeed ;^)
randy t
Ah ha! The colrm(1) man page that ships with the Linux util-linux version of this (originally BSD) command says nothing about what a column is. Your man page output from the Mac (BSD) is interesting.
I’ll go out on a limb here. Considering that Linux is a derivative of Unix, and Unix was invented to process text (that was its original, sole purpose), a column ought to be the characters between $IFS delimiters, and $IFS is, by default, whitespace.
Thoughts?
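For what it’s worth, $IFS is the shell’s field separator: it governs word splitting and how the `read` builtin breaks a line into variables, but colrm itself does not consult it. Here is a minimal sketch of pulling the first three whitespace-separated fields with IFS-driven splitting; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first three IFS-separated fields of each line.
# With the default IFS (space, tab, newline), runs of whitespace separate fields.
first_three_ifs_fields() {
    while read -r c1 c2 c3 rest; do
        # c1..c3 receive the first three fields; `rest` soaks up the remainder.
        printf '%s %s %s\n' "$c1" "$c2" "$c3"
    done
}

printf 'a b c d e f\n' | first_three_ifs_fields   # a b c
```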
Hello Kevin
The manpage from Ubuntu is a little more detailed but leaves the same impression. What first comes to mind is that they should be “Excel-like columns”: a list of values separated by something (spaces or special characters) across multiple lines.
But after running some tests, it turns out “colrm” really is the opposite of the “cut” command and works with characters…
The word “columns” in the manpage really creates confusion…
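The “opposite of cut” observation can be checked with cut(1) on randy’s test data: `cut -c` keeps exactly the character columns that colrm would leave in place. The sketch below assumes plain single-byte text with no tabs or backspaces (colrm counts those specially), and `cols` is a hypothetical function that stands in for the cols.txt file.

```shell
#!/bin/sh
# randy's two-line test file, produced by a function instead of a file.
cols() { printf '1.3.5.7.9\n.2.4.6.8.\n'; }

# colrm 2 7 discards character columns 2-7; cut keeps the complement.
cols | cut -c 1,8-     # 1.9 then .8.

# colrm 4 discards columns 4 onward; cut keeps columns 1-3.
cols | cut -c 1-3      # 1.3 then .2.
```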
COLRM(1) BSD General Commands Manual COLRM(1)
NAME
colrm — remove columns from a file
SYNOPSIS
colrm [start [stop]]
DESCRIPTION
The colrm utility removes selected columns from the lines of a file. A column is defined as a single character in a line. Input is read from the standard input. Output is written to the standard output.
If only the start column is specified, columns numbered less than the start column will be written. If both start and stop columns are specified, columns numbered less than the start column or greater than the stop column will be written. Column numbering starts with one, not zero.
Tab characters increment the column count to the next multiple of eight. Backspace characters decrement the column count by one.
ENVIRONMENT
The LANG, LC_ALL and LC_CTYPE environment variables affect the execution of colrm as described in environ(7).
Regards,
Rafael
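The start/stop rules in the BSD man page above can be mimicked with awk’s `substr`. This is a minimal sketch, not the real colrm: the helper name `colrm_sketch` is hypothetical, and it handles plain single-byte text only. The tab and backspace rules, locale-dependent character widths (LANG, LC_ALL, LC_CTYPE), and argument validation are all left out.

```shell
#!/bin/sh
# colrm_sketch: a hypothetical helper, NOT the real colrm. It follows the
# quoted start/stop rules for plain single-byte text only; tab and backspace
# handling, locale-aware widths, and argument checks are omitted.
colrm_sketch() {
    start=$1
    stop=${2:-0}   # 0 stands for "no stop column given"
    awk -v start="$start" -v stop="$stop" '{
        # Always keep the columns numbered less than start (numbering is 1-based).
        out = substr($0, 1, start - 1)
        # With a stop column, also keep the columns numbered greater than stop.
        if (stop > 0) out = out substr($0, stop + 1)
        print out
    }'
}

printf 'abcdefghi\n' | colrm_sketch 4      # abc
printf 'abcdefghi\n' | colrm_sketch 4 6    # abcghi
printf '1.3.5.7.9\n' | colrm_sketch 2 7    # 1.9
```

With a single argument, everything from the start column onward is dropped. This matches the man page’s “columns numbered less than the start column will be written” and explains why `colrm 4` keeps only three characters.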
Thanks for stopping by, Rafael.