Using the following data:
$cat p1.csv
R,3
R,4
S,1
S,2
S,3
R,2
T,4
R,3
ST,4
RST,2
RSTR,4
First sort based on column 2:
$cat p1.csv | sort -t "," -k2
S,1
R,2
RST,2
S,2
R,3
R,3
S,3
R,4
RSTR,4
ST,4
T,4
I want to count the number of 1's, 2's, 3's, and 4's from column two. Something like $cat p1.csv | sort -t "," -k2 | uniq -f2 -c .....Can uniq be applied to one column? The -f2 doesn't properly apply uniq to the correct field. The output should take the first instance of a unique value in column two and count the number of them. Hence, the data must first be sorted on column two. A correct output would look like:
1 S,1
3 R,2
3 R,3
4 R,4
Suggestions?
Your question isn't quite clear so I just reverse engineered your output to your input (assuming there is a typo in your output since you mention to count number of 1's, 2's and 3's from column 2 and show 2 R,2
). You'll probably need to explain your question a little better -
sort -t "," -k2 < p1.csv |
awk -F, '!z[$2]++{ a[$2]=$0; } END {for (i in a) print z[i], a[i]}' |
sort -k1
- !z[$2]++ removes the duplicates based on column 2 as awk progresses thru
each line.
- a[$2]=$0 stores the non-duplicates lines in an array
- END {..} looks at all the keys in array and pulls up values. For array a
it pulls up the first line it sees with unique column 2 (as your desired
output). For array z it pulls up number of lines seen with same column 2.
[jaypal:~/temp] cat file
R,3
R,4
S,1
S,2
S,3
R,2
T,4
R,3
ST,4
RST,2
RSTR,4
[jaypal:~/temp] sort -t "," -k2 < t |
awk -F, '!z[$2]++{ a[$2]=$0; } END {for (i in a) print z[i], a[i]}' |
sort -k1
1 S,1
3 R,2
3 R,3
4 R,4
For finding unique entries based on column, you can try sort with -u option (but it won't give you the counts though).
From the man
page:
-u, --unique
with -c, check for strict ordering;
without -c, output only the first of an equal run
You can try something like this -
sort -t, -k2 p1.csv | sort -u -t, -k2
I am not sure Uniq can be performed on a column separated by a delimiter other than a blank. Atleast on my mac it doesn't. Here is the man page reference
-f num Ignore the first num fields in each input line when doing comparisons.
A field is a string of non-blank characters separated
from adjacent fields by blanks. Field numbers are one based,
i.e., the first field is field one.
So if you can remove the ,
delimiter and run the following command you should get your desired result.
sort -k2 test | uniq -c -f1
[jaypal:~/temp] cat test
R 3
R 4
S 1
S 2
S 3
R 2
T 4
R 3
ST 4
RST 2
RSTR 4
[jaypal:~/temp] sort -k2 test | uniq -c -f1
1 S 1
3 R 2
3 R 3
4 R 4