When I run this script, I receive this error message: "sort: write failed: standard output: Broken pipe"
If someone can help me, it would be awesome; this error is driving me crazy.
The input is a set of files that all contain DNA sequences in FASTA format. Each file holds several sequences, one per line, with the identifier in $1, more values in $2 through $8, and the DNA sequence in $9.

I then want to select sequences based on the number of sequences ($common_hits) in each file (this number is not a fixed value, but I set it to 6 for the example):

- All files with fewer than 6 sequences must be removed.
- Files with exactly 6 sequences are fine as they are.
- Files with more than 6 sequences have to be reduced to 6, keeping the sequences with the highest values in field $5.

The output files must contain all 6 sequences, and each sequence (field $9) has to be on the line after its identifier.
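For illustration only (these identifiers, values, and sequences are made up), an input file would contain lines like:

seq_001 12 87 3 98.6 1 240 310 ACGTGCTAGCTA
seq_002 15 90 1 95.2 1 198 275 TTGACGGATCCA

and the corresponding lines in the output file should look like:

seq_001 12 87 3 98.6 1 240 310
ACGTGCTAGCTA
seq_002 15 90 1 95.2 1 198 275
TTGACGGATCCA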
I am not removing the original files with more than 6 sequences for now, because I want to be sure it works.
common_hits=6
for i in *BR                          # every input file ending in "BR"
do
    lines=$(wc -l < "${i}")           # one sequence per line
    if [ "${lines}" -lt "${common_hits}" ]
    then
        # too few sequences: remove the file
        rm -f "${i}"
    elif [ "${lines}" -gt "${common_hits}" ]
    then
        # keep the sequences with the highest field 5, then split each
        # line into an identifier line and a sequence line
        sort -nr -k5,5 "${i}" | head -n "${common_hits}" | \
        awk '{print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8; print $9}' > "${i}.ph"
    fi
done
sort | head always reports an error if head exits (or otherwise closes its stdin) before sort has written all of its output, as will be the case whenever the stream written by sort is much longer than the part consumed by head. This is by design: if sort can't write all its output, it is expected to fail; if it ignored such failures, it would also ignore cases where it couldn't write its output for other reasons (disk full, broken network connection, etc.).
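The failure is easy to demonstrate in isolation. Whether the message is actually printed depends on the sort implementation and on how much data sort still has buffered when head exits, but with a sufficiently large input a pipeline such as

seq 1000000 | sort -rn | head -n 1

can produce exactly the "sort: write failed: standard output: Broken pipe" complaint from the question: head exits after printing its first line, closing the read end of the pipe, and sort's next write then fails.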
There's nothing unusual or undesirable about this. If you don't care about the error, ignore it, and instead check the number of lines of output from the pipeline to determine whether you actually had an error condition.
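Applied to the loop above, that approach could look something like the following sketch (the hits variable name is mine; note that 2>/dev/null throws away all of sort's stderr, so the line count becomes your only error check):

hits=$(sort -nr -k5,5 "${i}" 2>/dev/null | head -n "${common_hits}")
if [ "$(printf '%s\n' "${hits}" | wc -l)" -eq "${common_hits}" ]
then
    # got the expected number of lines: format and write the output file
    printf '%s\n' "${hits}" | \
    awk '{print $1" "$2" "$3" "$4" "$5" "$6" "$7" "$8; print $9}' > "${i}.ph"
else
    # fewer lines than expected: something other than the broken pipe went wrong
    echo "unexpected output for ${i}" >&2
fi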