I am working with a Unix shell script that assembles a genome and then builds a phylogeny. The final output (the phylogeny) may change depending on which genome assembler is used, so I want to compare the effects of several assemblers. I have developed some metrics for the comparison, but I need help organizing them so I can run useful analyses. I would like to import my data into Excel as columns.
This is the script I am using to output data:
echo "Enter the size (Mb or Gb) of your data set:"
read SIZEOFDATASET
echo "The size of your data set is $SIZEOFDATASET"
echo "Size of Data Set:" >> metrics_file.txt
echo "$SIZEOFDATASET" >> metrics_file.txt
echo "Enter the name of your assembler"
read NAMEOFASSEMBLER
echo "You are using $NAMEOFASSEMBLER as your assembler"
echo "Name of Assembler:" >> metrics_file.txt
echo "$NAMEOFASSEMBLER" >> metrics_file.txt
echo "Time:" >> metrics_file.txt
The output comes out like this currently:
Size of Data Set:
387 Mb
Name of Assembler:
Velvet
Genome Size:
1745690
Time:
I want it to look something like this:
Thanks in advance!
#!/bin/sh
in_file=in.txt # Input file
params=3 # Number of parameters per record
res_file=$(mktemp) # Temporary file
sep=' ' # Separator character
# Print header: the names are every other word in the first params*2 lines
cnt=0
for i in $(head -n "$((params*2))" "$in_file"); do
    if [ $((cnt % 2)) -eq 0 ]; then
        echo "$i"
    fi
    cnt=$((cnt+1))
done | paste -s -d "$sep" - >>"$res_file"
# Parse and print values
cnt=0
for i in $(cat "$in_file"); do
    # Print values, skip param names
    if [ $((cnt % 2)) -eq 1 ]; then
        # printf is used because echo -n is not portable under /bin/sh
        printf '%s' "$i" >>"$res_file"
    fi
    if [ $(((cnt+1) % (params*2))) -eq 0 ]; then
        # Values line is finished, print newline
        echo >>"$res_file"
    elif [ $((cnt % 2)) -eq 1 ]; then
        # More values expected to be printed on this line
        printf '%s' "$sep" >>"$res_file"
    fi
    cnt=$((cnt+1))
done
# Make nice table format
column -t "$res_file"
rm -f "$res_file"
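This script assumes that the input file alternates parameter names and values, one per line, that names and values contain no whitespace, and that every record has exactly params parameters.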
Most of the code just parses your input data format. The actual column formatting is done by the column tool.
If you want to export this table to Excel, change the sep variable to ',' and save the output to a .csv file. Excel can import that file directly.
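If your values may contain spaces (such as "387 Mb" in the question), the word-based loops above will split them. A minimal sketch of a line-based CSV conversion with awk follows. The function name to_csv and the choice to quote every field are my own assumptions, not part of the script above, and it expects the same alternating name/value layout on standard input.

```shell
#!/bin/sh
# Hypothetical helper: turn alternating name/value lines on stdin into CSV.
# $1 = number of parameters per record (3 in the example input).
to_csv() {
    awk -v n="$1" '
        # Quote a field and double any embedded quotes so commas and spaces survive.
        function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
        NR % 2 == 1 { name = $0; next }          # odd lines hold a parameter name
        {
            i = (NR / 2 - 1) % n                 # position within the current record
            if (NR <= 2 * n) header = header (i ? "," : "") q(name)
            row = row (i ? "," : "") q($0)
            if (i == n - 1) {                    # record complete
                if (NR == 2 * n) print header    # header goes out once, before row 1
                print row
                row = ""
            }
        }
    '
}

# Example: two records with three parameters each.
printf '%s\n' 'Size' '387 Mb' 'Name' 'Velvet' 'Time' '13' \
              'Size' '31415' 'Name' 'Minia' 'Time' '18' | to_csv 3
```

To use it on a real file, something like to_csv 3 < in.txt > metrics.csv should work, assuming in.txt has the layout shown in the input example.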
Input file:
Size
387
Name
Velvet
Time
13
Size
31415
Name
Minia
Time
18
Size
31337
Name
ABCDEF
Time
42
Script output:
Size   Name    Time
387    Velvet  13
31415  Minia   18
31337  ABCDEF  42
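If you can change the script that collects the metrics, another option is to skip the conversion step and append one CSV row per run. This is only a sketch under assumptions: the function name append_metrics_row is hypothetical, the header labels are taken from the question, and the values are assumed to contain no commas.

```shell
#!/bin/sh
# Hypothetical helper: append one run's metrics as a CSV row.
# Arguments: output file, data set size, assembler name, time.
append_metrics_row() {
    out=$1
    # Write the header only when the file is missing or empty.
    [ -s "$out" ] || printf '%s\n' 'Size of Data Set,Name of Assembler,Time' >> "$out"
    printf '%s,%s,%s\n' "$2" "$3" "$4" >> "$out"
}

# Example with illustrative values; in the real script these would come from read.
demo=$(mktemp)
append_metrics_row "$demo" '387 Mb' 'Velvet' '13'
append_metrics_row "$demo" '31415 Mb' 'Minia' '18'
cat "$demo"
rm -f "$demo"
```

Each run then adds a single line, so the file can be opened in Excel at any point without post-processing.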