awk-wordness simplify and speed up your code with awk

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#1 Post by technosaurus »

awk is a powerful tool that is often overlooked in favor of more familiar tools like sh, sed, grep, wc, head, tail, cat, cut, ...
Quite often we can drastically speed up a script by using awk built-ins instead of external programs, because every external call costs time to start a new process. I will try to show the major ones here.

grep STRING FILE ==> awk '/STRING/{print}' FILE
grep -v STRING FILE ==> awk '!/STRING/{print}' FILE

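cut -f2,3 FILE ==> awk -F'\t' -v OFS='\t' '{print $2, $3}' FILE
(tab is already the default delimiter for cut; in awk, set the separator with -F or in BEGIN, because FS assigned in a pattern only takes effect from the second line on, and print $2 $3 without a comma glues the two fields together)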

sed 's/string/newstring/' FILE ==> awk '{sub(/string/,"newstring");print}' FILE
sed 's/string/newstring/g' FILE ==> awk '{gsub(/string/,"newstring");print}' FILE

cat FILE ==> awk '{print}' FILE
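A quick way to convince yourself that the one-liners above behave like the tools they replace is to run both on the same input. This is only a minimal sketch: the sample file and its contents are made up for illustration, and the variable names are arbitrary.

```shell
#!/bin/sh
# Sketch only: build a small tab-separated sample file (invented data)
# and run the awk one-liners against it.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
printf 'one\tapple\tred\ntwo\tbanana\tyellow\nthree\tapple\tgreen\n' > "$sample"

# grep apple FILE
awk_grep=$(awk '/apple/{print}' "$sample")

# grep -v apple FILE
awk_grep_v=$(awk '!/apple/{print}' "$sample")

# cut -f2,3 FILE (tab-separated in, tab-separated out)
awk_cut=$(awk -F'\t' -v OFS='\t' '{print $2, $3}' "$sample")

# sed 's/apple/pear/' FILE
awk_sub=$(awk '{sub(/apple/,"pear"); print}' "$sample")

# show one of the results
printf '%s\n' "$awk_cut"
```

Comparing each variable with the output of the matching grep, cut or sed command on the same file should show no difference.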

That's all the time I have right now, so I will add a ...
TODO
show how to execute an external program using builtin - system() command
show how to do head/tail-like operations
show how to do math operations
show how to store variables and arrays
show how to do various loops
show how to do other stuff I am forgetting
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].


#2 Post by technosaurus »

here is an example that is simple in awk, but would be pretty complex without it

example: targress file.tar.{gz,bz2,xz,...}

Code: Select all

[ -f "$1" ] && tar -tvf "$1" >/tmp/tarfiles || exit
tar -xvf "$1" |awk '{
if ( $3 != "" ){
	size[$6]=$3
	tot+=$3
}else{
	subtot+=size[$1]
	printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles -
explanation:
[ -f "$1" ] && tar -tvf "$1" >/tmp/tarfiles || exit
if the input is a file, get a long listing containing all of the file sizes (or exit)

tar -xvf "$1" |awk '{ ... }' /tmp/tarfiles -
go through the file /tmp/tarfiles first and then - (stdin from the tar -xvf "$1", which lists the files as they are decompressed)

if ( $3 != "" ){
size[$6]=$3
tot+=$3

if there is a size field (only in /tmp/tarfiles), store the size ($3) in an associative array keyed by the file name ($6), then add that size to the running total

}else{
subtot+=size[$1]
printf "%d\n", 100 * subtot / tot

since there is no field 3, we are processing the verbose output from extracting the tarball (each file name, as it is extracted). We use that name to look up the size in the associative array and add it to the subtotal, then use that subtotal to print the percentage. Note that this relies on file names containing no spaces, since awk splits fields on whitespace.

you can make this into a standalone script for use by yad or {,X,gtk}dialog by adding a #!/bin/sh to the top or as a function like
targress(){
#code here
}
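The trick of reading a listing file first and then stdin can be tried without a real tarball. The following is a minimal sketch under stated assumptions: the "size name" listing format, the file names and the function name progress are all invented, and it uses the common NR == FNR test to recognise the first input instead of checking for a third field.

```shell
#!/bin/sh
# Sketch of the two-input trick, using made-up data instead of a real tarball.
# progress LISTING: reads "size name" lines from LISTING first, then bare
# names on stdin, and prints the running percentage after each name.
progress() {
    awk '
    NR == FNR { size[$2] = $1; tot += $1; next }   # first input: the listing
    tot > 0   { sofar += size[$1]; printf "%d\n", 100 * sofar / tot }
    ' "$1" -
}

listing=$(mktemp)
trap 'rm -f "$listing"' EXIT
printf '100 a.txt\n300 b.txt\n600 c.txt\n' > "$listing"

# names arrive on stdin in the order they are "extracted"
printf 'a.txt\nb.txt\nc.txt\n' | progress "$listing"
```

NR == FNR is only true while awk is still on its first input, which avoids depending on how many fields each line has. It does assume the listing is not empty.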

disciple
Posts: 6984
Joined: Sun 21 May 2006, 01:46
Location: Auckland, New Zealand

#3 Post by disciple »

It would be good if you could add a one-sentence explanation of the purpose of that targress script :)
technosaurus wrote:Quite often we can drastically speed up a command by using awk built-ins (it takes time to call external programs)
So you're not saying for example that an awk script is faster than a sed script, just that a single awk script is faster than a script which calls multiple tools e.g. both awk and sed. And that awk is a particularly powerful tool, so there is a good chance you can do the whole task with it. Is that right?
What about the issue that many of those tools are built in to busybox? Is awk still faster?
Do you know a good gtkdialog program? Please post a link here

Classic Puppy quotes

ROOT FOREVER
GTK2 FOREVER

User avatar
sunburnt
Posts: 5090
Joined: Wed 08 Jun 2005, 23:11
Location: Arizona, U.S.A.

#4 Post by sunburnt »

technosaurus brought this up a while back when I was writing a script library.
Awk was always faster than multiple piped commands.

Plus you can specify awk at the start of the script to save an additional call.
The whole script needs to be written in awk this way, but it's fast.
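For anyone who has not seen an all-awk script, here is a minimal sketch. The script contents, the temp file and the count_words helper are made up for illustration, and the #!/bin/awk path is an assumption that may differ on your system (often /usr/bin/awk).

```shell
#!/bin/sh
# Sketch: write a tiny all-awk script to a temp file and run it.
# The shebang path inside the script is an assumption; adjust as needed.
script=$(mktemp)
trap 'rm -f "$script"' EXIT
cat > "$script" <<'EOF'
#!/bin/awk -f
# count lines and words without calling wc
{ words += NF }
END { print NR, words }
EOF

# run it through awk -f here so the example does not depend on the shebang path
count_words() { awk -f "$script" "$@"; }

printf 'hello world\nfoo bar baz\n' | count_words
```

Saved with the executable bit set, such a file can be called directly and the kernel starts awk on it without going through a shell first.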


#5 Post by technosaurus »

the targress code just outputs the percentage that a tarball has been extracted ... good for use in a package installer to show the user how far it has to go ... for example
targress tarball.tar | Xdialog --progress "message here" 0 0 -

using busybox narrows the gap somewhat if you use "prefer applets" (most distros do not, and neither does Barry, due to compatibility issues), but it still has to fork /proc/self/exe for many applets, and extra streams and pipes have to be set up. If that is in a loop of any kind, awk will win hands down. The busybox version of awk is extremely fast and nearly as capable as gawk, nawk and standard awk.

awk seems difficult at first, but once you get it, it's actually pretty capable of doing things that we often use several other tools for. If you can use only one tool for a task, it is often both simpler and faster. (Not always: my jwm_menu_create only uses busybox ash, and it does a lot of stuff that would be faster/easier in awk... I just knew shell scripting better at the time.)
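As a small illustration of one tool replacing several, here is a sketch that produces the same report from a three-process pipeline and from a single awk call. The "name:x:id" input format and both function names are invented for the example.

```shell
#!/bin/sh
# Sketch: the same report from three processes and from a single awk call.
# The "name:x:id" input format is invented for the example.

# drop comment lines, keep fields 1 and 3, turn the separator into " = "
report_pipeline() { grep -v '^#' | cut -d: -f1,3 | sed 's/:/ = /'; }

# the same three steps inside one awk process
report_awk() { awk -F: '!/^#/ { print $1 " = " $3 }'; }

printf 'root:x:0\n#comment\ndaemon:x:1\n' | report_awk
```

On a handful of lines the difference is not noticeable; it starts to matter when the pipeline sits inside a loop and the process start-up cost is paid on every iteration.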


#6 Post by disciple »

Is http://awk.info/ the closest equivalent to http://sed.sourceforge.net/ (but presumably not written itself in awk)?


#7 Post by technosaurus »

Ok, here it is as something somewhat useful - a tarball extractor that _should_ work with all versions of gtkdialog (2,3,4 and the backport to gtk1)... I used Xdialog for the progress bar, since not all versions of gtkdialog have it. This could probably be extended to work for other types of archives as well by using if (FILENAME ~ ".tar")...

Code: Select all

targress(){
[ -f "$1" ] && TARBALL="$1" || return 1
shift
[ "$1" ] && tar -tvf "$TARBALL" $@ >/tmp/tarfiles
tar -xvf "$TARBALL" $@ |awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles - |Xdialog --wmclass xmessage --progress "extracting ${TARBALL##*/}" 0 0 -
}

tar -tvf "$1" >/tmp/tarfiles

eval `awk 'BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<tree><label>Permissions|Size|Date|Time|Filename</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $1 "|" $3 "|" $4 "|" $5 "|" $6 "</item>" }
END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && targress "$1" $TREE1
Edit: fixed formatting related typo that broke the script
Last edited by technosaurus on Wed 16 May 2012, 03:12, edited 1 time in total.

goingnuts
Posts: 932
Joined: Sun 07 Dec 2008, 13:33
Contact:

#8 Post by goingnuts »

Can't get it running (named the script "test"):

Code: Select all

# ./test disktype-9.tar.bz2 
./test: line 23: Error: command not found
line 23 is the line with:

Code: Select all

END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
my gtkdialog1 works and /tmp/tarfiles contains:

Code: Select all

drwxr-xr-x root/root         0 2012-05-15 20:33 bin/
-rwxr-xr-x root/root    411644 2012-05-15 20:33 bin/disktype
:?:


#9 Post by technosaurus »

oops, I must have accidentally deleted the "<" before the height tag after pasting it ... here it is working with busybox applets and gtk1 versions of gtkdialog and Xdialog

Code: Select all

#!/bin/ash

targress(){
[ -f "$1" ] && TARBALL="$1" || return 1
shift
[ "$1" ] && busybox tar -tvf "$TARBALL" $@ >/tmp/tarfiles
busybox tar -xvf "$TARBALL" $@ |awk '{
if ( $3 != "" ){
   size[$6]=$3
   tot+=$3
}else{
   subtot+=size[$1]
   printf "%d\n", 100 * subtot / tot
}}' /tmp/tarfiles - |Xdialog --wmclass xmessage --progress "extracting ${TARBALL##*/}" 0 0 -
}

busybox tar -tvf "$1" >/tmp/tarfiles

eval `busybox awk '
BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<tree><label>Permissions|Size|Date|Time|Filename</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $1 "|" $3 "|" $4 "|" $5 "|" $6 "</item>" }
END{print "</tree></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && targress "$1" $TREE1


#10 Post by goingnuts »

Thanks! It works now - I had to do a slight modification as gtkdialog1 only reports the value of the first column - and it seems that the gtkdialog2 tree code won't show the content, so I changed tree to table. I haven't got gtkdialog3 to run yet.

Code: Select all

eval `busybox awk '
BEGIN{print "<vbox><hbox><button><input file>/usr/share/mini-icons/pupget.xpm</input> \
<label>Extract</label></button><button cancel></button></hbox> \
<table><label>Filename|Size     |Date       |Time     |Permissions</label> \
<variable>TREE1</variable><width>400</width><height>150</height>" }
{print "<item>" $6 "|" $3 "|" $4 "|" $5 "|" $1 "</item>" }
END{print "</table></vbox>" }' /tmp/tarfiles |gtkdialog1 -s`
[ "$EXIT" == "Extract" ] && echo "TREE is $TREE1" && targress "$1" $TREE1 
I had to load a kernel-pkg to actually view the progress bar - but it works very well!


#11 Post by technosaurus »

goingnuts wrote:Thanks! It works now - I had to do a slight modification as gtkdialog1 only reports the value of the first column - and it seems that the gtkdialog2 tree code won't show the content, so I changed tree to table. I haven't got gtkdialog3 to run yet.
...
I had to load a kernel-pkg to actually view the progress bar - but it works very well!
good catch. I was only testing with the name ($6) and forgot to modify that when it changed. It would be nice if the first column would expand instead of the last, since it is the one that is actually used

yeah, the second part is pretty quick, but I may need to add
| tee /tmp/tarfile |
rather than pre-generating it, but I don't know if it will help - I don't think gtkdialog draws until the whole tree is loaded - I need to look into fixing that


#12 Post by goingnuts »

The loading of the kernel pkg was rather slow...maybe also use the progress-bar when loading into gtkdialog? Or load part of the archive listing into gtkdialog - and then refresh the table when all files available? Or both...
Well - maybe this gets too much off topic - sorry for that!


#13 Post by technosaurus »

goingnuts wrote:The loading of the kernel pkg was rather slow...maybe also use the progress-bar when loading into gtkdialog? Or load part of the archive listing into gtkdialog - and then refresh the table when all files available? Or both...
Well - maybe this gets too much off topic - sorry for that!
with awk you can limit how many lines get listed by testing FNR or NR < gtkdialog_limit (maybe 40 or so?). I'm not sure what the gtkdialog command to refresh is, or how to signal it to refresh, other than just redrawing in the END section using variables after the listing is complete (I'm pretty sure the tar listing is what takes so long). There is no real way to do a percent bar though, just a spinning hourglass (AFAIK, unless there is a way to quickly get the raw number of files in a tarball)
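A head-like limit on the record number could look roughly like this. It is a minimal sketch: the function name first_n and the variable limit are made up, with limit standing in for the gtkdialog_limit idea from the post.

```shell
#!/bin/sh
# Sketch: keep only the first N records, like head -n N, and stop reading early.
# "limit" is a made-up name standing in for the gtkdialog_limit mentioned above.
first_n() { awk -v limit="$1" 'NR > limit { exit } { print }'; }

# 100 numbered lines in, 3 lines out
awk 'BEGIN { for (i = 1; i <= 100; i++) print i }' | first_n 3
```

NR counts records across all inputs, while FNR restarts at 1 for each file, so FNR is the one to test when the limit should apply per file.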


#14 Post by goingnuts »

gtkdialog refresh needs a trigger from one of the widgets so it will not work...
It seems that gtkdialog1 autosizes the first column but gtkdialog2 does not...
Below works in gtkdialog1&2, maybe some awk-script can be used for the loading progress-bar...

Code: Select all

#!/bin/ash

echo "
targress(){	
[ -f \"\$1\" ] && TARBALL=\"\$1\" || return 1
shift
[ \"\$1\" ] && busybox tar -tvf \"\$TARBALL\" \$@ >/tmp/tarfiles
busybox tar -xvf \"\$TARBALL\" \$@ | awk '{
if ( \$3 != \"\" ){
	size[\$6]=\$3
	tot+=\$3
}else{	
	subtot+=size[\$1]
	printf \"%d\n\", 100 * subtot / tot
}}' /tmp/tarfiles - | Xdialog --title \"Archiver\" --wmclass gtkdialog2 --progress \"extracting \${TARBALL##*/}\" 0 0 -	
}
"> /tmp/ashfunc

tarfile="$1"

busybox tar -tvf "$tarfile" | busybox awk '{print $6"|"$3"|"$4"|"$5"|"$1}' > /tmp/tarfiles &

count=10
(while ps | grep -q '[t]ar -tvf'; do sleep 1; count=$((count + 2)); echo $count; done) | Xdialog --title "Archiver" --wmclass gtkdialog2 --progress "Loading archive - Please wait..." 0 0

export MAIN_DIALOG='
<wtitle>Archiver</wtitle>
<vbox>
	<hbox>
		<button><input file>/usr/share/mini-icons/pupget.xpm</input>
		<label>Extract</label>
		<action>targress '$tarfile' $TREE1</action>
		</button>
		<button cancel></button>
	</hbox>
	<table>
		<label>Filename|Size     |Date           |Time       |Permissions</label>
		<variable>TREE1</variable>
		<width>500</width><height>200</height>
		<input>cat /tmp/tarfiles</input>
	</table>
</vbox>'


gtkdialog1 --program=MAIN_DIALOG -i /tmp/ashfunc
Attachments
snap0001.png


#15 Post by technosaurus »

@goingnuts thanks for the fixes, I guess I'll try to come up with some more examples ... probably some kind of universal daemon process that works with sit (my simple icon tray) to handle multiple tray applets in one process ... btw I could probably make something similar to sit for gtk1, but it would need to be swallowed by the tray... didn't you already do one using only xlib and xpm though?

Bruce B

#16 Post by Bruce B »

goingnuts wrote:Cant get it running (named the script "test"):
We aren't supposed to name executables 'test', because test already exists.

~


#17 Post by goingnuts »

technosaurus wrote:... didn't you already do one using only xlib and xpm though?
Yes - a single applet (pmmon) and one with place for 5 (pmmon5).link
Bruce B wrote:We aren't supposed to name executables 'test', because test already exists.
I will remember that - in this case I do not think that was the problem though...

starhawk
Posts: 4906
Joined: Mon 22 Nov 2010, 06:04
Location: Everybody knows this is nowhere...

#18 Post by starhawk »

Somewhat off-topic, Technosaurus, did you get my PM? I'm posting here because I think you're likely to see it ;)

penguinpowerppp
Posts: 5
Joined: Sat 14 Apr 2012, 23:49

#19 Post by penguinpowerppp »

goingnuts wrote:
technosaurus wrote:... didn't you already do one using only xlib and xpm though?
Yes - a single applet (pmmon) and one with place for 5 (pmmon5).link
...
I'll have to take a look and see if I can adapt my sit arg parsing logic to it so that it can be unlimited ... which reminds me, I found some nicer fonts in aicon to add to my txt2xpm

Edit ... oops my wife was logged on -technosaurus


#20 Post by technosaurus »

adding a template:

Code: Select all

#!/bin/awk -f
#FILENAME (name of current file)   #$0 (the whole current line)
#NF number of fields, $NF last field
#NR line number in all files      #FNR line number in current file
#ORS (default is "\n")            #RS  (default is "\n")
#OFS (default is " ")            #FS (default is " ", which splits on runs of spaces/tabs)
#system(command) run a command      #close(filename) close(command)
#ARGC, ARGV similar to C, but skips some stuff
#IGNORECASE (gawk only, default is 0) set to non-0, or use toupper() or tolower()
#ENVIRON array of env vars ex. ENVIRON["SHELL"] (equivalent of $SHELL)
#getline var < file ... close file or command | getline var
#index(haystack, needle) find needle in haystack
#length(string)
#match(string, regexp) returns where the regex starts, or 0
#RLENGTH length of /match/ substring or -1
#RSTART position where the /match/ substring starts, or 0
#split(string, array, fieldsep) split string into an array separated by fieldsep
#printf(format, expression1,...) print format-ted replacing %* with expressions
#%{c,d/i,e,f,g,o,s,x,X,%} char, decimal int, exp notation, float, shortest of
#   exp/float, octal, string, hex int, capitalized hex int, a '%' character
#sprintf(format, expression1,...) store printf in a variable
#sub(regexp, replacement, target) replace first regex with replacement in target
#gsub(regexp, replacement, target) like sub but for all matches in target
#substr(string, start, length) get length characters of string beginning at start (1-based)
#print > /dev/stdin, /dev/stdout, /dev/stderr, /dev/fd/# or filename
#output can be piped like print $0 | command
#comparisons <,>,<=,>=,==,!=,~,!~,in use && for AND, || for OR, ! for NOT
#   (~ is for regexp and "in" looks for subscript in array)
#/word/{...} like if match(...) {...} equivalent of grep
#(condition) ? if-true-exp : if-false-exp or use if (condition){}
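#math +,-,*,/,%,^ (** also works in some awks),log(x),exp(x),sqrt(x),cos(x),sin(x),atan2(y,x),int(x),
#rand(),srand(x); time() and ctime() are not in POSIX awk, gawk and busybox awk provide systime() and strftime()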
#
#function name (parameter-list) {
#     body-of-function
#}

BEGIN {
#actions that happen before any files are read in
}
#
{
#actions to do on files
}
#
END {
#actions to do after all files are done
}
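To show the template filled in, here is a minimal sketch that uses a user-defined function, a BEGIN block, a main block and an END block together. The "name:size" input format and the names summarize and human are invented for the example.

```shell
#!/bin/sh
# Sketch filling in the template: sum the second column per key in the first.
# The "name:size" input format and the function names are invented.
summarize() {
    awk '
    function human(n) { return sprintf("%.1fK", n / 1024) }
    BEGIN { FS = ":" }                        # before any input is read
    { total[$1] += $2; grand += $2 }          # for every input line
    END {                                     # after all input is done
        for (k in total) printf "%s %s\n", k, human(total[k])
        printf "all %s\n", human(grand)
    }'
}

# for (k in array) visits keys in no particular order, so sort for stable output
printf 'a:1024\nb:2048\na:1024\n' | summarize | sort
```

The same body could be saved as a file starting with the #!/bin/awk -f line from the template instead of being wrapped in a shell function.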
