gawk

smokey01
Posts: 2813
Joined: Sat 30 Dec 2006, 23:15
Location: South Australia :-(

#1 Post by smokey01 »

I'm hoping someone can help me with this gawk problem I have.

Create two small files.

-----------------Cut------------------------------

Filename: testgawk with contents:

Uppercase
uppercase

-------------------Cut----------------------------

Filename: test-gawk.sh with contents:

#!/bin/sh

gawk '/Uppercase/ { print $0 }
BEGIN { IGNORECASE = 1 }
' testgawk

--------------------------Cut---------------------

Make test-gawk.sh executable.

When test-gawk.sh is run in a terminal you get:

Uppercase
uppercase

Now change IGNORECASE = 1 to IGNORECASE = 0

Run test-gawk.sh again, this time you get:
Uppercase only.

Now this is correct because /Uppercase/ is matched.

If you were to change /Uppercase/ to /uppercase/ only
uppercase would be found.

IGNORECASE = 0 is case sensitive (default)
IGNORECASE = 1 means ignore case.

Sorry about the long explanation, here comes the issue.

I want to be able to place a variable on IGNORECASE.
EG: IGNORECASE = variable

I also want to be able to define this variable outside of the
gawk function but use it inside the gawk function.

Something like this:

#!/bin/sh

variable="1"

gawk '/Uppercase/ { print $0 }
BEGIN { IGNORECASE = variable }
' testgawk

Sounds easy doesn't it, but it's not. Yes, I can hear you thinking,
why not put a $ before variable, nope, gawk doesn't like that.

I would love a solution on this. I don't need an alternative solution
with awk, sed, grep or anything else. It has to be gawk.

TIA.

Peterm321
Posts: 411
Joined: Thu 29 Jan 2009, 14:09
Location: UK

#2 Post by Peterm321 »

Strange. It is possible to set a variable in an AWK script by passing a name=value pair with the -v switch, yet when I tried it with gawk, IGNORECASE could be set from a constant, i.e. IGNORECASE=1, but not from a variable, i.e. IGNORECASE=variable. Perhaps it's just a bug in the version of gawk I have. Busybox awk worked as advertised.

Code:

#!/bin/sh
AWKPROGRAM='BEGIN { IGNORECASE=casevar ; } /Uppercase/ { printf("\nIGNORECASE=%s, %s, casevar= %d\n",IGNORECASE,$0,casevar); } '
#
echo -e "\n\n\tDoesnt work:\n"
caseval=0
echo "UPPERCASE" | gawk  -v casevar=$caseval "$AWKPROGRAM"
caseval=1
echo "UPPERCASE" | gawk  -v casevar=$caseval "$AWKPROGRAM"
#
# works (busybox awk)
#
echo -e "\n\n\tWorks in busybox awk:\n"
caseval=0
echo "UPPERCASE" | awk  -v casevar=$caseval "$AWKPROGRAM"
caseval=1
echo "UPPERCASE" | awk  -v casevar=$caseval "$AWKPROGRAM"
#
echo -e "\n\n\tWorks in Gawk if IGNORECASE is set by a constant:\n"
AWKPROGRAM='BEGIN { if (casevar) { IGNORECASE=1 } ; } /Uppercase/ { printf("\nIGNORECASE=%s, %s, casevar= %d\n",IGNORECASE,$0,casevar); } '
caseval=0
echo "UPPERCASE" | gawk  -v casevar=$caseval "$AWKPROGRAM"
caseval=1
echo "UPPERCASE" | gawk  -v casevar=$caseval "$AWKPROGRAM"

The result I got from the above was:

Code:

	Doesnt work:


IGNORECASE=0, UPPERCASE, casevar= 0

IGNORECASE=1, UPPERCASE, casevar= 1


	Works in busybox awk:


IGNORECASE=1, UPPERCASE, casevar= 1


	Works in Gawk if IGNORECASE is set by a constant:


IGNORECASE=1, UPPERCASE, casevar= 1

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#3 Post by musher0 »

Hi guys.

For the sake of argument, let's say that I have a list of members for a
small stamp collectors club. There are, say, 15 of us. And there are two
named "Peter".

The list is this format: Last-Name First-Name Street-Address E-Mail

To fish out the coordinates of the two Peters, I type this at console:

Code:

A=Peter;awk '$2 ~ /'$A'/ { print }' StampCollClub.lst

First I define var. A.

Then, to query on field #2 (the "First-Name" field), inside the awk line, I
close awk's apostrophes just before var. $A and reopen them just after it,
so that the contents of $A are "absorbed" into the awk line as the pattern
for field #2.

What happens is that the apostrophe quoting is suspended for a moment, so
bash resurfaces to substitute the contents of the $A variable. Then comes
another apostrophe, which ends the suspension, and the rest of the awk
program follows. bash does all of this before awk starts, so awk only ever
sees the finished program text.

(Of course I could've used the name "Peter" itself in awk to define field #2,
but there are times in bash scripting where we need the above formulation.)

I can't remember where I found this trick, but it's been fool-proof for me
ever since.

-v never worked for me, and I found that inserting an awk variable after
the awk formula but before the file name was iffy.
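
For comparison, here is a minimal sketch of the same lookup written with -v
instead of splicing the shell variable into the program text. The sample
rows and the awk variable name "name" are made up for the illustration; only
the list layout (Last-Name First-Name Street-Address E-Mail) comes from the
description above.

```shell
#!/bin/sh
# Hypothetical sample data in the layout described above:
# Last-Name First-Name Street-Address E-Mail
list=$(mktemp)
cat > "$list" <<'EOF'
Smith Peter 1-Main-St psmith@example.org
Jones Mary 2-Oak-Ave mjones@example.org
Brown Peter 3-Elm-Rd pbrown@example.org
EOF

A=Peter

# -v copies the shell value into the awk variable "name" before the program
# runs; the program text stays inside a single pair of apostrophes.
result=$(awk -v name="$A" '$2 ~ name { print $1 }' "$list")

echo "$result"
rm -f "$list"
```

Quoting "$A" on the shell side keeps a value that contains spaces in one piece.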

I hope this answers Smokey01's question.

BFN.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

#4 Post by Peterm321 »

Code:

A=Peter;awk '$2 ~ /'$A'/ { print }' StampCollClub.lst

Interesting, though the code would need to be refined to assign IGNORECASE from a variable and take its value into account when pattern matching.

In fact there is more than one way to pass variables from a shell script to an AWK program.

One way is to pass a command argument and read it via ARGV[nnn]. smokey01 wanted to use a variable, so I presume this method is ruled out.

It would also be possible to export a variable and read its contents through AWK's ENVIRON["VARIABLE"] array.
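
A minimal sketch of the exported-variable route, under a few assumptions: CASECTL is a hypothetical variable name, and the case folding is done with tolower() rather than IGNORECASE so that the sketch runs in any awk, not only gawk.

```shell
#!/bin/sh
# CASECTL is a hypothetical name; export makes it visible to child
# processes such as awk.
CASECTL=1
export CASECTL

result=$(printf 'Uppercase\nuppercase\n' | awk '
BEGIN { nocase = ENVIRON["CASECTL"] + 0 }   # +0 forces a numeric value
{
    line = nocase ? tolower($0) : $0        # fold case only when asked to
    if (line ~ /uppercase/) print
}')

echo "$result"
```

With CASECTL=0 only the all-lowercase line is printed.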

And of course the main loop can dispense with the /<regexp>/ method of selecting which lines to print. Mostly I avoid the slash notation and instead move the logic into the main loop. Below is an example AWK program which I use to search multiple zip files for a search string.

It works by piping the output of unzip -l into the script, where the main loop reads it, and passing the name of the file being searched and the string to be matched via ARGV[]:

Code:

  unzip -l "$ZIPFILE" | searchzip.awk "$ZIPFILE" "$SEARCHSTRING"

Code:

#!/bin/awk -f
#
# Case insensitive index function:
#
function indexi(p1,p2) {
 return int(index(toupper(p1),toupper(p2)));
}
#
BEGIN {
 IGNORECASE=1;
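 fname=ARGV[1];         # name of the zip file, used only for reporting
 searchstring=ARGV[2];  # string to look for; "a,b" means both a and b must match
 ARGC=1;                # hide both arguments so awk reads stdin instead of opening them as files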
 multimatch=0;
 if (searchstring ~ /,/ ) {
    multimatch=1;
    split(searchstring,search_string2,",");
    }
 
} # E N D (BEGIN)

 
{
  if (multimatch) {
     if (indexi($0,search_string2[1]) > 0 && indexi($0,search_string2[2]) > 0) 
              { printf("\nFN= %s,\t%s",fname,$0); }
     } else {
  
      if (indexi($0,searchstring) > 0 ) 
              { printf("\nFN= %s,\t%s",fname,$0); }
    }
}

Moving the selection decision from the /<regexp>/ notation into the main loop makes it possible to take more complicated decisions based on the data, including whether a search should be case sensitive or not.

#5 Post by smokey01 »

Thanks guys but unfortunately it didn't help, maybe I'm a little slow on the uptake. This is the last hurdle for a little script I'm working on.

I'm trying to use a gtkdialog <checkbox> to signal a 0 or 1 to IGNORECASE.

It's just not working for me.

Thanks

some1
Posts: 117
Joined: Thu 17 Jan 2013, 11:07

#6 Post by some1 »

smokey: Try this

Code:

#!/bin/sh

#MYSHELLCASECTL=0 # obey case

MYSHELLCASECTL=1 # ignore case
echo "smokeyscase
sMoKeYsCaSe"|awk -v myawkcasevar=$MYSHELLCASECTL 'BEGIN{IGNORECASE=myawkcasevar+0;}
/smokeyscase/{print;}' #>outRES

#expected
#MYSHELLCASECTL=0 -> smokeyscase
#
#MYSHELLCASECTL=1 -> smokeyscase
#                 -> sMoKeYsCaSe

IGNORECASE=myawkcasevar+0

IGNORECASE has to be a number.
awk assumes that myawkcasevar is a string, so
we typecast myawkcasevar into a number by doing a numeric operation on it.
---------------
Musher0:

It's a bad, inefficient approach, injecting shell vars into awk code. Use the -v switch or read from ARGV instead.
---------------
Edit: Just edited a typo
Last edited by some1 on Wed 30 Aug 2017, 13:41, edited 1 time in total.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#7 Post by MochiMoppel »

Nice trick :D

BTW: There is a typo in your code. It should be gawk -v myawkcasevar=$MYSHELLCASECTL
(myawkcasevar corrects the typo, and gawk instead of awk is mandatory since IGNORECASE is gawk specific. Most Puppies symlink awk to gawk so you won't notice, but better be explicit.)

#8 Post by some1 »

MochiMoppel:
Yeah - you caught me while I was editing.

There are many awk versions out there, so awk code may or may not work on a given awk version.
But I did have gawk in mind, so if you know of some Puppies
which ONLY have the busybox awk, perhaps you should mention them.

#9 Post by smokey01 »

some1 wrote: smokey: Try this (...)

@some1, success. This is actually working. I can declare a variable external to gawk and it's seen inside gawk as long as I export it like:

export MYSHELLCASECTL=1

or

export MYSHELLCASECTL=0

If I place it in an if/then statement like this it fails:

Code:

case_yes () {
    MYSHELLCASECTL=0
}
export -f case_yes
#
case_no () {
    MYSHELLCASECTL=1
}
export -f case_no

The if/then is used by a gtkdialog <checkbox> in a menuitem like:

Code:

<menuitem label="Case Sensitive Search" checkbox="false">
	<variable>CHECKBOX</variable>
	<action>echo Checkbox is $CHECKBOX now.</action>
	<action>"if [ $CHECKBOX = true ]; then
		case_yes
		echo $MYSHELLCASECTL
	else
		case_no
		echo $MYSHELLCASECTL
	fi"</action>
	<variable>CHECKBOX</variable>
</menuitem>

There must be something going on here that I keep missing. My eyes have been hanging out of my head for a few days now.
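
One possible way around this, sketched under assumptions rather than taken from the finished script: if each <action> runs in its own shell, a variable assigned by case_yes or case_no in one action would be gone by the time the search runs. The sketch below avoids storing the flag at all and derives 0 or 1 from the checkbox state at the moment gawk is called. The function name search_case is hypothetical, and the checkbox value is assumed to be the string true or false.

```shell
#!/bin/sh
# search_case is a hypothetical helper: $1 = checkbox state ("true" means
# case sensitive), $2 = file to search. The pattern is fixed to /Uppercase/
# as in the first post.
search_case () {
    if [ "$1" = true ]; then
        nocase=0    # case sensitive search
    else
        nocase=1    # ignore case
    fi
    gawk -v flag="$nocase" 'BEGIN { IGNORECASE = flag + 0 }
/Uppercase/ { print }' "$2"
}

# Usage with the two-line test file from the first post
# (needs gawk; the block is skipped when gawk is not installed).
if command -v gawk >/dev/null 2>&1; then
    testfile=$(mktemp)
    printf 'Uppercase\nuppercase\n' > "$testfile"
    sensitive=$(search_case true "$testfile")
    insensitive=$(search_case false "$testfile")
    echo "case sensitive:   $sensitive"
    echo "case insensitive: $insensitive"
    rm -f "$testfile"
fi
```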

Thanks

#10 Post by musher0 »

some1 wrote: smokey: Try this (...)
#
---------------
Musher0:

It's a bad, inefficient approach, injecting shell vars into awk code. Use the -v switch or read from ARGV instead.
---------------
(...)

Hi some1.

Please provide proof that it is bad? "From-the-pulpit arguments" do not
work with me. It's called "code injection", and it's very efficient!

Please see answer #1 here for an example (towards the bottom of
the page).

@Smokey01:
I don't know if it's relevant to your problem, but bash "text variables" can
be made uppercase or lowercase at will. Hopefully you know this trick?

Code:

A=HELLO;echo "${A,,} --- ${A,}"

Result: hello --- hELLO

Code:

B=hello;echo "${B^^} --- ${B^}"

Result: HELLO --- Hello

Ref.: The Bash Manual (https://linux.die.net/man/1/bash/), the
paragraph entitled "case modification".

BFN.

#11 Post by MochiMoppel »

smokey01 wrote: If I place it in an if/then statement like this it fails:

What fails? Seems to work. But where does gawk fit in?

#12 Post by smokey01 »

@musher0, some1's code sorted the awk issue with IGNORECASE.

@MM I was having a checkbox issue but CatDude came up with a solution. Since then we have sorted a couple more problems. Almost ready to release.

Quite happy with the outcome.

Thanks all.

#13 Post by MochiMoppel »

@some1: musher0 asked a valid question about your flat rejection of variable injection.
IMO this is an important issue and needs some explanation from you or anybody else. I understand the potential security issue involved. Without knowing smokey01's code it's hard to say if this argument applies, but chances are that there is no risk at all. It would have made smokey01's task very easy. You also mentioned somewhere that variable injection into awk could create "speed bumps". I didn't understand that. Enlighten us :wink:

#14 Post by smokey01 »

MochiMoppel wrote: @some1: musher0 asked a valid question about your flat rejection of variable injection. (...)

Sorry I don't understand it either.
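
For what it is worth, here is a small sketch of the concern behind the injection remark. When a shell value is spliced into the program text, awk parses it as code, so a value that contains a slash can change what the program does. With -v the value arrives as data. The file contents and the value of A are invented for the demonstration, and the -v variant uses an exact string comparison to keep the contrast simple.

```shell
#!/bin/sh
list=$(mktemp)
printf 'Smith Peter\nJones Mary\n' > "$list"

# Invented value: harmless looking, but it closes the regex early.
A='x/ || 1 || /x'

# Spliced: awk parses  $2 ~ /x/ || 1 || /x/ { print $2 }
# The middle "1" is always true, so every line is printed.
spliced=$(awk '$2 ~ /'"$A"'/ { print $2 }' "$list")

# Passed with -v: the value is only ever compared as a string.
passed=$(awk -v name="$A" '$2 == name { print $2 }' "$list")

echo "spliced: $spliced"
echo "passed:  $passed"
rm -f "$list"
```

If the value always comes from the script itself, as with A=Peter, none of this can happen, which may be why the trick has been trouble-free in practice.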
