Puppy Linux Discussion Forum
All times are UTC - 4
sed scratch pad -- A thread of sed examples
Page 1 of 4 [51 Posts]
s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Sun 29 Dec 2019, 02:41    Post subject: sed scratch pad -- A thread of sed examples
Subject description: that might be hard to find online
 

I find sed very difficult to grasp. This thread is meant to demonstrate how to do things in sed for which examples may be hard to find online.

My first post is an example of how to call an external command from sed. Here is my example:

Code:

echo a | sed -ne 's/\(.*\)/echo a\1/' -e 'e' -e 'p'

or alternatively:
Code:

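echo a | sed -ne '
# Replace "a" with "echo aa"
s/\(.*\)/echo a\1/
# Execute the pattern space as a shell command; its output replaces the pattern space.
# (This comment sits on its own line because text after "e" on the same line is taken as the command to run.)
e
# Print the result
p
'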

The -n option is needed to keep sed from auto-printing. Otherwise sed would print the pattern space at the end of every cycle, in addition to what 'p' prints.

* The 's' denotes string substitution.
* The brackets "\(...\)" capture the text that matches the regular expression inside them. In our case the regular expression is .* which matches any string (here 'a'). The captured text can be retrieved with the back reference "\1". The backslash in front of each bracket isn't necessary if you use extended regular expressions (-r or -E). However, with extended regular expressions, literal characters such as ( ) + ? { | must then be escaped.

Next we execute the external command, which is the result of our substitution. In our case we are executing the external command echo aa. The "e" command means: execute the pattern space as a shell command and replace the pattern space with its output.

Finally we print the result with the 'p' command.

The output is "aa".
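As a slightly less trivial illustration of the same substitute-then-execute idea, here is a minimal sketch that assumes GNU sed (the "e" flag is a GNU extension). The helper name add_pairs and the sample input are made up for this example. The regular expression only accepts digits, so arbitrary input is never handed to the shell.

```shell
# Hypothetical helper: for each line of the form "N M", build the command
# "expr N + M" with s///, then let the e flag run it and replace the
# pattern space with the command's output.
# Assumes GNU sed; lines that do not match are printed unchanged.
add_pairs() {
  sed 's/^\([0-9][0-9]*\) \([0-9][0-9]*\)$/expr \1 + \2/e'
}

printf '1 2\n3 4\n' | add_pairs
# prints:
# 3
# 7
```

Each matching line starts one shell process, so this is fine for a handful of lines but a poor fit for large inputs.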

_________________
Find me on minds and on pearltrees.
MochiMoppel


Joined: 26 Jan 2011
Posts: 1977
Location: Japan

Posted: Sun 29 Dec 2019, 03:43    Post subject: Re: sed scratch pad -- A thread of sed examples
 

s243a wrote:
My first post is an example on how to call an external function in sed. Here is my example:
Code:

echo a | sed -ne 's/\(.*\)/echo a\1/' -e 'e' -e 'p'


Alternatively
Code:
echo a | sed -nr 's/(.*)/echo a\1/ep'
or even shorter
Code:
echo a | sed 's/.*/echo a&/e'

Beware that the e command is a GNU extension and most likely will only work with GNU sed. Does not work with busybox sed.
In my experience calling shell commands from within sed is very slow. Should probably be used only when no other alternatives exist.
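Since the e flag is not available everywhere, a script can probe for it before relying on it. The following is only a sketch; the function name sed_has_e is made up, and the probe simply checks whether a harmless "echo y" gets executed.

```shell
# Hypothetical helper: succeeds only if the sed found in PATH
# understands the GNU "e" flag of the s command.
sed_has_e() {
  [ "$(echo x | sed 's/x/echo y/e' 2>/dev/null)" = "y" ]
}

if sed_has_e; then
  echo "this sed supports the e flag"
else
  echo "this sed lacks the e flag; use a shell loop or awk instead"
fi
```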
s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Sun 29 Dec 2019, 20:37    Post subject: Re: sed scratch pad -- A thread of sed examples
 

Thanks for the tips and warnings. I want to hone my sed skills because sed is used a lot. This means learning both standard sed and its extensions.

s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Sun 29 Dec 2019, 20:47

I'm borrowing the next one, which simply numbers the lines of a file:
Quote:

sed '/./=' test | sed '/./N; s/\n/ /'

http://tuxthink.blogspot.com/2012/01/adding-line-numbers-to-file.html

I spent a fair bit of time trying to google better ways of doing this. While it can probably be done without a pipe, the code to do so is probably more complex. The tricky thing about this problem is that the "=" command prints the line number followed by a newline character, and it prints straight to the output rather than into the pattern space, so a substitution in the same script cannot join the number to its line.

In the above example the "/./" means "match any non-empty line". Said match (pun unintended) wouldn't be necessary if we wanted to match every line.

The syntax is address (e.g. /./ ) followed by command ( "=" ). When the address matches, the command is executed (i.e. the line number is printed, followed by a newline character).

The input file (i.e. test) is:
Quote:

Hi
How are
You.


The first sed command in the pipe outputs:

Quote:

1
Hi
2
How are
3
You.


The second sed command reads two lines and then replaces the newline character between them with a space. The second line is read with the "N" command, which appends the next line of input to the pattern space. When sed prints it automatically inserts a new line character at the end of the output, unless the "-z" option is used, in which case the null character (i.e. $'\0') is used instead of the new line character.

Many of the sed man pages don't mention that you can use the null character as the line separator. One way to perhaps do this in a single sed script is to use the "-z" option, but then there will be hidden null characters in the output.

As a final note, the fact that we need two sed commands to do this means that some other utility is probably preferable for this application. However, there may be times where one has good reason to pipe sed to sed, in which case this example might be a good starting point.
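To make the comparison concrete, here is a small sketch that wraps the two-sed pipeline in a function and puts an awk one-liner next to it. The function names are made up, and the awk version is just one possible alternative, not something taken from the linked page. Both read standard input, which also sidesteps the question of what to call the input file.

```shell
# Hypothetical helpers; both number only non-empty lines and keep
# empty lines as they are, using the real line number of each line.
number_with_sed() {
  # First sed prints "N" on its own line before each non-empty line;
  # second sed joins each number with the line that follows it.
  sed '/./=' | sed '/./N; s/\n/ /'
}

number_with_awk() {
  awk '/./ { print NR " " $0; next } { print }'
}

printf 'Hi\n\nHow are\nYou.\n' | number_with_sed
# prints:
# 1 Hi
#
# 3 How are
# 4 You.
```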


Last edited by s243a on Mon 30 Dec 2019, 17:31; edited 2 times in total
rockedge


Joined: 11 Apr 2012
Posts: 1539
Location: Connecticut, United States

Posted: Sun 29 Dec 2019, 21:02

thanks guys for the sed tips....I'm beginning to use it more often
MochiMoppel


Joined: 26 Jan 2011
Posts: 1977
Location: Japan

Posted: Sun 29 Dec 2019, 22:04

s243a wrote:
I'm borrowing the next one, which simply numbers the lines of a file

1) The code numbers only non-empty lines of a file. Intentionally?
2) The linked page shows the output having periods after the line numbers, which is not the output of this code
3) Not a mistake but still bad: Naming a file 'test' can lead to nasty errors since test is also the name of a shell command.

s243a wrote:
In the above example the "/./" means "match any line".
No, it means "match any line containing at least one character"
For matching any line the code could have used "/^/" or simply no match pattern at all:
Code:
sed = filename | sed 'N;s/\n/ /'

s243a wrote:
When sed prints it automatically inserts a new line character at the end of the output, unless the "-z" option is used, in which case the null character (i.e. $'\0') is used instead of the new line character.
???
It never adds a new line character at the end of the output and I doubt that the -z option would add a null character. Have you tried this?

s243a wrote:
Many of the sed man pages don't mention that you can use the null character as the line separator. One way to perhaps do this in a single sed script is to use the "-z" option, but then there will be hidden null characters in the output.
I assume that one reason for not mentioning this option is the fact that it's relatively new. My GNU sed version 4.2.1 knows nothing about it. My understanding is that it treats null characters in the input like it would treat linefeeds without this option. It would treat "real" linefeeds as normal characters. Neither null characters nor linefeeds would be stripped or changed for the output, unless explicitly changed by the code.
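A quick way to check this understanding is to feed sed a record that contains a linefeed and see whether a substitution can touch it. This is only a sketch and assumes a GNU sed recent enough to know -z (version 4.2.1 above does not); the helper name show_records is made up.

```shell
# Hypothetical helper: with -z, NUL ends a record and a linefeed is an
# ordinary character, so s/\n/,/g can replace linefeeds inside a record.
# tr then makes the NUL separators visible as "|".
show_records() {
  sed -z 's/\n/,/g' | tr '\000' '|'
}

printf 'a\nb\000c\n' | show_records; echo
# prints: a,b|c,
```

The NUL between the two records survives in sed's output, and no extra NUL is appended after the last record because the input did not end with one.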
s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Mon 30 Dec 2019, 17:39

MochiMoppel wrote:
s243a wrote:
I'm borrowing the next one, which simply numbers the lines of a file

1) The code numbers only non-empty lines of a file. Intentionally?
2) The linked page shows the output having periods after the line numbers, which is not the output of this code
3) Not a mistake but still bad: Naming a file 'test' can lead to nasty errors since test is also the name of a shell command.

s243a wrote:
In the above example the "/./" means "match any line".
No, it means "match any line containing at least one character"

Yes. I realized this after reading point "1" above. I suppose it is cleaner to not number empty lines.

Quote:
For matching any line the code could have used "/^/" or simply no match pattern at all:

Agreed.

s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Mon 30 Dec 2019, 18:02

MochiMoppel wrote:

s243a wrote:
When sed prints it automatically inserts a new line character at the end of the output, unless the "-z" option is used, in which case the null character (i.e. $'\0') is used instead of the new line character.
???
It never adds a new line character at the end of the output and I doubt that the -z option would add a null character. Have you tried this?

s243a wrote:
Many of the sed man pages don't mention that you can use the null character as the line separator. One way to perhaps do this in a single sed script is to use the "-z" option, but then there will be hidden null characters in the output.
I assume that one reason for not mentioning this option is the fact that it's relatively new. My GNU sed version 4.2.1 knows nothing about it. My understanding is that it treats null characters in the input like it would treat linefeeds without this option. It would treat "real" linefeeds as normal characters. Neither null characters nor linefeeds would be stripped or changed for the output, unless explicitly changed by the code.


We'll look into how sed actually works here later, but for now consider the following:

Code:

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -z p | tr '\0' '\n'; echo ""
a
a
b
b
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -z p; echo ""
aabb
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -nz p; echo ""
ab
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -nz p | tr '\0' '\n'; echo ""
a
b
[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne 's/\(.*\)/c\1/;p' | tr '\0' '\n'; echo ""
ca
cb


I need some time to ponder this and part of pondering it is figuring out how to properly test it.

Note that I had to use the printf command because bash apparently can't store a null character in a variable (or in any string).

BTW on dpup buster64 we have "sed (GNU sed) 4.7"


Edit: so considering the above here is how we can do it in a single sed command:

Code:

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p'
1a
2b


Of course there are hidden null characters here.

Code:

[root@dpupbuster64 ~] $ { echo -n a; printf '\0'; echo -n b; } | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p' | tr '\0' '.'
1.a
.2.b

MochiMoppel


Joined: 26 Jan 2011
Posts: 1977
Location: Japan

Posted: Tue 31 Dec 2019, 04:59

s243a wrote:
Note that I had to use the printf function
Instead of
{ echo -n a; printf '\0'; echo -n b; }
try
echo -ne "a\x00b"
s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Tue 31 Dec 2019, 12:03

That also worked.
Code:

# echo -ne "a\x00b" | sed -zne '=' -rne 's/([^0-9]+.*)/\1\n/;p' | tr '\0' '.'
1.a
.2.b


Thanks for the tip. :) Do you have any documentation on those kinds of codes with the echo command?

s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Tue 31 Dec 2019, 14:47

The next example, I will also borrow:
Code:

sed -e '/./{H;$!d;}' -e 'x;/Administration/!d' thegeekstuff.txt

https://www.thegeekstuff.com/2009/12/unix-sed-tutorial-7-examples-for-sed-hold-and-pattern-buffer-operations/

This example prints the paragraphs that contain the word Administration. The two ways to solve this problem that are apparent to me are:
1. Use the hold space, or
2. Use loops.

The above example uses approach #1. I will also try this in another post using approach #2.

The reason that we use the "hold space" here is that when sed reads the next line of input [1], as part of a new cycle, the previous line that is in pattern space is replaced by the line of text read in the next cycle (see the execution cycle documentation: http://info2html.sourceforge.net/cgi-bin/info2html-demo/info2html?(sed.info.gz)Execution%2520Cycle). The two ways around this, as noted above, are to either append the previous line to hold space before starting the next cycle, or alternatively use the "N" command to append the next line of text (after a newline) to the pattern space.

So anyway, recall that /./ matches non-blank lines. If there is a match, we use the "H" command to append the line we just read from the input (which is currently in pattern space) to the hold space. After this you'll notice "$!d", which means: if we are not at the last line, delete the pattern space and start the next cycle. See "Relations between d, p, and !" at:
https://www.grymoire.com/Unix/Sed.html

At first I wasn't sure of the point of the "d", since the next command (i.e. 'x') replaces the pattern space with the contents of the hold space anyway. The point is that "d" ends the cycle immediately, so 'x' is only reached on a blank line (the end of a paragraph) or on the last line of input. At that moment 'x' swaps the collected paragraph into the pattern space, and on a blank line it leaves the hold space empty for the next paragraph. The final action in the script is:
Code:

/Administration/!d


which means "delete the paragraph if it doesn't contain the word Administration".

Notes
---------------------
1. We call it the "next line of input" but the lines can be separated either by a new line character, or in the case of the "-z" option a null character. The -z option is only available in newer versions of sed.
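To watch the hold space at work, the command can be run on a tiny made-up input. The function name and the three sample paragraphs below are invented for this sketch; the sed script itself is the one quoted above.

```shell
# Hypothetical wrapper around the borrowed one-liner.
# H collects each non-blank line in the hold space; "$!d" ends the cycle
# early unless this is the last line; on a blank line (or the last line)
# x brings the collected paragraph into the pattern space, and
# "/Administration/!d" drops it if the word is missing.
paragraphs_with_admin() {
  sed -e '/./{H;$!d;}' -e 'x;/Administration/!d'
}

printf 'Linux\nAdministration\n\nDatabase\nOracle\n\nSecurity\nAdministration\n' |
  paragraphs_with_admin
```

The first and third paragraphs are printed and the Database paragraph is dropped. Each printed paragraph is preceded by an empty line, because H always inserts a newline before the text it appends, even when the hold space is empty.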

MochiMoppel


Joined: 26 Jan 2011
Posts: 1977
Location: Japan

Posted: Tue 31 Dec 2019, 22:40

s243a wrote:
Do you have any documentation on those kinds of codes with the echo command?
Not sure what you mean by "those kinds of codes". You'll find a good starting point right at your fingertips:
Code:
help echo
I recommend staying away from octal codes and always using hex codes, since with hex the syntax in bash echo, busybox echo and bash printf is the same. And unless you know what you are doing you should avoid abbreviated codes like '\0'. You'll always be safe when you use 3 digits for octal and 2 digits for hex.
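Whichever notation is used, it helps to look at the bytes that actually come out. The sketch below pipes the output through od; the helper name hexdump_bytes is made up. It uses the full three-digit octal form with printf because that form is part of POSIX printf, while, as far as I know, the \x00 hex form is understood by bash's printf but not by every /bin/sh.

```shell
# Hypothetical helper: show the bytes on stdin as one run of hex digits.
hexdump_bytes() {
  od -An -tx1 | tr -d ' \n'
}

printf 'a\000b' | hexdump_bytes; echo
# prints: 610062
```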
MochiMoppel


Joined: 26 Jan 2011
Posts: 1977
Location: Japan

Posted: Tue 14 Jan 2020, 05:47

Looks like another abandoned thread :'(

I'll give it a try anyway since I don't know where else to ask.
The challenge is to remove all comments from an XML/HTML document, using only sed.

Example text:
Code:
<JWM>
   <Tray  autohide="false" insert="right" x="0" y="-1" border="1" height="28" >
      <!-- Additional TrayButton attribute: label -->
      <TrayButton label="Menu" icon="logo-mini.png" border="true">root:3</TrayButton>
border="true">exec:urxvt</TrayButton>
      <Pager/>
      <!-- Additional TaskList attribute: maxwidth -->
      <TaskList maxwidth="200"/>
      <Dock/>
      <!-- Additional Swallow attribute: height -->
   <!--   <Swallow name="blinky">
         blinkydelayed -bg "#DCDAD5"
      </Swallow> -->
   <!--   <Swallow name="xtmix-launcher">
         xtmix -launch
      </Swallow> -->
   <!--   <Swallow name="asapm">
         asapmshell -u 4
      </Swallow> -->
   <!--   <Swallow name="freememapplet" width="34">
         freememappletshell
      </Swallow> -->
      <Swallow name="xload" width="32">
         xload -nolabel -bg "#888888" -fg red -hl white
      </Swallow>
      <Clock format="%H:%M">minixcal</Clock>
   </Tray>
</JWM>

The problem is that these comments can be multiline. My rough idea is to let sed move a line to the hold buffer when a '<!--' tag is detected, then continue to fill the hold buffer until a '-->' is detected, load the hold buffer into the pattern space and remove the comment, clear the hold buffer and continue with the next cycle. This may not be the right way, and I'm not even close to achieving the goal. Does anybody know how to do this?
s243a

Joined: 02 Sep 2014
Posts: 2468

Posted: Tue 14 Jan 2020, 09:48

I have to go to work, so for now just a rough, untested sketch, something like:
Code:

#If we don't yet have a terminating comment just append to the hold space and start the next cycle.
/.*-->.*/!{
  H #Append pattern space to hold space
  d #Delete pattern space and start next cycle
  }
#If we have a closing comment append data to hold space and copy the hold space to the pattern space to see if we can match both an opening and closing comment in pattern space.
/.*-->.*/ {
    H #Append new data to hold space 
    x #Exchange hold space with pattern space
    h #Copy pattern space to hold space
  }
#If we get here with a complete comment in pattern space, the previous block has just run.
/.*<!--.* -->.*./ {
    s/<!--.* -->// #Delete comment
    p #Print pattern space
    s/.*//g #delete pattern space
    x #exchange pattern space with hold space
    d #delete pattern space and start next cycle.
  }


I might test this later. We'll see.
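For comparison, here is a minimal sketch of a different technique: instead of the hold space it uses an "N" loop to pull lines into the pattern space until the closing '-->' shows up. It is not the approach outlined above and makes simplifying assumptions: the substitution is greedy, so two comments on the same line are removed together with any text between them, and a second comment that opens on the line where another one closes is not handled. The function name strip_comments is made up.

```shell
# Hypothetical helper: remove <!-- ... --> comments, including multi-line
# ones, and drop lines that are left with only whitespace.
strip_comments() {
  sed '
    :a
    /<!--/ {
      # Comment opened but not yet closed: append the next line and retry.
      /-->/! {
        $! {
          N
          ba
        }
      }
      # The whole comment is now in the pattern space; cut it out.
      s/<!--.*-->//
      # If only indentation is left, print nothing for this line.
      /^[[:space:]]*$/d
    }
  ' "$@"
}

printf '<a>\n  <!-- one -->\n  <b/>\n  <!-- two\n       lines -->\n  <c/>\n</a>\n' |
  strip_comments
# prints:
# <a>
#   <b/>
#   <c/>
# </a>
```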

6502coder


Joined: 23 Mar 2009
Posts: 663
Location: Western United States

Posted: Tue 14 Jan 2020, 12:24

Isn't this essentially the same as the problem of using sed to remove comments from a C program, for which Googling turns up a bunch of suggestions? I haven't looked into this carefully, just making an observation.