Puppy Linux Discussion Forum
PCRE -- Perl Compatible Regular Expressions
s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sat 07 Mar 2020, 20:55    Post subject:  PCRE -- Perl Compatible Regular Expressions  

In this thread, let's give some examples of interesting Perl-compatible regular expressions and some tips on how to understand them.

We of course could use either perl or ssed (see post). Here's an example. Consider:
Code:

a.so.b


I want to match some text that does not contain ".so.", followed by ".so.", followed by "b". Here is my attempt, which appears to work:

Code:

^((?![.]so[.]).)*([.]so[.])(.*)$

https://regex101.com/r/aUa8Ml/1/
**This uses a negative lookahead. See:
https://www.regular-expressions.info/lookaround.htm
https://www.regular-expressions.info/refadv.html

Now say we don't know whether there will be a ".so.b". Then we might try this:

Code:

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$

https://regex101.com/r/aUa8Ml/2

but with the test string "ab" it doesn't work. It only matches "b"... well, sort of. The full match is "ab", but there is only one capture group, which contains "b". So what am I missing?

Here are some related links:
https://stackoverflow.com/questions/7520704/sed-can-my-pattern-contain-an-is-not-character-how-do-i-say-is-not-x
https://unix.stackexchange.com/questions/145773/reverse-match-in-sed-replace-opposite-of-what-was-found
https://stackoverflow.com/questions/977251/regular-expressions-and-negating-a-whole-character-group
https://www.perlmonks.org/?node_id=229044/
https://stackoverflow.com/questions/23403494/perl-matching-string-not-containing-pattern

_________________
Find me on minds and on pearltrees.

Last edited by s243a on Sat 07 Mar 2020, 21:07; edited 2 times in total
s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sat 07 Mar 2020, 21:02    Post subject:  

Anyway, once you have a good grasp of Perl-compatible regular expressions (PCRE), note that they can be used in several places. For instance, grep can use PCRE with certain options, you can use ssed (see post) instead of sed, or you can write Perl one-liners.

For example:
Code:

# echo abc | perl -pe 's/(ab)/12/'
12c


See:
https://stackoverflow.com/questions/22729336/convert-sed-one-liner-to-perl

I'm pretty sure Puppy typically comes with a minimal amount of Perl functionality out of the box, so I presume you can do something like the above without installing anything extra.
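As a small sketch of the "perl one-liner instead of grep" idea, the following filters lines with a Perl regex. The function name versioned_libs and the sample file names are made up for illustration. GNU grep's -P option does a similar job where grep was built with PCRE support, but that is not assumed here.

```shell
# Keep only lines that contain ".so." followed by a digit,
# e.g. versioned shared library names (hypothetical helper).
versioned_libs() {
  perl -ne 'print if /[.]so[.]\d/'
}

printf '%s\n' libfoo.so.1 libbar.a libbaz.so.2.3 | versioned_libs
# prints:
# libfoo.so.1
# libbaz.so.2.3
```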

s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sat 07 Mar 2020, 21:21    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  

s243a wrote:


Now say we don't know whether there will be a ".so.b". Then we might try this:

Code:

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$

https://regex101.com/r/aUa8Ml/2

but with the test string "ab" it doesn't work. It only matches "b"... well, sort of. The full match is "ab", but there is only one capture group, which contains "b". So what am I missing?

The regular expression tester (above link) gave me a good clue. It says that a repeated capture group only captures the last iteration; to capture the whole thing, put a capture group around the repeated group. This is what I came up with:

Code:

(^(?:(?![.]so[.]).)+)(?:([.]so[.])(.*))?$

https://regex101.com/r/aUa8Ml/3

In the test string "a.so.b" we have:

Code:

Full match   0-6   a.so.b
Group 1.   0-1   a
Group 2.   1-5   .so.
Group 3.   5-6   b


and with the test string "ab" we have:

Code:

Full match   0-2   ab
Group 1.   0-2   ab


Does this look like a good regular expression or can someone find issues with it?


Last edited by s243a on Sat 07 Mar 2020, 22:54; edited 1 time in total
s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sat 07 Mar 2020, 22:06    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  

Now here is a way to assign capture groups #1 and #3 to two variables, "a" and "b":

Code:

# read -d '' a b < <(echo a.so.b | perl -pe 's/(^(?:(?![.]so[.]).)+)(?:([.]so[.])(.*))?$/$1\n$3/')
# echo $a
a
# echo $b
b

**The above works in bash but not in ash, since ash doesn't support process substitution.

s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sun 08 Mar 2020, 00:18    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  


I wrote some code to do the same thing as above, but without using regular expressions. I'm not sure which approach is faster and/or more readable.

Code:

function split_on_so(){
  local str=$1
  local len=${#str}
  local len_m=$((len-1))
  local index
  local s1
  local s2
  local p1
  local p2

    ind=$(expr index $str .so)
    len=${#str}
    [ $ind -eq 0 ] && ind=$len
    p1=$((ind-1))
    s1=${str:0:$p1}
    if [ $ind -lt $len ]; then
      p2=$((ind+2))
      if [ ${str:p2:1} = '.' ]; then
        p2=$((p2+1))
      fi
    else
      p2=len
    fi
    s2=${str:$p2}

  echo "$s1"
  echo "$s2"
}

s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sun 08 Mar 2020, 00:40    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  


Here's a slightly improved regular expression:

Code:

# echo a.so.b | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
a
b
# echo a.so | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
a

# echo ab | perl -pe 's/(^(?:(?![.]so[.]?).)+)(?:([.]so[.]?)(.*))?$/\1\n\3/'
ab


What I added was the question mark at the end of "[.]so[.]?", so that it also works with an input such as "a.so".

MochiMoppel


Joined: 26 Jan 2011
Posts: 2016
Location: Japan

PostPosted: Sun 08 Mar 2020, 11:01    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  

s243a wrote:
I wrote some code to do the same thing as above
Not the same. The latter may produce wrong results. Hint: Check the documentation for the expr command:
  "index STRING CHARS   Index in STRING where any CHARS is found, or 0"
Try 's.so.b' or 'o.so.b' to see what "any CHARS" means.
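To illustrate the hint: the sketch below assumes an expr that supports the "index" operator (GNU coreutils and BusyBox do; it is not part of POSIX expr). The CHARS argument is treated as a set of characters, not as a substring.

```shell
# ".so" is read as the set of characters ".", "s" and "o";
# expr prints the 1-based position of the first one found.
expr index a.so.b .so   # prints: 2  (the "." happens to start ".so.")
expr index s.so.b .so   # prints: 1  (the leading "s" is in the set)
expr index o.so.b .so   # prints: 1  (the leading "o" is in the set)
```

So the result only points at ".so" when none of the characters ".", "s" or "o" appears earlier in the string.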

Quote:
I'm not sure which approach is faster
The latter

Quote:
and/or more readable.
Laughing
s243a

Joined: 02 Sep 2014
Posts: 2620

PostPosted: Sun 08 Mar 2020, 17:27    Post subject: Re: PCRE -- Perl Compatible Regular Expressions  

MochiMoppel wrote:
s243a wrote:
I wrote some code to do the same thing as above
Not the same. The latter may produce wrong results. Hint: Check the documentation for the expr command:
  "index STRING CHARS   Index in STRING where any CHARS is found, or 0"
Try 's.so.b' or 'o.so.b' to see what "any CHARS" means.


It should be fixed now:
Code:

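function split_on_so(){
  local str=$1
  local len=${#str}
  local s1    # text before the first ".so"
  local s2    # text after ".so" (and one optional following ".")
  local p1
  local p2

    s1=${str%%.so*}
    p1=${#s1}

    if [ "$p1" -lt "$len" ]; then
      p2=$((p1+3))                        # skip past ".so"
      if [ "${str:p2:1}" = '.' ]; then    # quoted so an empty substring is safe
        p2=$((p2+1))
      fi
    else
      p2=$len                             # no ".so" found: s2 will be empty
    fi
    s2=${str:$p2}

  echo "$s1"
  echo "$s2"
}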
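For comparison, here is a variant that avoids the bash-only substring syntax and relies on POSIX parameter expansion alone. It is a sketch with a made-up name (split_on_so_posix), intended to print the same two lines as the function above.

```shell
# Split on the first ".so" (plus one optional following ".") using
# only POSIX parameter expansion (hypothetical alternative).
split_on_so_posix() {
  case $1 in
    *.so*)
      s1=${1%%.so*}      # drop the first ".so" and everything after it
      s2=${1#*.so}       # drop everything up to and including the first ".so"
      s2=${s2#.}         # drop one following ".", if present
      ;;
    *)
      s1=$1              # no ".so": everything goes to the first line
      s2=
      ;;
  esac
  printf '%s\n%s\n' "$s1" "$s2"
}

split_on_so_posix a.so.b   # prints "a", then "b"
split_on_so_posix s.so.b   # prints "s", then "b"
split_on_so_posix ab       # prints "ab", then an empty line
```

Since s1 and s2 are not declared local (POSIX sh has no "local"), they are left set in the calling shell.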



Quote:

Quote:
I'm not sure which approach is faster
The latter

Quote:
and/or more readable.
Laughing


I suppose then I'll have to find a better application of PCRE.

GustavoYz


Joined: 07 Jul 2010
Posts: 895
Location: .ar

PostPosted: Wed 25 Mar 2020, 14:19    Post subject:  

Quote:

Code:

^((?![.]so[.]).)*(?:([.]so[.])(.*))?$


but with the test string "ab" it doesn't work. It only matches "b"... well, sort of. The full match is "ab", but there is only one capture group, which contains "b". So what am I missing?


Not sure I understood the problem correctly, but by the time you reach "b" you're overwriting
the capture group at $1 (it held "a" first, and then the next iteration of the repeated group replaced it with "b").

Using your expression, I think this works:

Code:
^((?![.]so[.]).+?)(?:([.]so[.])(.*))?$


It captures anything before '.so.' in $1, '.so.' in $2 and anything after it in $3.
If there is no '.so.', everything goes to $1. However, be aware that you can expect a
bazillion backtracks if the input string is something like 'bunchoftextcuzwhynot.so.nowsomemore'.

If you're using Perl, I'd recommend Regexp::Debugger, which is awesome and has a nice
interface that shows you the steps of the matching process.