Geany - automate "replace" with regular expressions??

greengeek
Posts: 5789
Joined: Tue 20 Jul 2010, 09:34
Location: Republic of Novo Zelande

#1 Post by greengeek

Is it possible to use a script to automate some of Geany's inbuilt editing functions so that I don't have to work through a long file manually driving the find/replace function?

********************************************************************
EDIT : Here is an important more recent note from technosaurus:
technosaurus wrote: Just thought I would mention that as of 1.25, Geany (released this month) has a checkbox to allow multiline regex matching; otherwise it uses sed-style matching.
Quoted from other thread here
********************************************************************

Continued original thread:
I use Geany to edit a bunch of text messages that I copy off my cellphone. First I copy the two data lines out of each backed-up text message, then I use the "replace" function to get rid of the extra words/numbers/jargon/formatting etc. that accompany each message, so that I get down to something tidy and easily readable.

I have learned how to set the "use regular expressions" option in the "replace" menu, so now I can search for word strings and insert/delete line feeds, tabs etc., but it would be nice to be able to automate this in a script of some sort.

Here is an example of the sort of data formatting I am working with:

Code:

Date:02.06.2015 10.12.35
TEXT:Hey,did you find the 2nd wheel?
Date:02.06.2015 10.14.17
TEXT:Look in the garage
Date:02.06.2015 10.16.00
TEXT:Its definitely there!!

And here is an example of what I want my formatting to achieve:

Code:

02.06	10.12.35		Hey,did you find the 2nd wheel?
02.06	10.14.17		Look in the garage
02.06	10.16.00		Its definitely there!!

I would like to know if there is some way to use a script to open Geany, have it access my file, do several "replace" operations, then save the new file. Here are the notes I have made for myself to follow when I do the processing manually:

Code:

Open the Geany 'replace' menu then turn on "Replace all - in document". Next turn on "regular expressions" in the replace dialog and then do the following steps:

1) Replace Date: with \			(\ is the 'regular expression' which means backspace)
2) Replace .2015(followed by space) with \t (\t is the 'regular expression' that means tab)
3) Replace \nTEXT: with \t\t	(\n means "the newline or linefeed preceding TEXT:") (\t\t means double tab)
This method of using regular expressions does let me edit the document exactly as I want manually, but I would like to have an automatic method to concatenate several replace functions one after the other, so that I can watch TV while Geany does the work of editing what will be a long file.
(Maybe there is a better way than Geany? - sed seems like an appropriate method but I struggle with the level of technical accuracy that sed requires so I do feel more comfortable with Geany at this stage)
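For comparison, here is a minimal sketch of the three manual steps above chained into one sed command, applied in the same order. It assumes GNU sed (for \t in the replacement) and that every Date: line is immediately followed by its TEXT: line; the function name sms_tidy is made up for the example. The extra N command comes first because sed normally works on one line at a time, and step 3 needs the line break between the Date: and TEXT: lines to be inside the text being edited.

```shell
#!/bin/bash
# Hypothetical helper: the three Geany "Replace all" steps, run in order by sed.
# Assumes GNU sed (\t in replacements) and strict Date:/TEXT: line pairs.
#   N               join each Date: line with the TEXT: line below it
#   s/^Date://      step 1: delete "Date:"
#   s/\.2015 /\t/   step 2: ".2015 " becomes a tab
#   s/\nTEXT:/\t\t/ step 3: line break + "TEXT:" becomes two tabs
sms_tidy() {
    sed -e 'N' \
        -e 's/^Date://' \
        -e 's/\.2015 /\t/' \
        -e 's/\nTEXT:/\t\t/' \
        "$@"
}

# Example with two messages from the post above
printf 'Date:02.06.2015 10.12.35\nTEXT:Hey,did you find the 2nd wheel?\nDate:02.06.2015 10.14.17\nTEXT:Look in the garage\n' | sms_tidy
```

Something like sms_tidy msgs > result should then write the tidied messages to a new file.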

Here is the typical 'replace' menu I am using in Geany (with "regular expressions" ticked):
Attachment: Replace_regular_expressions.jpg
Last edited by greengeek on Wed 22 Jul 2015, 19:36, edited 1 time in total.

6502coder
Posts: 677
Joined: Mon 23 Mar 2009, 18:07
Location: Western United States

#2 Post by 6502coder

Better to use something like sed or awk for this.

I wrote an awk script named "dt.awk" to do the text processing you seem to want, while passing everything else through unchanged. The file "data.txt" is my guinea pig to test it out.

Code:

$ cat data.txt
Date:02.06.2015 10.12.35
TEXT:Hey,did you find the 2nd wheel?
Foo candy remains onto
Date:02.06.2015 10.14.17
TEXT:Look in the garage
jabba the hutt Date:green
Date:02.06.2015 10.16.00
TEXT:Its definitely there!!     
remarkably sound TEXT:mark
all right now

$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print
}

$ awk -f dt.awk data.txt > data2.txt

$ cat data2.txt
02.06.2015      10.12.35        Hey,did you find the 2nd wheel?
Foo candy remains onto
02.06.2015      10.14.17        Look in the garage
jabba the hutt Date:green
02.06.2015      10.16.00        Its definitely there!!  
remarkably sound TEXT:mark
all right now

01micko
Posts: 8741
Joined: Sat 11 Oct 2008, 13:39
Location: qld

#3 Post by 01micko

no need for geany or sed, just shell (ash) and /bin/echo

Code:

#!/bin/ash

while read line;do
	if echo "$line" | grep -q "^Date";then
		dline=${line#*:}
		dline=${dline%% *}
		date=${dline%.*}
		echo -n "$date" >> result
		time=${line##* }
		echo -en "\t $time" >> result
	else
		text=${line#*:}
		echo -e "\t\t $text" >> result
	fi
done < msgs

PS: you won't have time to watch TV, as a largish file should be done in seconds.
Puppy Linux Blog - contact me for access
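The script above cuts each Date: line apart with POSIX parameter expansion: ${var#pattern} removes the shortest matching prefix, ## the longest prefix, % the shortest suffix and %% the longest suffix. A small sketch tracing those four expansions on one sample line from the first post (the variable names are the ones the script uses):

```shell
#!/bin/sh
# Trace the expansions from the ash script on one Date: line.
line='Date:02.06.2015 10.12.35'

dline=${line#*:}     # shortest prefix up to ':' removed  -> 02.06.2015 10.12.35
dline=${dline%% *}   # longest suffix from ' ' removed    -> 02.06.2015
date=${dline%.*}     # shortest suffix from '.' removed   -> 02.06
time=${line##* }     # longest prefix up to ' ' removed   -> 10.12.35

printf 'dline=%s date=%s time=%s\n' "$dline" "$date" "$time"
# prints: dline=02.06.2015 date=02.06 time=10.12.35
```

Because these are shell built-ins, no external program is started per line, which is part of why the loop runs quickly.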

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#4 Post by MochiMoppel

greengeek wrote: I would like to have an automatic method to concatenate several replace functions
Not possible with Geany. But since you have already mastered the most difficult task, creating regex patterns, all that is left is to find a tool to apply these patterns to your text. If you have come this far, sed is simple. In your example you don't even need regular expressions (you could use Geany's option "Use escape sequences"), and you could do it in pure bash (see below).
01micko wrote: no need for geany or sed, just shell
Do I see grep? Me thinks you are cheating :lol:

Here is a way to use "just shell":

Code:

#!/bin/bash
IN=$(cat msgs.txt)
OUT=${IN//Date:/}
OUT=${OUT//.2015 /$'\t'}
OUT=${OUT//$'\n'TEXT:/$'\t\t'}
echo "$OUT" > msgs_formatted.txt
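Note that ${var//pattern/replacement} is a bash feature (plain POSIX sh doesn't have it) and that the pattern is a shell glob, not a regular expression. Bracket expressions still work in globs, so a sketch like this one could avoid hard-coding the year 2015. It assumes bash, and the function name tidy_msgs is hypothetical:

```shell
#!/bin/bash
# Hypothetical variant of the script above: glob patterns instead of a fixed year.
tidy_msgs() {
    local IN OUT
    IN=$(cat "$@")                      # whole input (file arguments or stdin)
    OUT=${IN//Date:/}                   # 1) delete every "Date:"
    OUT=${OUT//.20[0-9][0-9] /$'\t'}    # 2) ".YYYY " -> tab (any year 2000-2099)
    OUT=${OUT//$'\n'TEXT:/$'\t\t'}      # 3) newline before "TEXT:" -> two tabs
    printf '%s\n' "$OUT"
}

printf 'Date:02.06.2016 09.00.01\nTEXT:New year, same wheel\n' | tidy_msgs
```

Usage would be tidy_msgs msgs.txt > msgs_formatted.txt. The dot in the pattern is literal here, since globs have no "any character" dot.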

01micko

#5 Post by 01micko

MochiMoppel wrote:
01micko wrote: no need for geany or sed, just shell
Do I see grep? Me thinks you are cheating :lol:
Indeed!

Still not pure shell, but printf is a busybox applet.

Code:

#!/bin/ash

while read line;do
	if [ "${line%:*}" = "Date" ];then
		dline=${line#*:}
		dline=${dline%% *}
		date=${dline%.*}
		time=${line##* }
		printf "${date}\t${time}\t\t" >> result
	else
		text=${line#*:}
		echo "$text" >> result
	fi
done < msgs

MochiMoppel

#6 Post by MochiMoppel

If it really has to be Geany: Geany provides the option of sending text through custom commands. The manual gives an example of how to build a Replace All command using sed. You can build on that and create more complex commands, but it all boils down to using shell scripting syntax.

Using my previous script you can try it:
Go to Edit > Format > Send selection to > Set custom commands
Double-click on an empty command slot (probably command 1) and paste this glorious one-liner:

Code:

/bin/sh -c "IN=\"$(</dev/stdin)\";OUT=${IN//Date:/};OUT=${OUT//.2015 /$'\t'};OUT=${OUT//$'\n'TEXT:/$'\t\t'};echo -n \"$OUT\""

Preferences > Key bindings lets you define keyboard shortcuts for custom commands 1 to 3. I used <Primary>1, which translates to Ctrl+1.
Now you can select your messages, hit Ctrl+1, and the custom command will do the rest.

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#7 Post by seaside

Here's a sed one-liner:

Code:

sed  -e 'N;s/\(.*\)\n\(.*\)/\1\2/' -e 's/Date:/ /' -e 's/TEXT:/ /' -e 's/\.[0-9]\{4\}//' msgs

It just appends the line below to the one above and removes unwanted items.

Cheers,
s

greengeek

#8 Post by greengeek

Wow! Thank you all for the options. I am working my way through them now; I have tried the first three so far and they are all doing the business really well. This is whetting my appetite to extend the script to pick the relevant text lines out of a whole directory full of individual sms text files. This is going to be a real time saver.

I will report back with a summary of which option I find most usable and customisable for my future needs.
cheers!
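A possible starting point for the whole-directory idea, sketched under some assumptions: each backup file holds its message as a Date: line followed by a TEXT: line (any other lines are ignored), the files sort into the right order by name, and GNU grep and GNU sed are available. grep -h pulls just those lines out of every file without prefixing file names, and sed then joins and reformats each pair. The function and directory names are hypothetical.

```shell
#!/bin/sh
# Hypothetical: gather Date:/TEXT: lines from every file in a directory (in
# glob order) and print them as date<TAB>time<TAB><TAB>text.
# Assumes GNU grep and GNU sed, and that each Date: line precedes its TEXT: line.
collect_sms() {
    dir=$1
    # grep: -h = no file-name prefixes, -E = extended regex, keep the 2 line types
    # sed:  N joins each Date: line with the following TEXT: line, year dropped
    grep -h -E '^(Date|TEXT):' "$dir"/* |
        sed -E 'N;s/^Date:([0-9]+\.[0-9]+)\.[0-9]+ ([0-9.]+)\nTEXT:/\1\t\2\t\t/'
}
```

Usage might look like collect_sms sms_backup > result.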

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#9 Post by technosaurus

Geany can still do multiple matches; use this:

Code:

Date:([.0-9]*)[.][0-9]* ([.0-9]*).*\nTEXT:

Code:

\1 \2 
each parenthesized group is a match, which you can use in the replacement as \1 \2, similar to $1 $2 in shell
... regexes are about 1000% more useful with this 1 feature
sed can do the same thing, but I thought it might be helpful to others in Geany, because I use it all the time
... especially when I am dealing with stuff that spans multiple lines (where sed is less effective)
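Why the .2015 disappears: only what sits inside the two parenthesized groups is copied into the replacement. In this pattern, [.][0-9]* swallows the .2015, and .*\nTEXT: swallows the rest of the Date: line, the line break and TEXT:, all outside the groups, so they are simply dropped. A sketch of the same pattern outside Geany, assuming GNU sed (-E for extended regex, \n and \t escapes) and using tabs in the replacement instead of spaces; the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical wrapper around the regex above, using GNU sed.
#   \1 = date without the year (e.g. 02.06)    \2 = time (e.g. 10.16.00)
# N puts each Date: line and the following TEXT: line into one pattern space,
# so \n in the pattern can match the line break between them.
regex_tidy() {
    sed -E 'N;s/Date:([.0-9]*)[.][0-9]* ([.0-9]*).*\nTEXT:/\1\t\2\t\t/' "$@"
}

printf 'Date:02.06.2015 10.16.00\nTEXT:Its definitely there!!\n' | regex_tidy
```

In Geany itself, the extra tab before the text field would presumably just mean replacing the spaces in \1 \2 with \t characters, with escape handling enabled.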
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

greengeek

#10 Post by greengeek

6502coder wrote:

Code:

$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print
}

I am now trialling this method against the original, unedited sms text files, which contain other text lines that I edited out and did not show in my first post above. You mentioned that this code will be "passing everything else through unchanged", which I have decided I need to avoid. I have tried modifying the last line of the code as follows:

Code:

$ cat dt.awk
{   if (substr($1, 1, 5) == "Date:")
    {
        printf( "%s\t%s", substr($1,6), $2);
    }
    else if (substr($1, 1, 5) == "TEXT:")
    {
        printf( "\t%s\n", substr($0,6));
    }
    else
        print > /dev/null
} 

This gives me the result I need in data2.txt, although it also creates a file called "0" in /root (where I am doing my testing). Is there a better way for me to dump the extra unneeded data rather than using > /dev/null?

6502coder

#11 Post by 6502coder

greengeek wrote: Is there a better way for me to dump the extra unneeded data rather than using > /dev/null?
Yes, simply delete the two lines:

Code:

     else
            print

That "0" file was being created by your "print > /dev/null", which awk does not read as "send this to /dev/null" (awk syntax is not the same as shell syntax).
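For the curious, the reason a file called "0" appears: an unquoted /dev/null after > is not a file name to awk. It parses /dev/ as a regular expression matched against the current line (giving 0 or 1) and null as an unset variable (an empty string), and the joined result, "0" for lines that don't contain "dev", becomes the output file name. If output ever does need to be discarded explicitly in awk, the path has to be a quoted string. A sketch of the dt.awk logic rewritten as awk pattern-action rules, with sample lines taken from the test file earlier in the thread (the function name keep_pairs is made up):

```shell
#!/bin/sh
# Sketch: dt.awk logic as pattern-action rules, with unmatched lines discarded.
keep_pairs() {
    awk '
        substr($1, 1, 5) == "Date:" { printf("%s\t%s", substr($1, 6), $2); next }
        substr($1, 1, 5) == "TEXT:" { printf("\t%s\n", substr($0, 6)); next }
        { print > "/dev/null" }   # quoted string, so this is a real file path
    ' "$@"
}
# Deleting the last rule entirely gives the same result, because awk prints
# nothing for a line that no rule acts on.

printf 'Date:02.06.2015 10.12.35\nTEXT:Look in the garage\nall right now\n' | keep_pairs
```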

technosaurus

#12 Post by technosaurus

technosaurus wrote: Geany can still do multiple matches; use this:
Date:([.0-9]*)[.][0-9]* ([.0-9]*).*\nTEXT:
\1 \2
each parenthesis is a match, which you can use similar to $1 $2 in shell
... regexes are about 1000% more useful with this 1 feature
sed can do the same thing, but thought it may be helpful to others in geany, because I use it all the time
... especially when I am dealing with stuff that spans multiple lines (where sed is less effective)
Has anyone tried the parentheses matches?

greengeek

#13 Post by greengeek

technosaurus wrote: Has anyone tried the parentheses matches?
I haven't yet - I'm still working my way through each of the options suggested. I will get through them all eventually.

As I try each one I am also trying to extend the functionality a bit to handle the original "raw" and unedited sms file format, so it's going to take me a while...

EDIT: Now that I look at your post again, I can see that I misinterpreted it at first. I thought you were extending Mochi's custom commands post (which I have not fully understood yet), but now I see that your strings go into the "replace" dialog fields.

I just tried it and it works! Although I could do with another tab just before the text field. And I simply cannot figure out how you managed to dump the .2015.

The power of these regexes leaves me speechless. I'm finding this stuff really interesting!
Attachment: geany_multiple_regex.jpg
Last edited by greengeek on Mon 29 Jun 2015, 08:29, edited 1 time in total.

MochiMoppel

#14 Post by MochiMoppel

technosaurus wrote: Has anyone tried the parentheses matches?
Yes. Works nicely, but needs a very good command of regex syntax. Maybe a bit too complex for simple search & replace operations.

greengeek

#15 Post by greengeek

01micko wrote:

Code:

#!/bin/ash

while read line;do
	if echo "$line" | grep -q "^Date";then
		dline=${line#*:}
		dline=${dline%% *}
		date=${dline%.*}
		echo -n "$date" >> result
		time=${line##* }
		echo -en "\t $time" >> result
	else
		text=${line#*:}
		echo -e "\t\t $text" >> result
	fi
done < msgs

Code:

#!/bin/ash

while read line;do
   if [ "${line%:*}" = "Date" ];then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      time=${line##* }
      printf "${date}\t${time}\t\t" >> result
   else
      text=${line#*:}
      echo "$text" >> result
   fi
done < msgs 

The first time I tried these they seemed to work OK, but now when I try them again it seems that they do not process the final text field. No matter how many Date/TEXT lines I add, the last TEXT field is always missing. Is it just me?

MochiMoppel

#16 Post by MochiMoppel

OK, let's throw in another one. VERY primitive, but easy to master. It finds multiple strings, separated by '|'. The '|' means 'or', and the order of the search strings is not important. Unfortunately only 1 replace string can be defined, so it's probably most suited for Find & Delete operations.
Attachment: MultipleDeletes.png
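The same '|' alternation also works in sed's extended regular expressions, so this find-and-delete idea can be scripted as well. A minimal sketch, assuming a sed that supports -E (GNU sed does); the strings are the ones from the first post, and the function name multi_delete is hypothetical. The g flag deletes every match on a line, not just the first:

```shell
#!/bin/sh
# Delete every occurrence of any of the alternatives (empty replacement).
# The dot in \.2015 is escaped so it only matches a literal '.'.
multi_delete() {
    sed -E 's/Date:|TEXT:|\.2015//g' "$@"
}

printf 'Date:02.06.2015 10.12.35\nTEXT:Look in the garage\n' | multi_delete
```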

seaside

#17 Post by seaside

greengeek, you mentioned:
The first time I tried these they seemed to work OK, but now when I try them again it seems that they do not process the final text field. No matter how many Date/TEXT lines I add, the last TEXT field is always missing. Is it just me?
Probably the last line of the input file has no newline at the end. Try this:

Code:

while read line || [ "$line" ]; do 
Cheers,
s
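Why this works: read returns a non-zero status when it reaches end-of-file before a newline, but it still stores that unfinished last line in the variable. A plain while read loop therefore stops one line early on a file that doesn't end with a newline, while || [ "$line" ] runs the loop body once more for that final, non-empty line. A small sketch comparing the two (the function names are made up; IFS= and -r are extra safety that keep leading spaces and backslashes intact):

```shell
#!/bin/sh
# Count the lines a loop sees, with and without the end-of-file guard.
count_plain() {
    n=0
    while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}
count_guarded() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do n=$((n + 1)); done
    echo "$n"
}

printf 'TEXT:one\nTEXT:two' | count_plain     # prints 1 (last line lost)
printf 'TEXT:one\nTEXT:two' | count_guarded   # prints 2
```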

greengeek

#18 Post by greengeek

MochiMoppel wrote: OK, let's throw in another one. VERY primitive, but easy to master. It finds multiple strings, separated by '|'. The '|' means 'or', and the order of the search strings is not important. Unfortunately only 1 replace string can be defined, so it's probably most suited for Find & Delete operations.
Interesting. That will definitely be handy for some of my needs. A question, though: how did you end up with a space between the end of the date field and the beginning of the text field in your image above? When I tried this method, the time and text fields were squashed together without a space. I guess my input file must be slightly different.

greengeek

#19 Post by greengeek

seaside wrote: Probably the last line of the input file has no newline at the end. Try this:

Code:

while read line || [ "$line" ]; do 

Thanks, that did the trick. The two modified micko scripts are now:

Code:

#!/bin/ash

#mickosmscrop1

#01micko's first script to strip the important data out of my preformatted
#sms messages
#Source file is called 'msgs'
#Changed original 'read line' from micko syntax to seaside's syntax to alleviate
#problem with missing last line.
#while read line;do

while read line || [ "$line" ]; do 
   if echo "$line" | grep -q "^Date";then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      echo -n "$date" >> result
      time=${line##* }
      echo -en "\t $time" >> result
   else
      text=${line#*:}
      echo -e "\t\t $text" >> result
   fi
done < msgs 

and

Code:

#!/bin/ash

#mickosmscrop2

#01micko's second script to help strip text from my preformatted sms messages.
#(this one avoids using grep. Otherwise same)
#Source file is called 'msgs'
#Original 'read line' replaced with seaside's syntax to alleviate problem with
#missing last line.
#while read line;do

while read line || [ "$line" ]; do 
   if [ "${line%:*}" = "Date" ];then
      dline=${line#*:}
      dline=${dline%% *}
      date=${dline%.*}
      time=${line##* }
      printf "${date}\t${time}\t\t" >> result
   else
      text=${line#*:}
      echo "$text" >> result
   fi
done < msgs 
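A few further cautions about both scripts: >> result appends, so running a script a second time adds another copy of the output to the old result file; read without -r drops backslashes; and echo -e in the first script could interpret backslash sequences that happen to be in a message. A sketch of the second script with those points tightened up, assuming the same strict Date:/TEXT: pairs. Unlike the originals it also skips lines that are neither Date: nor TEXT: (in line with the wish in the dt.awk discussion above to drop extra lines), and the function name smscrop is hypothetical:

```shell
#!/bin/sh
# Hypothetical hardened variant of mickosmscrop2 (POSIX sh / ash).
#  - IFS= read -r keeps backslashes and leading spaces as they are
#  - || [ -n "$line" ] still handles a last line without a newline
#  - data goes to printf as arguments, never as the format string
#  - output goes to stdout, so one > redirect overwrites on every run
smscrop() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            Date:*)
                dline=${line#*:}      # 02.06.2015 10.12.35
                dline=${dline%% *}    # 02.06.2015
                date=${dline%.*}      # 02.06
                time=${line##* }      # 10.12.35
                printf '%s\t%s\t\t' "$date" "$time"
                ;;
            TEXT:*)
                printf '%s\n' "${line#*:}"
                ;;
        esac                          # any other line is skipped
    done
}

printf 'Date:02.06.2015 10.12.35\nTEXT:50%% off \\o/\nnoise\nDate:02.06.2015 10.14.17\nTEXT:Look in the garage' | smscrop
```

Used as smscrop < msgs > result, each run starts the result file afresh.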

technosaurus

#20 Post by technosaurus

double post
Last edited by technosaurus on Tue 30 Jun 2015, 05:27, edited 1 time in total.