Puppy Linux Discussion Forum, Off-Topic Area, Programming
A sed expression to deal with parsing wikitext [SOLVED]
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

Posted: Fri 03 May 2013, 10:13

I've written and am tweaking a wikitext parser using sed and I want to make it as compatible with Creole 1.0 as possible but I'm having problems with //italic//.

I can't find a way to deal with this:
Code:
//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//


All I've got is this, which italicises a run of one or more non-slash chars:
Code:
sed -e 's|//\([^/]\+\)//|<em>\1</em>|g'


To be honest this is acceptable anyway:
Code:
//some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text//


but I just wondered if there's a sed wizard about who knows how to deal with "// .* not:// .* //", because it might help me tweak some other stuff. Basically I want to negate a sequence of more than one char, and I think a bracket expression like [^/] can only negate single chars.

Regards,
Thunor
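
For illustration, here is a small sketch of how the single-char negation goes wrong once a URL sits inside the italic span. The function name naive_italics is made up for this example, and GNU sed is assumed because of the \+ operator.

```shell
# Hypothetical wrapper around the [^/] expression (GNU sed assumed for \+).
naive_italics() {
    sed -e 's|//\([^/]\+\)//|<em>\1</em>|g'
}

# Plain italics are handled as intended.
echo '//italic// plain' | naive_italics
# -> <em>italic</em> plain

# With a URL inside the span, the match closes early at the "//" of "http://".
echo '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | naive_italics
# -> <em>some text [[http:</em>www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//
```

Because [^/] stops at the first slash of any kind, the "//" in "://" is indistinguishable from a closing delimiter.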

seaside

Joined: 11 Apr 2007
Posts: 887

Posted: Fri 03 May 2013, 11:52

Hey thunor,

I don't have a sed answer, but perhaps a bash solution would do....
Code:
# line='//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//'
# line=${line/#\/\//<em>}  line=${line/%\/\//</em>}
# echo "$line"
<em>some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text</em>


Best regards,
s
EDIT: A little experimenting and
Code:
echo "$line" | sed 's|^//|<em>|;s|//$|</em>|'
works :)
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

Posted: Fri 03 May 2013, 17:42

Thanks seaside but it needs to deal with multiples on the same line which I should've mentioned.

It did get me thinking, though, about maybe dealing with it before or after I use sed, with something like you've done or a case statement. Then I thought about temporarily substituting "://" and putting it back afterwards. In the end I managed it in sed using temporary string substitution:
Code:
echo '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | sed \
   -e 's|://|@COLON_SLASH_SLASH@|g' \
   -e 's|//|@SLASH_SLASH@|g' \
   -e 's|/|@SLASH@|g' \
   -e 's|@SLASH_SLASH@|//|g' \
\
   -e 's|//\([^/]\+\)//|<em>\1</em>|g' \
\
   -e 's|@SLASH@|/|g' \
   -e 's|@COLON_SLASH_SLASH@|://|g'

Cheers and regards,
Thunor
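
As a rough sketch of how the placeholder steps behave with several italic spans on one line, they can be wrapped in a function. The name italicise is hypothetical, GNU sed is assumed for \+, and the input is assumed never to contain the placeholder strings themselves.

```shell
# Hypothetical wrapper; the sed steps are the placeholder substitutions
# from the post (GNU sed assumed for \+).
# 1. hide "://"    2. hide "//"    3. hide every remaining single "/"
# 4. restore "//"  5. italicise    6-7. restore "/" and "://"
italicise() {
    sed \
        -e 's|://|@COLON_SLASH_SLASH@|g' \
        -e 's|//|@SLASH_SLASH@|g' \
        -e 's|/|@SLASH@|g' \
        -e 's|@SLASH_SLASH@|//|g' \
        -e 's|//\([^/]\+\)//|<em>\1</em>|g' \
        -e 's|@SLASH@|/|g' \
        -e 's|@COLON_SLASH_SLASH@|://|g'
}

echo '//one [[http://linux.com/learn|Learn Linux]] two// plain //three//' | italicise
# -> <em>one [[http://linux.com/learn|Learn Linux]] two</em> plain <em>three</em>
```

Step 3 is what makes [^/] safe in step 5: by then the only slashes left on the line are the italic delimiters.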
sunburnt


Joined: 08 Jun 2005
Posts: 5037
Location: Arizona, U.S.A.

Posted: Fri 03 May 2013, 18:39

thunor; You're not very clear about what you're trying to do.

You posted an example input line, can you post what you want it to look like?

Or is this what you want?
Quote:
Input: //some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//

Output: //some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text//

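If so then this does the trick: echo "$input" | sed 's# \[\[#// \[\[#;s#|#|//#;s#\]\] #//\]\] //#'
You need to escape the "[" characters with "\" in the search pattern, as sed otherwise treats "[" as the start of a bracket expression. (Bash has its own use for them, e.g. [ -d /root ]&& echo GOOD, but inside single quotes it leaves them alone.)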

### But maybe you're trying to italicize the "some text" parts?
thunor


Joined: 14 Oct 2010
Posts: 350
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

Posted: Fri 03 May 2013, 18:51

sunburnt wrote:
thunor; You're not very clear about what you're trying to do.

You posted an example input line, can you post what you want it to look like?...

Hi sunburnt

This (I'll give you an example using multiples on the same line which needs to be supported):
Code:
//some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// some non-italicised text //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text//

to:
Code:
<em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em> some non-italicised text <em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em>

and ultimately once I've processed the wikitext formatted external URLs it'll output as:

some italicised text Learn Linux some italicised text some non-italicised text some italicised text Learn Linux some italicised text

I did solve it by substituting the conflicting slashes with something else and then putting them back afterwards which seems the logical thing to do.

This is just an example of the problem I had. I need to be able to italicise //everything and anything// that appear inside double slashes //multiple times// on the same line.

Regards,
Thunor
seaside

Joined: 11 Apr 2007
Posts: 887

Posted: Fri 03 May 2013, 20:26

Thunor,

I guess this could be done with sed's pattern and hold spaces and buffer manipulations, which I don't really comprehend. Your solution is to the point and much easier to understand (none of those strange char combinations that require a lookup) :)

Best Regards,
s
(You must be the sed wizard you were looking for :) )
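
For comparison, one possible single-expression variant avoids both placeholders and the hold space by spelling out what may appear inside the italics: "://", any non-slash char, or a lone slash followed by a non-slash. This is only a sketch; the function name italicise_ere is made up, and a sed that accepts -E (GNU and BSD sed do) is assumed.

```shell
# Hypothetical helper; needs a sed that understands -E (extended regex).
# (^|[^:])             an opening "//" must not directly follow a colon
# (://|[^/]|/[^/])*    inside the span: a URL "://", a non-slash char,
#                      or a single slash followed by a non-slash
italicise_ere() {
    sed -E 's#(^|[^:])//((://|[^/]|/[^/])*)//#\1<em>\2</em>#g'
}

echo '//one [[http://linux.com/learn|Learn Linux]] two// plain //three//' | italicise_ere
# -> <em>one [[http://linux.com/learn|Learn Linux]] two</em> plain <em>three</em>
```

The alternation is the way to "negate more than one char": instead of saying what must not match, it lists everything that is allowed between the delimiters. Like the placeholder approach, it treats any "//" that directly follows a colon as part of a URL.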
technosaurus


Joined: 18 May 2008
Posts: 4353

Posted: Fri 03 May 2013, 22:07

I recommend posting this to Stack Overflow if you can't already find the answer there.

Using awk, and assuming the italics don't span lines (if they can span lines, just set RS="EOF" or something in the BEGIN section):

Code:
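awk '
BEGIN { FS = "//" }          # split each line on the italic delimiter
{
    for (i = 1; i <= NF; i++) {
        printf "%s", $i      # odd fields are plain text
        i++
        if (i < NF)          # even field with a closing "//" after it
            printf "<em>%s</em>", $i
        else if (i == NF)    # unclosed "//": put it back untouched
            printf "//%s", $i
    }
    print ""                 # end the output line
}
'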

_________________
Web Programming - Pet Packaging 100 & 101
sunburnt


Joined: 08 Jun 2005
Posts: 5037
Location: Arizona, U.S.A.

Posted: Sat 04 May 2013, 14:12

That's essentially what I was going to offer up,

A Bash loop to handle the <em></em> tag pairs and ignore "http://".

techysaurus ;) is always spot on for the most succinct script code...
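
A minimal sketch of such a loop might look like the following. It is written in plain POSIX shell so that it also runs under Bash; the function name italicise_line is hypothetical, and it assumes the "//" pairs on the line are balanced and that any "//" directly after a colon belongs to a URL.

```shell
# Hypothetical helper: walk the line from one "//" to the next,
# alternating between <em> and </em>, and skip the "//" in "://".
italicise_line() {
    rest=$1 out='' open=0
    while :; do
        case $rest in
            *//*) ;;             # another "//" is left, keep going
            *) break ;;
        esac
        head=${rest%%//*}        # text before the first remaining "//"
        rest=${rest#*//}         # text after it
        case $head in
            *:) out="$out$head//" ;;    # URL scheme such as "http://"
            *)  if [ "$open" -eq 1 ]; then
                    out="$out$head</em>"; open=0
                else
                    out="$out$head<em>"; open=1
                fi ;;
        esac
    done
    printf '%s\n' "$out$rest"
}

italicise_line '//one [[http://linux.com/learn|Learn Linux]] two// plain //three//'
# -> <em>one [[http://linux.com/learn|Learn Linux]] two</em> plain <em>three</em>
```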
technosaurus


Joined: 18 May 2008
Posts: 4353

Posted: Sat 04 May 2013, 15:18

sunburnt wrote:
and ignore "http://"
for that you'd need something before the i++ like:
Code:
if(substr($i,length($i),1)==":"){printf "//";continue}

seaside

Joined: 11 Apr 2007
Posts: 887

Posted: Sat 04 May 2013, 20:18

technosaurus,

I tried to get this part -
Code:
if(substr($i,length($i),1)==":"){printf "//";continue}
to work and couldn't. So here's a crossover "Thunor-@colon_slash_slash@" awk version.
Code:
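awk '
BEGIN { FS = "//" }

# Hide URL slashes so they are not taken for italic delimiters.
{ gsub("://", "@colon_slash_slash@") }

{
    out = ""
    for (i = 1; i <= NF; i++) {
        out = out $i                      # plain text outside // ... //
        i++
        if (i < NF)
            out = out "<em>" $i "</em>"   # text inside a closed pair
        else if (i == NF)
            out = out "//" $i             # unclosed "//": leave as is
    }
    gsub("@colon_slash_slash@", "://", out)   # put the URL slashes back
    print out
}
'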


No speed difference between sed and awk versions.

Best regards,
s
(Hmmm..."@colon_slash_slash@" sounds more like a colonoscopy, only more comfortable in code than in person) :)
technosaurus


Joined: 18 May 2008
Posts: 4353

Posted: Sat 04 May 2013, 22:14

damn,... I was trying to do it in my head again without running the code - wasn't 100% sure continue was supported the way it is in shell ... anyhow consider it pseudo code
Quote:
@colon_slash_slash@" sounds more like a colonoscopy
reminds me of a scene in the movie Seven