| Author |
Message |
thunor

Joined: 14 Oct 2010 Posts: 342 Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings
|
Posted: Fri 03 May 2013, 10:13 Post subject:
A sed expression to deal with parsing wikitext [SOLVED] |
|
I've written and am tweaking a wikitext parser using sed and I want to make it as compatible with Creole 1.0 as possible but I'm having problems with //italic//.
I can't find a way to deal with this:
| Code: | | //some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text// |
All I've got is this which italicises at least one char:
| Code: | | sed -e 's|//\([^/]\+\)//|<em>\1</em>|g' |
To be honest this is acceptable anyway:
| Code: | | //some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text// |
but I just wondered if there's a sed wizard about who knows how to deal with "// .* not:// .* //" because it might help me tweak some other stuff. I basically want to negate a sequence of more than one char, and I think you can only negate single chars.
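To make the failure concrete, here is a minimal sketch of what the single expression does with and without a URL inside the italic span. The function name naive_italic is made up for illustration, and [^/][^/]* is used as the portable spelling of the GNU-specific [^/]\+.

```shell
#!/bin/sh
# naive_italic is a hypothetical helper name, not something from the thread.
# [^/][^/]* means "one or more non-slash characters", like GNU sed's [^/]\+.
naive_italic() {
    sed -e 's|//\([^/][^/]*\)//|<em>\1</em>|g'
}

# Fine when the italic text contains no slash at all:
printf '%s\n' '//some text//' | naive_italic

# Goes wrong once a URL sits inside the span, because the "//" of
# "http://" is taken as the closing delimiter:
printf '%s\n' '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | naive_italic
```

The second call closes the emphasis right after "http:" and leaves the real closing // untouched, which is the behaviour the question is about.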
Regards,
Thunor
Last edited by thunor on Fri 03 May 2013, 17:45; edited 1 time in total
seaside
Joined: 11 Apr 2007 Posts: 834
|
Posted: Fri 03 May 2013, 11:52 Post subject:
|
|
Hey thunor,
I don't have a sed answer, but perhaps a bash solution would do:
| Code: | # line='//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//'
# line=${line/#\/\//<em>} line=${line/%\/\//</em>}
# echo "$line"
<em>some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text</em> |
Best regards,
s
EDIT: A little experimenting and | Code: | | echo $line |sed 's|^\/\/|<em>|;s|\/\/$|</em>|' | works
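A quick sketch of what the anchored expression does and does not cover. The name anchored_italic is hypothetical, and the backslashes before the slashes are dropped because they are not needed when | is the delimiter.

```shell
#!/bin/sh
# anchored_italic is a hypothetical name for the expression quoted above.
# ^ and $ tie the two substitutions to the start and end of the line.
anchored_italic() {
    sed 's|^//|<em>|;s|//$|</em>|'
}

# One span covering the whole line: the URL in the middle is left alone.
printf '%s\n' '//only one span with http://example.org/ inside//' | anchored_italic

# Two spans on one line: only the outermost slashes are replaced.
printf '%s\n' '//first// plain //second//' | anchored_italic
```

So the anchors sidestep the URL problem, but only for a line that is italic from start to finish.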
thunor

Joined: 14 Oct 2010 Posts: 342 Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings
|
Posted: Fri 03 May 2013, 17:42 Post subject:
|
|
Thanks seaside, but it needs to deal with multiples on the same line, which I should've mentioned.
It did get me thinking, though, about dealing with it before or after sed with something like you've done or a case statement, and then I thought about temporarily substituting "://" and putting it back afterwards. In the end I managed it in sed using temporary string substitution:
| Code: | echo '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | sed \
-e 's|://|@COLON_SLASH_SLASH@|g' \
-e 's|//|@SLASH_SLASH@|g' \
-e 's|/|@SLASH@|g' \
-e 's|@SLASH_SLASH@|//|g' \
\
-e 's|//\([^/]\+\)//|<em>\1</em>|g' \
\
-e 's|@SLASH@|/|g' \
-e 's|@COLON_SLASH_SLASH@|://|g' |
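To see why the ordering works, it can help to look at the line after only the first four expressions have run. This is a sketch for inspection only; protect_slashes is a made-up name.

```shell
#!/bin/sh
# protect_slashes is a hypothetical name. It runs only the first four
# expressions of the command above, so the intermediate line can be inspected.
protect_slashes() {
    sed -e 's|://|@COLON_SLASH_SLASH@|g' \
        -e 's|//|@SLASH_SLASH@|g' \
        -e 's|/|@SLASH@|g' \
        -e 's|@SLASH_SLASH@|//|g'
}

printf '%s\n' '//some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//' | protect_slashes
```

At this point the only slashes left on the line are the // italic delimiters, so the [^/] in the italic expression can no longer stop early inside the URL.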
Cheers and regards,
Thunor
sunburnt

Joined: 08 Jun 2005 Posts: 4004 Location: Arizona, U.S.A.
|
Posted: Fri 03 May 2013, 18:39 Post subject:
|
|
thunor; You're not very clear about what you're trying to do.
You posted an example input line, can you post what you want it to look like?
Or is this what you want?
| Quote: | Input: //some text [[http://www.murga-linux.com/puppy|Puppy Linux Discussion Forum]] some more text//
Output: //some text// [[http://www.murga-linux.com/puppy|//Puppy Linux Discussion Forum//]] //some more text// |
If so then this does the trick: echo "$input" |sed 's# \[\[#// \[\[#;s#|#|//#;s#\]\] #//\]\] //#'
You need to escape the "[" and "]" characters with "\" because sed would otherwise read "[" as the start of a bracket expression. Inside single quotes Bash leaves them alone; it only treats them specially in tests such as: [ -d /root ]&& echo GOOD
### But maybe you're trying to italicize the "some text" parts?
thunor

Joined: 14 Oct 2010 Posts: 342 Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings
|
Posted: Fri 03 May 2013, 18:51 Post subject:
|
|
| sunburnt wrote: | thunor; You're not very clear about what you're trying to do.
You posted an example input line, can you post what you want it to look like?... |
Hi sunburnt
This (I'll give you an example using multiples on the same line which needs to be supported):
| Code: | | //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// some non-italicised text //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// |
to:
| Code: | | <em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em> some non-italicised text <em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em> |
and ultimately once I've processed the wikitext formatted external URLs it'll output as:
some italicised text Learn Linux some italicised text some non-italicised text some italicised text Learn Linux some italicised text
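The link-processing step is not shown in this thread, so the following is only a guess at what it might look like: a sketch that turns [[URL|label]] into an HTML anchor. The name creole_links and the exact expression are assumptions.

```shell
#!/bin/sh
# creole_links is a hypothetical helper, not code from this thread.
# It rewrites [[URL|label]] as an HTML anchor. "#" is the s-command delimiter
# so that the literal "|" between URL and label needs no escaping.
creole_links() {
    sed -e 's#\[\[\([^]|]*\)|\([^]]*\)]]#<a href="\1">\2</a>#g'
}

printf '%s\n' '<em>some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text</em>' | creole_links
```

Running this after the italic step, as here, avoids having the slashes in the generated </a> and href values take part in the // matching.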
I did solve it by substituting the conflicting slashes with something else and then putting them back afterwards, which seems the logical thing to do.
This is just an example of the problem I had. I need to be able to italicise //everything and anything// that appears inside double slashes //multiple times// on the same line.
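As a check on the multiple-spans requirement, the placeholder approach can be wrapped in a function and run against the two-span example. creole_italic is a hypothetical name, and [^/][^/]* stands in for the GNU-specific [^/]\+.

```shell
#!/bin/sh
# creole_italic is a hypothetical wrapper around the placeholder approach.
creole_italic() {
    sed -e 's|://|@COLON_SLASH_SLASH@|g' \
        -e 's|//|@SLASH_SLASH@|g' \
        -e 's|/|@SLASH@|g' \
        -e 's|@SLASH_SLASH@|//|g' \
        -e 's|//\([^/][^/]*\)//|<em>\1</em>|g' \
        -e 's|@SLASH@|/|g' \
        -e 's|@COLON_SLASH_SLASH@|://|g'
}

printf '%s\n' '//some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// some non-italicised text //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text//' | creole_italic
```

One caveat of this design: input that already contains a literal string such as @SLASH@ would be rewritten by the restore steps, so the placeholders need to be strings that cannot occur in the wikitext.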
Regards,
Thunor
seaside
Joined: 11 Apr 2007 Posts: 834
|
Posted: Fri 03 May 2013, 20:26 Post subject:
|
|
Thunor,
I guess this could be done with sed's hold space and pattern space manipulations, which I don't really comprehend. Your solution is to the point and much easier to understand (none of those strange char combinations that require a lookup).
Best Regards,
s
(You must be the sed wizard you were looking for.)
technosaurus

Joined: 18 May 2008 Posts: 3843
|
Posted: Fri 03 May 2013, 22:07 Post subject:
|
|
I recommend posting this to Stack Overflow if you can't already find the answer there.
Using awk, and assuming the italic spans don't cross lines (if they can span lines, just set RS="EOF" or something in the BEGIN section):
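| Code: | awk '
BEGIN{FS="//"}
{
for(i=1;i<=NF;i++){
printf "%s", $i   # text outside the // pairs
i++
if(i<NF){
printf "<em>%s</em>", $i   # text between a // pair
}
else if(i==NF){
printf "//%s", $i   # unmatched //, leave it alone
}
}
print ""   # finish the output line
}
' |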
_________________ Puppy Web Desktop Now with pet packages - Pet Packaging 100 & 101
sunburnt

Joined: 08 Jun 2005 Posts: 4004 Location: Arizona, U.S.A.
|
Posted: Sat 04 May 2013, 14:12 Post subject:
|
|
That's essentially what I was going to offer up:
a Bash loop to handle the <em></em> tag pairs and ignore "http://".
techysaurus is always spot on for the most succinct script code...
technosaurus

Joined: 18 May 2008 Posts: 3843
|
Posted: Sat 04 May 2013, 15:18 Post subject:
|
|
| sunburnt wrote: | | and ignore "http://" | for that you'd need something before the i++ like: | Code: | | if(substr($i,length($i),1)==":"){printf "//";continue} |
seaside
Joined: 11 Apr 2007 Posts: 834
|
Posted: Sat 04 May 2013, 20:18 Post subject:
|
|
technosaurus,
I tried to get this part - | Code: | | if(substr($i,length($i),1)==":"){printf "//";continue} | to work and couldn't. So here's a crossover "Thunor-@colon_slash_slash@" awk version.
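| Code: | awk '
BEGIN{FS="//"}
{gsub("://","@colon_slash_slash@")}
{
out=""
for(i=1;i<=NF;i++){
out=out $i   # text outside the // pairs
i++
if(i<NF){
out=out "<em>" $i "</em>"   # text between a // pair
}
else if(i==NF){
out=out "//" $i   # unmatched //, leave it alone
}
}
gsub("@colon_slash_slash@","://",out)   # put every URL back
print out
}
' |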
No speed difference between sed and awk versions.
Best regards,
s
(Hmmm..."@colon_slash_slash@" sounds more like a colonoscopy, only more comfortable in code than in person)
technosaurus

Joined: 18 May 2008 Posts: 3843
|
Posted: Sat 04 May 2013, 22:14 Post subject:
|
|
damn,... I was trying to do it in my head again without running the code - wasn't 100% sure continue was supported the way it is in shell ... anyhow consider it pseudo code
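For what it's worth, the colon check from the pseudo code can be made to run if the URL pieces are glued back together before the italic pairs are counted. This is only a sketch: awk_italic is a made-up name and the two-pass layout is an assumption, not anything posted in this thread.

```shell
#!/bin/sh
# awk_italic is a hypothetical name. The colon test is the idea from the
# pseudo code; the two-pass layout is an assumption of this sketch.
awk_italic() {
    awk '
    BEGIN { FS = "//" }
    {
        # Pass 1: a piece ending in ":" was cut at a URL scheme, so glue it
        # back onto the piece that follows it.
        n = 0; pending = ""
        for (i = 1; i <= NF; i++) {
            piece = pending $i
            if (i < NF && substr($i, length($i), 1) == ":") {
                pending = piece "//"
            } else {
                part[++n] = piece
                pending = ""
            }
        }
        # Pass 2: odd parts are plain text, even parts sit between a // pair.
        out = ""
        for (j = 1; j <= n; j++) {
            if (j % 2 == 1)  out = out part[j]
            else if (j < n)  out = out "<em>" part[j] "</em>"
            else             out = out "//" part[j]   # unmatched //
        }
        print out
    }'
}

printf '%s\n' '//some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text// some non-italicised text //some italicised text [[http://linux.com/learn|Learn Linux]] some italicised text//' | awk_italic
```

The trade-off is that an italic span whose text ends in a colon right before the closing // would be mistaken for a URL scheme, which the placeholder version does not suffer from.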
| Quote: | | "@colon_slash_slash@" sounds more like a colonoscopy | reminds me of a scene in the movie Seven