sed scratch pad -- A thread of sed examples

step
Posts: 1349
Joined: Fri 04 May 2012, 11:20

#21 Post by step »

Just a reminder to also test a Windows-created HTML file, for which \r\n is the line termination sequence. (I didn't but I remember being scorched about this before).
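A minimal sketch of why this matters, assuming GNU sed (which understands `\r` in a regex). The sample line and the variable names are made up for illustration:

```shell
# Hypothetical one-line sample with a Windows (CRLF) line ending.
# sed strips only the \n, so the \r stays in the pattern space and
# a pattern anchored with $ no longer matches the visible line end.
crlf_match=$(printf '<p>x</p>\r\n' | sed -n '/<\/p>$/p')

# Removing the trailing carriage return first restores the match.
fixed_match=$(printf '<p>x</p>\r\n' | sed 's/\r$//' | sed -n '/<\/p>$/p')

printf 'with CR: [%s]\nCR removed: [%s]\n' "$crlf_match" "$fixed_match"
```

So any script in this thread that anchors on `$` or counts on "clean" line ends is worth testing against a CRLF copy of the input as well.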
[url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Fatdog64-810[/url]|[url=http://goo.gl/hqZtiB]+Packages[/url]|[url=http://goo.gl/6dbEzT]Kodi[/url]|[url=http://goo.gl/JQC4Vz]gtkmenuplus[/url]

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

Re: html minifier in sed

#22 Post by s243a »

sc0ttman wrote:I'd love to get this working:

An HTML minifier...

This thing nearly does the job, except that it minifies stuff inside <pre> tags...

I would love love love to fix that!!

Code: Select all

function minify_html {
  # temp fix to IFS, just in case the html files contain spaces
  OLD_IFS=$IFS
  IFS="
  "
  for html_file in $html_files
  do
    :
    # don't minify HTML until we can skip contents of <pre>..</pre>
    #sed ':a;N;$!ba;/<div class="highlight"><pre>\.*<\/pre><\/div>/! s@>\s*<@><@g' $html_file > ${html_file//.html/.minhtml}
    #mv ${html_file//.html/.minhtml} ${html_file}
  done
  IFS=$OLD_IFS
}
I'll think about the general problem more later, but for now I notice that in "<pre>\.*<\/pre>" you are escaping the period. I think what you actually want is "<pre>.*<\/pre>" (notice that the period is not escaped): even in basic regular expressions the period keeps its special meaning, and here we want that special meaning, so we shouldn't escape it.
In GNU sed, the only difference between basic and extended regular expressions is in the behavior of a few special characters: ‘?’, ‘+’, parentheses, braces (‘{}’), and ‘|’.

With basic (BRE) syntax, these characters do not have special meaning unless prefixed with a backslash (‘\’); While with extended (ERE) syntax it is reversed: these characters are special unless they are prefixed with backslash (‘\’).
https://www.gnu.org/software/sed/manual ... BRE-vs-ERE

You can test this with something like:
# echo abc | sed 's/a.c//'
BTW, why do we need the "div" tags in the above expression?
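To make the difference concrete, here is a small check. The sample string and the variable names are only for illustration:

```shell
# Hypothetical sample: a <pre> element with ordinary text inside.
html='<pre>some code</pre>'

# \.* means "zero or more literal dots", so this address does not match.
escaped=$(printf '%s\n' "$html" | sed -n '/<pre>\.*<\/pre>/p')

# .* means "zero or more of any character", so this one does.
unescaped=$(printf '%s\n' "$html" | sed -n '/<pre>.*<\/pre>/p')

printf 'escaped dot: [%s]\nplain dot:   [%s]\n' "$escaped" "$unescaped"
```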
Find me on [url=https://www.minds.com/ns_tidder]minds[/url] and on [url=https://www.pearltrees.com/s243a/puppy-linux/id12399810]pearltrees[/url].

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

Re: html minifier in sed

#23 Post by sc0ttman »

s243a wrote:BTW, why do we need the "div" tags in the above expression?
mdsh/pygments generates divs arounds pre tags.. this minifier is for mdsh.
[b][url=https://bit.ly/2KjtxoD]Pkg[/url], [url=https://bit.ly/2U6dzxV]mdsh[/url], [url=https://bit.ly/2G49OE8]Woofy[/url], [url=http://goo.gl/bzBU1]Akita[/url], [url=http://goo.gl/SO5ug]VLC-GTK[/url], [url=https://tiny.cc/c2hnfz]Search[/url][/b]

recobayu
Posts: 387
Joined: Wed 15 Sep 2010, 22:48
Location: indonesia

#24 Post by recobayu »

MochiMoppel wrote:Looks like another abandoned thread :cry:

I'll give it a try anyway since I don't know where to ask.
The challenge is to remove all comments from a XML/HTML document, using only sed.

Example text:

Code: Select all

<JWM>
	<Tray  autohide="false" insert="right" x="0" y="-1" border="1" height="28" >
		<!-- Additional TrayButton attribute: label -->
		<TrayButton label="Menu" icon="logo-mini.png" border="true">root:3</TrayButton>
border="true">exec:urxvt</TrayButton>
		<Pager/>
		<!-- Additional TaskList attribute: maxwidth -->
		<TaskList maxwidth="200"/>
		<Dock/>
		<!-- Additional Swallow attribute: height -->
	<!--	<Swallow name="blinky">
			blinkydelayed -bg "#DCDAD5"
		</Swallow> -->
	<!--	<Swallow name="xtmix-launcher">
			xtmix -launch
		</Swallow> -->
	<!--	<Swallow name="asapm">
			asapmshell -u 4
		</Swallow> -->
	<!--	<Swallow name="freememapplet" width="34">
			freememappletshell
		</Swallow> -->
		<Swallow name="xload" width="32">
			xload -nolabel -bg "#888888" -fg red -hl white
		</Swallow>
		<Clock format="%H:%M">minixcal</Clock>
	</Tray>
</JWM>
The problem is that these comments can be multiline. My rough idea is to let sed move a line to the hold buffer when a '<!--' tag is detected, then continue to fill the hold buffer until a '-->' is detected, load the hold buffer into the pattern space and remove the comment, clear the hold buffer and continue with the next cycle. This may not be the right way, and I'm not even close to achieving the goal. Does anybody know how to do this?
I use this code, just a single line, but it only handles the case where <!-- and --> are on different lines.

Code: Select all

#sed -e '/<!--/,/-->/d' xml
<JWM>
   <Tray  autohide="false" insert="right" x="0" y="-1" border="1" height="28" >
      <TaskList maxwidth="200"/>
      <Dock/>
      <Swallow name="xload" width="32">
         xload -nolabel -bg "#888888" -fg red -hl white
      </Swallow>
      <Clock format="%H:%M">minixcal</Clock>
   </Tray>
</JWM>
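A small sketch of why the range form is limited (this is also why TrayButton and Pager are missing from the output above). With an address range, sed starts looking for the end pattern on the line after the one that matched the start pattern, so a one-line comment opens a range that swallows everything up to the next line containing `-->`. The sample input is made up:

```shell
# Hypothetical input: two one-line comments with an element between them.
sample='<a/>
<!-- one-line comment -->
<b/>
<!-- another -->
<c/>'

# The range opens on line 2 and only closes on line 4,
# so <b/> is deleted together with the comments.
result=$(printf '%s\n' "$sample" | sed '/<!--/,/-->/d')
printf '%s\n' "$result"
```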

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#25 Post by MochiMoppel »

If it solves the problem it can't be that bad, right? :wink:
But frankly it's not really good: a useless cat, a useless '?' in <!--.*?--> and a strange positioning of the :a label. Still, the idea is nice. No pattern space <-> hold space acrobatics, just a clever use of a label. The next suggestion in your link is better:

Code: Select all

sed -r '
/<!--/!b
:a
/-->/! {N;ba}
s/<!--.*-->//
' "$TESTFILE"
sc0ttman wrote:This thing nearly does the job, except that it minifies stuff inside <pre> tags..
Which job? Where do you eliminate comments? And what "stuff inside <pre> tags". Do you want to preserve comments inside pre tags? What for? The browser wouldn't show them anyway, so you might delete them as well. Unless you provide a sample of your input it is hard to tell what you are after.
s243a wrote:I get the same output with:

Code: Select all

...some really ugly code here....
:lol:
jamesbond wrote:Confirmed to work with gnu sed and busybox sed.
That's already a nice achievement. But here is my problem with all suggestions so far: They all assume that a line contains only 1 comment, which is a bold assumption. Surely I take the blame for not providing a better example and I will think of a better one. Generally speaking, an XML document is whitespace agnostic. Linefeeds don't matter and even a huge HTML page can be written as a single line (e.g. Google does this). A pattern like <!--.*--> is greedy and would eliminate everything from the first <!-- up to the last --> instead of stopping at the next comment termination.
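The greedy behaviour is easy to reproduce on a single line with two comments (sample text invented for the demonstration):

```shell
# Two comments on one line with text that should survive between them.
line='<!-- a -->keep<!-- b -->'

# .* stretches from the first <!-- to the LAST -->, so "keep" is lost too.
greedy=$(printf '%s\n' "$line" | sed 's/<!--.*-->//')
printf 'result: [%s]\n' "$greedy"
```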
step wrote:Just a reminder to also test a Windows-created HTML file, for which \r\n is the line termination sequence.
Thanks for the reminder. I can imagine that Mac documents are even more fun as sed probably would treat the whole document as a single line :lol:

@recobayu Thanks, but this is just too limited

@all I now cooked my own solutions, which appear to do what I want. I'll share them if they pass my acid tests. Let's see.

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#26 Post by jamesbond »

MochiMoppel wrote:That's already a nice achievement. But here is my problem with all suggestions so far: They all assume that a line contains only 1 comment, which is a bold assumption.
True enough.
Surely I take the blame for not providing a better example and I will think of a better one.
You did say it was for general HTML/XML. I will now take this to mean __valid__ HTML/XML, which does not allow nested comments.

My updated test case:

Code: Select all

<p>1</p>
<!--2-->3<br>
<!--
4
--><b>5</b><!--
-6 -7 --8 -9- <10> <-11-> <u>12</u>
-->13
<!-- 14 -->15<!-- <-16-> -->17
Expected output:

Code: Select all

<p>1</p>
3<br>
<b>5</b>13
1517
Here is my updated take on the challenge. Still works on busybox sed and gnu sed too.

Code: Select all

sed -r -e ':a;N;$!ba;s/<!--([^-]*|[^-]*-[^-]|[^-]*--[^>])*-->//g;' test.html
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#27 Post by sc0ttman »

MochiMoppel wrote:
sc0ttman wrote:This thing nearly does the job, except that it minifies stuff inside <pre> tags..
Which job? Where do you eliminate comments? And what "stuff inside <pre> tags". Do you want to preserve comments inside pre tags? What for? The browser wouldn't show them anyway, so you might delete them as well. Unless you provide a sample of your input it is hard to tell what you are after.
Sorry, I should have been more clear, I'm posting "off topic" .. not even attempting to "remove comments"...

So.. I mean it "does the job" of minifying HTML.. Nothing to do with removing comments... Though it is related (I also want to remove comments at some point), hence me posting here..

So the snippet I posted does the job of minifying HTML, except that is _also_ minifies the contents of <pre> tags... which I don't want...

Carry on ....

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

#28 Post by s243a »

Did you try my suggestion above, which was to remove the backslash before the "." in "\.*" between the pre tags? If you give us some test input, we can run some tests.

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#29 Post by sc0ttman »

I didn't really try anything - it's not my snippet, and already way beyond anything I know about sed (next to nothing)...

And a valid test case would be any HTML file from mdsh that contains highlighted code like this one: https://sc0ttj.github.io/mdsh/posts/201 ... uages.html

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#30 Post by jamesbond »

Scott, I've read your few posts, and I still don't get it. Perhaps it would be good if you could give us a sample input and the expected output, as well as the "incorrect output" produced by the currently not-working script, so we can get an idea of what it is that you want to do. As it stands, the current sed script will more or less empty out the text in between HTML tags, leaving basically a blank page full of tags but no text in between. I'm not sure whether that counts as "minify". (I've heard of minifying JavaScript, but minifying HTML is news to me ...).

s243a
Posts: 2580
Joined: Tue 02 Sep 2014, 04:48

#31 Post by s243a »

jamesbond wrote:Scott, I've read your few posts, and I still don't get it. Perhaps it would be good if you could give us a sample input and the expected output, as well as the "incorrect output" produced by the currently not-working script, so we can get an idea of what it is that you want to do. As it stands, the current sed script will more or less empty out the text in between HTML tags, leaving basically a blank page full of tags but no text in between. I'm not sure whether that counts as "minify". (I've heard of minifying JavaScript, but minifying HTML is news to me ...).
To me it looks like it will only empty out the space between tags if the space between tags is whitespace:
'\s'
Matches whitespace characters (spaces and tabs). Newlines embedded
in the pattern/hold spaces will also match:
https://www.gnu.org/software/sed/manual/sed.txt
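
A quick check of that reading, assuming GNU sed (`\s` is a GNU extension). The sample markup is invented:

```shell
# Text inside the tags stays; only the whitespace run between > and < goes.
out=$(printf '%s\n' '<p>text</p>   <b>bold</b>' | sed 's@>\s*<@><@g')
printf '%s\n' "$out"
```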

However, if we are truly trying to minimize the HTML shouldn't we also delete the enclosing tags?

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#32 Post by MochiMoppel »

jamesbond wrote:Here is my updated take on the challenge. Still works on busybox sed and gnu sed too.

Code: Select all

sed -r -e ':a;N;$!ba;s/<!--([^-]*|[^-]*-[^-]|[^-]*--[^>])*-->//g;' test.html
Wow! Well done! That's an impressive pattern. Took me a while to digest.

At first it seemed perfect but it choked on these 2 variations of your test cases
Case1:

Code: Select all

<!-- 14 -->15<!-- <-16-> -->17
Expected output:

Code: Select all

1517
Output received:

Code: Select all

<!-- 14 -->15<!-- <-16-> -->17
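A likely explanation for Case1, sketched below: the input is a single line, and in `:a;N;$!ba;...` the `N` runs unconditionally. When there is no next line, GNU sed (without POSIXLY_CORRECT) prints the pattern space and exits before the `s` command is reached. Guarding `N` with `$!` avoids that. The simplified comment pattern is only a stand-in for the demonstration, not jamesbond's regex:

```shell
one_line='<!-- 14 -->15'

# Unguarded N on the only (and therefore last) line: sed quits before s runs.
with_bare_n=$(printf '%s\n' "$one_line" | sed ':a;N;$!ba;s/<!--[^>]*-->//g')

# N guarded by $! is skipped on the last line, so s still gets its turn.
with_guard=$(printf '%s\n' "$one_line" | sed ':a;$!{N;ba;};s/<!--[^>]*-->//g')

printf 'bare N:    [%s]\nguarded N: [%s]\n' "$with_bare_n" "$with_guard"
```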
Case2:

Code: Select all

<p>1</p>
<!----NEW---->
text<!----NEW---->
<!--2-->3<br> 
<!-- 
4 
--><b>5</b><!-- 
-6 -7 --8 -9- <10> <-11-> <u>12</u> 
-->13 
<!-- 14 -->15<!-- <-16-> -->17
Expected output:

Code: Select all

<p>1</p>

text
3<br> 
<b>5</b>13 
1517
Output received:

Code: Select all

<p>1</p>
3<br> 
<b>5</b>13 
1517
My own homebrew may look less sophisticated but so far it passed all tests (well, as expected fails on Mac files. Can be fixed. For the time being let's keep it simple):

Code: Select all

sed $':a;$!{N;ba;};s/<!--/\x1/g;s/-->/\x2/g;s/\x1[^\x1]*\x2//g'  test.html
Now it's your turn to break it :lol:
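
For readers who want to follow the idea step by step, here is the same technique as a sketch that avoids bash's `$'...'` quoting by generating the two control characters with printf. The names SOH, STX and strip_comments are made up for this illustration, and GNU sed is assumed:

```shell
SOH=$(printf '\001')   # single-character stand-in for "<!--"
STX=$(printf '\002')   # single-character stand-in for "-->"

strip_comments() {
  # 1. slurp the whole input into the pattern space
  # 2. replace both delimiters with single characters
  # 3. delete SOH, anything that is not another SOH, then STX
  sed ":a;\$!{N;ba;};s/<!--/${SOH}/g;s/-->/${STX}/g;s/${SOH}[^${SOH}]*${STX}//g"
}

out=$(printf '%s\n' '<!-- 14 -->15<!-- <-16-> -->17' | strip_comments)
printf '%s\n' "$out"
```

Because each delimiter is now one character, a bracket expression like `[^...]` can stop the match at the next comment, which a multi-character delimiter cannot do directly.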

Burunduk
Posts: 80
Joined: Sun 21 Aug 2011, 21:44

#33 Post by Burunduk »

Yes, this link contains another link: https://catonmat.net/sed-one-liners-explained-part-one -- an interesting article I've never come across before. Thank you.


sc0ttman wrote: I'd love to get this working:

An HTML minifier...

This thing nearly does the job, except that it minifies stuff inside <pre> tags...
If I understand the task correctly, this sed script should remove gaps between tags as well as line feeds except inside (possibly nested) <pre> tags:

Code: Select all

sed ':a;$!{N;ba;};s/@/@a/g;s/\n/@n/g;s/<pre/\n&/g;s/<\/pre>/&\n/g' test.html \
  | sed -r '/(^<pre|<\/pre>$)/!{s/@n//g;s/>\s+</></g;}' \
  | sed ':a;$!{N;ba;};s/\n//g;s/@n/\n/g;s/@a/@/g' >min.html
Three sed commands in a row! I think this code itself needs to be minified.
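
Since the three stages are terse, here is the same pipeline wrapped in a function with one comment per stage, plus a tiny made-up input. The function name and the sample are only for illustration; GNU sed is assumed (`\n` in the replacement, `\s`, `-r`):

```shell
minify_html_sketch() {
  # Stage 1: slurp the file, escape @ as @a, turn newlines into @n,
  #          then put every <pre ... </pre> block on a line of its own.
  sed ':a;$!{N;ba;};s/@/@a/g;s/\n/@n/g;s/<pre/\n&/g;s/<\/pre>/&\n/g' \
    | # Stage 2: on lines that are not part of a <pre> block, drop the
      #          newline markers and the whitespace between tags.
      sed -r '/(^<pre|<\/pre>$)/!{s/@n//g;s/>\s+</></g;}' \
    | # Stage 3: glue the lines back together and undo the escaping.
      sed ':a;$!{N;ba;};s/\n//g;s/@n/\n/g;s/@a/@/g'
}

sample='<div>
  <p>hi</p>
<pre>
keep   this
</pre>
</div>'

out=$(printf '%s\n' "$sample" | minify_html_sketch)
printf '%s\n' "$out"
```

The `@a`/`@n` escaping is what keeps a literal "@n" in the page from being mistaken for a newline marker.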

jamesbond wrote:Here is my updated take on the challenge. Still works on busybox sed and gnu sed too.

Code: Select all

sed -r -e ':a;N;$!ba;s/<!--([^-]*|[^-]*-[^-]|[^-]*--[^>])*-->//g;' test.html
This is clever. It has a problem though and MochiMoppel's test revealed it. The 3rd alternative eats too many hyphens at the end of a comment:

Code: Select all

# echo '<!--remove-me-not--->' | sed -r -e ':a;N;$!ba;s/<!--([^-]*|[^-]*-[^-]|[^-]*--[^>])*-->//g;'
<!--remove-me-not--->
That can be fixed by adding any other character to break the series of hyphens:

Code: Select all

sed -r -e ':a;$!{N;ba;};s/-->/@&/g;s/<!--(-?[^-]|--[^>])*-->//g;' test.htm


MochiMoppel wrote:

Code: Select all

sed $':a;$!{N;ba;};s/<!--/\x1/g;s/-->/\x2/g;s/\x1[^\x1]*\x2//g'  test.html

Now it's your turn to break it
You know it's unbreakable! :) Maybe just one unlikely-to-appear-in-the-input-file character is enough here:

Code: Select all

sed $':a;$!{N;ba;};s/-->/\x1/g;s/<!--[^\x1]*\x1//g'  test.html

And here is my own attempt (now obviously superfluous):

Code: Select all

sed ':a;$!{N;ba;};:c;/<!--/s/-->/&&/;s/<!--.*-->-->//;tc' test.html
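
A short trace of how this loop works, on an invented sample: each pass doubles the first `-->`, so `<!--.*-->-->` can only end at that first terminator, and `t` jumps back to `:c` for as long as a substitution succeeded.

```shell
# pass 1: <!-- a -->-->keep<!-- b -->   then   keep<!-- b -->
# pass 2: keep<!-- b -->-->             then   keep
# pass 3: no <!-- left, nothing substituted, t does not branch
out=$(printf '%s\n' '<!-- a -->keep<!-- b -->' \
  | sed ':a;$!{N;ba;};:c;/<!--/s/-->/&&/;s/<!--.*-->-->//;tc')
printf '%s\n' "$out"
```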

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

#34 Post by jamesbond »

@Mochi/@Burunduk: Thanks for the entertainment and insight. I admit defeat :oops:
Yours are simple, yet effective, and most importantly, clear and easy to understand.

(PS: @Mochi, it's easy to break yours - just pepper 0x01 and 0x02 in the html and those will get deleted when they shouldn't; but normal html files __won't__ have these in them so the point is moot - the script works as intended for normal HTML, so as far as normal HTML files are concerned, this is now a solved problem).

---

Next, we probably should tackle Scott's request (all the comments about deleting the whitespace are correct, I missed the "\s" in the script) if you guys still want to play :lol:

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#35 Post by MochiMoppel »

Burunduk wrote:You know it's unbreakable! :)
Yeah, as unbreakable as the windows of Elon Musk's Cybertruck :lol: Don't you worry, I just broke it
Maybe just one unlikely-to-appear-in-the-input-file character is enough here:
You are right when you consider my flawed code. For making the code bullet-proof, as I originally intended, I need the second character.
And here is my own attempt (now obviously superfluous):
Nothing is superfluous in this world. Thank you for introducing sed's t command. It shows that sed is able to perform while...do loops and helps to fix my code.

Let's raise the bar one notch higher and enclose text that contains a comment with another comment, effectively creating a nested comment. Nested comments may be invalid, but they are a fact of life. It's easy to select and comment out a large portion of XML text without noticing that this block already contains comments. With the same ease it is possible to delete a portion and forget to include an opening or closing tag, thus creating an orphan.

In case of nested comments my code will leave orphans. With Burunduk's "superfluous" code it will even create an infinite loop and will not work at all.

So here is my attempt to fix all problems and build Cybertruck 2.0: (changes marked)
  • sed -r $':a;$!{N;ba;};s/\\r\\n?/\\n/g;s/<!--/\x1/g;s/-->/\x2/g;:c;s/\x1[^\x1\x2]*\x2//g;tc;s/\x1|\x2//g' test.html
The first change converts any Mac or Win line endings to Unix style.
The second change uses Burunduk's loop idea and peels nested comment onions from inside out.
Lastly any orphans are removed.
That should do it. Unbreakable. Reminds me of my face masks ("Keeps out 99.9% of all viruses"). Still leaves me with the chance to catch "only" every 1000th virus.

[EDIT]: This didn't last long. Script fails on Scott's linked page which contains a weird comment in the DOCTYPE section:
  • <!--[if gte IE 11]><!--><html lang="en"><!--<![endif]-->
:shock: What is that? The browser sees a closing tag, my script sees an opening tag. OK, if I change the replacement order my script will behave like a browser and leave only <html lang="en"> uncommented:
  • sed -r $':a;$!{N;ba;};s/\\r\\n?/\\n/g;s/-->/\x2/g;s/<!--/\x1/g;:c;s/\x1[^\x1\x2]*\x2//g;tc;s/\x1|\x2//g' test.htm
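A sketch of that last version for trying it out without bash's `$'...'` quoting. The control characters come from printf, and the names SOH, STX and strip_nested are made up here. GNU sed is assumed; both test strings are invented except the conditional comment quoted above:

```shell
SOH=$(printf '\001')   # stand-in for "<!--"
STX=$(printf '\002')   # stand-in for "-->"

strip_nested() {
  # normalise line ends, mark --> before <!--, peel comments from the
  # inside out with the t loop, then drop any orphaned markers
  sed -r ":a;\$!{N;ba;};s/\r\n?/\n/g;s/-->/${STX}/g;s/<!--/${SOH}/g;:c;s/${SOH}[^${SOH}${STX}]*${STX}//g;tc;s/${SOH}|${STX}//g"
}

# nested comment plus an orphaned "-->"
nested=$(printf '%s\n' 'a<!-- x <!-- y --> z -->b-->c' | strip_nested)

# the conditional comment from the linked page
ie=$(printf '%s\n' '<!--[if gte IE 11]><!--><html lang="en"><!--<![endif]-->' | strip_nested)

printf '%s\n%s\n' "$nested" "$ie"
```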
jamesbond wrote:@Mochi, it's easy to break yours - just pepper 0x01 and 0x02 in the html
Pretty difficult to create intentionally or accidentally. I would consider this to be a corrupted file, in which case eliminating comments should be the least concern of the user :lol:

HerrBert
Posts: 152
Joined: Thu 03 Nov 2016, 15:11
Location: NRW, Germany

#36 Post by HerrBert »

I did not read the whole thread, but maybe this could be of interest too:
http://sed.sourceforge.net/sed1line.txt

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#37 Post by sc0ttman »

I completely forgot about the IE conditional comments - they shouldn't be removed, or minified ... Should be ignored..

I can't remember any other caveats...

But it does remind me to revisit the conditional comments and go with something a little simpler...

EDIT:

Yep, this seems to work for me (minifies the HTML):

Code: Select all

sed ':a;$!{N;ba;};s/@/@a/g;s/\n/@n/g;s/<pre/\n&/g;s/<\/pre>/&\n/g' test.html \
  | sed -r '/(^<pre|<\/pre>$)/!{s/@n//g;s/>\s+</></g;}' \
  | sed ':a;$!{N;ba;};s/\n//g;s/@n/\n/g;s/@a/@/g' >min.html	
Thanks very much Burunduk

...now onto getting a sed based CSS minifier that can remove multi-line comments, based on the above..

This CSS minifier fails on appended and multi-line comments, and is probably crap in 10 other ways:

Code: Select all

    cat $css_bundle \
      | grep -v '/\*' \
      | tr -d '\n' \
      | sed -e '/\/\*/,/\*\//d' \
            -e 's/  / /g' \
            -e 's/ {/{/g' \
            -e 's/{ /{/g' \
            -e 's/ }/}/g' \
            -e 's/: /:/g' \
            -e 's/; /;/g' > "${css_file//.css/.min.css}"
($css_bundle is a space separated list of valid CSS files)
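
One possible direction, offered only as a sketch and not as a tested replacement for the snippet above: the single-marker trick from the HTML comment discussion can be adapted to CSS comments. It assumes GNU sed, assumes the byte 0x01 never occurs in the CSS, and ignores `/*` inside strings. The names END and strip_css_comments and the sample CSS are hypothetical:

```shell
END=$(printf '\001')   # single-character stand-in for "*/"

strip_css_comments() {
  # slurp the input, mark every "*/", then delete from "/*" to the next marker
  sed ":a;\$!{N;ba;};s|\*/|${END}|g;s|/\*[^${END}]*${END}||g"
}

css='a{}/* one */b{}/* multi
line */c{}'

out=$(printf '%s\n' "$css" | strip_css_comments)
printf '%s\n' "$out"
```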

...I really need to go learn how sed actually works.. :oops:

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#38 Post by MochiMoppel »

sc0ttman wrote:Yep, this seems to work for me (minifies the HTML)
Looks wrong to me. Not Burunduk's fault as he probably assumed that you would like all linefeeds removed, except those in <pre> tags, which in well constructed HTML pages would be no problem. In the case of your test page linefeeds are present in <p> tags and must not be removed, otherwise your page produces text like
If you have TCC installed, you can evenembed C code in your Markdown

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#39 Post by sc0ttman »

MochiMoppel wrote:
sc0ttman wrote:Yep, this seems to work for me (minifies the HTML)
Looks wrong to me. Not Burunduk's fault as he probably assumed that you would like all linefeeds removed, except those in <pre> tags, which in well constructed HTML pages would be no problem. In the case of your test page linefeeds have also to be preserved within <p> tags (or preferably changed to spaces), otherwise your page produces text like
If you have TCC installed, you can evenembed C code in your Markdown
EDIT: It's prettier.js that removes the trailing spaces from the source Markdown.. I disabled it ..

Fixed.. rebuilt a local version of the page without that little annoyance... Probably
also improves the screen reader experience.

..Anyway, I spotted other issues with Burunduk's code yesterday (not stripping whitespace inside <a> tags), but I can live with it as
it is TBH - HTML minification would only mainly be for huge pages (over 500kb of HTML or so) - but
obviously an improved answer would remove newlines outside of pre tags generally.

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#40 Post by MochiMoppel »

sc0ttman wrote:It's prettier.js that removes the trailing spaces from the source Markdown.. I disabled it ..
I have no clue what you are talking about. What trailing spaces?
..Anyway, I spotted other issues with Burunduk's code yesterday (not stripping whitespace inside <a> tags), but I can live with it as
You mean the 10 spaces between consecutive <a> tags?

Code: Select all

          <a href="/mdsh/tags/seo.html">seo</a>,
          <a href="/mdsh/tags/shell.html">shell</a>,
          <a href="/mdsh/tags/xml.html">xml</a>,
This looks like garbage and is not removed because Burunduk may have tried to guess your requirements from your first post. Your original script (s@>\s*<@><@g) was designed to remove pure whitespace between tags, i.e. spaces, tabs or linefeeds, no other characters. You said that this is what you want, except that you don't want to apply this to <pre> tags. This is basically what Burunduk delivered. As soon as you put any other character between tags, even a single comma, nothing is or should be removed. Apart from the funny <a> tag spacings there is more questionable code, e.g. the seemingly useless '<span></span>' combos, that could be removed.

Wouldn't it be much more effective if you clean the HTML code first? With a clean HTML design there will be not much left to do for a minify script.
