speeding up scripts

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#41 Post by jpeps »

thunor wrote:I've got some fast portable shell scripting tips that I'd like to note.

Code: Select all

## With bash you can read a file speedily like this:

echo `<file`

## This you might think is the ash/dash solution:

cat file

## But I've actually found this to be the fastest for ash, bash and dash:

read -r input < file
echo $input
That prints only the first line ??

echo `<file` doesn't preserve the line feed. "cat file" is faster (on my computer)
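
For anyone comparing the variants, here is a minimal sketch of the two behaviours being argued about. The function names first_line and whole_file are made up for illustration. read -r only ever gets the first line, while a quoted command substitution keeps the line feeds inside the file (an unquoted one, as in the echo example, lets the shell collapse them into spaces).

```shell
#!/bin/sh
# first_line FILE: print only the first line, using the read builtin.
first_line() {
    _line=
    # Accept a last line without a trailing newline; fail on an empty or missing file.
    IFS= read -r _line < "$1" || [ -n "$_line" ] || return 1
    printf '%s\n' "$_line"
}

# whole_file FILE: print the whole file; the quotes around the
# expansion are what preserve the interior line feeds.
whole_file() {
    _content=$(cat "$1") || return 1
    printf '%s\n' "$_content"
}
```

Which one is faster is likely to depend on the shell and the file size, so it is worth timing both on your own machine.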

thunor
Posts: 350
Joined: Thu 14 Oct 2010, 15:24
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

#42 Post by thunor »

jpeps wrote:That prints only the first line ??

echo `<file` doesn't preserve the line feed. "cat file" is faster (on my computer)
Yeah, I should've mentioned that I was only interested in the first line such as when maintaining global variables as files :)

Getting the directory name from a path:

Code: Select all

## This is what you might be tempted to do:

fullpath="/some/path/to some file.txt"
path="`dirname "$fullpath"`"

## But this is portable although you'll need to check that it returns at least root:

fullpath="/some/path/to some file.txt"
path="${fullpath%/*}"
if [ -z "$path" ]; then path="/"; fi
Regards,
Thunor
Last edited by thunor on Sun 04 Dec 2011, 23:51, edited 1 time in total.
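
A possible way to wrap the parameter-expansion trick above in a function, as a sketch only. The name dir_of is hypothetical, and it assumes the path has no trailing or doubled slashes, which dirname itself would tidy up. It also covers a path with no slash at all, where ${fullpath%/*} would otherwise hand back the whole string.

```shell
#!/bin/sh
# dir_of PATH: rough builtin-only stand-in for dirname(1).
# Assumes PATH has no trailing slashes and no doubled slashes.
dir_of() {
    case "$1" in
        */*) _dir=${1%/*} ;;   # strip the last component
        *)   _dir=. ;;         # no slash at all: dirname would print "."
    esac
    # "/file" strips down to an empty string, which means the root directory.
    [ -n "$_dir" ] || _dir=/
    printf '%s\n' "$_dir"
}
```

Called through command substitution, as in path=$(dir_of "$fullpath"), this still forks a subshell in most shells, so inside a tight loop the inline expansion is probably the faster form.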

thunor
Posts: 350
Joined: Thu 14 Oct 2010, 15:24
Location: Minas Tirith, in the Pelennor Fields fighting the Easterlings

#43 Post by thunor »

Checking the first character of a string:

Code: Select all

## You can get the first character using bash like this:

char="${path:0:1}"

## Otherwise you can use this (is there a better way?):

char="`echo "$path" | cut -c 1`"

## But if you know what you're looking for e.g. making sure that a string has an initial forward slash then you can do this:

case "$path" in
	/*) true ;;
	*) path="/$path" ;;
esac

## It's also useful if you're reading an rcfile and you want to filter out lines starting with a '#'.
Regards,
Thunor
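
On the "is there a better way?" question: one option that needs neither bash nor cut is nested parameter expansion. This is a sketch with made-up function names (first_char, is_comment). How it treats multibyte characters depends on the shell and locale.

```shell
#!/bin/sh
# first_char STRING: print the first character using only parameter expansion.
first_char() {
    # ${1#?} is the string minus its first character;
    # removing that as a suffix leaves just the first character.
    printf '%s\n' "${1%"${1#?}"}"
}

# is_comment LINE: succeed when LINE starts with '#', e.g. when filtering an rcfile.
is_comment() {
    case "$1" in
        '#'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

Usage would be along the lines of char=$(first_char "$path"), or is_comment "$line" && continue inside a read loop.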

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#44 Post by technosaurus »

thunor wrote:
jpeps wrote:That prints only the first line ??

echo `<file` doesn't preserve the line feed. "cat file" is faster (on my computer)
Yeah, I should've mentioned that I was only interested in the first line such as when maintaining global variables as files :)
That is what the read builtin is for ... it reads one line into a variable. You can read multiple lines using a while loop or a specific number of characters using -n ... combine this with a case statement and some substring manipulation and you can replace many (slow) calls to external commands like awk, grep, sed, tr and others
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].
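
To make the read + case + substring idea concrete, here is a small sketch that does the job of a grep piped into cut without leaving the shell. The function name get_value and the KEY=value file layout are assumptions for the example, not something from the posts above.

```shell
#!/bin/sh
# get_value KEY FILE: print the value of the first "KEY=value" line in FILE,
# using only read, case and parameter expansion.
get_value() {
    while IFS= read -r _l || [ -n "$_l" ]; do
        case "$_l" in
            "$1="*)
                printf '%s\n' "${_l#*=}"   # drop everything up to the first '='
                return 0
                ;;
        esac
    done < "$2"
    return 1    # key not found
}
```

Whether this beats the external tools depends on the file size and the shell, so treat it as something to time rather than a guaranteed win.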

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#45 Post by MochiMoppel »

technosaurus wrote:That is what the read builtin is for
In case your "That" refers to thunor's "It's also useful if you're reading an rcfile and you want to filter out lines starting with a '#'.":
Builtins are not necessarily fast. Read loops are slow. Using an external command can be much faster.

Eliminating lines starting with '#'

Code: Select all

while IFS= read line; do
  case "$line" in
    [^#]*|"") echo "$line";;
  esac
done < /etc/rc.d/rc.sysinit
Time:
real 0m0.226s
user 0m0.147s
sys 0m0.030s

Code: Select all

sed '/^#.*/d'  /etc/rc.d/rc.sysinit
Time:
real 0m0.046s
user 0m0.007s
sys 0m0.020s

Even when reading smaller files sed tends to be faster.
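
For anyone who wants to repeat the comparison on their own system, the two approaches can be put side by side as functions. This is a sketch with hypothetical names, and the loop is slightly hardened compared with the one timed above: read -r keeps backslashes intact and printf avoids echo's quirks.

```shell
#!/bin/sh
# strip_comments_loop FILE: builtin-only version.
strip_comments_loop() {
    while IFS= read -r _l || [ -n "$_l" ]; do
        case "$_l" in
            '#'*) ;;                      # skip lines starting with '#'
            *) printf '%s\n' "$_l" ;;     # keep everything else, blank lines included
        esac
    done < "$1"
}

# strip_comments_sed FILE: one external process for the whole file.
strip_comments_sed() {
    sed '/^#/d' "$1"
}
```

In bash, where time is a shell keyword, something like time strip_comments_loop /etc/rc.d/rc.sysinit >/dev/null should time each variant without terminal output getting in the way.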

wiak
Posts: 2040
Joined: Tue 11 Dec 2007, 05:12
Location: not Bulgaria

#46 Post by wiak »

Yes, sed can produce some amazingly efficient, fast results. So much code uses multiple while read loops and pipes through cut and grep to do what a single sed call can do (by passing multiple commands to the one sed instance). There are also many cases of sed output being piped into sed again, and sometimes yet again, when a single sed call (with multiple sed commands) would do.

Part of the problem is that many people know only the basics of sed (mainly just sed 's///'). It can do a lot more than that, all on one line, but there is a bit of a learning curve in understanding everything it can do and how to do it all in one sed call.

wiak
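
As an illustration of the point, here is the same clean-up written once as a pipeline and once as a single sed call. The function names and the particular clean-up steps (drop comment lines, trim trailing whitespace, drop empty lines) are just an example chosen for this sketch.

```shell
#!/bin/sh
# clean_piped FILE: three processes chained together.
clean_piped() {
    grep -v '^#' "$1" | sed 's/[[:space:]]*$//' | sed '/^$/d'
}

# clean_single FILE: the same three steps as one sed script.
clean_single() {
    sed -e '/^#/d' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}
```

Both should print the same lines; the second starts one process instead of three, which mostly matters when it runs inside a loop.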

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#47 Post by sc0ttman »

i only know what you guys have shared, but this seems legit:

David Butcher: Speeding Up Your UNIX Shell Scripts
http://www.los-gatos.ca.us/davidbu/faster_sh.html

And i am guilty of piping sed to sed, among a thousand other shell-related sins..
[b][url=https://bit.ly/2KjtxoD]Pkg[/url], [url=https://bit.ly/2U6dzxV]mdsh[/url], [url=https://bit.ly/2G49OE8]Woofy[/url], [url=http://goo.gl/bzBU1]Akita[/url], [url=http://goo.gl/SO5ug]VLC-GTK[/url], [url=https://tiny.cc/c2hnfz]Search[/url][/b]

wiak
Posts: 2040
Joined: Tue 11 Dec 2007, 05:12
Location: not Bulgaria

#48 Post by wiak »

sc0ttman wrote: And i am guilty of piping sed to sed, among a thousand other shell-related sins..
Yes, me too, but it doesn't matter as long as it's not inside some long loop; often it's more important that the code is easy to read. Nice link about speeding up shell code, by the way.

I'm pretty sure jlst is constantly tuning woof-CE code to gradually speed it up, but there is a lot of code to work through - plenty of speed up possible I'm sure.

wiak

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#49 Post by sc0ttman »

Yeah, that link is quite good.. Some stuff I never even considered..

I like the idea of replacing

Code: Select all

[ "$var1" = 'foo' -a "$var2" = 'bar' ] && echo blah
with

Code: Select all

case "$var1$var2" in
  foobar) echo blah ;;
esac
Although readability is not great (imho)..
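
One caveat with gluing the two variables together is that "foob" + "ar" also spells foobar. A sketch of a variant with a separator follows; the function name both_match and the choice of ":" are assumptions made for the example.

```shell
#!/bin/sh
# both_match A B: succeed only when A is "foo" and B is "bar".
# The ":" separator keeps "foob" + "ar" from being mistaken for "foo" + "bar".
both_match() {
    case "$1:$2" in
        foo:bar) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Usage would be both_match "$var1" "$var2" && echo blah, which arguably also helps a little with the readability.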

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#50 Post by sc0ttman »

Is this

Code: Select all

if "$var" = 'foo'; then 

faster or (different at all) than this:

Code: Select all

if [ "$var" = 'foo']; then 
?

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#51 Post by MochiMoppel »

sc0ttman wrote:Is this

Code: Select all

if "$var" = 'foo'; then 
faster or (different at all) than this:

Code: Select all

if [ "$var" = 'foo']; then 
?
Both are incorrect. You mean which of the error messages you will receive is faster?

sc0ttman
Posts: 2812
Joined: Wed 16 Sep 2009, 05:44
Location: UK

#52 Post by sc0ttman »

MochiMoppel wrote:
sc0ttman wrote:Is this

Code: Select all

if "$var" = 'foo'; then 
faster or (different at all) than this:

Code: Select all

if [ "$var" = 'foo']; then 
?
Both are incorrect. You mean which of the error messages you will receive is faster?
lol yeah fine, add the fixes mentally :roll: I think u know what I mean..

MochiMoppel
Posts: 2084
Joined: Wed 26 Jan 2011, 09:06
Location: Japan

#53 Post by MochiMoppel »

if test "$var" = 'foo' ?
if [[ "$var" = 'foo' ]] ?
Wait ... where is my crystal ball?
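
For reference, here are the working spellings of that comparison, as a sketch. The result_* variable names are only there so the outcome can be inspected. The [[ ... ]] form is left out because it is a bash extension and this is meant to run in ash and dash as well.

```shell
#!/bin/sh
var=foo

# "[" is a command, so "]" must be its own argument: note the space before it.
if [ "$var" = 'foo' ]; then result_bracket=yes; else result_bracket=no; fi

# "test" is the same command spelled without the brackets.
if test "$var" = 'foo'; then result_test=yes; else result_test=no; fi

# case needs no test command at all.
case "$var" in
    foo) result_case=yes ;;
    *)   result_case=no ;;
esac
```

Any speed difference between these is probably tiny in shells where test and [ are builtins, so it only seems worth measuring inside a long loop.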

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#54 Post by musher0 »

Hi sc0ttman.

In the 1st issue of the Puppy Linux Newsletter, January of this year, I
explained a number of tricks I have used to create a +/- 16 KB wmx menu
very rapidly (+/- 1 second).

It boils down to :
-- use ash instead of bash if appropriate (ash does not have all of bash's
string manipulation capacities; bash may be faster than ash in such cases.)

-- use internal bash or ash commands as much as possible

-- take advantage of bash's fantastic string manipulation capacities: they
are generally faster than using awk for the same result


-- sort lists and items that need sorting before processing them.
Remember that somewhere in a computer, the alphabetical and numerical
orders are still built-in. It is not just an historical thing. For example, if you
don't respect alphabetical-order processing when you can, expect to lose
precious time.

-- use the case...esac conditions structure whenever possible

-- use LC_ALL=C before any important non-linguistic processing and
know where and when to release it with LC_ALL="" -- at the proper place
in the processing. Otherwise, you'll get junk results.

(By "non-linguistic processing", I mean any processing not based on
human language.)

With LC_ALL=C, the LANG variable remains untouched. You're only
suspending it for the time being.

LC_ALL=C suspends utf-8 and makes the utf-8 bytes available for general
processing, not just for language. So this multiplies the speed of your script
by a factor of 2 to 4. "Your script gets the whole boulevard to itself," in a
manner of speaking.

This will make a bash script approach C processing speed. But you have to
know when to release it, especially if you want to integrate human
languages other than bare-bones English. (No special characters
allowed.)

-- time the loops for speed. A < while read line;do > may be faster or slower
than a < for i bla ble bli;do > loop, depending on the material.

-- avoid writing to disk as much as possible. Prefer string manipulation. If
you need to write to disk, write in the same directory as the script. You'll
save only a millisecond, but they add up when using a loop.

-- use the most efficient logic for the problem at hand. You may gain speed
by changing the order of the "processing steps". This you learn by what I
call "living with the problem" for a little -- or a long! -- while.
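
On the LC_ALL=C tip in the list above, one way to avoid forgetting to release it is to scope it so it releases itself. This is a sketch with hypothetical function names; the speed-up itself is not measured here.

```shell
#!/bin/sh
# sort_bytewise FILE: run one command in the C locale.
# The assignment prefix applies to that sort process only,
# so there is nothing to release afterwards.
sort_bytewise() {
    LC_ALL=C sort "$1"
}

# in_c_locale CMD [ARGS...]: run a longer stretch of work in a subshell;
# the caller's LC_ALL and LANG are untouched when it returns.
in_c_locale() (
    LC_ALL=C
    export LC_ALL
    "$@"
)
```

For example, in_c_locale some_function args would run some_function with the C locale and leave the rest of the script in the user's language.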

Finally, please note that the above are "working notes" derived from my
personal experience. I know they work, but I'm just a "ground-hog" ;) :
some "eagle" with general perspective will have to provide the theory of it.

IHTH.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#55 Post by technosaurus »

MochiMoppel wrote:Even when reading smaller files sed tends to be faster.
Not if you are running sed from inside a script, using a non-bash shell with LANG=C
If memory serves me, you have made several localization contributions, so I am guessing you bothered to set $LANG and probably have /bin/sh as a link to bash since it minimizes problems with all the bashisms that riddle Puppy scripts.

I did a bunch of testing to figure out when to use read loops vs. sed, grep and awk, and IIRC the break-even point came out to just short of 100 lines on average (depending on the shell)

FWIW, those timing values are quite suspect because of all the echoes, considering I can process all the desktop files in /usr/share/applications/ to generate my jwmrc file in about the same amount of time ... but then again, I build a large string and print it once to a file instead of doing it one echo at a time to stdout. I guess that's another tip: printing to the console/terminal is slow, so batch the output up if possible

Edit - for clarification
ex: instead of having echo "$line" inside a while loop, use

Code: Select all

OUTPUT="$OUTPUT
$line" #note: some shells have a faster string concatenation operator
and then after the loop is done just echo "$OUTPUT"

This is because every write to stdout/tty/etc... takes a (variably) long time, so only output when necessary and do as much of it in one go as possible.
I could explain why this is but it gets a bit off topic (filesystems, kernel syscalls and the C interface)
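
Put together as a complete loop, the batching idea might look like the sketch below. The function name join_lines is made up. In bash the append could also be written with the += operator, which is presumably the faster concatenation mentioned in the note above.

```shell
#!/bin/sh
# join_lines FILE: collect every line into one string, then write once.
join_lines() {
    _out=
    _nl='
'
    while IFS= read -r _l || [ -n "$_l" ]; do
        _out="$_out$_l$_nl"     # append in memory, no write yet
    done < "$1"
    printf '%s' "$_out"         # a single write for the whole result
}
```

In a real script the body of the loop would do some filtering or rewriting of each line before appending it; the point is only that the output happens once, after the loop.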
