Best & Fastest Compression Method

Using applications, configuring, problems
p310don
Posts: 1492
Joined: Tue 19 May 2009, 23:11
Location: Brisbane, Australia

Best & Fastest Compression Method

#1 Post by p310don »

I use a script to back up about 10000 files into a tar.gz file. The original files total around 9gig, and the tar.gz file is 2gig. I can then copy that to a USB stick or google drive for backup.
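For the record, the backup boils down to something like this (only a sketch; the source and destination paths are placeholders for my real ones):

Code: Select all

# sketch only -- source and destination paths are placeholders
tar -czf /mnt/backup/work-backup.tar.gz /path/to/work/files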

I've been doing this for a few years and it works perfectly. But then I decided to play around a bit.

I tested 7-zip the other day. Wow, MUCH better compression. Instead of 2 gig, the resultant file was 700mb. BUT, the time taken to do the compression is just over an hour, vs 11 minutes with tar.gz. AND, 7-zip uses both cores of my PC, whilst gz uses one. Using both cores speeds things up (apparently) but makes the PC virtually unusable whilst it is working.

The unicorn I am looking for is a compression tool that is as fast as gz and gives high compression like 7zip. Am I just looking for fairytales, or does it exist?

Burn_IT
Posts: 3650
Joined: Sat 12 Aug 2006, 19:25
Location: Tamworth UK

#2 Post by Burn_IT »

You have a choice:
Time
Size
You cannot have both.

I suspect:
The fast method just removes spaces.
The slow method also searches for recurring snippets and replaces them with pointers to a single occurrence.
"Just think of it as leaving early to avoid the rush" - T Pratchett

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#3 Post by musher0 »

Hi p310don.

Maybe try lrzip? I get very good results with it.

The only inconvenience is that it compresses single files only. E.g., if you wish
to compress a directory, use < tar > or < zip -0 > (without the chevrons, of
course) on the dir first, and then apply lrzip to that tar or zip file.
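For example, something along these lines (paths are placeholders; lrzip keeps the original and writes a new .lrz file by default):

Code: Select all

# sketch: tar the directory first, then compress the tarball with lrzip
tar -cf /tmp/mydir.tar /path/to/mydir
lrzip /tmp/mydir.tar    # produces /tmp/mydir.tar.lrz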

You can probably get lrzip and install it through the PPM. If not, tell me
which Pup you are using, and I'll try to compile it for you.

IHTH.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

jamesbond
Posts: 3433
Joined: Mon 26 Feb 2007, 05:02
Location: The Blue Marble

Re: Best & Fastest Compression Method

#4 Post by jamesbond »

p310don wrote:I tested 7-zip the other day. Wow, MUCH better compression. Instead of 2 gig, the resultant file was 700mb. BUT, the time taken to do the compression is just over an hour, vs 11 minutes with tar.gz.
Indeed.
AND, the 7zip uses both cores of my PC, whilst gz uses one. Using both cores speeds things up (apparently) but makes the PC virtually unusable whilst it is working.
There is a flag to set the max number of CPUs used. But as you can imagine, if 2 cores take 1 hour, 1 core will take even longer.
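For instance, with p7zip the relevant switch is -mmt (assuming your build accepts it); a sketch:

Code: Select all

# sketch: limit 7z to a single thread so the rest of the PC stays responsive
7z a -mmt1 backup.7z /path/to/files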
The unicorn I am looking for is a compression tool that is as fast as gz and gives high compression like 7zip. Am I just looking for fairytales, or does it exist?
There is a whole spectrum between
- fast but large file
- slow but small file
In other words, you can always sacrifice speed for size, or vice versa. (Actually there are more aspects than just these two parameters, but I am simplifying.)

But whoever finds "fast and small" will become an instant millionaire overnight. Literally. (Example: bandwidth costs money. YouTube pushes out gobs of gigabytes every second. Even a 10% saving means 10% off YouTube's bandwidth bill, which they would happily pay you for, as long as video can still be delivered at the speed streaming requires.)
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]

p310don
Posts: 1492
Joined: Tue 19 May 2009, 23:11
Location: Brisbane, Australia

#5 Post by p310don »

Burn_IT wrote:
You have a choice:
Time
Size
You cannot have both.
Quote from lrzip's readme file, courtesy of musher0:
You can either choose to optimise for speed (fast
compression / decompression) or size, but not both.
Seems that it is probably not worth the bother.

All that said, I am fairly certain I'll be investing in a newer PC for work fairly soon, so compression speeds will increase simply as a function of more horsepower.

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#6 Post by musher0 »

Hi p310don,

If you're in a bind (e.g. no money for a faster box ATM), you could use

Code: Select all

nice --some parameters lrzip --some parameters /path/to/big-file-to-compress
Type < man nice > in console to know more about configuring < nice >.

With < nice >, you could make < lrzip > use less of your computer's
resources. The compression process will take longer (hello, paradox! ;)),
but you will be able to keep doing other tasks on your box without being
slowed down. Ergo, an overall time saving for you as the user.
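For example (a sketch only; -n 19 is the lowest scheduling priority, and lrzip's -L switch is assumed here for the compression level):

Code: Select all

# sketch: run lrzip at the lowest CPU priority so other tasks stay responsive
nice -n 19 lrzip -L 9 /path/to/big-file-to-compress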

I'm talking about the lrzip compressor above, but one can apply this
< nice > technique to any resource-hungry executable.

Just a thought.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

p310don
Posts: 1492
Joined: Tue 19 May 2009, 23:11
Location: Brisbane, Australia

#7 Post by p310don »

So I have had a bit of a play with Packit in Xenial64 7.5 and its various options.

The testing was rough, but gives me some ideas.

I started with a directory of 2788 MB, containing approximately 5000 individual files, mostly various types of text file.

Using Packit, I used tar as the first pass, varied the second-pass option, and recorded the results:

- 7z compression took 12 minutes and resulted in a 299 MB file
- gz compression took 90 seconds and resulted in a 564 MB file
- zip compression took 1 min 15 secs and resulted in a 566 MB file
- compressing the gz file from earlier to 7z took 2 minutes and resulted in a 544 MB file
- 7z with fast compression took 2 minutes and resulted in a 445 MB file
- gz with best compression took 6 minutes and resulted in a 544 MB file
- gz with fast compression took 45 seconds and resulted in a 651 MB file

I haven't really come up with a conclusion that I like. 7z with fast compression is probably the best option for me. 7z has the disadvantage of really working the PC, so that nothing else runs well while it's going. gz has the advantage of being very light on resources, but is not so good on file size.

With the purchase of a new PC for work, I might consider splitting the workload between two machines. There are 10 directories that are backed up; if I do 5 with one machine and 5 with the other, and then tar the two resultant compressed files into one file for backup, that might also save some time.
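On a single machine, the same idea can be roughly approximated by running the two jobs in the background at once (a sketch only; the directory names are placeholders):

Code: Select all

# sketch: compress two sets of directories in parallel, then wait for both to finish
tar -czf /mnt/backup/set1.tar.gz dir1 dir2 dir3 dir4 dir5 &
tar -czf /mnt/backup/set2.tar.gz dir6 dir7 dir8 dir9 dir10 &
wait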

Flash
Official Dog Handler
Posts: 13071
Joined: Wed 04 May 2005, 16:04
Location: Arizona USA

#8 Post by Flash »

P310don, I'm just curious, have you ever reconstituted one of the files you've condensed? (I think 'condensed' is a more accurate description than 'compressed', because you're removing redundant stuff, like condensing soup. Compressing would leave everything there and somehow cram it into a smaller space.) If you have, were there any mistakes?

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#9 Post by musher0 »

Hi Flash.

I hope I am getting your meaning.

Is it necessary to reconstitute the file or directory to check it? Archivers
usually have a test setting to check the integrity.
E.g.:

Code: Select all

zip -T archive.zip

Code: Select all

lzop -t archive.lzo

Code: Select all

lz4 -t archive.lz4

Code: Select all

lrzip -t archive.lrz
Example:

Code: Select all

lrzip -t zdrv_xenial_7.0.6.sfs.lrz
Decompressing...
100% 30.99 / 30.99 MB
Average DeCompression Speed: 30.000MB/s
[OK] - 32493568 bytes
Total time: 00:00:00.34
If the archive has errors, the user does not get that "[OK]" message.
The thing is, the user has to get into the habit of checking the archive
just after it has been created, to avoid bad surprises later.
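For the tar.gz backups discussed earlier in the thread, the equivalent check would be something like:

Code: Select all

# sketch: test the gzip layer, then list the tar contents without extracting
gzip -t backup.tar.gz
tar -tzf backup.tar.gz > /dev/null && echo "[OK]"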

Best regards.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

p310don
Posts: 1492
Joined: Tue 19 May 2009, 23:11
Location: Brisbane, Australia

#10 Post by p310don »

Flash asked
P310don, I'm just curious, have you ever reconstituted one of the files you've condensed?
Yes.

I have a paranoia about data. This is only one of my backup routines. Every day I copy the resultant tar.gz file to a USB flash drive. The copy routine renames yesterday's file to backup1, that one to backup2, etc., up to backup9, just in case there is an issue with the compression or condensation on one of the days. Then I have ten days' worth of data to choose from if one is bad.
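(The rotation amounts to something like this sketch, with placeholder paths:)

Code: Select all

# sketch: shift backup8 -> backup9, ..., backup1 -> backup2, then copy in today's file
for i in 8 7 6 5 4 3 2 1; do
    [ -f /mnt/usb/backup$i.tar.gz ] && mv /mnt/usb/backup$i.tar.gz /mnt/usb/backup$((i+1)).tar.gz
done
cp /root/backup.tar.gz /mnt/usb/backup1.tar.gz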

Extraction / reconstitution is the other half of the process, alongside compression. The other day, when I first created a 7z file, I thought for sure it wasn't correct because it was so much smaller, so I uncompressed it. It was fine.

I found that extracting a 7z file seems to be much faster than the gz equivalent for some reason.

Normally I shouldn't need to worry about extraction unless something has gone (horribly) wrong. Every few months I do it though, just to see that it is working.

mostly_lurking
Posts: 328
Joined: Wed 25 Jun 2014, 20:31

#11 Post by mostly_lurking »

You could try xz compression instead of gz, which may result in a smaller size (similar to 7z size when I tried it), but it will also take longer to process. When compressing files with pupzip (xarchive), the compression type to use is determined by the file extension, so give the archive file a .tar.xz extension instead of .tar.gz when you create it.
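From the command line, the equivalent would be something along the lines of (assuming a tar built with xz support):

Code: Select all

# sketch: create an xz-compressed tarball (the -J switch selects xz)
tar -cJf backup.tar.xz /path/to/files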
Flash wrote:I'm just curious, have you ever reconstituted one of the files you've condensed?... If you have, were there any mistakes?
You were probably concerned about the file size and the fact that one tiny error could mean the loss of the whole backup, but that question is also interesting in a different context, especially when it comes to using Windows (or ported-from-Windows) programs and archive formats, like 7-zip. In particular, can they handle Linux-typical things like symbolic links and file ownerships? If you just want to back up some ordinary files, you'll probably be fine using whatever program you want, but lost symlinks or file metadata could cause problems if, for example, you wanted to zip up a program folder that contains them.

In cases where this is a concern, I'd recommend creating a small folder containing things like symlinks and files with changed permissions/ownerships to test the archive program and format you want to use - create an archive from that folder, then extract it again to check if everything has been preserved.
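For example, something like this (a sketch; the 'spot' user is assumed to exist, as it does on Puppy):

Code: Select all

# sketch: build a small test folder with a symlink, a non-root owner and unusual permissions,
# archive it, extract it elsewhere, then compare
mkdir -p /tmp/archtest /tmp/restored
echo "data" > /tmp/archtest/plain.txt
ln -s plain.txt /tmp/archtest/link.txt
chown spot:spot /tmp/archtest/plain.txt
chmod 666 /tmp/archtest/plain.txt
tar -czf /tmp/archtest.tar.gz -C /tmp archtest
tar -xzf /tmp/archtest.tar.gz -C /tmp/restored
ls -l /tmp/restored/archtest    # owner, permissions and the symlink should match the original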

For example, I tested my Wary Puppy's xarchive program with an archive created from a folder that contained a symlink, some files owned by user spot instead of root, and a file with unusual permissions (rw-rw-rw-). With the .tar.gz, and .tar.xz formats, all of these things were preserved, while .zip and .rar archives ended up with lost file ownerships, and a duplicate of the linked file instead of the symlink. With .7z (using p7zip through xarchive), the symlink was intact, but ownerships and permissions were gone. The Windows version of 7-zip, which I'm using as well because it has a decent user interface and can extract a large range of archive types, unsurprisingly can't handle any of these things.

musher0
Posts: 14629
Joined: Mon 05 Jan 2009, 00:54
Location: Gatineau (Qc), Canada

#12 Post by musher0 »

Hello, everyone.

About preserving symlinks: with zip, use the -y setting.
E.g., for a directory with some symlinked files in it:

Code: Select all

zip -9ry archive.zip directory/*
Explanation:
-9 -- maximum compression
-r -- recurse into the directory (i.e. archive the contents of the directory)
-y -- store the symlinks as symlinks.

Actually, with zip, if you do not provide the -y setting, it will archive the
file the link points to, so you will not lose anything, but the archive will be bigger.

In the case of tar.gz and tar.xz archives, I believe it is tar itself that stores
the symlinks as symlinks, so they are preserved automatically, but I am not sure.

I know 7z cannot do it, however.

IHTH
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)

mostly_lurking
Posts: 328
Joined: Wed 25 Jun 2014, 20:31

#13 Post by mostly_lurking »

musher0 wrote:About preserving symlinks: with zip, use the -y setting.
Good to know, thanks.

To see if I could use that setting with xarchive, I changed the options in its zip wrapper - /usr/local/lib/xarchive/wrappers/zip-wrap.sh, around line 60 - from this:

Code: Select all

# for zip, use recursive when adding files
NEW_OPTS="-r"
ADD_OPTS="-g -r"
REMOVE_OPTS="-d"
to this:

Code: Select all

# for zip, use recursive when adding files
NEW_OPTS="-r -y"
ADD_OPTS="-g -r -y"
REMOVE_OPTS="-d -y"
Of course, this would cause problems if one wanted to use such a .zip file on Windows, and I guess that's why preserving links is not the default option.
