Best & Fastest Compression Method
I use a script to back up about 10,000 files into a tar.gz file. The original files total around 9 gig, and the tar.gz file is 2 gig. I can then copy that to a USB stick or Google Drive for backup.
I've been doing this for a few years and it works perfectly. But then I decided to play around a bit.
I tested 7-zip the other day. Wow, MUCH better compression. Instead of 2 gig, the resultant file was 700 MB. BUT, the time taken to do the compression was just over 1 hour, vs 11 minutes with tar.gz. AND, 7zip uses both cores of my PC, whilst gz uses one. Using both cores speeds things up (apparently) but makes the PC virtually unusable whilst it is working.
The unicorn I am looking for is a compression tool that is as fast as gz and gives high compression like 7zip. Am I just looking for fairytales, or does it exist?
Hi p310don.
Maybe try lrzip? I get very good results with it.
The only inconvenience is that it compresses single files only. E.g., if you wish
to compress a directory, use < tar > or < zip -0 > (without the chevrons, of
course) on the dir first, and then apply lrzip to that tar or zip file.
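The two-step workflow described above might look like this (a sketch only; the directory name is a placeholder, and the lrzip call is guarded since lrzip may not be installed on your system):

```shell
# First pass: roll the directory into one single file with tar.
mkdir -p mydir && echo "some data" > mydir/file.txt
tar -cf mydir.tar mydir/

# Second pass: compress that single file with lrzip, if available.
if command -v lrzip >/dev/null 2>&1; then
    lrzip mydir.tar                  # produces mydir.tar.lrz
else
    echo "lrzip not installed; get it through your package manager (or PPM)"
fi
```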
You can probably get lrzip and install it through the PPM. If not, tell me
which Pup you are using, and I'll try to compile it for you.
IHTH.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)
Re: Best & Fastest Compression Method
p310don wrote: I tested 7-zip the other day. Wow, MUCH better compression. Instead of 2 gig, the resultant file was 700 MB. BUT, the time taken to do the compression is just over 1 hour, vs 11 minutes with tar.gz.

Indeed.

p310don wrote: AND, the 7zip uses both cores of my PC, whilst gz uses one. Using both cores speeds things up (apparently) but makes the PC virtually unusable whilst it is working.

There is a flag to set the max number of CPUs used. But as you can imagine, if 2 cores take 1 hour, 1 core will take even longer.

p310don wrote: The unicorn I am looking for is a compression tool that is as fast as gz and gives high compression like 7zip. Am I just looking for fairytales, or does it exist?

There is a lot of spectrum between
- fast but large file
- slow but small file
In other words, you can always sacrifice speed for size, or vice versa. (Actually there are more aspects than just these two parameters, but I am simplifying.)
But whoever finds "fast and small" will become an instant millionaire overnight. Literally. (Example: bandwidth costs money. YouTube pumps out gobs of gigabytes every second. Even a 10% saving means a 10% cut in YouTube's bandwidth bill, which they would happily pay you for, as long as video can still be delivered at the speed streaming requires.)
Fatdog64 forum links: [url=http://murga-linux.com/puppy/viewtopic.php?t=117546]Latest version[/url] | [url=https://cutt.ly/ke8sn5H]Contributed packages[/url] | [url=https://cutt.ly/se8scrb]ISO builder[/url]
Burn_IT wrote:
Quote from lrzip's readme file, courtesy of musher0:

You have a choice:
Time
Size
You cannot have both.

You can either choose to optimise for speed (fast compression / decompression) or size, but not both.

Seems that it is probably not worth the bother.

All that said, I am fairly certain I'll be investing in a newer PC fairly soon for work, so compression speeds will increase simply as a function of more horsepower.
Hi p310don,
If you're in a bind (e.g. no money for a faster box at the moment), you could use:

Code: Select all
nice --some parameters lrzip --some parameters /path/to/big-file-to-compress

Type < man nice > in a console to learn more about configuring < nice >.
With < nice >, you could make < lrzip > use less of your computer's
resources. The compression process will take longer (hello, paradox!),
but you will be able to keep doing other tasks on your box without being
slowed down. Ergo, an overall time saving for you as the user.
I'm talking about the lrzip compressor above, but one can apply this
< nice > technique to any resource-hungry executable.
Just a thought.
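As a concrete (hypothetical) illustration of the nice technique, here it is with gzip, since gzip is universally available; -n 19 is the lowest scheduling priority:

```shell
# Create a small file to compress.
printf 'some data to squeeze\n' > demo.txt

# Run gzip at the gentlest priority so the desktop stays responsive.
nice -n 19 gzip -9 demo.txt      # produces demo.txt.gz

ls -l demo.txt.gz                # confirm the compressed file exists
```

The same wrapper works with lrzip, 7z, or any other resource-hungry executable.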
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)
So I have had a bit of a play with Packit in Xenial64 7.5 with its various options.
The testing was rough, but gives me some ideas.
I started with a directory of 2788MB size, containing approximately 5000 individual files, mostly various types of text file.
Using Packit I used tar as the first pass and varied the second pass option and recorded the results.
7z compression took 12 minutes and resulted in a file of 299MB
gz compression took 90 seconds and resulted in a file of 564MB
zip compression took 1min 15 secs and resulted in a file of 566MB
compressing the gz file from earlier to 7z took 2 minutes and resulted in a 544MB file
7z with fast compression took 2 minutes and resulted in a file of 445MB
gz with best compression took 6 minutes and resulted in a file of 544MB
gz with fast compression took 45 seconds and resulted in a file of 651MB
I haven't really come to a conclusion that I like. Probably 7z with fast compression is the best option for me. 7z has the disadvantage of really working the PC, so that nothing else runs well. gz has the advantage of being very light on resources, but is not so good on file size.
With the purchase of a new PC for work, I might consider splitting the workload between two machines. There are 10 directories that are backed up; if I do 5 on one machine and 5 on the other, and then tar the two resultant compressed files into one file for backup, that might also save some time.
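The split idea can be sketched on a single machine too, by running the two compression jobs in parallel as background jobs (directory names below are placeholders, not the actual backup directories):

```shell
# Two groups of directories standing in for the real backup sets.
mkdir -p groupA groupB
echo one > groupA/a.txt
echo two > groupB/b.txt

# Compress both groups in parallel.
tar -czf groupA.tar.gz groupA &
tar -czf groupB.tar.gz groupB &
wait                                  # block until both background jobs finish

# Bundle the two compressed archives into one file for backup.
tar -cf nightly-backup.tar groupA.tar.gz groupB.tar.gz
```

On one box this still competes for the same cores, but across two machines each job gets its own CPU.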
P310don, I'm just curious, have you ever reconstituted one of the files you've condensed? (I think 'condensed' is a more accurate description than 'compressed', because you're removing redundant stuff, like condensing soup. Compressing would leave everything there and somehow cram it into a smaller space.) If you have, were there any mistakes?
Hi Flash.
I hope I am getting your meaning.
Is it necessary to reconstitute the file or directory to check it? Archivers
usually have a test setting to check the integrity.
E.g.:
Code: Select all
zip -T archive.zip
Code: Select all
lzop -t archive.lzp
Code: Select all
lz4 -t archive.lz4
Code: Select all
lrzip -t archive.lrz
For example:

Code: Select all
lrzip -t zdrv_xenial_7.0.6.sfs.lrz
Decompressing...
100% 30.99 / 30.99 MB
Average DeCompression Speed: 30.000MB/s
[OK] - 32493568 bytes
Total time: 00:00:00.34

If the compression has errors, the user does not get that "OK" mention.
The thing is, the user has to get into the habit of checking the archive
just after it's been created to avoid bad surprises later.
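For a gzip-based backup like the one described earlier in the thread, the equivalent check is gzip's own -t flag, which exits with status 0 when the archive is intact:

```shell
# Make a small tar.gz archive, then verify its integrity.
echo "backup payload" > payload.txt
tar -czf payload.tar.gz payload.txt
gzip -t payload.tar.gz && echo "archive OK"
```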
Best regards.
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)
Flash asked:
Have you ever reconstituted one of the files you've condensed?

Yes.
I have a paranoia about data. This is only one of my backup routines. Every day I copy the resultant tar.gz file to a USB flash drive. The copy routine renames yesterday's to backup1, and that one to backup2 etc up to backup9, just in case there is an issue with the compression or condensation on one of the days. Then I have ten days worth of data to choose from if one is bad.
The uncompression / reconstitution process is the second part of the process besides the compression part. The other day when I first created a 7z file I thought for sure it wasn't correct, because it was so much smaller, so I uncompressed it. It was fine.
I found that extracting a 7z file seems to be much faster than the gz equivalent for some reason.
Normally I shouldn't need to worry about extraction unless something has gone (horribly) wrong. Every few months I do it though, just to see that it is working.
You could try xz compression instead of gz, which may result in a smaller size (similar to 7z size when I tried it), but it will also take longer to process. When compressing files with pupzip (xarchive), the compression type to use is determined by the file extension, so give the archive file a .tar.xz extension instead of .tar.gz when you create it.
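Outside pupzip, the same switch from gz to xz can be made on the command line; with GNU tar, -J selects xz the way -z selects gzip (on other tars, pipe through xz instead). A small sketch:

```shell
# A stand-in directory for the real backup set.
mkdir -p docs && echo "report" > docs/report.txt

# Same tar invocation as for tar.gz, but -J compresses with xz.
tar -cJf backup.tar.xz docs/

# List the contents to confirm the archive reads back.
tar -tJf backup.tar.xz
```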
Flash wrote: I'm just curious, have you ever reconstituted one of the files you've condensed? ... If you have, were there any mistakes?

You were probably concerned about the file size and the fact that one tiny error could mean the loss of the whole backup, but that question is also interesting in a different context, especially when it comes to using Windows (or ported-from-Windows) programs and archive formats, like 7-zip. In particular, can they handle Linux-typical things like symbolic links and file ownerships? If you just want to back up some ordinary files, you'll probably be fine using whatever program you want, but lost symlinks or file metadata could cause problems if, for example, you wanted to zip up a program folder that contains them.

In cases where this is a concern, I'd recommend creating a small folder containing things like symlinks and files with changed permissions/ownerships to test the archive program and format you want to use: create an archive from that folder, then extract it again to check whether everything has been preserved.

For example, I tested my Wary Puppy's xarchive program with an archive created from a folder that contained a symlink, some files owned by user spot instead of root, and a file with unusual permissions (rw-rw-rw-). With the .tar.gz and .tar.xz formats, all of these things were preserved, while .zip and .rar archives ended up with lost file ownerships and a duplicate of the linked file instead of the symlink. With .7z (using p7zip through xarchive), the symlink was intact, but ownerships and permissions were gone. The Windows version of 7-zip, which I'm using as well because it has a decent user interface and can extract a large range of archive types, unsurprisingly can't handle any of these things.
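A minimal scripted version of that round-trip test, using a symlink and the tar.gz format (folder and file names here are made up for the example):

```shell
# Build a test folder containing a real file and a symlink to it.
mkdir -p meta-test restore
echo "target" > meta-test/real.txt
ln -s real.txt meta-test/link.txt

# Round-trip the folder through tar.gz.
tar -czf meta-test.tar.gz meta-test
tar -xzf meta-test.tar.gz -C restore

# -L is true only if the restored path is still a symlink.
test -L restore/meta-test/link.txt && echo "symlink preserved"
```

The same skeleton works for any other archiver: swap the tar lines for the tool under test and re-run the check.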
Hello, everyone.
About preserving symlinks: with zip, use the -y setting.
E.g. for a directory with some symlinked files in it:
zip -9ry archive.zip directory/*
Explanation:
-9 -- maximum compression
-r -- recurse in directory (i.e. archive contents of the directory)
-y -- archive the symlinks as symlinks.
Actually, with zip, if you do not provide the -y setting, it will archive the
file itself, so you will not lose anything, but the archive will be bigger.
In the case of the other compressors, I believe the Linux compressors gz
and xz archive the symlinks as symlinks automatically, but I am not sure.
I know 7z cannot do it, however.
IHTH
musher0
~~~~~~~~~~
"You want it darker? We kill the flame." (L. Cohen)
musher0 wrote: About preserving symlinks: with zip, use the -y setting.

Good to know, thanks.
To see if I could use that setting with xarchive, I changed the options in its zip wrapper - /usr/local/lib/xarchive/wrappers/zip-wrap.sh, around line 60 - from this:
Code: Select all
# for zip, use recursive when adding files
NEW_OPTS="-r"
ADD_OPTS="-g -r"
REMOVE_OPTS="-d"
to this:

Code: Select all
# for zip, use recursive when adding files
NEW_OPTS="-r -y"
ADD_OPTS="-g -r -y"
REMOVE_OPTS="-d -y"