Ever wanted to download a manual or a wiki from the web
- gposil
- Posts: 1300
- Joined: Mon 06 Apr 2009, 10:00
- Location: Stanthorpe (The Granite Belt), QLD, Australia
PMirrorget is a page grabber: point it at the path or index page you want, and it grabs the text links on that page and downloads them to your local machine. With thanks to Lobster for the original to work from.
- Attachments
- pmwget.jpg - Screen Shot (21.55 KiB, downloaded 1887 times)
- PMirrorget-0.1.pet - version 0.1 (1.41 KiB, downloaded 648 times)
- Lobster
- Official Crustacean
- Posts: 15522
- Joined: Wed 04 May 2005, 06:06
- Location: Paradox Realm
Very useful. Well done.
Many thanks for using pwget, which is in Puppy; maybe this could be combined with that, or should they be separate?
Hope this finds its way into Puppy.
Anyway, I used your program to download a website for local/offline viewing. Works as advertised.
Have enclosed the code (find it at /usr/bin/pmwget) to give people an indication of how simple GTK plus a command-line utility such as wget can be . . .
Code:
#! /bin/bash
# Pmwget created by gposil with thanks to Lobster for Pwget
# April 2009 GPL v3 License
# http://gposil.netne.net
export HELP_DIALOG='
<window title="PMirrorget - Help" resizable="false">
<vbox>
<text>
<label>PMirrorget allows you to download an entire web page and its text-linked pages to a folder on your PC. Copy and paste the URL you wish to download. Use the folder selector to choose the destination. It is designed primarily for grabbing manuals and wiki pages without sifting through them, so you can view them later.</label>
</text>
<button>
<label>Close</label>
<action type="closewindow">HELP_DIALOG</action>
</button>
</vbox>
</window>
'
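# Main window: URL entry, destination folder picker, and the help/cancel/grab buttons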
export Pmwget='
<window title="PMirrorget - Site Grabber Utility" resizable="false">
<vbox>
<hbox>
<text><label>Copy and Paste or type the URL of the required site into "URL". Choose your destination folder and then "Grab It Now!"</label></text>
</hbox>
<frame>
<hbox>
<text><label>URL: </label></text>
<entry accept="directory"><variable>SOURCE</variable><input>/tmp/pm_source_dir</input></entry>
</hbox>
<hbox>
<text><label>Folder:</label></text>
<entry accept="directory"><variable>DEST</variable><input>/tmp/pm_mirror_dir</input></entry>
<button>
<input file icon="gtk-open"></input>
<action type="fileselect">DEST</action>
<action>refresh:DEST</action>
</button>
</hbox>
</frame>
<hbox>
<frame>
<button help>
<action type="launch">HELP_DIALOG</action>
</button>
<button cancel></button>
</frame>
<button>
<input file>/usr/share/mini-icons/mini.checkmark.xpm</input>
<label>Grab It Now! </label>
<action type="exit">OK</action>
</button>
</hbox>
</vbox>
</window>'
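# Show the dialog; gtkdialog3 prints VARIABLE="value" lines, which are eval'd into this shell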
I=$IFS; IFS=""
for STATEMENTS in $(gtkdialog3 --program=Pmwget --center); do
eval $STATEMENTS
done
IFS=$I
if [ "$EXIT" = "OK" ]; then
# Mirror the site in a terminal window, then open the destination folder in ROX-Filer
rxvt -name PMirrorget -bg "#F3F2DF" -e wget -m -c -r -np -P "$DEST" "$SOURCE"
rox -d "$DEST"
fi
For what it is worth, here is the code I used to use when I was paying an arm and a leg for internet access.
dlist.txt contains the URLs that don't get downloaded, and rawbookmarks.html contains the list of full URLs of the sites to download.
I recall there are issues with line endings and separators in both these lists, so you have to experiment, i.e. with blank spaces, carriage returns and so on; there are also issues with the actual file type of the lists.
From memory, one of the options also excludes images, but you will have to check wget's help text to be sure.
I post this in the hope that it helps with the evolution of the website-downloading GUI and with incorporating possible upgrades for new functions and fields.
Code:
#!/bin/sh
rxvt -e wget -D dlist.txt -R js,css -E -H -k -p -i rawbookmarks.html
Code:
http://xyz.freelogs.com
http://www.google-analytics.com
http://ads.bloomberg.com
http://pagead2.googlesyndication.com
http://us.js2.yimg.com
http://visit.webhosting.yahoo.com
Oh yes, rawbookmarks.html doesn't contain any tags at all, just a straight list of URLs. I also recall issues with special characters in the URL strings.
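For what it is worth, here is a minimal sketch of how those two lists might be cleaned up before handing them to wget. It assumes GNU wget and coreutils and keeps the file names above; note that wget's -D/--domains option actually expects a comma-separated list of domains to follow (not a file name), and the exclusion counterpart is --exclude-domains, so the sketch strips the http:// prefixes and joins the exclusion list into one argument:
Code:
#!/bin/sh
# Sketch only: assumes GNU wget/coreutils and plain-text lists, one entry per line.

# Strip DOS carriage returns and blank lines so wget -i does not choke on them.
tr -d '\r' < rawbookmarks.html | grep -v '^$' > /tmp/bookmarks.txt

# --exclude-domains wants a comma-separated domain list, not a file,
# so drop the http:// prefixes and join the remaining lines with commas.
EXCLUDE=$(tr -d '\r' < dlist.txt | sed 's|^https\?://||' | grep -v '^$' | paste -sd, -)

rxvt -e wget --exclude-domains="$EXCLUDE" -R js,css -E -H -k -p -i /tmp/bookmarks.txt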
Is it possible to use pmwget to download a blog from Google Blogspot?
I can use their export facility, but that only gives me an XML file, which is of no use.
I want to be able to do an exact backup in HTML format, and to capture all the comments and photographs I had previously uploaded.
I have even added the -k parameter on the command line, but that did not work either. I think the problem is the associated username and password; I tried adding these to the command line too, without success.
My desired outcome is to copy the entire blog onto a CD or DVD, or maybe just to the HDD, so it can be viewed as if it were on the web.
Anyone got any ideas?
Thanks
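For what it is worth, wget can be given credentials. Here is a minimal sketch assuming plain HTTP authentication; the blog address, user and password are placeholders, and Blogger's Google sign-in may well still refuse this, in which case exporting the browser's cookies to a cookies.txt file and using --load-cookies is the usual workaround:
Code:
#!/bin/sh
# Sketch only: BLOGURL, USER and PASS are placeholders, not tested against Blogger.
BLOGURL="http://example.blogspot.com/"

# Plain HTTP authentication (only helps if the server actually uses it).
wget -m -np -k -E -p --user="USER" --password="PASS" "$BLOGURL"

# Alternative: reuse a logged-in browser session via an exported cookies.txt file.
# wget -m -np -k -E -p --load-cookies cookies.txt "$BLOGURL"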
Ok, I made the site public and pmwget worked a treat.
It did not download the entire site, as there were six pages; I had to download each page separately and then do a little manual HTML linking between pages.
It didn't download the bigger photos either, just the thumbnails, but if you click on a thumbnail when you are online it will display the larger photo. Not a bad compromise.
I didn't bother trying to make the comments work either, although they were downloaded; it looked like too much trouble. But all of the posts downloaded and displayed just as they did on the web.
Good outcome.
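In case anyone else hits the same thing, here is a minimal sketch of looping wget over the individual page URLs so all the pages land in one folder for the manual linking step; the blog address and archive page names are made-up placeholders:
Code:
#!/bin/sh
# Sketch only: the blog address and page URLs are placeholders.
DEST=/root/blog-backup
mkdir -p "$DEST"

for PAGE in \
    "http://example.blogspot.com/" \
    "http://example.blogspot.com/2009_03_01_archive.html" \
    "http://example.blogspot.com/2009_04_01_archive.html"
do
    # -p grabs page requisites (thumbnails etc.), -k rewrites links for offline viewing,
    # -E adds .html extensions, -np stays below the given URL, -P sets the destination.
    wget -p -k -E -np -P "$DEST" "$PAGE"
done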