| Author |
Message |
gposil

Joined: 06 Apr 2009 Posts: 1305 Location: Stanthorpe (The Granite Belt), QLD, Australia
|
Posted: Sun 26 Apr 2009, 06:26 Post subject:
Ever wanted to download a manual or a wiki from the web? Subject description: I have this, should help |
|
PMirrorget is a page grabber: you point it at the path or index page you want, and it follows the text links on that page and downloads the linked pages to your local machine. With thanks to Lobster for the original to work from.
| Description | Screen Shot |
| Filesize | 21.55 KB |
| Viewed | 1140 Time(s) |

| Description | version 0.1 |
| Filename | PMirrorget-0.1.pet |
| Filesize | 1.41 KB |
| Downloaded | 370 Time(s) |
_________________
Dpup Home
|
|
Lobster
Official Crustacean

Joined: 04 May 2005 Posts: 15109 Location: Paradox Realm
|
Posted: Sun 26 Apr 2009, 08:22 Post subject:
|
|
Very useful. Well done.
Many thanks for using pwget, which is in Puppy. Maybe this could be combined with that, or should they be separate?
Hope this finds its way into Puppy.
Anyway, I used your program to download a website for local/offline viewing. Works as advertised.
I have enclosed the code (find it at /usr/bin/pmwget) to give people an indication of how simple GTK plus a command-line utility such as wget can be . . .
| Code: | #! /bin/bash
# Pmwget created by gposil with thanks to Lobster for Pwget
# April 2009 GPL v3 License
# http://gposil.netne.net
export HELP_DIALOG='
<window title="PMirrorget - Help" resizable="false">
<vbox>
<text>
<label>PMirrorget allows you to download an entire web page and its text-linked pages to a folder on your PC. Copy and paste the URL you wish to download. Use the folder selector to choose the destination. It is designed primarily for grabbing manuals and wiki pages without sifting through them, so you can view them later.</label>
</text>
<button>
<label>Close</label>
<action type="closewindow">HELP_DIALOG</action>
</button>
</vbox>
</window>
'
export Pmwget='
<window title="PMirrorget - Site Grabber Utility" resizable="false">
<vbox>
<hbox>
<text><label>Copy and Paste or type the URL of the required site into "URL". Choose your destination folder and then "Grab It Now!"</label></text>
</hbox>
<frame>
<hbox>
<text><label>URL: </label></text>
<entry accept="directory"><variable>SOURCE</variable><input>/tmp/pm_source_dir</input></entry>
</hbox>
<hbox>
<text><label>Folder:</label></text>
<entry accept="directory"><variable>DEST</variable><input>/tmp/pm_mirror_dir</input></entry>
<button>
<input file icon="gtk-open"></input>
<action type="fileselect">DEST</action>
<action>refresh:DEST</action>
</button>
</hbox>
</frame>
<hbox>
<frame>
<button help>
<action type="launch">HELP_DIALOG</action>
</button>
<button cancel></button>
</frame>
<button>
<input file>/usr/share/mini-icons/mini.checkmark.xpm</input>
<label>Grab It Now! </label>
<action type="exit">OK</action>
</button>
</hbox>
</vbox>
</window>'
I=$IFS; IFS=""
for STATEMENTS in $(gtkdialog3 --program=Pmwget --center); do
eval $STATEMENTS
done
IFS=$I
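if [ "$EXIT" = "OK" ]; then
# Quote the variables so URLs or paths containing spaces or shell metacharacters are passed intact
rxvt -name PMirrorget -bg "#F3F2DF" -e wget -m -c -r -np -P "$DEST" "$SOURCE"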
rox -d "$DEST"
fi |
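For anyone wondering what the last few lines of the script ask wget to do, here is a rough sketch that only prints the command instead of running it. The helper name `show_mirror_cmd`, the example URL and the example folder are made up for illustration; the wget options are the ones used in the script above.

```shell
#!/bin/sh
# Sketch only: print the wget call that the script makes, with the options explained.
# show_mirror_cmd is a hypothetical helper name, not part of PMirrorget.
show_mirror_cmd() {
    url=$1
    dest=$2
    # -m  mirror mode (turns on recursion and timestamping)
    # -c  continue partially downloaded files
    # -r  recursive download (already implied by -m)
    # -np never ascend to the parent directory
    # -P  directory prefix, i.e. the folder the files are saved under
    printf 'wget -m -c -r -np -P "%s" "%s"\n' "$dest" "$url"
}

# Example with placeholder values
show_mirror_cmd "http://example.com/manual/" "/tmp/manual"
```

Because of -np, pointing the URL at a sub-folder of a site should keep the download inside that sub-folder, which is what makes this handy for manuals and wikis.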
_________________ Puppy WIKI
|
|
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Sun 26 Apr 2009, 09:20 Post subject:
|
|
This is just what I have been thinking of asking for. Having used webzip.exe for Windows quite a lot, I feel your GUI needs some extra fields to be competitive, e.g. link blocking for Google ads etc., required file types, follow-link depth and to where... See the WebZip application for more ideas.
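A possible way to wire such fields into the script, as a sketch only: the function `build_wget_opts` and its three parameters are hypothetical and would have to be fed from new entry boxes in the GUI. The wget options themselves (-l for depth, -A for accepted file suffixes, --exclude-domains for blocked hosts) are standard.

```shell
#!/bin/sh
# Sketch only: turn three extra GUI fields into wget options.
# build_wget_opts and its parameter names are hypothetical.
build_wget_opts() {
    depth=$1      # follow-link depth, e.g. 2
    types=$2      # comma-separated file suffixes to keep, e.g. html,pdf
    blocked=$3    # comma-separated domains to skip, e.g. ads.example.com
    opts="-r -np"
    # Only add an option when its field was filled in
    [ -n "$depth" ]   && opts="$opts -l $depth"
    [ -n "$types" ]   && opts="$opts -A $types"
    [ -n "$blocked" ] && opts="$opts --exclude-domains $blocked"
    echo "$opts"
}

# Example with placeholder values
build_wget_opts 2 "html,pdf" "ads.example.com"
```

Empty fields are simply left out, so the existing behaviour would stay the same when the user fills in nothing.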
_________________
ASUS EeePC Flare series 1025C 4x Intel Atom N2800 @ 1.86GHz RAM 2063MB 800x600p ATA 320G
_-¤-_
<º))))><.¸¸.•´¯`•.#.•´¯`•.¸¸. ><((((º>
|
|
hillside

Joined: 02 Sep 2007 Posts: 642 Location: Minnesota, USA. The frozen north.
|
Posted: Sun 26 Apr 2009, 14:23 Post subject:
|
|
This made it very easy to get a quick backup of everything on my website.
Backup. I have to remember to do that now and then.
|
|
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Fri 01 May 2009, 02:44 Post subject:
|
|
For what it is worth,
here is the code I used to use when I was paying an arm and a leg for internet access.
| Code: | #!/bin/sh
rxvt -e wget -D dlist.txt -R js,css -E -H -k -p -i rawbookmarks.html |
dlist.txt contains the URLs that don't get downloaded:
| Code: | http://xyz.freelogs.com
http://www.google-analytics.com
http://ads.bloomberg.com
http://pagead2.googlesyndication.com
http://us.js2.yimg.com
http://visit.webhosting.yahoo.com |
and rawbookmarks.html is the list of full URLs of the sites to download.
I recall there are issues with line endings and separators in both of these lists, so you have to experiment, i.e. with the use of blank spaces, carriage returns, etc. There are also issues with the actual file type of the list.
I think, from memory, that one of the options also removes images, but you will have to check the wget help text to be sure.
I post this in the hope that it helps with the evolution of the website-downloading GUI, and with incorporating possible upgrades for new functions and fields.
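One caveat worth checking against the wget manual: as far as I can tell, -D expects a comma-separated list of domains to follow rather than a file name, and the option for skipping domains is --exclude-domains. The sketch below assumes dlist.txt holds one URL per line and turns it into a comma-separated host list; `hosts_from_list` is a made-up helper name. It also strips carriage returns and blank lines, which may be the line-ending trouble mentioned above.

```shell
#!/bin/sh
# Sketch only: turn a file of URLs (one per line) into a comma-separated host list.
# hosts_from_list is a hypothetical helper name.
hosts_from_list() {
    # Drop carriage returns, strip the scheme and any path, skip blank lines, join with commas
    tr -d '\r' < "$1" \
        | sed -e 's|^[a-z]*://||' -e 's|/.*$||' -e '/^$/d' \
        | paste -sd, -
}

# Intended use (not run here):
#   wget --exclude-domains "$(hosts_from_list dlist.txt)" -R js,css -E -H -k -p -i rawbookmarks.html
```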
|
|
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Fri 01 May 2009, 03:18 Post subject:
|
|
Oh yes, rawbookmarks.html doesn't contain any tags at all, just a straight list of URLs. I also recall issues with special characters in the URL strings.
|
|
droope

Joined: 31 Jul 2008 Posts: 814 Location: Uruguay, Mercedes
|
Posted: Fri 01 May 2009, 11:23 Post subject:
|
|
Something that would be useful is the option of downloading only the links that sit inside a particular div.
Maybe it's too complex, but I think it would be really, really useful.
Cheers,
Droope
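A very naive way this might be approached, assuming the page has already been saved to a file: the helper `links_in_div` below is hypothetical, works line by line, and does not cope with nested divs, so treat it as a starting point rather than a proper HTML parser.

```shell
#!/bin/sh
# Sketch only: list the href targets that appear inside one <div id="..."> block.
# links_in_div is a hypothetical helper. It is line-based: it prints from the line
# holding the opening tag up to the first line holding </div>, so nested divs break it.
links_in_div() {
    file=$1
    id=$2
    awk -v id="$id" '
        index($0, "<div id=\"" id "\"") { inside = 1 }
        inside                          { print }
        inside && /<\/div>/             { exit }
    ' "$file" | grep -o 'href="[^"]*"' | sed -e 's/^href="//' -e 's/"$//'
}

# Intended use (not run here):
#   links_in_div page.html content > links.txt
#   wget -i links.txt
# Relative links would also need wget's --base option to be resolvable.
```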
|
|
smokey01

Joined: 30 Dec 2006 Posts: 1605 Location: South Australia
|
Posted: Fri 21 May 2010, 23:01 Post subject:
|
|
Is it possible to use pmwget to download a blog from Google Blogspot?
I can use their export facility, but that only gives me an XML file, which is of no use.
I want to be able to do an exact backup in HTML format, capturing all the comments and photographs I had previously uploaded.
I even added the -k parameter on the command line, but that did not work either. I think the problem is related to the username and password. I even tried adding these to the command line, without success.
My desired outcome is to copy the entire blog onto a CD or DVD, or maybe just to the HDD, so it can be viewed as if it were on the web.
Anyone got any ideas?
Thanks
_________________ Puppy Software <-> Distros <-> Puppy Linux Tips
|
|
smokey01

Joined: 30 Dec 2006 Posts: 1605 Location: South Australia
|
Posted: Fri 21 May 2010, 23:06 Post subject:
|
|
I wonder if it has something to do with the security I have set up.
Only people I give access to can see it.
Maybe I need to make it public to download it.
|
|
smokey01

Joined: 30 Dec 2006 Posts: 1605 Location: South Australia
|
Posted: Sat 22 May 2010, 05:21 Post subject:
|
|
Ok, I made the site public and pmwget worked a treat.
It did not download the entire site as there were 6 pages. I had to download each page separately then do a little manual html linking between pages.
It didn't download the bigger photos either, just the thumbnails but if you click on the thumbnail when you are online it will display the larger photo. Not a bad compromise.
I didn't bother trying to make the comments work either although they were downloaded. It looked like too much trouble but all of the posts downloaded and displayed just like they were on the web.
Good outcome.
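For anyone repeating the page-by-page approach described above, a loop might save some typing. This is only a sketch: `grab_pages` is a made-up helper, the page URLs have to be supplied by hand, and setting WGET to "echo wget" gives a dry run that just prints the commands.

```shell
#!/bin/sh
# Sketch only: fetch several pages one after another, each with its images and
# with links converted for offline viewing. grab_pages is a hypothetical helper.
WGET=${WGET:-wget}

grab_pages() {
    dest=$1
    shift
    for url in "$@"; do
        # -p  also fetch page requisites (images, CSS)
        # -k  convert links so the saved pages work offline
        # -E  add an .html suffix to pages that lack one
        # -P  destination folder
        $WGET -p -k -E -P "$dest" "$url"
    done
}

# Dry run with placeholder URLs (prints the commands instead of downloading):
#   WGET="echo wget" grab_pages /tmp/blog http://example.com/page1 http://example.com/page2
```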
|
|