Ever wanted to download a manual or a wiki from the web?


#1 Post by gposil »

PMirrorget is a page grabber: you point it at the path or index page you want, and it grabs the text links on that page and downloads them to your local machine. Thanks to Lobster for the original to work from.
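
For anyone wondering what it does under the hood, it is essentially a front end for wget's mirroring mode. A rough command-line equivalent (the URL and destination folder below are placeholders only) would be:

Code: Select all

# mirror a page and the pages it links to into a local folder
# example values only - substitute your own URL and destination
wget -m -c -np -P /root/manuals http://example.com/docs/index.html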

Attachments
pmwget.jpg
Screen Shot
(21.55 KiB) Downloaded 1887 times
PMirrorget-0.1.pet
version 0.1
(1.41 KiB) Downloaded 648 times


#2 Post by Lobster »

Very useful :) Well done.
Many thanks for building on Pwget, which is already in Puppy. Maybe the two could be combined, or should they stay separate?
I hope this finds its way into Puppy.

Anyway, I used your program to download a website for local/offline viewing. Works as advertised ;)

I have enclosed the code (find it at /usr/bin/pmwget) to give people an indication of how simple GTK plus a command-line utility such as wget can be . . .

Code: Select all

#! /bin/bash

# Pmwget created by gposil with thanks to Lobster for Pwget
# April 2009 GPL v3 License
# http://gposil.netne.net

export HELP_DIALOG='
<window title="PMirrorget - Help" resizable="false">
  <vbox>
    <text>
      <label>PMirrorget allows you to download an entire web page and its text-linked pages to a folder on your PC. Copy and paste the URL you wish to download, and use the folder selector to choose the destination. It is designed primarily for grabbing manuals and wiki pages so you can view them later without sifting through them online.</label>
    </text>
    <button>
      <label>Close</label>
      <action type="closewindow">HELP_DIALOG</action>
    </button>
  </vbox>
  </window>
'

export Pmwget='
<window title="PMirrorget - Site Grabber Utility" resizable="false">
<vbox>
 <hbox>
  <text><label>Copy and Paste or type the URL of the required site into "URL". Choose your destination folder and then "Grab It Now!"</label></text>
 </hbox>
 <frame>
 <hbox>
  <text><label>URL:    </label></text>
  <entry accept="directory"><variable>SOURCE</variable><input>/tmp/pm_source_dir</input></entry>
 </hbox>
 <hbox>
  <text><label>Folder:</label></text>
  <entry accept="directory"><variable>DEST</variable><input>/tmp/pm_mirror_dir</input></entry>
  <button>
   <input file icon="gtk-open"></input>
   <action type="fileselect">DEST</action>
   <action>refresh:DEST</action>
  </button>
 </hbox>
 </frame>
 <hbox>
 <frame>
  <button help>
<action type="launch">HELP_DIALOG</action> 
  </button>
  <button cancel></button>
  </frame>
  <button>
  <input file>/usr/share/mini-icons/mini.checkmark.xpm</input>
       <label>Grab It Now! </label>
       <action type="exit">OK</action>
  </button>

 </hbox>
</vbox>
</window>'

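# Show the dialog; gtkdialog3 prints one VARIABLE="value" line per widget
# when it exits, and the eval loop below imports SOURCE, DEST and EXIT.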
I=$IFS; IFS=""
for STATEMENTS in  $(gtkdialog3 --program=Pmwget --center); do
   eval $STATEMENTS
done
IFS=$I
if [ "$EXIT" = "OK" ]; then
  rxvt -name PMirrorget -bg "#F3F2DF" -e wget -m -c -r -np -P "$DEST" "$SOURCE"
  rox -d "$DEST"
fi 


#3 Post by aarf »

This is just what I have been thinking of asking for. Having used webzip.exe for Windows quite a lot, I feel your GUI needs some extra fields to be competitive, e.g. a link block for Google ads etc., required file types, follow-link depth and where to... See the WebZip application for more ideas.
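
Most of those would map fairly directly onto existing wget switches, so the GUI would mainly need extra entry boxes feeding them. A rough sketch of the idea (the option values below are examples only, not a tested recipe):

Code: Select all

# each suggested GUI field feeds one wget switch:
#   ad/tracker block    -> --exclude-domains
#   required file types -> -A (accept list)
#   follow-link depth   -> -l (recursion level)
wget -r -np -l 2 -A html,htm,css,png,jpg \
     --exclude-domains google-analytics.com,googlesyndication.com \
     -P /root/mirror http://example.com/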


#4 Post by hillside »

This made it very easy to get a quick backup of everything on my website.

Backup. I have to remember to do that now and then.


#5 Post by aarf »

For what it is worth, here is the code I used back when I was paying an arm and a leg for internet access.

Code: Select all

#!/bin/sh
rxvt -e wget -D dlist.txt -R js,css -E -H -k -p -i rawbookmarks.html
dlist.txt contains the URLs that don't get downloaded:

Code: Select all

 http://xyz.freelogs.com
http://www.google-analytics.com
http://ads.bloomberg.com
http://pagead2.googlesyndication.com
http://us.js2.yimg.com
http://visit.webhosting.yahoo.com
and rawbookmarks.html is the list of the full URLs of the sites to download.
I recall there are issues with line endings and separators in both these lists, so you have to experiment, i.e. with blank spaces versus carriage returns and so on, and also with the actual file type of the list.
I think, from memory, that one of the options removes images as well, but you will have to check the wget help text to be sure.
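
If the line-ending trouble comes from lists saved on Windows (CR/LF endings), one way around it is to strip the carriage returns before handing the list to wget. A small sketch, assuming the file really is just one URL per line:

Code: Select all

# remove DOS carriage returns so wget -i sees one clean URL per line
tr -d '\r' < rawbookmarks.html > rawbookmarks.txt
wget -i rawbookmarks.txt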

I post this in the hope that it helps with the evolution of the website-downloading GUI, and with incorporating possible upgrades for new functions and fields.


#6 Post by aarf »

Oh yes, rawbookmarks.html doesn't contain any

Code: Select all

<html>
tags at all, just a straight list of URLs. I also recall issues with special characters in the URL strings.


#7 Post by droope »

Something that would be useful would be the possibility of choosing to download only the links that sit inside a particular div.

Maybe it's too complex, but I think it'd be really, really useful.

Cheers,
Droope
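
wget alone cannot scope its recursion to one div, but as a rough sketch of the idea (the div id, URL and file names below are made up, and it assumes the hrefs are absolute URLs), the links inside a single div could be pulled out first and then handed to wget:

Code: Select all

# example only: fetch the page, keep the chunk inside one known div,
# pull out its hrefs, then download just those links
wget -q -O page.html http://example.com/index.html
sed -n '/<div id="content">/,/<\/div>/p' page.html \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//;s/"$//' > links.txt
wget -i links.txt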


#8 Post by smokey01 »

Is it possible to use pmwget to download a blog from Google Blogspot?

I can use their export facility, but that only gives me an XML file, which is of no use.

I want to be able to do an exact backup in html format. I want to be able to capture all the comments and photographs I had previously uploaded.

I have even added the -k parameter on the command line, but that did not work either. I think the problem is the associated username and password; I even tried adding these to the command line, without success.

My desired outcome is to copy the entire blog onto a CD or DVD or maybe just to the HDD so it can be viewed as if it was on the web.

Anyone got any ideas?

Thanks
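
If the problem really is the login, wget cannot usually log in to a Google account by itself, but it can reuse cookies exported from the browser. A sketch only (the paths and blog address are placeholders, and whether it works depends on how Blogger handles the session):

Code: Select all

# send the browser's saved login cookies (cookies.txt format) with each request
wget -m -c -np -k --load-cookies /root/cookies.txt \
     -P /root/blogbackup http://yourblog.blogspot.com/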


#9 Post by smokey01 »

I wonder if it has something to do with the security I have set up.

Only people I give access to can see it.

Maybe I need to make it public to download it.


#10 Post by smokey01 »

Ok, I made the site public and pmwget worked a treat.

It did not download the entire site, as there were six pages; I had to download each page separately and then do a little manual HTML linking between them.

It didn't download the bigger photos either, just the thumbnails, but if you click a thumbnail while online it will display the larger photo. Not a bad compromise.

I didn't bother trying to make the comments work, although they were downloaded; it looked like too much trouble. All of the posts downloaded and displayed just as they did on the web.

Good outcome.
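
For what it's worth, the manual re-linking and the missing full-size photos can sometimes be avoided by letting wget rewrite links and fetch page requisites from the image hosts as well. A sketch only (untested against Blogger, and the domain list is a guess you would need to adjust):

Code: Select all

# -k rewrites links for offline viewing, -p grabs the images/CSS each page needs,
# -E saves pages with .html extensions, -H/-D let wget follow the listed image hosts
wget -m -k -p -E -np -H -D blogspot.com,blogger.com \
     -P /root/blogbackup http://yourblog.blogspot.com/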
