| Author |
Message |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Sat 11 Jul 2009, 03:16 Post subject:
how to download a complete website Subject description: with wget |
|
Copy and paste this into a console (use Shift+Insert to paste). The leading # is the root prompt, not part of the command; leave it out when pasting.
Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.farmfountain.com -P mnt/home/home/user/ --no-parent www.farmfountain.com
explanations
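As a rough guide to what each option in the command above does, here is a small sketch that rebuilds the same command one flag at a time. The helper name mirror_cmd is made up for illustration (it is not part of wget); it only prints the command so it can be checked before running it. The comments summarise the wget manual; note that newer wget releases call --html-extension by the name --adjust-extension.

```shell
#!/bin/sh
# Sketch only: mirror_cmd is a hypothetical helper, not part of wget.
# It prints the wget command line for a host and a target directory.
mirror_cmd() {
    host=$1   # site to copy, e.g. www.farmfountain.com
    dest=$2   # directory to save into
    set -- wget
    set -- "$@" -c                             # continue partially downloaded files
    set -- "$@" --recursive                    # follow links found in fetched pages
    set -- "$@" --no-clobber                   # skip files that already exist locally
    set -- "$@" --page-requisites              # also fetch images, CSS etc. needed to show a page
    set -- "$@" --html-extension               # save pages with an .html suffix
    set -- "$@" --convert-links                # rewrite links so they work offline
    set -- "$@" --restrict-file-names=windows  # escape characters Windows cannot store in file names
    set -- "$@" --domains "$host"              # only follow links into this domain
    set -- "$@" -P "$dest"                     # directory prefix for everything saved
    set -- "$@" --no-parent                    # never climb above the starting directory
    set -- "$@" "$host"                        # the start URL
    echo "$*"
}

# Print the command from the post above so it can be reviewed.
mirror_cmd www.farmfountain.com mnt/home/home/user/
```

Because the helper only echoes, nothing is downloaded until the printed line is pasted into a console.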
_________________
ASUS EeePC Flare series 1025C 4x Intel Atom N2800 @ 1.86GHz RAM 2063MB 800x600p ATA 320G
_-¤-_
<º))))><.¸¸.•´¯`•.#.•´¯`•.¸¸. ><((((º>
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sat 25 Jul 2009, 17:42 Post subject:
|
|
Thanks!
Interesting website by the way.
_________________ Don't poop it down... Pup it Up ! http://pupitup.org/
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sat 25 Jul 2009, 17:53 Post subject:
|
|
Too bad it only downloads the index page here:
Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent www.pupitup.phpbb3now.com
--23:45:23-- http://www.pupitup.phpbb3now.com/
=> `mnt/home/home/user/www.pupitup.phpbb3now.com/index.html'
Resolving www.pupitup.phpbb3now.com... 174.37.114.54
Connecting to www.pupitup.phpbb3now.com|174.37.114.54|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]
File `mnt/home/home/user/pupitup.phpbb3now.com/index.html' already there; not retrieving.
FINISHED --23:45:23--
Downloaded: 0 bytes in 0 files
Converting mnt/home/home/user/pupitup.phpbb3now.com/index.html... 0-2
Converted 1 files in 0.002 seconds.
|
What could I do to force a complete copy of this forum?
http://pupitup.phpbb3now.com/
 |
paulh177

Joined: 22 Aug 2006 Posts: 870 Location: ST862228
|
Posted: Sat 25 Jul 2009, 18:07 Post subject:
|
|
wget -m might do it
_________________ Want to report a bug or problem? Have a read of this first ...
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sun 26 Jul 2009, 05:38 Post subject:
|
|
Didn't work either.
Perhaps the problem is that this host is bug-ridden or partially locked (it can't register any new members); that could also affect wget.
Or they deliberately configured the server not to respond to wget (by enforcing a minimum delay between page views?) to keep their "clients" captive to their lousy service.
I've downloaded another forum with the same parameters, and it worked fine.
 |
gposil

Joined: 06 Apr 2009 Posts: 1305 Location: Stanthorpe (The Granite Belt), QLD, Australia
|
Posted: Sun 26 Jul 2009, 05:46 Post subject:
|
|
Try PMirrorget... it's in the new Puppy but not on ibiblio yet... let me know
http://www.gposil.com/pets/PMirrorget-0.1.pet
Tested on your site and it works perfectly... for me
_________________
Dpup Home
 |
BarryK
Puppy Master

Joined: 09 May 2005 Posts: 6866 Location: Perth, Western Australia
|
Posted: Sun 26 Jul 2009, 06:19 Post subject:
|
|
Yes, Gposil's nice little Pmirrorget will be in 4.3beta (or pre-beta) but it is delayed a couple of days. Expect it mid-week.
See announcement re Pmirrorget:
http://puppylinux.com/blog/?viewDetailed=00915
_________________ http://bkhome.org/blog2/
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sun 26 Jul 2009, 08:23 Post subject:
|
|
It's working!
Thank you gposil, I'll just have to FTP it when it has finished downloading.
Great software, because HTTrack wasn't working on this site either.
You saved my butt.
And yes, Barry, it is a must-have standard application for the new Puppy: small size, great job.
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sun 26 Jul 2009, 08:36 Post subject:
|
|
OOOPPPSSS !!!
It downloaded the forum OK, but it doesn't work when I try to view it offline. I guess that's normal, because the page links haven't been rewritten for offline use, and I presume it should work all right once uploaded to the forum via FTP.
I'll do that AFTER walking Arobas (my dog), and let you know.
 |
tlchost
Joined: 05 Aug 2007 Posts: 1487 Location: Baltimore, Maryland USA
|
Posted: Sun 26 Jul 2009, 09:05 Post subject:
|
|
| gposil wrote: | | Try PMirrorget...its in new Puppy but not on iblio yet |
Does it respect the robots.txt file and thus not look in directories protected by robots.txt?
Will it download files in /cgi-bin and other files such as graphics and css files that are called by the website, but stored on the server other than the site directory itself?
Thanks
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Sun 26 Jul 2009, 10:13 Post subject:
|
|
Without taking anything away from PMirrorget, which I haven't tried and didn't know of previously:
| eztuxer wrote: | Too bad it only downloads the index page here:
Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains www.pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent www.pupitup.phpbb3now.com
--23:45:23-- http://www.pupitup.phpbb3now.com/
=> `mnt/home/home/user/www.pupitup.phpbb3now.com/index.html'
Resolving www.pupitup.phpbb3now.com... 174.37.114.54
Connecting to www.pupitup.phpbb3now.com|174.37.114.54|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://pupitup.phpbb3now.com/ [following]
File `mnt/home/home/user/pupitup.phpbb3now.com/index.html' already there; not retrieving.
FINISHED --23:45:23--
Downloaded: 0 bytes in 0 files
Converting mnt/home/home/user/pupitup.phpbb3now.com/index.html... 0-2
Converted 1 files in 0.002 seconds.
|
What could I do to force total copy of this forum ?
http://pupitup.phpbb3now.com/ |
www.pupitup.phpbb3now.com is possibly the problem: the log above shows it answering with a 301 redirect to pupitup.phpbb3now.com, so pupitup.phpbb3now.com may work.
Code:
# wget -c --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains .pupitup.phpbb3now.com -P mnt/home/home/user/ --no-parent pupitup.phpbb3now.com
I started it and it seemed to go OK, then I stopped it. It seems to be downloading a lot of files with .html extensions (perhaps remove --html-extension), which might make them difficult to upload unless you can use the batch file renamer in rox. With the .html extension the pages will link locally; with .php extensions they will need a locally installed php-enabled server to display properly, plus you will also need the original php code for the pages, which you will NOT get with non-privileged access to any website.
Overall, though, your best bet would be to back up the database from within admin and then upload that backup file to the database at your new hosting site. If you are not privileged to that resource, perhaps ask the site owner for the backup file; otherwise you are still aiming at a lot of work.
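One possible reading of the quoted log, offered as a guess rather than a diagnosis: the server redirects the www host to the bare host, the bare host is not covered by --domains www.pupitup.phpbb3now.com, and the "already there; not retrieving" line comes from --no-clobber finding an index.html left by an earlier run. A minimal sketch of working around the first part is to strip the leading "www." before building the command. The helper name bare_host is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: drop a leading "www." so the same name can be
# used for both --domains and the start URL.
bare_host() {
    echo "${1#www.}"
}

# To see where the server redirects without saving anything, the
# response headers can be printed (needs network, so left commented out):
#   wget --spider -S http://www.pupitup.phpbb3now.com/ 2>&1 | grep -i 'Location:'

host=$(bare_host www.pupitup.phpbb3now.com)
echo "$host"   # prints: pupitup.phpbb3now.com
```

The resulting name can then be passed to both --domains and the start URL of the wget command.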
Last edited by aarf on Sun 26 Jul 2009, 11:40; edited 2 times in total
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Sun 26 Jul 2009, 10:40 Post subject:
|
|
I have edited the wget code in my last post to be as it should.
As far as I know, mirroring a site that has a database is NOT possible in any reasonably useful way, shape or form through ordinary public web access.
The only way to mirror a forum successfully is to get the 'backup.sql' from the database on the server, then use that to create the tables in your new site. It may or may not also be possible to relocate a forum from within phpbb2 admin, depending on your privileges.
Using any form of over-the-web copy/mirror of a php forum is a recipe for vast amounts of wasted effort. However, that is your choice.
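For anyone who does have database access, the backup and restore described above might look roughly like this. This is a sketch under assumptions: it supposes the forum runs on MySQL with the standard mysqldump and mysql clients, and the user names, database names and file name are placeholders. The helpers dump_cmd and restore_cmd are hypothetical and only print the commands.

```shell
#!/bin/sh
# Sketch only: names below are placeholders, and MySQL is an assumption.
# Both helpers print a command instead of running it, so nothing touches
# a real database until the printed line is reviewed and run by hand.

# $1 = database user, $2 = database name, $3 = backup file to write
dump_cmd() {
    echo "mysqldump -u $1 -p $2 > $3"
}

# $1 = database user, $2 = database name, $3 = backup file to load
restore_cmd() {
    echo "mysql -u $1 -p $2 < $3"
}

dump_cmd olduser oldforum backup.sql      # run on the old server
restore_cmd newuser newforum backup.sql   # run on the new server
```

The -p option makes both clients prompt for the password rather than taking it on the command line.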
 |
aarf
Joined: 30 Aug 2007 Posts: 3620 Location: around the bend
|
Posted: Sun 26 Jul 2009, 11:59 Post subject:
|
|
| tlchost wrote: | | gposil wrote: | | Try PMirrorget...its in new Puppy but not on iblio yet |
Does it respect the robot.txt file and thus not look in directories protected by robots.txt ?
Will it download files in /cgi-bin and other files such as graphics and css files that are called by the website, but stored on the server other than the site directory itself?
Thanks |
I haven't had a look at PMirrorget, but I very much doubt that you will get the source code out of the cgi-bin. However, an html image of the cgi-bin output is possible, as are graphics files.
css files are downloaded by wget, so I can't see that they would be a problem for PMirrorget.
robots.txt is also downloaded first, but how or if it is used in any way I don't know.
wget cannot access htaccess-locked directories without having the username/password. Whether robots.txt is able to give that sort of denial protection I don't know and haven't tested.
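On the robots.txt question, the wget manual says recursive retrieval honours robots.txt by default and that -e robots=off switches this off. robots.txt is only a request to well-behaved crawlers, so it does not lock a directory the way htaccess authentication does. To see which paths a site asks crawlers to avoid, a small sketch like the following can list them. The helper name list_disallowed is made up, and it deliberately ignores which User-agent group each rule belongs to.

```shell
#!/bin/sh
# Hypothetical helper: read a robots.txt on stdin and print the path of
# every Disallow rule. Simplification: User-agent groups are ignored.
list_disallowed() {
    sed -n 's/^[Dd]isallow:[[:space:]]*\(.*\)$/\1/p'
}

# Offline demonstration with a made-up robots.txt:
printf 'User-agent: *\nDisallow: /cgi-bin/\nDisallow: /adm/\n' | list_disallowed

# Against a live site (needs network, so left commented out):
#   wget -q -O - http://pupitup.phpbb3now.com/robots.txt | list_disallowed
```

Paths listed this way are the ones a default recursive wget run would skip.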
 |
eztuxer

Joined: 06 Nov 2008 Posts: 461 Location: Belgium
|
Posted: Sun 26 Jul 2009, 13:40 Post subject:
|
|
After a 4-mile walk and over an hour of Taï Chi Chuan, I feel refreshed and I'm back in the starting blocks.
As I feared, this is no simple challenge and it goes way beyond my knowledge of Linux and Puppy.
I've managed a few forums as admin, but this is my first time with a domain-based phpBB3.
Here's the URL with me logged in as admin:
http://pupitup.phpbb3now.com/adm/index.php?sid=ad157ef79760fa499eecfe13da175d27&i=bots&mode=bots
I don't know if this might help.
I do not have access to the database, and the folks running phpBB3now are not reachable:
http://forum.phpbb3now.com/
Can't register either:
http://forum.phpbb3now.com/ucp.php?mode=register&sid=e945b6e37cab250141aa2c85bb3e45ee
One thing I could do, maybe, is add a new bot to let the robot roam through freely.
If it were possible to convert the links (if necessary) so that it could work at http://pupitup.org/forum/phpBB3/, that would be a breeze.
If not, I'll just copy/paste all forum sections manually, and possibly forget about the existing posts.
 |
gposil

Joined: 06 Apr 2009 Posts: 1305 Location: Stanthorpe (The Granite Belt), QLD, Australia
|
Posted: Sun 26 Jul 2009, 19:39 Post subject:
|
|
PMirrorget will download all normal site files (html, css, txt and graphics) pointed to by site files; it will not download sql database material...
 |
|