Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 

 Forum index » Off-Topic Area » Programming
Use shell functions to fetch information online
Page 1 of 1 [3 Posts]  
Flash
Official Dog Handler


Joined: 04 May 2005
Posts: 11120
Location: Arizona USA

Posted: Tue 03 Jan 2012, 23:05    Post subject: Use shell functions to fetch information online

I don't know if this will be useful or even belongs in the Programming section; I just saw it and thought it might be.
Quote:
By Marco Fioretti
January 2, 2012, 9:00 AM PST

Takeaway: Marco Fioretti shows two examples of shell functions that you can use for web scraping when all you need is a quick way to extract text from a given website.
technosaurus


Joined: 18 May 2008
Posts: 4353

Posted: Wed 04 Jan 2012, 17:31

If anyone wants more examples, I have written quite a few for web scraping. L18L is using my Google Translate code for localizing shell scripts, jpeps has started using my Yahoo Finance example, and Barry incorporated my Google search grokking into Puppy's alternative man command after die.net changed their formatting. There are a lot more, but that is all I can remember.

Here is the basic process:
Use the site's forms to get the appropriate results (keep a note of what does what).
Save and open the HTML of the page and look for <form> .... </form>.
(You will need to prepend the website and any subdirectories to the "action" URL.)
Grok the hell out of that till you get it down to a minimum.

[stop here if you just want to use it in a web page]

Each one of the name="name1" value="value1" pairs translates to a corresponding &name1=value1 in the query string.

You can simulate the form being submitted by opening a browser to:

<URLofpage><action>?name1=value1&name2=value2....

[stop here if you just want to use it to get a page]
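The name/value-to-query-string translation above can be sketched in shell. Everything in this snippet is invented for illustration: the site, the "action" path, and the form fields are stand-ins for whatever you grokked out of the real page.

```shell
#!/bin/sh
# Hypothetical grokked-down form; every name, value, and URL here is made up.
base='http://example.com'
action='/cgi-bin/search'
form='<input name="q" value="puppy">
<input name="lang" value="en">'

# Each name="..." value="..." pair becomes name=value, joined with &.
query=$(printf '%s\n' "$form" |
    sed -n 's/.*name="\([^"]*\)".*value="\([^"]*\)".*/\1=\2/p' |
    tr '\n' '&')
query=${query%&}          # drop the trailing &

url="$base$action?$query"
echo "$url"               # http://example.com/cgi-bin/search?q=puppy&lang=en
```

Note that real values containing spaces or punctuation need URL-encoding (space as %20 or +) before you paste the result into a browser or hand it to wget.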

If that works, try it with wget (you may need to add -U firefox to wget to defeat anti-crawler blocks).

If you send wget's output to stdout (wget -O -), you can pipe it through sed, grep, cut, etc. to format it however you like.

[see other tutorials for various types of formatting]
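As a sketch of that last step: here a canned page stands in for the wget output so the parsing runs offline; the markup and the "result" class name are invented. The real fetch would be something like `wget -q -U firefox -O - "$url"` in place of the printf.

```shell
#!/bin/sh
# Canned HTML standing in for what `wget -q -U firefox -O - "$url"` would print.
page='<html><body>
<h2 class="result">Puppy Linux 5.2.8</h2>
<h2 class="result">Puppy Slacko</h2>
</body></html>'

# Pipe the "downloaded" page through sed, keeping only the result titles.
titles=$(printf '%s\n' "$page" |
    sed -n 's/.*<h2 class="result">\([^<]*\)<\/h2>.*/\1/p')
echo "$titles"
```

The same pipeline works with grep or cut instead of sed, depending on how regular the page's markup is.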

_________________
Web Programming - Pet Packaging 100 & 101
sunburnt


Joined: 08 Jun 2005
Posts: 5037
Location: Arizona, U.S.A.

Posted: Wed 04 Jan 2012, 23:13

It's what I used for the gtkdialog GUIs for a Debian downloader.
Download the page with wget and parse it for the needed text.

But trying to get it to resolve dependencies proved to be a real struggle.
Powered by phpBB © 2001, 2005 phpBB Group