Puppy Linux Discussion Forum
Puppy HOME page : puppylinux.com
"THE" alternative forum : puppylinux.info
 
Use shell functions to fetch information online
Flash
Official Dog Handler


Joined: 04 May 2005
Posts: 11121
Location: Arizona USA

Posted: Tue 03 Jan 2012, 23:05    Subject: Use shell functions to fetch information online

I don't know if this will be useful or even belongs in the Programming section. I just saw it and thought it looked like it might.
Quote:
By Marco Fioretti
January 2, 2012, 9:00 AM PST

Takeaway: Marco Fioretti shows two examples of shell functions that you can use for web scraping when all you need is a quick way to extract text from a given website.
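To make the idea concrete, here is a minimal sketch of the kind of shell function the article describes: fetch a page and extract one piece of text from it. The function name `scrape_title` is made up for illustration; it assumes GNU wget is available and that the page's title fits on one line.

```shell
# Sketch: fetch a page and print its <title>.
# -q silences wget, -O - sends the page to stdout.
scrape_title() {
    wget -q -U firefox -O - "$1" |
    sed -n 's/.*<title>\([^<]*\)<\/title>.*/\1/p' |
    head -n 1
}
```

For example, `scrape_title http://puppylinux.com` would print that page's title.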
technosaurus


Joined: 18 May 2008
Posts: 4353

Posted: Wed 04 Jan 2012, 17:31

If anyone wants more examples, I have written quite a few web-scraping scripts: L18L is using my Google Translate code for localizing shell scripts, jpeps has started using my Yahoo Finance example, and Barry incorporated my Google search grokking into Puppy's alternative man command after die.net changed their formatting. There are a lot more, but those are all I can remember.

here is the basic process:
use the form to get the results you want (keep a note of what does what)
save the HTML of the page, open it, and look for <form> .... </form>
(you will need to prepend the site URL and any subdirectories to the "action")
grok the hell out of that till you get it down to a minimum

[stop here if you just want to use it in a web page]

each name="name1" value="value1" pair in the form translates to a corresponding &name1=value1 in the query string

you can simulate the form being submitted by opening a browser to:

<URLofpage><action>?name1=value1&name2=value2....

[stop here if you just want to use it to get a page]
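That translation from pairs to a URL can be sketched as a small shell function. `build_query_url` is a hypothetical name; it joins a base URL (the `<URLofpage><action>` part above) with a list of `name=value` arguments, and does not URL-encode the values.

```shell
# Sketch: build <URLofpage><action>?name1=value1&name2=value2...
# from a base URL and a list of name=value pairs.
build_query_url() {
    local url="$1"; shift
    local sep='?' pair
    for pair in "$@"; do
        url="$url$sep$pair"   # first pair gets '?', the rest get '&'
        sep='&'
    done
    printf '%s\n' "$url"
}
```

For example, `build_query_url "http://example.com/search" "q=puppy" "num=10"` prints `http://example.com/search?q=puppy&num=10`.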

if that works, try it with wget (you may need to add -U firefox to defeat anticrawler blocks)

if you output wget to stdout, you can pipe it through sed, grep, cut, etc... to format it however you like

[see other tutorials for various types of formatting]
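As a sketch of those last two steps, the function below fetches a page to stdout with `wget -O -` (passing `-U firefox` as suggested above) and pipes it through sed to keep only the href targets. `fetch_links` is a made-up name for illustration.

```shell
# Sketch: fetch a page and print one href target per matching line.
fetch_links() {
    wget -q -U firefox -O - "$1" |
    sed -n 's/.*href="\([^"]*\)".*/\1/p'
}
```

Swap the sed expression for grep, cut, or whatever formatting you need, as the post says.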

_________________
Web Programming - Pet Packaging 100 & 101
sunburnt


Joined: 08 Jun 2005
Posts: 5037
Location: Arizona, U.S.A.

Posted: Wed 04 Jan 2012, 23:13

It's what I used for the gtkDialog GUIs for a Debian downloader.
Download the page with wget and parse it for the needed text.

But trying to get it to resolve dependencies proved to be a real struggle.
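The download-and-parse step sunburnt describes can be sketched like this: pull a Debian `Packages` index with wget and pick out the `Filename:` field for one package with awk. The index URL and function name are illustrative, not sunburnt's actual code, and dependency resolution is (as noted) a much harder problem than this.

```shell
# Sketch: print the Filename: field for one package from a Packages index.
package_filename() {
    wget -q -O - "$1" |
    awk -v pkg="$2" '
        $1 == "Package:"         { hit = ($2 == pkg) }  # track current stanza
        hit && $1 == "Filename:" { print $2; exit }'
}
```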