How to do web scraping

Quote from mgabriel01:

I've done some work scraping specific data for people in the bond markets.
One useful (though quirky) product is djuggler:
www.djuggler.com

After much tinkering with this and other products, I finally decided it's easier (for me) in the long run to simply use the C# APIs and write my own code (fewer things to learn and relearn).

Hey, Thanks for the link. Looks promising, even though you say it is buggy. Will check it out.
 
Quote from mgabriel01:

ahhhh.... just wait until you encounter the vast number of different ways web sites are implemented
:)

I don't know much (read: anything) about this. LOL.

I have heard of HTML, PHP, and CSS, but never got a chance to build a website.
 
Quote from gmst:

Hey, Thanks for the link. Looks promising, even though you say it is buggy. Will check it out.



Not so much buggy as quirky, i.e. it takes a fair amount of effort to learn their interface.
Once learned, it handles a pretty good variety of things you will find on individual web sites.
 
My goal is to copy and paste specific information from some websites into an Excel sheet every 1 to 5 minutes. Thanks.
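For what it's worth, the "grab it every few minutes and drop it in a sheet" part can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: it writes a CSV file (which Excel opens) rather than a real workbook, the file name `quotes.csv` is made up, and `fetch_values` is a hypothetical placeholder for whatever code actually downloads the page and pulls out the fields.

```python
import csv
import time
from datetime import datetime, timezone


def append_row(csv_path, values):
    """Append one timestamped row to a CSV file that Excel can open."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerow([stamp, *values])


def poll(fetch, csv_path, interval_seconds, iterations, sleep=time.sleep):
    """Call fetch() `iterations` times, pausing between consecutive calls."""
    for i in range(iterations):
        append_row(csv_path, fetch())
        if i < iterations - 1:
            sleep(interval_seconds)


def fetch_values():
    # Hypothetical placeholder: replace with code that downloads a page
    # and extracts the fields you care about.
    return ["EXAMPLE", "123.45"]


if __name__ == "__main__":
    # Three samples, five minutes apart, appended to quotes.csv.
    poll(fetch_values, "quotes.csv", interval_seconds=300, iterations=3)
```

Passing `sleep` in as a parameter is just a convenience so the loop can be tried out without actually waiting; on a server, cron could replace the loop entirely.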

I'm using C# with Html Agility Pack (http://htmlagilitypack.codeplex.com/) in my desktop software for this kind of stuff.

On my server I've got some PHP and Perl scripts (plus cron).

Some of my PHP scripts are here. They are old; I have since rewritten most of them in OO style using the DOM rather than parsing with regex, which is an ugly way to do it.
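To illustrate the point about using a parser instead of regex, here is a small Python sketch. It is not the PHP code linked in this post; it uses the standard library's `html.parser`, and the `CellCollector` class, the `extract_rows` helper, and the sample table are all made up for the example.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Return the table cells in `html` as a list of rows of strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample markup; real pages will be messier.
SAMPLE = """
<table>
  <tr><td class="sym">ABC</td><td><b>10.5</b></td></tr>
  <tr><td>XYZ</td><td>7</td></tr>
</table>
"""

if __name__ == "__main__":
    print(extract_rows(SAMPLE))
```

Attributes and nested tags inside a cell (the `class` and the `<b>` in the sample) do not need special handling here, which is where a regex usually starts to get ugly.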

http://213.227.70.223/public/php_code/YahooQuotes/quotes_update.phps
http://213.227.70.223/public/php_code/
 
Are you scraping news data for statistical significance? If you hit a site from the same IP a bunch of times you will get blocked. Most sites have a protocol for doing business with them, like RSS or some XML DTD.
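If a site does offer RSS, reading it is usually simpler than scraping the HTML. A minimal Python sketch, assuming a plain RSS 2.0 feed; the sample feed and the `parse_items` helper are invented for illustration, and a real feed would be downloaded first (with a sensible pause between requests).

```python
import xml.etree.ElementTree as ET

# Invented sample feed in RSS 2.0 layout.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First headline</title><link>http://example.com/1</link></item>
    <item><title>Second headline</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""


def parse_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in parse_items(SAMPLE_RSS):
        print(title, link)
```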
 
Quote from cdcaveman:

Are you scraping news data for statistical significance? If you hit a site from the same IP a bunch of times you will get blocked. Most sites have a protocol for doing business with them, like RSS or some XML DTD.

Sorry, didn't see your message before. Thanks for the tip!

I am not going to scrape news feeds at the moment. My aim currently is to scrape some data on stocks from Yahoo/Google Finance, Finviz, and other interesting sites and see if I can make any sense of it.
 
Quote from gmst:
I am not going to scrape news feeds at the moment. My aim currently is to scrape some data on stocks from Yahoo/Google Finance, Finviz, and other interesting sites and see if I can make any sense of it.

You might take a look at the Coursera course from Georgia Tech, "Computational Investing Part I". The first few weeks cover getting data from Yahoo/Google. The last part of the course used Python scripts to build a portfolio. There is also a lot of information on the class discussion forum.

https://www.coursera.org/course/compinvesting1

Good luck
 