Latent Semantic Analysis (LSA). We use it to filter the news. Which news sources you use is not important; what is important is to have lots of text. We collect the news with about 250 different web crawlers (all free).
http://lsa.colorado.edu/papers/dp1.LSAintro.pdf
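
To make the idea concrete, here is a minimal sketch of LSA-based filtering, assuming NumPy is available. It is not our production pipeline: the function names (`build_term_doc_matrix`, `lsa_fit`, `fold_in`, `filter_articles`), the four toy headlines, the whitespace tokenizer, the raw word counts (no term weighting), the rank `k=2`, and the `0.5` similarity threshold are all assumptions made for illustration. It builds a term-by-document matrix, takes a truncated SVD, projects a topic query into the latent space, and keeps the articles whose cosine similarity to the query is high enough.

```python
import numpy as np

def build_term_doc_matrix(docs):
    """Return (vocabulary, term-by-document matrix of raw word counts)."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return vocab, A

def lsa_fit(A, k):
    """Truncated SVD: keep only the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in(text, vocab, U_k, s_k):
    """Project new text into the latent space: q^T U_k S_k^{-1}."""
    index = {w: i for i, w in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for w in text.lower().split():
        if w in index:  # words unseen in the corpus are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def filter_articles(articles, topic_query, k=2, threshold=0.5):
    """Return (indices of articles kept, similarity of every article)."""
    vocab, A = build_term_doc_matrix(articles)
    U_k, s_k, Vt_k = lsa_fit(A, k)
    q = fold_in(topic_query, vocab, U_k, s_k)
    # Column j of Vt_k is article j expressed in the same latent space.
    sims = [cosine(q, Vt_k[:, j]) for j in range(len(articles))]
    kept = [j for j, sim in enumerate(sims) if sim >= threshold]
    return kept, sims

# Toy headlines, invented for this example.
articles = [
    "stocks fall as market fears grow",
    "market rally lifts bank stocks",
    "team wins cup final match",
    "coach praises team after match",
]
kept, sims = filter_articles(articles, "bank rally")
for j in kept:
    print(articles[j])
```

Note that the first headline contains neither "bank" nor "rally", yet it should still pass the filter, because it shares a latent dimension with the second one through "stocks" and "market". That is the point of LSA over plain keyword matching, and also the reason it needs lots of text: the latent dimensions are only as good as the co-occurrence statistics behind them.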