Phase 1: Determine what problem you are going to solve
by Nick
As a follow-up to yesterday’s post, I’ve decided to try to blog my steps as I go. This way, when I’m groaning about it a year later, I can see exactly where I went wrong... :)

Anyway, the first step in problem solving is to identify the problem. My problem: I want to be able to determine, in an analytical way, whether a company (i.e. its stock) is doing well or not.

My first proposal: analyze news about the company to determine whether the news is good or bad. Bad news is usually an indicator of a problem; good news is an indicator of a company doing well (from a stock point of view). Using a simple Naive Bayes classifier, I think it should be fairly straightforward to analyze the news for each company and determine if the news is “positive”, “negative”, or “neutral”. After building up a corpus of news articles to train the system, it should be “good enough” to run on its own, and even to re-train itself with newer news articles.

I’ve managed to bang out the code to retrieve the news pages (viva la python!), so the next step is to port a simple Perl-based classifier (found here in a great article that requires registration) to Python, and then let ‘er rip.

Note: Normally I discourage people from trying to reinvent the wheel. After all, if someone else has already done the hard work, why not follow in their footsteps? In this case, though, the concept behind the code is pretty important to what I want to accomplish, so I feel it is important to really understand it. As a bonus, the original code is pretty straightforward, so it should be hard for me to screw it up. Those two factors led me to port the algorithm instead of using the original code.
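To make the Naive Bayes idea above a bit more concrete, here is a rough sketch of what such a classifier could look like in Python. This is not the Perl code from the article, nor a port of it; it is a minimal word-count (multinomial) version with add-one smoothing, which is my own assumption about a reasonable starting point. The class name, method names, and the toy headlines are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Toy multinomial Naive Bayes over word counts (illustrative sketch only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training docs
        self.vocabulary = set()                  # every word seen in training

    @staticmethod
    def tokenize(text):
        # Lowercase the text and keep runs of letters/apostrophes as words.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing was trained."""
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Work in log space so many small probabilities don't underflow.
            score = math.log(n_docs / total_docs)  # prior: P(label)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                # Add-one smoothing (an assumption here) keeps unseen words
                # from zeroing out the whole score.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training headlines, invented purely for this example.
clf = NaiveBayesClassifier()
clf.train("Profits surge as sales beat expectations", "positive")
clf.train("Record revenue and strong growth this quarter", "positive")
clf.train("Shares plunge after massive losses and layoffs", "negative")
clf.train("Company faces lawsuit over accounting fraud", "negative")
clf.train("Annual shareholder meeting scheduled for May", "neutral")

print(clf.classify("Strong sales growth"))
print(clf.classify("Massive losses and a lawsuit"))
```

With only five training headlines this is obviously a toy; a real corpus would need many labeled articles per category before the output means anything.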