I just find the subset of lines by locating start and end marker strings, then write a custom parser for that section only. I might be wrong, but for my needs a full-blown HTML parser would be much slower, and I'm hitting the same file structure every time (once for each stock).
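Roughly, the approach looks like the sketch below. This is a minimal Python illustration, not the actual code: the marker strings, the sample page, and the names `extract_section`, `parse_quote_rows`, and `ROW_RE` are all made up for the example, and it assumes the rows inside the section always have the same two-cell shape.

```python
import re


def extract_section(page, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = page.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = page.find(end_marker, start)
    if end == -1:
        return None
    return page[start:end]


# Hypothetical row layout: <tr><td>label</td><td>value</td></tr>
ROW_RE = re.compile(r"<tr><td>([^<]+)</td><td>([^<]+)</td></tr>")


def parse_quote_rows(section):
    """Turn the two-cell rows of the extracted section into a dict."""
    return {label.strip(): value.strip() for label, value in ROW_RE.findall(section)}


# Made-up page; the comment markers stand in for whatever fixed strings
# bracket the interesting part of the real file.
SAMPLE_PAGE = """<html><body>
<div id="nav"><table><tr><td>Home</td><td>News</td></tr></table></div>
<!-- quote-start -->
<table>
<tr><td>Open</td><td>101.50</td></tr>
<tr><td>Close</td><td>103.25</td></tr>
</table>
<!-- quote-end -->
</body></html>"""

section = extract_section(SAMPLE_PAGE, "<!-- quote-start -->", "<!-- quote-end -->")
quote = parse_quote_rows(section) if section is not None else {}
print(quote)  # {'Open': '101.50', 'Close': '103.25'}
```

The navigation table is never looked at because it sits outside the markers, which is where the speed comes from. The trade-off is that this breaks as soon as the site changes its layout, so it is worth failing loudly (here, returning `None`) when a marker goes missing.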