FFXU Analyst.:
Thanks for the mention of Beautiful Soup. I'd looked for scraping libraries a long time ago, didn't find anything better than what I could write, and I've been using my own stuff.
The name "Beautiful Soup" led to a targeted Google search, which turned up this:
http://htmlagilitypack.codeplex.com/
I'll be checking it out tonight.
BTW, with my homebrew stuff, I have a complete capture of a LOT of posts, targeted by thread and by user. I'm using a custom "Entity" system for data, but what I have is in C#. I'm storing it in a SQL Server Compact database, but the lack of features in the Compact edition means I'll probably be moving to full SQL Server.
Anyway, given recent developments, I've gotten more interested in getting back to this project. If you're willing to compare notes, let me know. I'm targeting things including edits, user interactions, posting time patterns, etc.
Someone posted about an analysis tool elsewhere... it looked interesting, but I'm sure that developing a "dataset" for it is an entirely different matter.