Categorizing Hacker News Posts

26 July 2024 #projects #web #go

tl;dr: check it out at hn.caleb.software.

Front Page News

I like Hacker News. For all of its warts (looking at you, crypto bros), it's a great place for finding interesting tech articles and blogs. Of course, I don't feel like reading everything that gets posted, especially when threads devolve into political drama. The site guidelines do say that "if they'd cover it on TV news, it's probably off-topic," but human nature is hard to change, and these stories still reach the front page from time to time.

Would it really be that difficult to just consider “is Caleb going to find this story interesting?” before submitting something? Why can’t everyone just post the things that I want??

Domain Categorization

Joking aside, I've noticed that most of the stories I don't care about tend to come from the same hundred or so websites. The other day I had an idea: what if I built a simple frontend for HN that let me sort each post (and its domain) into a few categories? Then, when I'm just in the mood for some indie tech blog articles, I could filter the front page to show only those stories.
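Since the categories attach to domains rather than to individual posts, every story first has to be reduced to a domain key. Below is a minimal sketch of that step, assuming a key is simply the lowercase hostname with any leading "www." stripped. The helper `domainOf` and the sample `categories` table are hypothetical illustrations, not code from the project.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// domainOf (hypothetical helper) returns the lowercase hostname of a story
// URL with a leading "www." removed, so "https://www.Example.com/post" and
// "http://example.com/other" share one key. Text posts such as Ask HN have
// no URL and yield "".
func domainOf(rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return ""
	}
	host := strings.ToLower(u.Hostname()) // Hostname() drops any :port
	return strings.TrimPrefix(host, "www.")
}

func main() {
	// Hypothetical domain -> category table, like the one a user builds up
	// by sorting posts in the UI.
	categories := map[string]string{
		"example.com": "Technology",
	}
	for _, link := range []string{
		"https://www.Example.com/post",
		"https://news.example.org/story?id=1",
		"",
	} {
		d := domainOf(link)
		cat, ok := categories[d]
		if !ok {
			cat = "(uncategorized)"
		}
		fmt.Printf("%q -> %q -> %s\n", link, d, cat)
	}
}
```

This sketch treats every hostname as its own domain, so "blog.example.com" and "example.com" would need separate entries; grouping subdomains is left out to keep it short.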

… and then I can use these domain classifications to train a neural network on keywords found in the article titles, maybe create a latent Dirichlet allocation, and then use that to preemptively predict what category a new post should… hold on, let’s just start with that first idea.

So I made a tiny REST API in Go that basically just acts as a proxy for the HN API[1], and a simple webpage that displays all of the stories alongside a little select box for categorizing each one. It stores the sorted domain list in the browser's local storage, so anyone can use it without me needing to deal with user accounts or a database. And yes, I spent exactly seven minutes creating the CSS styles for the page. Isn't it beautiful?
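Here is a rough sketch of what such a proxy could look like, using the public HN API endpoints `/v0/topstories.json` (a list of story IDs) and `/v0/item/{id}.json` (one item). The route `/api/top`, port 8080, the `Story` field subset, and the `fetchFunc` indirection (used so the logic can be exercised without a network) are all assumptions for illustration, not the project's actual code. CORS is enforced by browsers, so this server-side fetch is unaffected by it.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// hnBase is the root of the public Hacker News API.
const hnBase = "https://hacker-news.firebaseio.com/v0"

// Story holds the subset of HN item fields this sketch passes along.
type Story struct {
	ID    int    `json:"id"`
	Title string `json:"title"`
	URL   string `json:"url"`
	Score int    `json:"score"`
}

// fetchFunc (hypothetical) decodes the JSON found at url into v.
type fetchFunc func(url string, v any) error

// httpFetch returns a fetchFunc that performs real HTTP GETs.
func httpFetch(client *http.Client) fetchFunc {
	return func(url string, v any) error {
		resp, err := client.Get(url)
		if err != nil {
			return err
		}
		defer resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			return fmt.Errorf("GET %s: %s", url, resp.Status)
		}
		return json.NewDecoder(resp.Body).Decode(v)
	}
}

// topStories fetches the top story IDs, then the first n items one by one.
func topStories(fetch fetchFunc, base string, n int) ([]Story, error) {
	var ids []int
	if err := fetch(base+"/topstories.json", &ids); err != nil {
		return nil, err
	}
	if len(ids) > n {
		ids = ids[:n]
	}
	stories := make([]Story, 0, len(ids))
	for _, id := range ids {
		var s Story
		if err := fetch(fmt.Sprintf("%s/item/%d.json", base, id), &s); err != nil {
			return nil, err
		}
		stories = append(stories, s)
	}
	return stories, nil
}

// topHandler serves the first 30 top stories (one HN front page) as JSON.
func topHandler(fetch fetchFunc, base string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		stories, err := topStories(fetch, base, 30)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(stories)
	}
}

func main() {
	http.Handle("/api/top", topHandler(httpFetch(http.DefaultClient), hnBase))
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Fetching items sequentially keeps the sketch simple; a production proxy would likely fetch them concurrently and cache the results for a short while.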

Right now there's just a handful of default categories: Technology, Miscellaneous, Business, Science, Projects & Companies, and Garbage. There's also a settings page where you can import and export your domain lists to share them, and change which categories are available.
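In the project, the domain list lives in the browser's local storage, so filtering and import/export presumably happen in the webpage itself; the Go below only illustrates the logic. It assumes a hypothetical export format holding the category list plus a domain-to-category map. The `Settings` layout and the `filterByCategory` and `importSettings` helpers are illustrative names, not the site's real format or code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Settings is a hypothetical export format: the available categories and
// the domain -> category assignments.
type Settings struct {
	Categories []string          `json:"categories"`
	Domains    map[string]string `json:"domains"`
}

// defaultCategories are the defaults listed in the post.
var defaultCategories = []string{
	"Technology", "Miscellaneous", "Business", "Science",
	"Projects & Companies", "Garbage",
}

// Post is a minimal story record with its domain already extracted.
type Post struct {
	Title  string
	Domain string
}

// filterByCategory keeps posts whose domain is assigned to category. An
// unassigned domain looks up as "", so passing "" returns unsorted posts.
func filterByCategory(posts []Post, s Settings, category string) []Post {
	var out []Post
	for _, p := range posts {
		if s.Domains[p.Domain] == category {
			out = append(out, p)
		}
	}
	return out
}

// importSettings parses an exported blob and rejects assignments that
// point at a category the blob does not define.
func importSettings(data []byte) (Settings, error) {
	var s Settings
	if err := json.Unmarshal(data, &s); err != nil {
		return Settings{}, err
	}
	known := make(map[string]bool, len(s.Categories))
	for _, c := range s.Categories {
		known[c] = true
	}
	for d, c := range s.Domains {
		if !known[c] {
			return Settings{}, fmt.Errorf("domain %q uses unknown category %q", d, c)
		}
	}
	return s, nil
}

func main() {
	s := Settings{
		Categories: defaultCategories,
		Domains:    map[string]string{"example.com": "Technology", "example.org": "Garbage"},
	}
	blob, _ := json.Marshal(s) // export
	back, err := importSettings(blob) // re-import
	if err != nil {
		panic(err)
	}
	posts := []Post{{"A", "example.com"}, {"B", "example.org"}, {"C", "unsorted.net"}}
	fmt.Println(filterByCategory(posts, back, "Technology"))
}
```

Validating on import means a shared list cannot silently file domains under a category the recipient has never seen.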

I'll share more details on the technical implementation, and some things I learned, pretty soon. In the meantime, you can check out the source yourself on GitHub.

  1. I really wanted to make this project frontend-only for ease of hosting, but sadly the CORS settings on the HN API made this impossible.