The main thing I learned from my Pocket export is that 99% of the articles were "unread". Not sure it would make sense to extrapolate anything about myself from this other than obsessive link hoarding. :D
I've noticed a lot of people converging on this idea of using AI to analyze your own data, the same way companies analyze it to serve you super-targeted content.
Recently, I was inspired to do this on my entire browsing history after reading https://labs.rs/en/browsing-histories/. I also did the same with my ChatGPT/Claude conversation history. The most terrifying thing I did was having an LLM look at my Reddit comment history.
The challenges are primarily having a large enough context window and tracking context across various data sources. One approach I'm exploring is using a knowledge graph to keep track of a user's profile. You're able to compress behavioral patterns into queryable structures, though the graph construction itself becomes a computational challenge. Recently, most of the AI startups I've worked with have boiled down to "give an LLM access to a vector DB and a knowledge graph constructed from a bunch of text documents". The text docs could be invoices, legal docs, tax docs, daily reports, meeting transcripts, or code.
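A minimal sketch of that pattern, assuming an OpenAI-style API for the extraction step (the model name and prompt are placeholders, and real outputs need more robust JSON handling than this):

    import json
    import networkx as nx
    from openai import OpenAI  # official OpenAI Python SDK

    client = OpenAI()

    def extract_triples(text):
        # Ask the model for (subject, relation, object) facts as JSON.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model
            messages=[{"role": "user", "content":
                'Extract facts about the author as a JSON list of '
                '["subject", "relation", "object"] triples:\n\n' + text}],
        )
        return json.loads(resp.choices[0].message.content)

    def build_profile_graph(documents):
        # Compress behavioral patterns into a queryable structure.
        g = nx.MultiDiGraph()
        for doc in documents:
            for subj, rel, obj in extract_triples(doc):
                g.add_edge(subj, obj, relation=rel)
        return g

    def facts_about(g, node):
        # Everything the graph asserts about one entity, e.g. "user".
        return [(u, d["relation"], v) for u, v, d in g.edges(node, data=True)]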
I'm hoping we see an AI personal content recommendation or profiling system pop up. The economic incentives would be inverted from big tech's model: instead of optimizing for engagement and ad revenue, these systems would be optimized for user utility. During the RSS reader era, I was exposed to a lot of curated tech and design content, and it helped me really develop taste and knowledge in those areas. It also helped me connect with cool, interesting people.
There's an app I like https://www.dimensional.me/ but the MBTI and personality testing approach could be more rigorous. Instead of personality testing, imagine if you could feed a system everything you consume, write, and do on digital devices, and construct a knowledge graph about yourself, constantly updating.
I built a similar tool that profiles/roasts your HN account: https://hn-wrapped.kadoa.com/
It’s funny and occasionally scary
Edit: be aware, usernames are case sensitive
I’ve been really interested in stuff like this recently. Not just Pocket saves but also meta analysis of ChatGPT/Gemini/Claude chat history.
I’ve been using an ultra-personalized RSS summary script and what I’ve discovered is that the RSS feeds that have the most items that are actually relevant to me are very different from what I actually read casually.
What I’m going to try next is to develop a generative “world model” of things that fit in my interests/relevance. And I can update/research different parts of that world model at different timescales. So “news” to me is actually a change diff of that world model from the news. And it would allow me to always have a local/offline version of my current world model, which should be useful for using local models for filtering/sorting things like my inbox/calendar/messages/tweets/etc!
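A rough sketch of the direction, using Ollama's OpenAI-compatible endpoint so it can stay local (the file name, model, and prompt are all placeholder choices):

    from pathlib import Path
    from openai import OpenAI

    # Local model via Ollama's OpenAI-compatible API.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
    WORLD_MODEL = Path("world_model.md")

    def news_as_diff(headlines):
        # "News" = only what changes or contradicts my current world model.
        prompt = (
            "My current world model:\n\n" + WORLD_MODEL.read_text() +
            "\n\nToday's items:\n" + "\n".join(headlines) +
            "\n\nReturn ONLY additions, changes, or contradictions, "
            "as a diff-style bullet list. Ignore anything already covered."
        )
        resp = client.chat.completions.create(
            model="llama3.1",  # any local model works here
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content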
A while back I made a little script (for fun/curiosity) that would do this for HN profiles. It’d use a user’s submission and comment history to infer a profile, including similar stuff like location, political leaning, career, age, sex, etc. The main motivation was seeing some surprising takes in various comment threads and being curious about where they might have come from. Obviously I have no idea how accurate the profiles were, but it was similarly an interesting experiment in the ability of LLMs to do this sort of thing.
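For anyone who wants to replicate it, the data-gathering half is easy via the public Algolia HN API; a sketch (pagination kept minimal, profiling prompt left to taste):

    import requests

    def fetch_comments(username, pages=3):
        # Pull a user's recent comments from the public Algolia HN API.
        texts = []
        for page in range(pages):
            r = requests.get(
                "https://hn.algolia.com/api/v1/search_by_date",
                params={"tags": f"comment,author_{username}",
                        "hitsPerPage": 100, "page": page},
                timeout=30,
            )
            r.raise_for_status()
            texts += [h.get("comment_text") or "" for h in r.json()["hits"]]
        return texts

    # The rest is one LLM call: paste the comments in with a prompt like
    # "infer location, political leaning, career, age, sex".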
As someone with a family background of more left-leaning Catholics (which I think are more common in the US northeast), it's interesting that it decided you are conservative based on Catholicism.
Another thing one could do with a flat list of hundreds of saved links (if it's being used for "read it later", let's be honest: a dumping ground) is to have AI/NLP classify them all, to make it easy to then delete the stuff you're no longer interested in.
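A sketch of that classification pass, assuming an OpenAI-style API (the category set is arbitrary, and batching keeps each call small):

    from openai import OpenAI

    client = OpenAI()

    def classify_links(urls, batch_size=100):
        # One topic label per URL, so stale topics can be deleted in bulk.
        labels = []
        for i in range(0, len(urls), batch_size):
            batch = urls[i:i + batch_size]
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{"role": "user", "content":
                    "Label each URL with one topic from {programming, news, "
                    "recipes, shopping, other}, one label per line:\n"
                    + "\n".join(batch)}],
            )
            labels += resp.choices[0].message.content.splitlines()
        return list(zip(urls, labels))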
Interesting article. Bizarrely, it makes me wish I’d used Pocket more! Tangentially, with LLMs I’m getting very tired of the standard patter one sees in their responses. You’ll recognize the general format of chatty output:
Platitude! Here’s a bunch of words that a normal human being would say followed by the main thrust of the response that two plus two is four. Here are some more words that plausibly sound human!
I realize that this is of course how it all actually works underneath (LLMs have to waffle their way to the point because of the nature of their training), but is there any hope of being able to post-process out the fluff? I want to distill down to an actual answer inside the inference engine itself, without having to use more language-corpus machinery to do so (one crude mitigation is sketched after the recipe example below).
It’s like the age old problem of internet recipes. You want this:
500g wheat flour
280ml water
10g salt
10g yeast
But what you get is this:
It was at the age of five, sitting on my grandmother’s lap in the cool autumn sun in West Virginia that I first tasted the perfect loaf…
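For what it's worth, the bluntest mitigation I've found is a strict system prompt plus a hard token cap; no guarantee, but it trims most of the padding. A sketch with the OpenAI Python SDK (model and prompt wording are just my guesses at what helps):

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",      # placeholder model
        max_tokens=120,           # hard cap: little room to waffle
        messages=[
            {"role": "system", "content":
             "Answer with bare facts only. No preamble, no recap, no "
             "pleasantries. Recipe-style: quantities and steps."},
            {"role": "user", "content": "Basic bread dough ratio?"},
        ],
    )
    print(resp.choices[0].message.content)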
Something I've been working on: https://getoffpocket.com
I hope it can help you
I've been thinking about the possibilities of using an LLM to sort through all my tabs; I'm one of those dreadful hoarders who has been living with the ":D" tab count on my phone for too long. Usually I purge them periodically, but I haven't had the motivation to do so in a long time. I just need an easy way to dump them to a CSV or something, like OP has from Pocket.
I did something similar when Pocket's shutdown was announced: https://github.com/ArturSkowronski/moltres-pocket-analyzer
I wanted a tool that cleans the data, tags it, provides an easy way to analyze it with notebooks, and helps with migration.
I had a lot of "feels" getting through this :)
> but up until recently it felt like only Google or Facebook had access to analysis capabilities strong enough to draw meaningful conclusions from disparate data points
Every advertiser can access data like this easily; when you click "yeah sure" on every cookie banner, this is the sort of data you're handing over... you could buy it too.
Every time someone says "they're listening to your conversations", we need to point out that with a surprisingly small amount of metadata across a large number of people, they can make inferred behavioral predictions that are good enough that they don't need to listen (listening is still much more expensive).
On a macro level people are very predictable, and we should be more reluctant about freely giving away the data that makes this so... because it's mostly being used against us.
Actually, I am underwhelmed. A decade ago, with WAY simpler machine learning algorithms (no fancy deep learning, just shallow singular value decomposition and logistic regression, https://www.pnas.org/doi/10.1073/pnas.1218772110), it was possible to predict personality traits from just a few dozen social media likes. A single like is (nomen omen) likely less valuable than a saved link, as links come from a wider and potentially more diverse data set.
Does it mean that AI knows more about us than many of our friends do? Yes.
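The pipeline from that paper really is this small; a sketch of its shape with scikit-learn (synthetic data standing in for the likes matrix, not their exact code):

    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    rng = np.random.default_rng(0)
    likes = rng.integers(0, 2, size=(1000, 5000))  # users x liked items
    trait = rng.integers(0, 2, size=1000)          # e.g. a binary trait label

    # SVD compresses the sparse like-matrix; logistic regression predicts.
    model = make_pipeline(
        TruncatedSVD(n_components=100),
        LogisticRegression(max_iter=1000),
    )
    model.fit(likes, trait)
    print(model.predict(likes[:5]))  # meaningless on random data, of course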
There’s no guarantee this didn’t base the results on just 1/3 of the contents of your library, though, right? How can it be accurate if it’s not comprehensive, given the widely noted issues with long context (distraction, confusion, etc.)?
This is a gap I see often, and I wonder how people are solving it. I’ve seen strategies like using a “file” tool to keep a checklist of items with looping LLM calls, but haven’t applied anything like this personally.
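The simplest version I can picture is a chunk-and-merge loop rather than a real checklist file; a sketch (chunk size, model, and prompts are guesses):

    from openai import OpenAI

    client = OpenAI()

    def profile_in_chunks(links, chunk_size=200):
        # Keep each call well under the context sizes where recall degrades.
        partials = []
        for i in range(0, len(links), chunk_size):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model
                messages=[{"role": "user", "content":
                    "What do these saved links suggest about their owner?\n"
                    + "\n".join(links[i:i + chunk_size])}],
            )
            partials.append(resp.choices[0].message.content)
        # Final pass merges the partial profiles, so every link got "read" once.
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                "Merge these partial profiles into one coherent profile:\n\n"
                + "\n---\n".join(partials)}],
        )
        return resp.choices[0].message.content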
I do this to determine if a person I'm talking to online is potentially a troll. I copy a big chunk of their comment and post history into an LLM and ask for a profile.
The last few years, I've noticed an uptick in "concern trolls" that pretend to support a group or cause while subtly working to undermine it.
LLMs can't make the ultimate judgement call very well, but they can quickly summarize enough information for me to make it myself.
I did something similar, but for group chats. You had to export a group chat conversation to text and feed it to the program, which would then use a local LLM to profile each user based on what they said.
It built up knowledge of every user in the group chat: their thoughts on different things, their opinions, or just a basic sense of how they are. You could also ask the LLM questions about each user.
It's not perfect: sometimes the inference gets something wrong, or less precise embeddings get picked up, which creates hallucinations or just nonsense. But it works somewhat!
I would love to improve on this, or to hear if anyone else has done something similar.
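The skeleton of something like this is small; a sketch assuming a plain-text export with "Name: message" lines and a local model behind Ollama's OpenAI-compatible endpoint (both assumptions you'd adapt to your setup):

    from collections import defaultdict
    from openai import OpenAI

    # Local model via Ollama's OpenAI-compatible API.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    def profile_users(export_path):
        # Group messages by sender, assuming "Name: message" lines.
        by_user = defaultdict(list)
        with open(export_path, encoding="utf-8") as f:
            for line in f:
                if ": " in line:
                    name, msg = line.split(": ", 1)
                    by_user[name].append(msg.strip())
        profiles = {}
        for name, msgs in by_user.items():
            resp = client.chat.completions.create(
                model="llama3.1",  # any local model
                messages=[{"role": "user", "content":
                    f"Profile {name}: opinions, interests, temperament.\n"
                    + "\n".join(msgs[-200:])}],  # cap context per user
            )
            profiles[name] = resp.choices[0].message.content
        return profiles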
If you take the 13 seconds of processing time and multiply by 350 million (the rough population of the US), you get:
~144 years of GPU time.
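In code, for anyone who wants to tweak the assumptions:

    seconds = 13 * 350_000_000            # 4.55e9 GPU-seconds
    print(seconds / (365 * 24 * 3600))    # ~144.3 years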
Obviously, any AI provider can parallelize this and complete it in weeks/days, but it does highlight (for me at least) that LLMs are going to increase the power of large companies. I don't think a startup will be able to afford large-scale profiling systems.
For example, imagine Google creating a profile for every GMail account. It would end up with an invaluable dataset that cannot be easily reproduced by a competitor, even if they had all the data.
[But, of course, feel free to correct my math and assumptions.]
Deus Ex showing us time and time again that it was decades ahead of its time.
"The need to be observed and understood was once satisfied by God. Now we can implement the same functionality with data-mining algorithms."
If you're trying to get your data out of Pocket be aware their export doesn't include your tags, highlights, or the actual saved article content.
If you want everything including the text archives from sites that have gone down, you need to use an external tool like this one I built: https://pocket.archivebox.io
More than Pocket... I really miss del.icio.us, which helped me a lot at the beginning of my programming journey 20 years ago. It was truly social, and it generated a lot of well-curated lists of bookmarks that let me discover far more content related to what I wanted to learn than Google or Yahoo ever did.
Sadly it was bought by Yahoo just to be discontinued, like so many web pearls.
Tell me what you read and I'll tell you who you are. Even though the level of detail in the model's feedback might be surprising, it's not so hard to do this as a human, or is it?
From my perspective, the most interesting thing might be the blind spots or unexpected results: the unknown knowns that bring new aha moments.
I did it based on my last 1000 HN favorites.
> EU-based 35-ish senior software engineer / budding technical founder. Highly curious polymath, analytical yet reflective. Values autonomy, privacy, and craft. Modestly paid relative to Silicon Valley peers but financially comfortable; weighing entrepreneurial moves. Tracks cognitive health, sleep and ADHD-adjacent issues. Social circle thinning as career matures, prompting deliberate efforts at connection. Politically center-left, pro-innovation with guardrails. Seeks work that blends art, science, and meaning—a “spark” beyond routine coding.
Fairly accurate
"Seeks work that blends art, science, and meaning—a “spark” beyond routine coding."
That part is really accurate.
Recently vibe-coded a web app that takes your listening history from Apple Music (sad to see the Spotify API go) and recommends a variety of different media based on it. Was truly surprised by how OK those recommendations are, given an extremely limited input.
All platforms that have user data are running LLMs to build such profiles for their advertisers, I bet.
Fun fact: I have 7,290 links in my Pocket export, and the very first one is Hacker News.
What's your Pocket replacement? Wallabag, Hoarder or something else?
What was it doing for those 13 seconds? Is it fetching content for the links? How many links could it fetch in 13 seconds? Maybe it is going by the link URLs only instead of fetching the link content?
Recalling Simon Willison’s recent geoguessing challenge for o3, I considered, “What might o3 be able to tell me about myself, simply based on a list of URLs I’ve chosen to save?”
I've mentioned this in a few Show HNs; I've been working on an AI bookmarking and notes app called Eyeball: https://eyeball.wtf/
It integrates a minimalist feed of your links with the ability to talk to your bookmarks and notes with AI. Next week we're adding a weekly wrapped of your links, similar to this profile.
When moving my links from Pocket to Wallabag I passed them through Claude for tagging. Worked very well
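For anyone doing the same migration, the tagging call is a few lines with the official anthropic SDK (the model name and tag format here are just my choices):

    import anthropic

    client = anthropic.Anthropic()

    def tag_links(urls):
        # One "url: tag1, tag2" line per link, ready to import elsewhere.
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",  # substitute your model
            max_tokens=1024,
            messages=[{"role": "user", "content":
                "Suggest 1-3 short tags per URL, one line each "
                "as 'url: tags':\n" + "\n".join(urls)}],
        )
        return msg.content[0].text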
I used the same technique to profile HN users by their comment and post history. Guess the results?
Just to note: The code block font size varies line by line on iOS Safari.
Seems to be a fairly common issue.
I did the same exercise a while back with 4o, but based on the questions I had asked it so far. Some results were nearly accurate, some outdated, and some just plain "different". It felt good, but I ultimately realized its system prompt is designed to make me feel good.
---
Here’s the high-level picture I’ve built of you from our chats:
- You’re a senior/lead developer in India, aiming to step up into a staff-developer or solution-architect role.
- You have a healthy dose of self-doubt (especially around soft skills), and you’ve been deliberately working on both your technical breadth (authentication in ASP .NET, Linux, C++/Qt, distributed systems, data visualization, AI foundations) and your communication/architectural toolkit (presentations, executive summaries, third-party evaluations).
- You’re a Linux enthusiast, intrigued by open source, server-side flows, rate limiting, authentication/authorization, and you love building small, real-world exercises to cement concepts.
- You prize clarity, depth, minimalism, and originality—you dislike fluff or corporate buzzwords.
- You have a hacker-philosopher energy: deeply curious, systems-thinking-oriented, with a poetic streak.
- You’re comfortable with both structured roadmaps and creative, lateral thinking, and you toggle seamlessly between “hard” dev topics and more reflective, meta-tech discussions.
- Right now, you’re honing in on personal branding—finding a domain and a blog identity that encapsulates your blend of tech rigor and thoughtful subtlety.
Nice.
PS: is your blog self-hosted? What's the stack here?
How much would this cost if I did it via API?
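Back-of-the-envelope, it's just token count times the per-million rate; the prices below are placeholders, so check the provider's pricing page:

    input_tokens = 80_000               # e.g. ~80k tokens for a 4,200-item export
    output_tokens = 1_000
    price_in, price_out = 2.00, 8.00    # hypothetical $/1M tokens
    cost = (input_tokens * price_in + output_tokens * price_out) / 1e6
    print(f"${cost:.2f}")               # $0.17 at these assumed rates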
Thanks for the reminder that Pocket sunset is tomorrow. I did a quick analysis of my data as well via Claude Code: https://blog.kelvin.ma/posts/an-ode-to-pocket-analysis-of-ex...
Modern day astrology
Is anyone using "AI chatbots" considering that they're handing a detailed profile of their interests, problems, emotional struggles, and vulnerabilities to advertisers? The machine has "the other end", you know, and we're feeding already enormously powerful people with more power.
Obligatory: Please do not assume that you will be able to accurately profile strangers based on metadata or "digital footprint"-type information.
Now think of what they can glean from your LLM conversations...
a middle aged white guy using AI, my mind is BLOWN
oh shit! I didn't know they were shutting down, I hope I can still export my data, wtfff. I do not understand why companies stop offering products that people use and love.
Appreciate this reminder, had forgotten about the shutdown.
Non-LLM methods that are 5 years old are 100x better at profiling you :P
Why the clickbait title? Yes, it's technically correct, but it obviously implies (as written) that o3 used those links "behind your back" and altered the replies.
Another option that's just as correct and doesn't mislead: "Profiling myself from my Pocket links with o3"
Note: title when reviewed is "o3 used my saved Pocket links to profile me"
I recently migrated to Linkwarden [0] from Pocket, and have been fairly happy with the decision. I haven't tried Wallabag, which is mentioned in the article.
Linkwarden is open source and self-hostable.
I wrote a python package [1] to ease the migration of Pocket exports to Linkwarden.
Maybe just me, but that title implies o3 is doing something surprising and underhanded, rather than doing exactly what it had been prompted to do.
After reading this I realized I also have an archive of my Pocket account (4,200 items), so I tried the same prompt with o3, Gemini 2.5 Pro, and Opus 4:
- The ChatGPT UI didn't allow me to submit the input, saying it was too large, even though it was around 80k tokens, less than o3's 200k context size.
- Gemini 2.5 Pro worked fine for the personality- and interest-related parts of the profile, but it got the age range, job role, location, and parental status wrong.
- Opus 4 nailed it and did a more impressive job: it accurately predicted my base city (Amsterdam), age range, and relationship status, but didn't include anything about whether I'm a parent or not.
Both Gemini and Opus failed to predict my role, perhaps understandably. Although I'm a data scientist, I read a lot about software engineering practices because I like writing software; since I don't have the opportunity to do that kind of work at my job, I code for personal projects, so I need to learn a lot about system design, etc. Both models thought I'm a software engineer.
Overall it was a nice experiment. One thing I noticed is that both models mentioned photography as my main hobby, but if they had access to my YouTube watch history, they'd confidently say it's tennis. For topics and interests where we usually watch videos rather than read articles, it would be interesting to combine YouTube watch history with this Pocket archive data (although getting that data would be challenging).
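If anyone tries the combination: Google Takeout can export YouTube watch history (choose the JSON format), so merging the two corpora is mostly parsing. A sketch, with field names from memory, so verify against your own export:

    import csv
    import json

    def combined_corpus(takeout_json, pocket_csv):
        # Merge YouTube watch history (Takeout JSON) with a Pocket CSV export.
        items = []
        with open(takeout_json, encoding="utf-8") as f:
            for entry in json.load(f):
                items.append("watched: " + entry.get("title", ""))
        with open(pocket_csv, encoding="utf-8") as f:
            for row in csv.DictReader(f):
                items.append("saved: " + (row.get("url") or row.get("given_url", "")))
        return items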