Archive your Reddit data before it's too late

xavdid | 459 points

Personally, I wiped all of my personal Reddit accounts last year. I now use short-lived anonymous accounts, which I also wipe after a while. I use Shreddit [^0] to wipe the account before deleting it. I do wish that I was able to back up some valuable conversations, but I honestly wouldn't find much worth in the backups without the additional context. So perhaps something to explore -- also storing the context of a particular comment with a configurable depth.
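To make the "configurable depth" idea concrete, here is a minimal sketch that walks up a comment's parent chain. It assumes the archive is already loaded as a dict keyed by comment id, and the field names (`parent_id`, `body`) are hypothetical, not the schema of any particular tool.

```python
from typing import Dict, List, Optional

# Hypothetical record shape for this sketch: {"id": ..., "parent_id": ..., "body": ...}
Comment = Dict[str, Optional[str]]


def context_chain(comments: Dict[str, Comment], comment_id: str, depth: int) -> List[Comment]:
    """Return up to `depth` ancestors of a comment (oldest first), then the comment itself."""
    chain: List[Comment] = []
    current = comments.get(comment_id)
    # The length check bounds the walk, so a malformed parent loop cannot run forever.
    while current is not None and len(chain) <= depth:
        chain.append(current)
        parent_id = current.get("parent_id")
        current = comments.get(parent_id) if parent_id else None
    chain.reverse()  # collected child-first, so flip to read top-down
    return chain
```

With `depth=0` only the comment itself comes back; a larger depth stops early once the chain reaches a parent that is not in the archive.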


ezekg | 4 months ago

Errr... why not just use their official tool?

I sent in a request and a few days later had a bunch of .csv files containing everything I've done with my account.

OK, CSV isn't JSON - but it's pretty easy to parse or import into a database of your choice.
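As a rough illustration of the "import into a database" route, the sketch below loads one CSV file into a SQLite table using only the standard library. The table is named after the file and the columns come from the header row; the actual file and column names in Reddit's export are not assumed here.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name: str) -> str:
    """Quote a SQLite identifier so unusual file or column names do not break the SQL."""
    return '"' + name.replace('"', '""') + '"'


def import_csv(db: sqlite3.Connection, csv_path: Path) -> int:
    """Load one CSV file into a table named after the file; return the rows inserted."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            return 0  # empty file, nothing to import
        table = quote_ident(csv_path.stem)
        columns = ", ".join(quote_ident(c) for c in header)
        placeholders = ", ".join("?" for _ in header)
        db.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        # Skip malformed rows whose field count does not match the header.
        rows = [row for row in reader if len(row) == len(header)]
        db.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    db.commit()
    return len(rows)
```

Calling `import_csv(sqlite3.connect("export.db"), Path("comments.csv"))` would create a `comments` table; both names are placeholders. Every value is stored as text, which is usually enough for searching and sorting an archive.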

edent | 4 months ago

Your reddit data is already available via torrent, along with everyone else's, but you'll have to dig for it

Reddit comments and submissions collected by Pushshift:

MicropenisMike | 4 months ago

All this debate regarding Reddit makes me wonder whether we even need these communities. HN sure has value to offer, but things like Reddit and YouTube are a huge waste of time. Before the 2000s people lived their lives without these communities. They lived without the constant influx of info and opinions from others. And it seems they were way happier than us. Probably because they were more focused on solving their own problems than global problems. Who knows if that's exactly the thing we need to solve global problems.

None of the online communities are necessary to live a happy life, not even HN.

quaintdev | 4 months ago

Is Lemmy the leading candidate for a viable reddit replacement? Are there any other serious efforts in this space?

HN has such an insane depth of talent that I am surprised I haven't seen a few ShowHN posts that read something like:

"Hi guys, I was bored last weekend so I thought it would be fun to build a reddit clone as a single Rust binary with an embedded bespoke graph database. It uses a fine tuned LLaMA model for optional auto-moderation. So far it's handling a sustained 1.6M posts/sec on my 2015 MacBook Pro. If I have some time this next week I will add distributed mode with Raft or CRDTs. Hope you guys like it."

cpeth | 4 months ago

I'm not sure it'll be possible to create rich archives of your own data once Reddit's API changes go through, so I made a tool that creates a SQLite archive of everything (that you can update over time).

xavdid | 4 months ago

Huh. So apparently I'm in the minority here- I point a browser at , skim a few favorite forums, maybe make a comment or two then do something else for the rest of the day.

IOW, is the sky not falling for my use-case?

kennethrc | 4 months ago

I made a series of reddit posts that collectively involved a fair amount of research, so I ended up copying them over to my personal website in order to have a better primary source.

But I'll probably run this to grab everything else.

nfriedly | 4 months ago

Shameless plug: I've written a utility which delivers you your favourited posts and comments. While it doesn't support backfilling at the moment, you're welcome to take a look:

rounakdatta | 4 months ago

Too late? I already lost a ton of data in the covid purge. (was active user of /r/NoNewNormal) Every comment, post or interaction on a banned sub is suddenly deleted permanently with no way to retrieve it. That's really losing data, not a made up crisis of "enshittification."

Most people worried about the API change have the "correct politics" so your data is safe. It's only people who participate in wrongthink that need to back up everything.

halfjoking | 4 months ago

Archive your Reddit data?

You guys must be using Reddit way differently than I am. It's just a social site. The subreddits I subscribe to are hobby-related. People sharing what they're doing with regards to their hobby. What data would I have to archive? If Reddit were to go away I'd just find another similar service allowing me to hangout with people interested in my hobby.

What are you guys doing?

taylodl | 4 months ago

> The non-monetary value Reddit as a knowledge store is literally priceless; it's a modern-day Library of Alexandria.

Is it though?

clnq | 4 months ago

>These changes mark the beginning of the (apparently) inevitable enshittification of Reddit as a platform

This started long ago, when the new UI was announced. This was the first obvious "we're going to make it worse for everybody because that's how we operate and there's nothing you can do about it" step for me.

TremendousJudge | 4 months ago

What is funny is there seems to be a feeling of urgency to get data from Reddit even though it's basically the users who are making subreddits private. The protest really does seem to me like a "cut my nose off to spite my face" protest.

Realistically, there are ways to protest that increase costs. Everyone uploading 20-minute videos of their wall, for example. That would increase costs but not really affect how users use the site. People want to hit their pockets, yet all they can think of is "If we don't use it, that'll hurt them" when, if it really hurt them, they wouldn't allow it. They own and control the site; they can make it impossible to make subreddits private with probably a few minutes of code - just make the process error out for a few days and then remove the error at the start of the request.

that_guy_iain | 4 months ago

Am I the only person in the world that just uses a browser to access Reddit? No ads, no nags, no idiotic crippled "app" interface on a postage-stamp screen...

Gordonjcp | 4 months ago

I just tried installing this from my very new Linux install (I didn't even have pip installed!) and got the following error:

  File "/home/XXX/.local/pipx/venvs/reddit-user-to-sqlite/lib/python3.10/site-packages/reddit_user_to_sqlite/", line 1, in <module>
    from typing import Any, Literal, NotRequired, Optional, Sequence, TypedDict, final
    ImportError: cannot import name 'NotRequired' from 'typing' (/usr/lib/python3.10/

wvenable | 4 months ago
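For context on the traceback above: `typing.NotRequired` was added in Python 3.11 (PEP 655), so the import fails on a 3.10 interpreter. A minimal sketch of a check you could run before installing; the function name is made up for illustration and is not part of the tool.

```python
import sys
from typing import Sequence

# typing.NotRequired first shipped in the standard library with Python 3.11.
MIN_VERSION = (3, 11)


def has_typing_not_required(version: Sequence[int] = sys.version_info) -> bool:
    """Return True if this interpreter version ships typing.NotRequired."""
    return tuple(version[:2]) >= MIN_VERSION


if not has_typing_not_required():
    print(
        "This interpreter is Python %d.%d; typing.NotRequired needs 3.11 or newer."
        % tuple(sys.version_info[:2])
    )
```

If the check prints the warning, installing the tool under a newer Python interpreter should avoid the error.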

This only gives you 1000 posts. You'll need to ask Reddit for an archive of all of your data. Their SLO is 30 days. I doubt that now given the madness going on.

nunez | 4 months ago

Why did people not get this apoplectic when Twitter and Facebook killed 3rd party clients?

Yhippa | 4 months ago

Why is there a 1K comment limit and can you add a way to override that? Just hitting the reddit API looks like it will fetch further back.

I looked at my 1000th comment in the archive and it's only back to Sat May 28 2022 (I guess I comment a lot), if I want to save 10+ years I'll need a higher limit.

Edit: Looks like this might be a reddit limit? I'm seeing other tools mention 1K. I guess I'll use Reddit's data export tool instead.
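A sketch of why a client cannot simply raise the limit: cursor-based pagination only continues while the server keeps handing back a cursor. The page shape below (`data.children`, `data.after`) is modeled on Reddit's listing responses, and `fetch_page` is a hypothetical callable you would supply; no real HTTP request is made here.

```python
from typing import Callable, Iterator, Optional

# fetch_page(after) is assumed to return {"data": {"children": [...], "after": cursor_or_None}}
FetchPage = Callable[[Optional[str]], dict]


def iter_listing(fetch_page: FetchPage, max_items: Optional[int] = None) -> Iterator[dict]:
    """Yield items page by page until the server stops returning a cursor."""
    after: Optional[str] = None
    seen = 0
    while True:
        page = fetch_page(after)["data"]
        for child in page["children"]:
            yield child
            seen += 1
            if max_items is not None and seen >= max_items:
                return
        after = page.get("after")
        # No cursor (or an empty page) means the server has nothing further to give.
        if not after or not page["children"]:
            return
```

Once the server returns no `after` value, the loop ends regardless of `max_items`, which matches what commenters here report at roughly 1,000 items.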

joshstrange | 4 months ago

Boy, I sure can't wait until 2 days after this "strike" occurs when people can move on to whatever we're supposed to be outraged about next. Most people (a) have a better version of information actually worth saving already saved somewhere else, and (b) have zero use for their reddit posts without the context surrounding the post.

MisterBastahrd | 4 months ago

Just backed up my data, ~60 posts and ~1000 comments. Thank you! Worked flawlessly (except for Python's incredibly convoluted package system for dabblers like me, but that’s not your fault)

Feature request: delete all comments and posts (can only be done 1 by 1 afaik). There used to be a nuke Reddit chrome extension but it appears removed and/or out of date.

klabb3 | 4 months ago

I wonder whether AT Proto, the protocol used in bluesky, might make for a good base for a decentralized reddit alternative.

- public

- extensible

- bring your own client

- domains as usernames

- federated but with escape hatches so you're never tied to a single host

- considerations for twitter-scale from the start, eg with "big graph servers"

Still under a waitlist / closed for now / under very active development, but seems very promising.

raphaelrk | 4 months ago

I delete all my accounts every year or so. I don’t understand people’s fascination with their own data online. It’s mostly less than worthless, so their fascination with storing stuff they will never read again is weird.

I have a friend who, back in 1998 or 1999, printed out all his emails. He was really active on IRC so he had a lot of email, and he wasn't savvy enough to know he could copy his mbox. He carried those papers around with him for decades and finally realized he never flipped through them once, so he recycled them all. I think that’s the case with almost all the data that we produce. It’s not useful for us, but probably useful for companies like Google to create models of our thought patterns.

remote_phone | 4 months ago

My misc digital personas have gone to the great /dev/null in the sky many times.

BIX, CompuServe, FidoNet, usenet, dozens of email accounts, voicemails, interviews, chat logs, facebook...

Heck, I have zero trace of the specialty BBS (and network) I hosted. Which was my everything at the time.

Young me thought it'd be great to record and archive my lifestream.

Current me is glad it didn't work out.

Sure, I may have written some clever bits of code & prose.

Now I see great value in forgetting. It's hard to adapt, grow, accept, forgive, and be forward looking while lugging around a lifetime of baggage.

If I ever did say something useful, I'm confident someone will pick up the thread. Whether independently or by quoting me, it's all good.

specialist | 4 months ago

This is akin to saying get all your trash back from the landfill before it closes.

nutate | 4 months ago

I got chased off Reddit by a cabal of power hungry woke mods stalking me. I'm going to be celebrating when Reddit becomes the next Digg. They did it to themselves and deserve everything that's coming to them.

jsz0 | 4 months ago

Imo, the most valuable data to save is not what you have posted, but the things that you have read that have made a substantial difference in how you view and exist in the world.

bsnnkv | 4 months ago

Does this include any contextual data surrounding your posts, like what you replied to, etc? That seems to me to be as important as what someone has written.

Baeocystin | 4 months ago

Awesome, I really wanted something like this. I have a lot of reddit posts and it would be neat to cringe at my younger self.

EamonnMR | 4 months ago

Does it capture saves and upvotes too? I use those actions more than posting and commenting.

andrethegiant | 4 months ago

Data-wise, what additional data does this include compared to Reddit's default export?

phoenixreader | 4 months ago

I would love to be able to download and archive all the posts I ever upvoted. Honestly that would have more value to me than my own comments. Does anyone know if there is a project for this?

doctoboggan | 4 months ago

> When I heard the news, I realized I'd be upset if I wasn't able to access my contributions anymore.

Clearly all I’ve put on Reddit is dross. I can’t think of anything I’d care about losing.

irrational | 4 months ago

@xavdid Very nice tool. I was able to trivially find my earliest post (17 yrs ago). With comments, sorting on the timestamp only went back to 2021, though. Is this a feature?

(And, the process was really easy, just as the blog post described. Requires Python 3.11+.)

pixelmonkey | 4 months ago

Please delete all my old stuff. Esp the embarrassing comments and likes.

rr808 | 4 months ago

I figure I'll get sent to the gulag sooner or later anyway.

WalterBright | 4 months ago

imo this is not the beginning of reddit’s enshittification. if one can even pick a beginning that is probably when they started the redesign.

26fingies | 4 months ago

What reddit data do people care about?

manicennui | 4 months ago

If you hope Reddit dies, you are also indirectly hoping for HN to become more like Reddit, because that will be one of the consequences.

acyou | 4 months ago

This is very helpful! Thanks

daveidol | 4 months ago

The GDPR / CPRA data request allows for downloading all data.

Has anyone here tried it? I'm curious what kind of formatting is used for the data dump.

simple10 | 4 months ago

"before it's too late"? Is GDPR being repealed or something? Kind of a FUD title.

I always thought "reddit is cesspool" was a meme. I made an account a month ago after lurking for months and good god it's worse than I imagined. It was fun when browsing and commenting on small subs, but write anything in the bigger ones and you'll get swarmed. Just don't stray from any of these narratives: 1. America is always good. They never did anything bad. They saved the world and they're heroes. 2. Apple is god. 3. China and Russia are very bad. American invasions were justified. 4. Chinese products and companies are all bad. All they do is copy the almighty American companies.

diabolo96 | 4 months ago