YouTubeDrive: Store files as YouTube videos

notamy | 756 points

Hey everybody! I'm David, the creator of YouTubeDrive, and I never expected to see this old project pop up on HN. YouTubeDrive was created when I was a freshman in college with questionable programming abilities, absolutely no knowledge of coding theory, and way too much free time.

The encoding scheme that YouTubeDrive uses is brain-dead simple: pack three bits into each pixel of a sequence of 64x36 images (I only use RGB values 0 and 255, nothing in between), and then blow up these images by a factor of 20 to make a 1280x720 video. These 20x20 colored squares are big enough to reliably survive YouTube's compression algorithm (or at least they were in 2016 -- the algorithms have probably changed since). You really do need something around that size, because I discovered that YouTube's video compression would sometimes flip the average color of a 10x10 square from 0 to 255, or vice versa.
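To make the scheme concrete, here is a minimal pure-Python sketch of the packing step described above. It is not the original code (per the thread, the project is written in Mathematica). The function names, the bit ordering, the zero padding and the threshold of 128 are all assumptions made for illustration.

```python
WIDTH, HEIGHT = 64, 36   # logical pixels per frame, as described above
SCALE = 20               # each logical pixel becomes a 20x20 block
BITS_PER_FRAME = WIDTH * HEIGHT * 3  # one bit per RGB channel


def bytes_to_bits(data):
    """Expand bytes into a list of 0/1 ints, most significant bit first (assumed order)."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def bits_to_frame(bits):
    """Pack up to BITS_PER_FRAME bits into a HEIGHT x WIDTH grid of RGB tuples."""
    padded = bits + [0] * (BITS_PER_FRAME - len(bits))  # zero padding is an assumption
    frame = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            i = 3 * (y * WIDTH + x)
            # Each channel is either 0 or 255, nothing in between.
            row.append(tuple(255 * b for b in padded[i:i + 3]))
        frame.append(row)
    return frame


def upscale(frame, scale=SCALE):
    """Nearest-neighbour enlargement: repeat every pixel scale x scale times."""
    big = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(scale)]
        big.extend(list(wide) for _ in range(scale))
    return big


def decode_frame(big, scale=SCALE):
    """Recover bits by averaging each block per channel and thresholding at 128."""
    bits = []
    for y in range(0, len(big), scale):
        for x in range(0, len(big[0]), scale):
            for c in range(3):
                total = sum(big[y + dy][x + dx][c]
                            for dy in range(scale) for dx in range(scale))
                bits.append(1 if total / (scale * scale) >= 128 else 0)
    return bits
```

Averaging over the whole block before thresholding is what lets a large square tolerate local compression noise. A real decoder would also need to know the payload length in order to discard the padding.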

Looking back now as a grad student, I realize that there are much cleverer approaches to this problem: a better encoding scheme (discrete Fourier/cosine/wavelet transforms) would let me pack bits in the frequency domain instead of the spatial domain, reducing the probability of bit-flip errors, and a good error-correcting code (Hamming, Reed-Solomon, etc.) would let me tolerate a few bit-flips here and there. In classic academic fashion, I'll leave it as an exercise to the reader to implement these extensions :)
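As a small taste of the error-correction extension, here is a sketch of the textbook Hamming(7,4) code, which protects every 4 data bits with 3 parity bits and corrects any single flipped bit per 7-bit block. The function names are hypothetical, and nothing like this is part of YouTubeDrive itself.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of the error, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]
```

The cost is a 7/4 expansion of the data, and two flips within the same block will still be decoded incorrectly, which is why stronger codes such as Reed-Solomon are usually preferred for bursty errors.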

dzhang314 | 2 years ago

Back in the day when file sharing was new, I won two rounds of beer from my friends in university - the first after I tried what I dubbed hardcore backups (tarred, gzipped and pgp'd an archive, slapped an avi header on it, renamed it britney_uncensored_sex_tape[XXX].avi or something similar, then shared it on WinMX, assuming that as hard drive space was free and teenage boys were teenage boys, at least some of those who downloaded it would leave it to share even if the file claimed to be corrupt).

It worked a charm.

Second round? A year later, when the archive was still available from umpteen hosts.

For all I know, it still languishes on who knows how many old hard drives...

lb1lf | 2 years ago

Before broadband was widely available, TiVo used to purchase overnight paid programming slots across the US and broadcast modified PDF417 video streams that provided weekly program guide data for TiVo users. There's a sample of it on YouTube https://www.youtube.com/watch?v=VfUgT2YoPzI but they usually wrapped a 60-second commercial before and after the 28-minute broadcast of data. There was enough error correction in the data streams to allow proper processing even with less-than-perfect analog television reception.

pingtickle | 2 years ago

I remember seeing this first discussed on 4chan's /g/ board as a joke about whether or not they could abuse YouTube's unlimited upload size, which then escalated into the proof of concept shown in the repo :)

8K832d7tNmiQ | 2 years ago

I only looked at the example video, but is the concept just "big enough pixels"?

Would be neater (and much more efficient) to encode the data such that it's exactly untouched by the compression algorithm, e.g. by encoding the data in wavelets and possibly motion vectors that the algorithm is known to keep[1].

Of course that would also be a lot of work, and likely fall apart once the video is re-encoded.

[1] If that's what video encoding still does, I really have no idea, but you get the point.

anyfoo | 2 years ago

Could youtube-dlp and YouTube Vanced now be hosted on.. YouTube?

I wonder how long it'd take for Google to crack down on the system abuse.

Is it really abuse if the videos are viewable / playable? Presumably the ToS either already forbids covert channel encoding or soon will.

metadat | 2 years ago

This reminds me of an old hacky product that would let you use cheap VHS tapes as backup storage: https://en.wikipedia.org/wiki/ArVid

You would hit Record on a VCR and the computer data would be encoded as video data on the tape.

People are clever.

legitster | 2 years ago

Reminds me of a guy who stored data in ping messages https://youtu.be/JcJSW7Rprio

saint_angels | 2 years ago

This reminds me of SnapchatFS[1], a side project I made about 8 years ago (see also HN thread[2] at that time).

From the README.md:

> Since Snapchat imposes few restrictions on what data can be uploaded (i.e., not just images), I've taken to using it as a system to send files to myself and others.

> Snapchat FS is the tool that allows this. It provides a simple command line interface for uploading arbitrary files into Snapchat, managing them, and downloading them to any other computer with access to this package.

[1]: https://github.com/hausdorff/snapchat-fs

[2]: https://news.ycombinator.com/item?id=6932508

antics | 2 years ago

How much data could you store if you embedded a picture-in-picture file over a 10 minute video? I could totally see content creators who do tutorials embedding project files in this way.

daenz | 2 years ago

Turns out any site that allows users to submit and retrieve data can be abused in the same way:

- FacebookDrive: "Store files as base64 facebook posts"

- TwitterDrive: "Store files as base64 tweets"

- SoundCloudDrive: "Store files as mp3 audio"

- WikipediaDrive: "Store files in wikipedia article histories"

umvi | 2 years ago

I remember my friend did something like this on an old unix system.

Users were given quotas of 5 MB for their home directory. He discovered that filenames could be quite large, and the number of files was not limited by the quota, so he created a pseudo filesystem using that knowledge, with a command line tool for listing, storing and retrieving files from it. This was the early 90s.
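A rough sketch of how such a filename-based store could work, assuming a filesystem that allows names of up to about 255 characters. The chunk size, the base32 encoding and the naming scheme are illustrative choices, not a reconstruction of the friend's tool.

```python
import base64
import os

CHUNK = 100  # payload bytes per filename; 100 bytes become 160 base32 characters


def store(data, directory):
    """Write data as a series of zero-byte files whose names carry the payload."""
    for index, start in enumerate(range(0, len(data), CHUNK)):
        encoded = base64.b32encode(data[start:start + CHUNK]).decode("ascii")
        name = f"{index:08d}.{encoded}"  # numeric prefix keeps the chunks ordered
        open(os.path.join(directory, name), "w").close()


def retrieve(directory):
    """Reassemble the payload by decoding the filenames in sorted order."""
    names = sorted(os.listdir(directory))
    return b"".join(base64.b32decode(name.split(".", 1)[1]) for name in names)
```

Every file is empty, so only directory entries are consumed. Whether those count against a quota depends entirely on the system in question.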

snarfy | 2 years ago

Years ago when Amazon had unlimited photo storage, you could “hide” gigabytes of data behind a 1px gif (literally concatenated together) so that it wouldn’t count against your quota.

freestorage | 2 years ago
_trampeltier | 2 years ago

This is great. I did something very similar with a laser printer and a scanner many years ago. I wrote a script that generated pages of colored blocks and spent some time figuring out how much redundancy I needed on each page to account for the scanner's resolution. I think I saw something similar here or on github a few years ago.

geoffeg | 2 years ago

Seems like a great way to get your account closed for abuse!

advisedwang | 2 years ago

Does YouTube store and stream all videos losslessly? How does this work otherwise?

dahfizz | 2 years ago

The code doesn't look too big (a single file), but it requires a paid symbolic language (Mathematica) to be used. Can anyone with better Mathematica knowledge explain whether it can be ported to another symbolic language (Sage, Maxima) or a non-symbolic one (R, Julia, Python)?

wanderingmind | 2 years ago

Seems like it may be a decent "harder drive". https://youtu.be/JcJSW7Rprio

jtxt | 2 years ago

Are there any services out there that combine all of these “Store files as XYZ” into some kind of raid config?

Would be interesting if you could treat each service (Youtube, Docs, Reddit, Messenger, etc) as a “disk” and stripe your data across them.
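A minimal sketch of the idea, assuming each service is just somewhere to put one chunk of bytes: split the data across N "disks" and keep one extra XOR parity chunk, so that any single lost chunk can be rebuilt (single parity in the style of RAID 4/5, not RAID 6). The function names are hypothetical and no real service API is involved.

```python
def stripe_with_parity(data, n_disks):
    """Split data into n_disks equal-length chunks plus one XOR parity chunk."""
    chunk_len = -(-len(data) // n_disks)  # ceiling division
    padded = data.ljust(chunk_len * n_disks, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(n_disks)]
    parity = bytes(chunk_len)  # starts as all zero bytes
    for chunk in chunks:
        parity = bytes(a ^ b for a, b in zip(parity, chunk))
    return chunks, parity


def rebuild_missing(chunks, parity):
    """Rebuild the single chunk marked None by XOR-ing parity with the survivors."""
    missing = chunks.index(None)
    rebuilt = parity
    for i, chunk in enumerate(chunks):
        if i != missing:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, chunk))
    return rebuilt
```

The original length has to be stored somewhere so the zero padding can be stripped after reassembly, and surviving two simultaneous losses would need a second, independent parity as in RAID 6.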

abadaba | 2 years ago

Makes me wonder how many video and image upload sites are now used as easily accessible number stations these days

Saint_Genet | 2 years ago

Rename the project to VideoDrive or something. With the current name Google can get GitHub to take it down on the basis of trademark infringement.

some1else | 2 years ago

Here I am trying my best to get my favorite videos OFF YouTube given that they could disappear at any second because of an account block, or just "reasons", and this link suggesting storing stuff with YouTube? By god, why? Sure, it's free, practically "limitless" slow file storage, but what a bad idea nonetheless....

helloworld11 | 2 years ago

Back in the 90’s I considered storing my backups as encrypted steganographic or binary Usenet postings, as a kind of decentralized backup, postings which would stick around long enough for the next weekly backup. (Usenet providers had at least a couple of weeks of retention time back then.)

layer8 | 2 years ago

Reminds me of the old Wrapster[1] days

[1] https://www.cnet.com/tech/services-and-software/napster-hack...

shmatt | 2 years ago

I'm a GOOGL investor and I find this offensive.

fronterablog | 2 years ago

I can't wait until malware uses this as C2

kube-system | 2 years ago

This gave me a flashback of VBS on amiga… video backup system, record composite video on a vcr, and simple op amp circuit that would decode black and white blobs of video pixels, could backup floppies at reading speed. Was really impressive until, well, vhs… ;)

Just did a google and saw it had evolved over the years, used only the 1.0 implementation back in the days. For those on another nostalgic trip : http://hugolyppens.com/VBS.html

boboche | 2 years ago

I wonder if something similar could be useful for transmitting data optically, like an animated QR code. Maybe a good way to transmit data over an air gap for the paranoid?

powerset | 2 years ago
flaque | 2 years ago

What does the OP have against “Google Drive” when seeking file storage via a Google Service?

Horses for courses… this is how we end up with pictures clogging transaction ledgers

iostream24 | 2 years ago

Reminds me of the movie Contact where the alien civilization encodes the whole design of a traveling machine inside Olympic telecast video.

nelblu | 2 years ago

Popularity of such projects is the reason for imposing more and more constraints on systems that are somewhat open (at least open to use). Maybe instead of figuring out how to abuse an easy-to-use system, people should figure out how to abuse hard-to-use systems, e.g. by creating open protocols for closed systems. That would be an actual achievement.

self_awareness | 2 years ago

Upload videos as data, then build an app that streams and decodes these files back into videos. Voila, popcorn time.

grupthink | 2 years ago

Reminds me of this similar tool that exploited GMail the same way: https://www.computerworld.com/article/2547891/google-hack--u...

anonymousiam | 2 years ago

Yes we have all done or used something similar when we were younger, but really, should this be on the front page of HN? This is abuse of a popular service and if it becomes popular it will only make YouTube worse and YouTube is getting worse without any additional help.

wscott | 2 years ago

I remember a project that was doing this with photo files and unlimited picture storage.

jimmydeans | 2 years ago

BEWARE: Once they clamp down and delete the files, you lose your data.

Good technical experiment though!

kringo | 2 years ago

I suspect people in my office who send everything as a Word attachment with an image, PPT, Excel workbook, etc., embedded, are doing this unknowingly.

There are even Word files I've found that have complete file path notation to ZIP files.

smm11 | 2 years ago

I think my favorite part of this is that the example video linked to this has ads on it. It's a backup system that pays you. Well, until someone at Youtube sees it and decides to delete your whole account.

egypturnash | 2 years ago

This reminds me of Blame! where humans are living like rats in the belly of the machine. Lol, also reminds me of the geocities days where we created 50 accounts to upload dragon ball z videos.

danschumann | 2 years ago

I absolutely love this idea. I need to dig more into the code, but it's almost like using twitter as a 'protocol', using youtube as storage.

So many ideas are flying to mind. Really creative.

bilekas | 2 years ago

I love that this is like tape in that it's a sequential access medium. It's storing a tape-like data stream in a digital version of what used to be tape itself (VHS).

accrual | 2 years ago

I like this. The last wave of Twitter users into the fediverse caused my AWS bill to go up 10 USD a month. Might have to start storing media files on youtube instead ;)

INTPenis | 2 years ago

Reminds me of the other post that used Facebook Messenger as a transport layer to get free internet in places where internet is free if you use Facebook apps.

msoad | 2 years ago

This seems like something Cicada 3301 would use

I wonder how many random videos like this are floating around that are encoding some super secret data...

derevaunseraun | 2 years ago

I’m thinking maybe we can divide files into pieces, turn each piece into a QR code, and then turn each QR code into a single frame?

take_it_not | 2 years ago

Wasn't there more or less recently on HN something like "Store Data for free in DNS-Records"? Reminds me of this.

das_keyboard | 2 years ago

Imagine a Raid6 of four youtube 11-digit IDs

Bet google isn't happy with this idea and will definitely try to break it asap

ck2 | 2 years ago

Very cool. I wonder how difficult it would be to present a real watchable video to the viewer. Low quality, perhaps, but with the file embedded steganographically. I think a risk of this tech is that if it takes off, YT might easily adjust the algorithms to remove unwatchable videos. Perhaps leaving a watchable video could grant it more persistence than an obvious data stream.

Jimmc414 | 2 years ago

Are the premium files stored as 4K?

mensetmanusman | 2 years ago

This would be a good way to backup your YouTube videos to YouTube while avoiding Content ID.

jagged-chisel | 2 years ago

How will you prevent youtube from re-encoding the video and the data getting trashed?

sunlite99 | 2 years ago

I was literally thinking of something like this a couple days ago. Good timing!

Group_B | 2 years ago

Could be a good and sneaky way to obfuscate encrypted message transmissions?

ductsurprise | 2 years ago

It's all fun and games until your files start getting DMCA takedowns.

musicale | 2 years ago

Are there any examples? I'd love to see such a YouTube video... :p

kebman | 2 years ago

How many kilobytes would it be possible to store per minute of video?
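A back-of-the-envelope answer can be computed from the scheme described in the top comment (64x36 logical pixels per frame, 3 bits per pixel). The frame rate is not stated anywhere in the thread, so it is left as a parameter here, and the function names are made up for this sketch.

```python
WIDTH, HEIGHT = 64, 36   # logical pixels per frame (from the top comment)
BITS_PER_PIXEL = 3       # one bit per RGB channel


def bytes_per_frame():
    """Raw payload of one frame in bytes."""
    return WIDTH * HEIGHT * BITS_PER_PIXEL // 8


def kilobytes_per_minute(frames_per_second):
    """Raw capacity in kB (1 kB = 1000 bytes) for one minute of video."""
    return bytes_per_frame() * frames_per_second * 60 / 1000
```

Each frame carries 6912 bits, i.e. 864 bytes. At an assumed 30 frames per second that is about 1555 kB per minute, and at 1 frame per second about 52 kB per minute, in both cases before any error-correction overhead.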

AdriaanvRossum | 2 years ago

Can't you upload lossless captions to youtube?

throwaway742 | 2 years ago

I believe this is the backend for AWS Glacier

theHNAcct | 2 years ago

There was a story on HN a while ago in which someone stored unlimited data in Google Sheets!

behnamoh | 2 years ago

Another "Harder Drive"!

nth_order | 2 years ago

Evil genius.

aneil | 2 years ago

I also “invented this idea” from scratch in a series that exists solely in my mind where I abuse a variety of free services for unintended purposes.

I could seemingly never explain the concept to other developers in a meaningful way or cared myself to code these out.

Anyway, my quick summary of this is: just think of a dialup modem. You connect to a phone line and you get like a 56k connection. That sucks today, sure, but actually it’s kind of mind blowing for how data transfer speeds worked at the time.

You know how else you can send data via a phone line without a modem? Just literally call someone and speak the data over the phone. You could even speak in binary or base64 to transfer data. It’s slow, but it still “works,” assuming the receiving party can accurately record the information and hear you.

That seems to be what this main topic is. Using a fast medium (video player) to slowly send data over the connection, like physically speaking the contents of other data. But there could be some problems with this approach.

Mainly, YouTube will always recompress your video. For this method, that means your colors or other literal video data could be off. This limits the range of values you can use in an already limited “speaking” medium.

If this wasn’t the case, we would like to use a modem connection. Just literally send the data and pretend it’s a video. However, where I left off on this idea, we appear to be hard blocked due to that YouTube compression.

We can write data to whatever we want and label it any other file type. (As a side note, Videos also are containers like zip that could be abused to just hold other files)

But YouTube is an unknown wildcard that changes our compression and thus our data which seems to invalidate all of this.

If we somehow convert an exe to an avi, the YouTube compression seems to just hard block this from working like we want. If we didn’t have that barrier, I think we could otherwise just use essentially corrupted videos to become other file types if we can download the raw file directly.

(steganography is a potential work around I haven’t explored yet)

Without these, we’re left to just speak the data over a phone which compresses our voice quality and in theory could make some sounds hard to tell apart. This leaves us in the battle of what language is best to speak to avoid compression limiting our communication. Is English best? Or is Japanese? What about German? Which language is least likely to cause confusion when speaking but also is fast and expressive?

This translates into: what’s the best compression method for text or otherwise pixels in a video where data doesn’t get lost due to compression? Are literal English characters best? What about base64? Or binary? What if we zip it first and then base64? What if we convert binary code into hex colors? Does that use fewer frames in a video? Will the video be able to clearly save all the hex values after YouTube compression?
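Part of this question can be answered without YouTube at all: how many symbols does each text encoding need for the same payload? The sketch below uses only the Python standard library; the function name and dictionary labels are made up for illustration. It measures size only and says nothing about which symbols would survive video compression.

```python
import base64
import binascii
import zlib


def encoded_sizes(payload):
    """Number of characters needed to represent payload under each encoding."""
    return {
        "raw bytes": len(payload),
        "hex text": len(binascii.hexlify(payload)),         # 2 chars per byte
        "base64 text": len(base64.b64encode(payload)),      # 4 chars per 3 bytes
        "binary text": len(payload) * 8,                    # one '0'/'1' per bit
        "zlib then base64": len(base64.b64encode(zlib.compress(payload))),
    }
```

Denser alphabets need fewer symbols, but every symbol then has to be distinguished from more alternatives after recompression, which is the trade-off the questions above are circling. Compressing first only helps when the payload is actually compressible.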

bgro | 2 years ago

so cool

mark_prutskyi | 2 years ago

This works on the same principle as the video backup system (VBS) which we used in the 1980's and the early 1990's on our Commodore Amigas: if I remember correctly, one three hour PAL/SECAM VHS tape had a capacity of 130 MB. The entire hardware fit into a DB 25 parallel port connector and was easily made by oneself with a soldering iron and a few cheap parts.

https://www.youtube.com/watch?v=VcBY6PMH0Kg

SGI IRIX also had something conceptually similar to this "YouTubeDrive" called HFS, the hierarchical filesystem, whose storage was backed by tape rather than disk, but to the OS it was just a regular filesystem like any other: applications like ls(1), cp(1), rm(1) or any other saw no difference, but the latency was high of course.

Annatar | 2 years ago

Imagine a free cloud storage, but you need to watch an ad every time you download a file.

productceo | 2 years ago

Fascinating

bspear | 2 years ago

Not immediately obvious from the Readme, but does this rely on YT always saving and providing a download of the original unaltered video file? If not, then it must be saving the data in a manner that is retrievable even after compression and re-encoding, which is very interesting.

xhrpost | 2 years ago