Microsoft's paper on OpenAI's GPT-4 had hidden information

georgehill | 317 points

The comments in the interpretability section read like science fiction to me. There are paragraphs on DV3 explaining other models and itself, and on the emergent properties that appear with bigger models. There's so much commented out related to functional explainability and counterfactual generations.

"we asked DV3 for an explanation. DV3 replied that it detected sarcasm in the review, which it interpreted as a sign of negative sentiment. This was a surprising and reasonable explanation, since sarcasm is a subtle and subjective form of expression that can often elude human comprehension as well. However, it also revealed that DV3 had a more sensitive threshold for sarcasm detection than the human annotator, or than we expected -- thereby leading to the misspecification.

To verify this explanation, we needed to rewrite the review to eliminate any sarcasm and see if DV3 would revise its prediction. We asked DV3 to rewrite the review to remove sarcasm based on its explanation. When we presented this new review to DV3 in a new prompt, it correctly classified it as positive sentiment, confirming that sarcasm was the cause of the specification error."

The published paper instead says "we did not test for the ability to understand sarcasm, irony, humor, or deception, which are also related to theory of mind".

The main conclusion I took away from this is "the remarkable emergence of what seems to be increasing functional explainability with increasing model scale". I can see the reasoning for why OpenAI decided not to publish any more details about the size of their model or the steps to reproduce it. I assumed we would need a much bigger model to see this level of "human" understanding from LLMs. I can respect Meta, Google, and OpenAI's decision, but I hope this accelerates the research into truly open source models. Interacting with these models shouldn't be locked behind corporate doors.

knaik94 | 3 months ago

Sigh. What an idiot (no offense). Why tell the world you got this from the comments? Now every damn researcher is going to strip them out, taking away the fun for those of us who knew to look.

Never. Ever. Reveal your sources.

withinboredom | 3 months ago

There is a tool my supervisor always used to make me use to avoid this when posting to ArXiv:

max_expectation | 3 months ago

According to the Latex source, one of the original titles for this paper was "First Contact With an AGI System".

zamnos | 3 months ago

I was prepared to be very amused if this was the result of Windows screenshot tool acropalypse.

xnx | 3 months ago

Interesting that it was originally called DV3 in the paper: it looks similar to the name of the existing, older "davinci-003" model, which powered GPT-3.

kgeist | 3 months ago

Interesting that they note the power consumption and climate change impact. I believe there's a long list of folks who said this wasn't the case weeks ago.

kodah | 3 months ago

Err wow, looking at the source, they left in all the (very weird) comments. Our group always makes sure to strip the comments.

buildbot | 3 months ago

Most of the paper probably was written by the model itself. They just removed the hallucinations.

jurimasa | 3 months ago

> we asked DV3 for an explanation. DV3 replied that it detected sarcasm

They should have asked it "Why do you say there is sarcasm?" A human can answer that but I don't think a bot can, can it?

galaxyLogic | 3 months ago

Always remember to clean up the .tex files before submitting to arxiv
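A minimal sketch of what that cleanup might look like. This is a naive illustration, not a robust tool: it doesn't special-case `\verb` or `verbatim` environments, where a literal `%` should survive. Dedicated tools (e.g. Google's arxiv-latex-cleaner) handle those cases properly.

```python
import re

# Strip LaTeX comments: everything from an unescaped % to end of line.
# The negative lookbehind keeps escaped percent signs like "95\%" intact.
COMMENT_RE = re.compile(r'(?<!\\)%.*')

def strip_comments(tex: str) -> str:
    cleaned = []
    for line in tex.splitlines():
        stripped = COMMENT_RE.sub('', line)
        # Drop lines that were pure comments; keep everything else.
        if stripped.strip() or not line.strip().startswith('%'):
            cleaned.append(stripped.rstrip())
    return '\n'.join(cleaned)

if __name__ == "__main__":
    sample = ("We report results.  % TODO: hide the scary parts\n"
              "% secret internal note\n"
              "Accuracy: 95\\% on the test set.")
    print(strip_comments(sample))
```

Running it over the sample removes both the trailing TODO and the whole-line note while leaving the escaped `\%` untouched.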

vaastav | 3 months ago

Can anyone post the remaining hidden information from that paper in an easy to read form? Is there really some secret sauce in there?

polskibus | 3 months ago

Tweet #2 in the thread is missing; I wonder what happened there.

subsubzero | 3 months ago

> we were worried about the unknown alignment procedures that OpenAI had taken to reduce the harmfulness of this powerful AI model

Is the user referring to AI alignment? The elitists who want to nerf all AI research?

Alifatisk | 3 months ago

I mean, does this mean the information is accurate? It could be a repurposed copy of some other document. Maybe they wanted it to be written by DV-3 and it didn't pan out, but they continued using the draft anyway.

I know from personal experience that I've had draft documents that were WILDLY wrong before I published to anyone but myself: whole sections I just went back and completely deleted. In fact my senior project paper (LaTeX) in college had a whole section with a big ASCII bull taking a shit on a paragraph, because it was some work I'd done that didn't pan out at all. I left it in the source because I found it funny. lol, I found it:

This was before I'd ever heard of a VCS. Subversion 1.0 was released six months after I graduated, it turns out. So commented-out code and multiple copies were all I had.

ChickenNugger | 3 months ago

Every so often, very rarely, I end up wanting to read some twitter content.

And I realise how agonisingly painful twitter threads are to consume.

It's just as bad as those YOU WON'T BELIEVE WHERE THOSE CELEBS ARE NOW slideshows, where you had to click Next one by one.

psychphysic | 3 months ago

This is generally clickbait with a large amount of vapid information. I'm surprised, to be honest, that Hacker News is giving it the attention it is. I would not encourage giving this any more attention.

As an early draft, a placeholder along the lines of 'this model uses a lot of compute {TODO: put in cost estimates here?}' does not at all equal 'the authors didn't even know how much it cost to train the model!' Additionally, of course the toxicity went down: there's a world of RLHF between that original draft and the released model, and they've shown that RLHF significantly lowers the toxicity relative to the untrained base model. If the author of the tweets had done their due diligence, they might have noticed that.

Rather obviously, around the time the model was originally being developed, text-only was pretty much the only way LLMs were done. Pivoting to multi-modal is just a natural part of following what works and what doesn't. This is really straightforward; I'm mind-boggled that this is getting attention over discourse that is actually meaningful to the tidal wave of change coming with these models.

One final sign that this is a bit of shoveltext: at the bottom, the author offers up vague concerns followed by mass-tagging accounts with high follower counts, including Elon Musk.

I'd encourage you not even to give the tweet the benefit of your view count and to just move on to more valuable discussions that are taking place. Why not take a look at a fun little thread like ? (Not affiliated other than the fact that I made the first comment on it; I just pulled it from the rising threads on HN's frontpage.)

tysam_and | 3 months ago

The spontaneous toxic content stuff is a little alarming, but in the future there will probably be gpt-Ns that have their core training data filtered, so all the insane reddit comments aren't part of their makeup.

goodgoblin | 3 months ago