Interning in Go

todsacerdoti | 138 points

Interestingly enough, by following up on some of the article's references, I discovered that Go is also adopting Java and .NET design decisions that arguably could have been there from early on.

- Deprecating finalizers and using cleaner queues (https://openjdk.org/jeps/421)

- Weak references (https://learn.microsoft.com/en-us/dotnet/api/system.weakrefe..., https://docs.oracle.com/javase/8/docs/api/java/lang/ref/Weak...)

Related tickets:

"runtime: add AddCleanup and deprecate SetFinalizer" - https://github.com/golang/go/issues/67535

"weak: new package providing weak pointers" - https://github.com/golang/go/issues/67552

One day Go will have all those features from "complex" languages that its designers deemed unnecessary.

pjmlp | 15 hours ago

Interning is neat. Most of my experience is really dated: primarily on the JVM, and mostly for class names, for reflection and class loaders. It's somewhat surprising to see this added to Go, given its desire for minimalism. But when you can use it, it can be a big win.

Look past the "loading the whole book in memory" setup; the author gets to the point soon enough.

The IP address example is OK. It's true, and it highlights some important points. But keep in mind that pointers are 64 bits. If you're not on IPv6 and you're shuffling a lot of addresses around, you're probably better off just keeping the uint64, then converting to a string and allocating the struct as needed. Interning doesn't appear to be much of a win in that narrow case. But if you do care about IPv6 and you're connecting to millions of upstreams, it's not unreasonable.

It's neat that it's available. It's good to be aware of interning, but it's generally not a huge win. For a few special cases, it can be really awesome.

** Edit: uint32 for IPv4. Bit counting is hard.

jfoutz | 21 hours ago

The unique package is my top feature for go1.23. I've been experimenting with it in rclone.

People often want to have millions of S3 objects in memory, so reducing the memory used would be very desirable.

I interned all the strings used (there are a lot of duplicates, like Content Type), and it reduced memory usage by about 30%, which is great.
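
A rough sketch of what interning a repeated field can look like with the `unique` package (Go 1.23+). The `Object` type and `newObject` helper are hypothetical and much simpler than rclone's real types; the idea is that every object with the same content type holds a small `unique.Handle[string]` instead of its own copy of the string.

```go
package main

import (
	"fmt"
	"unique"
)

// Object is a hypothetical, simplified stand-in for an S3 listing entry;
// it is not rclone's actual type.
type Object struct {
	Key         string
	Size        int64
	ContentType unique.Handle[string] // interned: many objects share one copy
}

// newObject (hypothetical helper) interns the content type so identical
// values share a single canonical string in memory.
func newObject(key string, size int64, contentType string) Object {
	return Object{Key: key, Size: size, ContentType: unique.Make(contentType)}
}

func main() {
	a := newObject("photos/a.jpg", 1024, "image/jpeg")
	b := newObject("photos/b.jpg", 2048, "image/jpeg")

	// Handles compare with ==, which compares pointers rather than bytes.
	fmt.Println(a.ContentType == b.ContentType)
	// Value returns the canonical string.
	fmt.Println(b.ContentType.Value())
}
```

Comparing two handles with `==` is a pointer comparison, which is cheaper than comparing the strings themselves.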

I wonder how much difference this little fix mentioned in the article for go1.23.2 will make? https://github.com/golang/go/issues/69370

The article also mentions strings.Clone, which has been around for a while. Using it is very easy, and it stops big strings from being pinned in memory. I had a problem with this in the S3 backend, where a string was pinning the entire XML response from the server; strings.Clone fixed it.
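
A sketch of the pinning problem and the `strings.Clone` fix (Go 1.18+), assuming a hand-rolled `extractTag` helper that pulls one value out of a large response. The helper is hypothetical, not the rclone code and not a real XML parser. A plain substring `doc[i:j]` shares the backing memory of `doc`, so keeping the substring alive keeps the whole response alive; `strings.Clone` copies just the bytes needed.

```go
package main

import (
	"fmt"
	"strings"
)

// extractTag (hypothetical helper) returns the text between <tag> and
// </tag> in doc. The result is cloned so it does not share doc's memory.
func extractTag(doc, tag string) (string, bool) {
	open, closing := "<"+tag+">", "</"+tag+">"
	start := strings.Index(doc, open)
	if start < 0 {
		return "", false
	}
	start += len(open)
	end := strings.Index(doc[start:], closing)
	if end < 0 {
		return "", false
	}
	// Without Clone, the returned substring would pin all of doc.
	return strings.Clone(doc[start : start+end]), true
}

func main() {
	// Stand-in for a large XML response from a server.
	resp := "<ListBucketResult><Name>my-bucket</Name>" +
		strings.Repeat("<Contents>...</Contents>", 100000) +
		"</ListBucketResult>"

	name, ok := extractTag(resp, "Name")
	fmt.Println(name, ok)
	// resp is no longer referenced here, so it can be garbage-collected
	// even while name is still live.
}
```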

nickcw | 14 hours ago

Beware of how interning's trade-offs affect GC behavior. For example, you lose the stack-allocation optimization.

survivedurcode | 21 hours ago

This is new for Go? I remember learning about Java string interning decades ago in the context of XML parsers. If I remember correctly, there were even some memory leaks associated with it and with thread locals?

morkalork | 21 hours ago

I missed the initial blog post about this; thanks for the solid explanation and the links. It probably won't make much of a difference for my use cases, but it's cool to know this is now in the stdlib.

peterldowns | a day ago

Cool idea, but it sounds detrimental to cache efficiency. Processing a string by reading it sequentially is typically quite cache-efficient because the processor prefetches, but with this method it seems like the string will not be contiguous in memory, which will lead to more cache misses.

cherryteastain | 16 hours ago

Will this work across different goroutines?
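
As far as I can tell from the package documentation, `unique.Make` is safe for concurrent use by multiple goroutines, and handles for equal values compare equal regardless of which goroutine created them. A small sketch that checks this; `internAll` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sync"
	"unique"
)

// internAll (hypothetical helper) calls unique.Make for the same value from
// several goroutines and collects the resulting handles.
func internAll(value string, workers int) []unique.Handle[string] {
	handles := make([]unique.Handle[string], workers)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Convert through []byte to get a separate copy of the input,
			// then intern it; each goroutine writes only its own slot.
			handles[i] = unique.Make(string([]byte(value)))
		}(i)
	}
	wg.Wait()
	return handles
}

func main() {
	handles := internAll("hello", 8)
	allEqual := true
	for _, h := range handles[1:] {
		if h != handles[0] {
			allEqual = false
		}
	}
	fmt.Println(allEqual)
}
```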

saylisteins | 7 hours ago

> I have a very large plaintext file and I’m loading it fully in memory. This results in a string variable book that occupies 282 MiB of memory.

At what point does something become (very) large?

pjot | 20 hours ago

Is this the same as grpc.SharedBufferPool? The gRPC implementation does a lot of memory allocation.

favflam | 16 hours ago

For reference, the term comes from Lisp’s INTERN. [1]

[1] http://clhs.lisp.se/Body/f_intern.htm

User23 | 21 hours ago

Is this essentially dictionary compression?

liotier | a day ago

Couldn’t you read the book in more cleverly by deduplicating the I/O stream?

smellybigbelly | 17 hours ago

Eh? Seems rather trivial to me. In Python you could implement it yourself with a set, a list, and a for loop.

It hardly seems worth having a library for such a specific use case.
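
For comparison, here is a naive Go equivalent of the set-and-loop approach: a hypothetical `Interner` backed by a plain map from each string to its canonical copy.

```go
package main

import "fmt"

// Interner is a hypothetical, naive interner: a map from each string to
// its canonical copy. Not safe for concurrent use; the map never shrinks.
type Interner struct {
	canon map[string]string
}

// NewInterner returns an empty Interner.
func NewInterner() *Interner {
	return &Interner{canon: make(map[string]string)}
}

// Intern returns the canonical copy of s, storing s if it is new.
func (in *Interner) Intern(s string) string {
	if c, ok := in.canon[s]; ok {
		return c
	}
	in.canon[s] = s
	return s
}

// Len reports how many distinct strings are held.
func (in *Interner) Len() int { return len(in.canon) }

func main() {
	in := NewInterner()
	for _, w := range []string{"go", "java", "go", "lisp", "java"} {
		in.Intern(w)
	}
	fmt.Println(in.Len())
}
```

This version needs a mutex before it can be shared between goroutines, and its map only ever grows. As I understand it, the `unique` package addresses both: `Make` can be called from multiple goroutines, and entries that no handle refers to any more can be reclaimed by the garbage collector, which is hard to do without weak references.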

guappa | 12 hours ago