C++ proposal: There are exactly 8 bits in a byte

Twirrim | 288 points

Previously, in JF's "Can we acknowledge that every real computer works this way?" series: "Signed Integers are Two’s Complement" <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p09...>

favorited | 9 months ago

During an internship in 1986 I wrote C code for a machine with 10-bit bytes, the BBN C/70. It was a horrible experience, and the existence of the machine in the first place was due to a cosmic accident of the negative kind.

pjdesno | 9 months ago

D made a great leap forward with the following:

1. bytes are 8 bits

2. shorts are 16 bits

3. ints are 32 bits

4. longs are 64 bits

5. arithmetic is 2's complement

6. IEEE floating point

and a big chunk of wasted time trying to abstract these away and getting it wrong anyway was saved. Millions of people cried out in relief!

Oh, and Unicode was the character set. Not EBCDIC, RADIX-50, etc.
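
For contrast, here's roughly how you pin those same guarantees down in C++ today (a sketch; items 2-4 hold on mainstream targets but are not promised by the standard, and C++'s long is 32 bits on some platforms, hence long long):

  #include <climits>
  #include <limits>

  static_assert(CHAR_BIT == 8);            // 1. bytes are 8 bits
  static_assert(sizeof(short) == 2);       // 2. shorts are 16 bits
  static_assert(sizeof(int) == 4);         // 3. ints are 32 bits
  static_assert(sizeof(long long) == 8);   // 4. 64-bit integers
  // 5. two's complement arithmetic is guaranteed since C++20 (P0907)
  static_assert(std::numeric_limits<double>::is_iec559);  // 6. IEEE floating point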

WalterBright | 9 months ago

Some people are still dealing with DSPs.

https://thephd.dev/conformance-should-mean-something-fputc-a...

Me? I just dabble with documenting an unimplemented "50% more bits per byte than the competition!" 12-bit fantasy console of my own invention - replete with inventions such as "UTF-12" - for shits and giggles.

MaulingMonkey | 9 months ago

Is C++ capable of deprecating or simplifying anything?

Honest question, I haven't followed closely. rand() is broken, I'm told unfixably so, and last I heard it still wasn't deprecated.

Is this proposal a test? "Can we even drop support for a solution to a problem literally nobody has?"

harry8 | 9 months ago

Hi! Thanks for the interest in my proposal. I have an updated draft based on feedback I've received so far: https://isocpp.org/files/papers/D3477R1.html

jfbastien | 9 months ago

I have mixed feelings about this. On the one hand, it's obviously correct--there is no meaningful use for CHAR_BIT to be anything other than 8.

On the other hand, it seems like a concession to the idea that you are entitled to some sort of just world where things make sense and can be reasoned out given your own personal, deeply oversimplified model of what's going on inside the computer. This approach can take you pretty far, but it's a garden path that goes nowhere--eventually you must admit that you know nothing, and that the best you can do is a formal argument that, conditional on the documentation being correct, you have constructed a correct program.

This is a huge intellectual leap, and in my personal experience the further you go without being forced to acknowledge it the harder it will be to make the jump.

That said, physical electronics projects seem to be increasingly popular among the novice set these days... hopefully "read the damn spec sheet" will become the new "read the documentation".

bcoates | 9 months ago

This is both uncontroversial and incredibly spicy. I love it.

TrueDuality | 9 months ago

I'm totally fine with enforcing that int8_t == char == 8 bits; however, I'm not sure about spreading the misconception that a byte is 8 bits. A byte with 8 bits is called an octet.

At the same time, a `byte` is already an "alias" for `char` since C++17 anyway[1].

[1] https://en.cppreference.com/w/cpp/types/byte
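
For what it's worth, std::byte isn't literally an alias: it's specified as a scoped enumeration over unsigned char, so it's a distinct type with only bitwise operations. A minimal sketch:

  #include <cstddef>

  // std::byte is defined as: enum class byte : unsigned char {};
  int demo() {
      std::byte b{0x2A};
      b <<= 1;                         // bitwise operators are provided
      return std::to_integer<int>(b);  // arithmetic requires explicit conversion
  }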

kreco | 9 months ago

Nothing to do with C++, but:

I kinda like the idea of a 6-bit-byte retro-microcomputer (with a 24-bit word, correspondingly). Because microcomputers typically deal with a small number of objects (and prefer arrays to pointers), it would save memory.

VGA was 6 bits per color, you can have a readable alphabet in a 6x4 bit matrix, you can stuff a basic LISP or Forth into a 6-bit alphabet, and the original System/360 only had 24-bit addresses.

What's not to love? 12 MiB of memory (2^24 addresses x 6 bits), with every 6-bit unit independently addressable, should be enough for anyone. And if it's not enough, you can naturally extend FAT-12 to FAT-24 for external storage. Or you can use 48-bit pointers, which are pretty much as useful as 64-bit pointers.

js8 | 9 months ago

There are DSP chips that have C compilers and do not have 8-bit bytes; the smallest addressable unit is 16 bits (or larger).

Less than a decade ago I worked with something like that: the TeakLite III DSP from CEVA.
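
For a sketch of what such a target looks like through C's eyes (typical values for a 16-bit word-addressable DSP, not specific to the TeakLite III):

  #include <climits>

  // On a word-addressable DSP, the C "byte" is the smallest addressable unit:
  //   CHAR_BIT     == 16   // char is 16 bits
  //   sizeof(char) == 1    // by definition, on every conforming implementation
  //   sizeof(int)  == 1    // an int can occupy a single 16-bit "byte"
  // uint8_t usually doesn't exist on such targets; uint_least8_t does,
  // and it is 16 bits wide.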

kazinator | 9 months ago

I just put static_assert(CHAR_BIT == 8); in one place and move on. Haven't had it fire since back when it was the #if equivalent.
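
Both forms of that guard, for the record (a sketch, assuming <climits> for the macro):

  #include <climits>

  static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");

  // The older preprocessor equivalent:
  #if CHAR_BIT != 8
  #error "this code assumes 8-bit bytes"
  #endif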

bobmcnamara | 9 months ago

Not sure about that, seems pretty controversial to me. Are we forgetting about the UNIVACs?

JamesStuff | 9 months ago

What will be the benefit?

- CHAR_BIT cannot go away; reams of code references it.

- You still need the constant 8. It's better if it has a name (see the sketch after this list).

- Neither the C nor the C++ standard will be simplified if CHAR_BIT is declared to be 8. Only a few passages will change; certain possible implementations will merely be rendered nonconforming.

- There are specialized platforms with C compilers, such as DSP chips, that are not byte addressable machines. They are in current use; they are not museum pieces.
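
For example, code like this is all over real codebases, and it reads better with the name than with a bare 8 (a generic sketch, not from any particular project):

  #include <climits>
  #include <cstddef>

  // Number of bits in the object representation of T.
  template <typename T>
  constexpr std::size_t bits_of = sizeof(T) * CHAR_BIT;

  static_assert(bits_of<int> == 32);  // holds on mainstream targets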

kazinator | 9 months ago

> We can find vestigial support, for example GCC dropped dsp16xx in 2004, and 1750a in 2002.

Honestly, I'm kind of surprised it was relevant as late as 2004. I thought the era of non-8-bit bytes was the 1970s or earlier.

bawolff | 9 months ago

JF Bastien is a legend for this, haha.

I would be amazed if there's any even remotely relevant code that deals meaningfully with CHAR_BIT != 8 these days.

(... and yes, it's about time.)

Quekid5 | 9 months ago

The current proposal says:

> A byte is 8 bits, which is at least large enough to contain the ordinary literal encoding of any element of the basic literal character set and the eight-bit code units of the Unicode UTF-8 encoding form and is composed of a contiguous sequence of bits, the number of which is bits in a byte.

But instead of the "and is composed" ending, it feels like you'd change the intro to say that "A byte is 8 contiguous bits, which is".

We can also remove the "at least", since that was there to imply a requirement on the number of bits being large enough for UTF-8.

Personally, I'd make "A byte is 8 contiguous bits." a standalone sentence, then explain as a follow-up that "A byte is large enough to contain...".

boulos | 9 months ago

Hmm, I wonder if any modern languages can work on computers that use trits instead of bits.

https://en.wikipedia.org/wiki/Ternary_computer

pabs3 | 9 months ago

While we're at it, perhaps we should also presume little-endian byte order. As much as I prefer big-endian, little-endian has won.

As consolation, big-endian will likely live on forever as the network byte order.
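
At least C++20/23 makes the host/network distinction easy to spell out (a sketch; std::byteswap requires C++23):

  #include <bit>
  #include <cstdint>

  // Convert a host-order 32-bit value to big-endian "network order".
  constexpr std::uint32_t to_network_order(std::uint32_t v) {
      if constexpr (std::endian::native == std::endian::little)
          return std::byteswap(v);  // C++23
      else
          return v;                 // big-endian host: already in network order
  }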

RJIb8RBYxzAMX9u | 9 months ago

As a person who designed and built a hobby CPU with a sixteen-bit byte, I’m not sure how I feel about this proposal.

DowsingSpoon | 9 months ago

But how many bytes are there in a word?

throwaway889900 | 9 months ago

So please excuse my ignorance, but is there a "logic"-related reason, other than hardware cost limitations à la "8 was cheaper than 10 for the same number of memory addresses", that bytes are 8 bits instead of 10? Genuinely curious; as a high-level dev of twenty years, I don't know why 8 was selected.

To my naive eye, it seems like moving to 10 bits per byte would be both logical and make learning the trade just a little bit easier?

donatj | 9 months ago

I wish I knew what a 9-bit byte means.

One fun fact I found the other day: ASCII is 7 bits, but when it was used with punch cards there was an 8th bit to make sure you didn't punch the wrong number of holes. https://rabbit.eng.miami.edu/info/ascii.html
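
A sketch of that check (assuming the even-parity convention; the page linked above describes the punch-card details):

  #include <bit>
  #include <cstdint>

  // Set the 8th bit so the total count of 1-bits is even, letting the
  // reader detect a single mispunched hole.
  constexpr std::uint8_t with_parity(std::uint8_t ascii7) {
      unsigned v = ascii7 & 0x7Fu;           // 7-bit ASCII payload
      bool odd = std::popcount(v) % 2 != 0;  // is the 1-bit count odd?
      return static_cast<std::uint8_t>(v | (odd ? 0x80u : 0x00u));
  }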

AlienRobot | 9 months ago

Ignoring this C++ proposal, especially because C and C++ seem like a complete nightmare when it comes to this stuff, I've almost gotten into the habit of treating a "byte" as an abstract concept. Many serial protocols will define a "byte", and it might be 7, 8, 9, 11, 12, or however many bits long.

bmitc | 9 months ago

  #define SCHAR_MIN -127
  #define SCHAR_MAX 128
Is this two typos or am I missing the joke?

lowbloodsugar | 9 months ago

And then we lose communication with Europa Clipper.

aj7 | 9 months ago

Why? Pls no. We've been told (in school!) that a byte is a byte. It's only sometimes 8 bits long (ok, most of the time these days). Do not destroy the last bits of fun. Is network order little-endian too?

hexo | 9 months ago

This is entertaining and probably a good idea but the justification is very abstract.

Specifically, has there ever been a C++ compiler on a system where bytes weren't 8 bits? If so, when was it last updated?

masfuerte | 9 months ago

Don't Unisys' ClearPath mainframes (still commercially available, IIRC) have 36-bit words and 9-bit bytes?

OTOH, I believe C and C++ are not recommended as languages on the platform.

rbanffy | 9 months ago

C++ 'programmers' demonstrating their continued brilliance at bullshitting people that they're being productive. (Had to check if the publishing date was April Fools'. It's not.) They should next start a committee to formalize which direction electrons flow. If they do it now, they'll be able to have it ready to bloat the next C++ standard no one reads or uses.

Uptrenda | 9 months ago

the fact that this isn't already done after all these years is one of the reasons why I no longer use C/C++. it takes years and years to get anything done, even the tiniest, most obvious drama-free changes. contrast with Go, which has had this since version 1, in 2012:

https://pkg.go.dev/builtin@go1#byte

38 | 9 months ago

Incredible things are happening in the C++ community.

adamnemecek | 9 months ago

I wish the types were all in bytes instead of bits too: u1 would be an unsigned 1-byte integer, and u8 would be 8 bytes.

That's probably not going to fly anymore though
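
Something like this, presumably (hypothetical aliases illustrating the idea; not standard names):

  #include <cstdint>

  using u1 = std::uint8_t;   // unsigned, 1 byte
  using u2 = std::uint16_t;  // unsigned, 2 bytes
  using u4 = std::uint32_t;  // unsigned, 4 bytes
  using u8 = std::uint64_t;  // unsigned, 8 bytes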

vitiral | 9 months ago

I like the diversity of hardware and strange machines. So this saddens me. But I'm in the minority I think.

IAmLiterallyAB | 9 months ago

Honestly, I thought this might be an Onion headline. But then I stopped to think about it.

whatsakandr | 9 months ago

There are FOUR bits.

Jean-Luc Picard

starik36 | 9 months ago

I'm appalled by the parochialism in these comments. Memory access sizes other than 8 bits being inconvenient doesn't make this a good idea.

zombot | 8 months ago

Amazing stuff guys. Bravo.

gafferongames | 9 months ago

This is an egotistical viewpoint, but if I want 8 bits in a byte I have plenty of choices anyway - Zig, Rust, D, you name it. Should the need for another byte width come up, for either past or future architectures, C and C++ are my only practical choice.

Sure, it is selfish to expect C and C++ to do the dirty work while more modern languages get away with skimping on it. On the other hand, I think especially C++ is doing itself a disservice trying to become a kind of half-baked Rust.

weinzierl | 9 months ago

Bold leadership

scosman | 9 months ago

But think of ternary computers!

cyberax | 9 months ago

How many bytes is a devour?

MrLeap | 9 months ago

In a char, not in a byte. Byte != char

Iwan-Zotow | 9 months ago

Just mandate that everything must be run on an Intel or ARM chip and be done with it. Stop pretending anything else is viable.

Suzuran | 9 months ago

formerly or formally?

time4tea | 9 months ago

Obviously

CephalopodMD | 9 months ago
