the biggest example of "it's impossible to tell if a given computer problem is easy or hard" is that correctly solving "I want to store, show, and share some text" involves a lot of "Well, first of all, we have to talk about Unicode..."

I spend time in a C# advice chat, and let me tell you, people get really mad when they want to know "why can't I just convert a string to an array of bytes" and you start talking about encodings

it's a more general version of the problem that, by the time people show up in a chat like this, they're pretty much at the end of their rope and you're like the third person who's tried to say "your question doesn't make any sense, you have to know about X"

This is a common thing in computer town, but if it's your first time running up against "this thing is much more complicated than you think", hoo boy, you're in for a ride

(everything's more complicated than you think. computers are sand we tricked into thinking. it's a giant pillar of abstraction and nobody knows how the whole thing works.)

"convert" is usually a red flag that the person in question doesn't really know what they're trying to do, they just know that they have THIS and they want THAT

you don't "convert a string to an array of bytes", you get the bytes that correspond to that string in some encoding. this sounds like nitpicking, but the thing about bytes is that they don't mean anything unless you know how to interpret them
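
to make that concrete, here's a minimal sketch in Python (not C#, but the idea carries over). The sample string and variable names are just made up for illustration. Same text, three encodings, three different piles of bytes, and reading the bytes back with the wrong encoding gets you garbage:

```python
text = "naïve café"

# The same string produces different bytes depending on the encoding you pick.
utf8_bytes = text.encode("utf-8")       # ï and é take two bytes each
utf16_bytes = text.encode("utf-16-le")  # every character here takes two bytes
latin1_bytes = text.encode("latin-1")   # every character here takes one byte

print(len(text), len(utf8_bytes), len(utf16_bytes), len(latin1_bytes))

# Bytes only mean something if you interpret them with the right encoding.
round_trip = utf8_bytes.decode("utf-8")    # the original text again
misread = utf8_bytes.decode("latin-1")     # mojibake: the accents turn into junk
print(round_trip)
print(misread)
```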

the only thing computers can do is shuffle bytes around in specific ways extremely quickly. everything else is stuff humans did to give meaning to the bytes you're shuffling around.

Nobody tells you this, by the way. You kinda have to figure it out.

notice how I didn't say "knows how to do". I, and a lot of other people who work with computers, tend to anthropomorphize the computer. Computers can't know things. They can't be smart or dumb. They can store bytes, and they can either be programmed to do something or not. Some things are easy to program, some things are hard to program, and some things are impossible.

Joel Spolsky is wrong about a lot of things, but he was correct on at least two:
- you have to know what Unicode is and what encodings are
- one of the most valuable computer science skills is being able to work at multiple layers of abstraction

also, abstractions are extremely good and critical to actually getting anything done, but it's immensely helpful to, at some point, know what's being abstracted from you

I hated taking an assembly language course, but I love having taken an assembly course

see: all the discussions about "is Java/C# pass by value or pass by reference", to which the answer is "by value, including the references", and then you have to say "a reference is an abstraction over a pointer" and then you have to have the pointer discussion that you were trying to avoid by working in Java or C#
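
a rough sketch of the "by value, including the references" thing, in Python rather than Java or C#. Python's argument passing behaves the same way for this purpose, as far as I know, and the function names here are hypothetical. Reassigning the parameter only changes the function's own copy of the reference; following the reference and changing the object is visible to the caller:

```python
def rebind(items):
    # Assigning to the parameter replaces the local copy of the reference.
    # The caller's variable still points at the original list.
    items = ["replaced"]


def mutate(items):
    # Following the reference and modifying the object it points to
    # is visible to the caller, because both names refer to the same list.
    items.append("added")


original = ["a"]

rebind(original)
after_rebind = list(original)   # snapshot: unchanged

mutate(original)
after_mutate = list(original)   # snapshot: the append shows up

print(after_rebind, after_mutate)
```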

@BestGirlGrace you want to _show_ text? Grace, that's absurd, _fonts_ are involved.

@VyrCossont Bitmap fonts, ideally burned into a ROM chip somewhere: easy peasy
anything more sophisticated than that: oh buddy, hope you wanted to learn an inappropriate amount of vector math

@BestGirlGrace @VyrCossont store text?! lol how about you write some disk drivers and filesystems first

@BestGirlGrace @VyrCossont I'm up for it, just pray that you'll not need overlapping strokes because those are inefficient and thus, not allowed :3

@VyrCossont @BestGirlGrace Alrighty, I am though. If you looked at me you'd see me Z-fighting.

mildly lewd, graphics-related request to reader 

@BestGirlGrace one time when I was still an intern with SPAWAR, a whole project ground to a halt because the database connection kept crashing whenever the program went to store text from the tweets we wanted to save for later analysis. it took me two days to figure out that emoji and non-ASCII chars were the problem (the SQL database’s implementation of UTF-8 was non-standard) and then fix it.

@Sapphicgiraffic @BestGirlGrace there’s like at least two XKCD comics about this let me see if I can find ‘em...

@Sapphicgiraffic I was specifically thinking about the "picture of a bird" XKCD, but there's gotta be one about how emoji break things

@BestGirlGrace yeah that was one of them and the other was the “N competing standards becomes N+1 competing standards” wrt character encodings.

Now if u really wanna fuck some people’s minds, just wait for somebody to ask about video codecs...

@Sapphicgiraffic @BestGirlGrace ahahaha was this MySQL and the infamous utf8 aka utf8mb3 encoding? it took my last company months to migrate everything to utf8mb4, but we had to because our players were pissed about not being able to yell at each other in emojis and/or Chinese
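
for anyone who hasn't been bitten yet: utf8mb3 stores at most three bytes per character, and real UTF-8 needs four for anything outside the Basic Multilingual Plane. Here's a small sketch in Python. The helper name is made up, and it's only a stand-in for the byte limit, not a model of what MySQL actually does with the bad rows. Everyday Chinese characters fit in three bytes; emoji and the rarer CJK extension characters don't:

```python
def fits_in_three_bytes(text):
    """Return True if every character encodes to at most 3 bytes of UTF-8."""
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


samples = {
    "accented Latin": "héllo",
    "common Chinese": "你好",
    "rare CJK (U+20000)": "\U00020000",
    "emoji": "😀",
}

for label, sample in samples.items():
    print(label, fits_in_three_bytes(sample))
```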

@VyrCossont @BestGirlGrace that sounds familiar. It was definitely MySQL. I don’t remember what version I was on and I think I switched to something called “utf8-extended” which is a funny thing to call the full implementation but 🤷🏻‍♀️

Fortunately for me at the time it was just an early stage research project so I could dump it and start fresh instead of having to migrate everything.

@Sapphicgiraffic @BestGirlGrace look, this is MySQL, you used to have to tell it if you didn't want it to silently truncate overlength strings and allow division by zero

but on the other hand, its default mode now includes something called STRICT_TRANS_TABLES, so who can say if it's bad or not

@VyrCossont @Sapphicgiraffic This is when they hire me to stand there with a whip and keep things in shape.

@VyrCossont @Sapphicgiraffic At my first programming job, I was trying to convince the DBA to use the "Unicode text" columns for names instead of just messages in case anyone had an accent in their name, and she said (and I hope it was a joke) that "those people are all terrorists anyways"

@BestGirlGrace @Sapphicgiraffic the kind of military population that is responsible for ending the world in 90 minutes or less, yeah? 😖

@VyrCossont @Sapphicgiraffic I don't think they keep nukes at Ellsworth any more, but once upon a time, yeeep

@Sapphicgiraffic @BestGirlGrace I'm thinking of a post I saw a while back about how JS and python3 interpret "length of a string" differently... apparently JS is just (the length of the utf16 in bytes, divided by 2) which of course is not really intuitive for codepoints which take more than two bytes in utf16
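
quick sketch of the difference, from the Python side. The helper name is hypothetical; it just does the "UTF-16 bytes divided by 2" calculation, which is what JS reports as `.length`:

```python
def utf16_code_units(text):
    """Count UTF-16 code units, i.e. the UTF-16 byte length divided by 2."""
    return len(text.encode("utf-16-le")) // 2


s = "a😀"

code_points = len(s)               # Python 3 counts code points
code_units = utf16_code_units(s)   # the emoji needs a surrogate pair

print(code_points, code_units)
```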

@transbian_tronbreon @Sapphicgiraffic Yeah, the two reasonable meanings for "length of a string" are "how many bytes are in this for serialization reasons", in which case you really want to talk about encodings, or [long Unicode discussion, because is a letter followed by a combining diacritic two characters or one? what about those ZWJ sequences?]
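
a small sketch of why that discussion gets long, using only Python's standard library (variable names are just for illustration). The "same" é can be one code point or two, and an emoji that renders as one glyph can be five:

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: displays as one letter.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))

# woman + ZWJ + woman + ZWJ + girl: usually drawn as one family emoji.
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"
print(len(family))
```

counting what a reader would call "characters" means counting grapheme clusters, which the standard library doesn't do for you.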

@BestGirlGrace @Sapphicgiraffic I would just say there are 3 useful "length of string":
1) num bytes in the encoded version (C's strlen() or similar)
2) num unicode scalars (rust's .chars() or similar)
3) pixels wide when rendered (SDL's TTF_Size*() or similar)

@BestGirlGrace all true, but the other side of the problem is advisors who think the querent needs *expertise* in some domain before they can possibly be helped. For every "look, you need to know some basics about Unicode and encodings to approach this" there are a dozen "you need to shave the Unicode Yak before you are worthy"

@BestGirlGrace "Returns a byte array corresponding to a bitmap representation of the rendered text in some font"

@BestGirlGrace "How do I get text back out?"

"Tell me when you can do that reliably and then you'll be famous and know things"

@BestGirlGrace i'm still having trouble with "why?"

it's never explained in any book or video i've seen.

okay, so i added these two numbers and put them on the stack. why do i want to do that? what does it do? how does knowing this result in a new cracktro with demoscene music?

@BestGirlGrace Abstractions are too leaky to actually help simplify things in the end :D

@violet At some point, the abstraction will leak and you'll have to learn about what it's hiding from you, yeah, but it's still useful to have, and ideally it limits the amount of stuff you have to learn to fix it.

@BestGirlGrace I still think working from the bottom up is probably a better didactic practice, but ultimately that takes a lot of time.
