This is a common thing in computer town, but if it's your first time running up against "this thing is much more complicated than you think", hoo boy, you're in for a ride
@BestGirlGrace I find that rites during the full moon tend to work best, but the chalice has to be poured over your hard drive at the EXACT moment of the moon's zenith or else nothing compiles right.
@BestGirlGrace hhh i feel this
@BestGirlGrace like, just this toot sans context
@canary yeah, both computers and otherwise
@BestGirlGrace one time when I was still an intern with SPAWAR, this whole project ground to a halt because the database connection kept crashing whenever the program went to store text from the tweets we wanted to save for later analysis. It took me two days to figure out that emojis and non-ASCII chars were the problem (the SQL database’s implementation of UTF-8 was non-standard) and then fix it.
@Sapphicgiraffic I was specifically thinking about the "picture of a bird" XKCD, but there's gotta be one about how emoji break things
@BestGirlGrace yeah that was one of them and the other was the “N competing standards becomes N+1 competing standards” wrt character encodings.
Now if u really wanna fuck some people’s minds, just wait for somebody to ask about video codecs...
@VyrCossont @BestGirlGrace that sounds familiar. It was definitely MySQL. I don’t remember what version I was on and I think I switched to something called “utf8-extended” which is a funny thing to call the full implementation but 🤷🏻‍♀️
Fortunately for me at the time it was just an early stage research project so I could dump it and start fresh instead of having to migrate everything.
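For anyone who hits the same wall: MySQL's legacy `utf8` charset stores at most three bytes per character, so anything outside the Basic Multilingual Plane (most emoji) doesn't fit, and `utf8mb4` is the charset that covers all of UTF-8. Here's a rough Python sketch for spotting that kind of text before it reaches the database. `needs_four_byte_utf8` is a made-up helper name for illustration, not anything from the project described above.

```python
def needs_four_byte_utf8(text: str) -> bool:
    """Return True if any character takes 4 bytes in UTF-8.

    Those are exactly the code points above U+FFFF (outside the BMP).
    """
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_four_byte_utf8("naïve café"))  # accented Latin letters are 2 bytes each
print(needs_four_byte_utf8("bird 🐦"))     # U+1F426 takes 4 bytes
```

A check like this only tells you which rows would be a problem; the actual fix is still on the database side.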
@Sapphicgiraffic @BestGirlGrace I'm thinking of a post I saw a while back about how JS and python3 interpret "length of a string" differently... apparently JS is just (the length of the utf16 in bytes, divided by 2) which of course is not really intuitive for codepoints which take more than two bytes in utf16
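To make that difference concrete, here's a small Python sketch that computes both numbers for the same string. `utf16_length` is a hypothetical helper that mimics what JS `.length` reports (UTF-16 code units), while Python 3's built-in `len` counts code points.

```python
def utf16_length(text: str) -> int:
    """Count UTF-16 code units, which is what JS `.length` reports."""
    # "utf-16-le" is used so no byte-order mark gets added to the output.
    return len(text.encode("utf-16-le")) // 2


s = "🐦"  # U+1F426, outside the BMP
print(len(s))           # code points, as Python 3 counts them
print(utf16_length(s))  # UTF-16 code units (a surrogate pair here)
```

For plain ASCII text the two counts agree, which is probably why the difference goes unnoticed until an emoji shows up.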
@transbian_tronbreon @Sapphicgiraffic Yeah, the two reasonable meanings for "length of a string" are "how many bytes are in this for serialization reasons", in which case you really want to talk about encodings, or [long Unicode discussion, because is a letter followed by a combining diacritic two characters or one? what about those ZWJ sequences?]
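A quick Python sketch of both meanings, using only the standard `unicodedata` module. The sample strings are picked for illustration. It doesn't try to count user-perceived characters (grapheme clusters), since that needs the Unicode text segmentation rules; it just shows that the byte count depends on the encoding and that the code point count depends on normalization.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "e\u0301")   # "é" as a single code point
decomposed = unicodedata.normalize("NFD", "\u00e9")  # "e" + combining acute accent

# Same visible letter, different code point and byte counts.
for label, s in [("NFC", composed), ("NFD", decomposed)]:
    print(label, len(s), len(s.encode("utf-8")), len(s.encode("utf-16-le")))

# A ZWJ sequence: woman + zero-width joiner + laptop, usually drawn as one glyph.
zwj = "\U0001F469\u200D\U0001F4BB"
print(len(zwj), len(zwj.encode("utf-8")))
```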
@BestGirlGrace all true, but the other side of the problem is advisors who think the querent needs *expertise* in some domain before they can possibly be helped. For every "look, you need to know some basics about Unicode and encodings to approach this" there are a dozen "you need to shave the Unicode Yak before you are worthy"