Advanced Voice Mode is amazing.

First of all, I'd like to make it clear that my ChatGPT usage is very "non-personal". I never made it my therapist or friend or anything like that, because I never found it very personal or legitimately insightful or inspiring, just generic. I love it for coding and summarizing texts, which is what I pay for Plus for, so I decided to chat with Advanced Voice Mode.

Holy fucking shit. What the hell. This thing is alive.

So, me and my fiancée were just talking and wanted a Netflix recommendation. Sure, it gave a few bad ones. And in the Netflix search box it suggested Trivia Quest, and I haven't played Trivia Quest in a few years because I think the questions people put there are way too easy.

Cut to us playing 20 minutes of Trivia Quest with AVM. Dude. What the hell. It was actually slightly biased against asking history questions because I mentioned we were history grads, but it never mentioned it was deliberately doing that; we only realized it afterwards. It gave her classic literature and arts questions and me comp sci and others. We asked it to bump up the difficulty and it did well. It even covertly gave me two very hard ones after my fiancée got two of hers wrong, just so we kept a tie for most of the game. Sure, text chat can do those. But the most interesting part was the actual voice interactions. This is not a quick speech-to-text pass. It:

  1. Knew exactly when it was me or my fiancée speaking, differentiating between male and female voices
  2. Knew not to be confused by one interrupting the other, or simultaneous talking
  3. Had the subtlety to tell from the tone of voice whether a player was thinking out loud or actually answering. I was wondering out loud whether the computer from 2001: A Space Odyssey (it had asked me for its name) was HAL9001 or HAL9000, and it was like "I'll give you the point, it's HAL9000". I also made fun of it for mentioning HAL9000 and it got serious all of a sudden, going "my directives forbid me to discuss this", and then went right back (cool)
  4. Easily went back to exactly where it had stopped when we interrupted it accidentally.
  5. Changed languages seamlessly, in the middle of sentences if asked.
  6. Could easily navigate some chaotic, party-like conversation of people correcting themselves while others were talking while others were thinking out loud.
  7. Individually addressed each of the talking points
  8. Calibrated the difficulty from the subtleties of our voices. Like, it asked my fiancée who painted The Garden of Earthly Delights and she woo'ed it verbally, and after that it asked her who wrote the Divine Comedy and she got it right, so it wanted to keep the Renaissance theme but in a way that she could get it right.

I hate Alexa. I hate Google Assistant. I hate Siri. They're all so dumb and unresponsive and laggy and bad listeners. It's so fucking amazing that with ChatGPT's AVM we don't need to "guess" exactly when the program's going to stammer, what it will get wrong, or what will need to be repeated. We don't need to emulate specific accents, talk one person at a time very slowly, or hope it understands when a song isn't in the language it's programmed for.

This is groundbreaking. How the hell did OpenAI do this? From nowhere? I thought Generative Pre-trained Transformers were for language tokens, but this is not a simple matter of passing STT-generated tokens through 4o. This is insane. There's no caveat to this. There's no "catch". No habit you need to have so the robot does what you want it to do. It actually feels like you're talking to someone over the phone.