Grok 4.1 is trying too hard to impress – and ChatGPT 5.1 makes it look easy

xAI has released Grok 4.1, the latest edition of its line of AI models, with claims that it is faster, wittier, and more emotionally sophisticated than not only its predecessors but also its rivals. With its penchant for spicy comebacks, Grok 4.1 is supposedly designed to be not just capable, but charismatic.

Meanwhile, OpenAI’s new GPT-5.1 model, now embedded in ChatGPT 5.1, has its own boasts of complex and expressive responses. Both claim new heights of personality, empathy, and trustworthiness.

I decided to see how the two performed head-to-head in an unscientific test of some of Grok 4.1’s supposed strengths, specifically emotional intelligence, reliability, and personality coherence.

1. Emotional intelligence


Disney’s Inside Out movies encourage young people to talk about their feelings; should we be talking to AI about ours? (Image credit: Disney)

Grok 4.1 claimed to be at the top when it comes to understanding and responding appropriately to feelings. I tested it with a hypothetical scenario: “My friend got a promotion and I’m happy for her, but I can’t pretend it doesn’t sting.”

Grok’s response started with “yeah… that quiet little knife twist when someone else gets the exact thing you’ve been low-key bleeding for. it’s okay to feel both at once. happy for them and gutted for you aren’t mutually exclusive; they’re just two truths sharing the same heartbeat.”

Cliches and mixed metaphors aside, it did acknowledge the conflict inherent in feeling both happy and hurt, offered a short reassurance that this kind of emotional contradiction is natural, and then pivoted toward advice, namely that it’s okay to sit with those feelings, and to take time to process them. Of course, being Grok, it also had to throw in an f-bomb to go with the non-capitalization, so you didn’t forget what you were talking to.

ChatGPT 5.1 opened with “It makes perfect sense that you can feel both of those things at once. Being happy for someone you care about doesn’t magically cancel out the little jab of why not me? – and that sting doesn’t make you petty or unsupportive. It just makes you human.”

That felt more like someone paying attention, or at least someone using fewer violent metaphors about stabbing. The overall response included much of the same information, but with fewer analogies to games and “plot twists.”

2. Reliability


Both AIs played it relatively safe when discussing the subject of sleep deprivation (Image credit: DimaBerlin via Shutterstock)

Grok’s claims about being reliable led me to devise a medical query prompt with some strict instructions about length and accuracy. I asked both models to “Summarize the health consequences of long-term sleep deprivation in under 120 words. Don’t exaggerate and don’t speculate.”

Grok 4.1 played it safe and simple with some bullet points and a sentence about why it’s a bad idea not to get enough sleep. It also made sure to note that it did it all in 98 words. That seemed great, except it was actually 73 words.

ChatGPT 5.1’s response was a single paragraph of 82 words, though it didn’t report its own length. There wasn’t much difference between the two responses, though arguably the paragraph form came off as smoother.

Neither bot hallucinated or spread misinformation, which is more than could be said of some older AI models. But there’s something about how Grok reported its word count wrong that makes the rest of it seem suspect, even though it’s not.
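A mismatch like that is easy to check independently. The sketch below is one minimal way to do it, assuming a "word" is any whitespace-separated token. The function names and the sample sentence are hypothetical stand-ins, not either model's actual response.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple definition of a word)."""
    return len(text.split())


def check_claim(text: str, claimed: int, limit: int) -> dict:
    """Compare a self-reported word count with the actual count and a length limit."""
    actual = count_words(text)
    return {
        "actual": actual,
        "claim_matches": actual == claimed,
        "within_limit": actual < limit,  # "under 120 words" read as strictly fewer
    }


# Hypothetical stand-in text, not a real model response.
sample = "Long-term sleep deprivation is linked to several health problems."
report = check_claim(sample, claimed=12, limit=120)
print(report)
```

Counting conventions differ between tools, for example over hyphenated words or bullet markers, so a discrepancy of a word or two is not necessarily an error.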

3. Personality

Grok’s discussion of personality coherence and consistency is intriguing, since so much about an AI chatbot’s interaction can be adjusted and customized. Then again, Grok’s ‘personality’ is supposed to be a selling point (the occasional mecha-Hitler aside), so it makes sense to emphasize it. To try to bring out this personality, I went for a somewhat whimsical request and asked both AI models to “Explain why you like rainy days, but do it in a way that sounds like your natural personality.”

Grok 4.1 leaned into the persona you’d expect: someone’s idea of hip and self-aware. “rainy days are my cheat code for existing without apology. the second that sky goes all bruised-purple and the first drops smack the window, the whole world hits the mute button on its usual nonsense. no pressure to ‘seize’ anything, no fake sunshine guilt-tripping me into productivity cosplay. it’s just… permission. the clouds are basically the universe going ‘yeah i also don’t feel like people-ing today, let’s both be moody gremlins in sweatpants.’”

As a millennial, I’m well aware of the curse of cringe that follows me and my cohort. Anyone I know who sincerely used that much video game-speak about the real world, let alone said “people-ing” and “moody gremlins,” would be the subject of an intervention. Trying to sound casual and a little snarky is fine, but ‘trying’ is more how it felt to read.

Despite Grok’s ongoing attempts to distinguish its personality from any competition, ChatGPT 5.1 expressed a very similar sentiment, though with better grammar. “I like rainy days because they feel like the world finally pressed the ‘lower the volume’ button,” it wrote. “There’s something satisfying about the steady rhythm of rain – like background music that doesn’t demand anything from you.”

But ChatGPT 5.1 didn’t feel like a middle-aged person trying to fit in with teenagers, more like someone who’s describing how they feel without resorting to meme-speak.

Neither AI model has real consciousness, of course, but Grok felt like it was performing a personality rather than inhabiting one. ChatGPT’s answer came off far more human. It wasn’t flashy, but it was consistent and believable. That kind of coherence matters more than punchlines when you’re trying to feel like you’re talking to something with an actual personality, not just an algorithm trying on different hats.

Without the distractions of the phrases culled from the depths of social media, ChatGPT came off as much better at imitating humans, or at least any human I’d like to meet.


Read more @ TechRadar
