
Beyond the Hype: What AI Actually Can (and Can't) Do

Are we in an AI bubble? Jodie Burchell and Michelle Frost bring their combined backgrounds in psychology, data science, and ML ethics to cut through the hype with history, rigor, and a healthy dose of pragmatism.


About the experts

Jodie Burchell, Data Scientist and Developer at JetBrains

From Clinical Psychology to Data Science

Michelle Frost: Hello and welcome to GOTO Unscripted. I'm Michelle Frost, developer advocate at JetBrains.

Jodie Burchell: And I'm Jodie Burchell. I also work as a developer advocate at JetBrains, specializing in data science.

Michelle Frost: Jodie, before we go too deep, I'd like to reflect on your history and career path. Can you give us a brief walkthrough?

Jodie Burchell: Like many data scientists, my background is academic — but it's a little unusual in that I actually trained as a clinical psychologist. That's what I did my PhD in, and I was licensed as a psychologist for a short time before I stopped practicing and eventually let my license lapse. Apart from clinical practice, the thing I fell in love with in psychology was statistics. I absolutely adored it. I love the scientific method, and I actually did a second degree in biology because I love science so much.

Jodie Burchell: After finishing my PhD, I wanted something with meatier data, so I did my post-doc in biostatistics and public health data mining, looking at how to improve the Australian hospital system. Then I decided to leave academia. It was 2015, data science was very hot, and I got my first job at a company in Australia, then moved to Germany and have been a data scientist ever since.

Jodie Burchell: I've been doing advocacy for around three years. My data science career gravitated toward natural language processing — I was working in that from around 2016 onwards, which gave me exposure to classic pre-LLM NLP techniques, and then the early LLMs when they came out, back when all we had was the Flair package from Zalando to access the early models. My work now sits at the intersection of ethics from my psychology background, rigorous measurement and the scientific method, and a lot of practical NLP experience. For the last three years I've been spending a lot of time myth-busting around LLMs, which is what we'll be discussing today.

Michelle Frost: I love that. My background is also in classical machine learning — specifically some work in fairness — along with many years of consulting and software development. We first met at NDC London 2023, at the speakers' dinner. Christina Arden introduced us, saying 'Michelle, do you know Jodie? You both talk about some of the same things.' You were in a conversation with Layla Porter, and within two minutes the three of us became very close friends. There's a funny candid photo of us looking at our phones, clearly excited about something. I loved that we were two women talking openly in a space that usually gets some pushback and nasty comments.

Jodie Burchell: Yes. Back then it was really difficult to talk about this stuff because people didn't want to hear it. Towards the end of last year, I felt people were becoming more receptive — I could get more than 50 people to a 'let's debunk this' talk. It's been a really interesting shift. I do have a bit of AI fatigue at this point, I think everyone does, but it's also been something of a privilege to be in this space at this time and spread a message that I hope is more measured and helps relieve people's fears.

Defining AI and Why It Matters

Michelle Frost: I sometimes struggle with how we talk about AI in 2025 — for reference, we're recording this on December 10th, 2025. We don't know what's happening tomorrow; something could occur next week that changes this conversation entirely. How we talk about AI at the end of 2025 is different from how we talked about it in 2023, 2017, or 2012. It's been an evolution of a conversation and of definitions. So what are we actually talking about when we say 'AI' — from the perspective of right now, but also five years ago?

Jodie Burchell: I'm not the first person to say this, but 'AI' is not a well-defined term and still doesn't have an agreed industry definition. What we're talking about now when we say AI is generative AI models: large neural networks, almost all based on variations of an architecture called the transformer, along with other innovations from the last five years or so. These models come from the fields of natural language processing and computer vision, which are not exclusively branches of machine learning or neural networks; they're older fields dating back to the 1940s, 50s, and 60s.

Jodie Burchell: A fun aside: Terry Winograd, who created SHRDLU, one of the earliest natural language understanding systems, around 1970, was actually Larry Page's PhD supervisor. It's such a small world.

Jodie Burchell: Having been in data science for around ten years, I worry about the conversations I'm seeing, because there's real confusion between what people mean when they say 'AI' — which we generally accept to mean generative AI models — and when they talk about 'AI that works,' which is generally the application of classical machine learning and neural networks to solve specific problems. Generative AI, in my mind, is still a relatively unproven branch. That's partly because it's difficult to assess, and partly because it's new: we're still working out the use cases.

Michelle Frost: I recently ran a workshop on fundamental machine learning, and we mostly ended up talking about data — how to think about it, how to collect it, how to preprocess it, what types work well for which models. Do you still think that classical foundational knowledge is essential in 2025? And for whom — people going into data science, research, software developers building on top of these models?

Jodie Burchell: It's a really substantial question. Let's take an example where you might think data quality doesn't matter. Say you want to create vector search over a set of documents. You divide them into chunks, convert those chunks into vectors using an embedding model, and use vector search to retrieve the closest match to a query. You might think the model handles everything — but it doesn't. You can't magic up meaning from a piece of text. You need a large enough chunk to have semantically meaningful content, which requires judgment and domain knowledge. And then there's the question of missing data, which is deeply complex. The reason data is missing may reflect bias in how it was collected. This is an extremely simple use case compared to the complex systems people are building with generative AI today. Even here, the model cannot produce something that isn't there.
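The retrieval loop Jodie describes can be sketched as follows: a toy example that uses a simple bag-of-words vector in place of a real embedding model. The `chunk` size and the `embed` helper here are illustrative assumptions, not any specific library's API; the point is that the chunking choice, not the model, decides whether a chunk carries enough meaning to match a query.

```python
import numpy as np

def chunk(text, size=40):
    # Split into fixed-size word chunks. Real systems need semantically
    # meaningful boundaries, which requires judgment and domain knowledge.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(texts, vocab):
    # Toy bag-of-words embedding standing in for a real embedding model:
    # one count per vocabulary word, normalized to unit length.
    vecs = np.array([[t.lower().split().count(w) for w in vocab] for t in texts], float)
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.where(norms == 0, 1, norms)

def search(query, chunks, vocab):
    # Retrieve the chunk whose vector is closest to the query vector.
    q = embed([query], vocab)[0]
    matrix = embed(chunks, vocab)
    return chunks[int(np.argmax(matrix @ q))]  # cosine similarity via dot product
```

If the chunks are too small to contain the concepts a query asks about, no embedding model can recover that meaning, which is the point being made above.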

Michelle Frost: Yes — context is different from background information, and I think that's something we're still struggling to separate cleanly.

Jodie Burchell: And conversations about ethics have always been a hard sell. Nowadays I think they're even harder because it feels overwhelming — you're working with a model trained on highly biased data and there's nothing you can do. But the truth is, you do have control over the use case, the application, and the data you feed in. Bias isn't just a fairness issue — it's a generalizability and performance issue, and a risk and liability issue.

Jodie Burchell: It's a simultaneously weird space right now. It feels more accessible than ever because the focus is on already-trained models and you just build tools, which is comfortable for most engineers. And at the same time it feels completely inaccessible because the models are so complex that even people who've been in this field for years are struggling to keep up. The truth is in the middle. The fundamentals of machine learning are still there. You can get there step by step. But please don't forget them — they're vital for measurement, performance, and generalization.

Michelle Frost: In the ethics space, it's also often misconstrued. In 2014 and 2015, hardly anyone was talking about fairness or bias mitigation in machine learning. Then around 2017, everyone suddenly woke up. There's still a perception that it's not a technical field — but if you look at fairness alone, there are over 30 mathematical definitions of what fairness even means. Cynthia Dwork's work has produced many of those measurements. It's extremely complex because we're taking a sociotechnical reality and trying to derive mathematics from it in a way that's meaningful and helpful without causing harm in the other direction.

Michelle Frost: When I did AI ethics consulting, I'd go into a room and ask 'who has values, who cares about ethics?' and see eyes glaze over. Then I'd say 'lawsuits' and everyone would re-engage. We're talking about the same thing — just different roads to the same destination. The measurement aspect is the most complicated part, and governance that actually works isn't an abstraction; it's about what you do in a product team that lets you keep moving forward.
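To give a flavor of how mathematical these fairness definitions get, here is one of the simplest: demographic parity difference, the gap in positive-prediction rates between two groups. This is a minimal sketch for illustration, not Dwork's specific formulation, and the function name and group encoding are our own assumptions.

```python
import numpy as np

def demographic_parity_difference(y_pred, group):
    # Demographic parity compares how often each group receives the
    # positive (favorable) prediction. A difference of zero means the
    # classifier satisfies this particular definition of fairness;
    # larger gaps mean one group is favored more often.
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)
```

Even this simplest definition can conflict with other formal definitions (such as equalized odds) on the same predictions, which is part of why the measurement problem Michelle describes is so hard.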

Michelle Frost: In 2024 there was suddenly an influx of 'AI ethicists,' and then in 2025 — possibly influenced by geopolitics and what's been happening with data and AI regulation in the US — you start to see who actually cared and who was there because it was trendy. It's been a very interesting curtain reveal.

Patterns of the AI Bubble: History Repeating Itself

Jodie Burchell: After August, all of a sudden everyone started saying we might be in an AI bubble. This came off the back of the announcement about GPT-5.

Michelle Frost: You actually gave a keynote in September on this very topic, in Copenhagen. Can you tell me about the research that led you to that talk, and to that moment in August?

Jodie Burchell: The announcement couldn't have been timed better — I'd already finished writing the talk and just had to add the slide of Sam Altman saying it might be a bubble. I got into this because I don't come from a traditional machine learning background and didn't learn this history academically. I came across a video from a channel called 'Asianometry' — a Taiwanese creator who seems to have a background in semiconductors or electrical engineering and makes very interesting videos about the history of hardware. This particular video was about what's now called the second AI summer and the invention of specialized computers for running Lisp programs, called Lisp machines.

Jodie Burchell: I knew bits and pieces of early NLP history — ELIZA, PARRY and its psychotherapy simulation, which all psychologists come across at some point. But this video sent me down a rabbit hole, and I realized the patterns from the first AI summer, which started in the 1950s, and the second AI summer in the 1980s, are identical to what we're seeing now. You have the development of a new AI technology, it looks exciting, even the researchers start saying this could be AGI because it performs so well at human-like tasks — and then we hit a plateau and people get disappointed. These aren't bad or stupid people. They're very smart people. It's just the excitement of a dazzling new technology. And I have been saying since around 2023: this is not AGI. You can't even properly measure AGI.

AGI: Hype, Hope, and the Measurement Problem

Michelle Frost: Can you give a loose definition of AGI for those not familiar with the term?

Jodie Burchell: AGI is not a well-defined phrase, which makes defining it difficult. The general idea is a system that can perform across a range of domains representative of what humans can do, at the level of skilled humans or better. There's also the associated concept of artificial superintelligence — a system that can outperform all humans, even highly skilled scientists, across a huge range of tasks.

Jodie Burchell: There's a very notable paper by François Chollet, a well-known AI researcher, released in 2019 called 'On the Measure of Intelligence'. He lays out the difficulty of defining what the scope of tasks even is, and raises the question of how to define generalizability. A lot of what we're seeing is what's called artificial narrow intelligence — systems that specialize in particular tasks. This has been working for years, but we don't know how to make the leap from narrow to general intelligence.

Jodie Burchell: What seems to have happened with GPT-3 — the first GPT model large enough to have what's called parametric knowledge, meaning it learned enough from its training data to produce coherent text, unlike GPT-2 which produced wonderfully chaotic sentences — is that people started to see themselves in it. You can understand how they made the leap to thinking just a few more parameters would get us to AGI. But we're not there. LLMs are generalists at language tasks. Language tasks are not intelligence. They are not reasoning. They are not sentience.

Michelle Frost: This is something people miss. Language is extraordinary — it's how we communicate — but it's not the only part of cognition. We talk about intelligence in this field as though we have a clear definition of it. But if you ask cognitive psychologists or neuroscientists for a definition, they can't give you one cleanly. There's also a bias embedded in how we define it — intelligence looks different in different contexts, and many people we revere as highly intelligent challenge accepted norms. We're trying to impose a single standard on something inherently varied.

Michelle Frost: How much do you think what we're building toward is shaped by science fiction of the past?

Jodie Burchell: Quite a lot. You see this in the utopia/dystopia divide. In earlier science fiction there was this idea of technology for the betterment of mankind — the original Star Trek with its post-scarcity society. Then you have the dystopian narrative, which I think comes more from later science fiction. 2001: A Space Odyssey and HAL 9000, from 1968, was written right in the middle of the first AI summer — something that only clicked for me after doing the research for the keynote.

Jodie Burchell: Timnit Gebru has given a very interesting talk on how many people in this space hold deeply committed beliefs, either transhumanist or doom-oriented. Some people in this space are known to have actual doomsday bunkers. That's a little frightening, because they're likely making decisions in a very different way than practitioners like you and me do. We tend to think about these as very concrete problems. A lot of the people leading this space may not actually have deep hands-on experience building with these tools at scale.

Developer Productivity: What the Research Shows

Michelle Frost: So what does all of this mean for day-to-day work right now? And which fields are most affected?

Jodie Burchell: I want to be clear — I may come across as more cynical than I am. I love LLMs. I fell in love with them the first time I used them. I think they are the most brilliant tools we've ever had for working with language. But that's an important caveat: we need to focus them on language problems. Not all of software development is a language problem, but parts of it are.

Jodie Burchell: I was uncomfortable for a while because all we had was anecdotal data, not substantive studies. One promising study was the Stanford 100K Developer Study, released earlier this year. What I liked about it is that they used real codebases — 80% of them private repos — and had teams report on when they started using AI tools in earnest, then measured the impact.

Jodie Burchell: The picture was complex, as you'd expect. Across the board, teams did see productivity gains from AI-assisted coding. But LLMs also introduced lower code quality — more bugs, more refactoring. Even accounting for the time spent fixing that, teams came out ahead. However, the benefits were concentrated in particular types of projects: greenfield projects benefited far more than legacy ones, smaller codebases saw more gains, less complex tasks were handled better. Popular languages like Java and Python benefited most because the training data is enormous. Older or more obscure languages performed worse — and until recently, that was even true with Rust. They are tools. Use them for language tasks and they perform well. Try to use them to solve your whole business problem, and they won't.

Michelle Frost: Our head of AI, Yvonne Tsakok, mentioned something similar on a panel — that clients are seeing a real increase in productivity specifically around code understanding. New team members needing to understand a large repository find these tools genuinely helpful.

Jodie Burchell: That's fascinating, because the original use case for the transformer models behind LLMs was machine translation. It's come full circle.

Michelle Frost: We're not done yet. It's an interesting space to be in. I love that we've always shared the same attitude: this is really cool, but what's the baseline reality, and how do we work practically rather than chasing moonshots?

Jodie Burchell: I look forward to seeing how things settle when the funding dynamics shift and we potentially enter the next AI winter. I think we'll see these tools used for what they're genuinely useful for, and in some use cases they really are helpful for developer productivity.

Michelle Frost: Absolutely. I think we're at time, but as always, this was fun. I feel like we could go for hours.

Jodie Burchell: We absolutely could. Very sensible of them to cut us off. We'll catch up over wine.

Michelle Frost: A glass of wine at a specific bar in Copenhagen, perhaps.

Jodie Burchell: Always.

Michelle Frost: Thank you, Jodie.

Jodie Burchell: Thank you.