
Natural Language Processing in Real-life: Introduction

How close are we to interacting with machines at a human level? Learn about the state of the art in deep learning with practical insights from Zenodia Charpy, Senior Data Scientist at NVIDIA. She talks to Eric Johnson, Principal Developer Advocate at AWS, about the basics of language models and their evolution toward multilingual transformers.

Intro

Eric Johnson: All right. Here we are for another "GOTO Unscripted" session. We are live from Aarhus. I've been working on how to say that right. So, Aarhus is how the Danes say it from what I understand. So, I'm very excited. I am Eric Johnson. I am a principal developer advocate for serverless at AWS. And my guest today is Zenodia Charpy.

Zenodia Charpy: Hello.

Eric Johnson: Did I say that right?

Zenodia Charpy: Yes.

Eric Johnson: Tell us who you are and what you do.

Zenodia Charpy: Well, my name is Zenodia Charpy. I'm a senior data scientist working at Nvidia and my focus is on training and building and deploying very large language models.

Eric Johnson: Okay. So, let's break that down. Very large what models?

Zenodia Charpy: Language models.

What is a language model?

Eric Johnson: Okay, all right. So, what is a language model?

Zenodia Charpy: Well, language models, in general, have a history...well, before the transformer age, you know...we were using bag-of-words and word embeddings a lot, and statistical methods. And since around 2000-something, ever since the transformer came out, it has had this mind-blowing effect. Well, now that's where we are.

Eric Johnson: Okay. So, all right. So, I'm gonna back us up a minute here. So, what is a language model? Why...if I'm a technical person, why would I use a language model? What does it do?

Zenodia Charpy: So, language models, specifically, the transformer model that, you know, is kind of my personal...

Eric Johnson: It's your thing, yes.

Zenodia Charpy: Yes, my thing. Well, they can do a lot of things. You can generate an image by just short descriptions. Yes, you can.

Eric Johnson: Okay.

Zenodia Charpy: Yes. And you can have a conversation with it as if it were a real person. You can ask it to do some sort of task, like, "Oh, I will give you some examples of English and Swedish translations." And then it will be able to translate. Isn't that amazing?

The transformer

Eric Johnson: Wow. Yeah. It's all right. So, what makes the transformer the new thing? Why is it so exciting? And what is it, I guess? So, I mean, I know all about this, obviously. But really obviously I know nothing. So, you're gonna be teaching me. This is very new to me. I know Nvidia does a lot around the language. You all put out a while back some language where it blocks out sound and things like that. It has the AI that does that. Tell me about the transformer. Why is it so amazing?

Zenodia Charpy: The basic kind is, you know, the seq-to-seq model. That is, you have an encoder part and then you have the decoder part, and the magic inside those modules is the self-attention. Intuitively, if you pay attention to a sentence and then you mask out a few words, and then you check whether this sentence and the consecutive sentence have continuity or not, true or false, and if you just train an encoder on these simple tasks, you can do all sorts of amazing things like, you know, classify sentiment.
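
The two training tasks described here (hiding a few words, and judging whether one sentence really follows another) can be illustrated with a minimal Python sketch. It is not taken from the conversation: the names `mask_tokens` and `sentence_pair_example`, the `[MASK]` placeholder, and the 15% masking rate are assumptions chosen for illustration, and a real encoder would work on token IDs rather than raw words.

```python
import random

MASK = "[MASK]"  # assumed placeholder string for a hidden word


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Hide some words. Returns (masked_tokens, labels), where labels holds
    the original word at each masked position and None everywhere else."""
    if rng is None:
        rng = random.Random(0)
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model would be asked to recover this word
        else:
            masked.append(tok)
            labels.append(None)  # nothing to predict at this position
    return masked, labels


def sentence_pair_example(sentences, index, rng=None):
    """Build one "does B follow A?" example from a list of at least three
    distinct sentences; index must not point at the last sentence."""
    if rng is None:
        rng = random.Random(0)
    first = sentences[index]
    if rng.random() < 0.5:
        return first, sentences[index + 1], True   # true continuation
    # Otherwise pick a sentence that is neither A nor its real successor.
    others = [s for i, s in enumerate(sentences) if i not in (index, index + 1)]
    return first, rng.choice(others), False
```

Each call produces one training example: the labels (the hidden words, or the true/false continuity flag) are what the encoder would be trained to predict.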

Eric Johnson: Okay.

Zenodia Charpy: Yes. For the generative part, which is the decoder part, which is what we call the GPT type of model, the generative type of model: it can generate sentences. For example, you can ask it a question like, you know, "What's the meaning of life?" And it will give you almost kind of an essay about the philosophy of what I think about it. It has the capability of reasoning, for example. If you ask it a math question...

Eric Johnson: Stop it. Sorry, I startled you. Reasoning? So, it's not...okay, I didn't mean to interrupt you but we're there now. So, if I'm understanding you right, it's not regurgitating something in its database that might...something might seem to relate to that. It's reasoning?

Zenodia Charpy: Yes, well, you know, for example, this is quite recent...I mean, that I have noticed that very, very large language models have the ability to...for example, you ask...you know, like, when we were in elementary school, we learn how to do addition and you would say, "I have five apples. You have one apple. How many apples do we have together?"

Eric Johnson: Right. I know that. It's six. It's six. Yeah, sorry. I get all happy.

Zenodia Charpy: So, you would reason...yes. So, you would suppose, right? It's, like, five plus one equal to six. Therefore, we have...together, we have six apples.

Eric Johnson: Okay.

Zenodia Charpy: Now, language models. If you just show it one example...that is called one-shot learning.

Eric Johnson: Okay.

Zenodia Charpy: Yeah. Then it has the ability...if you ask a similar question, it has the ability to do exactly that, replicate that, extrapolate that logic.

Eric Johnson: Okay.

Zenodia Charpy: Isn't that amazing?

Eric Johnson: Beyond amazing. So, all right. So let me say this back, make sure I understand this. So, I show it...if you have...I'm gonna switch it up for you. If you have three apples and I have two apples, we have five. That's the one shot, right? So, it says, "Okay, I figured out the logic. The first number, the second number. I need to add them or subtract them or whatever." But it's working as words. And then so the second...so then I can show it again if you have 19 airplanes and I have 2 airplanes, it can use the logic from the one shot, the first learn to...the first learn. I'm changing up words here. In the first learning to apply to that second one to do the math to understand how to process that.
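
The one-shot idea discussed above can be sketched as plain prompt construction: one worked example followed by a new question for the model to complete. This is a minimal illustration only; the function name `build_one_shot_prompt` and the `Q:`/`A:` layout are assumptions, not a format mentioned in the conversation.

```python
def build_one_shot_prompt(example_question, example_answer, new_question):
    """Return a prompt holding one worked example and one open question.
    The language model is expected to continue the text after the final "A:"."""
    return (
        f"Q: {example_question}\n"
        f"A: {example_answer}\n"
        f"Q: {new_question}\n"
        "A:"
    )


prompt = build_one_shot_prompt(
    "If you have three apples and I have two apples, how many do we have together?",
    "3 + 2 = 5, so together we have five apples.",
    "If you have 19 airplanes and I have 2 airplanes, how many do we have together?",
)
print(prompt)
```

The prompt is only text. Whether the model carries the addition over to the airplanes depends on the model and its training data, as the next answer points out.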

Recommended talk: Conversational AI Demystified • Priyanka Vergadia • GOTO 2019

Zenodia Charpy: Yes, but I mean, of course, that this type of model needs to...it needs to train on datasets, like the gigantic amount of datasets that have those logics in it, right?

Eric Johnson: Okay.

Zenodia Charpy: So, I mean, you cannot possibly imagine that you know, if you just throw it web chat forums and it’s not even grammatically correct and the emojis and, you know, this kind of thing, and then it would just do that, right, because it understood. I mean, it's like when you're teaching a child how to speak a language. You're teaching a child how to do math. And then if you equipped it with...you know, on a daily basis with a common language like, you know, emojis and web chat forums, not even grammatically correct, I mean, the result is the child will speak that kind of language.

Eric Johnson: Yes.

Zenodia Charpy: Yes. So, the language model is very similar.

Eric Johnson: Okay.

Zenodia Charpy: So, the way that you feed the data to...I mean, not the way. I think it's the data, the variety, and the versatility that you feed to this type of model, the GPT type of model, that will kind of result in the sort of things that I just mentioned, yeah.

Eric Johnson: So, I have a 9 and an 11-year-old. I have five kids actually.

Zenodia Charpy: Wow.

Eric Johnson: But my nine...yeah. That's the response I usually get. "Wow. What were you thinking?"

Zenodia Charpy: No, no, no, no, no. A lot of work.

How good are the AI/ML training models?

Eric Johnson: But it just tells you that I travel for work quite a bit. It tells you how absolutely amazing my wife is, right? But it's funny because my...totally off topic but I'm saying this anyway. My 11-year-old has, you know...be back soon is BBS and, you know, LOL. She's learned these emoji talk and it's like, "Please stop. Please stop." She's learned it from cartoons. So yeah, it has affected how she dialogues and things like that. Really interesting. So, I'm still a little blown away by the idea but there's the learning process. So, the more obviously we teach it, like, any AI ML, the more we teach it, the more responsive it is. How good is it?

Zenodia Charpy: Well, it depends on the size of the model. If you imagine that the entire internet is a knowledge base that you have access to, and if you could feed all of it to a gigantic model, a GPT-3 model, then the bigger the model size...you know, being able to absorb those pieces of knowledge coming from all over the world, then, of course, it can do all sorts of great things. However, this is just a dream for the moment...I mean, this comes to a slightly sensitive topic, because most of the data out there is in high-resource languages: English, Chinese. So, for those like us...I live in Sweden. Swedish is a low-resource language.

Eric Johnson: Okay.

Zenodia Charpy: So, we don't really get that much data comparable to English.

Eric Johnson: That would make sense, yeah.

Multilingual transformer

Zenodia Charpy: The model that we trained is obviously less competitive, so to speak, than the English one. But again, imagine if we could have, like, you know, a multilingual kind of GPT-3 model. Wouldn't that be interesting? So that's kind of a research initiative that we're taking on.

Eric Johnson: All right. So, here's what I was thinking to see if I'm on the right track then. So, English, is obviously, a very popular language. It's the only one I speak. You trained that, you have high-performing models in English, right? But you're saying Swedish is not as high performing just simply because it's not as common. Does the ability to have...if you teach the Swedish to translate to English and then apply, does that make sense? Is that kind of what you're talking about?

Zenodia Charpy: Well, it's more like the other way around that, you know...because this is also the other thing that we are looking into.

Eric Johnson: Okay.

Zenodia Charpy: So, English has a lot of abundance in all sorts of tasks and, you know...

Eric Johnson: Okay. Abundance?

Zenodia Charpy: Abundance of tasks.

Eric Johnson: Okay. I thought it was abandoned. I was like, "what?" Abundance, yeah, okay. It's my hearing, yeah. So yeah.

Zenodia Charpy: Rich in variety.

Eric Johnson: Okay.

Recommended talk: Breaking Language Barriers with AI • Boaz Ziniman • GOTO 2019

Zenodia Charpy: Let's take sentiment analysis as an example. Or intent slot classification. If you speak a sentence and you have an Alexa here, and you want Alexa to find out what the intent is when you say, "Get me a coffee," then it probably needs to do something with a smart coffee machine. Now, intent slot classification has a lot of publicly available, benchmark-worthy datasets, so you can check the performance of these very large language models.

Eric Johnson: Okay.

Zenodia Charpy: However, for the low resource language such as Swedish, we simply don't have those...

Eric Johnson: That makes sense, okay.

Zenodia Charpy: So how about we just translate?

Eric Johnson: Okay. So, I was on track. Okay, got it, okay.

Zenodia Charpy: It's more like the other way around. We try and translate from English to Swedish and then of course you need to do a lot of checking whether the translation quality is good enough.
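
A translate-then-check step like the one described here could be sketched as below. This is a hedged illustration, not the workflow used by the speaker's team: `translate_dataset`, the `translate` and `quality_score` callables, and the 0.8 threshold are all hypothetical, and in practice the translator and the quality check would be real models or human review.

```python
def translate_dataset(examples, translate, quality_score, min_score=0.8):
    """Translate labelled (text, label) pairs and keep only translations
    whose quality score reaches min_score.

    translate(text) -> translated text            (hypothetical callable)
    quality_score(source, translated) -> float    (hypothetical callable)
    Returns (kept, rejected): kept holds (translated_text, label) pairs,
    rejected holds the original pairs that need another look.
    """
    kept, rejected = [], []
    for text, label in examples:
        translated = translate(text)
        if quality_score(text, translated) >= min_score:
            kept.append((translated, label))   # the label carries over unchanged
        else:
            rejected.append((text, label))     # flag for manual checking
    return kept, rejected
```

The labels (for example an intent name) are reused as-is, which is what would make a benchmark in a high-resource language reusable for a low-resource one, provided the translations pass the check.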

Eric Johnson: Yes.

Zenodia Charpy: Yes. And then, you know, we hope in the future to use this kind of mechanism to create a workflow that enables all sorts of languages by transferring from the high-resource ones to the low-resource ones. Everyone can be enabled. So, we are not isolated from the English world.

Eric Johnson: And so better the translation, the more effective that would work, then?

Zenodia Charpy: Yes, but I mean, that's just one initiative, right? You also have the other initiative, where it's possible to train a multilingual type of model such as T5. T5 is a language model that has an encoder and a decoder, where the main difference between T5 and the seq-to-seq model is that the result will be generated autoregressively.
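
Autoregressive generation, the decoding style referred to above, means producing one token at a time and feeding everything generated so far back in as input for the next step. The loop below is a toy sketch of that idea only: `generate`, `toy_next_token`, the lookup table, and the `<eos>` stop marker are assumptions for illustration, and a real model would predict the next token with a neural network instead of a dictionary.

```python
def generate(next_token, prompt_tokens, max_new_tokens=10, stop="<eos>"):
    """Greedy autoregressive loop: repeatedly ask next_token for one more
    token, given all tokens so far, until the stop marker or the length cap."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        tok = next_token(tokens)
        if tok == stop:
            break
        tokens.append(tok)  # the new token becomes part of the next input
    return tokens


# Toy stand-in for a model: looks only at the last token.
TOY_TABLE = {"we": "have", "have": "six", "six": "apples", "apples": "<eos>"}


def toy_next_token(tokens):
    return TOY_TABLE.get(tokens[-1], "<eos>")


print(generate(toy_next_token, ["we"]))  # ['we', 'have', 'six', 'apples']
```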

Eric Johnson: Okay.

Zenodia Charpy: Anyhow, so with the T5 model, if you're training multilingually, and that's also a research initiative that we have, you could possibly just skip the translation: you train on the English one and then you adapt it to the Swedish one. So more on that later, because we might have some blog posts and stuff that will be out at the end of this year, so yeah.

Intent and slots in GPT-3

Eric Johnson: So a couple of questions. One...let me go back for a second. Did you say intella slots?

Zenodia Charpy: Yes, intent, and slot.

Eric Johnson: Okay, intent.

Zenodia Charpy: Intent.

Eric Johnson: And so okay. So, what does that mean?

Zenodia Charpy: So, for example...

Eric Johnson: I'm going to interrupt. I've worked a little bit...I think I know what it means. I've done some Alexa skills and understand where the slots are and things like that. Explain that to see if I'm even close.

Zenodia Charpy: So, for example, when you want to set the alarm clock you say, "Set alarm," that's intent.

Eric Johnson: Okay.

Zenodia Charpy: The slot will be more fine-grained. Like, you know...

Eric Johnson: The data.

Zenodia Charpy: Yes, like, 8:00 p.m. or...p.m. is weird. It's a.m. Possibly it's better.

Eric Johnson: Yes.

Zenodia Charpy: Or book a reservation, right? And then you have to give a time and a place. So those will be the slot.

Eric Johnson: Okay.

Zenodia Charpy: The intent will be...yeah. So intent is more like, what do you call it, if you look at the hierarchy, it would be higher up one level and the slot will be...

Eric Johnson: Okay.

Zenodia Charpy: Yes.

Eric Johnson: What to do, the data that applies to that action, okay.

Zenodia Charpy: Yes, you can think of it like that, yes.
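
The intent/slot split summarized above ("what to do" versus "the data that applies to that action") can be made concrete with a small data structure. The sketch below is a rule-based toy under stated assumptions: `ParsedUtterance`, `parse`, the intent names, and the time pattern are all hypothetical, and a real system would use a trained classifier in place of keyword matching.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedUtterance:
    intent: str                                # the action, e.g. "set_alarm"
    slots: dict = field(default_factory=dict)  # the details, e.g. {"time": ...}


# Matches times such as "8:00 a.m." or "7:30 p.m." (assumed format).
TIME_PATTERN = re.compile(r"\b(\d{1,2}:\d{2}\s*(?:a\.m\.|p\.m\.))")


def parse(utterance):
    """Keyword-based stand-in for an intent and slot classifier."""
    text = utterance.lower()
    if "alarm" in text:
        intent = "set_alarm"
    elif "reservation" in text:
        intent = "book_reservation"
    else:
        intent = "unknown"
    slots = {}
    match = TIME_PATTERN.search(text)
    if match:
        slots["time"] = match.group(1)
    return ParsedUtterance(intent, slots)


print(parse("Set alarm for 8:00 a.m."))
```

The intent sits one level up in the hierarchy, and the slots hang underneath it as key-value details.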

Eric Johnson: That's probably oversimplifying it but I'm totally a data scientist.

Zenodia Charpy: No problem.

What’s next?

Eric Johnson: Add that to my resume. Okay. So let me ask you this. How much do you love your job?

Zenodia Charpy: Well, a lot.

Eric Johnson: I was going to say it shows on your face.

Zenodia Charpy: Yes.

Eric Johnson: You obviously love what you do. You're obviously incredibly bright. I mean, a lot of this is going over my head but it's very fascinating. What's next?

Zenodia Charpy: Well...

Eric Johnson: And I'm not asking you to tell any of your secrets unless you want to. Probably you shouldn't. But I'm curious about what's next. What's the next big thing?

Zenodia Charpy: Before I answer that question, I just want to say that there are a lot of bright minds at the company that I'm working for, Nvidia.

Eric Johnson: Obviously.

Zenodia Charpy: Yes.

Eric Johnson: Yes.

Zenodia Charpy: I mean, and obviously all around the world like yourself.

Eric Johnson: Sure. 

Zenodia Charpy: I'm nowhere near an expert. I'm just a person who's on the journey of trying to...

Eric Johnson: Just listening to you talk, yeah, obviously you are an expert but okay, whatever.

Zenodia Charpy: The next step will be, like, you know, if you think about using this type of model...I don't know if you heard about Gato. That's the kind of model where you can take any kind of modality, multimodality we're talking about, any kind of modality like image, video, audio, not just text-based, right?

Eric Johnson: Right, right, okay.

Recommended talk: How to Leverage Reinforcement Learning • Phil Winder & Rebecca Nugent • GOTO 2021

Zenodia Charpy: And then you can have this capability such as the language model that's text-based as well. So that would be amazing, right? The multimodality. Then you have one centralized model that does basically everything.

Eric Johnson: Sure, sure, yeah.

Zenodia Charpy: Like we humans do, right?

Zenodia Charpy: That would be fantastic, right?

Eric Johnson: So, and maybe you're not there for a question yet, but as I'm hearing you...so what you're saying...you know, I'm just repeating what you're saying and trying to sound smart, but what we wanna be able to do is say, "Okay. You don't have to just talk to me in text." Being...I'm the model, or I don't just manipulate images. I know, like, at Amazon we have Rekognition that handles video recognition, and then we have Textract, and we have different things like that, and they're probably entirely separate systems that handle that. But what you're kind of saying is those...one model to kind of rule them all, right.

Zenodia Charpy: "Lord of the Rings."

Eric Johnson: That was very "Lord of the Rings", wasn't it?

Zenodia Charpy: Yes.

Eric Johnson: Does that mean we kind of bring those capabilities in? So, okay. So kinda front end, so here's your one model. You front end it with the text, front end it with an image manipulator, front end it...or does it handle them all? And maybe there's not a big difference there but I'm curious.

Zenodia Charpy: There's a big difference. And so, if you have one model to rule them all, then...

Eric Johnson: If you all use that, I want just a little tagline, "Eric Johnson."

Zenodia Charpy: It depends on the organization, right, because some organizations want to have this ability to break things down, so that you have this modularized approach where, if one part is not working, you just swap it out for a better-performing module.

Eric Johnson: Okay.

Zenodia Charpy: For example, when we are speaking and that's an audio, right...automatic speech recognition. You understand what you are speaking from the sound and you translate it in your brain to the text and then, you know, if you have a translator here, then it would translate it to whatever language that, you know, that they are supposed to. So that's a modularized approach. You kind of break it down into a pipeline, right?

Eric Johnson: Okay.

Zenodia Charpy: And that's not bad either because that means that you have the ability to optimize a specific section of that.
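
The modular approach described above can be sketched as a list of interchangeable stages, each taking the previous stage's output. This is a toy illustration under stated assumptions: `run_pipeline`, `replace_stage`, and the two stand-in stages are hypothetical names, and the "audio" here is just a string, since a real speech recognizer and translator are out of scope.

```python
from typing import Callable, List

Stage = Callable[[str], str]  # assumed interface: every stage maps text to text


def run_pipeline(stages: List[Stage], data: str) -> str:
    """Pass the data through each stage in order."""
    for stage in stages:
        data = stage(data)
    return data


def replace_stage(stages: List[Stage], index: int, new_stage: Stage) -> List[Stage]:
    """Return a copy of the pipeline with one module swapped out."""
    updated = list(stages)
    updated[index] = new_stage
    return updated


# Toy stand-ins for speech recognition and translation.
def toy_speech_to_text(audio: str) -> str:
    return audio.strip("<>")  # pretend "<hello>" is audio containing "hello"


def toy_translate(text: str) -> str:
    return {"hello": "hej"}.get(text, text)


pipeline = [toy_speech_to_text, toy_translate]
print(run_pipeline(pipeline, "<hello>"))  # hej
```

Because every stage shares the same interface, a weaker module can be replaced or tuned on its own without touching the rest of the pipeline.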

Eric Johnson: Okay.

Zenodia Charpy: Yes, in the pipeline. Now, if you have a big model that kind of takes anything and does everything, that's amazing too, but at the same time, it would make people feel like, you know, it's a "Terminator" scenario.

Eric Johnson: Thank you. I was gonna ask the question. How close are we?

Zenodia Charpy: Yes.

Eric Johnson: Yes. This ability to just interact on a human level. How far are we in your opinion from that? You're like, "I don't want to answer that," but you don't have to answer. 

Zenodia Charpy: I think you should ask Elon Musk. He has Neuralink, that company, isn't it? So, you should ask him. Elon Musk.

Eric Johnson: Oh, yes.

Zenodia Charpy: The guy who's like the brightest mind.

Eric Johnson: But he's not here.

Zenodia Charpy: No. No. No, but I think we are close.

Eric Johnson: I wanted to ask you one more thing and then probably break it out. But you have a talk today.

Zenodia Charpy: Yes, I do.

Eric Johnson: And I'm assuming it's a lot about what you're talking about here. What is your talk on?

Zenodia Charpy: It's...I'm gonna talk about two GPT-3 models. One gigantic and one small.

Eric Johnson: I wanna thank you very much for joining us. This was fantastic. I've learned a lot. I am adding data scientist to my resume. I'm obviously not, you know, where you are but I'm close, you know.

Zenodia Charpy: Everyone is a data scientist.

Eric Johnson: There you...that's right. I'm gonna go with that. I appreciate your kindness. So, anything you want to say before we head out? Anything...any shameless plugs you've got coming up?

Zenodia Charpy: I would like everyone to kickstart the journey, whatever you are doing, because this is a process, right? So, it's only when you start the journey and start learning that you can, you know...experience this fantastic AI world, so...

Eric Johnson: Well said. Yeah, well said. I appreciate that.

Zenodia Charpy: Yeah.

Eric Johnson: Well, that's it for another session of "GOTO Unscripted" here live from GOTO Aarhus. I've been working all week to say that right. So, we appreciate it, and thank you. Absolutely fascinating.