Software Architecture

Handling AI-Generated Code: Challenges and Best Practices


The Evolution of Developer Workflows

Roman Zhukov: Hello folks. Welcome to GOTO Unscripted, where we discuss a whole range of technical topics in friendly conversations. Today we're going to talk about handling AI-generated code, exploring challenges, best practices, and its impact on developer workflows, which is quite a hot topic right now. I am Roman Zhukov, a cybersecurity expert with more than 20 years of experience securing large-scale platforms and products. I'm now with Red Hat, where I drive open source security strategy and cross-industry collaborations. Today I'm joined by Damian Brady. Damian, would you like to introduce yourself?

Damian Brady: Yes, absolutely. I'm Damian Brady. I am currently on the other side of the world from Roman, over in Australia. I work for GitHub currently, and previously worked for Microsoft as a developer advocate. I have about 20 to 25 years of development experience, depending on how you count. Recently, especially with GitHub and Microsoft, there has been a definite push towards AI-assisted software development, which is fascinating. It's a bit of a change to the way that I've always traditionally done that kind of work.

Damian Brady: I think what we're going to talk about today is how this affects the developer workflow itself. This is definitely something that we talk about a lot. I talk about it with other developers as well. The obvious change for me, especially with generative tooling, is that we've moved from an hour of coding being an hour of coding to potentially some of that work being offloaded elsewhere while you do some parallel work. It's almost like a scaling factor. Is that something you've seen as well?

Roman Zhukov: Yes, absolutely. We also see a huge transformation in developer workflows. But I would say we don't see AI as a replacement for all of the developers, junior developers, and so on. It's more like an amplifier to get things done faster. I've seen research, I believe by METR, concluding that developers felt around 20% faster when they were using AI tools like Copilot, Cursor, or Claude, but were actually 19% slower at delivering a complete product or release on complex tasks. I think this is a good representation of the reality we live in right now with code development. AI obviously accelerates the coding itself, but to get to an actual release or a high-quality product, you still need a human in the loop and you still need to cook it right, as we call it.

Damian Brady: Yes, I absolutely agree with that. I think part of the hype issue we're seeing in the industry with AI tooling is the lack of clarity around what the tooling is good for and what it's not good for. It doesn't matter how good your AI dev tool is; it's never going to replace a human when it comes to working out exactly what the requirements need to be and speaking to the people. Copilot or Cursor or any of these tools are never going to be in the same room with the product managers and the owners and the customers, working these things out. These tools might be great at certain tasks, even unsupervised, like making sure you have tests that cover all the functions in an area, because that's relatively straightforward and easily defined. But something more complex, like implementing a new payment methodology end to end, probably needs a lot more human eyes. The tools can be great, especially if you know how to use them really well, but they really do need a human driving that kind of work. The tools are powerful, but you can't just throw the problem at the tool and expect it to solve it completely.

Roman Zhukov: That's true. We also see some inconsistency in how different companies or communities apply or allow the use of AI. Red Hat is a huge contributor to open source and upstream communities, and we work with them quite extensively. Our position at Red Hat is that we are definitely not banning AI; rather, we want to innovate responsibly. We see a lot of different strands and debates in the developer communities we work with about how to tackle AI-assisted code contributions and the problems they can potentially introduce. Almost everything we do at Red Hat is actually in open source. Over the past year, we have been investing a lot, both internally in our own engineering practices and in exploring how other communities work with generative AI and these code contributions. Working in the open is our culture. We believe it's really important to share what we're doing and what we're learning. We usually say that upstream is downstream for us; we really do the majority of the work in open source.

Damian Brady: We have a slightly biased environment because we're trying to make these AI tools as well. Copilot was one of the very first commercial products that used LLMs, especially for developers. But we have similar efforts with open sourcing: the VS Code team at Microsoft recently open sourced the Copilot extension and built it into the product itself. We tend to let people use the tools they want to use. This is a personal thing for me as well: if the tool you're using is a great tool for you, then you should keep using it. I don't think you can say very clearly that one AI dev tool is necessarily better than another; it depends on the task, your familiarity, and even the code base. Having that kind of openness about what you can use and what you're allowed to use is a great thing. Obviously, as we're writing these tools internally, we're using them. We've stopped saying dogfooding now; "we drink our own champagne" is the way we like to put it. We use it internally a lot, to the point that Copilot is now the number one contributor to the GitHub code base, and Copilot code review is number three. It's not just developing the code; it's making suggestions in code reviews and things like that.

Navigating Trust and Security in the Age of AI-Generated Code

Damian Brady: Speaking to customers, though, it's not always the case. There are a lot of questions, especially if you're using these tools, around where that code comes from and who owns it. Is it safe to just let people use these tools to generate code and then push that out to production? There's a lot of variation in how comfortable companies are with that kind of thing. It sounds like Red Hat is relatively comfortable using these tools as tools. I don't know whether you've also seen in the industry that there is a bit of reluctance to hand over the keys, if you want to put it that way.

Roman Zhukov: Yes, absolutely. I like how we switched gears a little bit to code quality and trust, or mistrust, and the problems we can actually have when accepting AI code contributions. This is definitely the problem, and we see different companies tackling it differently. What we think is that the legal and ethical standards haven't actually changed, even though the tools have. We wrote a blog post called something like "Navigating Legal Issues While Accepting AI-Assisted Code Contributions and Development." AI systems are not authors, but developers are, right? If you commit code, you are responsible for its quality, security, and licensing. AI systems, at least right now, are not considered authors under copyright law or under the general legal understanding. It would be misleading for developers to present substantially AI-generated output as their own work. The assistance should be acknowledged first, and I think it should be documented. That's why, for example, one solution to this problem could be requiring developers to add an "assisted-by" or "helped-by" disclaimer when they commit their code, and to be careful with their licenses. That's why we have our own list of recommended practices, also outlined publicly on our blog, for our developers and for everybody else to follow when they use AI-assisted tools.
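For readers wondering what such a disclaimer could look like in practice, one possible shape is a Git commit trailer. The commit text and trailer names below are purely illustrative, not a quote from Red Hat's guidance; a project should follow whatever convention its community has agreed on.

```
Add retry logic to the artifact download client

The backoff strategy was drafted with an AI coding assistant and then
reviewed, tested, and adjusted by hand.

Assisted-by: <name of the AI tool used>
Signed-off-by: Jane Developer <jane.developer@example.com>
```

Putting the note in a trailer keeps it machine-readable, so a project can later audit how much of its history was AI-assisted.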

Damian Brady: Ultimately, using other tools to assist in development is not really a new problem. The AI part is the new problem, along with the fact that, to a large extent, these large language models are a bit of a black box. You ask a question, there's a prompt, then an inconceivable number of operations inside a terabyte-sized model, and what you get at the other end is some code. That's the new bit and the scary bit. I think there's a sliding scale of what is appropriate to do with AI-assisted tooling. If you are actively working with the tool, asking for suggestions, and it gives you a code snippet or even a bit more (it could change half a dozen files), then it's your responsibility as a developer to look through those files, see the changes, and test them yourself. If that is a synchronous process, I think it's just like all of our previous generations of code generation tools, like the templating engines: you have a database schema, so you generate the data objects in whatever language you're working in. It's just a little bit more non-deterministic.
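As a point of comparison, the older style of code generation Damian mentions is fully deterministic: the same schema always produces exactly the same code, which is precisely what an LLM does not guarantee. A minimal, hypothetical Python sketch (the schema and names are invented for illustration):

```python
# Hypothetical schema-driven codegen: the same input always yields the same output.
SCHEMA = {"table": "users", "columns": {"id": "int", "email": "str", "created_at": "str"}}

def generate_dataclass(schema: dict) -> str:
    """Emit Python source for a dataclass matching the table schema."""
    lines = [
        "from dataclasses import dataclass",
        "",
        "@dataclass",
        f"class {schema['table'].title()}:",
    ]
    lines += [f"    {name}: {py_type}" for name, py_type in schema["columns"].items()]
    return "\n".join(lines)

print(generate_dataclass(SCHEMA))
```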

Where it gets a little more complicated is with cloud-based, offline, or background tools, where you give it a prompt and get a full solution later on: a full end-to-end implementation of a feature or something like that. The way we've positioned that at GitHub, and I think this is pretty standard practice now, is that you don't hand over the keys and say, here, go put code straight into production. Everything that gets done by the Copilot coding agent is submitted as a pull request. It's tracked in GitHub as "this was committed by the Copilot coding agent on behalf of the developer who requested the work, and the code review was done by this developer," and so on. There is very clear traceability of what wrote the code. But that's where it gets a little more gray in my mind. You've given it the prompt, you've ideally given it a bunch of custom instructions and a specific agent, you've chosen the model and the guidance and so on. However, a prompt has ultimately gone in one end and code has come out the other end. It's still your responsibility as a developer or a dev team to verify that it is the correct code to be using. I think I agree with you on 99% of that. It's unlikely anytime soon that the AI itself will be responsible for the code that gets developed. We're just dealing with a more complicated version of looking up Stack Overflow and copying and pasting code. It's not ultimately a new problem, but it's a new level of the same problem.

Roman Zhukov: Yes, absolutely. From my remit, from a security perspective, I also see a lot of security concerns related to AI code contributions. But the majority of them are not new in nature; they're just happening at a new scale. One recent example I can bring up is the malicious "cloud-code" incident from a month or a few months ago, involving a fake package with that name. When developers asked the AI to include certain packages or extensions, it suggested this completely hallucinated "cloud-code" extension, which got ingested into development repositories and could potentially lead to unexpected consequences and security problems. It's one of the issues we see because of how developers are consuming AI tools and AI models right now. They just pull them, sometimes blindly, straight from hubs like Hugging Face with one or two commands, but they don't necessarily check where that model or those artifacts came from and what they could include. This is not a brand-new problem, because supply chain issues have been with us for a few decades now, but the scale of the problem is significantly bigger right now.
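A cheap first guard against this kind of hallucinated or typosquatted dependency is to confirm that a suggested package even exists in the registry before installing it, and then to review it like any other dependency. The sketch below is a minimal, hypothetical Python example against PyPI's public JSON API; existence alone proves nothing about trustworthiness, so it is only a first filter ahead of human review and hash pinning.

```python
import json
import urllib.error
import urllib.request

def exists_on_pypi(package_name: str) -> bool:
    """Return True if the project name resolves on PyPI, False on a 404."""
    url = f"https://pypi.org/pypi/{package_name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            json.load(response)  # metadata parsed successfully
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no such project: possibly a hallucinated name
            return False
        raise

if __name__ == "__main__":
    # The second name is a deliberately implausible placeholder.
    for name in ("requests", "totally-not-a-real-package-123456"):
        status = "exists" if exists_on_pypi(name) else "NOT FOUND on PyPI"
        print(f"{name}: {status}")
```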

Damian Brady: Yes. On that point then, how do you make sure that these tools are giving you code that is trustworthy and good quality? Obviously, as an experienced developer who has been writing code for a number of years, if you get a result from an LLM, as long as you're paying attention, you can look at it and say, "Great, this looks correct." But especially for junior developers or somebody who maybe doesn't care quite as much and just wants to move the ticket from one column to the next column, how do you verify in your organization that the code quality is there and the security vulnerabilities aren't?

Roman Zhukov: That's a great question. I would answer it in two ways. The first important thing is education: making sure we actually educate developers, especially junior developers but also everybody else, on how to write code securely using AI assistants. That's why we feel it's extremely important to collaborate with other industry leaders like Microsoft and with communities like the Open Source Security Foundation, OpenSSF, to first collect all the good security practices for AI assistants and then compile them into short, concise guidance and free courses that enable developers to use AI securely. You can actually Google "OpenSSF AI secure development practices"; they're completely free, and no registration is required, so you can use them right away. I'm one of the contributors to these practices, among other folks. While we were writing them, we made sure to provide really developer-centric recommendations.

For example, these guidelines include specific prompt suggestions. Don't just write a prompt like "Create an identity-handling function" or "Create a password form." Instead, ask the assistant explicitly to use secure-by-design functions and libraries that can help you do that, to avoid unsafe packages without explicit review, and to use well-known, well-recognized crypto functions, in their specific latest versions. This difference in how we prompt our assistants to create secure code, and ultimately secure products, really matters from my perspective.
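To make the contrast concrete, here is a small Python sketch of the kind of output that sort of prompt is steering toward: password handling built on a well-known, memory-hard key derivation function from the standard library rather than a fast, homegrown hash. This is a hedged illustration, not an example taken from the OpenSSF guidance, and the scrypt parameters are generic defaults rather than a recommendation for any particular system.

```python
import hashlib
import hmac
import secrets

# Illustrative scrypt work factors; tune for your own environment.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage; never store the raw password."""
    salt = secrets.token_bytes(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Recompute the derivation and compare in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected_key)
```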

Damian Brady: Yes. I also like the idea of codifying that in your code base and in your repositories and in your organizations. I'm pretty sure almost all of the AI agent dev tools at the moment have the ability to have custom instructions or agent instructions in them. I know Claude does, for example. You can have an agent file which gives that information. So when you're working on an area of the application that touches security, you can have instructions in these files to say basically exactly what you said: follow these guidelines, use good cryptographic libraries that we know about, and use standard auth protocols and processes and so on. Don't roll your own. I know at least with GitHub, we have the capability of defining those at the enterprise and organizational level as well. So you can say, apply these standards and these extra prompts and meta-prompts at the organization level.

I think it's also probably useful to use tools that are designed specifically for software development. If you dump a bunch of code into a raw ChatGPT session and say, "Can you write authentication for this application?", there's nothing stopping it from pulling out a terrible practice and giving you that as the result. Whereas with a purpose-built tool, and Copilot is the one I know best, so I'll use that as an example, there are filters in place. You're much less likely to get responses with known security vulnerabilities in them, because there is a responsible AI process at Microsoft that requires responses to go through these filters before any AI-based results are returned. Using the right tool for the job, rather than just throwing the problem at a raw LLM, can go some way toward it. But again, it's never a replacement for a human in the loop, somebody who knows what they're doing, a proper security review, and things like that.

Roman Zhukov: Yes, that's true, and I completely agree. I think trust, or actually zero trust, is still the right concept to apply even in this new world. It's key to making sure we get the real value of the AI-powered speed increase: truly production-ready solutions, as opposed to just code. I mentioned an example about AI model consumption earlier. We've also seen a lot of incidents where malicious code or workloads were embedded in the AI models themselves and then injected into an organization's workflow, leading to stolen credentials or a reverse shell back to the command-and-control servers where malicious actors operate.

Building Trust Through Provenance

Roman Zhukov: I think with this, trust starts with provenance. Listeners may be familiar with the traditional provenance concept for software artifacts: metadata attached to artifacts, source code, or binaries that tells you how the artifact was produced, so you can make sure it hasn't been tampered with and is authentic. We're trying to apply the same provenance concepts to the AI space as well: for example, attaching verifiable artifacts or metadata to the training data, to the AI model itself, and to the complete AI application or system that end users interact with. How do you even audit these systems if you don't know what training data influenced them?

That's why concepts like the data provenance standards we're working on across the industry, together with the other big tech companies, come into play: so we can standardize things like data provenance, AI model cards, or even full AI system cards, which you can think of as a nutrition label for the entire AI application. These document the complete AI operational environment, the platforms, and the detailed system architecture, including the security posture. Coming back to what we discussed a little earlier about enabling developers and others to verify that the code or the applications are secure and good enough: we need to provide them with some level of assurance, and provenance, documentation, and these practices are one way to do it.
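At its simplest, the integrity half of that provenance story can be checked mechanically: does the artifact you downloaded match the digest published in its accompanying metadata? The sketch below is a minimal, hypothetical Python example; the metadata file name and its `sha256` field are invented for illustration, and real provenance frameworks with signed attestations go much further than a digest check.

```python
import hashlib
import hmac
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file in 1 MiB chunks and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(artifact: Path, metadata: Path) -> bool:
    """Compare the artifact's digest with the one recorded in its provenance metadata."""
    expected = json.loads(metadata.read_text())["sha256"]  # hypothetical metadata layout
    return hmac.compare_digest(sha256_of(artifact), expected)

# Usage with invented file names:
# ok = verify_artifact(Path("model.safetensors"), Path("model.provenance.json"))
```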

Damian Brady: Yes, definitely. I think that's an incredibly good initiative, something the industry as a whole should really embrace. From a local perspective, if your application and your developers are using these AI tools, I think nothing can replace the structures and processes you already have for software development. Code reviews, security checks, audits, builds, tests, and all of the processes you would have had anyway can help make sure that, at least for individual projects, those safeguards stay in place regardless of where the code comes from. I agree; it sounds like a great initiative to give the provenance of where all of this stuff comes from.

The Future of Developer Roles and Skills

Roman Zhukov: I think we've touched a little bit on this already, but what is your opinion on the human-in-the-loop question and how skills and roles are evolving around it? Should we fire all of the junior developers? What do you think about evolving roles?

Damian Brady: I think this is probably a good last thought, actually, as we get close to our time limit. Very much in the way that new languages, techniques, and processes for software development have always cropped up, and we've then made an effort to train upcoming developers and technical people in those new techniques, languages, frameworks, libraries, and tools, I think we need a concerted effort to train upcoming and junior developers in the tools and processes that we use now. Which means using the AI tools and working out how to get the most out of them. Admittedly, we're still learning that ourselves; this is still very much a work in progress. There are techniques now, like spec-driven vibe coding, that have really only been around for a few months but are proving to be very useful. As we learn these things, we just need to make sure we're passing that knowledge on to the next generation of developers as well. That's going to take a concerted effort as an industry.

Roman Zhukov: Yes. I also think we should do it evolutionarily. I think we're getting closer to a skill shift from syntax experts to system architects, let's put it that way, because it's not really a big deal right now to generate thousands of lines of code. We don't necessarily need to be deep experts even in the programming languages we're doing this in. But then it comes down to review rigor, right, and to all of the things we need to do after the first rounds of generated code are there. We should prompt better. We should challenge the LLM. I've found it particularly useful, for example, if I see something wrong, to provide examples. I see how the model actually does better when you give it specific examples, links, or references. For instance: "You used an unbounded string copy. Why is that unsafe here? Rewrite it using a bounded string copy function." That's quite a simple example. Another could be, "Explain the security implications of this change you have made to the code," or "Do not, under any circumstances, touch this code or these folders," something like that. These kinds of skills, I think, will be very necessary to teach to the next generation of developers, along with the basics of software architecture and coding practices themselves.

Damian Brady: Keeping some fundamentals in there for sure.

Roman Zhukov: Yes.

Damian Brady: I've really enjoyed this chat. Thank you. I think there's a lot of change that's just happened and a lot of change that's yet to come as well. It'll be interesting to see how it progresses over the next months, years, decades even.

Roman Zhukov: Yes. I absolutely agree. It's been a great conversation. Thank you.

Contributors

Roman Zhukov

Principal Architect - Security Communities Lead at Red Hat

Damian Brady

Staff Developer Advocate at GitHub