Security weaknesses of Copilot generated code in GitHub

belter | 130 points

If a weakness is common, then of course Copilot is going to suggest it. Copilot gives you popular responses, not correct ones. Yet if a weakness is common, it also means that human coders frequently make the same mistake.

The study's results are rather unsurprising and its conclusions are oft-repeated advice. As many have said, treat Copilot’s code in the same light you would treat a junior programmer’s code.

faeriechangling | 2 years ago

> The results show that (1) 35.8% of Copilot generated code snippets contain CWEs

What percent of non-Copilot generated public GitHub repos contain CWEs?

Edit: According to this study, Copilot generates C/C++ code with vulnerabilities, but at a lower rate than your average human coder: https://arxiv.org/pdf/2204.04741.pdf

calibas | 2 years ago

"...The results show that (1) 35.8% of Copilot generated code snippets contain CWEs, and those issues are spread across multiple languages, (2) the security weaknesses are diverse and related to 42 different CWEs, in which CWE-78: OS Command Injection, CWE-330: Use of Insufficiently Random Values, and CWE-703: Improper Check or Handling of Exceptional Conditions occurred the most frequently, and (3) among the 42 CWEs identified, 11 of those belong to the currently recognized 2022 CWE Top-25. Our findings confirm that developers should be careful when adding code generated by Copilot (and similar AI code generation tools) and should also run appropriate security checks as they accept the suggested code..."
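
For anyone unfamiliar with the two most frequently named CWEs in that quote, here is a rough Python sketch of the usual way to avoid them. The function names are made up for illustration and are not taken from the paper; the echo helper just spawns a child Python process so the example runs anywhere.

```python
import secrets
import subprocess
import sys


def run_echo_safely(user_input: str) -> str:
    """Avoid CWE-78: pass user input as one argv element, never via a shell.

    Hypothetical helper: the child process simply prints its first argument.
    """
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", user_input],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.rstrip("\n")


def new_session_token() -> str:
    """Avoid CWE-330: draw from the OS CSPRNG via `secrets`, not `random`."""
    return secrets.token_hex(16)  # 16 random bytes, hex-encoded


if __name__ == "__main__":
    # The shell metacharacters are treated as plain text, not executed.
    print(run_echo_safely("hello; echo pwned"))
    print(new_session_token())
```

The risky counterparts would be something like building a command string and running it with `shell=True`, or generating tokens with the `random` module, which is not designed for security purposes.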

belter | 2 years ago

I wonder if it would be possible to rate the code used during the training phase. For example the code could go through various static analysis tools and the result would be assigned as metadata to the code being used to train the model. The final model would then know that a given pattern is flagged as problematic by some tool and could take this into account not just to suggest new snippets but also to suggest improvements of existing snippets. Though I suppose if it was that easy, they'd have done it already.
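
A very rough sketch of what that tagging step might look like, assuming Python training samples. `toy_analyzer` is a hypothetical stand-in for real static analysis tools and only knows two patterns; `TaggedSample` and `tag_sample` are likewise invented names, and nothing here reflects how Copilot is actually trained.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class TaggedSample:
    """A training snippet plus the analyzer findings attached as metadata."""
    code: str
    findings: list[str] = field(default_factory=list)


def toy_analyzer(code: str) -> list[str]:
    """Hypothetical stand-in for a real static analysis tool."""
    findings = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        # Flag direct calls to eval()/exec().
        if isinstance(node.func, ast.Name) and node.func.id in {"eval", "exec"}:
            findings.append(f"dynamic-eval:{node.func.id}")
        # Flag any call that passes the literal keyword shell=True.
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append("CWE-78:shell=True")
    return findings


def tag_sample(code: str) -> TaggedSample:
    """Attach analyzer output to a snippet before it enters the training set."""
    return TaggedSample(code=code, findings=toy_analyzer(code))


if __name__ == "__main__":
    sample = tag_sample("import subprocess\nsubprocess.run(cmd, shell=True)\n")
    print(sample.findings)
```

How a model would actually consume that metadata during training is the hard part, and this sketch says nothing about it.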

laurent_du | 2 years ago

As always the statistic is useless without the human comparison. If it improves on human coders, no amount of gnashing and wailing will stop the layoffs.

gmerc | 2 years ago

There's only one weakness specifically identified that I can see.

    print("new user", username, password)

Yeah, not best practice, but also pretty common for development if you wanted to check that everything is being passed to the correct function.

tedunangst | 2 years ago

This is where one needs a hyphen :-)

azangru | 2 years ago

A related headline could be "Security weaknesses of code produced by a junior developer". It says copilot in the product name -> it's not intended to replace the pilot's (aka developer's) brain.

siva7 | 2 years ago

Did they prompt it to consider security weaknesses?

jncfhnb | 2 years ago