A collection of weggli patterns for C/C++ vulnerability research

“No one cares about the old scene people anymore, I’m sure,
bunch of old people grepping for the last of the memcpy.”
— Bas Alberts

TL;DR

Go grab my weggli patterns for C/C++ vulnerability research, and hack the planet! 🏴‍☠️

Backstory

The recent update to my Semgrep ruleset for C/C++ vulnerability research sparked quite a bit of interest in the community. Among the feedback I received, this Twitter thread especially caught my attention:

Is there a collection of weggli queries out there? Ideally something that covers most of @0xdea's awesome semgrep rules? Read his blog btw, it's great (linked on the git)! https://t.co/zynK1cfP5C

— Richard Johnson (@richinseattle) November 29, 2023

Weggli is a small yet powerful tool developed by Felix Wilhelm of Google Project Zero. I had played with it when it came out in 2021, but back then I didn’t build an organized collection of weggli queries, because in my day-to-day source code review tasks I’ve mostly used Semgrep. It was time to dig deeper into weggli and see whether it could earn a place in my vulnerability research workflow, which can be summarized as follows:

Read the documentation to uncover attack surface and interesting paths.
Scan with static analysis tools against interesting paths in the codebase.
Review scan results to quickly mark hotspots in code where bugs may be.
Understand relevant code around the hotspots to identify potential vulnerabilities.
Confirm vulnerabilities via further analysis, dynamic testing, targeted fuzzing, etc.
Identify variants of confirmed vulnerabilities in other parts of the codebase.

Enter weggli

Weggli is a blazing fast semantic search tool for C and C++ codebases, designed to help security researchers identify interesting functionality in large codebases. Its query language resembles code, making it easy to turn interesting code patterns into queries.

Weggli is inspired by Semgrep, Coccinelle, joern, and CodeQL, but makes some different design decisions:

C++ support: modern C++ constructs, such as lambda expressions, range-based for loops, and constexprs are supported.
Minimal setup: it should work out of the box against most software. Most importantly, it does not require the ability to build the software and can work with incomplete sources or missing dependencies.
Interactive: most of the time, a weggli query will be faster than a grep search. The goal is to enable an interactive workflow where quick switching between code review and query creation/improvement is possible.
Greedy: weggli’s pattern matching is designed to find as many useful matches as possible for a specific query. While this increases the risk of false positives, it simplifies query creation and manual code review.

Here it is in action:

My pattern collection

In a couple of weeks, I’ve managed to convert most of my Semgrep rules into weggli patterns, and I’ve written some additional patterns that leverage weggli’s unique abilities. Although not strictly equivalent to my Semgrep rules, which are generally more comprehensive and field-tested, these patterns should provide fairly complete coverage. Most of all, they should function as examples that can (and should) be customized for your specific needs.

My weggli pattern collection can be found here. It covers vulnerabilities in the following broad categories:

Buffer overflows
Integer overflows
Format strings
Memory management
Command injection
Race conditions
Privilege management
Miscellaneous

In order to compare these patterns to my Semgrep rules, let’s see how they fare against a familiar codebase. As a sample target, I’ve once again picked Zephyr 3.4.0, which I’ve previously audited with the help of Semgrep.

The following screenshot shows that weggli could detect a buffer overflow caused by the use of strcat() that I’ve reported to the Zephyr project and that is now fixed:
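The bug class behind that finding is easy to sketch. The C fragment below is a hypothetical illustration, not the actual Zephyr code: the comment shows the unbounded strcat() pattern that a weggli query would flag, and append_checked() (a name invented for this example) is one possible fix that measures the space left in the destination before copying.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from Zephyr.
 *
 * Vulnerable pattern:
 *     char buf[16];
 *     strcpy(buf, prefix);
 *     strcat(buf, name);   // nothing checks that name fits in what is left
 *
 * Checked version: append src to the NUL-terminated string in dst, whose
 * total capacity is dst_size bytes. Returns false and leaves dst untouched
 * if the result would not fit.
 */
bool append_checked(char *dst, size_t dst_size, const char *src)
{
    /* Locate dst's terminator without reading past its capacity. */
    const char *nul = memchr(dst, '\0', dst_size);
    if (nul == NULL) {
        return false;  /* dst is not a valid string within dst_size */
    }
    size_t used = (size_t)(nul - dst);
    size_t add = strlen(src);

    /* dst_size - used bytes remain; they must hold add bytes plus a NUL. */
    if (add >= dst_size - used) {
        return false;
    }
    memcpy(dst + used, src, add + 1);  /* add + 1 also copies the NUL */
    return true;
}
```

Unlike a bare strcat(), the caller has to pass the destination capacity, so a call site that omits it simply won't compile.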

The following screenshot shows another bug that I’ve discovered in the Zephyr IPM driver. It’s a signed-to-unsigned conversion error that causes a buffer overflow:
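To show why this conversion is dangerous, here is a minimal sketch that is not the actual Zephyr IPM driver code. MAX_MSG, signed_check_passes(), length_seen_by_memcpy(), size_is_valid(), and copy_msg() are hypothetical names. A negative int passes an upper-bound-only check, and then becomes a huge size_t once it is handed to memcpy().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MSG 64  /* hypothetical buffer size */

/* Vulnerable pattern (illustration only):
 *     int size = ...;               // attacker-influenced, may be negative
 *     if (size > MAX_MSG) return;   // a negative size passes this check
 *     memcpy(dst, src, size);       // size converts to a huge size_t
 */

/* Evaluates the flawed upper-bound-only check; never copies anything. */
bool signed_check_passes(int size)
{
    return !(size > MAX_MSG);
}

/* The length memcpy() would actually receive after the implicit conversion. */
size_t length_seen_by_memcpy(int size)
{
    return (size_t)size;
}

/* Fixed check: reject negative values before converting to size_t. */
bool size_is_valid(int size)
{
    return size >= 0 && size <= MAX_MSG;
}

/* Copy only after the fixed check succeeds. */
bool copy_msg(char dst[MAX_MSG], const char *src, int size)
{
    if (!size_is_valid(size)) {
        return false;
    }
    memcpy(dst, src, (size_t)size);
    return true;
}
```

Converting -1 to size_t yields SIZE_MAX, which is why the flawed check is so easy to bypass.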

Finally, the screenshot below shows a buffer overflow in the CANbus subsystem, caused by an ineffective size check that relies on assertions, which are compiled out in production releases:
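The underlying mistake can be sketched as follows. This is a hypothetical illustration, not the actual Zephyr CANbus code: FRAME_MAX and frame_fill() are invented names, and the 8-byte limit is only an example (it matches the payload of a classic CAN frame). An assert() disappears when NDEBUG is defined, so the bound has to be enforced by an ordinary runtime check.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FRAME_MAX 8  /* hypothetical payload limit */

/* Vulnerable pattern (illustration only):
 *     assert(len <= FRAME_MAX);   // expands to nothing when NDEBUG is defined
 *     memcpy(frame, data, len);   // so release builds copy unchecked
 *
 * Fixed version: the bound is enforced by a real check that is present
 * in every build configuration, regardless of NDEBUG.
 */
bool frame_fill(unsigned char frame[FRAME_MAX], const unsigned char *data,
                size_t len)
{
    if (len > FRAME_MAX) {
        return false;  /* reject instead of overflowing frame */
    }
    memcpy(frame, data, len);
    return true;
}
```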

As a bonus, here are a couple more simple bugs that I’ve recently found while auditing other open source projects.

Accidental use of the sizeof() operator on a pointer instead of its target in lwIP (this one turned out to be a duplicate bug report):
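This mistake is easy to demonstrate in isolation. The fragment below is a hypothetical illustration, not the actual lwIP code; pointer_size() and clear_buf() are invented names. When a buffer is passed as a pointer, sizeof yields the size of the pointer itself, not the size of the memory it points to.

```c
#include <stddef.h>
#include <string.h>

/* Bug pattern (illustration only):
 *     void clear_bad(char *buf) { memset(buf, 0, sizeof(buf)); }
 * sizeof(buf) is sizeof(char *), typically 4 or 8 bytes, so only the
 * first few bytes of the buffer get cleared.
 */
size_t pointer_size(const char *buf)
{
    return sizeof(buf);  /* always sizeof(char *), whatever buf points to */
}

/* Fix: have the caller pass the buffer length explicitly. */
void clear_buf(char *buf, size_t len)
{
    memset(buf, 0, len);
}
```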

Wrong order of arguments in a call to memset() in FreeRTOS demo code:
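The standard prototype is memset(void *s, int c, size_t n): the fill byte comes second and the length third. The sketch below is a hypothetical illustration, not the actual FreeRTOS demo code, and clear_swapped() and clear_fixed() are invented names. Swapping the last two arguments silently turns the call into a zero-length fill.

```c
#include <stddef.h>
#include <string.h>

/* Bug pattern: length and fill value swapped, so n == 0 and nothing is
 * written. The buffer keeps whatever it contained before. */
void clear_swapped(unsigned char *buf, size_t len)
{
    memset(buf, (int)len, 0);
}

/* Correct order: fill len bytes with the value 0. */
void clear_fixed(unsigned char *buf, size_t len)
{
    memset(buf, 0, len);
}
```

GCC can flag some instances of this pattern through its -Wmemset-transposed-args warning, but a pattern-based search also catches call sites that are never compiled with that warning enabled.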

These are just a few examples, but you get the idea… As can be seen, weggli is able to find more or less the same vulnerabilities that Semgrep can find. Its query language, however, is less powerful than Semgrep’s. In addition, weggli doesn’t support scanning for more than one pattern at once and has limited support for automation. Still, if you like robust, no-nonsense command-line tools, weggli might be right for you. I hope my patterns help you get started with it and master it quickly.

Conclusion

Weggli’s query language is less expressive than Semgrep’s. Therefore, some Semgrep rules cannot be easily converted into weggli patterns. With weggli, it’s also harder to manage rulesets, automate workflows, and parse output in your favorite code editor. Despite these limitations, this small tool has gained its place in my vulnerability research workflow.

As it turns out, some patterns are actually easier to express in weggli. Its simple and elegant syntax, together with its amazing performance and greedy pattern matching, makes it an ideal tool for locating security hotspots in large codebases on which to focus during manual code review. In addition, its support for seamless context switching between code review and query building is particularly useful for variant analysis. Give it a try and see for yourself what it can do!
