LLQL: Matching patterns in LLVM IR/BC files using SQL query

Amr Hesham
5 min readOct 31, 2024

--

Hello everyone, in the last two months i started to contribute to the LLVM amazing project (You can find my patches here) and in this interesting patch the goal was to match pattern in the LLVM IR and transform it to more optimized form, to do this work i got a recommendation from a LLVM organization member to look at the InstCombine Guide for contributor which can be found here, and i found that LLVM has a very interesting pattern matching functions to match the pattern you want, for example if you want to search for pattern like this (A — B) + B you can write

Value *A, *B;
if (match(V, m_Add(m_Sub(m_Value(A), m_Value(B)), m_Deferred(B)))) {
}

And if you want to search for. (A-B) + B or B + (A — B) you can replace m_add with m_c_add , the extra C is for commutatively.

Value *A, *B;
if (match(V, m_c_Add(m_Sub(m_Value(A), m_Value(B)), m_Deferred(B)))) {
}

I finished the patch and it’s merged 🥳🥳, and i back to work on GitQL query engine, at this time i have created the ClangQL tool which is used to run SQL query on your C/C++ code, and i got one question in my mind, What if i can run SQL query on LLVM IR and be able to perform pattern matching like in InstCombine part 🤔🤔.

The LLQL Project

At the start i was thinking okey, how to represent the information like instructions, and how to make performing pattern matching smooth like in LLVM, also how to provide a clean error message and maybe in the future support operators for those new types.

Representing the LLVM Types and Values

What i want here is to have types that map to LLVM types like like i8 , i32 , pointers , arrays , or vectors…etc, and work with them as if they are primitives in GitQL Query Engine.

Thanks to the GitQL new architecture which allow SDK user to define his own types, values and functions (You can read the full details about the design from this article GitQL: The data types from the Engine to the SDK), i started to create Types to be as wrapper for LLVM values for example LLVMDataType which represent Types in LLVM, and created a value that represent that type

pub struct LLVMTypeValue {
pub llvm_type: LLVMTypeRef,
}

impl Value for LLVMTypeValue {
...

fn data_type(&self) -> Box<dyn DataType> {
Box::new(LLVMDataType)
}
}

and the same idea to represent the LLVMValueRef the code is like this

pub struct ≈ {
pub llvm_type: LLVMTypeRef,
}

impl Value for LLVMTypeValue {
...

fn data_type(&self) -> Box<dyn DataType> {
Box::new(LLVMDataType)
}
}

After creating the Schema and DataProvider i can select instructions from IR files like this

SELECT instruction FROM instructions

and got a result like this

│ instruction                      │
╞══════════════════════════════════╡
│ %sub = sub i32 %a, %b │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ %mull = mul i32 %a, %b │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ %add = add i32 %sub, %mull │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ret i32 %add │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ %result = add i32 %a, 0 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤

Now it’s time to find a way to perform pattern matching 🤔.

Building the Pattern Matcher

What i want is to be able to perform query like this

SELECT instruction FROM instructions WHERE m_inst(instruction, m_add(m_sub(), m_mul()))

This match for Add instruction that as Sub Instruction as Left hand side and Mul instruction as right hand side, in LLVM IR it can be like this

%sub = sub i32 %a, %b
%mull = mul i32 %a, %b
%add = add i32 %sub, %mull <------- Like this add

`So my idea was that lets think of this Add as Node that has two children, each one of them is another node, so what i want is to have a tree structure of Patterns that each node can be matched or not againts that same node in the real IR tree,

So i created a new Type that represent the Matchers so i can make sure that user can only pass the correct type to my new matchers, I called it InstMatcherType for Instruction Matchers and TypeMatcherType for data types matchers, then created a values for them and also a Matchers functions so what the functions does is to build and connect the nodes to endup with full pattern and m_inst will match, so the Matcher node is like

pub trait InstMatcher: DynClone {
fn is_match(&self, instruction: LLVMValueRef) -> bool;
}

And now i can create BinaryMatchers that can match his two childrens too, or UnaryMatchers, …etc

The first function i created as m_inst which take Instruction and Pattern and return true if the instruction is matched with the pattern, and now i created the m_add function which take optional two other InstMatchers for Left and Right sides, so the implementation is like this

fn match_add_inst(values: &[Box<dyn Value>]) -> Box<dyn Value> {
let (lhs_matcher, rhs_matcher) = binary_matchers_sides(values);
let matcher = ArithmeticInstMatcher::create_add(lhs_matcher, rhs_matcher);
Box::new(InstMatcherValue { matcher })
}

fn match_sub_inst(values: &[Box<dyn Value>]) -> Box<dyn Value> {
let (lhs_matcher, rhs_matcher) = binary_matchers_sides(values);
let matcher = ArithmeticInstMatcher::create_sub(lhs_matcher, rhs_matcher);
Box::new(InstMatcherValue { matcher })
}

fn match_mul_inst(values: &[Box<dyn Value>]) -> Box<dyn Value> {
let (lhs_matcher, rhs_matcher) = binary_matchers_sides(values);
let matcher = ArithmeticInstMatcher::create_mul(lhs_matcher, rhs_matcher);
Box::new(InstMatcherValue { matcher })
}

and the signature is like

Signature {
parameters: vec![
Box::new(OptionType {
base: Some(Box::new(InstMatcherType)),
}),
Box::new(OptionType {
base: Some(Box::new(InstMatcherType)),
}),
],
return_type: Box::new(InstMatcherType),
}

Thats mean we can call it with 0 arguments to match any Add Instructions, for example

SELECT instruction FROM instructions WHERE m_inst(instruction, m_add())

Or with arguments to match add and then match childrens too

SELECT instruction FROM instructions WHERE m_inst(instruction, m_add(m_sub(), m_mul()))

You can query how many times this pattern exists in each function

SELECT function_name, count() FROM instructions WHERE m_inst(instruction, m_add(m_sub(), m_mul())) GROUP BY function_name

And now you can perform pattern matching againts LLVM IR or BC Instructions.

What is next!

The current state is that you can build your pattern with matchers for Arithmetic, ICMP and FCMP matchers, and there are a lot of other matchers that we can support and we can perform more deep analysis, feel free to write feedback, report issues or fix bugs, everyone is most welcome.

The project is free and open source, you can find The LLQL repository on Github and don’t forget to give it a star 🌟

I am looking forward to your opinion and feedback 😋.

I hope you enjoyed my article and you can find me on

You can find me on: GitHub, LinkedIn, and Twitter.

Enjoy Programming 😋.

--

--