GitQL: The data types from the Engine to the SDK
Hello everyone! Over the last few months, the GitQL project has grown and gained a lot of useful features, and some tools, like FileQL and ClangQL, are now built using the SDK. We have reached the point where we can query any kind of data by defining the Data Schema and Provider and mapping the data to GitQL's built-in types such as Integer, Text, Date, Array, etc. You can also extend the Std library by defining your own custom functions. But what if you want to create a new custom data type 🤔🤔🤔? And why would you need a custom type?
The benefit of having custom data types
Imagine you want to create a tool to run SQL queries on matrices. By default, you can't write a query like this unless the Engine supports a Matrix type.
SELECT matrix1 * matrix2;
Or, when creating a function that does some calculation on the matrix, what parameter will you use?
SELECT perform_on_matrix(matrix1);
Maybe you think: okay, let's make the function take Text as a parameter, so we can convert the matrix to a string and pass it. The function constructs the matrix, does the calculation, and returns the result as a string again. But what if the end user passes an arbitrary string to the function 🤔?
SELECT perform_on_matrix("Hello, World!");
In this case, if you have a type called Matrix, you can easily make sure that the user can only pass a value of type Matrix; otherwise they get an error message like "Function {} expects the type Matrix but got Text". And if there is a way to do operator overloading, you can support using the + operator between two Matrices.
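To illustrate the kind of check described above, here is a minimal sketch of validating a function call against its expected parameter types. The names `TypeTag`, `Signature`, and `check_call` are hypothetical and exist only for this example; the real SDK models types differently.

```rust
// Hypothetical, simplified type tags used only in this sketch.
#[derive(Debug, Clone, PartialEq)]
enum TypeTag {
    Text,
    Matrix,
}

// Hypothetical function signature: a name plus the expected parameter types.
struct Signature {
    name: &'static str,
    parameters: Vec<TypeTag>,
}

// Returns Ok(()) when every argument type matches the signature,
// otherwise an error message describing the first mismatch.
fn check_call(signature: &Signature, arguments: &[TypeTag]) -> Result<(), String> {
    if signature.parameters.len() != arguments.len() {
        return Err(format!(
            "Function {} expects {} arguments but got {}",
            signature.name,
            signature.parameters.len(),
            arguments.len()
        ));
    }
    for (expected, actual) in signature.parameters.iter().zip(arguments) {
        if expected != actual {
            return Err(format!(
                "Function {} expects the type {:?} but got {:?}",
                signature.name, expected, actual
            ));
        }
    }
    Ok(())
}
```

With a signature for perform_on_matrix that expects one Matrix, passing a Text argument is rejected before the function ever runs.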
In programming languages, you can create a custom type using a Struct or Class to define its structure, and in some languages like C++ you can overload operators for it, too. But how can we get the same result in GitQL using SQL? How can we create a type that can represent complex values like audio files, images, a tree data structure of specific values, etc.?
Some database engines, like PostgreSQL, allow the user to define a type composed of other defined types and to overload operators for it using SQL. But the goal of the GitQL SDK is to help you easily build your own domain-specific query engine, so what I want here is to also be able to define new types that may not be composed of primitives, like Audio, Video, Image, an Abstract Syntax Tree for a language, or even an Assembly Instruction as a type. But how 🤔?
I was inspired by Chris Lattner's design in Swift, and currently in Mojo, which moves all types, even primitives, from the compiler into the standard library. I found this idea very good: if I could implement it in GitQL and move the types to the SDK, any SDK user could easily use an interface to define custom types and overload operators for them.
Moving the Types from Engine to SDK
So I moved the old types from the Engine level to the SDK level. Now the Parser, TypeChecker, and Engine deal with an abstract type as an interface: they don't know what the type represents, but they know what it can do. For example, if I want a type to work with the * operator, I can do this:
impl DataType for MatrixType {
    fn can_perform_mul_op_with(&self) -> Vec<Box<dyn DataType>> {
        vec![Box::new(MatrixType)]
    }

    fn mul_op_result_type(&self, _other: &Box<dyn DataType>) -> Box<dyn DataType> {
        Box::new(MatrixType)
    }
}
This means that you can perform the * operator between two Matrices, and the second function declares that the result type of this operation is MatrixType. There are similar functions for each operator.
Now, when the parser and type checker find an expression like matrix1 * matrix2, they call can_perform_mul_op_with to check whether the operation is valid, and if it is, they call mul_op_result_type to get the result type of the operation.
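To make that flow concrete, here is a minimal, self-contained sketch of how a type checker could use the two methods. The trait is a cut-down stand-in for the SDK's DataType; the `name` method, `IntType`, `check_mul`, and the error text are hypothetical names invented for this illustration, and comparing types by name is a simplification rather than the engine's real logic.

```rust
// Cut-down stand-in for the SDK's `DataType` trait.
// `name` is a hypothetical helper used here to compare types.
trait DataType {
    fn name(&self) -> String;

    // Types allowed on the right-hand side of `self * rhs`.
    fn can_perform_mul_op_with(&self) -> Vec<Box<dyn DataType>>;

    // Result type of `self * other`; called only after the check passes.
    fn mul_op_result_type(&self, other: &Box<dyn DataType>) -> Box<dyn DataType>;
}

struct IntType;
struct MatrixType;

impl DataType for IntType {
    fn name(&self) -> String {
        "Int".to_string()
    }
    fn can_perform_mul_op_with(&self) -> Vec<Box<dyn DataType>> {
        vec![Box::new(IntType)]
    }
    fn mul_op_result_type(&self, _other: &Box<dyn DataType>) -> Box<dyn DataType> {
        Box::new(IntType)
    }
}

impl DataType for MatrixType {
    fn name(&self) -> String {
        "Matrix".to_string()
    }
    fn can_perform_mul_op_with(&self) -> Vec<Box<dyn DataType>> {
        vec![Box::new(MatrixType)]
    }
    fn mul_op_result_type(&self, _other: &Box<dyn DataType>) -> Box<dyn DataType> {
        Box::new(MatrixType)
    }
}

// Hypothetical type-checking step for the expression `lhs * rhs`.
fn check_mul(
    lhs: &Box<dyn DataType>,
    rhs: &Box<dyn DataType>,
) -> Result<Box<dyn DataType>, String> {
    let allowed = lhs.can_perform_mul_op_with();
    if allowed.iter().any(|candidate| candidate.name() == rhs.name()) {
        return Ok(lhs.mul_op_result_type(rhs));
    }
    Err(format!(
        "Operator `*` can't be performed between types {} and {}",
        lhs.name(),
        rhs.name()
    ))
}
```

The checker never needs to know what a Matrix is. It only asks the left-hand type which right-hand types it accepts, which is what lets new types live entirely in the SDK.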
Similar to a custom type, we can define custom Values too, for example
pub struct MatrixValue {
    pub matrix: Matrix,
}

impl Value for MatrixValue {
    // Other `Value` methods, such as `as_any`, are omitted here.
    fn mul_op(&self, other: &Box<dyn Value>) -> Result<Box<dyn Value>, String> {
        // Only another MatrixValue is a valid right-hand side for `*`.
        if let Some(other_matrix) = other.as_any().downcast_ref::<MatrixValue>() {
            // Assumes `Matrix::multiply` borrows its argument.
            let matrix = self.matrix.multiply(&other_matrix.matrix);
            return Ok(Box::new(MatrixValue { matrix }));
        }
        Err("Unexpected type to perform `*` with".to_string())
    }
}
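The snippet above relies on a Matrix type and a Value trait that are not shown. The following self-contained sketch fills them in with hypothetical stand-ins (a fixed 2x2 `Matrix`, a cut-down `Value` trait, and a `TextValue`) so the downcast-based dispatch can be tried in isolation. It is an illustration under those assumptions, not the SDK's actual definitions.

```rust
use std::any::Any;

// Hypothetical 2x2 integer matrix used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
struct Matrix {
    cells: [[i64; 2]; 2],
}

impl Matrix {
    // Plain row-by-column multiplication.
    fn multiply(&self, other: &Matrix) -> Matrix {
        let mut cells = [[0i64; 2]; 2];
        for r in 0..2 {
            for c in 0..2 {
                for k in 0..2 {
                    cells[r][c] += self.cells[r][k] * other.cells[k][c];
                }
            }
        }
        Matrix { cells }
    }
}

// Cut-down stand-in for the SDK's `Value` trait.
trait Value {
    // Exposes the concrete value so operators can downcast the other operand.
    fn as_any(&self) -> &dyn Any;

    // By default a value does not support `*` at all.
    fn mul_op(&self, _other: &Box<dyn Value>) -> Result<Box<dyn Value>, String> {
        Err("Unsupported operator `*`".to_string())
    }
}

struct MatrixValue {
    matrix: Matrix,
}

struct TextValue(String);

impl Value for MatrixValue {
    fn as_any(&self) -> &dyn Any {
        self
    }

    fn mul_op(&self, other: &Box<dyn Value>) -> Result<Box<dyn Value>, String> {
        // Accept the operand only if it is really a MatrixValue.
        if let Some(other_matrix) = other.as_any().downcast_ref::<MatrixValue>() {
            let matrix = self.matrix.multiply(&other_matrix.matrix);
            return Ok(Box::new(MatrixValue { matrix }));
        }
        Err("Unexpected type to perform `*` with".to_string())
    }
}

impl Value for TextValue {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```

Multiplying [[1, 2], [3, 4]] by [[4, 5], [7, 8]] through mul_op yields [[18, 21], [40, 47]], while passing a TextValue as the right-hand side returns the error instead of panicking.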
Now you can create custom functions that take MatrixType as a parameter or return type to end up with a query like this
SELECT create_matrix([1, 2], [3, 4]) * create_matrix([4, 5], [7, 8])
With this new architecture, I built a new tool called LLQL, which allows users to run SQL queries on LLVM IR or Bitcode. It's possible to apply the same idea to Java bytecode, Assembly, or even machine code. Here is a real example from the LLQL README:
Imagine we want to check whether there is an add instruction that has a sub instruction as its left-hand side and a mul instruction as its right-hand side:
define i32 @function(i32 %a, i32 %b) {
  %sub = sub i32 %a, %b
  %mull = mul i32 %a, %b
  %add = add i32 %sub, %mull    ; <----- Like this Add(Sub, Mul)
  ret i32 %add
}
You can easily search for this pattern using this SQL query
SELECT instruction FROM instructions WHERE m_inst(instruction, m_add(m_sub(), m_mul()))
In that project, the instruction column has the type InstructionType, and m_add, m_sub, and m_mul return IntMatcherType. Both are custom types defined using the SDK, without modifying the GitQL engine itself.
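To show how composable matchers like these can work in principle, here is a small sketch over a toy instruction tree. Everything in it (`Inst`, `Op`, `Matcher`, `BinaryMatcher`, and Rust functions that reuse the names m_add, m_sub, and m_mul) is a hypothetical stand-in written for this illustration. The real LLQL matchers operate on LLVM values, and this is not their implementation.

```rust
// Toy operation kinds and instruction tree; not real LLVM IR.
#[derive(Clone, Copy, PartialEq)]
enum Op {
    Add,
    Sub,
    Mul,
}

enum Inst {
    Arg, // a function argument such as %a or %b
    Binary(Op, Box<Inst>, Box<Inst>),
}

// A matcher answers one question: does this instruction fit the pattern?
trait Matcher {
    fn is_match(&self, inst: &Inst) -> bool;
}

// Matches any instruction; used when an operand is unconstrained.
struct AnyInst;

impl Matcher for AnyInst {
    fn is_match(&self, _inst: &Inst) -> bool {
        true
    }
}

// Matches a binary instruction of a given kind whose operands
// satisfy the nested matchers.
struct BinaryMatcher {
    op: Op,
    lhs: Box<dyn Matcher>,
    rhs: Box<dyn Matcher>,
}

impl Matcher for BinaryMatcher {
    fn is_match(&self, inst: &Inst) -> bool {
        match inst {
            Inst::Binary(op, lhs, rhs) => {
                *op == self.op && self.lhs.is_match(lhs) && self.rhs.is_match(rhs)
            }
            Inst::Arg => false,
        }
    }
}

fn m_any() -> Box<dyn Matcher> {
    Box::new(AnyInst)
}

fn m_add(lhs: Box<dyn Matcher>, rhs: Box<dyn Matcher>) -> Box<dyn Matcher> {
    Box::new(BinaryMatcher { op: Op::Add, lhs, rhs })
}

fn m_sub() -> Box<dyn Matcher> {
    Box::new(BinaryMatcher { op: Op::Sub, lhs: m_any(), rhs: m_any() })
}

fn m_mul() -> Box<dyn Matcher> {
    Box::new(BinaryMatcher { op: Op::Mul, lhs: m_any(), rhs: m_any() })
}
```

Because each matcher is just a value, a matcher can be passed as an argument to another matcher, which mirrors the nested shape of m_add(m_sub(), m_mul()) in the query.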
You can read more about the design and implementation of LLQL in this article: LLQL: Matching patterns in LLVM IR/BC files using SQL query
You can find the full, detailed documentation on the GitQL website.