Writing a Static Analyser for PHP in Rust - Rules


In this shorter part of the series, we'll start to build up an API for our analyser's rules. The idea here is that we want to be able to store as much information as we'd like on the rule itself, but still have a consistent API for executing the rule.

The best way to do this is in the form of a trait. Traits in Rust behave like a mixture of PHP's abstract classes and interfaces. You can define contractual methods that the implementing structure must define, whilst also providing default implementations for methods.
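To make the "abstract class meets interface" comparison concrete, here's a minimal sketch. The Describe trait and the NoEval struct are made-up names for illustration only and aren't part of the analyser.

```rust
// Hypothetical trait, used only to illustrate required vs. default methods.
trait Describe {
    // Required: every implementor must provide this, like an interface method.
    fn name(&self) -> String;

    // Provided: implementors inherit this unless they override it,
    // like a concrete method on an abstract class.
    fn describe(&self) -> String {
        format!("rule: {}", self.name())
    }
}

struct NoEval;

impl Describe for NoEval {
    fn name(&self) -> String {
        "no-eval".to_string()
    }
}

fn main() {
    // describe() comes from the default implementation on the trait.
    println!("{}", NoEval.describe());
}
```

NoEval only has to write name(); it gets describe() for free.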

Let's think about what our Rule will need to do.

The first thing Rule needs to do is check whether or not it should be executed for a particular statement or expression in the AST. Let's call this method should_run().

pub trait Rule: Debug {
    fn should_run(&self, node: &dyn Node) -> bool;
}

The second method will be responsible for actually checking the Node. We'll keep it simple and only pass in the Node and DefinitionCollection since we don't have any form of Scope just yet.

pub trait Rule: Debug {
    fn should_run(&self, node: &dyn Node) -> bool;
    fn run(&mut self, node: &dyn Node, definitions: &DefinitionCollection);
}

Right now, the run() method doesn't need to return anything. Soon enough, it will be able to write error messages to some sort of buffer that will later be output in the console.
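As a rough idea of how an implementation might look, here's a sketch of a rule that keeps some state on itself. Everything apart from the Rule trait is an assumption: Node and DefinitionCollection are tiny stand-ins for the real types, and the kind() method, the two node structs, CountFunctionCalls and visit() are all invented for this example.

```rust
use std::fmt::Debug;

// Stand-in for the parser's Node trait; kind() is a made-up method.
pub trait Node: Debug {
    fn kind(&self) -> &'static str;
}

// Stand-in for the real DefinitionCollection.
#[derive(Debug, Default)]
pub struct DefinitionCollection;

pub trait Rule: Debug {
    fn should_run(&self, node: &dyn Node) -> bool;
    fn run(&mut self, node: &dyn Node, definitions: &DefinitionCollection);
}

#[derive(Debug)]
struct FunctionCall;

impl Node for FunctionCall {
    fn kind(&self) -> &'static str {
        "function_call"
    }
}

#[derive(Debug)]
struct EchoStatement;

impl Node for EchoStatement {
    fn kind(&self) -> &'static str {
        "echo"
    }
}

// Hypothetical rule that stores its own state: how many calls it has seen.
#[derive(Debug, Default)]
struct CountFunctionCalls {
    seen: usize,
}

impl Rule for CountFunctionCalls {
    fn should_run(&self, node: &dyn Node) -> bool {
        node.kind() == "function_call"
    }

    // &mut self is what lets the rule update its own fields.
    fn run(&mut self, _node: &dyn Node, _definitions: &DefinitionCollection) {
        self.seen += 1;
    }
}

// Hypothetical driver: only call run() when should_run() says so.
fn visit(rule: &mut dyn Rule, node: &dyn Node, definitions: &DefinitionCollection) {
    if rule.should_run(node) {
        rule.run(node, definitions);
    }
}

fn main() {
    let definitions = DefinitionCollection;
    let mut rule = CountFunctionCalls::default();

    visit(&mut rule, &FunctionCall, &definitions);
    visit(&mut rule, &EchoStatement, &definitions);

    println!("{:?}", rule);
}
```

The caller never needs to know what the rule stores; it only talks to should_run() and run().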

With the Rule trait in place, we should create a generic Analyser structure that stores all of the registered rules. The rules will be stored inside of a Vec<Box<dyn Rule>>.

#[derive(Debug)]
pub struct Analyser {
    pub rules: Vec<Box<dyn Rule>>,
}

INFO

Since we want to store anything that implements the Rule trait, we need to wrap all of those structures in a Box. Trait objects (dyn Trait) in Rust are dynamically sized: different structures implementing Rule can have different sizes, so the compiler can't know at compile-time how much space a dyn Rule needs and can't store one directly in a Vec or on the stack.

Wrapping them in a Box puts each value on the heap, and the Box itself is just a pointer with a size that is known at compile-time. The same trick is needed when you want to implement a recursive structure, such as an AST or linked list, since a type that directly contains itself would have an infinite size.
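If you'd like to see this for yourself, here's a small sketch using std::mem::size_of. Small, Large and List are made-up types for the demonstration, and Debug stands in for Rule so the example stays self-contained.

```rust
use std::fmt::Debug;
use std::mem::size_of;

#[derive(Debug)]
struct Small(u8);

#[derive(Debug)]
struct Large([u8; 64]);

// A recursive type. Without the Box, List would contain itself
// and would have no finite size.
#[derive(Debug)]
enum List {
    Cons(i32, Box<List>),
    Nil,
}

fn list_len(list: &List) -> usize {
    match list {
        List::Cons(_, rest) => 1 + list_len(rest),
        List::Nil => 0,
    }
}

fn main() {
    // The concrete types behind a trait object can differ in size...
    println!("Small: {} bytes", size_of::<Small>());
    println!("Large: {} bytes", size_of::<Large>());

    // ...but a Box<dyn Trait> is always two pointers wide:
    // one to the data on the heap, one to the vtable.
    println!("Box<dyn Debug>: {} bytes", size_of::<Box<dyn Debug>>());

    // That fixed size is what allows mixed types to live in a single Vec.
    let values: Vec<Box<dyn Debug>> = vec![Box::new(Small(1)), Box::new(Large([0; 64]))];
    println!("{} boxed values", values.len());

    let list = List::Cons(1, Box::new(List::Cons(2, Box::new(List::Nil))));
    println!("list has {} items", list_len(&list));
}
```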

Let's also add some helper methods to make registering rules easier.

impl Analyser {
    pub fn new() -> Self {
        Self {
            rules: Vec::new(),
        }
    }

    pub fn add_rule(&mut self, rule: Box<dyn Rule>) {
        self.rules.push(rule);
    }
}

The API here is a little annoying since you're required to wrap the Rule in a Box yourself, but I'm okay with that.
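If the boxing ever becomes too annoying, one option would be a generic helper that does the wrapping internally. This is only a sketch of that alternative, not what the series uses: the add() method and ExampleRule are hypothetical, and Rule is reduced to an empty trait to keep things short.

```rust
use std::fmt::Debug;

// Simplified stand-in for the Rule trait, methods omitted for brevity.
pub trait Rule: Debug {}

#[derive(Debug, Default)]
pub struct Analyser {
    pub rules: Vec<Box<dyn Rule>>,
}

impl Analyser {
    // The version from the article: the caller boxes the rule.
    pub fn add_rule(&mut self, rule: Box<dyn Rule>) {
        self.rules.push(rule);
    }

    // Hypothetical convenience wrapper that boxes on the caller's behalf.
    // The 'static bound is needed because Box<dyn Rule> is shorthand
    // for Box<dyn Rule + 'static>.
    pub fn add<R: Rule + 'static>(&mut self, rule: R) {
        self.rules.push(Box::new(rule));
    }
}

#[derive(Debug)]
struct ExampleRule;

impl Rule for ExampleRule {}

fn main() {
    let mut analyser = Analyser::default();

    analyser.add_rule(Box::new(ExampleRule));
    analyser.add(ExampleRule);

    println!("{} rules registered", analyser.rules.len());
}
```

The trade-off is a second method to maintain, which is why sticking with the explicit Box is a perfectly reasonable choice.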

Now that we have an Analyser, we can provide it with the DefinitionCollection that we have already obtained. This will be provided via the constructor and stored in a new definitions field on the Analyser structure.

impl Analyser {
    pub fn new(definitions: DefinitionCollection) -> Self {
        Self {
            rules: Vec::new(),
            definitions,
        }
    }

    // ...
}

Let's also initialise an Analyser in the command handler.

pub fn run(_: AnalyseCommand) {
    let files = discoverer::discover(&["php"], &["."]).unwrap();
    let mut collector = DefinitionCollector::new();

    for file in files {
        let contents = std::fs::read(&file).unwrap();
        let mut ast = pxp_parser::parse(&contents).unwrap();

        collector.scan(&mut ast);
    }

    let collection = collector.collect();
    let mut analyser = Analyser::new(collection);
}

We should also start to think about an API for storing errors and messages from the analyser. This structure will be passed along to each Rule that gets executed and will be used to push messages to the output. For now, we can call this the MessageCollector.

#[derive(Debug)]
pub struct MessageCollector {
    file: String,
    messages: Vec<String>,
}

impl MessageCollector {
    pub fn new(file: String) -> Self {
        Self {
            file,
            messages: Vec::new(),
        }
    }

    pub fn add(&mut self, message: impl Into<String>) {
        self.messages.push(message.into());
    }
}

INFO

Accepting a value that implements the Into<String> trait will allow us to use String, &str or a custom Message object instead of being strictly limited to String.
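Here's a quick sketch of what that flexibility could look like in practice. The Message structure and its From implementation are assumptions made up for this example; the series hasn't defined a custom message type yet.

```rust
#[derive(Debug)]
pub struct MessageCollector {
    file: String,
    messages: Vec<String>,
}

impl MessageCollector {
    pub fn new(file: String) -> Self {
        Self {
            file,
            messages: Vec::new(),
        }
    }

    pub fn add(&mut self, message: impl Into<String>) {
        self.messages.push(message.into());
    }
}

// Hypothetical structured message type.
pub struct Message {
    line: usize,
    text: String,
}

// Implementing From<Message> for String gives us Into<String> for free.
impl From<Message> for String {
    fn from(message: Message) -> String {
        format!("line {}: {}", message.line, message.text)
    }
}

fn main() {
    let mut collector = MessageCollector::new("example.php".to_string());

    collector.add("plain &str");
    collector.add(String::from("owned String"));
    collector.add(Message {
        line: 3,
        text: "custom message".to_string(),
    });

    println!("{:#?}", collector);
}
```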

Each file that gets analysed will have its own MessageCollector. Those will then get collected into a Vec<MessageCollector> that we can iterate over to output in the terminal.
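We haven't written the output step yet, but as a rough sketch it could look something like this. The render() function and its output format are purely hypothetical, and the MessageCollector is repeated here so the example stands on its own.

```rust
#[derive(Debug)]
pub struct MessageCollector {
    file: String,
    messages: Vec<String>,
}

impl MessageCollector {
    pub fn new(file: String) -> Self {
        Self {
            file,
            messages: Vec::new(),
        }
    }

    pub fn add(&mut self, message: impl Into<String>) {
        self.messages.push(message.into());
    }
}

// Hypothetical helper: turn one collector per file into terminal output.
fn render(collectors: &[MessageCollector]) -> String {
    let mut output = String::new();

    for collector in collectors {
        // Files without any messages aren't worth printing.
        if collector.messages.is_empty() {
            continue;
        }

        output.push_str(&format!("{}\n", collector.file));

        for message in &collector.messages {
            output.push_str(&format!("  {}\n", message));
        }
    }

    output
}

fn main() {
    let mut first = MessageCollector::new("a.php".to_string());
    first.add("first");
    first.add("second");

    // A file with no problems produces an empty collector.
    let second = MessageCollector::new("b.php".to_string());

    print!("{}", render(&[first, second]));
}
```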

impl Analyser {
    //...

    pub fn analyse(&mut self, file: String, contents: &[u8]) -> MessageCollector {
        let mut message_collector = MessageCollector::new(file);

        // Bail out early with the parser's message if the file can't be parsed.
        let mut ast = match parse(contents) {
            Ok(ast) => ast,
            Err(error) => {
                message_collector.add(error.to_string());
                return message_collector;
            }
        };

        message_collector
    }

    // ...
}

The Analyser is now able to accept a filename and the contents of that file, parse it and return a MessageCollector. We're starting to do a little bit of error handling now too.

If the parser fails and returns an Err(ParseError), we can convert the ParseError to a string, add it to the MessageCollector and output the error in the console.
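The to_string() call works on any type that implements Display, thanks to a blanket ToString implementation in the standard library. Here's a minimal sketch of that mechanism. The ParseError below is a stand-in written for this example; its fields and message format are assumptions and not the parser's real error type.

```rust
use std::fmt;

// Stand-in error type; the real ParseError comes from the parser crate.
#[derive(Debug)]
struct ParseError {
    line: usize,
    column: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unexpected end of file on line {} column {}",
            self.line, self.column
        )
    }
}

fn main() {
    let error = ParseError { line: 3, column: 4 };

    // to_string() is available because ParseError implements Display.
    let message: String = error.to_string();

    println!("{}", message);
}
```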

Let's hook this up to the command to see if the parser errors are produced correctly.

pub fn run(args: AnalyseCommand) {
    let files = discoverer::discover(&["php"], &["."]).unwrap();
    let mut collector = DefinitionCollector::new();

    for file in files {
        let contents = std::fs::read(&file).unwrap();
        let mut ast = pxp_parser::parse(&contents).unwrap();

        collector.scan(&mut ast);
    }

    let collection = collector.collect();
    let mut analyser = Analyser::new(collection);

    let contents = std::fs::read(&args.file).unwrap();
    let messages = analyser.analyse(args.file, &contents);

    dbg!(messages);
}

If we create a bad PHP file with a syntax error:

<?php

1 +

This should produce a parser error.

cargo run -- analyse ./playground/parse-error.php
[src/cmd/analyse.rs:27] messages = MessageCollector {
    file: "./playground/parse-error.php",
    messages: [
        "[E002] Error: unexpected end of file on line 3 column 4\n",
    ],
}

And there we go! The parser error is now being added to the collector. The formatting of that error could still use a little love, since it's using the format that the parser provides out of the box, but that's a problem for a future version of me.

In the next part, we'll start to actually write our first Rule and add in the necessary bits and pieces to make that work.

All of the code for this step can be found on GitHub.
