Ryan Chandler

Fast and powerful syntax highlighting in PHP with Phiki


tl;dr I wrote a syntax highlighter. It's pretty good and I had a tonne of fun working on it.

Preface

Syntax highlighting is something that we've all had to add to a site before, probably. Whether it's documentation or our personal sites, figuring out what package or service to use can be a nightmare.

I faced this problem too. Way back when, my personal site used Highlight.php, a PHP port of the popular Highlight.js package. Then I tried out Shiki, a JavaScript package that uses VSCode themes and TextMate grammars to highlight code.

Highlight.php was pretty fast, but it lacked contextual highlighting, which made everything feel really flat. Shiki is a very good package, but it's written in JavaScript, so there's a huge overhead when trying to use it from PHP.

I was frustrated with these tools so the only logical thing to do was build my own syntax highlighter in PHP.

Overview

Phiki is a syntax highlighter written in PHP that uses VSCode themes and TextMate grammar files to produce beautifully highlighted snippets of code.

Phiki is really easy to use. Install the package with Composer:

composer require phiki/phiki

Then use the Phiki::codeToHtml() method to convert a string of code into a piece of syntax-highlighted HTML.

use Phiki\Phiki;
use Phiki\Grammar\Grammar;
use Phiki\Theme\Theme;

$phiki = new Phiki();

echo $phiki->codeToHtml(
    code: "echo 1 + 2;",
    grammar: Grammar::Php,
    theme: Theme::GithubDark,
);

Phiki ships with a bunch of languages and themes out of the box, all of which can be found as members on the Grammar and Theme enums you see above.

Highlighting code for the terminal

If you need to highlight code for the terminal, you can use the codeToTerminal() method with the same set of parameters to generate a string of text containing ANSI escape sequences.

echo $phiki->codeToTerminal(
    code: "echo 1 + 2;",
    grammar: Grammar::Php,
    theme: Theme::GithubDark,
);
A preview of some highlighted code in the terminal.
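
To give a rough idea of what "ANSI escape sequences" means here, the sketch below wraps a piece of text in a 24-bit foreground colour sequence. The ansiColor() helper and the hex colour are made up for illustration, and using 24-bit sequences at all is an assumption on my part; this is not how Phiki generates its terminal output internally.

```php
<?php

/**
 * Hypothetical helper (not part of Phiki): wraps $text in a 24-bit
 * ANSI foreground colour sequence, followed by a reset sequence.
 */
function ansiColor(string $text, string $hex): string
{
    // Split "#rrggbb" into its three integer channels.
    [$r, $g, $b] = sscanf(ltrim($hex, '#'), '%02x%02x%02x');

    // ESC[38;2;R;G;Bm sets the foreground colour, ESC[0m resets all styles.
    return sprintf("\033[38;2;%d;%d;%dm%s\033[0m", $r, $g, $b, $text);
}

// Colour only the keyword and leave the rest of the line unstyled.
echo ansiColor('echo', '#ff7b72') . ' 1 + 2;' . PHP_EOL;
```

A terminal that understands these sequences will print the keyword in colour. One that doesn't will show the raw escape codes instead.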

CommonMark

Highlighting code blocks in Markdown files is probably the most common use case for a syntax highlighter. Phiki ships with a league/commonmark extension so you don't need to do any manual wiring.

use League\CommonMark\CommonMarkConverter;
use Phiki\CommonMark\PhikiExtension;
use Phiki\Theme\Theme;

$converter = new CommonMarkConverter();

$converter->getEnvironment()->addExtension(new PhikiExtension(Theme::GithubDark));

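// The closing marker is indented to match the body, so PHP (7.3+) strips that
// indentation. Left indented, the fence would be parsed as an indented code block.
echo $converter->convert(<<<'MD'
    ```php
    echo "Hello, world!";
    ```
    MD);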

Register the extension, providing a Theme, and it all just works.

Future

I have a small list of things that I want to do before I tag v1.0.0. Here's the rough list:

Dual themes

When a site has a light and dark mode, having different highlighting themes for each variant is quite a nice feature. It's hard to find one theme that works well for both colour schemes. The API for this would probably be something like this:

$phiki->codeToHtml(
    code: "...",
    grammar: Grammar::Php,
    theme: [
        'light' => Theme::GithubLight,
        'dark' => Theme::GithubDark,
    ],
);

The generated HTML could then use CSS variables to store various style values for each theme and a bit of CSS could be added to the site to change which CSS variables are used.
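
Here's a minimal sketch of what the HTML for a single token might look like under that approach. The renderDualThemeToken() function, the --phiki-* variable names and the colours are all hypothetical; nothing here is an existing Phiki API.

```php
<?php

/**
 * Hypothetical renderer for a single token (not part of Phiki).
 *
 * @param array<string, string> $colors Theme name => hex colour.
 */
function renderDualThemeToken(string $text, array $colors): string
{
    $declarations = [];

    // One CSS variable per theme, e.g. "--phiki-dark: #f97583".
    foreach ($colors as $name => $hex) {
        $declarations[] = sprintf('--phiki-%s: %s', $name, $hex);
    }

    return sprintf(
        '<span style="%s">%s</span>',
        htmlspecialchars(implode('; ', $declarations)),
        htmlspecialchars($text),
    );
}

echo renderDualThemeToken('echo', ['light' => '#d73a49', 'dark' => '#f97583']);
// <span style="--phiki-light: #d73a49; --phiki-dark: #f97583">echo</span>
```

The site's stylesheet would then choose a variable, for example by setting color: var(--phiki-light) by default and color: var(--phiki-dark) inside a prefers-color-scheme: dark media query.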

Transformers

Phiki already has a "transformer" API that lets you modify things at various stages in the highlighting process. This could in theory be used to add things like [!code focus] to focus a particular line in a block of code.

The transformer API is very rough at the moment though, and I'd like to make it as simple to use as possible. Thankfully I can make this sort of breaking change pre-1.0.
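
To illustrate the idea without tying it to the current API, here's a standalone sketch of the line-level work a "focus" transformer would need to do. The markFocusedLines() function and its return shape are hypothetical and are not part of Phiki's transformer interface.

```php
<?php

/**
 * Hypothetical pre-processing step (not Phiki's transformer API): finds
 * "[!code focus]" annotations in trailing comments, strips them from the
 * line and records which lines should be focused.
 *
 * @param list<string> $lines
 * @return list<array{text: string, focus: bool}>
 */
function markFocusedLines(array $lines): array
{
    // Matches the annotation plus a leading "//" or "#" comment marker.
    $pattern = '~\s*(?://|#)\s*\[!code focus\]~';
    $result = [];

    foreach ($lines as $line) {
        // $count tells us whether the annotation was present on this line.
        $text = preg_replace($pattern, '', $line, -1, $count);
        $result[] = ['text' => $text, 'focus' => $count > 0];
    }

    return $result;
}

$lines = markFocusedLines(['$a = 1;', '$b = 2; // [!code focus]']);
```

A renderer could then add a class to the focused lines and dim everything else with CSS.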

Performance

There are a few places in the code that have sub-optimal performance, such as parsing grammars and transforming RegEx patterns. Finding ways to make these faster is vital, as I want Phiki to be the fastest highlighter on the market.

Pitfalls

The current version of Phiki is my 4th attempt at writing a syntax highlighter similar to Shiki in PHP. I've probably spent anywhere from 50 to 100 hours working on the idea. Considering I've given up plenty of times before, finally getting to a place where it works and is near-perfect is not bad, I'd say.

Despite all of this hard work, Phiki still has a bit of a problem. It's not perfect.

TextMate grammars contain a bunch of regular expressions which determine what tokens are in a file and how they should be highlighted. The TextMate editor, and the subsequent vscode-textmate package, both use a RegEx engine called Oniguruma. PHP uses the PCRE2 engine.

Oniguruma is a very good RegEx engine and has support for a bunch of things. Some of those things aren't supported by PCRE2, which is the main reason why Phiki isn't perfect. In all of my testing, the main blocker is support for "variable-length lookbehinds". Here's an example:

(?<=^\S+)=

This pattern looks for a = character, but only matches if it is preceded by the start of the line followed by one or more non-whitespace characters. PCRE2 doesn't support this type of lookbehind, mainly because of the performance implications. My understanding is that the architecture and design decisions in PCRE2 would make adding this sort of lookbehind painful and the performance would be horrible.
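
You can see the failure from PHP directly. In this small sketch the sample string is made up, and the pattern is the one shown above:

```php
<?php

// Hypothetical sample line to run the pattern against.
$subject = 'foo=bar';

// PCRE2 refuses to compile the unbounded lookbehind, so preg_match()
// emits a warning (silenced here with @) and returns false rather
// than 0 (no match) or 1 (match).
$result = @preg_match('/(?<=^\S+)=/', $subject);

var_dump($result === false);
```

The important detail is that the pattern fails at compile time. It's not that the pattern runs and finds nothing; it never runs at all.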

Of the 200+ grammars that Phiki currently ships with, only 12 of them are using variable-length lookbehinds. Others were also using them but I managed to patch them out of the grammar files with some simpler and potentially less-accurate RegEx patterns.
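
As an illustration of the kind of trade-off involved, one way to approximate the example pattern in PCRE2 is with \K, which discards everything matched so far from the reported match. This is only a sketch of the general technique and is not one of the actual patches applied to Phiki's grammar files.

```php
<?php

// Approximation of (?<=^\S+)= using \K. The prefix is consumed and then
// dropped from the reported match, instead of being "looked behind" at.
$pattern = '/^\S+?\K=/m';

preg_match($pattern, 'foo=bar', $m, PREG_OFFSET_CAPTURE);

// $m[0] holds the matched text and its byte offset: ['=', 3].
[$text, $offset] = $m[0];
```

The catch is that the match now has to begin at the start of a line, so a tokenizer that resumes matching part-way through a line would never hit it. That's exactly the "less accurate" part.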

What I'm trying to figure out now is the best way to work around this problem. The way I see it, I've got two choices.

  1. Remove these grammars from Phiki and let users add them at their own risk.
  2. Write a wrapper around PCRE2 that adds support for variable-length lookbehinds, albeit with a performance penalty.

As a developer who enjoys tackling tough problems, my mind tells me that option 2 is the way to go. Effectively write a RegEx engine in pure PHP, learn some new things and eliminate the problem for good. On the other hand, the amount of time required to build said RegEx engine would be insane and probably not worth it in the long run.
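
To make option 2 a bit more concrete, here's a very rough sketch of the idea for the simplest case: a pattern made up of a single lookbehind followed by a main expression. The findWithLookbehind() helper is hypothetical, assumes neither pattern contains an unescaped / and ignores flags, captures and nested lookbehinds, all of which a real wrapper would have to handle.

```php
<?php

/**
 * Hypothetical helper (not part of Phiki): returns the first offset where
 * $main matches and the text before that offset ends with a match for
 * $behind, or null if there is no such offset.
 */
function findWithLookbehind(string $behind, string $main, string $subject): ?int
{
    $offset = 0;

    while ($offset <= strlen($subject)) {
        // Find the next candidate position for the main pattern.
        if (preg_match('/' . $main . '/', $subject, $m, PREG_OFFSET_CAPTURE, $offset) !== 1) {
            return null;
        }

        $start = $m[0][1];

        // Emulate the lookbehind: the text before the candidate must end
        // with something that matches $behind.
        if (preg_match('/(?:' . $behind . ')\z/', substr($subject, 0, $start)) === 1) {
            return $start;
        }

        // Rejected, so try again from the next byte.
        $offset = $start + 1;
    }

    return null;
}

var_dump(findWithLookbehind('^\S+', '=', 'foo=bar')); // int(3)
var_dump(findWithLookbehind('^\S+', '=', ' =bar'));   // NULL
```

Every candidate costs a second preg_match() call on a copy of the prefix, which is where the performance penalty comes from.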

I'd love to know what you think. Let me know on Twitter.

Sign off

Thanks for reading! I'd love it if you gave Phiki a go and provided some feedback on GitHub. There is still a ways to go before tagging v1.0.0, so all feedback is welcome.