Rebuilding Composer in Rust - Downloading files from the web


Welcome back to the series. If this is your first time here, I recommend starting from the beginning so you don't miss out on cool stuff.

In this post, I'm going to explain how composer require works at a very high level and then start to implement the foundations of my own require command.

So what happens when you run composer require? I'm glad you asked!

Let's say you want to install one of my (excellent) packages. You'd run a command similar to this.

composer require ryangjchandler/blade-cache-directive

The first important thing that happens is Composer tries to determine the "requirements" for the package you're attempting to install. This process involves figuring out whether the package actually exists in Composer's package repository and, if it does, which version is best to install in your project.

After it has figured those things out, it sends the necessary information over to an Installer class which handles actually downloading the package from the correct place, as well as installing any dependencies of the package you're trying to install.

That's a somewhat recursive process since dependencies of a package might also have dependencies, so that happens for each package that needs to be downloaded and installed.

Eventually it will have a list of all packages that need to be installed and where the packages need to be downloaded from.

The package files are downloaded (normally as a ZIP or a "tarball"), extracted into the vendor directory, the autoloader is regenerated and any Composer scripts are executed.

NOTE

This is a very high-level explanation of what Composer does. The code itself is actually a bit more spaghettified than you might imagine.

Spaghettified isn't actually a nice way to put it. I prefer to say that the process involves some side-quests.

It's going to take a few more posts before I get into the nitty-gritty of installing real packages. For now, I just want to focus on downloading a file, specifically an archive file, and extracting it into a directory somewhere.

To do this, I'm first going to need an HTTP client. There are many options in the Rust world, but in my opinion only a few are truly feature-complete. My personal choice is a package called reqwest.

I'm going to create a new package in the monorepo for this called crs-downloader.

cargo init crs-downloader --lib

INFO

Cargo automatically updates the members entry for the workspace when you create a new crate / package using the Cargo CLI.

The Rust ecosystem just feels so good to use.
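
For reference, the members entry in the workspace's Cargo.toml ends up looking something like this. This is a sketch, since the exact member list depends on the crates created earlier in the series and on the crate living at the workspace root:

[workspace]
members = [
    "crs-downloader",
    # ...the other crates from earlier posts
]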

To install the reqwest package, I'll also use Cargo.

cargo add reqwest --package crs-downloader --features blocking

I'm not writing asynchronous code, so I want to enable the blocking feature, which lets me make synchronous HTTP requests. At some point in the future I'll introduce async stuff, but I'm keeping it simple for now.
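
As a quick sanity check of what the blocking API looks like, here's a throwaway sketch. It isn't part of the crate, and the URL is just an example:

use reqwest::blocking;

fn main() -> Result<(), reqwest::Error> {
    // blocking::get is a convenience that builds a new Client internally
    // on every call, which is exactly why the Downloader below holds one.
    let body = blocking::get("https://httpbin.org/get")?.text()?;
    println!("{body}");
    Ok(())
}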

This new crate is going to be responsible for downloading a single archive from a given URL, then returning a value that represents where that file has been downloaded to on disk.

I'll start by creating a new struct called Downloader.

use reqwest::blocking::Client;

pub struct Downloader {
    client: Client,
}

impl Downloader {
    pub fn new() -> Result<Self, DownloadError> {
        Ok(Self {
            client: Client::builder().build().map_err(|_| DownloadError::FailedToCreateClient)?,
        })
    }
}

#[derive(Debug, Clone)]
pub enum DownloadError {
    FailedToCreateClient,
}

The only piece of state that the Downloader holds is an HTTP client. reqwest's Client maintains an internal connection pool, so creating a single one up front and reusing it is a little faster than building a new Client for each request.

Cool, so how can something be downloaded from a given URL? I'll need a method, say download, that takes a single url parameter.

An HTTP request needs to be made to that url, then the data that is returned needs to be written to a specific place on disk. I'm going to store things in a temporary directory so that they don't clutter up a user's directories.

Making that HTTP request is very simple with reqwest.

impl Downloader {
    // ...

    pub fn download(&self, url: &str) -> Result<DownloadedFile, DownloadError> {
        let response = match self.client.get(url).send() {
            Ok(response) => response,
            Err(_) => return Err(DownloadError::FailedToDownloadFile(url.to_string())),
        };

        todo!()
    }
}

Using the Client on the Downloader, I can make a GET request to the url and grab the response.

The .send() method returns a Result, meaning a value or an error. If something goes wrong during the request, I can return a simple DownloadError. There's no information in here apart from the url that was being requested – again, keeping things simple for now.
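
One thing worth noting: for the snippets in this post to compile, the DownloadError enum from earlier needs a few more variants. Collecting all of the ones used throughout the post:

#[derive(Debug, Clone)]
pub enum DownloadError {
    FailedToCreateClient,
    FailedToCreateTemporaryDirectory,
    FailedToDownloadFile(String),
    FailedToCreateDestinationFile(String),
    FailedToReadBytesFromResponse(String),
    FailedToCopyBytesToFile(String, String),
}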

To store the downloaded data on disk, I need to put it inside of a folder somewhere. Rust's standard library has a function for finding the machine's temporary directory (std::env::temp_dir()), but I want some more helpers, so I'm going to use the tempfile package to help me a little.

cargo add tempfile --package crs-downloader

This package provides a TempDir type that will create a temporary directory, and upon destruction of the TempDir value, delete that folder from the disk. This means the value needs to be "owned" by something. I'll opt to store it inside of the Downloader.

use tempfile::TempDir;

pub struct Downloader {
    client: Client,
    tempdir: TempDir,
}

impl Downloader {
    pub fn new() -> Result<Self, DownloadError> {
        Ok(Self {
            client: Client::builder().build().map_err(|_| DownloadError::FailedToCreateClient)?,
            tempdir: TempDir::new().map_err(|_| DownloadError::FailedToCreateTemporaryDirectory)?,
        })
    }
}

As long as the Downloader is kept "alive" in the application, the files that are downloaded will be available on disk.
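
To make that concrete, here's a tiny throwaway sketch of the drop behaviour, separate from the crate itself:

use tempfile::TempDir;

fn main() -> std::io::Result<()> {
    let dir = TempDir::new()?;          // the directory exists on disk now
    let path = dir.path().to_path_buf();
    assert!(path.exists());

    drop(dir);                          // TempDir's Drop deletes the directory

    assert!(!path.exists());
    Ok(())
}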

Taking the data from the response and storing it inside of a file is pretty straightforward too.

use std::fs::File;
use std::io::copy;

impl Downloader {
    // ...

    pub fn download(&self, url: &str) -> Result<DownloadedFile, DownloadError> {
        // ...

        let path = self.tempdir.path().join(
            slugify_url(url)
        );

        let mut file = File::create(&path)
            .map_err(|_| DownloadError::FailedToCreateDestinationFile(path.to_string_lossy().to_string()))?;

        // Read the body as raw bytes rather than text so that binary files
        // (ZIPs, tarballs, images) aren't corrupted by UTF-8 conversion.
        let bytes = response.bytes()
            .map_err(|_| DownloadError::FailedToReadBytesFromResponse(url.to_string()))?;

        copy(&mut bytes.as_ref(), &mut file)
            .map_err(|_| DownloadError::FailedToCopyBytesToFile(url.to_string(), path.to_string_lossy().to_string()))?;

        Ok(DownloadedFile { path })
    }
}

The code looks kind of crowded because there's manual error handling going on all over the place. I could introduce some conversion magic with From implementations and the ? operator, but I'm keeping things simple.
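
Two pieces are used above but never shown: the DownloadedFile return type and the slugify_url helper. Based on how they're used, minimal versions might look like this. slugify_url in particular is my guess, not necessarily the post's actual implementation:

use std::path::PathBuf;

// The value handed back to callers. For now it only carries the
// location of the downloaded file on disk.
pub struct DownloadedFile {
    pub path: PathBuf,
}

// Hypothetical helper: make a URL safe to use as a file name by
// replacing everything that isn't alphanumeric or a dot with a dash.
fn slugify_url(url: &str) -> String {
    url.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '.' { c } else { '-' })
        .collect()
}

Keeping dots means a downloaded file retains an extension like .zip or .png, which will matter once extraction comes into play.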

I've written all of this code without any tests so far. Now for rapid development, that's maybe not a bad thing, but I do want to make sure that this actually works.

#[cfg(test)]
mod tests {
    use super::Downloader;

    #[test]
    fn it_can_download_a_file() {
        let downloader = Downloader::new().unwrap();
        let downloaded_file = downloader.download("https://www.rust-lang.org/logos/rust-logo-512x512.png").unwrap();

        assert!(downloaded_file.path.exists());
    }
}

This is a very simple test to make sure that the downloader can actually download a file. Hardcoding a URL feels like it might cause issues in the future if that URL goes missing, but it'll do for now.

I also want a test that ensures the temporary directory and the files are tidied up too.

#[test]
fn it_tidies_up_after_itself() {
    let downloader = Downloader::new().unwrap();
    let downloaded_file = downloader.download("https://www.rust-lang.org/logos/rust-logo-512x512.png").unwrap();

    assert!(downloaded_file.path.exists());

    drop(downloader);
    
    assert!(!downloaded_file.path.exists());
}
Running the tests shows both passing:

running 2 tests
test tests::it_can_download_a_file ... ok
test tests::it_tidies_up_after_itself ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.14s

Perfect! Both tests are passing and I can now download files and store them somewhere on disk.
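
For completeness, using the crate from elsewhere in the workspace would look something like this, assuming crs-downloader is added as a path dependency and with a made-up URL:

use crs_downloader::Downloader;

fn main() -> Result<(), crs_downloader::DownloadError> {
    let downloader = Downloader::new()?;
    let file = downloader.download("https://example.com/archive.zip")?;

    println!("Downloaded to {}", file.path.display());

    // The temporary directory (and the file inside it) is removed
    // when `downloader` is dropped at the end of main.
    Ok(())
}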

In the next part of the series I'll start implementing the actual package resolving stuff. That will include using the real Packagist API to grab information about a package, where it can be downloaded and more.

Enjoyed this post or found it useful? Please consider sharing it on Twitter.