Sometimes there are things that start very small but quickly turn into massive projects. I had a simple idea for a Mastodon app. However, the existing PHP implementations turned out to be incomplete, outdated, or otherwise of subpar quality. So, I decided to build one myself.
I wrote about this earlier in "Smart Generics in PHP". There, I established a solid foundation and implemented the first few API methods. So far, so good.
However, the Mastodon documentation documents around 200 different API methods and about 65 different entities. If I were to implement all of those manually...
I really wanted to create a complete implementation, so I needed to come up with something else.
I searched, but unfortunately there's no complete OpenAPI specification available for the Mastodon API. An OpenAPI specification is a machine-readable file that describes an entire API. From it you can automatically generate (large parts of) the code for both client and server, as well as nice user documentation.
When building an API, it's always useful to start with the specification and keep it as the basis for everything. Doing it afterward and keeping it up to date would ultimately take much more time and effort.
Specs Based on the Test Suite?
During my search, I found an interesting alternative approach: AppMap. This is a tool to gather runtime data while the code is being executed. This data can be later analyzed and visualized or converted into other formats. The idea is that by running the Mastodon test suite with the help of AppMap, you could automatically generate an OpenAPI spec.
Unfortunately, when running the Mastodon test suite with AppMap enabled, I encountered tests that failed suddenly, while they all passed without AppMap. I contacted the people behind AppMap via their Slack channel, and they confirmed that it's a bug in AppMap, which hasn't been resolved yet.
I was able to generate a sort of API spec based on the tests that passed, but it was far from complete and therefore unusable for my purpose. It was also questionable how useful this approach could be even without the AppMap bug: even with 100% test coverage, metadata such as documentation would still be missing, and you need that to generate a truly good and comprehensive specification.
A Markdown Parser
So, I was back to square one. The only option I saw now was parsing the documentation markdown. At first glance, it seemed quite "parse-able": the structure appeared uniform enough, and all the necessary data was present.
For an initial proof of concept of the parser, I turned to an old friend: Perl. It's not a language I use often these days, but Perl (combined with some Regexp wizardry) is well-suited for quickly parsing and manipulating textual data, better than PHP, I'd say. In a short time, I hacked together a few quick and dirty scripts that parsed all relevant markdown files and generated a neat JSON structure of entities and methods.
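To give an idea of what such a parser does, here is a minimal sketch. It is written in PHP rather than Perl to match the rest of this post, so it is not the original script. The markdown excerpt is a simplified, hypothetical fragment in the style of an entity page, and the function name `parseEntityAttributes` is made up for this illustration.

```php
<?php

// Hypothetical, simplified excerpt in the style of an entity documentation page.
$markdown = <<<'MD'
### `id` {#id}

**Description:** The account id.\
**Type:** String

### `username` {#username}

**Description:** The username of the account.\
**Type:** String
MD;

/**
 * Extracts attribute names, descriptions and types from the excerpt format above.
 *
 * @return array<string, array{description: ?string, type: ?string}>
 */
function parseEntityAttributes(string $markdown): array
{
    $attributes = [];
    $current = null;

    foreach (preg_split('/\R/', $markdown) as $line) {
        // A level-3 heading with a backticked name starts a new attribute.
        if (preg_match('/^###\s+`([^`]+)`/', $line, $m)) {
            $current = $m[1];
            $attributes[$current] = ['description' => null, 'type' => null];
            continue;
        }

        // "**Label:** value" lines fill in the current attribute;
        // a trailing backslash (markdown line break) is dropped.
        if ($current !== null
            && preg_match('/^\*\*(Description|Type):\*\*\s*(.*?)\\\\?$/', $line, $m)) {
            $attributes[$current][strtolower($m[1])] = $m[2];
        }
    }

    return $attributes;
}

// The JSON structure that later feeds the code generator.
$json = json_encode(parseEntityAttributes($markdown), JSON_PRETTY_PRINT);
```

A line-by-line approach like this works only as long as the documentation sticks to one layout. Every deviation needs its own rule.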
PHP Code Generation
I had never done code generation before. Well, I hadn't gone beyond generating empty base classes that you then fill in manually. Code generation isn't that different from HTML generation, but you do face some domain-specific challenges. The syntax of PHP code is, of course, stricter and more complex than that of HTML, especially when your code also has to pass PHPStan checks. For example, imports need to be unique: if you want to import both \Foo\Request and \Bar\Request, you need to create an alias for one of them and refer to it in the rest of your code:
    use Foo\Request;
    use Bar\Request as BarRequest;

    // ...

    $request = new BarRequest();
Naturally, this needs to work correctly when the list of imports is dynamic (and sometimes quite lengthy). When doing code generation, you need to think about your code at a higher level of abstraction. It was quite a learning experience, but in the end I had a nice "framework" for code generation.
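One way to handle dynamic imports is to let the generator register every class it references and get back the local name to use. The sketch below illustrates that idea only. `ImportCollector` and its methods are hypothetical names, not the code from my library, and the aliasing rule (prefixing namespace segments) is just one possible choice.

```php
<?php

/**
 * Collects the classes a generated file needs and hands out unique short
 * names, aliasing on collision. A hypothetical helper for illustration.
 */
final class ImportCollector
{
    /** @var array<string, string> fully qualified class name => local name */
    private array $imports = [];

    /** Registers a class and returns the name to use in the generated code. */
    public function add(string $fqcn): string
    {
        $fqcn = ltrim($fqcn, '\\');

        if (isset($this->imports[$fqcn])) {
            return $this->imports[$fqcn];
        }

        $parts = explode('\\', $fqcn);
        $local = array_pop($parts);

        // On a collision, prefix namespace segments from right to left:
        // Bar\Request becomes BarRequest.
        while (in_array($local, $this->imports, true) && $parts !== []) {
            $local = array_pop($parts) . $local;
        }

        // Last resort: append a counter until the name is free.
        for ($i = 2, $base = $local; in_array($local, $this->imports, true); $i++) {
            $local = $base . $i;
        }

        return $this->imports[$fqcn] = $local;
    }

    /** Renders the use statements for the top of the generated file. */
    public function render(): string
    {
        $lines = [];
        foreach ($this->imports as $fqcn => $local) {
            $pos = strrpos($fqcn, '\\');
            $short = $pos === false ? $fqcn : substr($fqcn, $pos + 1);
            $lines[] = $local === $short ? "use $fqcn;" : "use $fqcn as $local;";
        }

        return implode("\n", $lines);
    }
}

$imports = new ImportCollector();
$first = $imports->add('Foo\Request');  // local name without alias
$second = $imports->add('Bar\Request'); // collides, so it gets an alias
$header = $imports->render();
```

The rest of the generator uses only the names returned by `add()`, so the body of the generated class and its `use` statements cannot drift apart.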
I decided to generate some additional convenience: a collection of method proxies that you incorporate into the client. Those proxies do nothing more than instantiate the correct request and send it through the client. But this way, you don't have to manually search for the right request classes, because your IDE can assist you. It does introduce some code duplication, but this duplication actually adds value.
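A proxy of this kind could look roughly like the sketch below. All names here (`Request`, `GetAccountRequest`, `AccountMethods`, `Client`) are hypothetical and chosen for the example. Using a trait to mix the proxies into the client is an assumption too, and the client is a stub that does not perform any HTTP call.

```php
<?php

/** Hypothetical request contract; the real library's interfaces may differ. */
interface Request
{
    public function path(): string;
}

/** One request class per API method, as a code generator would emit it. */
final class GetAccountRequest implements Request
{
    private string $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function path(): string
    {
        return '/api/v1/accounts/' . rawurlencode($this->id);
    }
}

/** Generated proxy methods, mixed into the client as a trait (an assumption). */
trait AccountMethods
{
    abstract public function send(Request $request): string;

    /** Does nothing more than build the right request and send it. */
    public function getAccount(string $id): string
    {
        return $this->send(new GetAccountRequest($id));
    }
}

final class Client
{
    use AccountMethods;

    public function send(Request $request): string
    {
        // A real client would perform an HTTP call here; this stub only
        // reports which endpoint would be hit.
        return 'GET ' . $request->path();
    }
}

$client = new Client();
$result = $client->getAccount('42');
```

With this in place, typing `$client->` in an IDE lists `getAccount()` directly, so there is no need to know that `GetAccountRequest` exists.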
And now for some testing...
By this point I had been at it for a few weeks, and then the tedious work began: testing. The Mastodon documentation is written for and by people, so not everything is described in exactly the same generic way. There were quite a few inconsistencies, edge cases, custom parameters, etc., causing my JSON (and as a result my code) to not always be correct and/or complete. My first parsing attempt got maybe 80% right, which is not bad. But you know how it goes: that remaining 20% took up about 80% of the time.
I basically had to go through every method and entity, check it for completeness and correctness, and adjust my parser and/or code generation each time I encountered an issue.
For the API entities, I could partially automate that. Almost all entities have a piece of example JSON in the documentation, and I could use that to automatically generate unit tests with my code generator. This allowed me to spot and fix quite a few issues, and as a bonus my test coverage improved significantly.
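The idea can be sketched as a function that turns example JSON into the source of a PHPUnit test. This is a simplified illustration, not my actual generator. The function name `generateEntityTest`, the `fromArray()` factory on the entity class, and the example values are all assumptions.

```php
<?php

/**
 * Turns the example JSON of an entity into the source of a PHPUnit test.
 * The entity class and its fromArray() factory are assumed to exist.
 */
function generateEntityTest(string $entityClass, string $exampleJson): string
{
    $data = json_decode($exampleJson, true, 512, JSON_THROW_ON_ERROR);

    $assertions = [];
    foreach ($data as $property => $value) {
        // Only scalar fields are checked; nested entities need their own tests.
        if (is_scalar($value)) {
            $assertions[] = sprintf(
                '        self::assertSame(%s, $entity->%s);',
                var_export($value, true),
                $property
            );
        }
    }

    // %1$s = entity class, %2$s = example JSON as a PHP string literal,
    // %3$s = the generated assertion lines.
    return sprintf(
        "final class %1\$sTest extends \\PHPUnit\\Framework\\TestCase\n{\n"
        . "    public function testFromExample(): void\n    {\n"
        . "        \$entity = %1\$s::fromArray(json_decode(%2\$s, true));\n"
        . "%3\$s\n    }\n}\n",
        $entityClass,
        var_export($exampleJson, true),
        implode("\n", $assertions)
    );
}

// Made-up example values in the shape of an account entity.
$example = '{"id": "1", "username": "alice", "bot": false, "fields": []}';
$source = generateEntityTest('Account', $example);
```

The generated test feeds the documented example through the generated entity class. A missing or mistyped property then shows up as a failing test instead of going unnoticed.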
But that still left a lot of manual work. It was quite a grind that took a lot of time. My parser got better and better, but I found a few inconsistencies in the documentation that I couldn't work around in my code. So, I eventually forked the documentation, made the necessary changes myself, and created a PR for it.
Finally: Version 1.0! 🎉
When I finally reached version 1.0 after a few pre-releases, I popped a bottle to celebrate. All in all, it has been quite a challenging and time-consuming project. I hadn't really expected this when I decided to "just" build an API client. But it was an educational experience, and I'm proud of the end result. I hope it will be a useful library that others will enjoy using!