Code generator: DTO phase, status

This article is part of a series about the code generator. Other posts in this series can be found here!

Let's see where we are with this code generator thingy.

I split the code generation into phases. The first one generates the Dto classes based on the OpenAPI yaml file; the second one generates test cases for the Dto classes.

The following sections discuss what the code generator can do at the moment.

Configuration file

The code generator is configured by a configuration file in JSON format. I chose this because the command line API can accept a maximum of eight parameters. It is bonkers..., but increasing this limit has already been requested.

From the command line, the generator requires only the path of the configuration file, and from that point on it does the job.

Since the configuration is a JSON file, with all its freedom in terms of schema, working with it can be a pain in the rear. I created a JSON schema to support editing these files. The schema needs to be referenced in the configuration file, and the IDE (at least the JetBrains tools) can pick the schema up and provide intellisense and validation. For real examples, please take a look at the tests.

{
    "$schema": "https://raw.githubusercontent.com/EncyclopediaGalactica/RestApiSdkGenerator/main/configuration.schema.json",
    "openapi_specification_path": "Dto/PreProcessing/FileName/filename_preprocessing_should.yaml",
    "target_directory": "Dto/PreProcessing/FileName",
    "solution_base_namespace": "EncyclopediaGalactica.RestApiSdkGenerator.Generator.Tests.Unit.DtoInfoCollection",
    "skip_dto_generating": true,
    "skip_dto_tests_generating": true,
    "skip_request_model_generating": true,
    "skip_request_model_tests_generating": true,
    "test_mode": true
}

What I need to decide here is how to validate the configuration. When the generator parses the configuration file, it is deserialized into a POCO and I run FluentValidation against it. The thing I need to consider here is that Json.NET can use the schema for validation; however, I don't know how friendly its error messages are. I'm really satisfied with FluentValidation and it is very easy to integrate. Performance is not a concern (we are talking about a code generator being run once in a while, or most frequently in a CI pipeline); the real concerns are error handling and informing the user with well-detailed, easy-to-action messages.
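To give a feel for the FluentValidation approach, here is a minimal sketch. The POCO and its property names are hypothetical (they mirror the JSON keys above, but the real class may look different); only the FluentValidation API usage is real.

```csharp
using FluentValidation;

// Hypothetical POCO the JSON configuration is deserialized into;
// the real property names in the generator may differ.
public class GeneratorConfiguration
{
    public string? OpenApiSpecificationPath { get; init; }
    public string? TargetDirectory { get; init; }
    public string? SolutionBaseNamespace { get; init; }
}

// Validator producing user-facing, actionable error messages.
public class GeneratorConfigurationValidator : AbstractValidator<GeneratorConfiguration>
{
    public GeneratorConfigurationValidator()
    {
        RuleFor(c => c.OpenApiSpecificationPath)
            .NotEmpty()
            .WithMessage("openapi_specification_path must point to the OpenAPI yaml file.");

        RuleFor(c => c.TargetDirectory)
            .NotEmpty()
            .WithMessage("target_directory is required.");

        RuleFor(c => c.SolutionBaseNamespace)
            .NotEmpty()
            .WithMessage("solution_base_namespace is required.");
    }
}
```

The nice part is that every rule carries its own message, so the user gets told exactly which key is wrong and why, instead of a generic schema-violation dump.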

The generator can collect all the necessary information for generating a valid C# file using Handlebars templates. I implemented a few defensive measures, like checking whether the letter after each dot in the namespace is a capital letter. If not, the code modifies the namespace string. The same happens with the property names: they should start with a capital letter, however in the OpenAPI yaml file they start with a lowercase letter.
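The normalization described above boils down to simple string fixes. A sketch (hypothetical helper, not the generator's actual code):

```csharp
using System;
using System.Linq;

// Sketch of the defensive naming fixes: capitalize each namespace segment
// and each property name so the generated C# follows the usual conventions.
public static class NamingDefense
{
    // "encyclopediaGalactica.generator.tests" -> "EncyclopediaGalactica.Generator.Tests"
    public static string NormalizeNamespace(string ns) =>
        string.Join(".", ns.Split('.').Select(Capitalize));

    // "firstName" -> "FirstName"
    public static string Capitalize(string identifier) =>
        string.IsNullOrEmpty(identifier)
            ? identifier
            : char.ToUpperInvariant(identifier[0]) + identifier[1..];
}
```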

Not all the possible problems are covered, because there is a dilemma here. All rules like this could be exported into a yaml validator. Or the generator could throw an exception and stop the execution. Or the generator could spit out warnings to the command line. There is a decent amount of things to consider here. As I move forward I'll figure out what works for me.

There are things I don't check yet. Reserved words...

Generating Dto classes

Not a lot to say here.

Testing

I have two test levels. The first one concentrates on how the code generator transforms the information from the yaml file into a data structure for compiling the template. The second one is about the real code generation. There are a lot of things to be done here and a lot of testing opportunities. When I worked at IBM on their generator, I missed this kind of testing. We always tested generator changes by running the whole generating process. It is Java, it uses Maven, and the whole thing is just fucking slow!

A consequence of the above is that the properties storing the data for template compiling are public (get-only, no setters). From a code logic point of view they could, or rather should, be private, but that would reduce testability.

The other decision I made is to make the phases individually disableable. As a result the execution is limited and fast. Let me show you an example: when I test how well the code generator processes the Dto-related information, I don't need the generation phase to run, so it is disabled.
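A minimal sketch of how the skip flags from the configuration might gate the phases (hypothetical structure and names, not the generator's actual control flow):

```csharp
// Hypothetical configuration flags mirroring the JSON keys shown earlier.
public record PhaseFlags(bool SkipDtoGenerating, bool SkipDtoTestsGenerating);

public class GeneratorRunner
{
    // Each phase runs only when its skip flag is false, so a test that
    // targets the information-collecting step pays nothing for generation.
    public void Run(PhaseFlags flags)
    {
        if (!flags.SkipDtoGenerating)
            GenerateDtos();

        if (!flags.SkipDtoTestsGenerating)
            GenerateDtoTests();
    }

    private void GenerateDtos() { /* ... */ }
    private void GenerateDtoTests() { /* ... */ }
}
```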

I made the decision that every single piece of the code generation process will be tested. I don't like it when we have a few tests, but they cover hundreds of small things and you have to dig in to figure out why they are failing. The consequence of this decision is that I'll have a lot of tests and, highly probably, duplicates.

The same granularity is difficult to achieve when I test the result. In this case the test code reads the reference file and the result file and compares them line by line.
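The line-by-line comparison could look something like this (a hypothetical helper, not the actual test code): it reports the first line where the generated file diverges from the reference, which makes a failing test immediately actionable.

```csharp
using System;
using System.IO;

// Sketch of the reference-vs-result comparison: returns the 1-based number
// of the first line that differs, or null when the files are identical.
public static class FileComparer
{
    public static int? FirstDifferingLine(string referencePath, string resultPath)
    {
        string[] reference = File.ReadAllLines(referencePath);
        string[] result = File.ReadAllLines(resultPath);

        int max = Math.Max(reference.Length, result.Length);
        for (int i = 0; i < max; i++)
        {
            string left = i < reference.Length ? reference[i] : "<missing>";
            string right = i < result.Length ? result[i] : "<missing>";
            if (left != right)
                return i + 1;
        }
        return null; // files are identical
    }
}
```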

Code base and "architecture"

This is something I've been considering... I mean, if I just put all the code into one ugly big class, it will work just fine. If I start chopping the codebase into smaller pieces, I might create a problem for myself, because I won't gain anything extra, but I'll waste time making the code look better.

The third aspect is that I do not plan to generate anything other than C# code.

But...

Organizing the code in a meaningful way is a great learning opportunity. For example:

  • pulling up the execution control (making it abstract)

  • separating language properties (for example, dealing with reserved words in C#) from how these are processed (generalization, using the composite pattern or similar)

  • generating code for OOP languages, like C# and Java, and scripting languages, like JavaScript

All of this might improve testability, but I don't see that clearly yet.