2021.03.26

Research

pysen is the new sempai: PFN’s in-house tool for configuring and running Python linters and formatters

Yuki Igarashi

Engineer

We recently released “pysen”, a Python linter/formatter configuration tool.

We use this tool mainly to centralize Python linter/formatter configurations so that anyone can easily set up their tool environment. We aim to centralize and share the know-how for setting up tools “as code.” Without a solution like pysen, such information would be scattered across teams. Since its development began in April 2020, pysen has been installed in over 100 private repositories within PFN.

pysen


Above: the number of pysen configuration files over time.
Below: the number of Organizations containing at least one repository with at least one pysen configuration file, over time.

Please keep in mind that we do not intend this release to shift the development of pysen to an OSS platform. Our objective is to allow developers, whether or not they are internal members of PFN, to use pysen without our private infrastructure. As such, we are only mirroring the source code from our main repository on our private server. As a rule, we are not open to pull requests.

In this article, we will talk about the background behind the development of pysen.


In collaborative software development, agreeing on development rules is crucial to maintaining a smooth development cycle. Coding style is one of the essential rules. By establishing a clear coding style, we can eliminate developer-dependent code differences and improve maintainability. We can also reduce the code review comments that only suggest insubstantial style fixes, and devote our time to more critical issues such as finding bugs or designing interfaces. In other words, we are circumventing the law of triviality.

Python is one of the most popular languages in PFN. PEP 8 is the best-known coding style in the Python ecosystem. However, since the PEP 8 rules alone are insufficient to unify the code format, we add other rules in conjunction with PEP 8.

Since it is not realistic to have developers check by hand whether their code satisfies the rules, we run tools in each developer’s environment or on our CI server to lint or format the code. The popular tools in PFN as of 2021 are black (formatter), isort (import sorter), flake8 (style linter), and mypy (static type checker).

Developers can set up a reasonably comfortable Python development environment by combining these tools. However, we often come across the following problems.

  • Developers need to write detailed configuration files.
    • Some tools use different filenames (flake8 uses setup.cfg, black uses pyproject.toml, etc.)
    • We cannot honor the DRY principle when we repeat the same setting in different configuration files (e.g., line width, excluded files, etc.)
  • We need to adjust the settings so that tools do not violate each other’s rules.
  • Different tools have different CLI options, so we often devise ad-hoc scripts to call those commands.
    • For example, developers maintain scripts like lint.sh, format.sh, or an equivalent Makefile.
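
To make the duplication concrete, here is a minimal Python sketch of the idea of keeping one value and deriving each tool’s configuration from it. It is only an illustration of the problem, not how pysen is implemented; the function names and the chosen line length are hypothetical. The option names themselves (line-length for black, line_length and profile for isort, max-line-length and extend-ignore for flake8) are the ones those tools read.

```python
# Illustrative sketch only: this is NOT pysen's implementation.
# The function names below are hypothetical.

LINE_LENGTH = 88  # single source of truth (88 is black's default)


def render_pyproject_toml(line_length: int) -> str:
    """Render settings for black and isort, which both read pyproject.toml."""
    return (
        "[tool.black]\n"
        f"line-length = {line_length}\n"
        "\n"
        "[tool.isort]\n"
        # The "black" profile keeps isort's import layout compatible with black.
        'profile = "black"\n'
        f"line_length = {line_length}\n"
    )


def render_setup_cfg(line_length: int) -> str:
    """Render settings for flake8, which reads setup.cfg."""
    return (
        "[flake8]\n"
        f"max-line-length = {line_length}\n"
        # E203 (whitespace before ':') conflicts with how black formats slices.
        "extend-ignore = E203\n"
    )


if __name__ == "__main__":
    print(render_pyproject_toml(LINE_LENGTH))
    print(render_setup_cfg(LINE_LENGTH))
```

The same number ends up under three different keys in two different files, and one of the tools needs an extra rule disabled to coexist with another. This is the kind of detail that each team otherwise has to rediscover.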

Resolving these issues requires a certain amount of knowledge, so there is a learning curve for the developer introducing these tools. Each developer or team accumulates that knowledge on their own. Unless that knowledge is shared appropriately, multiple teams end up reinventing the wheel.

Furthermore, as settings become complicated, we come to rely on a handful of experienced developers who know enough to write the configurations. Inexperienced developers who just want to get started end up copying the settings from another project that works. This practice does not contribute to maintainability.

Indeed, we have experienced cases where the set of tools used differed from repository to repository. We have also had instances where the way linters and formatters were executed varied across repositories. The variation burdens teams with maintaining the scripts and becomes a barrier for members who work on multiple projects. There were even projects that gave up setting up linters altogether because the configuration was too complicated.

pysen aims to tackle these problems. By choosing pysen, developers can immediately start developing in an environment with the standard tools and configurations widely used within PFN. They do not need to understand the detailed configuration know-how described above, because company-wide know-how is centralized in pysen “as code.” pysen also acts as a lightweight task runner that calls the tools appropriately, relieving teams of the burden of maintaining scripts like lint.sh.

Let’s look at an actual configuration example. To set up all the tools we mentioned earlier (black, flake8, isort, and mypy), we add the following section to pyproject.toml.

[tool.pysen]
version = "0.9"

[tool.pysen.lint]
enable_black = true
enable_flake8 = true
enable_isort = true
enable_mypy = true
[[tool.pysen.lint.mypy_targets]]
paths = ["."]

When you execute pysen run lint, pysen calls each tool with the appropriate options and reports whether your code satisfies the coding style.
[Screenshot: output of pysen run lint]

If you have violations, you will see the following.
[Screenshot: output of pysen run lint when violations are found]

pysen run format attempts to fix the violations automatically.
[Screenshot: output of pysen run format]
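
If you want to call the two commands from a script, for example in CI, a thin wrapper is enough. The following is a minimal sketch under a few assumptions: the helper names are hypothetical, pysen is expected to be on PATH, and we assume that pysen exits with a non-zero status when it finds violations, as linters conventionally do.

```python
import shutil
import subprocess
from typing import List


def pysen_command(target: str) -> List[str]:
    """Build the command line for `pysen run lint` or `pysen run format`."""
    if target not in ("lint", "format"):
        raise ValueError(f"unsupported target: {target!r}")
    return ["pysen", "run", target]


def run_pysen(target: str) -> int:
    """Run pysen and return its exit status (127 if pysen is not installed)."""
    command = pysen_command(target)
    if shutil.which(command[0]) is None:
        print("pysen is not installed in this environment")
        return 127
    # check=False: we want the exit status back instead of an exception.
    return subprocess.run(command, check=False).returncode


def format_then_lint() -> int:
    """Apply automatic fixes first, then report whatever is still violated."""
    run_pysen("format")
    return run_pysen("lint")
```

A CI job could call format_then_lint() and fail the build when the returned status is not zero. The wrapper does not need to know which tools are enabled, since that decision stays in pyproject.toml.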

In real life, teams need to customize their configurations (e.g., the target Python version). When you define the essential configuration in the pysen section of pyproject.toml, the setting is automatically propagated to all the relevant tools.
For example, if you wish to include or exclude specific files from the inspection, you add the following setting.

[tool.pysen.lint.source]
includes = ["."]
include_globs = ["**/*.template"]
excludes = ["third_party/"]
exclude_globs = ["**/*_grpc.py"]

Some teams use not only the Python tools but also YAML or C/C++ formatters. pysen has a plugin feature that allows developers to extend the commands with arbitrary custom tools.
Custom plugins are distributable as Python packages, and PFN centralizes the packages in a repository called pysen-plugins. The know-how built up within teams accumulates in the form of plugins, and other teams benefit from reusing them.

For example, if you install the pysen-plugins package and add the following config to pyproject.toml, pysen will inspect files ending with .hxx with clang-format.

[tool.pysen.plugin.clang_format]
function = "pysen_plugins::clang_format"

[tool.pysen.plugin.clang_format.config]
extensions = [".hxx"]

Conclusion

Preferred Networks encourages the 20% rule, under which employees may allot part of their time to projects that they think will benefit the company. pysen was developed under this rule, and members from different projects collaborated to develop and promote the tool within PFN.

We will continue to develop pysen and ship our release from the following URL:

We are now accepting applicants for our 2021 Summer Domestic Internship Program!
https://www.preferred.jp/en/news/internship2021/

Footnote:

pysen is a pun on “paisen”, Japanese slang for “sempai”.
