Creating a Python project

In order to better test software, be it a simple API or a full UI-based website, one must also know how to build such software. In this first post, we will explore, step by step, what is needed to start an open-source Python project based on the Flask web framework, host it on GitHub, and run its builds on CircleCI.


For this post, the source code is on GitHub under this hash.

PyCharm

One can’t code without a good editor. For Python, I couldn’t find anything better than PyCharm. Besides an interface with all the features I use most (autocompletion, method renaming, and hotkey navigation between files), it also checks PEP8 compliance on the fly.

Here’s the main screen for our project:

[screenshot: PyCharm main screen of the project]

License, Authors and Readme files

Before delving into environment and code complexities, let’s start with the License, Authors and Readme files. Although they’re easy to create and maintain, they sometimes get forgotten, and there are several projects out there without them. Best of all, these tips are not Python-exclusive 🙂

So, first of all, which open source license to use? This is best answered by the Choose a License website, which walks you through the differences between them. For me, the MIT license will do, which means you can do whatever you want with my code, as long as you’re polite enough to say where you copied it from and don’t blame me when things explode. So, copy the license text from the website into your License file and you’re done.

The Authors file is even easier: put your name and email there so people can find you in case of questions, or to buy you a beer.

The Readme file is the one that shows the face of your project to everyone. To make it fancy, write it using Markdown syntax. PyCharm has an excellent plugin to edit Markdown files and preview them on the same screen, as can be seen below:

[screenshot: PyCharm Markdown plugin with side-by-side preview]

Environment Setup

Although Python is multi-platform, some of its libraries may not be. To avoid surprises, it is best to develop and run your code entirely inside a Linux-based environment. For those of us who prefer to edit files on Windows, have no fear! Nowadays, editing your files on Windows while running your software on Linux is as easy as ever with a Vagrant-based VM. When installing Vagrant, also install VirtualBox and OpenSSH.

Our Vagrantfile sits in the vagrant folder inside the project root, so the only line we need to keep the project files synced into the VM is:

config.vm.synced_folder "../", "/home/vagrant/"

To start the project, I simply go to the vagrant folder:

> cd YourApp/vagrant
> vagrant up && vagrant ssh

To make my life easier, I also found it useful to group the shell installation steps into a bootstrap.sh file, which both Vagrant and anyone who wants to set up the project manually can use. Mine is ready for Python 3.6 and can be found on GitHub.
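
As a rough sketch, assuming an Ubuntu-based box (package names and paths are illustrative; the real file is the one on GitHub):

#!/usr/bin/env bash
# Illustrative bootstrap: install Python 3.6 and the project requirements.
set -e

sudo apt-get update
sudo apt-get install -y python3.6 python3-pip

# Vagrant syncs the project root into the VM's home folder
pip3 install -r /home/vagrant/requirements.txt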

Next, Python packages are expected to have a setup.py file that instructs others how to install, run, and test them. Another common recommendation is to use virtualenv, which we will skip thanks to our Vagrant-based environment. More information about Python package installation can be found here.

Our setup.py will use the License and Readme files we just created, as well as a requirements.txt file: a plain text file listing all the packages and versions we require. Both setup.py and requirements.txt can also be found on GitHub.
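
A minimal sketch of such a setup.py, reusing the Readme and requirements.txt we just discussed (name and metadata are placeholders):

from setuptools import find_packages, setup

# Reuse the Readme as the long description and requirements.txt as the
# canonical list of dependencies.
with open('README.md') as readme:
    long_description = readme.read()

with open('requirements.txt') as requirements:
    install_requires = requirements.read().splitlines()

setup(
    name='your_app',
    version='0.1.0',
    description='A Flask-based example application',
    long_description=long_description,
    author='Your Name',
    author_email='your@email.com',
    license='MIT',
    packages=find_packages(),
    install_requires=install_requires,
)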

Finally, we have to tell CircleCI where to look for tests and which Python version to use. A simple YAML file is all that’s needed. With that, you are also able to put a nice badge of the build status on your Readme file 🙂
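
As a sketch, assuming CircleCI 2.0’s .circleci/config.yml format (the real file is in the repository):

version: 2
jobs:
  build:
    docker:
      - image: circleci/python:3.6
    steps:
      - checkout
      - run: pip install --user -r requirements.txt
      - run: python -m unittest discover -v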

[screenshot: CircleCI build badge on the Readme]

YourApp module

With the environment ready, we can start thinking about the module itself. A module in Python is any folder with an __init__.py file in it, but a Flask application deserves a little more structure.

[screenshot: YourApp project structure]
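
Reconstructed from the descriptions below, the layout is roughly:

YourApp/
    your_app/
        __init__.py
        config.py
        createdb.py
        routes.py
        static/
        templates/
            404.html
        tests/
            __init__.py
    vagrant/
        Vagrantfile
    requirements.txt
    runserver.py
    setup.py
    License, Authors, Readme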

The static folder contains the CSS and JS files that our future application will serve. For now, empty files are placed there.

In a similar way, the templates folder holds the Jinja templates we will use. As we are just starting a REST-based application, let’s keep only a 404.html there.

The tests folder is a package in itself (note the __init__.py there) and contains all the test files in the project. For now, we only have end-to-end tests, explained in the next section.

Taking inspiration from other Flask posts here and here, our __init__.py initializes the Flask app and the database, as follows:

[code screenshot: your_app/__init__.py]
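
A sketch of that initialization, assuming Flask-SQLAlchemy for the database layer:

from flask import Flask, render_template
from flask_sqlalchemy import SQLAlchemy

# Create the Flask app and load the settings from config.py
app = Flask(__name__)
app.config.from_object('your_app.config')

# The database object shared by the rest of the module
db = SQLAlchemy(app)


@app.errorhandler(404)
def page_not_found(error):
    # Serve the only template we have so far
    return render_template('404.html'), 404

# Imported last so routes.py can import app without circular-import issues
from your_app import routes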

It uses the configuration defined in another file, config.py, as follows:

[code screenshot: your_app/config.py]
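
A sketch of the kind of settings config.py holds (all values are placeholders):

import os

basedir = os.path.abspath(os.path.dirname(__file__))

# Flask settings: never ship a real app with DEBUG on or with this secret key
DEBUG = True
SECRET_KEY = 'change-me-in-production'

# SQLAlchemy settings: a local SQLite file keeps the setup simple
SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(basedir, 'your_app.db')
SQLALCHEMY_TRACK_MODIFICATIONS = False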

To make our life easier, it wouldn’t hurt to have a createdb.py ready, as follows:

[code screenshot: your_app/createdb.py]
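
A sketch of createdb.py; it only needs the db object from the module (with Flask-SQLAlchemy 2.x, create_all works on the app bound above):

from your_app import db

if __name__ == '__main__':
    # Create every table declared on the SQLAlchemy models
    db.create_all()
    print('Database created.')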

Finally, to make sure our app is up and running (and the tests are actually doing something), let’s create simple endpoints in routes.py:

[code screenshot: your_app/routes.py]
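
A sketch of one such endpoint (the route and message are illustrative):

from flask import jsonify

from your_app import app


@app.route('/')
def index():
    # A trivial endpoint so the end-to-end tests have something to hit
    return jsonify({'message': 'YourApp is up and running!'})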


Tests

Our end-to-end tests will call our APIs directly, as a user would, and check their responses. They will not use the code entities directly in any way (what unit tests would do) or combine route file routines directly (what integration tests would do).

Testing in Python doesn’t require any external library; we will rely on the standard unittest module here. YourAppBaseTestCase extends unittest.TestCase and is defined in tests/__init__.py.

A nice idea taken from https://github.com/valermor/nose2-tests-recipes is the use of a groups decorator. That way, we can combine the test class hierarchy with any other categorization we want, such as end_to_end test cases versus unit_tests.

Finally, as a big fan of BDD practices, I find it easier to think about tests using the Given/When/Then notation. The simple tests below help us verify that our app is ready to go.

[code screenshot: the end-to-end tests]
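
A sketch of what those tests can look like, using plain unittest and Flask’s test client (the real YourAppBaseTestCase lives in tests/__init__.py; names here are illustrative):

import unittest

from your_app import app


class YourAppBaseTestCase(unittest.TestCase):
    def setUp(self):
        # Given a test client for our app
        app.config['TESTING'] = True
        self.client = app.test_client()


class EndToEndTests(YourAppBaseTestCase):
    def test_index_is_up(self):
        # When we call the index endpoint
        response = self.client.get('/')
        # Then it answers successfully
        self.assertEqual(response.status_code, 200)

    def test_unknown_route_returns_404(self):
        # When we call a route that does not exist
        response = self.client.get('/no/such/route')
        # Then our 404 page is served
        self.assertEqual(response.status_code, 404)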

And they can be run as:

[screenshot: test run output]
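
With plain unittest, discovery from the project root would look like:

> cd YourApp
> python3 -m unittest discover -v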

Wrapping up

With all that set up, only one thing is missing: how to start your app! I like to keep a runserver.py file in the root folder as the obvious “main” file that catches everyone’s eye.
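
A minimal sketch of such a runserver.py (host and port are illustrative):

from your_app import app

if __name__ == '__main__':
    # 0.0.0.0 exposes Flask's development server outside the Vagrant VM
    app.run(host='0.0.0.0', port=5000)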

With the server up and running, we can even open another SSH session and curl-test it ourselves:

[screenshot: curl test session]
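
Assuming the index endpoint sketched earlier, that session looks roughly like:

> curl http://localhost:5000/
{"message": "YourApp is up and running!"}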

Testing your automated tests

Has anybody else ever felt the need for tests when refactoring an automated test framework? I found myself googling queries such as “test automated tests” and “unit test automated tests”, both hard to search for due to the repetition of the word “test”. To fill that void, I will share here last week’s struggles and ideas from creating tests for part of our Python test automation framework at work.


We wanted to refactor a certain part of our test automation framework containing several classes, all connected to a common parent. Each of these classes converts a certain type of data to a standard structure, so that comparing data from different sources becomes the easy job of comparing Python dictionaries. Each automated test case calls several of those classes with certain inputs, combines the results properly, compares them, and asserts that no problems occurred along the way.

Once a big chunk of refactoring was done, we were unsure how the classes’ behavior would compare to the previous version, and running all the test cases would mean hours of execution time (!). We lacked tests that would allow us to check the behavior quickly and confidently.

Knowing that a test is a set of inputs, outputs, and actions, we just had to find those and put them together. Our actions were the automation framework classes, taken individually, so that part was easy 🙂 I then went hunting for easy-to-find inputs and outputs in the Jenkins logs. One of these classes used Mediainfo to acquire data from a file and convert it to our standard structure. Luckily enough, the old version of the code had been executed very recently and had logged both the content extracted from Mediainfo and the resulting Python dictionary in the standard structure. I will call this class StandardMediainfo from now on.

The StandardMediainfo class takes data from one (or more) calls to Mediainfo and puts it into the standard structure. It takes a MediaInfo instance as a constructor input, which in turn takes data from one (or more) files using Mediainfo and converts it to a Python dictionary without changing the data. A Commands class is the one that actually performs the Mediainfo operation using a Python subprocess call (along with many other OS commands).
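
A sketch of that arrangement (the class relationships follow the description above; method names and conversion details are illustrative):

import subprocess


class Commands:
    """Thin wrapper around the OS commands the framework needs."""

    def mediainfo(self, file_path):
        # The only place where the actual Mediainfo binary is invoked
        output = subprocess.check_output(['mediainfo', file_path])
        return output.decode('utf-8')


class MediaInfo:
    """Acquires Mediainfo data for one or more files, without changing it."""

    def __init__(self, file_paths):
        self.file_paths = file_paths
        self.commands = Commands()  # deliberately instantiated internally
        self.data = {}

    def load(self):
        # One dictionary entry per file, keeping the raw output as-is
        for path in self.file_paths:
            self.data[path] = self.commands.mediainfo(path)


class StandardMediainfo:
    """Converts raw Mediainfo data into the framework's standard structure."""

    def __init__(self, media_info):
        # The MediaInfo instance is injected through the constructor
        self.media_info = media_info

    def to_standard(self):
        # Illustrative only: the real conversion rules are framework-specific
        return {path: {'mediainfo': raw}
                for path, raw in self.media_info.data.items()}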


To mock or not to mock?

There are many debates about whether or not to use mocks, as in here and here, and even an academic paper.

With that class organization, the unit tests only needed to mock the actual Mediainfo operation. Therefore, a CACHE_MODE static variable was added to the Commands class: when it is set, the operation goes to a cached file area and reads the content from there instead of making the actual subprocess call. Adding this control was an easy choice, as mocking (and stubbing) would only mean introducing yet another library and yet another layer of complexity for the unit tests to deal with.
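
A sketch of that control on the Commands class (cache location and file naming are illustrative):

import os
import subprocess


class Commands:
    # When True, command output is read from a cached file area instead of
    # performing the real subprocess call.
    CACHE_MODE = False
    CACHE_DIR = '/tmp/commands_cache'

    def mediainfo(self, file_path):
        if Commands.CACHE_MODE:
            cached_file = os.path.join(
                Commands.CACHE_DIR, os.path.basename(file_path) + '.txt')
            with open(cached_file) as cached:
                return cached.read()
        return subprocess.check_output(['mediainfo', file_path]).decode('utf-8')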

Another option, instead of adding this caching logic, would be to create a stub for our MediaInfo class. The stub would mimic the load behavior but would not call the Command class. Furthermore, the Command class could be a parameter of the MediaInfo class’ constructor, allowing us to stub the Command class itself or create a UnitTestCommand class that behaved however our unit tests wanted. However, an interesting side effect of our caching mechanism was that it made the automated tests faster: it saved the time the Mediainfo program would have spent. As the caching strategy also helps us debug an automated test as a whole by setting a flag, we will keep it like that for now.


Making our test framework testable

To make our test automation framework easier to build and maintain, we had made some very fortunate design decisions that turned out to make our code easier to test. I will highlight the most critical ones below; many other good ideas can be found here and here.

  • Separation of concerns: performing an operating system command, organizing the data, and transforming it into a standard format have different purposes and happen at different moments. Therefore, it made sense to separate them accordingly, which led us to create the Commands, MediaInfo and StandardMediainfo classes, respectively. That, in turn, helped us clearly understand what to test, where our inputs and outputs are, and how we will acquire them. Clearly, it didn’t begin like that: we started with only the MediaInfo class, which performed all the actions together. Along the way, we noticed the growing need to isolate the system call (and thus let MediaInfo handle multiple files) and the standardization of the data (and thus move all the methods and logic dealing with that process into a separate place). Make sure to always review how big and cohesive your automation framework classes are, and to break them from time to time into smaller units;


  • No work in the constructor: our constructors are as shallow as they can be; no class method is ever called automatically. Plus, all the attributes are initialized/declared there (as debated here). This helped a lot: we have a safer starting point and all the attributes in one place. In addition, we may want to tweak one of those instance attributes in our unit tests in order to avoid stubs in the future. Both strategies, doing no work in the constructor and declaring/initializing everything in the constructor (even in a dynamic world like Python’s), make our code more readable and safer;


  • Extract all non-testable code into wrapper classes: we don’t need to test whether a file was read from the filesystem, or whether the Mediainfo call worked or not. We want to test whether our standardization process generates a certain output for a certain input. Thus, the Commands wrapper class let the other classes focus on data handling and their own logic, which was highly useful. In the end, the caching strategy built directly into the Commands class was easy to create and use, which is another point in its favor;


  • Dependency Injection: we used the idea of injecting a dependency instance when passing a MediaInfo object to StandardMediainfo as a constructor variable, but we avoided it for the Command class, which is instantiated inside the MediaInfo class. Our choice was based on the fact that it wouldn’t make sense for an automated test case to deal with the Command class itself; that is something the automation framework needs. On the other hand, maybe an automated test case wants to acquire data and check how it was organized even without standardizing it. At the end of the day, the idea of dependency injection always pops up to help us have more meaningful conversations about class relationships, but we sometimes choose to hide that complexity from the tests and have some wrapper classes instantiated inside framework classes. A sketch of how these decisions come together in a unit test follows this list.
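
Putting those decisions together, a unit test of the standardization can look roughly like this (module paths, file names and assertions are hypothetical):

import unittest

# Hypothetical module paths; the real framework layout is not public
from framework.commands import Commands
from framework.media_info import MediaInfo
from framework.standard_mediainfo import StandardMediainfo


class StandardMediainfoTests(unittest.TestCase):
    def setUp(self):
        # Given: cached Mediainfo output instead of the real binary
        Commands.CACHE_MODE = True

    def tearDown(self):
        Commands.CACHE_MODE = False

    def test_standardization_of_cached_mediainfo(self):
        # When: we load and standardize data for a file with a cached entry
        media_info = MediaInfo(['sample_video.mp4'])
        media_info.load()
        standard = StandardMediainfo(media_info).to_standard()
        # Then: the standard structure contains the file's entry, ready to be
        # compared against the dictionary logged by the old version
        self.assertIn('sample_video.mp4', standard)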

Wrapping up

The refactoring and unit test work is still not done. The part of the automated tests that depends on Mediainfo operations is being unit tested now, but the automated test cases also interact with external APIs and the UI. Those parts will be covered in separate unit tests in future posts, so stay tuned!