The three basics
In this post, I want to quickly revisit three basics of data science projects that are built for production:
- Logging
- Configuration files
- Unit tests
Logging
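Data scientists use a lot of print statements when building a project, but there is a better way: logging. Unlike print, logging lets you give each message a severity level, send output to the console, a file, or both, and silence debug messages in production without touching the code. I use logging extensively to keep track of a project's run state and to debug problems during execution. I mainly use Python's built-in logging module, but I am sure there are many other ways to log the execution of a Python project.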
Configuration files
Every element that is subject to regular change but is not really code (file paths, thresholds, model hyperparameters) should go into a configuration file. This allows the developer, and every technical user of the project, to change the project's behavior without modifying the code. I personally use dynaconf a lot, but again, I am sure there are many other Python packages for this.
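To keep the example free of extra installs, the sketch below swaps dynaconf for the standard library's `configparser`; dynaconf adds conveniences such as several file formats and environment-variable overrides, but the idea is the same. The section names, keys, and values are hypothetical.

```python
import configparser

# Hypothetical settings: values that change often but are not code.
EXAMPLE_CONFIG = """
[data]
input_path = data/raw/sales.csv

[model]
n_estimators = 200
learning_rate = 0.05
"""


def parse_settings(text):
    """Parse INI-style configuration text and return typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "input_path": parser.get("data", "input_path"),
        "n_estimators": parser.getint("model", "n_estimators"),
        "learning_rate": parser.getfloat("model", "learning_rate"),
    }


def load_settings(path):
    """Read the configuration from a file on disk, e.g. settings.ini."""
    with open(path, encoding="utf-8") as f:
        return parse_settings(f.read())


if __name__ == "__main__":
    print(parse_settings(EXAMPLE_CONFIG))
```

Changing the learning rate now means editing one line in `settings.ini`, with no change to the code that reads it.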
Unit tests
For every project that is meant to run in production for a long time and to be improved regularly, it is really important to write unit tests, so that the code can be changed quickly without silently breaking existing behavior. I personally use pytest a lot, but ....
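A minimal sketch of a pytest test file. The `normalize` function is a hypothetical piece of project code; in a real project it would live in its own module and be imported by the test file.

```python
# Code under test: a hypothetical helper from the project.
def normalize(values):
    """Scale a list of numbers to the [0, 1] range."""
    low, high = min(values), max(values)
    if low == high:
        # Avoid dividing by zero when all values are equal.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# pytest collects functions whose names start with "test_" and
# reports a failing plain assert with the values that were compared.
def test_normalize_bounds():
    assert normalize([3, 7, 5]) == [0.0, 1.0, 0.5]


def test_normalize_constant_input():
    assert normalize([4, 4, 4]) == [0.0, 0.0, 0.0]
```

Saved as `test_normalize.py`, these tests run with the `pytest` command, which by default discovers files named `test_*.py`.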