15

Generally, modules should not have side effects. However, in a lot of cases, the side-effects are hard to avoid or may be desirable. There are also popular packages with on-import side-effects.

Which side effects, and when, are allowable when developing a package? For instance, these seem on the border of acceptable and not-acceptable:

# Creates a logfile
logging.basicConfig(filename="module.log")

# Write output to a logfile
logging.info("Module initialized successfully.")

# Add to PYTHONPATH to find a required module
sys.path.append(PLUGIN_DIRECTORY)
import my_plugin

# Replace a function with a patched version
# (from __future__ import ... does similar)
@functools.wraps(math.cos)
def _polyfilled_cosine(theta):
    return math.sin(theta + math.pi / 2)
math.cos = _polyfilled_cosine

# Register a class or method with some dispatcher
# (Often this is done with a decorator or metaclass,
# e.g. Flask's @app.route(): https://flask.palletsprojects.com/en/2.3.x/patterns/packages/)
class PngHandler(filetype_registry.GenericHandler):
    pass
filetype_registry.register(PngHandler, '.png')
  • For logfiles, deferring creation until the first write takes more effort than simply creating the file up front, but a logfile created at import time may be left behind even if the code that writes to it never runs.
  • PYTHONPATH manipulation is often something to be avoided, but I've encountered many modules that add a subdirectory or a plugin path to sys.path.
  • Monkey patching has its issues, but when the situation calls for it I'm not sure import monkeypatch; monkeypatch.apply_patch() is any better.
  • Registering classes in some way (like Flask does) seems very common, but it means that importing a package can have dramatic side effects.

In practice, it seems like many libraries have an object (potentially a singleton) whose constructor handles the initialization, so instead of my_library.do_thing() the semantics are ml = my_library.MyLib(); ml.do_thing(). Is this a better design pattern or does it just kick the issue down the road?

9
  • 4
    "However, in a lot of cases, the side-effects are hard to avoid or may be desirable." No and no. This "technique" is a cancer.
    – freakish
    Commented Jul 11 at 4:48
  • 8
    Your comment on monkey-patching says "# (from __future__ import ... does similar)". This is inaccurate. __future__ import effects are scoped to the module that performed the import from __future__; your monkey-patching of math affects everyone who imports math (aside from the occasional module that was loaded first and imported with from math import cos). Module-global effects are often okay (though even then, don't confuse your maintainers), but modifying (as opposed to adding to) program-global state is basically never acceptable. Commented Jul 11 at 12:17
  • 6
    basicConfig, for example, should never be called from a module. It's up to the script/application to configure logging; modules simply expose loggers to be configured.
    – chepner
    Commented Jul 11 at 13:48
  • 1
    Kicking the issue down the road is the basic idea of the "functional core, imperative shell" design pattern.
    – chepner
    Commented Jul 11 at 13:51
  • 1
    Not sure how applicable this is for Python because of the way importing a package implicitly adds all the definitions to the current scope so I'm leaving it as a comment instead of an answer, but my rule of thumb from other languages with an explicit export mechanism is that a module should have exports or side effects but not both. Commented Jul 12 at 11:19

4 Answers

22

logging

# Creates a logfile
logging.basicConfig(filename="module.log")

No, don't do it! Now git status is dirty, it shows an untracked file.

Protect this with a __main__ guard, or let __init__ do the setup. Follow the documentation's advice.
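
As a minimal sketch of that split (the package name my_library and the do_thing() helper are hypothetical, chosen only for illustration), the library side exposes a named logger with a NullHandler, and only the entry point under the __main__ guard decides where records go:

```python
import logging

# Library side: expose a named logger and stay silent by default.
# "my_library" is a hypothetical package name.
log = logging.getLogger("my_library")
log.addHandler(logging.NullHandler())


def do_thing(x, y):
    # Dropped unless the application enables DEBUG output for this logger.
    log.debug("x=%s, y=%s", x, y)
    return x + y


def main():
    # Application side: only the entry point configures logging.
    logging.basicConfig(level=logging.INFO)
    log.info("result=%s", do_thing(34, 10))


if __name__ == "__main__":
    main()
```

Importing this module creates no file and prints nothing; running it as a script is what turns the logging on.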

And sending an .info() message to the console or a file is similarly a no-go. Unit tests, and imports, should succeed silently. For example, here is a broken unit test:

$ python -m unittest *_test.py
.....DEBUG: x=34, y=10
..
----------------------------------------------------------------
Ran 7 tests in 1.286s
OK

And here is a proper test which succeeds silently:

$ python -m unittest *_test.py
.......
----------------------------------------------------------------
Ran 7 tests in 1.245s
OK

sys.path

# Add to PYTHONPATH to find a required module
sys.path.append(PLUGIN_DIRECTORY)

No. Fix up PYTHONPATH before you even invoke the interpreter. After all, it will control startup behavior before your module even gets control.

You have several options. Adjust the install script or the install instructions so your users will have a suitable env var, perhaps set in .bashrc. Use a companion Bourne script or Makefile recipe to adjust the path. You can even start with a shebang like this:

#! /usr/bin/env -S PYTHONPATH=/opt/mypackage/plugins:. python
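
If the plugin location is only known at run time, a different technique from the options above (not something the question or this answer originally proposed) is to load the file explicitly with importlib and leave sys.path alone. A rough sketch, where load_plugin() and the default module name are hypothetical:

```python
import importlib.util


def load_plugin(path, name="my_plugin"):
    """Load one module from an explicit file path without touching sys.path.

    `load_plugin` and the default `name` are hypothetical, for illustration.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the plugin's top-level code
    return module
```

The caller decides when the plugin's code runs, and no other import in the process is affected, because neither sys.path nor sys.modules is modified here.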

patching

If package foo depends on baz, it seems reasonable that it might import and monkeypatch baz. I have never been burned by that. Had I imported baz, then someone silently patching it behind my back would be troubling.

To patch math.cos seems rather adventurous. Better to access the modified behavior via a new name such as _polyfilled_cosine() or cos1().
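
A minimal sketch of the new-name approach, reusing the question's formula (cos1 is just the illustrative name suggested above):

```python
import math


def cos1(theta):
    """Cosine computed from sine; a stand-in for the question's polyfill."""
    return math.sin(theta + math.pi / 2)
```

Callers who want the alternative behaviour call cos1() explicitly, and math.cos stays untouched for everyone else in the process.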

registry

Creating a module level app object and registering @app.route()s is perfectly fine; we are using them as intended. They produce no annoying diagnostics or other externally visible side effects. They do not, for example, interfere with a "second caller", a unit test, silently importing the module and exercising helper functions.

OTOH, app.run() should certainly be protected by a __main__ guard.

deferred execution

the semantics are ml = my_library.MyLib(); ml.do_thing(). Is this a better design pattern?

Yes, this is significantly better than doing the thing at import time. Wait until the __main__ guard or some function or method is actually requesting that the thing be done.

If this module was imported by another module which merely wanted to verify a compatible version number, then triggering and diagnosing any of do_thing()'s numerous potential failure modes is more complexity than warranted.

circular imports

While we're on the topic, here's another thing to avoid. Do not let module a depend on b and b depend on a. Even if you can get import a to "work", you have probably left behind a landmine which some unit test that attempts import b will trigger. And no, don't ask the unit test author to "just import a" instead.

Future maintenance engineers should be able to freely use both forms, import x and from x import foo, for any of the modules. Which implies "no circular references!".

8
  • 1
    Not sure why you're fine with registries. Is it that you're talking of "creating a module level object" only? I don't know flask, but the filetype_registry in the OP looks very much like a global to me.
    – Bergi
    Commented Jul 11 at 11:59
  • 1
    @Bergi: Registries can be more or less okay, as they (typically) add to program-global state; if the state is granular enough (highly unlikely to conflict between modules), this can potentially be okay. Basically everything else the OP suggests modifies program-global state that is essentially guaranteed to produce conflicts if multiple modules consider themselves allowed to do it. I agree that it's generally best if such registration is handled by a method called after import, but I've occasionally registered a new codec or error-handler with the codecs module for instance. Commented Jul 11 at 12:22
  • 1
    Re monkey patching: There are cases that are more benign. For example, you might have a package foo containing a module foo.bar that could be provided by different source files depending on some runtime-specific state (e.g. "did we compile the C extension, or are we using the pure Python fallback implementation?"). So foo/__init__.py can and should figure out which implementation of foo.bar will be used, import it by whatever means, and store it in sys.modules. Then import foo.bar works normally and clients don't have to think about this implementation detail.
    – Kevin
    Commented Jul 11 at 18:35
  • The Python docs have a well-written explanation of how to handle logging in a library. The most important piece of info to know: Use the NullHandler to have logging in your library that doesn't do anything unless the user requests it. docs.python.org/3/howto/logging.html#library-config. Maybe add that to the answer?
    – blues
    Commented Jul 12 at 6:35
  • 1
    Hi @TankorSmash. I fear I was playing a bit loose with an informal definition of "silent". Please see the revised Answer text. Do you agree with that? (And clearly one could choose to crank up logging verbosity for a particular test run as part of an interactive edit-debug cycle, as long as the default behavior is silent.)
    – J_H
    Commented Jul 13 at 23:09
13

However, in a lot of cases, the side-effects are hard to avoid or may be desirable.

They are never hard to avoid, and they are never desirable. Or maybe they are somewhat hard to avoid, because avoiding them requires a change of mindset, which, as we know, is hard for many people.

For logfiles...

What if the file lives on a network drive? Now you not only create it at import, but also do network I/O at import, without even knowing whether you will ever use it.

In fact, I had this case in one of my jobs. A webserver took around 5 s to start because one import did network I/O. So we waited 5 s before we could even check that the server was up and running. Bad.

And secondly, where is that file's path coming from? Hardcoded? Really bad. Configured in another module? Order of imports matters, bad. Taken from env? Implicit, hidden dependency, bad.

PYTHONPATH...

That's even worse. I don't know a single case where manipulating env variables (rather than just reading them) is beneficial. Except for situations where a library requires it because it is badly written.

The set of environment variables is a global singleton. What can go wrong?

And doing this at import means that the order of imports now matters.

Monkey patching...

Cancer. Do not modify the behaviour of existing code at runtime. This leads to more confusion and is very error-prone. I have never once used monkey patching in production code.

I know that patching is popular in testing, but even there, well-designed services, abstractions, and passing of dependencies are simply superior.

And again, if you do this at import, then the order of imports matters. Counterintuitive.

Registering classes...

Or more generally decorators. Yes, these are somewhat acceptable, as long as these don't do too much.

Registering classes through decorators is quite popular but I would argue that it is unnecessary. In case of any major side effect (i/o) you again block an import. Why?

Secondly, you tie a certain behaviour with a class. Why is that necessary? It limits its reusability.

Thirdly, what is registered depends on what has been imported. The registration is scattered across the source code, and you have to ensure that all classes are imported, potentially in the correct order. Counterintuitive and error-prone.

Wouldn't it be better to have a single explicit function that does all of that?
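
As a rough sketch of what I mean (the handler classes echo the question's example; GenericHandler, JpegHandler and register_handlers() are hypothetical stand-ins, and a plain dict plays the role of the registry):

```python
class GenericHandler:
    """Hypothetical stand-in for filetype_registry.GenericHandler."""


class PngHandler(GenericHandler):
    pass


class JpegHandler(GenericHandler):
    pass


def register_handlers(registry):
    """Register every handler in one explicit, visible place."""
    registry[".png"] = PngHandler
    registry[".jpg"] = JpegHandler
    return registry
```

Importing the module registers nothing. The application calls register_handlers(some_registry) once at startup, so what is registered, and in what order, can be read off a single function.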

In practice, it seems like many libraries have an object (potentially a singleton) whose constructor handles the initialization

Yes, it is better. You don't pay for what you don't use. And you don't have hidden behaviours that are hard to track.

When you code, the first question you should ask is: will other people have trouble understanding my code? Any implicit, hidden behaviour, especially at import, simply goes against this principle.

For example, I once worked on a web server that, after a change, ate 200 MB of RAM at startup. Kubernetes would then kill it because of low memory limits. We couldn't figure out what was going on; no code seemed to indicate such behaviour. Then we found out that some lib was building a cache at import. A huge one. And we didn't even use that piece of the lib.

Amazing what kind of libs people create and are proud of...

7
  • 2
    The answer is correct, but the arguments are weak. Log configuration is bad due to the ill-defined order of imports, leading to undefined logging configuration. sys.path is used for plugin systems, so it can't be that bad. Polyfill is a known and battle-tested technique. Registration and singletons are a necessary evil, and the problem of never importing the module is non-existent: if you don't import it, you don't need its functionality, and its registrations are irrelevant.
    – Basilevs
    Commented Jul 11 at 7:47
  • 9
    @Basilevs (1) modifying sys.path is horrible, and I don't care how much it is used. That can only prove how bad the Python community is at designing code. (2) Registration (at import level) and singletons are not a necessary evil, that's ridiculous. It is this kind of mentality I referred to in my first paragraph. (3) The problem of distributed code is that you may miss that you need that import until the code is in production. Of course tests help here, but still, reasoning about such distributed code (what is registered, when and in what order) is just hard.
    – freakish
    Commented Jul 11 at 7:53
  • 2
    Ad (3) and in fact people will create a single file which does all the imports only to mitigate that. In which case it would be better to simply wrap those registrations with a function to avoid potential side effects at import. And guess what: I've seen people doing this as well, importing stuff in a function. All of that only because of the design of registration framework. Horrible.
    – freakish
    Commented Jul 11 at 7:56
  • 2
    @freakish You're right; most of this stuff only exists because Python's module system is incredibly badly designed. Import should never have been allowed to run code in the first place.
    – pjc50
    Commented Jul 11 at 9:30
  • 1
    @Basilevs I really don't know how a module can be used without importing it. Plus that's not even a topic here.
    – freakish
    Commented Jul 11 at 15:24
3

One big issue with your example logging code is that both those lines of code could fail for a multitude of reasons (read-only filesystem, invalid path, lack of filesystem permissions, etc.). These errors will be extremely difficult to work with. I can technically wrap an import statement in a try block and catch any exceptions, but how am I supposed to handle those exceptions intelligently this early in my program? The import was the first line in the file; I don't even have any of my support code in scope yet. There's no clean way for me to try to fix the problem and retry the operation. There's also no clean way for me to provide your library with enough information to avoid the error in the first place (like a path to a usable log directory). If this functionality were instead inside a regular my_module.init_log() function, then handling these sorts of problems would become a straightforward task.

Don't do anything during import that involves non-trivial error handling. You're just making it significantly harder to handle errors. You definitely don't want to take an otherwise-recoverable error and promote it to a fatal error simply because you wanted to run the code during import for some reason.
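
To make that concrete, here is a rough sketch of an init_log() function plus a caller that recovers from failure. The logger name, the path parameter and the init_log_with_fallback() helper are assumptions for illustration, not any real library's API:

```python
import logging

log = logging.getLogger("my_module")  # hypothetical library logger


def init_log(path):
    """Attach a file handler; raises OSError if the file cannot be opened."""
    handler = logging.FileHandler(path)  # opens the file immediately
    log.addHandler(handler)
    return handler


def init_log_with_fallback(preferred_path, fallback_path):
    """Caller-side policy: try the preferred location, then a fallback."""
    try:
        return init_log(preferred_path)
    except OSError:
        # An ordinary, recoverable error instead of a failed import.
        return init_log(fallback_path)
```

Because the failure happens inside a normal function call, the application can choose the path, catch the exception with all of its own support code in scope, and retry or fall back as it sees fit.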

-3

TL;DR

Side effects are unacceptable until they are unavoidable.

Convention

Python libraries are "quick and dirty" - they rely heavily on global shared state and there is little to be done about that. Eventually your module would have to interact with these and contribute to chaos.

Python programs vary wildly in scale - from a single page to megabytes of code. In the first case anything goes, in the second - any surprise is fatal.

Therefore there can't be a hard and fast rule about side effects. Just avoid them unless it is impossible in your context.

Non-solutions

Init method

Introducing an init() method that would perform side effects for your module is mostly meaningless - it does not change dependency topology and does not eliminate the problem of shared global state. There are rare exceptions, where a two-step initialization helps, but they smell.

Lazy initialization

If you call an init() method from other methods of your module, you introduce additional uninitialized state in your module, exacerbating the problem of global state, not fixing it.

Examples

In your listed examples only logging.basicConfig is a critical bug as it has effect only on first call and there is no way to ensure call order from an imported module.

I can find scenarios where other examples are unavoidable.

10
  • 8
    -1: your code does not have to contribute to chaos. That's terrible advice.
    – freakish
    Commented Jul 11 at 15:38
  • This is not advice. It is a fact. The only way to eliminate side effects is to work in a purely functional language, so the idea that it is possible in Python is naive at best.
    – Basilevs
    Commented Jul 11 at 23:58
  • 2
    No one is talking about eliminating all side effects. That is impossible, even in a functional language. Only about eliminating them from global scope, so you can import a module without side effects. There's nothing naive about that; in fact, the language itself should not allow that. And most certainly you don't have to do that only because others do that; good luck with such a mentality.
    – freakish
    Commented Jul 12 at 5:39
  • 2
    You are just wrong. Just because something can be done doesn't mean it should be done. Go to the docs: docs.python.org/3/library/logging.html and see for yourself how the Python standard library docs encourage wrapping logging initialization in a function precisely to avoid side effects at import. I've removed my previous comment because this discussion is simply ridiculous.
    – freakish
    Commented Jul 12 at 10:06
  • 2
    -1: A number of other answers suggest strategies and solutions for avoiding side effects on import. On top of the bad advice, this answer claims that it is impossible to avoid these side effects, and even that it is impossible to design good programs in Python. Just because some libraries are poorly designed does not mean it is impossible to write good code. Commented Jul 12 at 19:11
