Status of Python agent

This is a summary of limitations and known issues for New Relic's Python agent. For changes that have been made between different versions of the Python agent, see the Python agent release notes.

Agent limitations

  • Capacity analysis - Capacity analysis reporting, introduced in version 1.5.0.103 of the Python agent, only works for traditional single-threaded or multithreaded applications. When a coroutine-based system such as gevent or eventlet is used in conjunction with a WSGI server, the capacity analysis report is still displayed but shows no data other than agent restarts.

    This limitation exists because the current model for calculating the load of a system doesn't translate to coroutine-based systems. We are looking at alternatives for such systems, but haven't yet settled on a suitable one.

    Even with a traditional single-threaded or multithreaded application, the metrics that support the capacity analysis page are only captured and reported if the optional C extension component of the Python agent can be compiled and installed. If the target system doesn't support compiling the C extensions shipped with a Python package, the required metrics are not recorded or reported, and the capacity analysis page is empty except for details about agent restarts.

    This limitation exists because a pure Python version of the code that tracks the thread utilisation metric, which is the basis for the capacity analysis metrics, would require heavy use of thread locking, and we are not yet satisfied that this would avoid a performance impact on the process. We are investigating other methods of calculating this metric that have an acceptable level of overhead in a pure Python implementation.

    Also be aware that the capacity analysis report is only available once all reporting application instances have been upgraded to version 1.5.0.103 or later of the agent. If you have a mix of old and new agent versions, the capacity analysis page will not be available.

  • Thread profiling - The thread profiling mechanism was introduced in version 1.7.0.31. Although it can capture details for greenlets when a coroutine-based system is in use, such as the gevent or eventlet modes of gunicorn, it has one limitation.

    The issue with coroutine-based systems is that the existing threading module is monkey patched so that creating a new thread actually creates a greenlet instead. As a result, the thread profiler's background thread is itself a greenlet.

    The problem is that a greenlet only gets a chance to run when other greenlets explicitly yield control, such as when they block. So when the thread sampler does get to run, it only ever samples the stacks of other greenlets at a point where they are blocked. It does not sample them while they are executing arbitrary code, and it can completely miss execution within a greenlet that never blocks or otherwise yields to another greenlet.

    This means that time spent in pure Python code that doesn't block is not picked up by the profiler.

    A solution to this problem is to ensure that the thread profiler background thread runs as a real thread and not a greenlet. We are investigating this possibility, but it may not be straightforward: when the threading module is monkey patched, some coroutine libraries completely hide the original thread implementation, so we cannot easily make use of it.

    Because current results are misleading when coroutines are being used, from version 1.11.0.X of the agent, the thread profiler no longer shows any information for requests when coroutines are being used.

  • Celery and coroutines - The agent provides built-in support for monitoring Celery applications, with tasks being recorded as background tasks. Currently, however, this only works properly when processes are used to implement concurrency. If either the 'eventlet' or 'gevent' concurrency pool mechanism is used, the agent appears in some cases to prevent the consumption of tasks from the Celery task queues. This issue is still being investigated and no known workaround exists.
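
To illustrate the thread profiling limitation described above, here is a minimal sketch of stack sampling from a background thread. It is not the agent's implementation: the function names are hypothetical, and it only assumes CPython's 'sys._current_frames()'. With real threads, the interpreter can preempt the busy loop so the sampler observes it. If the sampler were a greenlet, as happens after monkey patching, it would only run once the busy loop yielded.

```python
import sys
import threading
import time
from collections import Counter


def sample_stacks(samples, stop_event, interval=0.01):
    """Record the innermost function name of every other thread (hypothetical sampler)."""
    own_id = threading.get_ident()
    while not stop_event.is_set():
        # sys._current_frames() maps thread id -> topmost Python frame.
        for thread_id, frame in sys._current_frames().items():
            if thread_id != own_id:
                samples[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work(duration):
    """Pure Python loop that never blocks or voluntarily yields."""
    deadline = time.monotonic() + duration
    total = 0
    while time.monotonic() < deadline:
        total += 1
    return total


def profile_busy_work(duration=0.5):
    """Run busy_work in a real thread while a real sampler thread watches it."""
    samples = Counter()
    stop_event = threading.Event()
    sampler = threading.Thread(
        target=sample_stacks, args=(samples, stop_event), daemon=True
    )
    worker = threading.Thread(target=busy_work, args=(duration,))
    sampler.start()
    worker.start()
    worker.join()
    stop_event.set()
    sampler.join()
    return samples


if __name__ == "__main__":
    # Prints how often each function was seen at the top of a thread's stack.
    print(profile_busy_work())
```

Because both the sampler and the worker here are operating system threads, 'busy_work' shows up in the samples even though it never blocks.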

Third party bugs

  • uWSGI compliance - There is a notable issue when using the Python agent and uWSGI. New Relic recommends upgrading to uWSGI version 1.2.6 or later.

    The problem is most pronounced with versions of uWSGI prior to 1.0.4. The problem derives from a bug in uWSGI and the way it implements the WSGI specification. Specifically, uWSGI is not calling 'close()' on the iterable returned from a WSGI application.

    This means the web transaction record is not closed off properly and is reused on the subsequent request for that thread. This can result in data for multiple requests being merged together with the overall web transaction having an excessively large response time.

    The uWSGI author did attempt to fix this issue in uWSGI and delivered that fix in uWSGI 1.0.5 and 1.1. It was then determined that the fix didn't address all cases where this problem can occur. Specifically, the 'close()' method was still not being called when a Python exception is raised when yielding a value from an iterable. This was then supposed to have been fixed in uWSGI 1.2.5, but even after that we were still seeing problems. A final case where 'close()' was not being called was then found where the client connection is closed before uWSGI tries to consume the iterable returned from the WSGI application.

    At this time, New Relic believes that the problem is fixed as of uWSGI version 1.2.6.

    Note that this bug in uWSGI wouldn't normally affect most WSGI applications or their handling of requests. It only affects the data reported to New Relic.

  • Python wsgiref module - The 'wsgiref' module in the Python standard library, which provides a reference implementation of a WSGI server, has a WSGI compliance bug: it does not call 'close()' on the iterable returned from the WSGI application if the client connection is lost while response content is being streamed.

    This means the web transaction record is not closed off properly and is reused on the subsequent request for that thread. This can result in data for multiple requests being merged together with the overall web transaction having an excessively large response time.

    This issue in the Python standard library 'wsgiref' implementation is believed to be fixed in Python 2.7.4.

  • Django development server - The Django development server has a WSGI compliance bug: it does not call 'close()' on the iterable returned from the WSGI application if the client connection is lost while response content is being streamed.

    This means the web transaction record is not closed off properly and is reused on the subsequent request for that thread. This can result in data for multiple requests being merged together with the overall web transaction having an excessively large response time.

    For Django versions prior to Django 1.4, this was because the bug existed in the bundled WSGI server that Django used. From Django 1.4 onwards it is because it relies on the 'wsgiref' module in the Python standard library, which carries the same bug.

    If you are using Django 1.4 onwards with Python 2.7.4 or newer, you will benefit from the fix in that version of Python. For older versions of Python, Django 1.5.1 includes a change to work around the bug in the Python standard library 'wsgiref' module.

  • Gunicorn daemon mode - When using daemon mode of gunicorn, the agent can fail to start up and report data. This is due to bugs in gunicorn related to how it closes file descriptors when daemonizing. We hope the issue will be addressed in the first gunicorn release after version 0.17.2.

  • Gunicorn gevent mode - When gevent mode of gunicorn is used and the 'newrelic-admin run-program' command wraps the invocation of gunicorn, the hosted web application can fail in strange ways. One symptom is requests blocking for a period of 1 minute.

    The cause of the problem in this case relates specifically to the order in which module imports occur. The monkey patching performed by gevent does not work properly when the Python threading module is imported before the gevent monkey patching routine is run.

    This issue has been fixed in gevent 0.13.7, and the Python agent now appears to work fine. Version 1.5.0.103 of the Python agent also includes its own workaround for the gevent problem, which allows older versions of gevent to be used.
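
The 'close()' problems described for uWSGI, 'wsgiref' and the Django development server can be illustrated with a minimal sketch. This is not the agent's code: 'TimingMiddleware', 'TimedIterable' and 'serve' are hypothetical names. The only fact relied on is that the WSGI specification requires a server to call 'close()' on the returned iterable if that method exists. The sketch assumes the wrapper finishes its record in 'close()', so a server that skips the call leaves the record open.

```python
class TimedIterable:
    """Wrap a WSGI response iterable and finish the record in close()."""

    def __init__(self, iterable, finished):
        self._iterable = iterable
        self._finished = finished  # hypothetical "end of transaction" callback

    def __iter__(self):
        for chunk in self._iterable:
            yield chunk

    def close(self):
        try:
            # Pass close() through to the wrapped iterable if it has one.
            inner_close = getattr(self._iterable, "close", None)
            if inner_close is not None:
                inner_close()
        finally:
            self._finished()


class TimingMiddleware:
    """Count records that were started and those that were closed off."""

    def __init__(self, application):
        self.application = application
        self.open_transactions = 0
        self.completed = 0

    def _finish(self):
        self.open_transactions -= 1
        self.completed += 1

    def __call__(self, environ, start_response):
        self.open_transactions += 1
        result = self.application(environ, start_response)
        return TimedIterable(result, self._finish)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello\n"]


def serve(app, call_close):
    """Toy server loop: consume the response, optionally honouring close()."""
    result = app({}, lambda status, headers: None)
    try:
        body = b"".join(result)
    finally:
        if call_close and hasattr(result, "close"):
            result.close()
    return body
```

Calling 'serve(middleware, call_close=False)' returns the same response body as the compliant case but leaves 'open_transactions' at 1, which mirrors how a record that is never closed off can end up absorbing the next request on the same thread.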

Web frameworks

  • Twisted.Web - We have been working on some experimental support for Twisted.Web. If you are particularly keen to explore what is available, contact us.

Hosting mechanisms

  • Meinheld WSGI server - The Meinheld WSGI server is not supported. It doesn't monkey patch the Python standard library to allow a coroutine-based application to work together with code that uses the standard library's threading modules, and this prevents the agent from working correctly.

  • Google App Engine - The restrictions that exist under Google App Engine, such as those on background threads, as well as how processes are managed, mean that the Python agent as it is implemented at present will not work as is. There are no plans at this point to support Google App Engine.

Python implementations

  • Jython - We have confirmed that the agent test performed by 'newrelic-admin validate-config' works under Jython, but that is the extent of the testing we have done. CPU and memory metrics are disabled and will not be reported.

  • IronPython - No testing has been performed under IronPython. We suspect that the agent will not work.

Operating systems

  • Windows - No testing has been performed on Windows with the Python agent. The agent may run, but we suspect it may give wrong results, as certain parts of the code still need to be customised for the Windows platform.

For more help

If you need additional help, get support at support.newrelic.com.