2 September 2015

Mongo and Flask Performance Tuning: quick and easy

At Vunify (which is basically the TV Guide of the future) we are a python shop using Flask and Mongo to run our site. Recently as we were finishing up rolling out a new UI (aren’t those always fun to do?) our CEO noted that the main page was taking a while to load for logged in users. That page has a lot going on in terms of graphical elements on the screen and with its logic on the backend. Finding the cause of the slowdowns wouldn’t be a simple as doing a diff of the new version with the old.

Step one: profile and measure

Before attempting any kind of optimization, you must make measurements. Without baseline measurements in the beginning, you will never really know if or how you improved the code. I decided to start with using Chrome’s developer tools to measure the time it took to download the page. [caption id=”attachment_468” align=”aligncenter” width=”584”] The Chrome developer console. The call to “nick.vunify.com” is the initial call to the server. See how long it takes? Is it mongo or flask that is slow?[/caption] Chrome’s dev tools confirmed that a major time sink was the initial call to the server. Once that call completed it was up to the browser to download images and javascript. Since we have more control over the performance of the server, it made sense to look there next. The back end code consists of a series of calls to mongo to get our user’s data, some per-user logic to process some of that data, then sending it to the jinja template for rendering. This gist can give you a rough overview of what the code looks like: The jinja template for this view had become quite complex in this latest release. To help with code reuse we have several jinja macros that we use through out the page. The fear was that one of these macros was either inefficent, or otherwise buggy. Flask does have a very nice debugging tool bar that can do some profiling, but it couldn’t look into the templating code to see where the hotspots where. At this point it made sense to look at the rest of the back end code since we have better debugging tools there. I opted to use print/logging statements in the view code ti measure the time it took for each section of the code to execute. My reasoning for this was while on my local dev machine I can hook up any debugger that I want, I also needed to measure the performance in other environments. Simple print statements would allow me to quick measure and compare performance in all of our environments. I moved the render_template() call up and assigned it to a variable in order to try and measure the time it took for the template to render. At a minimum, this would allow me to compare the time spent rendering with everything else. The timing results were very interesting. As expected, each environment saw different number relative to each other, but they were always in proportion. This is good, it shows that the code is executing consistently across environments (e.g. it runs faster on better equipment, slower on worse equipment). The numbers looked very similar to these:

Step 2: One change at a time

The variation on the calls to mongo was not really a surprise, some of those call pull back a lot of information, others are very small. It can also vary per-user, so I tried to setup my test user with the “best case scenario” so that improvements there would also (hopefully) carry through to other more data-heavy users. It was clear that a lot of time was being spent in the render_template() function, but from what I could find online, there wasn’t a lot of guidance on how to improve performance there. Most suggestions revolved around caching the compiled templates. While that is great advice, these templates seem to be already cached as following the instructions lead to almost no change in render time. Putting a limit() on some of the bigger mongo calls did help, but it also impacted some of the business needs: Limiting the data returned limited the number of options on the screen for the user. One solution that was proposed was to do an “infinite scrolling” on the page, and call back for more data from the server when needed (instead of sending it all at once). While infinite scrolling seemed like a great idea, the implementation cost in terms of time (and a little bit of page redesign) would majorly impact our development timeline. In other words, we had too many other things that needed to get done first.

Step 3: Get a 2nd opinion (from a mongo master!)

As luck would have it I was scheduled to have lunch with my friend Rick who is very good at debugging… and also happens to be a Master of Mongo. When I mentioned the template execution time, his first reaction was that the jinja templating engine runs pretty fast and it would be hard to have templating code that was that messed up that it would drag down performance. “What about the mongo data that is being sent to the template?” Rick asked. “Are you sure its all there?” To this point, I was assuming it was all there. With mongo and mongoengine you can do lazy loading. In that view I was assuming that all of the data was being loaded before being sent to the render_template() function. With Rick’s question burning in my mind, I went and looked back over our mongo calls. It turns out my assumption was not quite correct. We had a model or two that actually had references to other documents in mongo. When those models were being processed by the template, the mongoengine code was calling back to the mongo server to retrieve the rest of the data. Even though our mongo is on an SSD with great low-latency connectivity, this extra network activity was adding significant overhead. The mongoengine documents suggested adding a “select_related()” call to queries that had references like this. I selected a few of the bigger mongo calls that we were doing in the view and added this call. I then re-ran the timing tests: The functions that had the select_related() added to them did take a few milliseconds longer to execute… but the time on render_template() dropped by about 1/2! The result was that overall the page was returning from the server faster than it was before. As I pushed this change out to each of our environments, I saw the results held. The lazy loading was the culprit.

Conclusions

After adding the select_related() to a few more queries we saw the time it took to render the page in the browser decrease to a point where the boss was happy. There’s always room for improvement, but for now (or at least until our next big UI redesign) things are good enough. The takeaway lessons from this are:

Take measurements before, during, and after.
The humble print statement sometimes is the best tool
Identify where you hotspots are
Ask yourself: “What could cause these slowdowns?”
Look beyond the initial answers for deeper underlying causes
Talk with someone else about the problem, a fresh perspective can do wonders.

For this situation at Vunify, we had some lazy loading that was going on. This wasn’t obvious at first glance, and only by explaining the problem to someone else was I able to get the clarity to examine the code with a fresh perspective.

tags:

Possibility and Probability

A Python programmer with a personality thinking about space exploration

Mongo and Flask Performance Tuning: quick and easy

Step one: profile and measure

Step 2: One change at a time

Step 3: Get a 2nd opinion (from a mongo master!)

Conclusions