de Finibus Bonorum et Malorum




Moving on with the Debian RC Tracker Bot

The Debian RC Tracker, my little bot that tracks the number of RC Bugs in the Debian Testing distribution, has been up and running since August. While it has been running mostly without a problem (there have been a few minor issues), there were a few things about it that really bothered me.

The most important one is that it was originally implemented procedurally, which is the paradigm I feel most at home in. But as I learned more about Python, it felt more appropriate to re-implement it with an object-oriented approach. Ever since I first implemented it during the Buster freeze, I had wanted to migrate it into a class.

Being a bot is no easy task

One other issue was where it was deployed. PythonAnywhere has been nothing if not useful, but it imposes a lot of limitations on scheduled tasks. The free tier only allows one task, run once a day, and, most importantly, it enforces a 30-day expiration time on it. That made me consider ways of circumventing this limitation by making the bot manage its own scheduling (which would be entirely possible, but would probably consume all the CPU time I had available). And although it was an annoyance, I was OK with logging in every month to refresh the task and keep it from expiring.

Still, I was on the lookout for alternatives. Then, a few weeks ago, I was exploring something on Salsa (I can’t remember what, for the life of me) and I saw an option to “deploy in Heroku” (it was part of an example of something, if I recall correctly). The important part is that I became aware of the existence of Heroku, which for some reason had eluded me until then.

Heroku has the advantage of offering more resources and options (in particular, a scheduler with more options); deploying to it is much more complicated than deploying to PythonAnywhere, though. It requires its own set of tools (which, by the way, are not available in the Debian archive), and it has a bit of a learning curve, too. I’m not going to cover that here, though; that’s far beyond the scope of this post.

Long story short, I decided to migrate the tracking bot to Heroku. That would require a lot of testing, though, before I could be sure it would run without problems. The current version was already running and doing its job, so why touch it until I was sure the new one was safe?

Rewriting the app

As I said at the beginning of this journey, creating this app was, at its heart, a learning experience. I wanted to learn more about the Debian release cycle and about programming in Python. As far as the former is concerned, I believe I have reached my objective. The latter is still a work in progress.

But learning to program well has become more important than ever, to be honest. I’m in the process of going back to university to get a CS degree, so it’s more than a safe bet to say that programming expertise matters. I have also started taking a few freelance jobs to supplement our household budget, and knowing how to properly build and deploy an app is one of the cornerstones of that work.

Anyway, here I was learning how Python apps work. It was something I had dabbled with before, but that effort had to be cut short. An important step was finally moving on from my original attitude towards programming: to stop trying to do everything “from scratch” every single time. That’s how my first formal training in programming (learning C) went more than twenty years ago, so it has stuck with me since then. The fact that using frameworks and packages is a HUGE time saver had something to do with that, too.

Divide and Conquer

The new version has two parts. The first is the DebianTracker class, which implements all the functionality of the original app but is more reliable and allows for future expansion. The second is the application code that uses the class to carry out the tasks involved in the bot’s operation.
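To make the split concrete, here is a minimal sketch of how such a class might be laid out. Everything in it is hypothetical: the method names (`count_by_severity`, `update`), the `(bug_id, severity)` row format and the in-memory `history` list are stand-ins, not the actual DebianTracker code. The UDD query is injected as a function so the processing logic can be exercised without a network connection.

```python
from collections import Counter
from datetime import datetime, timezone

# Severities that count as release-critical in Debian's bug tracking system.
RC_SEVERITIES = ("critical", "grave", "serious")


class DebianTracker:
    """Hypothetical skeleton: one object owns fetching, processing and storing."""

    def __init__(self, fetch_rows):
        # fetch_rows stands in for the UDD query: a callable returning
        # (bug_id, severity) tuples. Injecting it keeps the class testable.
        self._fetch_rows = fetch_rows
        self.history = []  # in-memory stand-in for the database table

    def count_by_severity(self, rows):
        """Count bugs per RC severity; other severities are ignored."""
        counts = Counter(severity for _, severity in rows if severity in RC_SEVERITIES)
        return {severity: counts[severity] for severity in RC_SEVERITIES}

    def update(self):
        """Fetch, process and store one snapshot, then return it."""
        snapshot = {"taken_at": datetime.now(timezone.utc),
                    **self.count_by_severity(self._fetch_rows())}
        self.history.append(snapshot)
        return snapshot


if __name__ == "__main__":
    # The "bot" side becomes a thin script around the class (dummy rows here).
    rows = [(1, "serious"), (2, "grave"), (3, "serious"), (4, "normal")]
    print(DebianTracker(lambda: rows).update())
```

With this shape, each scheduled script shrinks to a couple of lines: build a tracker, call one method.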

The Class

So anyway, here we are. There were several issues with the original code that I wasn’t happy with, so I decided to completely rewrite the app. Although I have adapted some of the original code, most of it is completely fresh. Of course, it’s far from being a “complete” class or from using all the capabilities it could be using. But it’s progress anyway.

One of the most important upgrades from the original version is how data is stored. Originally, the data were stored in a text file. That approach is problematic for a million reasons. The fact that Heroku deploys apps on an ephemeral filesystem gave me the final nudge toward a complete database implementation. At first I wanted to use SQLite, because it’s really simple. But SQLite also relies on the filesystem, so I went straight to PostgreSQL. So now, instead of having hacks all around the code to deal with text files, I just have a couple of methods that do the heavy lifting (saving and retrieving data).
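Since nothing written to a dyno’s disk survives a restart, the connection details have to come from the environment instead. A minimal sketch, assuming the bot reads the `DATABASE_URL` config var that the Heroku Postgres add-on sets (the helper name `database_url` and the SQLite fallback are hypothetical). The scheme rewrite is there because Heroku hands out `postgres://` URLs, while SQLAlchemy 1.4 and later only accept `postgresql://`.

```python
import os


def database_url(default="sqlite://"):
    """Return a connection URL that SQLAlchemy will accept.

    Reads DATABASE_URL from the environment and rewrites the legacy
    "postgres://" scheme to "postgresql://". The in-memory SQLite
    default is only meant for local experiments.
    """
    url = os.environ.get("DATABASE_URL", default)
    if url.startswith("postgres://"):
        url = "postgresql://" + url[len("postgres://"):]
    return url
```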

Plus, since this is still a learning experience, I decided to use SQLAlchemy. It’s an extra layer of abstraction, but it’s a better approach precisely because it is… an extra layer of abstraction. It’s weird, but it’s true. I’m sure you’ll understand. So the class implements methods for each of the required tasks: fetching bug data from UDD, processing it, saving it to the database and retrieving it from there.
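A minimal SQLAlchemy sketch of the saving and retrieving half, assuming one row per snapshot with a column per RC severity. The table and function names (`bug_counts`, `save_counts`, `load_counts`) are hypothetical, and the demo uses an in-memory SQLite URL only so it runs locally; on Heroku the same code would be pointed at PostgreSQL. It needs SQLAlchemy 1.4 or later.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class BugCount(Base):
    """Hypothetical table: one row per snapshot of RC bug counts."""
    __tablename__ = "bug_counts"
    id = Column(Integer, primary_key=True)
    taken_at = Column(DateTime, nullable=False)
    critical = Column(Integer, nullable=False)
    grave = Column(Integer, nullable=False)
    serious = Column(Integer, nullable=False)


def save_counts(engine, taken_at, counts):
    """Insert one snapshot; counts maps severity name -> number of bugs."""
    with Session(engine) as session:
        session.add(BugCount(taken_at=taken_at, **counts))
        session.commit()


def load_counts(engine, start=None, end=None):
    """Return (taken_at, critical, grave, serious) tuples, oldest first."""
    query = select(BugCount).order_by(BugCount.taken_at)
    if start is not None:
        query = query.where(BugCount.taken_at >= start)
    if end is not None:
        query = query.where(BugCount.taken_at <= end)
    with Session(engine) as session:
        return [(r.taken_at, r.critical, r.grave, r.serious)
                for r in session.execute(query).scalars()]


if __name__ == "__main__":
    engine = create_engine("sqlite://")  # in-memory; use the PostgreSQL URL in production
    Base.metadata.create_all(engine)
    save_counts(engine, datetime(2021, 1, 1), {"critical": 1, "grave": 2, "serious": 3})  # dummy values
    print(load_counts(engine))
```

Because the queries are built with SQLAlchemy rather than raw SQL strings, switching between SQLite and PostgreSQL only means changing the URL passed to `create_engine`.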

DebianTracker also has methods for plotting the bug data it gets from the database, using numpy and pyplot; most of that code has been rewritten, although it is heavily derived from the original. Now it can deal with extra options, though, such as start and end dates for plots, among other options. The code that handles dates has also been improved (and that’s a good thing; using dates in plots is always a headache). Here, again, I ran into the problem of Heroku’s ephemeral filesystem. Since I can’t rely on saving the files for later use, I resorted to an unusual approach: after generating the images, I save them directly to the database, and then retrieve them when necessary.

Plot generated by the bot (taken directly from the API, so it should be up to date)
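The render-then-store idea might look like the sketch below: draw the plot with pyplot, write the PNG into an in-memory `io.BytesIO` buffer instead of a file, and hand back the raw bytes, which can then go into a binary column (for example SQLAlchemy’s `LargeBinary`). The `(day, critical, grave, serious)` row layout and the `render_plot` name are assumptions; the date axis relies on `matplotlib.dates` locators and formatters, which take care of much of the date headache.

```python
import io
from datetime import date

import matplotlib
matplotlib.use("Agg")  # headless backend: a dyno has no display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt


def render_plot(rows, start=None, end=None):
    """Plot RC bug counts per severity and return the PNG as bytes.

    rows: (day, critical, grave, serious) tuples sorted by day.
    start/end: optional date limits, applied before plotting.
    """
    rows = [r for r in rows
            if (start is None or r[0] >= start) and (end is None or r[0] <= end)]
    if not rows:
        raise ValueError("no data in the requested date range")
    days = [r[0] for r in rows]
    fig, ax = plt.subplots(figsize=(8, 4))
    for idx, label in enumerate(("critical", "grave", "serious"), start=1):
        ax.plot(days, [r[idx] for r in rows], label=label)
    ax.xaxis.set_major_locator(mdates.AutoDateLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
    fig.autofmt_xdate()  # rotate the date labels so they do not overlap
    ax.set_ylabel("RC bugs")
    ax.legend()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # write to memory, not to the filesystem
    plt.close(fig)  # release the figure; a run may render several plots
    return buf.getvalue()


if __name__ == "__main__":
    sample = [(date(2021, 1, d), 0, 1, d) for d in range(1, 15)]  # dummy data
    png = render_plot(sample, start=date(2021, 1, 5))
    print(len(png), "bytes of PNG, ready for a binary database column")
```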

The one thing that still needs some work is the Twitter code. It creates the tweet (the one we’re now used to seeing) and posts it along with an image of the day’s plot. It still relies on tweepy, but uses deprecated calls, and I need to learn more about the library to use it properly. It has received some changes too, though, so hopefully we won’t see those negative numbers anymore.
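One way to make that code easier to fix is to keep building the text separate from posting it, so the text can be checked without touching Twitter. A hypothetical sketch follows: the wording, the `compose_tweet` name and the `{severity: count}` input format are all made up for illustration. The posting half would still go through tweepy (on its v1.1 `API` class, `media_upload` for the image and `update_status` with `media_ids`), which is not shown here.

```python
RC_SEVERITIES = ("critical", "grave", "serious")


def compose_tweet(today, yesterday):
    """Build a daily status text from two {severity: count} dicts."""
    total = sum(today[s] for s in RC_SEVERITIES)
    change = total - sum(yesterday[s] for s in RC_SEVERITIES)
    parts = ", ".join(f"{today[s]} {s}" for s in RC_SEVERITIES)
    # The ":+d" format always prints a sign, so a drop reads "-3" and a rise "+3".
    text = f"RC bugs in Debian testing: {total} ({change:+d} since yesterday). {parts}."
    if len(text) > 280:  # Twitter's limit, counted naively as characters
        raise ValueError("tweet too long")
    return text
```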

The App and the Bot

Now, the App. Or Bot. It’s actually composed of a few parts.

The first part is a collection of scripts (which I consider the “bot” part). They use the class to run the routine tasks of the App: fetching bug information from UDD, saving it to the database and sending out the tweets. They run on Heroku’s native scheduler, and with some fancy time checking I have bug information updated every 6 hours (you can see the difference in the plot), with tweets going out daily at a slightly earlier time than before (14:00 UTC).
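Heroku Scheduler only offers fixed frequencies (every 10 minutes, every hour, or once a day at a chosen time), so a “every 6 hours plus one daily tweet” schedule needs some checking in the script itself. A minimal sketch of that kind of time check, assuming a single job scheduled to run every hour; the constant and function names are hypothetical.

```python
from datetime import datetime, timezone

UPDATE_EVERY_HOURS = 6  # refresh the bug data four times a day
TWEET_HOUR_UTC = 14     # post the daily tweet at 14:00 UTC


def tasks_due(now=None):
    """Return the tasks an hourly scheduler run should perform at `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    tasks = []
    if now.hour % UPDATE_EVERY_HOURS == 0:
        tasks.append("update")
    if now.hour == TWEET_HOUR_UTC:
        tasks.append("tweet")
    return tasks
```

Each run then dispatches on the returned list and exits, so most hourly runs finish almost immediately and use very little of the free tier’s time.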

The other part of the App uses Flask to offer a web interface. I whipped up a simple views.py file with a couple of routes connected to methods in the class. That way it’s possible to check the plots whenever one wants, instead of waiting for the next tweet. It’s running on gunicorn and receiving requests at http://debiantracker.herokuapp.com. And while I was at it, I added the /image endpoint to serve the images directly: /image/bugs serves the overall plot, and /image/freeze serves the plot since the beginning of the freeze.
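Serving images that live in a database rather than on disk can be done with Flask’s `send_file`, which also accepts a file-like object. A minimal sketch of such a route, assuming a hypothetical `load_image` helper (backed here by a dict) in place of the real database lookup; only the two plot names mentioned above are accepted.

```python
import io

from flask import Flask, abort, send_file

app = Flask(__name__)

# Hypothetical stand-in for the database: plot name -> PNG bytes.
IMAGES = {}


def load_image(name):
    """Return the stored PNG for `name`, or None (replace with a DB query)."""
    return IMAGES.get(name)


@app.route("/image/<name>")
def image(name):
    if name not in ("bugs", "freeze"):
        abort(404)
    data = load_image(name)
    if data is None:
        abort(404)
    # Wrap the bytes in a buffer; the mimetype must be given explicitly
    # because there is no filename to guess it from.
    return send_file(io.BytesIO(data), mimetype="image/png")
```

On Heroku this would typically be started by a Procfile line such as `web: gunicorn views:app`, assuming the module is `views.py` and the Flask object is named `app`.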

What Next?

As the freeze moves on (the hard freeze begins in a week or so), there are a few more things I want to look into. Originally, the bot’s plots included a function fitted to the bug numbers. I might look into that again, but it will need a few tweaks. Back then I only plotted the totals, instead of separate curves for each bug severity. I’m not sure fitting a separate function to each would be a good idea: curve fitting takes up quite a bit of CPU time, and doing it for three different curves might use up all the time I have available in Heroku’s free tier. Unfortunately, upgrading to a paid tier is not an option right now.
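The original fitted function isn’t described here, so the sketch below swaps in the simplest possible choice: a straight line through the counts with `numpy.polyfit`, extrapolated to the day it would reach zero. The `fit_trend` name and the use of day offsets are assumptions. A linear fit is a single least-squares solve, so running it once per severity is cheap; a nonlinear fit (for example with `scipy.optimize.curve_fit`) iterates and costs considerably more.

```python
import numpy as np


def fit_trend(days, counts):
    """Fit counts ~ slope * day + intercept by least squares.

    days: day offsets (e.g. days since the freeze began); counts: bug numbers.
    Returns (slope, intercept, zero_day), where zero_day is the day on which
    the fitted line reaches zero, or None if the trend is not decreasing.
    """
    slope, intercept = np.polyfit(np.asarray(days, dtype=float),
                                  np.asarray(counts, dtype=float), 1)
    zero_day = -intercept / slope if slope < 0 else None
    return slope, intercept, zero_day
```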

I’m always open to suggestions. If there’s anything you’d like to see this bot do, let me know.

