My frustration with NodeJS specifically and JavaScript in general is just through the roof at this point.

The churn of frameworks and libraries in NodeJS is getting completely out of hand. I have a few feeds (Reddit, etc) and it is almost exclusively some new Rails-a-like packaging of Express, a new build system meant to replace `make` and shell scripts, or more MV* frameworks (each with varying amounts of ES6 and weird features).

I just can’t take it. I think it’s finally overwhelmed my ability to give any fucks. Being involved in the NodeJS or JavaScript “community at large” is a guarantee of drinking from the firehose of this nonsense, and I am just done. I can’t be arsed to care much any more.

My heart is wandering back to Python more and more. Staid, ugly, boring Python. Synchronous, the global interpreter lock, give it to me. The rate of change is far too slow, but Python can speed up. The Python community can change.

I don’t believe JS can be reasonable at this point. They’re too invested in not caring about sustainability, “the long term”, whatever.

Maybe I’ll change my mind later. I still like JavaScript for lots of things: it’s optimal for so many tasks. But I feel like writing big programs in JavaScript at this point is just adding to the noise.

(Totally unrelated, when did WordPress start to act like Tumblr? I mean, they have the same posting category thing now.)

MongoDB and FUD

The Problem:

Our data set consists of gigabytes upon gigabytes of pickled Python dictionaries, CSV files, and plain text, with the odd bit of Excel or Word. I have three goals:

  • Maintain this monstrosity
  • Create a searchable index
  • Build a new version for the future.

The entire app is a single monolithic Python app: there is no such thing as a “front end” or a “back end” or “middleware”. It’s a web app but there are no templates; it generates HTML via print statements. The same Python file may include standalone logic, or shared logic to be used by other components. It’s a bit of a mess. Lastly the framework it uses is basically abandonware; I haven’t tried to see if it runs under any Python after 2.4, and you can be sure it won’t work under 3.

My first task was the search problem. I started with Whoosh but after about a year, it started to run into performance problems, and I’d also learned enough about information retrieval that I wanted some more features. The Whoosh guy is awesome and he’s done a hell of a thing, though; I cannot recommend it enough for smaller projects, but I needed more. I’d attended a talk at Pycon about Elasticsearch, so I switched to that, and it’s been awesome. 

My strategy was pretty simple: a cron job to regenerate the world. Since Elasticsearch is really, really fast, it took perhaps 30 minutes to reindex the entire data set, and since it’s not a 24/7 use case, running it at night is no big deal. (I’d like to provide real-time search but my users rarely need it; they’re content to have today’s new data appear tomorrow.)

This worked so well for two reasons. First, I’d learned enough about the “common data set” and about my users’ search needs that I could ignore 99.9% of the data, which made the custom indexer pretty easy to work with. And second, Python dictionaries map really well to JSON, which Elasticsearch uses as its input and output.
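To make the dictionary-to-JSON point concrete, here is a minimal sketch using only the standard library. The record and its field names are made up for illustration; they are not taken from the real data set.

```python
import json

# Hypothetical record, shaped like one of the pickled dictionaries described above.
record = {
    "projectid": 1042,
    "title": "Quarterly report",
    "tags": ["finance", "2012"],
    "archived": False,
    "notes": None,
}

# Serialize to JSON text; sort_keys just makes the output stable.
document = json.dumps(record, sort_keys=True)

# For JSON-friendly types, decoding gives back an equal dictionary.
round_tripped = json.loads(document)
assert round_tripped == record
```

The mapping is not perfect: tuples come back as lists, non-string keys come back as strings, and things like datetime objects need converting by hand before `json.dumps` will accept them.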

In building the regenerate-the-world scripts, I had written a huge amount of code to 1) walk the entire flat-file “database” and 2) make lots and lots of sense of it all. I did stuff like, “ensure that every disparate part of the app always refers to a Project by the faux-primary-key ‘projectid’ instead of ‘pj’ and ‘projid’ and whatever else”. My indexer did a pretty decent job of cleaning up this semi-schemaless data; so now what?
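The key cleanup described above could look something like the following. This is a sketch under assumptions, not the actual indexer code: the alias table and the `normalize_keys` helper are hypothetical names, and the conflict check is one possible policy.

```python
# Hypothetical alias table: variant spellings mapped to the canonical key.
KEY_ALIASES = {"pj": "projectid", "projid": "projectid"}


def normalize_keys(record):
    """Return a copy of record with known alias keys renamed to the canonical key."""
    cleaned = {}
    for key, value in record.items():
        canonical = KEY_ALIASES.get(key, key)
        # Refuse to silently pick a winner if two aliases disagree.
        if canonical in cleaned and cleaned[canonical] != value:
            raise ValueError("conflicting values for %r" % canonical)
        cleaned[canonical] = value
    return cleaned


legacy = {"pj": 7, "name": "Bridge survey"}
cleaned_record = normalize_keys(legacy)
```

Here `cleaned_record` carries `projectid` instead of `pj`, and `legacy` itself is left untouched.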

Since our app uses CouchDB, it was my first choice, and very quickly abandoned. I loathe CouchDB. It makes a lot of sense in our app, but not for a general-purpose data store. 

Up next was “any ol’ RDBMS”, which means MySQL. Attempts to hammer the semi-schemaless data into relational format resulted in a data model so complex and byzantine, it was practically recursive. Instead of 3rd normal form I made a wormhole into a hell-dimension. So, no.

Despondent and generally upset, I tried MongoDB. And it worked! Experiments worked really well! 

  • As I said, Python dictionaries map very well to JSON/BSON so the amount of friction in import/export was minimal.
  • Ad-hoc queries
  • easy blob storage for stuff like Word documents
  • It’s fast (importing the world took perhaps 20 minutes)
  • It’s easy to set up (compile and go, basically)
  • Support for every language and platform I could think of
  • Has some replication capability in case I ever need it

I wasn’t really sure about a couple things, mainly backup-and-restore, but that was really my only concern, and the Mongo docs on the topic seemed straightforward enough; my users can tolerate an hour of downtime.

And now, the point of my little story: I think MongoDB is picked on more than just about any platform save PHP. There is so much fear, uncertainty, and doubt spread about it, it’s started to leak into my world and freak me out.

Consider the most recent thing, the “randomly log stuff” bit in the Java driver. Places like /r/shittyprogramming were all over it with digital brickbats. Every thread was then a free-for-all of “here’s how MongoDB screwed me over/Here’s why MongoDB sucks” stories from all over the internets.

Panic set in. This data is mission-critical; while my users can tolerate small amounts of downtime and don’t need OTP-type features, it’s still mission-critical data. Have I fucked up royally here? Have I set myself up for epic fail? Or am I just giving in to the sort of FUD that pervades every goddamn internet discussion about any sort of technology? Let’s face it: people pile on and rarely are they anywhere nearly as awesome as they think they are. 

At this point I’m not entirely sure what to do. My thought was to return to the cold comfort of MySQL, using a Friendfeed-style schemaless system. It’s a huge orthogonal step but I’ve recovered horribly fucked MySQL databases after three-too-many bottles of Tequila, so it’s safe and well-understood. It puts the onus on me to write the entire friggin’ access layer, but whatever. I know about Postgres and JSON, but I don’t know Postgres at all.
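For the curious, the Friendfeed-style idea, as I understand it, is to keep each entity as an opaque serialized blob in one table and maintain small separate index tables for the handful of fields you actually query on. A rough sketch follows, with heavy assumptions: `sqlite3` stands in for MySQL so it runs anywhere, the body is JSON, and every table, column, and function name here is made up.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create an in-memory store: one blob table plus one index table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE index_projectid ("
        " projectid INTEGER NOT NULL,"
        " entity_id TEXT NOT NULL,"
        " PRIMARY KEY (projectid, entity_id))"
    )
    return conn


def save(conn, entity):
    """Store the whole dict as a JSON blob and refresh its index row."""
    entity_id = entity.setdefault("id", uuid.uuid4().hex)
    with conn:  # commits on success, rolls back on error
        # INSERT OR REPLACE is SQLite syntax; MySQL spells this differently.
        conn.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity_id, json.dumps(entity)),
        )
        conn.execute("DELETE FROM index_projectid WHERE entity_id = ?", (entity_id,))
        if "projectid" in entity:
            conn.execute(
                "INSERT INTO index_projectid (projectid, entity_id) VALUES (?, ?)",
                (entity["projectid"], entity_id),
            )
    return entity_id


def find_by_projectid(conn, projectid):
    """Look up entities through the index table, then decode the blobs."""
    rows = conn.execute(
        "SELECT e.body FROM index_projectid AS i"
        " JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.projectid = ?",
        (projectid,),
    ).fetchall()
    return [json.loads(body) for (body,) in rows]
```

Usage would be along the lines of `conn = open_store()`, then `save(conn, {"projectid": 5, "title": "a"})`, then `find_by_projectid(conn, 5)`. The point of the design is that adding a new queryable field means adding a new index table, never altering the blob table, which is exactly the access layer I’d be on the hook to write.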

Am I giving in to FUD? Do I stay the course, trusting that my proven, real-world positives outweigh potential negatives?

The Mandatory “My First Year With” Post

So, about a year ago, I got a new job and dove into Python.

Prior to this, you could count on 1 hand the LOC I’d written in Python. I “knew” exactly 3 things about Python:

  • It had namespaces, because every Python person brings that up immediately in any language argument on Reddit or Hacker News
  • Significant Whitespace, aka The Whitespace Abomination, which is why I’d generally avoided it like the plague in the first place
  • Python advocates seemed like a dour, hyper-conservative bunch (technologically speaking)

I cut my teeth on the TIMTOWTDI world of Perl and then spent years in the fever swamps of PHP and Ruby; correctness and conservatism were not in my DNA.

I started learning by doing the Python Koans, which I enjoyed immensely, and then pretty much diving right into real work. 

So where am I today? Glad you asked.

First, regarding the items above:

  1. Namespaces are great and all but I simply cannot figure out why they are the first thing people start crowing about in language arguments. Maybe my time in the namespace-less world of PHP made me sensitive to the criticism. (FWIW, I argued for namespaces in PHP and was pretty vocal about the use of \ as separator being a horrible idea.)
  2. Significant whitespace is absolutely awful for about a week and a half, mildly annoying for about a month, and then fades into nothing at all after that. It’s really the worst thing in the world for all of 5-7 days, but really, you get over it. I’d love insight into the coding habits of people who can’t manage to get their head (and fingers) around it.
  3. Python advocates are a dour, hyper-conservative bunch (technologically speaking).

The last point is not intended to be flame-y; I’m just re-spinning the Steve Yegge bit here. It is, nonetheless, how the Python world looks to me. There’s a noticeable lack of fun or whimsy at Planet Python. You will be hard-pressed to find a more dry Planet aggregator. (Fun is obviously not a requirement of professional software development, but compare with the general happy vibe at, say, Planet Mozilla.)

In terms of technology, oh, you know, whatever. PHP is a complete crapload, but it’s far easier to crank out web apps with PHP than Python. The amount of setup boilerplate (among other fiddly bits) is frustrating; but that’s how a generally poor language like PHP managed to work its way into the web stack. It simply is of the web, whereas Python requires a lot of work to bridge that gap.

(The counterpoint is that Python is simply a better language, in so many ways, that once I really have all the rough edges sanded down I’ll stop caring. This is probably true. It’s going to take a lot of work in 2013 to sand those edges down, both in my internal “IT” apps and our products, but I’m starting to see a coherent picture of how to do it.)

I’m still not very good at Python. I forget which namespace things live in (os? os.path? ffffuuuuuu); I forget that you have to ask if a key exists in a dict, instead of just ‘if key equals value’; I totally suck at structuring large packages; I don’t understand metaclasses. But it’s only been a year, so I’m positive about losing the remnants of my other-P-lang habits, and generally improving.
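For anyone else arriving from the other P-langs, the two stumbling blocks above look roughly like this. It is a small sketch; the `project` dict and the file names are invented for illustration.

```python
import os

project = {"projectid": 1042}

# Indexing a missing key raises KeyError rather than quietly yielding nothing.
try:
    project["owner"]
    missing_raises = False
except KeyError:
    missing_raises = True

# So you either ask the dict first ...
if "owner" in project:
    owner = project["owner"]
else:
    owner = "unknown"

# ... or fold the check and the default into one call.
owner_via_get = project.get("owner", "unknown")

# os has the process and filesystem operations (os.getcwd, os.listdir);
# os.path has the path-string manipulation (join, basename, splitext).
joined = os.path.join("data", "projects.csv")
base = os.path.basename(joined)
stem, extension = os.path.splitext(base)
```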

In the land of snakes

So I’m getting settled in to my new job. There’s a ton of things to get used to: a company with an extensive history (20-something years old); a large number of people (over 40); a product+service niche that’s pretty rarefied, replete with jargon and terminology; and a technology stack that has not very much to do with what I’ve spent the last decade doing.

The biggest change of that last bit is the switch to Python.

Python is fucking weird.

Yeah, there’s The Whitespace Abomination. The reality is, you get over it. It’s a HUGE deal at first; just awful, terrible, crap. It’s just the dumbest idea anyone could ever have, there’s no sanity to those arguments you see about readability, it’s some magical bullshit thinking. Lisp’s paren soup is NOTHING compared to this bullshit!

It just sort of goes away after a while. It’s still bothersome and weird but I’ve got my editor configured properly now, so generally it’s not really a problem. In fact I tend to end statements with a semicolon more often than making some indentation-related boo-boo!

Python is often called “easy”, and I think it lives up to that description. That said, I think idiomatic Python isn’t easy. Python wants to make it such that there’s one obvious way to do it; but is that one way actually obvious?

In the Python I’ve seen lately, they tend to do the Ruby thing of, “module M contains class C and also a grab-bag of functions”. I’ve spent rather a lot of time in the past few years doing what amounted to Java Lite: large applications are structured by interfaces, abstract classes, and implementations. So, the idea of “here is a class and also some functions” is just … odd. I sort-of get how it works with Python’s namespace/module system, and I’m sure it’ll make more sense as time goes on, but as a switcher it’s positively strange.
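A tiny made-up example of the “class plus a grab-bag of functions” layout, assuming a hypothetical module called `projects.py`. None of these names come from real code.

```python
# projects.py (hypothetical module name)
import csv
import io


class Project:
    """The one class the module is nominally about."""

    def __init__(self, projectid, title):
        self.projectid = projectid
        self.title = title


def parse_projects(text):
    """A plain module-level function: no interface, no factory class."""
    reader = csv.DictReader(io.StringIO(text))
    return [Project(int(row["projectid"]), row["title"]) for row in reader]


def titles(projects):
    """Another loose helper living next to the class."""
    return [project.title for project in projects]


parsed = parse_projects("projectid,title\n1,Alpha\n2,Beta\n")
```

A caller would write `import projects` and then use `projects.Project` or `projects.parse_projects(...)`; the module itself is the namespace, which is the job a static utility class does in Java-ish code.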

Also, tools like virtualenv and the more complex elements of the Python ecosystem aren’t immediately obvious. Coming from a background in PHP or Perl, Python can actually be pretty intimidating if you’re not working in an established platform like Django: what do I do about MySQL? What do I do about web server integration? What do I do about etc etc etc? Scoff all you want, but the fact is hitting http://www.php.net/manual/en/funcref.php is mostly all you need to get rolling on PHP projects, and of course CPAN remains one of the gold standards for code repos. (In fact, CPAN might be too large at this point.)

Anyway. I’ve written a couple of non-trivial Python scripts now, and I’m trying to get more into designing things to be “Pythonic”. We’ll see. I can’t say I’m happy with Python yet, but it’s still pretty new (it’s been all of a month) and so I’m at that strange place in the learning curve.