Archive for December, 2008

Psyco x64, GIL

Posted in Uncategorized on December 19, 2008 by Bartosz Radaczyński

So why exactly is Psyco not available for the x64 architecture? Because it was meant to be incorporated into the PyPy project… Strange, since:

  1. PyPy is twice as slow as CPython
  2. the main goal of the PyPy project is not to provide a faster Python implementation

Can anyone explain this to me? I’d really like to see some of the Psyco improvements on my 64-bit machine.

Another thing – why the GIL? Is it there on purpose, to make some market room for Jython with its native threads? Is there some other motivation behind it? Is it to force people to use processes (as the post and Python 2.6’s multiple-process model suggest)? How are processes better than threads? To me it seems that forking a process adds more overhead than creating a thread ever does (I guess that is why they call threads lightweight processes). Is there some explanation for this phenomenon?

Things that you should learn at college

Posted in Uncategorized on December 15, 2008 by Bartosz Radaczyński

OK, so this one goes out to all the sorry asses (like me) who were clueless enough during college to skip some of the relevant courses and now feel like they are missing some key tools in their workshop. You can go ahead and think that once you have a nice job it is no longer necessary to learn new stuff, but I believe that equipping yourself with the appropriate tools of the trade is really going to get you far. Anyhow, whatever language you are coding in, there are several things that may come in handy at one time or another. Of course, the essential skills for a software developer are perhaps not among these, but still:

  1. Compilers. You really need to get this one – it will at least let you realize that parsing stuff with regexes is not the best choice, for many reasons. You do not have to invent your own language, mind you, just get a grasp of how parsers work and perhaps implement just the parser (not the compiler). It’ll help you solve some problems that seem unrelated at first glance.
  2. Data structures and algorithms. This one is essential as well, since you should at least have a clue about the underlying mechanisms of hash tables, lists, trees and similar data structures. Also, some of the performance issues that you occasionally run into come from these things.
  3. OO. OK, this may seem trivial – essentially every CS curriculum includes object orientation, doesn’t it? Yeah, but modelling things in terms of objects is not as easy as it seems (at least for some of the programmers that I have met on my way…)
  4. A C course – well, this should have been number 1. Joel is right – some people just do not have the part of the brain that understands pointers. And if you can’t get that, you CANNOT be a good programmer (despite what Jeff Atwood says – he would probably get it anyway). This is not really a necessary skill now, since modern languages abstract these kinds of things away from you, but it is a good test of your aptitude for being a decent developer (the ability to think on several levels of abstraction at the same time, the ability to switch context etc). People that do not get pointers end up as managers 🙂
  5. Databases – no need to elaborate on that one.
  6. Shell scripting.
  7. Some business course, marketing and related issues. How many geeks have you met that cannot communicate with the outside world? This should also be a necessary skill for a developer. Autistic kids are really hard to work with… And this is especially important in large organisations. You actually have to sell either yourself, your team or your work. It usually helps to comprehend what developers call “the politics” (which is what it really is), and after learning something about that stuff you can at least somewhat understand why the “business people” do what they do.
  8. Typing – is this something that you can learn at college? Or perhaps in your spare time in college… That’s it! College is when I really had time for this stuff. Sadly, my typing is still lacking a bit, but I am trying to catch up on it (as with the rest of the list).

Do you have any thoughts on this?

Parsing and stuff

Posted in Uncategorized on December 1, 2008 by Bartosz Radaczyński

So, for the past two weeks or so I’ve been trying to get this small Python thingy up and running. But (as they always do) this “small thingy” suddenly turned into piles and piles of code. I guess that this is what they mean when they make up rules like “when a programmer gives you an estimate, add 1 and take the unit immediately larger than the one given” (so that two weeks actually means three months).

So, anyway, this little project started off as a code analyser for COBOL programs + DB2 SQL. It was meant to provide some sort of data flow analysis. It all seemed pretty straightforward, and the idea of doing that kind of analysis on 300 programs blew my mind, so I figured that an automated tool would do a much better job than I ever would doing it by hand. I sort of wandered around to see what choices there are for making your own custom parsers in Python. As it turns out, there are at least two good ones. The first one is called pyparsing. This is the one I started off with. But after carefully converting the COBOL grammar from EBNF to the pyparsing model, it turned out that parsing just one program took like forever to complete. On the other hand, it turned out to work pretty well on the SQL, but after a while I decided to throw that out and reimplement it… I know, worst idea ever, but still, I was not very time-constrained, so I could afford that. And mind you, I threw away roughly three days of work, so not much harm was done there.

On the second take at the parsing issue I thought that it would actually be nice to be able to write the grammar as EBNF, since EBNF grammars are really much more readable than the pyparsing representation and they are also easier to change. After all, we CS guys are used to math-like symbols… So, with the application of simpleparse things really took off. It took me about 10 workdays to get the COBOL grammar to parse, and some 2 more days to add the DB2 SQL (maybe not complete, but good enough for the programs here). So, anyway, the main thing was that simpleparse is really a simple parser thingy. It does not support longest match (picking the most successful alternative); it takes the first match only. This is crucial when defining the grammar: you’ve gotta make the grammar list the longest expression first. The main problem was in the relational conditions, which in COBOL take the form of

IF ABC=1 OR 2 OR 3 OR DEF <= 123 AND WS-SOME-VAR IS NOT GREATER THAN 'ABC'

Now, this is really strange to parse, especially the abbreviated condition (ABC=1 OR 2 OR 3, which actually means ABC=1 OR ABC=2 OR ABC=3). But you can get by somehow – at least I did. Anyway, the performance increase is dramatic. On my dual-core laptop the pyparsing stuff took two days to parse a simple program! With simpleparse it takes several seconds… Well, naturally this is due to the parser’s implementation being way simpler (the first match cuts off any further comparisons), but if you’re careful enough this thingy is capable of doing soooooooo much!
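To show what “first match only” means in practice, here is a minimal sketch. It is not simpleparse code: it uses Python’s standard `re` module as a stand-in, because regex alternation in `re` also tries alternatives left to right and takes the first one that matches. The names `SHORT_FIRST`, `LONG_FIRST` and `leading_operator` are made up for this illustration.

```python
import re

# Ordered (first-match) choice: alternatives are tried left to right and the
# first one that matches wins, even if a later one would match more text.
SHORT_FIRST = re.compile(r"<|<=|>|>=|=")   # shorter operators listed first
LONG_FIRST = re.compile(r"<=|>=|<|>|=")    # longer operators listed first


def leading_operator(pattern, text):
    """Return the operator matched at the start of text, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    # With the short alternatives first, "<=" is cut down to "<".
    print(leading_operator(SHORT_FIRST, "<= 123"))
    # With the long alternatives first, the whole operator is matched.
    print(leading_operator(LONG_FIRST, "<= 123"))
```

With the short alternatives listed first, the `=` of `<=` is left behind and whatever rule comes next has to choke on it. That is the reason for listing the longest expression first in a first-match grammar.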

So in the end I guess that Steve Yegge was right when he told people to learn that stuff about compilers. It definitely pays off to be aware that it is easier to make a parser than to use regexes… Or at least it seems so 😉
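As a small illustration of that point, here is a rough sketch of how the abbreviated condition from the COBOL example above could be expanded once the input is split into tokens. This is not the code from my analyser and it does not use simpleparse. The names `tokenize` and `expand` are hypothetical, and I am assuming only the symbolic operators (`=`, `<`, `>`, `<=`, `>=`), so wordy forms like IS NOT GREATER THAN, parentheses and quoted literals are left out.

```python
import re

# Longest operators first, because regex alternation takes the first match.
TOKEN = re.compile(r"\s*(<=|>=|<|>|=|[A-Za-z0-9][A-Za-z0-9-]*)")
RELOPS = {"=", "<", ">", "<=", ">="}
CONNECTIVES = {"AND", "OR"}


def tokenize(text):
    """Split a condition into operators, names and numbers."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def expand(text):
    """Expand 'A=1 OR 2' into 'A = 1 OR A = 2' (sketch, symbolic operators only)."""
    tokens = tokenize(text)
    out = []
    subject = op = None
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in CONNECTIVES:
            out.append(tok)
            i += 1
        elif tok in RELOPS:
            # e.g. "A = 1 OR > 5" (only the subject omitted) is not handled here.
            raise ValueError(f"unsupported operator position: {tok}")
        elif i + 1 < len(tokens) and tokens[i + 1] in RELOPS:
            # Full relation: remember subject and operator for later reuse.
            if i + 2 >= len(tokens):
                raise ValueError("missing right-hand operand")
            subject, op = tok, tokens[i + 1]
            out.append(f"{subject} {op} {tokens[i + 2]}")
            i += 3
        else:
            # Bare operand: reuse the last subject and operator.
            if subject is None:
                raise ValueError(f"operand without a subject: {tok}")
            out.append(f"{subject} {op} {tok}")
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(expand("ABC=1 OR 2 OR 3 OR DEF <= 123"))
```

The regex only does the tokenizing here. Deciding what a bare `2` means needs the subject and operator remembered from earlier in the condition, and that is the kind of state a parser keeps track of for you.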