Saturday, November 8, 2008

What the bleep should we know!?

You can work for me if you can answer one simple question.

Prominent software engineer and professor Dr. David Parnas was once asked what he thought was the most often overlooked risk in software engineering. His answer was "incompetent programmers.... One bad programmer can easily create two new jobs a year. Hiring more bad programmers will just increase our perceived need for them. If we had more good programmers, and could easily identify them, we would need fewer, not more."

That was harsh. As usual, though, he is correct.

At this point in my life I think I must have interviewed over 300 candidates for various positions in software development. I have hired a dozen co-op students, and made the hiring decision on countless permanent positions. I have also fired three people. I have only had employees at Backstage since 2005, so why all this experience in hiring and firing? It is simply because somewhere early on in my career it became known that I was good at the difficult job of the technical screening of applicants. Because of this skill, my employers started asking me to be intimately involved in the process of assessing the talent of potential employees.

The technical screening interview consists of a twenty minute conversation where I attempt to verify that the applicant is competent, and that his or her resume is in fact truthful. Depending on the position we are hiring for, I have managed to boil the interview down to one simple question. Answer it, and you can work for me. It is rather disheartening how many computer scientists are unable to answer it correctly.

Before I reveal the magic question, it is worth discussing the history of the technical screening interview as something distinct from the normal HR-style interview. Back when I was a co-op student in the mid-1980s, the interview training we received emphasized those tricky personality questions, like what are you most proud of, or what do you think is your biggest flaw? And rightly so, as most of the interviews I had while applying for co-op jobs emphasized those kinds of questions, and the majority of the interviewers were HR professionals rather than programmers. The lack of technical questions may have had something to do with the stringent requirements for entry to the optional co-op program, which have since been relaxed over the years due to industry’s voracious appetite for more and more software engineers in the time leading up to the dot-com bubble, and the trend towards making co-op a mandatory component of many engineering programmes.

In my day, the university was effectively doing the technical screening on behalf of the employers by requiring that all co-op students have a B+ average, with a minimum of a B- in any computer science or mathematics course, and a curriculum that included multivariate calculus and algebra. As times changed, the requirements diluted and more and more people entered the industry due to the inflated salaries and potential overnight millions through stock option plans. The spectrum of capabilities of so-called software engineers started to widen dramatically. Thus materialized the need to screen applicants on their basic programming acumen.

Great Moments in Technical Screening Interviews

Whether my perceived dilution of the degree requirements is to blame, or my standards are simply too high, here are some hi-lights from a recent round of interviews that I conducted. We start with the magic question.
Me: What is the average lookup time, in big-Oh notation, for an object in a hashtable?

Applicant: log-n?

Me: No, that's for a binary search tree. How about for a linked list, or an array?

Applicant: n-log-n??

Me: That's for sorting, with a recursive divide and conquer algorithm like merge sort. Try again, without guessing.

Applicant: I didn't really like that class on complexity theory.

Me: Okay, let me put it another way. Give me an example of a problem you solved in Java by using a Hashtable, and tell me why it was the appropriate choice of data structure?

Applicant: I don't really like using the Hashtable. Vector is much better.
For the interested reader, it is O(1) or constant time for a hashtable, assuming you have more buckets than objects, and that your hash function distributes evenly. For a linked list, it is O(n), since you can expect to search through half of the list on average.

Why is this important? Clearly, at the very least, a software engineer should know how to choose the correct data structure and algorithms to implement a software solution. Data structures and algorithms are the basic building materials of our profession; choose incorrectly, and the foundation of the application is compromised as the hardware will have to work unnecessarily hard while running your code. In the same way that we assume a civil engineer knows when to use bricks and mortar versus steel reinforced concrete, our clients will assume that a software engineer knows which data structure to use if it is important, for example, that the data be efficiently retrieved in sorted order.

It is for this reason that I hire based on academic performance, using the transcript as an initial high-pass filter. At Backstage, my CTO and COO are both former students of mine that were at the top of their class. For the same reason that I did not entrust my eye surgery to a dentist, I also try not to hire self-taught programmers or software engineers with a degree in some other discipline. The problem with being self-taught is that you don’t know what you don’t know. Self-taught programmers tend to naively underestimate the complexity of things like transaction control, thread safety, and memory management. This is compounded by the Java programming language, which encourages the use of threads, and hides the details of memory management.

The Problem with Java

I firmly believe that we learn from our mistakes. This is especially true when learning to program. However, Java is designed to prevent common programmer mistakes, particularly those dealing with memory allocation, pointer de-referencing, and array indexing. Therefore, you can’t learn anything programming in Java. Not that I dislike Java – far from it. I believe that it encapsulates much of the great ideas we have developed in software engineering. I am simply suggesting that if it's the only imperative language you know, you should pick up a book on C and become friends with malloc and free.

Let me reiterate.
  • We learn from our mistakes.
  • Java is designed to prevent programmer mistakes.
  • Therefore, you can't learn anything programming in Java.
Case in point...
Me: In Java, under what condition might you expect to see a runtime StackOverflowException?

Applicant: When you push too many things onto a Stack, which is a collection class.
This answer represents a significant lack of knowledge regarding the von Neumann architecture and the machine that actually runs the code. Each process (i.e., an executing program) is allocated a runtime stack used to pass parameters when function/method calls are made. It is a stack because the arguments are pushed on before the call, and popped off upon return. With that reminder (hopefully you took a course in assembler programming, compiler construction, and/or programming languages that covered this topic), it should be obvious that the correct answer is a recursive method without a base case.
Applicant: I never did like recursion.
What do you mean you never did like recursion? My favourite all-time computer science professor, Dr. Bill Wadge (inventor of, among other things, the Lucid programming language and Wadge degrees), once told his class that being comfortable with recursion and maths is what separates the humans from the apes when it comes to software expertise. Today I would also add a working knowledge of the von Neumann machine, memory management, and thread synchronization

Further compounding matters is the open source movement and its over-engineered general purpose solutions to such problems as object persistence in relational databases. I know to say this leaves me open to becoming anathema to the zealots, but I have this sense that open source is killing software development by inadvertently dumbing down our programmers. Programming has been left to a few high priests while for the rest of us it has been reduced to editing XML configuration files until the desired behaviour is observed, but without any real understanding of how or why. Software development using open source tools has largely become the behavioural study of other people's programs. We are in danger of training a generation of software engineers who are merely configuration engineers, who high-five each other if they can get the “Hello, world” JSP to display in their browser, but can’t debug their way out of a class cast exception if their continued employment depended on it.

Remember, you are supposed to be able to get programs working, so don't act like it’s never happened before if/when you get your program debugged.

END NOTES

The title of this entry is motivated by the controversial film What the Bleep Do We Know!?

The David Parnas quote is taken from an interview with Nancy Eickelmann for Software Engineering Notes.

A good book on C programming is by Dr. Nigel Horspool, another one of my former influential professors at the University of Victoria. Any text that can teach you to avoid writing a function that returns a pointer to a local array will suffice.

0 comments: