Software Design Review

by Philip Greenspun and Andrew Grumet, October 2009


In the spring of 2009, a friend who runs an ecommerce Web site asked one of the authors (Philip) for help explaining why his application was running so slowly. He had paid an MIT-trained programmer with 20 years of experience $200,000 to build it and was paying a hosting service $1100 per month to run it. Although the site was still in a soft-launched state, with only one user every hour or so, pages took up to 5 minutes to load. Philip said “This will be easy. Just show me the design documentation.” His friend replied “What do you mean?” Philip said he wanted the document in which the programmer set forth what problem he was trying to solve, how large a data set was being handled, what each server did, what software had been selected and why, where the files and programs were on the server, and what the data model was.

“There isn’t any documentation,” replied the business guy who had created the idea and written the checks to the programmer. Queries to the programmer revealed that he was almost as ignorant of the answers to the preceding questions as his boss. He knew that he was using Ruby on Rails and MySQL, but not how many gigabytes of data were required to produce all of the public pages of the site. Philip eventually was able to get some good information from the hosting service’s sysadmins, e.g., the size of the MySQL database and the amount of RAM on each virtual “slice” being used to run the HTTP servers and the RDBMS. By chucking the virtualized model and buying the cheapest Dell pizza box server with 16 GB of RAM (about $500 worth at the time), the amount of time required to produce a page fell from 5-10 minutes to no more than a few seconds. Hosting costs were reduced from $1100 per month to less than $100. However, our friend was not able to recover the months of customers who had been lost due to the poor performance of the service.

What would have saved this business? An external design review.

The Fundamental Problem: Business People Aren’t Technical

It is almost impossible for business people to manage technical people. Because the business people have no authority to challenge technical decisions and because there are no published standards for how software development is to be done, the programmers can almost always snow the business people with convoluted stories about why something has to be done a certain way.

Adding to the challenge is America’s corporate self-esteem culture. The average programmer does terrible work, producing bug-ridden code with non-existent documentation. However, it is outside the realm of acceptable discourse for a manager to say “this is terrible work.”

Best Practices from the Most Successful Software Companies

How do the most successful software companies handle these problems? Many are run by technical people, who cannot be snowed. Bill Gates of Microsoft is an obvious example (and the company has stumbled ever since the accession to the throne of the less technical Steve Ballmer). Sergey Brin and Larry Page of Google provide another. Both Microsoft and Google have cultures of code review in which programmers are required to present designs to others within the organization. The most successful software companies tend to have a fairly blunt corporate culture, in which it is common for harsh criticism to be delivered (see this 2008 newspaper article about Bill Gates and Microsoft).

External Design Reviewers

What if your company doesn’t have a technical management team like Microsoft’s? Or if your company doesn’t have an unbiased group of great software engineers working on a separate project? Or if your company culture doesn’t allow for straight criticism?

Bring in an outsider.

Even if you can’t attract excellent technical people to work all year every year on your boring IT systems, you can probably find an excellent software developer to come in for a few half-day review sessions. The outsider won’t have any bias or preconceived notions about particular divisions of your company. The outsider won’t have to worry about hurting anyone’s feelings by saying “You need to do X, Y, and Z.”

The design review process outlined here is described in terms of the development of a multi-user application program, such as a Web-based service for a group of collaborating employees or for a public Web site. These services are typically backed by a relational database management system such as Oracle or SQL Server. However, the process should be useful for any other kind of computer application where there are decision-makers, programmers, and end-users.

The process outlined here is based on the authors’ experience with more than 300 database-backed Internet application programs and on roughly 60 years of combined experience as computer programmers.

Review Stages

In an ideal world, here are the project stages at which you’d bring in an external design reviewer:

  • Scope and tool selection, to answer the questions “What problem is being solved?”, “Could it be solved by tweaking some existing software rather than writing new software?”, “Are the tools selected the best choice?”
  • Page flow and data model (for a standard Internet application) or user interface and data structures (for a traditional application)
  • Post-Prototype
  • Pre-Launch
  • Post-Launch/Maintenance

At every stage, the developers should prepare for a meeting with the design reviewer by writing draft documentation. The design reviewer should submit questions raised by the documentation prior to the meeting, giving the developers time to revise the documentation. The actual meeting should be a working session in which the documentation is modified in real time, possibly with some sections marked for further research.

Let’s go through the stages to see what questions should be answered by documentation.

Scope and Tool Selection

Most software need never have been written at all. Companies will spend tens of thousands of dollars on custom development of a Web application, never having asked “Perhaps we should just use a standard free and open-source Weblog toolkit with our own style sheet and four custom pages.” (See “Weblog as Website for the Small Organization”). The design document produced at this point should answer the following questions:

  • What is useful to end-users and the organization about the proposed software?
  • If they didn’t have this proposed software, how would they solve their problem?
  • What are the classes of users?
  • What kinds of devices can be used to access the application, e.g., can it be used from a mobile phone?
  • How are specifications to be communicated to the development team? How will the development team’s plans be reviewable by decision-makers?
  • What existing software comes closest to doing the job and why isn’t it adequate?
  • How much data will need to be stored persistently, i.e., on a hard drive, and what kinds of queries need to be run against those data?
  • How much data will need to be processed rapidly, i.e., stored in fast memory, and what kinds of queries need to be run against those data?
  • Does any computationally-intensive processing need to be performed?
  • What software development tools, computer languages, and database management systems are being selected and why?
  • What is the schedule and method of updating the schedule?

The document should contain, as attachments, a few user profile pages showing typical expected users of the software and what they will be doing with it. See the Planning chapter of Software Engineering for Internet Applications, Exercise 1b, for more on how to build user profile pages.

“How are specifications to be communicated to the development team?” is an important question. There is nothing more wasteful than a group of skilled programmers building the wrong thing.

The data size and computational intensity questions are important for figuring out what kinds of servers will be appropriate to host the application.
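The sizing questions can be turned into a quick check. This is a minimal sketch. It assumes the design document supplies an estimate of the frequently queried ("hot") data set and the RAM of each candidate server. The server names, RAM figures, and the 2 GB allowance for the operating system and application processes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerSpec:
    name: str      # hypothetical label for a candidate server
    ram_gb: float  # physical RAM available to the machine


def servers_that_fit(hot_data_gb, candidates, overhead_gb=2.0):
    """Return the names of candidate servers whose RAM, after reserving
    overhead_gb for the OS and application processes (an assumed figure),
    can hold the hot data set entirely in memory."""
    return [s.name for s in candidates if s.ram_gb - overhead_gb >= hot_data_gb]


candidates = [
    ServerSpec("small-virtual-slice", 1.0),
    ServerSpec("mid-virtual-slice", 4.0),
    ServerSpec("dedicated-16gb", 16.0),
]
print(servers_that_fit(6.0, candidates))  # ['dedicated-16gb']
```

If no candidate fits, either the hot data set has to shrink (for example, by archiving rarely accessed rows) or the hardware has to grow. Either way, the answer belongs in the design document.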

Within the “software development tools” section there should be at least one paragraph on version control. Is a standard system such as Subversion or Git going to be used? How can a programmer restore code as it existed at a previous point in time? Is the repository stored on a separate computer or hard drive so that it can also serve as a backup copy? Can a big change be isolated from the current production line through branching?

Page Flow and Data Model / User Interface and Data Structures

The data model stage is where a tremendous amount of complexity can be engineered out of a system. For a standard Internet application, every SQL table typically requires the construction of five or more Web pages for the end-user (browse, search, view, add, edit, delete) and another five or so for the administrators (more or less the same functions, but with broader access). That’s roughly 10 computer programs per table, each of which may need to be debugged and maintained. Many very experienced C and Java developers are barely competent in SQL and data modeling. It is common for an expert SQL programmer to be able to reduce a 20-table data model down to 5 or 10 tables.
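The rule of thumb above is roughly five end-user pages and five admin pages per table. With it, the payoff from simplifying a data model is easy to estimate. The per-table page counts are the text's approximations, passed as parameters so that a team can substitute its own.

```python
def pages_to_build(num_tables, user_pages_per_table=5, admin_pages_per_table=5):
    """Rough count of pages (programs) to write and maintain for a data model,
    using the approximate per-table figures from the text."""
    return num_tables * (user_pages_per_table + admin_pages_per_table)


for tables in (20, 10, 5):
    print(tables, "tables ->", pages_to_build(tables), "pages")
# 20 tables -> 200 pages
# 10 tables -> 100 pages
# 5 tables -> 50 pages
```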

Conversely, an expert SQL developer can often tell whether the data model stores too little information to fulfill the requirements. As a simple example, suppose that an electronic medical record data model stores each patient’s first and last names in a single column. A SQL developer can glance at the table definition and observe that it won’t be possible to produce a reliable list of patients sorted by last name.
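The medical-record example can be reproduced with Python's built-in sqlite3 module; the table and column names are hypothetical. With a single full_name column, ORDER BY can only sort on the whole string, which effectively sorts by first name. Splitting the string in application code is unreliable for names such as "Mary Ann Smith". Storing the last name in its own column makes the query trivial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Design 1 (problematic): first and last names share one column.
conn.execute("CREATE TABLE patients_v1 (patient_id INTEGER PRIMARY KEY, full_name TEXT)")
# Design 2: last name in its own column, so the database can sort on it.
conn.execute(
    "CREATE TABLE patients_v2 ("
    "patient_id INTEGER PRIMARY KEY, first_names TEXT, last_name TEXT)"
)

patients = [("Mary Ann", "Smith"), ("John", "Adams"), ("Zoe", "Baker")]
conn.executemany(
    "INSERT INTO patients_v1 (full_name) VALUES (?)",
    [(f"{first} {last}",) for first, last in patients],
)
conn.executemany(
    "INSERT INTO patients_v2 (first_names, last_name) VALUES (?, ?)", patients
)

# Sorting the combined column orders patients by first name, not last name.
by_full_name = [row[0] for row in conn.execute(
    "SELECT full_name FROM patients_v1 ORDER BY full_name")]
# Sorting the dedicated column gives the list the requirements asked for.
by_last_name = [row[0] for row in conn.execute(
    "SELECT first_names || ' ' || last_name FROM patients_v2 "
    "ORDER BY last_name COLLATE NOCASE")]

print(by_full_name)  # ['John Adams', 'Mary Ann Smith', 'Zoe Baker']
print(by_last_name)  # ['John Adams', 'Zoe Baker', 'Mary Ann Smith']
```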

Page flow more or less determines the complexity of the application for end-users. If it requires 15 steps to accomplish a task, that will be slower and require more training than if it takes 5 steps. For a consumer-facing Internet site, a sufficiently complex page flow will almost guarantee commercial failure. If you can’t make money unless every user has an IQ over 130 and is extremely motivated to learn a complex interface, well, you can’t make money.

For a non-Web application, the equivalent items to review are the user interface and the data structures in memory and on disk.

In all cases, a draft development standards document should be available for review at this point. It settles simple questions such as file, URL, and variable naming conventions, and it addresses the planned documentation for modules and procedures. The development standards also specify how configuration variables are named and added. Finally, the standards address user input validation and security.
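Here is an illustration of the kind of rule a development standards document might set for input validation and security. Every user-supplied field is validated on the server, and values reach SQL only through placeholders. This is a minimal sketch using sqlite3. The order_lines table, the quantity limit, and the function names are hypothetical.

```python
import sqlite3

MAX_QUANTITY = 100  # hypothetical business rule


def parse_quantity(raw):
    """Validate a quantity typed by a user; raise ValueError with a message
    suitable for showing back to the user."""
    try:
        qty = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"Quantity must be a whole number, got {raw!r}") from None
    if not 1 <= qty <= MAX_QUANTITY:
        raise ValueError(f"Quantity must be between 1 and {MAX_QUANTITY}")
    return qty


def add_order_line(conn, order_id, sku, raw_quantity):
    qty = parse_quantity(raw_quantity)
    # The ? placeholders make the driver pass values separately from the SQL
    # text, so input such as "x'); DROP TABLE ..." is stored as plain data.
    conn.execute(
        "INSERT INTO order_lines (order_id, sku, quantity) VALUES (?, ?, ?)",
        (order_id, sku, qty),
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, sku TEXT, quantity INTEGER)")
add_order_line(conn, 1, "x'); DROP TABLE order_lines; --", "3")
print(conn.execute("SELECT sku, quantity FROM order_lines").fetchall())
```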

This is also the stage at which procedures for internal code review should be documented. The external design review process described here is not a substitute for continuous internal reviews. At Google, for example, every check-in to the version control system must be reviewed by at least one other programmer. This sounds cumbersome. What if the change is to fix a typo in a comment? It gets reviewed! But somehow Google has managed to prosper and this blog entry explains how the process is supported. We’re not suggesting that Google’s process is right for every project, but there should be some documented internal code review process.

Post-Prototype

At this stage, a skeletal version of the application is up and running and some testing has been done with potential users. The design review should be looking at the following questions:

  • What has been learned from user testing and what additional tests should be performed?
  • Are the development standards working or should they be modified based on the coding experience?
  • Is the quality assurance strategy adequate?
  • What aspects of site usage and performance are going to be tracked? With what tools?

The usage and performance tracking plan is important because the people who paid for the application are going to ask “How many people are using this? Why aren’t there more? Where are people giving up? How long are pages taking to load?”
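The questions above map onto simple computations over page-view records. This is a minimal sketch. It assumes each view has been logged as a (user_id, page, load_seconds) tuple. The log format, page paths, and sample numbers are hypothetical, and a real site would typically rely on an analytics package or its Web server logs.

```python
import math
from collections import defaultdict

# Hypothetical page-view log: (user_id, page, seconds to load the page)
views = [
    ("u1", "/product", 0.4), ("u1", "/cart", 0.5), ("u1", "/checkout", 5.0),
    ("u2", "/product", 0.3), ("u2", "/cart", 0.2),
    ("u3", "/product", 0.3),
]


def unique_users(views):
    """How many people are using this?"""
    return len({user for user, _, _ in views})


def funnel(views, steps):
    """Where are people giving up? For each step, count users who viewed that
    page and every earlier step (order of viewing is ignored for simplicity)."""
    pages_by_user = defaultdict(set)
    for user, page, _ in views:
        pages_by_user[user].add(page)
    return [
        sum(1 for pages in pages_by_user.values() if set(steps[: i + 1]) <= pages)
        for i in range(len(steps))
    ]


def percentile(values, pct):
    """How long are pages taking to load? Nearest-rank percentile."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


load_times = [seconds for _, _, seconds in views]
print(unique_users(views))                                      # 3
print(funnel(views, ["/product", "/cart", "/checkout"]))        # [3, 2, 1]
print(percentile(load_times, 50), percentile(load_times, 95))   # 0.3 5.0
```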

Pre-Launch

At this point the software is installed on the production servers and the organization is a week or two away from “throwing the big switch”. The documentation at this point should be good enough that if all of the programmers who worked on the application were hit by a bus, a replacement team could step in and keep the application running.

A critical set of documentation to review at this point concerns the hosting of the application. Where are the servers? If colocated, how does one get physical access to them? What is the network layout? Firewall configuration? What is each server named and what is its IP address? What software does each server run and in what directories is that software located? What hard drives are in each server and what does each drive do? What single disk drive failures will bring down the application? (The answer to this should be none!) What single machine failures will bring down the application? (Often a failure of the RDBMS server will bring down the application, and this risk is more acceptable than the cost of redundant RDBMS servers.)
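Once the hosting documentation lists where each essential function lives, the single-failure questions can be answered mechanically. This is a minimal sketch. It assumes the documentation can be reduced to a mapping from each essential role to the drives (or machines) holding a copy of it. The role and component names are hypothetical. A role held by only one component is a single point of failure.

```python
def single_points_of_failure(placement):
    """placement maps each role the application cannot run without to the
    list of components (drives, or whole machines) that hold a copy of it.
    Return the components whose loss alone would take a role offline."""
    fatal = {holders[0] for holders in placement.values() if len(holders) == 1}
    return sorted(fatal)


# Hypothetical layout for one database server: everything mirrored except the log.
drive_placement = {
    "operating system": ["db1-disk0", "db1-disk1"],
    "table data": ["db1-disk2", "db1-disk3"],
    "transaction log": ["db1-disk4"],
}
print(single_points_of_failure(drive_placement))  # ['db1-disk4']

# The same check works one level up, with machines instead of drives.
server_placement = {"http": ["web1", "web2"], "rdbms": ["db1"]}
print(single_points_of_failure(server_placement))  # ['db1']
```

The second result reflects the common trade-off described above. A lone RDBMS server may be an accepted single point of failure, but the hosting document should say so explicitly.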

If the application uses an RDBMS, it is critical to document the RDBMS server configuration. A small RDBMS server might have 10-20 hard disk drives. Why so many? Consider a single update to a table with two indices. This requires writes to the table, index1, index2, and the transaction log file, i.e., to four separate files. If those four files are on four separate hard drives, the four writes can be processed in parallel and database updates can proceed approximately four times as fast as they would with everything on one hard drive. The four drives will need to be mirrored so that the failure of a single drive does not result in data loss or application downtime. Now we have 8 physical disk drives on the server. You wouldn’t want the operating system’s day-to-day disk activity interfering with the database’s, nor would you want the OS to crash in the event of a hard drive failure. So we add two more disks in a mirrored pair to support the OS. Our minimum-size server now has 10 physical disks. The design choices for the RDBMS server have huge implications for performance, reliability, recoverability, and maintainability. They need to be documented partly so that they can be reviewed but mostly so that the system can be maintained.
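The disk-count reasoning above can be written as a small function, which makes it easy to redo when the data model gains another index. It follows the paragraph's simplifying assumptions: one drive per file touched by an update, every data drive mirrored, and a mirrored pair for the OS. Real RDBMS layouts often use RAID arrays instead, so treat this as a back-of-the-envelope sketch.

```python
def minimum_rdbms_disks(num_indexes, mirrored=True, os_disks=2):
    """Disks needed so that each file written by a single-row update
    (the table, each index, and the transaction log) has its own drive."""
    files_per_update = 1 + num_indexes + 1        # table + indexes + log
    data_disks = files_per_update * (2 if mirrored else 1)
    return data_disks + os_disks


print(minimum_rdbms_disks(2))  # 10, matching the example in the text
print(minimum_rdbms_disks(4))  # 14
```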

A release plan should describe how minor changes and new full releases are pushed to production. How are changes to procedural code and SQL data models to be coordinated? What are the names of the development and staging servers? What steps must be taken and who has to sign off before what is on staging can go to production? What is the procedure for backing out from a new release if things aren’t going well?
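Some teams encode the sign-off rules of the release plan as a check that runs before anything is promoted from staging to production. This is a minimal sketch. The required roles and the three conditions are hypothetical examples of what a release plan might specify.

```python
REQUIRED_SIGNOFFS = {"qa", "product owner", "operations"}  # hypothetical roles


def release_blockers(signoffs, staging_tests_passed, backout_procedure):
    """Return the reasons a staging build may not go to production yet;
    an empty list means the release can proceed."""
    blockers = []
    missing = REQUIRED_SIGNOFFS - set(signoffs)
    if missing:
        blockers.append("missing sign-off: " + ", ".join(sorted(missing)))
    if not staging_tests_passed:
        blockers.append("tests have not passed on staging")
    if not backout_procedure.strip():
        blockers.append("no documented back-out procedure")
    return blockers


print(release_blockers({"qa"}, True, ""))
# ['missing sign-off: operations, product owner', 'no documented back-out procedure']
```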

Quality assurance and performance testing procedures and results should be reviewed at this point. Given that a lot of modifications are likely to be made shortly after launch, it is important that a testing plan is in place to make sure that new bugs aren’t introduced when old bugs are fixed and when new features are added.
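One common way to keep fixed bugs fixed is to add an automated regression test for each bug at the moment it is fixed. This is a minimal sketch using Python's unittest module. The format_price function and the bug it once had are hypothetical.

```python
import unittest


def format_price(cents):
    """Hypothetical helper: format an integer number of cents as dollars."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return f"{sign}${dollars:,}.{remainder:02d}"


class FormatPriceTests(unittest.TestCase):
    def test_typical_price(self):
        self.assertEqual(format_price(1999), "$19.99")

    def test_regression_single_digit_cents(self):
        # Added when (hypothetically) 505 cents was displayed as "$5.5".
        self.assertEqual(format_price(505), "$5.05")

    def test_refund_is_negative(self):
        self.assertEqual(format_price(-250), "-$2.50")
```

Saved in the project's test directory, a file like this can be run on every build with `python -m unittest`. Each new bug fix adds one more test method.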

Conformance to the development standards should be evaluated at this point. Are file and variable names consistent? Are modules, procedures, and data models documented sufficiently and according to the standards?

Post-Launch/Maintenance

At this point the external reviewer should perform an audit to make sure that the hosting documentation is consistent with any new servers that might have been added. This is also the time to review the data recovery (programmer drops a table by mistake) and disaster recovery (server room is destroyed by fire) plans.
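For the data recovery half of that review, the reviewer should ask exactly how a dropped table gets brought back. This is a minimal sketch using Python's sqlite3 module, whose Connection.backup method copies a whole database. The table, the in-memory backup target, and the helper names are hypothetical. A production RDBMS would use its own backup and point-in-time recovery tools. Disaster recovery additionally requires that the backup live somewhere other than the server room.

```python
import sqlite3


def take_backup(live):
    """Copy the committed contents of the live database into a separate
    database (in-memory here; in practice a file on another machine)."""
    backup = sqlite3.connect(":memory:")
    live.backup(backup)
    return backup


def restore_table(live, backup, table):
    """Recreate a dropped table and its rows from the backup copy.
    Rows written after the backup was taken are not recovered, and indexes
    or triggers on the table would need to be recreated separately."""
    (ddl,) = backup.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone()
    live.execute(ddl)
    rows = backup.execute(f'SELECT * FROM "{table}"').fetchall()
    if rows:
        placeholders = ", ".join("?" * len(rows[0]))
        live.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    live.commit()


live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, email TEXT)")
live.executemany("INSERT INTO customers (email) VALUES (?)",
                 [("a@example.com",), ("b@example.com",)])
live.commit()

nightly = take_backup(live)
live.execute("DROP TABLE customers")          # the programmer's mistake
restore_table(live, nightly, "customers")
print(live.execute("SELECT email FROM customers ORDER BY customer_id").fetchall())
# [('a@example.com',), ('b@example.com',)]
```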

A sweep through all of the earlier documents should be made to ensure consistency with the final product. Remember that a new person coming onto the team should be able to go back to the documents produced during the Scope and Tool Selection review and figure out why custom software was built instead of adapting an existing open-source tool.

Finally, the development team should put together a writeup document that, on one Web page, explains what the application does and why it is useful, complete with screen shots so that the reader need not actually be sitting in front of the running application. See the Writeup chapter of Software Engineering for Internet Applications for examples.

Conclusion

Programmers will not keep themselves honest. If left to their own devices, they will skimp on anything that is necessary but not fun. This includes planning, documentation, and testing. Only a review by an unbiased external reviewer can give non-technical management the ammunition it needs to get programmers to behave like engineers.

The cost of this process should be minimal. All of the documents that are required for the design review are documents that should be produced in any competently executed software development effort. The cumulative number of hours required for an external expert to conduct all five reviews suggested in this document should be roughly 100. With software experts available at anywhere from $100 to $300 per hour, that’s $10,000 to $30,000 in costs to guard against the following horrifyingly expensive situations:

  • custom software developed to solve a problem for which an open-source solution already exists
  • wrong tools selected
  • inappropriately small, large, or complex servers selected
  • extra costs incurred due to overly complex data model
  • lost customers or user training time due to needlessly complex interface
  • exposure to catastrophic loss in the event of a system failure, attack by crackers, or data center destruction
  • insufficient documentation available for long-term maintenance, leaving the enterprise at the mercy of the original developers
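The cost estimate for the reviews is simple arithmetic. Comparing it with the $200,000 development budget from the opening anecdote gives a sense of proportion. The function and parameter names are illustrative.

```python
def review_cost_range(hours=100, low_rate=100, high_rate=300):
    """Total cost of the external reviews at the low and high hourly rates."""
    return hours * low_rate, hours * high_rate


low, high = review_cost_range()
development_budget = 200_000  # what the friend in the opening story paid
print(low, high)                                             # 10000 30000
print(low / development_budget, high / development_budget)  # 0.05 0.15
```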


About the Authors

Philip Greenspun has spent more than a decade nagging industrial programmers and students to document their design decisions (resume).

Andrew Grumet is the Vice President of Engineering at Mevio, a Kleiner Perkins-funded Internet media company. Grumet has a Ph.D. in Electrical Engineering and Computer Science from M.I.T. (resume).

We are grateful to and have incorporated some thoughtful comments from Arthur Gleckler, a senior engineer at Google, and John Patrick Morgan, a recent graduate of Olin College of Engineering.


Article by Tuan
