Term Project
All students are expected to complete a term project for the class, either
individually or in a small group (larger groups face larger expectations).
The project involves a significant investigation into some aspect of software
evolution, and its goal is to permit a more in-depth exploration of software
evolution than in-class discussions alone allow. The output of the project is
a written report of approximately 7-20 pages (whatever length is needed to
describe the project adequately), written in the form of a research paper.
Potential analysis ideas include (but are by no means limited to):
- Explore correlations with bug production. It is possible to automatically
determine which changes in a project introduced bugs and which fixed them. An
improved bug prediction model would help engineers decide where to focus their
development efforts. It would also be useful to have data across multiple
projects showing correlations between various factors and bug production. In
particular, people are frequently fascinated by the idea of identifying the most
error-prone engineers working on a project.
- Explore whether if statement complexity is correlated with bug introduction.
Are longer if statements correlated with more errors? What about the total
complexity of an if ... else chain, or of switch ... case statements? A
preliminary investigation of this subject in summer 2006 indicated that the
length of an if statement is generally not correlated with error rates. It would
be interesting to establish this result definitively.
- Characterize language feature use over time. It may seem surprising, but we
currently have no data on the relative frequency of use of various programming
language keywords/features across multiple large software projects. Are method
calls more common than if statements, or is it the other way around? How do
these frequencies change over time? Do projects tend to add more logic over time
(i.e., more if statements), or more method calls?
The goal of this project is to pick a programming language and analyze the
relative frequency of its keyword/feature use over time. A comparison across
multiple languages and projects would be even better.
- Procedure/method signature evolution. Analyze a software project to determine
how the signatures of its procedures/methods have changed over time, and
characterize the kinds of changes made. This work has been done for C programs,
but not yet for Java or other languages. Questions to answer include: How often
are parameters renamed? How often are parameter types modified? How often are
parameters added or removed? How do the results for one language compare to
those for other languages?
- Origin mapping analysis. A difficult problem in software evolution analysis
is maintaining the identity of a procedure/method across a rename. Ideally,
even if a procedure has been renamed, the analysis should recognize that the
procedure under the old and new names is the same entity. The problem is
complicated further because procedures/methods can be renamed and modified in
the same SCM transaction, so a simple textual comparison of source code does
not fully solve it, and because method names can be overloaded in
object-oriented languages. To date, the best techniques for this kind of entity
mapping achieve around 90% correct mappings automatically; C and Java are the
languages investigated so far. Developing better origin mapping, or origin
mapping for other languages (Perl, PHP, etc.), would be interesting.
- Design pattern evolution. Software design patterns are structural idioms
used in the development of software systems. Many design patterns have
qualities that should allow them to accommodate a wide variety of change over
time. However, to date there has been no analysis of how design patterns
actually evolve. This project involves analyzing software design patterns over
time and characterizing how they change. Automatically detecting design patterns
is a significant research challenge, so a better approach might be to find other
patterns that are automatically extractable (such as micropatterns) and examine
their evolution.
- Evolution of UML diagrams. UML diagrams are now commonly used to represent
the design of software systems, and have been in use long enough that it may be
possible to observe the effects of long-term change on these diagrams (and hence
on the software systems they describe). The goal of this project is to develop
techniques to observe and characterize change in UML diagrams over time. Some
researchers have investigated extracting UML diagrams automatically and then
examining their evolution. It would be interesting to apply this approach to new
languages and compare how the resulting diagrams evolve.
- Evolution of software features. Recent work by Robillard and others has
focused on identifying the correspondence between user-visible features and the
code that most directly implements them. It would be very interesting to extract
a set of features from a large software project (tool support exists in the
form of an Eclipse plugin), then explore how each feature has changed over time
and compare their evolution. This provides a feature-oriented view of how a
project evolves, as opposed to the commit-oriented analysis that is most common
today.
- Visualization of evolution. Although many approaches to visualizing the
evolution of software projects have been proposed, good visualizations remain
an active area of research. The goal of this project is to develop a novel
visualization of the evolution of software over time.
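For the language-feature idea above, a natural first step is counting how often
keywords appear in a single snapshot of the source. The sketch below is a
minimal illustration in Python, assuming Java source supplied as a string; the
keyword subset and the helper names strip_comments_and_strings, keyword_counts,
and call_like_count are hypothetical, and the regex-based lexing is a crude
approximation rather than a real Java parser.

```python
import re
from collections import Counter

# Hypothetical subset of Java keywords to track; extend as needed.
KEYWORDS = {"if", "else", "switch", "case", "for", "while", "return", "new"}

# Keywords that can be followed by "(" but are not method calls.
NON_CALLS = {"if", "for", "while", "switch", "catch", "synchronized", "return"}

# Crude lexer: matches /* */ comments, // comments, and "..." / '...' literals.
# The regex engine scans left to right, so whichever construct starts first wins
# (e.g., "//" inside a string literal stays part of the literal).
_NOISE = re.compile(r"""/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'""", re.DOTALL)


def strip_comments_and_strings(src: str) -> str:
    """Replace comments and literals with spaces so their words are not counted."""
    return _NOISE.sub(" ", src)


def keyword_counts(src: str) -> Counter:
    """Count occurrences of each tracked keyword in one Java source snapshot."""
    tokens = re.findall(r"[A-Za-z_]\w*", strip_comments_and_strings(src))
    return Counter(t for t in tokens if t in KEYWORDS)


def call_like_count(src: str) -> int:
    """Approximate method-call count: identifiers followed by "(".

    This over-counts, since method declarations and constructor calls
    (new Foo(...)) match the same pattern.
    """
    names = re.findall(r"\b([A-Za-z_]\w*)\s*\(", strip_comments_and_strings(src))
    return sum(1 for n in names if n not in NON_CALLS)
```

Running these functions over every file at each revision retrieved from the
project's SCM would give the per-revision frequencies needed to study change
over time. A real study would likely want a proper parser, both to handle
language corner cases and to separate method calls from declarations.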
Project work will have three deliverables. You will need to decide on a project
topic and project partners early in the quarter. A rough draft of your project
report will be due later in the quarter, and the final report will be due during
the final week. Consult the syllabus for exact due dates.
Last modified: 1/3/2007