The Perils of Copy-Paste Coding
Three approaches that help make it work

Copy-paste coding is a kind of misguided code reuse. You have a problem to solve and you see a similar problem and its solution in your existing body of code. So you copy and paste the solution, and make the necessary modifications so that the solution matches your current problem.

Here's an example to make this more concrete: you've written a system that allows departments in your organization to analyze their productivity; each department has its own ideas about what it wants, so each has its own domain logic. The sales department wants to be able to export data from the system into a planning tool. After a few months in production, the personnel department spies this particular feature and says they would like it as well. No problem, says your boss; we already have that functionality in the system so we can make it available to you in the next maintenance release. Your boss then arrives at your desk one morning asking you to make this modification, assuming it won't take long because you already have the code.

The problem is that functionality, which is intended to be specific to one business domain, is not usually written generically. The ideal solution would be to rewrite the code to export to a planning tool as a generic component that can be configured for multiple business domains. However, doing it properly means you can't make the deadline that your boss has promised to the customer. Your only option is to take a copy of the existing code, say as a new class, and then modify that copy for the personnel department's specific requirements. This results in two different classes that share a similar functional structure, but whose details vary according to the business domain.

What's the Problem with Copy-Paste Coding?
It's a bit ugly, but is there any real problem with copy-paste coding? Yes! The list is long, but I'll concentrate on the two major issues. First, each time you copy something, you add to your maintenance burden. In the above example, if the planning tool were upgraded, then in the worst case (which in my experience is the normal case!) each copy of the exporting code would need to be modified. And if the organization has standards for test coverage, each copy would have to be tested separately. These are all very practical concerns for a developer.

There is also an issue of principle: tailoring the export code for each business domain leads to tight coupling between the export code and the business domain. Changes to the business domain could have unexpected repercussions for the exporting feature.

What Can You Do About It?
Are we stuck with multiple copies of very similar code? Is the only alternative a full-blown refactoring of the code into a generic component? No! There is a middle road that ensures that only one version of the repeated code exists. With a decent refactoring tool such as Eclipse's JDT or IntelliJ IDEA, it's easy to refactor the existing code into a maintainable, loosely coupled version in a relatively short time. The approaches I'll describe have these properties, although they lack the flexibility that a more generic component might offer. In my experience, there are three overall solutions; in practice, a combination of them is often needed.

All three solutions rely on analyzing corresponding methods. I often find that the easiest way to do this is to print out the relevant methods and then look at them side by side. This way it's easier to see which lines are common to both methods and which lines are domain specific; I usually draw boxes around the domain-specific lines.

Solution 1: Helper Classes
If the two methods contain similar or identical non-domain-specific code, that code can be moved to a helper class. Continuing the earlier example, two different departments in an organization export data, based on their own information, to the same planning tool. The personnel department's export is implemented using the class PersonnelInfoExporter shown in Listing 1. (Listings 1–6 can be downloaded from www.sys-con.com/java/sourcec.cfm.) The sales department's export is implemented using SalesInfoExporter, shown in Listing 2. The details of the planning tool are not really important to this article.

Looking at these two classes, the similarity between the two export methods is striking. Both have the same structure: initialize the tool, populate the tool with data, and terminate the tool. In PersonnelInfoExporter these three tasks are on lines 18–22, 23–64, and 65–67, respectively; in SalesInfoExporter they are on lines 16–20, 21–51, and 52–53.
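Since the listings are only available as a download, here is a minimal sketch of the shape the two copy-pasted exporters share. Everything in it is hypothetical: PlanningTool is a stub that just records the calls it receives (openConnection(), createChart(), setFooter(), and closeConnection() follow the snippets in this article, while addDataSeries() is assumed), and the titles, footers, column counts, and data values are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the planning tool: it only records the calls it receives.
class PlanningTool {
    static PlanningTool lastOpened;
    final List<String> calls = new ArrayList<>();

    static PlanningTool openConnection() {
        lastOpened = new PlanningTool();
        lastOpened.calls.add("open");
        return lastOpened;
    }

    void createChart(String title, int numColumns) { calls.add("chart:" + title + ":" + numColumns); }
    void addDataSeries(List<Integer> values, int year) { calls.add("series:" + year + ":" + values); }
    void setFooter(String footer) { calls.add("footer:" + footer); }
    void closeConnection() { calls.add("close"); }
}

// Copy 1: the original sales export.
class SalesInfoExporter {
    void exportSalesInfo(int startYear, int endYear) {
        // Initialize the tool.
        PlanningTool tool = PlanningTool.openConnection();
        tool.createChart("Sales Statistics for Period " + startYear + "-" + endYear, 2);
        // Populate the tool (domain specific; placeholder values).
        for (int year = startYear; year <= endYear; year++) {
            tool.addDataSeries(Arrays.asList(100, 5), year);
        }
        // Terminate the tool.
        tool.setFooter("Sales data");
        tool.closeConnection();
    }
}

// Copy 2: pasted from SalesInfoExporter, then edited for the personnel department.
class PersonnelInfoExporter {
    void exportPersonnelInfo(int startYear, int endYear) {
        // Initialize the tool (identical apart from the title and column count).
        PlanningTool tool = PlanningTool.openConnection();
        tool.createChart("Personnel Statistics for Period " + startYear + "-" + endYear, 3);
        // Populate the tool (domain specific; placeholder values).
        for (int year = startYear; year <= endYear; year++) {
            tool.addDataSeries(Arrays.asList(40, 2, 1), year);
        }
        // Terminate the tool (identical apart from the footer).
        tool.setFooter("Personnel data");
        tool.closeConnection();
    }
}
```

Apart from the strings, the column count, and the values passed to addDataSeries(), the two methods are identical; the initialize and terminate steps in particular carry no domain knowledge, which is what makes them candidates for extraction.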

This structure suggests that the first and third tasks have little to do with the actual domain-specific data being exported, so we should be able to extract methods for initializing and terminating the tool. A refactoring tool should make this straightforward. In this case, the application of the refactorings Extract Local Variable, Extract Method, and Inline Temp leads to two methods:

  public PlanningTool 
    initializeTool(String title, 
                   int numColumns) {
    PlanningTool planningTool = 
      PlanningTool.openConnection();
    planningTool.createChart(title,
                             numColumns);
    return planningTool;
  }

  public void terminateTool(
    PlanningTool planningTool, 
    String footer) {
    planningTool.setFooter(footer);
    planningTool.closeConnection();
  }

Since neither of these methods is specific to a domain or uses instance variables, they can be moved as static methods to a helper class, in this case ExportHelper, as shown in Listing 3. The new SalesInfoExporter is also shown in this listing; similar changes would be made to PersonnelInfoExporter. Making the methods static makes it explicit that they do not depend on any object state.
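Listing 3 isn't reproduced here, so the following is only a minimal sketch of such a helper class and one caller. PlanningTool is again a hypothetical recording stub, reduced to the calls the helper needs; the column count and footer text in SalesInfoExporter are placeholders, and the domain-specific population step is elided.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the planning tool; it records calls so the flow is visible.
class PlanningTool {
    static PlanningTool lastOpened;
    final List<String> calls = new ArrayList<>();

    static PlanningTool openConnection() {
        lastOpened = new PlanningTool();
        lastOpened.calls.add("open");
        return lastOpened;
    }

    void createChart(String title, int numColumns) { calls.add("chart:" + title + ":" + numColumns); }
    void setFooter(String footer) { calls.add("footer:" + footer); }
    void closeConnection() { calls.add("close"); }
}

// The extracted, domain-independent steps. Static, because they touch no instance state.
final class ExportHelper {
    private ExportHelper() { } // holds only static methods, so never instantiated

    static PlanningTool initializeTool(String title, int numColumns) {
        PlanningTool planningTool = PlanningTool.openConnection();
        planningTool.createChart(title, numColumns);
        return planningTool;
    }

    static void terminateTool(PlanningTool planningTool, String footer) {
        planningTool.setFooter(footer);
        planningTool.closeConnection();
    }
}

// A sales exporter after the refactoring: only the populate step stays local.
class SalesInfoExporter {
    void exportSalesInfo(int startYear, int endYear) {
        PlanningTool planningTool = ExportHelper.initializeTool(
                "Sales Statistics for Period " + startYear + "-" + endYear, 2);
        // ... domain-specific population of the chart would go here ...
        ExportHelper.terminateTool(planningTool, "Sales data");
    }
}
```

The private constructor is a common Java idiom for a class that only holds static methods; it is a choice made in this sketch and may not match Listing 3.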

Solution 2: Inheritance
If two methods share the same overall structure but differ in several portions of domain-specific code, I use the Template Method pattern by applying the Form Template Method refactoring. The first stage in this refactoring is to extract the domain-specific parts of the method into separate methods. For instance, the algorithm of the SalesInfoExporter in Listing 3 is as follows:

1.  Initialize the tool with the title and number of columns (lines 24–26)
2.  Fetch data for the period (lines 27–28)
3.  For each year in the period:

  • Create an empty list (line 34)
  • For each column, add the data item for the column to the list (lines 35–53)
  • Add the data series consisting of this list and year to the tool (lines 54–56)

4.  Terminate the tool with the footer text (lines 58–59)

The domain-specific parts of this algorithm (the title, the number of columns, the data for the period, the data item for each column, and the footer text) will be turned into methods using the Extract Method refactoring. For example, I'll create a method getTitle():

  protected String getTitle() {
    return "Sales Statistics for Period ";
  }

The corresponding part of the export method is:

  PlanningTool planningTool =
    ExportHelper.initializeTool(
      getTitle() + startYear + "-" + endYear,
      getNumberOfColumns());

When doing this type of refactoring, I try to use method names that indicate the role the extracted lines played in the original method. Using this approach, it's easy to extract the methods getTitle(), getNumberOfColumns(), and getFooterText().
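As a rough sketch of where this leaves the sales exporter, assuming a stripped-down, hypothetical PlanningTool stub: the extracted methods are ordinary protected methods, and the column count and footer text are placeholders. For brevity the tool is passed in as a parameter rather than obtained through ExportHelper, and the per-year population loop is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stripped-down planning-tool stub that records calls.
class PlanningTool {
    final List<String> calls = new ArrayList<>();

    void createChart(String title, int numColumns) { calls.add("chart:" + title + ":" + numColumns); }
    void setFooter(String footer) { calls.add("footer:" + footer); }
}

class SalesInfoExporter {
    // Extracted methods, named for the role their lines played in the original method.
    protected String getTitle() { return "Sales Statistics for Period "; }
    protected int getNumberOfColumns() { return 2; }          // placeholder column count
    protected String getFooterText() { return "Sales data"; } // placeholder footer text

    // The export logic now reaches these details only through the extracted methods.
    void exportSalesInfo(PlanningTool planningTool, int startYear, int endYear) {
        planningTool.createChart(getTitle() + startYear + "-" + endYear, getNumberOfColumns());
        // ... per-year population omitted in this sketch ...
        planningTool.setFooter(getFooterText());
    }
}
```

Once every domain-specific detail sits behind a method like these, the export method itself no longer mentions sales, which is the precondition for moving it into a superclass.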

However, this still leaves the body of the loop unresolved: what can we do about getting the data item for the current column? One approach would be to create a method called getDataItem(Object obj, int column), which takes the object currently being iterated over and generates the corresponding data item for the column. This would work, but in my experience, having to work with Object instances and then downcast them to the class we are interested in indicates that the design is missing something. In this case it's most natural to create an interface representing an object that can be exported:

  import java.awt.Color;

  public interface IExportableData {
    public int getColumnCount(int column);
    public String getTitle(int column);
    public Color getColor(int column);
  }

I can then create a PlanningTool.DataItem object from an IExportableData object. This raises a design question: is exporting data the responsibility of SalesInfo and PersonnelInfo, respectively? That depends on the specific application. If it is, the corresponding classes should implement IExportableData directly. Otherwise it's more appropriate to create implementation classes of IExportableData (e.g., ExportableSalesInfo and ExportablePersonnelInfo) that delegate to SalesInfo and PersonnelInfo, respectively. In this example I've chosen the former solution; the new SalesInfo class can be seen in Listing 4.
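A minimal sketch of a domain class that implements IExportableData directly (the option chosen in this example), plus the conversion to a data item. The SalesInfo fields, column titles, and colors are hypothetical; PlanningTool.DataItem is modeled as a small stub holding the three values, because the real tool's DataItem constructor isn't shown in the article; DataItems is a hypothetical helper; and Color is assumed to be java.awt.Color.

```java
import java.awt.Color;

interface IExportableData {
    int getColumnCount(int column);
    String getTitle(int column);
    Color getColor(int column);
}

// Hypothetical stub: only the nested DataItem type is modeled here.
class PlanningTool {
    static class DataItem {
        final int value;
        final String title;
        final Color color;

        DataItem(int value, String title, Color color) {
            this.value = value;
            this.title = title;
            this.color = color;
        }
    }
}

// Hypothetical helper: builds a data item from any exportable object, with no cast from Object.
class DataItems {
    static PlanningTool.DataItem from(IExportableData data, int column) {
        return new PlanningTool.DataItem(
                data.getColumnCount(column), data.getTitle(column), data.getColor(column));
    }
}

// Hypothetical domain class with two exportable columns.
class SalesInfo implements IExportableData {
    private final int unitsSold;
    private final int unitsReturned;

    SalesInfo(int unitsSold, int unitsReturned) {
        this.unitsSold = unitsSold;
        this.unitsReturned = unitsReturned;
    }

    public int getColumnCount(int column) { return column == 0 ? unitsSold : unitsReturned; }
    public String getTitle(int column) { return column == 0 ? "Units sold" : "Units returned"; }
    public Color getColor(int column) { return column == 0 ? Color.GREEN : Color.RED; }
}
```

Because DataItems.from() accepts the interface type, the same conversion works unchanged for a PersonnelInfo class, or for a delegating ExportablePersonnelInfo wrapper if the other design option is chosen.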

With this interface in place, the last piece of domain-specific functionality is retrieving the list of data for the given domain. We create a hook for this, which each exporter implements by calling the corresponding controller. For example, for sales:

  protected List getDataForPeriod(
      int startYear, int endYear) {
    return new SalesController()
        .getSalesInfo(startYear, endYear);
  }

After this refactoring, exportSalesInfo() contains no domain-specific code. The remaining steps are as follows:

1.  Create an abstract superclass (AbstractExporter) and make the domain-specific subclasses (SalesInfoExporter and PersonnelInfoExporter) extend it.
2.  In the abstract superclass, create an abstract method for each hook (getTitle(), getNumberOfColumns(), getDataForPeriod(), and getFooterText()).
3.  Move the export method (exportSalesInfo()) from the subclass to the superclass, renaming it if necessary to remove domain-specific connotations (for example, to exportInfo()).

That's it!

Listing 5 shows AbstractExporter together with the refactored SalesInfoExporter class.
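The steps above amount to the Template Method pattern. Here is a self-contained sketch of what AbstractExporter and a subclass might look like; it is not Listing 5. Assumptions: PlanningTool is a recording stub with an assumed addDataSeries() method, getDataForPeriod() returns one IExportableData object per year in year order, SalesInfo is a hypothetical class, and canned data stands in for the SalesController call. Generics are used for readability, whereas the article's snippets use a raw List.

```java
import java.awt.Color;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planning-tool stub that records calls instead of driving a real tool.
class PlanningTool {
    static class DataItem {
        final int value;
        final String title;
        final Color color;

        DataItem(int value, String title, Color color) {
            this.value = value;
            this.title = title;
            this.color = color;
        }
    }

    static PlanningTool lastOpened;
    final List<String> calls = new ArrayList<>();

    static PlanningTool openConnection() {
        lastOpened = new PlanningTool();
        return lastOpened;
    }

    void createChart(String title, int numColumns) { calls.add("chart:" + title + ":" + numColumns); }
    void addDataSeries(List<DataItem> items, int year) { calls.add("series:" + year + ":" + items.size()); }
    void setFooter(String footer) { calls.add("footer:" + footer); }
    void closeConnection() { calls.add("close"); }
}

interface IExportableData {
    int getColumnCount(int column);
    String getTitle(int column);
    Color getColor(int column);
}

// Template Method: the fixed export algorithm, with abstract hooks for the domain-specific parts.
abstract class AbstractExporter {
    protected abstract String getTitle();
    protected abstract int getNumberOfColumns();
    protected abstract List<? extends IExportableData> getDataForPeriod(int startYear, int endYear);
    protected abstract String getFooterText();

    public final void exportInfo(int startYear, int endYear) {
        PlanningTool planningTool = PlanningTool.openConnection();
        planningTool.createChart(getTitle() + startYear + "-" + endYear, getNumberOfColumns());

        // Assumption: one data object per year, in year order.
        List<? extends IExportableData> data = getDataForPeriod(startYear, endYear);
        for (int i = 0; i < data.size(); i++) {
            IExportableData yearData = data.get(i);
            List<PlanningTool.DataItem> items = new ArrayList<>();
            for (int column = 0; column < getNumberOfColumns(); column++) {
                items.add(new PlanningTool.DataItem(
                        yearData.getColumnCount(column), yearData.getTitle(column), yearData.getColor(column)));
            }
            planningTool.addDataSeries(items, startYear + i);
        }

        planningTool.setFooter(getFooterText());
        planningTool.closeConnection();
    }
}

// Hypothetical domain object; column values are supplied by the caller.
class SalesInfo implements IExportableData {
    private final int[] values;

    SalesInfo(int... values) { this.values = values; }

    public int getColumnCount(int column) { return values[column]; }
    public String getTitle(int column) { return "Column " + column; }
    public Color getColor(int column) { return Color.BLUE; }
}

// The subclass now only supplies the hooks.
class SalesInfoExporter extends AbstractExporter {
    protected String getTitle() { return "Sales Statistics for Period "; }
    protected int getNumberOfColumns() { return 2; }
    protected List<SalesInfo> getDataForPeriod(int startYear, int endYear) {
        // Canned data in place of new SalesController().getSalesInfo(startYear, endYear).
        return Arrays.asList(new SalesInfo(10, 1), new SalesInfo(12, 2));
    }
    protected String getFooterText() { return "Sales data"; }

    // Optional wrapper that preserves the old public method name for existing callers.
    public void exportSalesInfo(int startYear, int endYear) {
        exportInfo(startYear, endYear);
    }
}
```

Declaring exportInfo() final stops subclasses from overriding the algorithm itself, so they can only fill in the hooks. That is a common choice for template methods, but not a requirement of the pattern.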

Note that sometimes the renaming in step 3 isn't possible (for example, if a particular interface has to be implemented or preserved). In that case the subclass keeps its original method, which simply calls the superclass method:

  public void exportSalesInfo(
      int startYear, int endYear) {
    exportInfo(startYear, endYear);
  }

This refactored design is a major improvement over the original one: there is now very little repetition, the separation of responsibilities is much clearer, and the code performing the export and the data to be exported are now loosely coupled.

One practical issue that often arises is that it isn't always possible to choose a class's superclass, for example, when the class has to extend a specific superclass to fit into a particular framework. In this case we use solution 3.

Solution 3: Delegation
This solution is a variation of solution 2 that uses delegation instead of inheritance. There are four steps.

First, create an interface that defines the hooks described in the previous section (lines 36–41 in Listing 5):

  import java.util.List;

  public interface IExporter {
    public String getFooterText();
    public List getDataForPeriod(
        int startYear, int endYear);
    public String getTitle();
    public int getNumberOfColumns();
  }

Next, make SalesInfoExporter and PersonnelInfoExporter implement this interface; the hook implementations from solution 2 (e.g., lines 46–59 in Listing 4) must be made public, because interface methods are public. Third, make the abstract superclass from solution 2 concrete and give it an IExporter instance variable. Remove the hooks that were previously declared in this class, and replace calls to them with calls to the corresponding methods on the IExporter instance variable. This is shown in Listing 6.

Finally, the export methods in SalesInfoExporter and PersonnelInfoExporter should create an instance of AbstractExporter, passing themselves as the IExporter, and call its exportInfo() method. For example:

  public void exportPersonnelInfo(
      int startYear, int endYear) {
    new AbstractExporter(this)
        .exportInfo(startYear, endYear);
  }
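A compact sketch of the delegation variant, under stated assumptions. It keeps the article's class name AbstractExporter even though the class is now concrete, and adds a hypothetical FrameworkComponent superclass to show why PersonnelInfoExporter can't extend AbstractExporter. To keep it short, each year's data is an int[] of column values rather than an IExportableData object, PlanningTool is again a recording stub with an assumed addDataSeries() method, and the data values are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical planning-tool stub that records calls.
class PlanningTool {
    static PlanningTool lastOpened;
    final List<String> calls = new ArrayList<>();

    static PlanningTool openConnection() {
        lastOpened = new PlanningTool();
        return lastOpened;
    }

    void createChart(String title, int numColumns) { calls.add("chart:" + title + ":" + numColumns); }
    void addDataSeries(List<Integer> values, int year) { calls.add("series:" + year + ":" + values); }
    void setFooter(String footer) { calls.add("footer:" + footer); }
    void closeConnection() { calls.add("close"); }
}

// Step 1: the hooks as an interface (typed here; one int[] of column values per year).
interface IExporter {
    String getFooterText();
    List<int[]> getDataForPeriod(int startYear, int endYear);
    String getTitle();
    int getNumberOfColumns();
}

// Step 3: the former superclass, now concrete, calls the hooks on its delegate.
class AbstractExporter {
    private final IExporter exporter;

    AbstractExporter(IExporter exporter) {
        this.exporter = exporter;
    }

    void exportInfo(int startYear, int endYear) {
        PlanningTool planningTool = PlanningTool.openConnection();
        planningTool.createChart(exporter.getTitle() + startYear + "-" + endYear, exporter.getNumberOfColumns());
        List<int[]> data = exporter.getDataForPeriod(startYear, endYear);
        for (int i = 0; i < data.size(); i++) {
            List<Integer> items = new ArrayList<>();
            for (int column = 0; column < exporter.getNumberOfColumns(); column++) {
                items.add(data.get(i)[column]);
            }
            planningTool.addDataSeries(items, startYear + i);
        }
        planningTool.setFooter(exporter.getFooterText());
        planningTool.closeConnection();
    }
}

// Hypothetical framework superclass: the reason PersonnelInfoExporter can't extend AbstractExporter.
abstract class FrameworkComponent { }

// Steps 2 and 4: implement the hooks publicly and delegate the export.
class PersonnelInfoExporter extends FrameworkComponent implements IExporter {
    public String getFooterText() { return "Personnel data"; }
    public List<int[]> getDataForPeriod(int startYear, int endYear) {
        return Arrays.asList(new int[] {40, 3}, new int[] {42, 1}); // placeholder data
    }
    public String getTitle() { return "Personnel Statistics for Period "; }
    public int getNumberOfColumns() { return 2; }

    public void exportPersonnelInfo(int startYear, int endYear) {
        new AbstractExporter(this).exportInfo(startYear, endYear);
    }
}
```

The export algorithm is the same as in the inheritance version; only the way it reaches the hooks has changed, from abstract methods on this to interface calls on a delegate.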

SalesInfoExporter and PersonnelInfoExporter could be decoupled further from AbstractExporter using the Inversion of Control pattern (see the list of references for more details). Solution 3 rests on exactly the same idea as solution 2; the two differ only in structure, depending on whether inheritance or delegation is more suitable.
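One common form of Inversion of Control is constructor injection: instead of calling new AbstractExporter(this) itself, the exporter is handed the object that runs the export. A minimal sketch, in which ExportRunner and RecordingRunner are hypothetical names introduced for illustration and the data retrieval is a placeholder:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// The hook interface from solution 3 (typed here; one int[] of column values per year).
interface IExporter {
    String getFooterText();
    List<int[]> getDataForPeriod(int startYear, int endYear);
    String getTitle();
    int getNumberOfColumns();
}

// Hypothetical abstraction of "something that runs an export for an IExporter".
interface ExportRunner {
    void exportInfo(IExporter exporter, int startYear, int endYear);
}

// The exporter receives its runner instead of creating AbstractExporter itself.
class PersonnelInfoExporter implements IExporter {
    private final ExportRunner runner;

    PersonnelInfoExporter(ExportRunner runner) {
        this.runner = runner;
    }

    public String getFooterText() { return "Personnel data"; }
    public List<int[]> getDataForPeriod(int startYear, int endYear) {
        return Collections.emptyList(); // placeholder: a real exporter would ask its controller
    }
    public String getTitle() { return "Personnel Statistics for Period "; }
    public int getNumberOfColumns() { return 2; }

    public void exportPersonnelInfo(int startYear, int endYear) {
        runner.exportInfo(this, startYear, endYear);
    }
}

// A recording fake runner for tests: it captures what would have been exported.
class RecordingRunner implements ExportRunner {
    final List<String> log = new ArrayList<>();

    public void exportInfo(IExporter exporter, int startYear, int endYear) {
        log.add(exporter.getTitle() + startYear + "-" + endYear + " / " + exporter.getFooterText());
    }
}
```

In production code the runner could be a lambda such as (exporter, start, end) -> new AbstractExporter(exporter).exportInfo(start, end) (Java 8 or later), while a test can inject RecordingRunner and check the exporter's behavior without any planning tool.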

Refactoring
To quote Martin Fowler, "Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code, yet improves its internal structure." In practice, this means taking something that works and improving its design so that it's easier to maintain, extend, debug, and so on. In this article I refer to a number of standard refactorings, details of which can be found at the refactoring Web site in the references list.

Template Method Pattern
The Template Method pattern is one of the original Gang of Four design patterns (Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma et al.). The pattern captures common functionality in an abstract superclass, while domain-specific functionality lives in concrete subclass methods known as hooks. The hooks are declared as abstract methods in the superclass, so the superclass's common code can call them. See the references at the end of the article for more information about this very useful pattern.

Closing Remarks
The techniques described in this article are practical ones that I have found useful on a number of projects. Copy-paste programming is easy and quickly produces something that works. However, as soon as the first copy-paste phase is complete, apply the techniques described here before the repetition gets out of hand: the sooner the problem is addressed, the easier, quicker, and cheaper it is to resolve.

References

  • Refactoring: www.refactoring.com
  • Template Method Design Pattern: www.javacamp.org/designPattern/template.html
  • The IDEA Tool: www.intellij.com/docs/IDEA_30_Overview.pdf
  • Eclipse's JDT Tool: www.eclipse.org
  • Refactoring in Eclipse: www-106.ibm.com/developerworks/opensource/library/os-ecref/
  • Hammant, P. "Inversion of Control Rocks." Java Developer's Journal, Vol. 8, issue 12.

About Paul Mukherjee
Paul Mukherjee works as a consultant for Systematic Software Engineering and is a Sun Certified Java Programmer and Sun Certified Java Developer. As a consultant he is used to helping projects succeed, and he also tries to help individual project members get better at what they do.
