BME205

Perl Style Guide
Fall 2005 v1.0

D. Bernick

(Last Update: 20:22 PDT 28 September 2005)

  1. All scripts must use "strict" and "warnings".
    ########################################################################
    use strict;
    use warnings;
    
    Every script I have written has been possible under these two pragmas, and they eliminate many, many errors.

  2. All scripts include a program header record, something like
    ########################################################################
    # File: gene2seq.pl
    #   executable: gene2seq.pl
    # Purpose: reads a human list of locus links, translates NM_, then optionally
    # builds the human upstream sequences into a fasta file
    # 
    #   stderr: errors and status
    #   stdout: selected translated sequence ids or upstream sequences
    #          
    # Author: David Bernick
    # History: dlb 5/18/2005 Created
    # References: external sources of any algorithms or code fragments.
    ########################################################################
    

  3. Every subroutine is declared with a prototype before it is used, like
    ########################################################################
    # subroutines 
    #
    ########################################################################
    sub process();
    sub writeMatches($\%);
    sub do_command($);
    sub min($$);
    sub max($$);
    sub isNumeric($);
    

  4. Global parameters, command-line options, and core variables are defined in the first section, something like
    ########################################################################
    # Command line options and globals
    #
    ########################################################################
    my $upstream = 0;
    ########################################################################
    # parameters
    #
    ########################################################################
    my $human2mouseData = "$ENV{'HOME'}/Desktop/orthologs/data/human2mouse.tab";
    
  5. All subroutines include block comments, something like
    ########################################################################
    # subroutine: initialize
    #
    # inputs: none
    # return: none
    #
    # purpose: installs all command line options, inits the glorp translator, .....
    # uses: global data modifiers if any
    ########################################################################
    
  6. I have a preference for a standard main routine that looks like
    ########################################################################
    # Main
    #
    ########################################################################
    initialize();   
    process();
    finalize();
    exit;
    
  7. Comments within subroutines need to make the semantics of the operation clear.
  8. Variables need names that a human reader can understand.
  9. Indenting is essential.
  10. Documentation should be part of the code body, not separate.
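
As a rough sketch of how items 1, 3, 5, 7 and 8 might look together in one file, the fragment below fills in three of the subroutine names from the prototype list in item 3. The guide does not show their bodies, so the bodies here are assumptions, as are the file name styleSketch.pl and the choice of Scalar::Util for the numeric test.

```perl
########################################################################
# File: styleSketch.pl (hypothetical name, for illustration only)
# Purpose: shows items 1, 3, 5, 7 and 8 of the guide in one small file
########################################################################
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

########################################################################
# subroutines
#
########################################################################
sub min($$);
sub isNumeric($);
sub writeMatches($\%);

########################################################################
# subroutine: min
#
# inputs: two numbers
# return: the smaller of the two
#
# purpose: numeric minimum of a pair of values
# uses: no global data
########################################################################
sub min($$) {
    my ($firstValue, $secondValue) = @_;
    return $firstValue < $secondValue ? $firstValue : $secondValue;
}

########################################################################
# subroutine: isNumeric
#
# inputs: one scalar
# return: 1 if the scalar looks like a number, 0 otherwise
#
# purpose: guards numeric comparisons against non-numeric input
# uses: no global data
########################################################################
sub isNumeric($) {
    my ($candidate) = @_;
    return looks_like_number($candidate) ? 1 : 0;
}

########################################################################
# subroutine: writeMatches
#
# inputs: an output filehandle and a hash of id => description
# return: number of lines written
#
# purpose: writes one tab-separated line per match
# uses: no global data
########################################################################
sub writeMatches($\%) {
    # the \% in the prototype means the caller writes a plain %hash
    # and Perl passes a reference to it
    my ($outputHandle, $matchesRef) = @_;
    my $linesWritten = 0;

    # sort the ids so the output order is the same on every run
    foreach my $matchId (sort keys %$matchesRef) {
        print {$outputHandle} "$matchId\t$matchesRef->{$matchId}\n";
        $linesWritten++;
    }
    return $linesWritten;
}
```

With these declarations in place, a call such as writeMatches(\*STDOUT, %matches) passes the hash by reference without the caller writing the backslash. That only works because the prototype is visible before the call is compiled, which is one practical reason to keep the prototype section near the top of the file.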

Comments added by Kevin Karplus

I agree with David's style suggestions above, though some of them could be expanded on. For example, a good indent is 4 spaces per level: a full tab (8 spaces) moves you over too fast, and 2 spaces do not let my eye easily track where blocks end.
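
To make the 4-space suggestion concrete, here is a small sketch with three levels of nesting. The subroutine countPositives is hypothetical and exists only to show the indentation.

```perl
use strict;
use warnings;

sub countPositives(@);

########################################################################
# subroutine: countPositives
#
# inputs: a list of numbers
# return: how many of them are greater than zero
#
# purpose: demonstrates 4 spaces of indent per block level
# uses: no global data
########################################################################
sub countPositives(@) {
    my @values = @_;
    my $positiveCount = 0;
    foreach my $value (@values) {
        if ($value > 0) {
            $positiveCount++;
        }
    }
    return $positiveCount;
}
```

Each closing brace lines up under the keyword that opened its block, so the end of the foreach and the end of the if are easy to tell apart.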

For more ideas about what I like to see in program documentation, see the assignment on documenting programs from my technical writing course.

I recommend POD for documenting program usage, especially since pod2usage can make a Perl program self-documenting. I am less fond of it for internal documentation: POD seems a bit more difficult to maintain than simple comments, and so tends not to be properly maintained.
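
A minimal sketch of POD-based usage follows, assuming the core modules Getopt::Long and Pod::Usage. The script name podSketch.pl, the --upstream and --help options, and the parseOptions subroutine are all made up for illustration; only the $upstream variable comes from David's example above.

```perl
########################################################################
# File: podSketch.pl (hypothetical name, for illustration only)
# Purpose: shows usage text kept in POD and printed by pod2usage
########################################################################
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage;

=head1 NAME

podSketch.pl - hypothetical script showing POD-based usage messages

=head1 SYNOPSIS

podSketch.pl [--upstream N] [--help]

=head1 OPTIONS

=over 4

=item B<--upstream> N

Number of upstream bases to report (default 0).

=item B<--help>

Print this usage message and exit.

=back

=cut

sub parseOptions($);

my $upstream = 0;

########################################################################
# subroutine: parseOptions
#
# inputs: reference to an array of command line arguments
# return: the upstream length after parsing
#
# purpose: reads the options; prints the POD usage text on error or --help
# uses: sets the global $upstream
########################################################################
sub parseOptions($) {
    my ($argumentsRef) = @_;
    my $helpRequested = 0;

    # on a bad option, show only the SYNOPSIS and exit with status 2
    GetOptionsFromArray(
        $argumentsRef,
        'upstream=i' => \$upstream,
        'help'       => \$helpRequested,
    ) or pod2usage(-exitval => 2, -verbose => 0);

    # on --help, show SYNOPSIS plus OPTIONS and exit normally
    pod2usage(-exitval => 0, -verbose => 1) if $helpRequested;
    return $upstream;
}

parseOptions(\@ARGV);
print STDERR "upstream length: $upstream\n";
```

The usage text lives in one place, so the --help output and the perldoc page cannot drift apart. The internal documentation stays in ordinary block comments, in line with the preference stated above.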