docs/bash_code_style.rst

   1 ..
   2    Copyright (c) 2019 Cisco and/or its affiliates.
   3    Licensed under the Apache License, Version 2.0 (the "License");
   4    you may not use this file except in compliance with the License.
   5    You may obtain a copy of the License at:
   6 ..
   7        http://www.apache.org/licenses/LICENSE-2.0
   8 ..
   9    Unless required by applicable law or agreed to in writing, software
  10    distributed under the License is distributed on an "AS IS" BASIS,
  11    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  12    See the License for the specific language governing permissions and
  13    limitations under the License.
  14
  15
  16 The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
  17 "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
  18 "MAY", and "OPTIONAL" in this document are to be interpreted as
  19 described in `BCP 14 <https://tools.ietf.org/html/bcp14>`_
  20 `[RFC2119] <https://tools.ietf.org/html/rfc2119>`_
  21 `[RFC8174] <https://tools.ietf.org/html/rfc8174>`_
  22 when, and only when, they appear in all capitals, as shown here.
  23
  24 This document SHALL describe guidelines for writing reliable, maintainable,
  25 reusable and readable code for CSIT.
  26
  27 Motivation
  28 ^^^^^^^^^^
  29
  30 TODO: List reasons why we need code style document for Bash.
  31
  32 Proposed style
  33 ^^^^^^^^^^^^^^
  34
  35 File types
  36 ~~~~~~~~~~
  37
  38 Bash files SHOULD NOT be monolithic. Generally, this document
  39 considers two types of bash files:
  40
  41 + Entry script: Assumed to be called by user,
  42   or a script "external" in some way.
  43
  44   + Sources bash libraries and calls functions defined there.
  45
  46 + Library file: To be sourced by entry scipts, possibly also by other libraries.
  47
  48   + Sources other libraries for functions it needs.
  49
  50     + Or relies on a related file already having sourced that.
  51
  52     + Documentation SHALL imply which case it is.
  53
  54   + Defines multiple functions other scripts can call.
  55
  56 Safety
  57 ~~~~~~
  58
  59 + Variable expansions MUST be quoted, to prevent word splitting.
  60
  61   + This includes special "variables" such as "${1}".
  62
  63     + RECOMMENDED even if the value is safe, as in "$?" and "$#".
  64
  65   + It is RECOMMENDED to quote strings in general,
  66     so text editors can syntax-highlight them.
  67
  68     + Even if the string is a numeric value.
  69
  70     + Commands and known options can get their own highlight, no need to quote.
  71
  72       + Example: You do not need to quote every word of
  73         "pip install --upgrade virtualenv".
  74
  75   + Code SHALL NOT quote glob characters you need to expand (obviously).
  76
  77     + OPTIONALLY do not quote adjacent characters (such as dot or fore-slash),
  78       so that syntax highlighting makes them stand out compared to surrounding
  79       ordinary strings.
  80
  81     + Example: cp "logs"/*."log" "."/
  82
  83     + TODO: Consider giving examples both for good and bad usage.
  84
  85   + Command substitution on right hand side of assignment should be safe
  86     without quotes.
  87
  88     + But still, quotes are RECOMMENDED, unless line length is a concern.
  89
  90     + Note that command substitution limits the scope for quotes,
  91       so it is NOT REQUIRED to escape the quotes in deeper levels.
  92
  93     + TODO: Do we prefer backtics, or "dollar round-bracket"?
  94
  95       + Backticks are more readable, especially when there are
  96         round brackets in the surrounding text.
  97
  98       + Backticks do not allow nested command substitution.
  99
 100         + But we might want to avoid nested command substitution anyway?
 101
 102   + Code SHOULD NOT be structured in a way where
 103     word splitting is intended.
 104
 105     + Example: Variable holding string of multiple command lines arguments.
 106
 107     + Solution: Array variable should be used in this case.
 108
 109     + Expansion MUST use quotes then: "${name[@]}".
 110
 111     + Word splitting MAY be used when creating arrays from command substitution.
 112
 113 + Code MUST always check the exit code of commands.
 114
 115   + Traditionally, error code checking is done either by "set -e"
 116     or by appending "|| die" after each command.
 117     The first is unreliable, due to many rules affecting "set -e" behavior
 118     (see <https://mywiki.wooledge.org/BashFAQ/105>), but "|| die"
 119     relies on humans identifying each command, which is also unreliable.
 120     When was the last time you checked error code of "echo" command,
 121     for example?
 122
 123   + As there is no reliable method of error detection, and there are two
 124     largely independent unreliable methods, the best what we can do
 125     is to apply both. So, code SHOULD explicitly
 126     check each command (with "|| die" and similar) AND have "set -e" applied.
 127
 128   + Code MUST explicitly check each command, unless the command is well known,
 129     and considered safe (such as the aforementioned "echo").
 130
 131     + The well known commands MUST still be checked implicitly via "set -e".
 132
 133   + See below for specific "set -e" recommendations.
 134
 135 + Code SHOULD use "readlink -e" (or "-f" if target does not exist yet)
 136   to normalize any path value to absolute path without symlinks.
 137   It helps with debugging and identifies malformed paths.
 138
 139 + Code SHOULD use such normalized paths for sourcing.
 140
 141 + When exiting on a known error, code MUST print a longer, helpful message,
 142   in order for the user to fix their situation if possible.
 143
 144 + When error happens at an unexpected place, it is RECOMMENDED for the message
 145   to be short and generic, instead of speculative.
 146
 147 Bash options
 148 ~~~~~~~~~~~~
 149
 150 + Code MUST apply "-x" to make debugging easier.
 151
 152   + Code MAY temporarily supress such output in order to avoid spam
 153     (e.g. in long busy loops), but it is still NOT RECOMMENDED to do so.
 154
 155 + Code MUST apply "-e" for early error detection.
 156
 157   + But code still SHOULD use "|| die" for most commands,
 158     as "-e" has numerous rules and exceptions.
 159
 160   + Code MAY apply "+e" temporarily for commands which (possibly nonzero)
 161     exit code it interested in.
 162
 163     + Code MUST to store "$?" and call "set -e" immediatelly afterwards.
 164
 165   + Code MAY append "|| true" for benign commands,
 166     when it is clear non-zero exit codes make no difference.
 167
 168 + Code MUST apply "-u" as unset variable is generally a typo, thus an error.
 169
 170   + Code MAY temporarily apply "+u" if a command needs that to pass.
 171
 172     + Virtualenv activation is the only known example so far.
 173
 174 + Code MUST apply "-o pipefail" to make sure "-e" picks errors
 175   inside piped construct.
 176
 177   + Code MAY use "|| true" inside a pipe construct, in the (inprobable) case
 178     when non-zero exit code still results in a meaningful pipe output.
 179
 180 + All together: "set -exuo pipefail".
 181
 182   + Code MUST put that line near start of every file, so we are sure
 183     the options are applied no matter what.
 184
 185     + "Near start" means "before any nontrivial code".
 186
 187     + Basically only copyright and long high-level comments are
 188       RECOMMENDED to appear before.
 189
 190   + Also code MUST put the line near start of function bodies
 191     and subshell invocations.
 192
 193 Functions
 194 ~~~~~~~~~
 195
 196 There are (at least) two possibilities how a code from an external file
 197 can be executed. Either the file contains a code block to execute
 198 on each "source" invocation, or the file just defines functions
 199 which have to be called separately.
 200
 201 This document considers the "function way" to be better,
 202 here are some pros and cons:
 203
 204 + Cons:
 205
 206   + The function way takes more space. Files have more lines,
 207     and the code in function body is one indent deeper.
 208
 209   + It is not easy to create functions for low-level argument manipulation,
 210     as "shift" command in the function code does not affect the caller context.
 211
 212   + Call sites frequently refer to code two times,
 213     when sourcing the definition and when executing the function.
 214
 215   + It is not clear when a library can rely on its relative
 216     to have performed the sourcing already.
 217
 218   + Ideally, each library should detect if it has been sourced already
 219     and return early, which takes even more space.
 220
 221 + Pros:
 222
 223   + Some code blocks are more useful when used as function,
 224     to make call site shorter.
 225
 226     + Examples: Trap functions, "die" function.
 227
 228   + The "import" part and "function" part usually have different side effects,
 229     making the documentation more focused (even if longer overall).
 230
 231   + There is zero risk of argument-less invocation picking arguments
 232     from parent context.
 233
 234     + This safety feature is the main reason for chosing the "function way".
 235
 236     + This allows code blocks to support optional arguments.
 237
 238 + Rules:
 239
 240   + Library files MUST be only "source"d. For example if "tox" calls a script,
 241     it is an entry script.
 242
 243   + Library files (upon sourcing) MUST minimize size effect.
 244
 245     + The only permitted side effects MUST by directly related to:
 246
 247       + Defining functions (without executing them).
 248
 249       + Sourcing sub-library files.
 250
 251   + If a bash script indirectly call another bash script,
 252     it is not a "source" operation, variables are not shared,
 253     so the called script MUST be considered an entry script,
 254     even if it implements logic fitting into a single function.
 255
 256   + Entry scripts SHOULD avoid duplicating any logic.
 257
 258     + Clear duplicated blocks MUST be moved into libraries as functions.
 259
 260     + Blocks with low amount of duplication MAY remain in entry scripts.
 261
 262     + Usual motives for not creating functions are:
 263
 264       + The extracted function would have too much logic for processing
 265         arguments (instead of hardcoding values as in entry script).
 266
 267       + The arguments needed would be too verbose.
 268
 269         + And using "set +x" would take too much vertical space
 270           (when compared to entry script implementation).
 271
 272 Variables
 273 ~~~~~~~~~
 274
 275 This document describes two kinds of variables: called "local" and "global".
 276
 277 TODO: Find better adjectives for the two kinds defined here,
 278 if the usual bash meaning makes reader forget other specifics.
 279 For example, variable with lowercase name and not marked by "local" builtin,
 280 is cosidered "global" from bash point of view, but "local" from this document
 281 point of view.
 282
 283 + Local variables:
 284
 285   + Variable name MUST contain only lower case letters, digits and underscores.
 286
 287   + Code MUST NOT export local variables.
 288
 289   + Code MUST NOT rely on local variables set in different contexts.
 290
 291   + Documentation is NOT REQUIRED.
 292
 293     + Variable name SHOULD be descriptive enough.
 294
 295   + Local variable MUST be initialized before first use.
 296
 297     + Code SHOULD have a comment if a reader might have missed
 298       the initialization.
 299
 300   + TODO: Agree on level of defensiveness (against local values
 301     being influenced by other functions) needed.
 302     Possible alternatives / additions to the "always initialize" rule:
 303
 304     + Unset local variables when leaving the function.
 305
 306     + Explicitly typeset by "local" builtin command.
 307
 308     + Require strict naming convention, e.g. function_name__variable_name.
 309
 310 + Global variables:
 311
 312   + Variable name MUST contain only upper case letters, digits and underscores.
 313
 314   + They SHOULD NOT be exported, unless external commands need them
 315     (e.g. PYTHONPATH).
 316
 317   + TODO: Propose a strict naming convention, or a central document
 318     of all used global variables, to prevent contributors
 319     from causing variable name conflicts.
 320
 321   + Code MUST document if a function (or its inner call)
 322     reads a global variable.
 323
 324   + Code MUST document if a function (or its inner call)
 325     sets or rewrites a global variable.
 326
 327   + If a function "wants to return a value", it SHOULD be implemented
 328     as the function setting (or rewriting) a global variable,
 329     and the call sites reading that variable.
 330
 331   + If a function "wants to accept an argument", it IS RECOMMENDED
 332     to be implemented as the call sites setting or rewriting global variables,
 333     and the function reading that variables.
 334     But see below for direct arguments.
 335
 336 + Code MUST use curly brackets when referencing variables,
 337   e.g. "${my_variable}".
 338
 339   + It makes related constructs (such as ${name:-default}) less surprising.
 340
 341   + It looks more similar to Robot Framework variables (which is good).
 342
 343 Arguments
 344 ~~~~~~~~~
 345
 346 Bash scripts and functions MAY accept arguments, named "${1}", "${2}" and so on.
 347 As a whole available via "$@".
 348 You MAY use "shift" command to consume an argument.
 349
 350 Contexts
 351 ````````
 352
 353 Functions never have access to parent arguments, but they can read and write
 354 variables set or read by parent contexts.
 355
 356 Arguments or variables
 357 ``````````````````````
 358
 359 + Both arguments and global variables MAY act as an input.
 360
 361 + In general, if the caller is likely to supply the value already placed
 362   in a global variable of known name, it is RECOMMENDED
 363   to use that global variable.
 364
 365 + Construct "${NAME:-value}" can be used equally well for arguments,
 366   so default values are possible for both input methods.
 367
 368 + Arguments are positional, so there are restrictions on which input
 369   is optional.
 370
 371 + Functions SHOULD either look at arguments (possibly also
 372   reading global variables to use as defaults), or look at variables only.
 373
 374 + Code MUST NOT rely on "${0}", it SHOULD use "${BASH_SOURCE[0]}" instead
 375   (and apply "readlink -e") to get the current block location.
 376
 377 + For entry scripts, it is RECOMMENDED to use standard parsing capabilities.
 378
 379   + For most Linux distros, "getopt" is RECOMMENDED.
 380
 381 Function size
 382 ~~~~~~~~~~~~~
 383
 384 + In general, code SHOULD follow reasoning similar to how pylint
 385   limits code complexity.
 386
 387 + It is RECOMMENDED to have functions somewhat simpler than Python functions,
 388   as Bash is generally more verbose and less readable.
 389
 390 + If code contains comments in order to partition a block
 391   into sub-blocks, the sub-blocks SHOULD be moved into separate functions.
 392
 393   + Unless the sub-blocks are essentially one-liners,
 394     not readable just because external commands do not have
 395     obvious enough parameters. Use common sense.
 396
 397 Documentation
 398 ~~~~~~~~~~~~~
 399
 400 + The library path and filename is visible from source sites. It SHOULD be
 401   descriptive enough, so reader do not need to look inside to determine
 402   how and why is the sourced file used.
 403
 404   + If code would use several functions with similar names,
 405     it is RECOMMENDED to create a (well-named) sub-library for them.
 406
 407   + Code MAY create deep library trees if needed, it SHOULD store
 408     common path prefixes into global variables to make sourcing easier.
 409
 410   + Contributors, look at other files in the subdirectory. You SHOULD
 411     improve their filenames when adding-removing other filenames.
 412
 413   + Library files SHOULD NOT have executable flag set.
 414
 415   + Library files SHOULD have an extension .sh (or perhaps .bash).
 416
 417   + It is RECOMMENDED for entry scripts to also have executable flag unset
 418     and have .sh extension.
 419
 420 + Each entry script MUST start with a shebang.
 421
 422   + "#!/bin/usr/env bash" is RECOMMENDED.
 423
 424   + Code SHOULD put an empty line after shebang.
 425
 426   + Library files SHOULD NOT contain a shebang, as "source" is the primary
 427     method to include them.
 428
 429 + Following that, there SHOULD be a block of comment lines with copyright.
 430
 431   + It is a boilerplate, but human eyes are good at ignoring it.
 432
 433   + Overhead for git is also negligible.
 434
 435 + Following that, there MUST be "set -exuo pipefail".
 436
 437   + It acts as an anchor for humans to start paying attention.
 438
 439 Then it depends on script type.
 440
 441 Library documentation
 442 `````````````````````
 443
 444 + Following "set -exuo pipefail" SHALL come the "import part" documentation.
 445
 446 + Then SHALL be the import code
 447   ("source" commands and a bare minimum they need).
 448
 449 + Then SHALL be the function definitions, and inside:
 450
 451   + "set -exuo pipefail" again.
 452
 453   + Following that SHALL be the function documentation explaining API contract.
 454     Similar to Robot [Documentation] or Python function-level docstring.
 455
 456     + See below.
 457
 458   + Following that SHALL be varius TODOs, FIXMEs and code itself.
 459
 460     + "Code itself" SHALL include comment lines
 461       explaining any non-obvious logic.
 462
 463     + TODO: Document (in an appropriate place) how TODOs differ from FIXMEs.
 464
 465   + There SHALL be two empty lines before next function definition.
 466
 467 More details on function documentation:
 468
 469 Generally, code SHOULD use comments to explain anything
 470 not obvious from the funtion name.
 471
 472 + Function documentation SHOULD start with short description of function
 473   operation or motivation, but only if not obvious from function name.
 474
 475 + Documentation SHOULD continue with listing any non-obvious side effect:
 476
 477   + Documentation MUST list all read global variables.
 478
 479     + Documentation SHOULD include descriptions of semantics
 480       of global variable values.
 481       It is RECOMMENDED to mention which function is supposed to set them.
 482
 483     + The "include descriptions" part SHOULD apply to other items as well.
 484
 485   + Documentation MUST list all global variables set, unset, reset,
 486     or otherwise updated.
 487
 488   + It is RECOMMENDED to list all hardcoded values used in code.
 489
 490     + Not critical, but can hint at future improvements.
 491
 492   + Documentation MUST list all files or directories read
 493     (so caller can make sure their content is ready).
 494
 495   + Documentation MUST list all files or directories updated
 496     (created, deleted, emptied, otherwise edited).
 497
 498   + Documentation SHOULD list all functions called (so reader can look them up).
 499
 500     + Documentation SHOULD mention where are the functions defined,
 501       if not in the current file.
 502
 503   + Documentation SHOULD list all external commands executed.
 504
 505     + Because their behavior can change "out of bounds", meaning
 506       the contributor changing the implementation of the extrenal command
 507       can be unaware of this particular function interested in its side effects.
 508
 509   + Documentation SHOULD explain exit code (coming from
 510     the last executed command).
 511
 512     + Usually, most functions SHOULD be "pass or die",
 513       but some callers MAY be interested in nonzero exit codes
 514       without using global variables to store them.
 515
 516     + Remember, "exit 1" ends not only the function, but all scripts
 517       in the source chain, so code MUST NOT use it for other purposes.
 518
 519       + Code SHOULD call "die" function instead. This way the caller can
 520         redefine that function, if there is a good reason for not exiting
 521         on function failure.
 522
 523   + TODO: Programs installed, services started, URLs downloaded from, ...
 524
 525   + TODO: Add more items when you spot them.
 526
 527   + TODO: Is the current order recommended?
 528
 529 Entry script documentation
 530 ``````````````````````````
 531
 532 + After "set -exuo pipefail", high-level description SHALL come.
 533
 534   + Then TODOs and FIXMEs SHALL be placed (if any).
 535
 536   + Entry scripts are rarely reused, so detailed side effects
 537     are OPTIONAL to document.
 538
 539   + But code SHOULD document the primary side effects.
 540
 541 + Then SHALL come few commented lines to import the library with "die" function.
 542
 543 + Then block of "source" commands for sourcing other libraries needed SHALL be.
 544
 545   + In alphabetical order, any "special" library SHOULD be
 546     in the previous block (for "die").
 547
 548 + Then block os commands processing arguments SHOULD be (if needed).
 549
 550 + Then SHALL come block of function calls (with parameters as needed).
 551
 552 Other general recommendations
 553 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 554
 555 + Code SHOULD NOT not repeat itself, even in documentation:
 556
 557   + For hardcoded values, a general description SHOULD be written
 558     (instead of copying the value), so when someone edits the value
 559     in the code, the description still applies.
 560
 561   + If affected directory name is taken from a global variable,
 562     documentation MAY distribute the directory description
 563     over the two items.
 564
 565   + If most of side effects come from an inner call,
 566     documentation MAY point the reader to the documentation
 567     of the called function (instead of listing all the side effects).
 568
 569     + TODO: Composite functions can have large effects. Should we require
 570       intermediate functions to actively hide them whenever possible?
 571
 572 + But documentation SHOULD repeat it if the information crosses functions.
 573
 574   + Item description MUST NOT be skipped just because the reader
 575     should have read parent/child documentation already.
 576
 577   + Frequently it is RECOMMENDED to copy&paste item descriptions
 578     between functions.
 579
 580   + But sometimes it is RECOMMENDED to vary the descriptions. For example:
 581
 582     + A global variable setter MAY document how does it figure out the value
 583       (without caring about what it will be used for by other functions).
 584
 585     + A global variable reader MAY document how does it use the value
 586       (without caring about how has it been figured out by the setter).
 587
 588 + When possible, Bash code SHOULD be made to look like Python
 589   (or Robot Framework). Those are three primary languages CSIT code relies on,
 590   so it is nicer for the readers to see similar expressions when possible.
 591   Examples:
 592
 593   + Code MUST use indentation, 1 level is 4 spaces.
 594
 595   + Code SHOULD use "if" instead of "&&" constructs.
 596
 597   + For comparisons, code SHOULD use operators such as "!=" (needs "[[").
 598
 599 + Code MUST NOT use more than 80 characters per line.
 600
 601   + If long external command invocations are needed,
 602     code SHOULD use array variables to shorten them.
 603
 604   + If long strings (or arrays) are needed, code SHOULD use "+=" operator
 605     to grow the value over multiple lines.
 606
 607   + If "|| die" does not fit with the command, code SHOULD use curly braces:
 608
 609     + Current line has "|| {",
 610
 611     + Next line has the die commands (indented one level deeper),
 612
 613     + Final line closes with "}" at original intent level.
 614
 615   + TODO: Recommend what to do with other constructs.
 616
 617     + For example multiple piped commands.
 618
 619     + No, "eval" is too unsafe to use.