Design by Contract for Embedded Software (translation)

Original article: Design by Contract for Embedded Software (state-machine.com)

Design by Contract is the single most effective programming technique for delivering high-quality code. Here you can learn what the Design by Contract programming philosophy is, what it can do for you, and why all embedded software developers should care.

Errors versus Exceptional Conditions

While embedded systems come with their own set of complexities, they also offer many opportunities for simplifications compared to general-purpose computers. Dealing with errors and exceptional conditions provides perhaps the best case in point. Just think, how many times have you seen embedded software terribly convoluted by attempts to painstakingly propagate an error through many layers of code, just to end up doing something trivial with it, such as performing a system reset?

By error (otherwise known as a "bug"), I mean a persistent defect due to a design or implementation mistake (e.g., overrunning an array index or writing to a file before opening it). When your software has a bug, you typically cannot reasonably "handle" the situation. You should rather concentrate on detecting (and ultimately fixing) the root cause of the problem. This situation is in contrast to the exceptional condition, which is a specific circumstance that can legitimately arise during the system lifetime but is relatively rare and lies off the main execution path of your software. In contrast to an error, an exceptional condition requires you to design and implement a recovery strategy that handles it.

As an example, consider dynamic memory allocation. In any type of system, memory allocation with malloc() (or the C++ new operator) can fail. In a general-purpose computer, a failed malloc() merely indicates that, at this instant, the operating system cannot supply the requested memory. This can happen easily in a highly dynamic, general-purpose computing environment. When it happens, you have options to recover from the situation. One option might be for the application to free up some memory that it allocated and then retry the allocation. Another choice could be to notify the user of the problem and encourage them to exit other applications so that the current application can gather more memory. Yet another option is to save data to the disk and exit. Whatever the choice, handling this situation requires some drastic actions, which are clearly off the mainstream behavior of your application. Nevertheless, you should design and implement such actions because in a desktop environment, a failed malloc() must be considered an exceptional condition.

In a typical embedded system, on the other hand, the same failed malloc() probably should be flagged as a bug. That's because embedded systems offer far fewer excuses to run out of memory, so when it happens, it's typically an indication of a flaw. You cannot really recover from it. Exiting other applications is not an option. Neither is saving data to a disk and exiting. Whichever way you look at it, it's a bug no different really from overflowing the stack, dereferencing a NULL pointer, or overrunning an array index. Instead of bending over backwards in attempts to handle this condition in software (as you would on the desktop), you should concentrate first on finding the root cause and then fixing the problem. (I would first look for a memory leak, wouldn't you?)

The main point here is that many situations traditionally handled as exceptional conditions in general-purpose computing are in fact bugs in embedded systems. In other words, the specifics of embedded systems (computers dedicated to a single, well-defined purpose) allow you to considerably simplify the embedded software by flagging many situations as bugs (that you don't need to handle) rather than exceptional conditions (that you do need to handle). The correct distinction between these two situations always depends on the context, so you should not blindly transfer the rules of thumb from other areas of programming to embedded real-time systems. Instead, I propose that you critically ask yourself the following two probing questions: "Can a given situation legitimately arise in this particular system?" and "If it happens, is there anything specific that needs to or can be done in the software?" If the answer to either of these questions is "yes," then you should handle the situation as an exceptional condition; otherwise, you should treat the situation as a bug.
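The distinction can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: the FakeSensor type, the readWithRetry() function, and a host-side onAssert__() that throws instead of resetting the system. A timeout can legitimately arise, so it is handled with a retry and a status code; a NULL pointer argument can only come from a coding mistake, so it is asserted.

```cpp
#include <cstddef>

// Host-side stand-in for the assertion callback (an assumption of this
// sketch): it throws so that a test can observe the failure. On a real
// target it would put the system in a fail-safe state and reset.
struct AssertFailure { char const *file; unsigned line; };
void onAssert__(char const *file, unsigned line) {
    throw AssertFailure{file, line};
}
#define REQUIRE(test_) \
    ((test_) ? (void)0 : onAssert__(__FILE__, __LINE__))

enum ReadStatus { READ_OK, READ_TIMEOUT };

// Hypothetical sensor that times out a given number of times before answering
struct FakeSensor {
    int timeoutsLeft;
    ReadStatus poll(int *value) {
        if (timeoutsLeft > 0) {
            --timeoutsLeft;
            return READ_TIMEOUT;
        }
        *value = 42;
        return READ_OK;
    }
};

ReadStatus readWithRetry(FakeSensor *sensor, int *value, int maxTries) {
    // Bug: these conditions can only be violated by a programming mistake,
    // so they are asserted, not handled.
    REQUIRE(sensor != NULL && value != NULL && maxTries > 0);

    // Exceptional condition: a timeout can legitimately happen, so the
    // code implements a recovery strategy (retry) and reports the outcome.
    for (int i = 0; i < maxTries; ++i) {
        if (sensor->poll(value) == READ_OK) {
            return READ_OK;
        }
    }
    return READ_TIMEOUT;  // the caller is expected to handle this status
}
```

Note that the function has exactly one status to propagate. The bug cases never reach the caller as return codes, which is where the simplification comes from.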

The distinction between errors and exceptional conditions in any type of software (not just firmware) is important, because errors require exactly the opposite programming strategy from exceptional conditions. The first priority in dealing with errors is to detect them as early as possible. Any attempt to handle a bug (as you would an exceptional condition) results in unnecessary complications of the code and either camouflages the bug or delays its manifestation. (In the worst case, it also introduces new bugs.) Either way, finding and fixing the bug will be harder.

Design by Contract (DbC)

And here is where the Design by Contract (DbC) philosophy comes in. DbC, pioneered by Bertrand Meyer, views a software system as a set of components whose collaboration is based on precisely defined specifications of mutual obligations: the contracts. The central idea of this method is to inherently embed the contracts in the code and validate them automatically at run time. Doing so consistently has two major benefits: 1) It automatically helps detect bugs (as opposed to "handling" them), and 2) It is one of the best ways to document code.

You can implement the most important aspects of DbC (the contracts) in C or C++ with assertions. The Standard C Library macro assert() is rarely applicable to embedded systems, however, because its default behavior (when the integer expression passed to the macro evaluates to 0) is to print an error message and exit. Neither of these actions makes much sense for most embedded systems, which rarely have a screen to print to and cannot really exit either (at least not in the same sense that a desktop application can). Therefore, in an embedded environment, you usually have to define your own assertions that suit your tools and allow you to customize the error response. I'd suggest, however, that you think twice before you go about "enhancing" assertions, because a large part of their power derives from their relative simplicity.

Listing 1. Embedded systems-friendly assertions

#ifndef qassert_h
#define qassert_h

/** NASSERT macro disables all contract validations
 * (assertions, preconditions, postconditions, and invariants).
 */
#ifdef NASSERT /* NASSERT defined--DbC disabled */

#define DEFINE_THIS_FILE
#define ASSERT(ignore_)  ((void)0)
#define ALLEGE(test_)    ((void)(test_))

#else /* NASSERT not defined--DbC enabled */

#ifdef __cplusplus
extern "C"
{
#endif
   /* callback invoked in case of assertion failure */
   void onAssert__(char const *file, unsigned line);
#ifdef __cplusplus
}
#endif

#define DEFINE_THIS_FILE \
   static char const THIS_FILE__[] = __FILE__

#define ASSERT(test_) \
   ((test_) ? (void)0 : onAssert__(THIS_FILE__, __LINE__))

#define ALLEGE(test_)    ASSERT(test_)

#endif /* NASSERT */

#define REQUIRE(test_)   ASSERT(test_)
#define ENSURE(test_)    ASSERT(test_)
#define INVARIANT(test_) ASSERT(test_)

#endif /* qassert_h */

Listing 1 shows the simple embedded systems-friendly assertions that I've found adequate for a wide range of embedded projects. Listing 1 is similar to the standard <assert.h> (<cassert> in C++), except that the solution shown in Listing 1:

  • allows customizing the error response;
  • conserves memory by avoiding proliferation of multiple copies of the filename string;
  • provides additional macros for testing and documenting preconditions (REQUIRE), postconditions (ENSURE), and invariants (INVARIANT). (The names of the three last macros are a direct loan from Eiffel, the programming language that natively supports DbC.)

The all-purpose ASSERT() macro in Listing 1 is very similar to the standard assert(). If the argument passed to this macro evaluates to 0 (false), and if additionally the macro NASSERT is not defined, then ASSERT() will invoke a global callback onAssert__(). The function onAssert__() gives the clients the opportunity to customize the error response when the assertion fails. In embedded systems, onAssert__() typically first monopolizes the CPU (by disabling interrupts), then possibly attempts to put the system in a fail-safe mode, and eventually triggers a system reset. (Many embedded systems come out of reset in a fail-safe mode, so putting them in this mode before reset is often unnecessary.) If possible, the function should also leave a trail of bread crumbs from the cause, perhaps by storing the filename and line number in a nonvolatile memory. (The entry to onAssert__() is also an ideal place to set a breakpoint if you work with a debugger. TIP: Consult your debugger manual on how you can hard-code a permanent breakpoint in onAssert__().)
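The sequence described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the bsp* functions are hypothetical board-support hooks invented for illustration, and here they are host-side stubs that only record the order of calls. On a real target they would mask interrupts, write to nonvolatile memory, and force a reset.

```cpp
// Hypothetical board-support hooks (assumptions of this sketch).
static char bspTrace[8];            // zero-initialized call log
static unsigned bspTraceLen = 0U;
static char const *crumbFile = 0;   // stand-in for a nonvolatile record
static unsigned crumbLine = 0U;

static void bspNote(char tag) {
    // keep one byte free so the log always stays zero-terminated
    if (bspTraceLen < sizeof(bspTrace) - 1U) {
        bspTrace[bspTraceLen++] = tag;
    }
}

static void bspDisableInterrupts() { bspNote('D'); }

static void bspStoreCrumb(char const *file, unsigned line) {
    crumbFile = file;   // the "bread crumbs": where the assertion fired
    crumbLine = line;
    bspNote('S');
}

static void bspReset() { bspNote('R'); }  // would not return on real hardware

// Callback declared in Listing 1
extern "C" void onAssert__(char const *file, unsigned line) {
    bspDisableInterrupts();     // 1. monopolize the CPU
    bspStoreCrumb(file, line);  // 2. record the cause for later analysis
    bspReset();                 // 3. restart, typically into a fail-safe state
}
```

Keeping the callback this small follows the article's own advice: the assertion mechanism stays simple, and all project-specific policy lives in one function.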

Compared to the standard assert(), the macro ASSERT() conserves memory (typically ROM) by passing THIS_FILE__ as the first argument to onAssert__(), rather than the standard preprocessor macro __FILE__. This avoids proliferation of multiple copies of the __FILE__ string but requires invoking the macro DEFINE_THIS_FILE, preferably at the top of every C/C++ file.

Defining the macro NASSERT disables checking the assertions. When disabled, the macros ASSERT(), REQUIRE(), ENSURE(), and INVARIANT() don't generate any code; in particular, they don't test the expressions passed as arguments, so you should be careful to avoid any side effects (required for normal program operation) inside the expressions tested in assertions. The notable exception is the ALLEGE() macro, which always evaluates the expression, although when assertions are disabled, it does not invoke the onAssert__() callback. ALLEGE() is useful in situations where avoiding side effects of the test would require introducing temporaries, which involves pushing additional registers onto the stack, something you often want to minimize in embedded systems.
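The side-effect pitfall is easy to demonstrate. The sketch below reproduces only the NASSERT branch of Listing 1; initPeripheral(), startupWrong(), and startupRight() are hypothetical names introduced for illustration.

```cpp
// NASSERT branch of Listing 1: assertions disabled
#define ASSERT(ignore_)  ((void)0)
#define ALLEGE(test_)    ((void)(test_))

static unsigned initCalls = 0U;

// Hypothetical initialization routine. Calling it is a side effect that
// the program needs for normal operation.
static bool initPeripheral() {
    ++initCalls;
    return true;
}

// With assertions disabled, the whole expression disappears, and with it
// the call to initPeripheral().
void startupWrong() {
    ASSERT(initPeripheral());
}

// ALLEGE() still evaluates the expression, so the call survives; only the
// check of its result is dropped.
void startupRight() {
    ALLEGE(initPeripheral());
}
```

After calling both functions in this configuration, initCalls has been incremented only once, by startupRight().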

The DbC Philosophy

The most important point to understand about software contracts (assertions in C/C++) is that they neither handle nor prevent errors, in the same way as contracts between people do not prevent fraud. For example, asserting successful memory allocation: ALLEGE((foo = new Foo) != NULL), might give you a warm and fuzzy feeling that you have handled or prevented a bug, when in fact, you haven't. You did establish a contract, however, in which you spelled out that the inability to dynamically allocate object Foo at this spot in the code is an error. From that point on, the contract will be checked automatically and sure enough, the program will brutally abort if the contract fails. At first, you might think that this must be backwards. Contracts not only do nothing to handle (let alone fix) bugs, but they actually make things worse by turning every asserted condition, however benign, into a fatal error! However, recall from the previous discussion that the first priority when dealing with bugs is to detect them, not to handle them. To this end, a bug that causes a loud crash (and identifies exactly which contract was violated) is much easier to find than a subtle one that manifests itself intermittently millions of machine instructions downstream from the spot where you could have easily detected it.
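One practical caveat about the allocation example: in standard C++, a plain new expression reports failure by throwing std::bad_alloc, so the comparison against NULL is meaningful only with the non-throwing form or on toolchains configured without exception support. The sketch below uses new (std::nothrow); the Foo type, makeFoo(), and the throwing host-side onAssert__() are assumptions for illustration.

```cpp
#include <cstddef>
#include <new>

// Host-side stand-in for the assertion callback (an assumption of this
// sketch); a real target would reset instead of throwing.
struct AssertFailure { char const *file; unsigned line; };
void onAssert__(char const *file, unsigned line) {
    throw AssertFailure{file, line};
}
#define ASSERT(test_) \
    ((test_) ? (void)0 : onAssert__(__FILE__, __LINE__))
#define ALLEGE(test_) ASSERT(test_)

struct Foo {
    int value;
    Foo() : value(7) {}
};

Foo *makeFoo() {
    Foo *foo;
    // new (std::nothrow) yields a null pointer on failure, which is what
    // the contract tests. The assignment is a required side effect, hence
    // ALLEGE() rather than ASSERT().
    ALLEGE((foo = new (std::nothrow) Foo) != NULL);
    return foo;
}
```

The contract states that failing to allocate Foo here is a bug. It does not recover from the failure, which is exactly the point made above.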

Assertions in software are in many respects like fuses in electrical circuits. Electrical engineers insert fuses in various places of their circuits to cause controlled damage (burning a fuse) in case the circuit fails or is mishandled. Any nontrivial circuit, such as the electrical system of a car, has a multitude of differently rated fuses (a 20A fuse is appropriate for the headlights, but it's way too big for the turn signals) to better help localize problems and to more effectively prevent expensive damage. On the other hand, a fuse can neither prevent nor fix a problem, so replacing a burned fuse doesn't help until the root cause of the problem is removed. Just like with assertions, the main power of fuses derives from their simplicity.

Due to their simplicity, however, assertions are sometimes viewed as too primitive an error-checking mechanism: something that's perhaps good enough for smaller programs, but that must be replaced with "real" error handling in industry-strength software. This view is inconsistent with the DbC philosophy, which regards contracts (assertions in C/C++) as an integral part of the software design. Contracts embody important design decisions, namely declaring certain situations as errors rather than exceptional conditions, and, therefore, embedding them in large-scale, industry-strength software is even more important than in quick-and-dirty solutions. Imagine building a large industrial electrical circuit (say, a power plant) without fuses.

Defensive or Preemptive Programming?

The term "defensive programming" seems to have two complementary meanings. In the first meaning, the term is used to describe a programming style based on assertions, where you explicitly assert any assumptions that should hold true as long as the software operates correctly. In this sense, "defensive programming" is essentially synonymous with DbC.

In the other meaning, however, "defensive programming" denotes a programming style that aims at making operations more robust to errors, by accepting a wider range of inputs or allowing an order of operations not necessarily consistent with the object's state. In this sense, "defensive programming" is complementary to DbC. For example, consider the following hypothetical output Port class:

class Port {
    bool open_;
public:
    Port() : open_(false) {}
    void open() {
        if (!open_) {
            // open the port ...
            open_ = true;
        }
    }
    void transmit(unsigned char const *buffer, unsigned nBytes) {
        if (open_ && buffer != NULL && nBytes > 0) {
            // transmit nBytes
            // from the buffer ...
        }
    }
    void close() {
        if (open_) {
            open_ = false;
            // close the port ...
        }
    }
    // . . .
};

This class is programmed defensively (in the second meaning of the term), because it silently accepts invoking operations out of order (that is, transmit() before open()) and with invalid parameters (e.g., transmit(NULL, 0)). This technique of making operations more robust to errors is often advertised as a better coding style, but unfortunately, it often hides bugs. Is it really a good program that calls port.transmit() before port.open()? Is it really OK to invoke transmit() with an uninitialized transmit buffer? I'd argue that correctly designed and implemented code should not do such things, and when it happens it's a sure indication of a larger problem. In comparison, the Port class coded according to the DbC philosophy would use preconditions:

class Port {
    bool open_;
public:
    Port() : open_(false) {}
    void open() {
        REQUIRE(!open_);
        // open the port ...
        open_ = true;
    }
    void transmit(unsigned char const *buffer, unsigned nBytes) {
        REQUIRE(open_ && buffer != NULL && nBytes > 0);
        // transmit n-bytes
        // from the buffer ...
    }
    void close() {
        REQUIRE(open_);
        open_ = false;
        // close the port ...
    }
    // . . .
};

This implementation is intentionally less flexible, but unlike the defensive version, it's hard to use this one incorrectly. Additionally (although difficult to convincingly demonstrate with a toy example like this), assertions tend to eliminate a lot of code that you would have to invent to handle the wider range of inputs allowed in the defensive code.
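To see the contract-based class catch misuse, here is a self-contained variant that can be exercised on a host. Two things are assumptions added for illustration: the onAssert__() callback throws (instead of resetting the system) so that a test can observe the failure, and a sent_ counter stands in for the real transmission.

```cpp
#include <cstddef>

// Host-side stand-in for the assertion callback (an assumption of this sketch)
struct AssertFailure { char const *file; unsigned line; };
void onAssert__(char const *file, unsigned line) {
    throw AssertFailure{file, line};
}
#define REQUIRE(test_) \
    ((test_) ? (void)0 : onAssert__(__FILE__, __LINE__))

class Port {
    bool open_;
    unsigned sent_;   // hypothetical byte counter replacing real I/O
public:
    Port() : open_(false), sent_(0U) {}
    void open() {
        REQUIRE(!open_);          // opening twice is a bug in the caller
        open_ = true;
    }
    void transmit(unsigned char const *buffer, unsigned nBytes) {
        REQUIRE(open_ && buffer != NULL && nBytes > 0);
        sent_ += nBytes;          // stand-in for transmitting the bytes
    }
    void close() {
        REQUIRE(open_);           // closing a closed port is a bug too
        open_ = false;
    }
    unsigned sent() const { return sent_; }
};
```

Calling transmit() on a freshly constructed Port fails the precondition immediately and identifies the file and line, whereas the defensive version would have returned silently.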

But there is more, much more to DbC than just complementing defensive programming. The key to unveiling DbC's full potential is to preemptively look for conditions to assert. The most effective assertions are discovered by asking two simple questions: "What are the implicit assumptions for this code and the client code to execute correctly?" and "How can I explicitly and most effectively test these assumptions?" By asking these questions for every piece of code you write, you'll discover valuable conditions that you wouldn't test otherwise. This way of thinking about assertions leads to a paradigm shift from "defensive" to "preemptive" programming, in which you preemptively look for situations that have even a potential of breeding bugs.
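As one illustration of asking those two questions, consider a small byte queue. The ByteQueue class below is a hypothetical example, not taken from the article; it spells out its implicit assumptions as a precondition (the producer never outruns the consumer), a postcondition, and a class invariant, using the REQUIRE/ENSURE/INVARIANT names from Listing 1. The throwing onAssert__() is again a host-side assumption.

```cpp
// Host-side stand-in for the assertion callback (an assumption of this sketch)
struct AssertFailure { char const *file; unsigned line; };
void onAssert__(char const *file, unsigned line) {
    throw AssertFailure{file, line};
}
#define ASSERT(test_) \
    ((test_) ? (void)0 : onAssert__(__FILE__, __LINE__))
#define REQUIRE(test_)   ASSERT(test_)
#define ENSURE(test_)    ASSERT(test_)
#define INVARIANT(test_) ASSERT(test_)

class ByteQueue {
    static unsigned const CAPACITY = 4U;
    unsigned char buf_[CAPACITY];
    unsigned head_;    // index of the next free slot
    unsigned tail_;    // index of the oldest stored byte
    unsigned count_;   // number of stored bytes

    // Relations that must hold between the members after every operation
    void checkInvariant() const {
        INVARIANT(count_ <= CAPACITY && head_ < CAPACITY && tail_ < CAPACITY);
        INVARIANT((tail_ + count_) % CAPACITY == head_);
    }
public:
    ByteQueue() : head_(0U), tail_(0U), count_(0U) {}

    void put(unsigned char b) {
        REQUIRE(count_ < CAPACITY);      // assumption: the queue never overflows
        unsigned const before = count_;
        buf_[head_] = b;
        head_ = (head_ + 1U) % CAPACITY;
        ++count_;
        ENSURE(count_ == before + 1U);   // exactly one byte was added
        checkInvariant();
    }

    unsigned char get() {
        REQUIRE(count_ > 0U);            // assumption: callers check size() first
        unsigned char const b = buf_[tail_];
        tail_ = (tail_ + 1U) % CAPACITY;
        --count_;
        checkInvariant();
        return b;
    }

    unsigned size() const { return count_; }
};
```

A defensive version would silently drop the byte on overflow; here the overflow is declared a sizing or timing bug and is reported at the exact spot where it first becomes detectable.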

To this end, embedded systems are particularly suitable for implementing such a "preemptive doctrine." Embedded CPUs are surrounded by specialized peripherals that just beg to be used for validating correct program execution. For example, a serial communication channel (say, a 16550-type UART) might set a bit in the Line Status Register when the input FIFO overflows. A typical serial driver ignores this bit (after all, the driver cannot recover bytes that already have been lost), but your driver can assert that the FIFO overrun bit is never set. By doing so, you'll know when your hardware has lost bytes (perhaps due to an intermittent delay in servicing the UART), which is otherwise almost impossible to detect or reproduce at will. Needless to say, with this information you'll not waste your time on debugging the protocol stack, but instead you'll concentrate on finding and fixing a timing glitch. You can use timer/counters in a similar way to build real-time assertions that check if you miss your deadlines. In my company, we've used a special register of a GPS correlator chip to assert that every GPS channel is always serviced within its C/A code epoch (around every 1 ms), which is yet another form of a real-time assertion. The main point of these examples is that the information available almost for free from the hardware wouldn't be used if not for the "preemptive" assertions. Yet, the information is invaluable, because it's often the only way to directly validate the time-domain performance of your code.
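The UART example might look roughly like the sketch below. It assumes the usual 16550 Line Status Register layout (bit 0 = data ready, bit 1 = overrun error) and replaces the memory-mapped registers with a plain FakeUart struct so the code can run on a host; FakeUart, uartPoll(), and the throwing onAssert__() are all hypothetical names for illustration.

```cpp
#include <cstdint>

// Host-side stand-in for the assertion callback (an assumption of this sketch)
struct AssertFailure { char const *file; unsigned line; };
void onAssert__(char const *file, unsigned line) {
    throw AssertFailure{file, line};
}
#define ASSERT(test_) \
    ((test_) ? (void)0 : onAssert__(__FILE__, __LINE__))

// 16550-style Line Status Register bits
static std::uint8_t const LSR_DATA_READY  = 0x01U;
static std::uint8_t const LSR_OVERRUN_ERR = 0x02U;

// Simulated registers; on a target these would be volatile memory-mapped I/O
struct FakeUart {
    std::uint8_t lsr;   // line status register
    std::uint8_t rbr;   // receiver buffer register
};

// Returns true and stores the received byte in *out if one is available.
bool uartPoll(FakeUart *uart, std::uint8_t *out) {
    // Read the status once into a local; on 16550-type parts reading the
    // LSR typically clears the error flags, so a second read could miss them.
    std::uint8_t const lsr = uart->lsr;

    // Real-time assertion: the receiver must never lose bytes. A set
    // overrun bit means the UART was not serviced in time.
    ASSERT((lsr & LSR_OVERRUN_ERR) == 0U);

    if ((lsr & LSR_DATA_READY) == 0U) {
        return false;   // nothing received yet
    }
    *out = uart->rbr;
    return true;
}
```

The check costs one mask-and-compare on a value the driver reads anyway, which is what "almost for free" means in practice.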

Assertions and Testing

At GE Medical Systems, I once got involved in developing an automatic testing suite for diagnostic X-ray machines, which we called "cycler." The cycler was essentially a random monkey program that emulated activating soft keys on our touch screen and depressing the foot switch to initiate X-ray exposures. The idea was to let the cycler exercise the system at night and on weekends. Indeed, the cycler helped us to catch quite a few problems, mostly those that left entries in the error log. However, because our software was programmed mostly defensively, in the absence of errors in the log we didn't know if the "cycler" run was truly successful, or perhaps, the code just wandered around all weekend long silently "taking care" of various problems.

In contrast, every successful test run of code peppered with assertions builds much more confidence in the software. I don't know exactly what the critical density of assertions must be, but at some point the tests stop producing undefined behavior, segmentation faults, or system hangs, and all bugs manifest themselves as assertion failures. This effect of DbC is truly amazing. The integrity checks embodied in assertions prevent the code from "wandering around" and even broken builds don't crash-and-burn but rather end up hitting an assertion.

Testing code developed according to DbC principles has an immense psychological impact on programmers. Because assertions escalate every asserted condition to a fatal error, all bugs require attention. DbC makes it so much harder to dismiss an intermittent bug as a "glitch." After all, you have a record in the form of the filename and the line number where the assertion fired. Once you know where in the code to start your investigations, most bugs are more transparent.

DbC also encourages testing to be accommodated by the system architecture. In the embedded systems domain, the days of logic analyzers or in-circuit emulators having direct access to all of the CPU's state information are long gone. Even if you had access to all the CPU's address and data signals (which you typically don't, because there are simply not enough pins to go around), the multistage pipelines and cache memories make it impossible to figure out what's going on in there. The solution requires the testing instrumentation (assertions) integrated directly into the system's firmware. You can no longer design a system without accounting for testing overhead right from the start. Assuming that all the CPU cycles, the RAM, and all the ROM will be devoted strictly to the job at hand simply won't get the job done.

Assertions in Production Code

The standard practice is to use assertions during development and testing, but to disable them in the final product by defining the NDEBUG macro. In Listing 1, I have replaced this macro with NASSERT, because many development environments define NDEBUG automatically when you switch to the production version, and I wanted to decouple the decision of disabling assertions from the version of software that you build. That's because I truly believe that leaving assertions enabled, especially in the ship-version of the product, is a good idea.

The often-quoted opinion in this matter comes from C.A.R. Hoare, who considered disabling assertions in the final product like using a lifebelt during practice, but then not bothering with it for the real thing. I find the comparison of assertions to fuses more compelling. Would you design a prototype board with carefully rated fuses, but then replace them all with 0 Ω resistors (chunky pieces of wire) for a production run?

The question of shipping with assertions really boils down to two issues. First is the overhead that assertions add to your code. Obviously, if the overhead is too big, you have no choice. (But then I must ask: how have you built and tested your firmware?) However, assertions should be considered an integral part of the firmware and properly sized hardware should accommodate them. As the price of hardware rapidly drops and its capabilities skyrocket, it just makes sense to trade a small fraction of the raw CPU horsepower and memory resources for better system integrity. In addition, as I mentioned earlier, assertions often pay for themselves by eliminating reams of defensive code.

The other issue is the correct system response when an assertion fires in the field. As it turns out, a simple system reset is for most embedded devices the least inconvenient action from the user's perspective, and certainly less inconvenient than locking up a device and denying service indefinitely. That's exactly what happened the other day, when my wife's cellular phone froze and the only way of bringing it back to life was to pull out the battery. (I don't know how she does it, but since then she has managed to hang her phone again more than once, along with our VCR and even the TV.) The question that comes to my mind is whether the firmware in those products used assertions (or whether the assertions have been enabled). Apparently not, because otherwise the firmware would have reset automatically.

Further Reading

Assertions have been a recurring subject of many articles (and rightly so). For example, two articles from the C/C++ Users Journal, "Generic: Assertions" and "Generic: Enforcements," describe how you can unleash templates and exceptions to build truly smart assertions. More specifically to embedded systems, I greatly enjoyed Niall Murphy's articles devoted to assertions, "Assertiveness Training for Programmers" and "Assert Yourself." By the way, the analogy between assertions in software and fuses in electrical systems was Niall's original idea, which came up when we talked about assertions at an Embedded Systems Conference some years ago in San Francisco.

The main goal of this article is to convince you that the DbC philosophy can fundamentally change the way you design, implement, test and deploy your software. A good starting point to learn more about DbC is the Eiffel Software website (among others). There you can find "Design by Contract: The Lessons of Ariane", an interesting interpretation of the infamous Ariane 5 software failure.