An In-Depth Look at std::cout in Multithreaded Code

  • August 25, 2020
  • Notes

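1. Background

  During testing I found that a program built with the MinGW build of GCC, in which the main program starts 20 threads that each write messages to cout in a loop, crashed within a few minutes, whereas the same program built with MSVC or with the gcc-linaro cross-compiler ran for a long time without any problem.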

2. Related research

2.1 C++ iostreams: Unexpected but legal multithreaded behaviour

//berthub.eu/articles/posts/iostreams-unexpected/

2.1.1 Multi-threading

One reason for using C++ is that it supports multi-threading (or more broadly, multi-processing) very well. The original C++ standard had no words on it because back in the day, officially there were no threads. Later versions of C++ (starting with C++ 2011) dusted off the iostreams specification and added words on thread safety.

This starts off with the following:

Concurrent access to a stream object (30.8, 30.9), stream buffer object (30.6), or C Library stream (30.12) by multiple threads may result in a data race (6.8.2) unless otherwise specified (30.4). [ Note: Data races result in undefined behavior (6.8.2). — end note ] – [iostreams.threadsafety]

This is a blanket statement that bad things may happen if we do stuff to iostreams from several threads at the same time, unless there is a specific statement that says doing so is safe.

Luckily, there is the following paragraph too:

Concurrent access to a synchronized (27.5.3.4) standard iostream object’s formatted and unformatted input (27.7.2.1) and output (27.7.3.1) functions or a standard C stream by multiple threads shall not result in a data race (1.10). [Note: Users must still synchronize concurrent use of these objects and streams by multiple threads if they wish to avoid interleaved characters. — end note] – [iostream.objects.overview]

No disasters will happen on concurrent use of iostreams, although if you print out two log lines to cerr at the same time, you may find them interleaved in your output. This certainly is not pretty & hard to parse, but at least it is not illegal.

Note however that this paragraph talks only about ‘synchronized’ streams. Once we call the much recommended sync_with_stdio(false), our streams are no longer synchronized, not only not with stdio, but not at all. This means every write operation on cin or cout etc must now be protected by a mutex.

This itself is likely reason enough to never call sync_with_stdio(false) in any multi-threaded program using cout to print things.
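The rule above (every write to an unsynchronized stream goes through a mutex) can be sketched roughly as follows. This is a minimal illustration, not code from the quoted article: `locked_write`, `run_writers` and `collect_output` are hypothetical helper names, and the stream is passed in as a parameter so that the same code works for `std::cout` or for an in-memory `std::ostringstream`.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper: writes one complete line while holding `m`.
void locked_write(std::ostream& os, std::mutex& m, const std::string& line) {
    std::lock_guard<std::mutex> guard(m);
    os << line << '\n';
}

// Starts `threads` writers that each emit `lines_per_thread` lines to `os`.
void run_writers(std::ostream& os, int threads, int lines_per_thread) {
    assert(threads >= 0 && lines_per_thread >= 0);
    std::mutex m;  // the single lock shared by all writers
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&os, &m, t, lines_per_thread] {
            for (int i = 0; i < lines_per_thread; ++i) {
                locked_write(os, m, "thread " + std::to_string(t) +
                                        " line " + std::to_string(i));
            }
        });
    }
    for (auto& th : pool) th.join();
}

// Convenience wrapper that captures the output in a string, not a console.
std::string collect_output(int threads, int lines_per_thread) {
    std::ostringstream buffer;
    run_writers(buffer, threads, lines_per_thread);
    return buffer.str();
}
```

In a real program one would presumably call `std::ios_base::sync_with_stdio(false);` at the start of `main` and then `run_writers(std::cout, 20, 1000);`. Because each line is written under the lock, lines from different threads cannot interleave.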

2.1.2 Basic thread locking in C++11

Notice that the requirement not to produce a data race applies only to the standard iostream objects (cout, cin, cerr, clog, wcout, wcin, wcerr, and wclog) and only when they are synchronized (which they are by default and which can be disabled using the sync_with_stdio member function).

Unfortunately I’ve noticed two phenomena: implementations either provide stricter guarantees than required (e.g., thread synchronization for all stream objects no matter what, giving poor performance) or fewer (e.g., standard stream objects that are still synchronized with stdio yet produce data races). MSVC seems to lean toward the former while libc++ leans toward the latter.

Anyway, as the note indicates, you have to provide mutual exclusion yourself if you want to avoid interleaved characters. Here’s one way to do it:

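#include <iostream>
#include <mutex>

std::mutex m;  // one mutex shared by every lockostream

// Holds the lock from construction until the end of the full expression.
struct lockostream {
    std::lock_guard<std::mutex> l;
    lockostream() : l(m) {}
};

// No-op inserter: it only lets the temporary appear in a << chain.
std::ostream &operator<<(std::ostream &os, lockostream const &) {
    return os;
}

int main() {
    std::cout << lockostream() << "Hello, World!\n";
}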

This way a lock guard is created and lives for the duration of the expression using std::cout. You can templatize the lockostream object to work for any basic_*stream, and even key it on the address of the stream so that you have a separate mutex for each one.
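The per-stream variant mentioned above could look roughly like the sketch below. It is an assumption about how one might do it, not code from the quoted answer: `mutex_for` and `lockstream` are hypothetical names, the mutexes live in a small registry keyed by the stream's address (taken as a `std::ios_base` pointer so the key is consistent), and the inserter is a template so it works for any `basic_ostream`.

```cpp
#include <cassert>
#include <ios>
#include <map>
#include <mutex>
#include <ostream>
#include <sstream>

// Hypothetical registry: returns the mutex associated with a stream address.
// std::map nodes never move, so the returned reference stays valid.
std::mutex& mutex_for(const std::ios_base* stream) {
    static std::mutex registry_lock;
    static std::map<const std::ios_base*, std::mutex> registry;
    std::lock_guard<std::mutex> guard(registry_lock);
    return registry[stream];
}

// Locks the mutex that belongs to the given stream. Used as a temporary,
// it holds the lock until the end of the full expression.
struct lockstream {
    std::lock_guard<std::mutex> l;
    explicit lockstream(const std::ios_base& s) : l(mutex_for(&s)) {}
};

// No-op inserter for any character type; it only makes the temporary
// usable inside a << chain.
template <class CharT, class Traits>
std::basic_ostream<CharT, Traits>& operator<<(
        std::basic_ostream<CharT, Traits>& os, const lockstream&) {
    return os;
}
```

Usage would be `std::cout << lockstream(std::cout) << "Hello, World!\n";`. Writers to `std::cout` and writers to `std::cerr` then no longer block each other, since each stream gets its own mutex.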

Of course the standard stream objects are global variables, so you might want to avoid them the same way all global variables should be avoided. They’re handy for learning C++ and toy programs, but you might want to arrange something better for real programs.

2.1.3 Alternatively, lock explicitly

You have to use the normal locking techniques, as you would with any other shared resource; otherwise you get interleaved output at best and, once the stream is no longer synchronized, UB.

std::mutex m;
std::lock_guard<std::mutex> lock(m);
std::cout << "hello hello"; 

Alternatively, you can use printf, which is thread-safe (on POSIX):

printf("hello hello");
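A small sketch of the stdio route, under the assumption of a POSIX system where each stdio call locks the `FILE` internally: format the complete line first, then hand it to stdio in a single call so that other stdio calls cannot split it. `format_line` and `print_line` are hypothetical helper names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats one complete line into a local buffer.
std::string format_line(int thread_id, int iteration) {
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "thread %d iteration %d\n",
                                thread_id, iteration);
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf, static_cast<std::size_t>(n));
}

// Emits the line with one stdio call; no user-level mutex is involved.
void print_line(int thread_id, int iteration) {
    const std::string line = format_line(thread_id, iteration);
    std::fputs(line.c_str(), stdout);
}
```

A single `printf` call with the same format string would behave the same way on POSIX. The thing to avoid is building one logical line out of several separate calls.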

2.1.4 Summarising

Be very careful when using std::ios_base::sync_with_stdio(false), and if you do, also issue cin.tie(nullptr). Make sure sync_with_stdio is called before doing any i/o.
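As a minimal sketch of that advice, the two calls can be grouped in one function that runs first thing in `main`. The name `init_unsynced_io` is hypothetical; the two library calls are the ones named above.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical helper: call once at the very start of main(), before any I/O.
// Returns the synchronization state that was in effect before the call.
bool init_unsynced_io() {
    const bool was_synced = std::ios_base::sync_with_stdio(false);
    std::cin.tie(nullptr);  // stop cin from flushing cout before each read
    assert(std::cin.tie() == nullptr);
    return was_synced;
}
```

After this call the standard streams are no longer covered by the data-race guarantee for synchronized streams, so any use from more than one thread needs its own locking.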

In general, be very wary of doing output operations on a single iostream from multiple threads – it may not do what you want.

Some further reading:

  • The libstdc++ bug I filed about this, where it will likely be concluded this is (unfortunately) not a bug, but intended behaviour
  • The {fmt} library is a simpler alternative to rapidly output text. Typically faster than printf.

2.2 Simple Lock-free std::cout in C++ Multithreading

//wasin.io/blog/17_simple-lock-free-std-cout-cpp-multithreading.html

Whenever you need to write a quick multithreaded program in C++, printing something via operator<< on std::cout to validate the logic is usually the go-to solution.

Whenever at least two threads call operator<< on std::cout at the same time, the console output will probably be a mess, not what we expect: a newline might not get printed, a space is sometimes included and sometimes not, and so on.

Applying a full std::mutex seems to be overkill, and in any case a mutex-based solution is not lock-free. What about std::atomic? It comes close, but it still does not guarantee a lock-free solution for us. So both of those methods are out.

The sane solution is to use std::atomic_flag, the one atomic type that is guaranteed to be lock-free. It is lower level than std::atomic and comparable to std::atomic<bool>, but without load and store operations. See the following code:

    static std::atomic_flag lock = ATOMIC_FLAG_INIT;

    // spin-lock (suitable if short time waiting is known beforehand)
    while (lock.test_and_set(std::memory_order_acquire))
        ;

    std::cout << "Print something\n";

    // release the lock (release ordering pairs with the acquire above)
    lock.clear(std::memory_order_release);

Check ThreadLocal.cpp for a full example of multiple threads trying to print something out at the same time.

Compile it with g++ -std=c++11 ThreadLocal.cpp -lpthread.
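The spin-lock fragment above can also be wrapped in a small class so that std::lock_guard releases it automatically. The following is a sketch under that assumption, not the author's ThreadLocal.cpp: `SpinLock` and `guarded_count` are hypothetical names, and a plain counter stands in for the shared stream so the effect of the lock is easy to check.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: exposes lock()/unlock() so std::lock_guard can use it.
class SpinLock {
public:
    void lock() {
        // Spin until the previous value was "clear", i.e. we took the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // be polite while waiting
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Runs `threads` workers; each performs `iterations` guarded increments.
// The plain int stands in for the shared stream being protected.
int guarded_count(int threads, int iterations) {
    assert(threads >= 0 && iterations >= 0);
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++counter;  // in real use: std::cout << "Print something\n";
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

The atomic_flag operations themselves are lock-free, but a thread that finds the flag set still has to wait, so this only pays off when the guarded section is short, as the comment in the original fragment notes.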

3. Summary

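By default sync_with_stdio is true, and in that state the standard only guarantees that concurrent use of cout is free of data races; it does not prevent interleaved output. Once synchronization is turned off, or where an implementation provides less than the standard requires (see 2.1.2), concurrent access becomes UB (undefined behavior). It is therefore not surprising that different compilers behave differently.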

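For cout in a multithreaded environment, it is still advisable to guard it with atomic_flag or std::lock_guard<std::mutex>; when going the locking route, C++20's osyncstream is also a good option.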
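With a C++20 toolchain, std::osyncstream from <syncstream> is used as `std::osyncstream(std::cout) << "text" << '\n';`. For older compilers, the idea behind it can be approximated as in the sketch below: buffer the whole message privately and transfer it to the shared stream in one locked write. `SyncLine` is a hypothetical class written for illustration and is not the standard type.

```cpp
#include <cassert>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical class: buffers everything written to it and emits the whole
// buffer to the target stream in one locked write when it is destroyed.
class SyncLine {
public:
    SyncLine(std::ostream& target, std::mutex& m) : target_(target), mutex_(m) {}
    SyncLine(const SyncLine&) = delete;
    SyncLine& operator=(const SyncLine&) = delete;

    ~SyncLine() {
        std::lock_guard<std::mutex> guard(mutex_);
        target_ << buffer_.str();
    }

    template <class T>
    SyncLine& operator<<(const T& value) {
        buffer_ << value;  // no lock needed: the buffer is private to this object
        return *this;
    }

private:
    std::ostream& target_;
    std::mutex& mutex_;
    std::ostringstream buffer_;
};
```

With a shared `std::mutex m`, a call such as `SyncLine(std::cout, m) << "x = " << 42 << '\n';` holds the lock only for the final write. Manipulators like std::endl are not handled by this sketch.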