关于linux的一点好奇心（四）：tail -f文件跟踪实现

2022 年 8 月 14 日
笔记
C, C语言, linux, tail, 协议类, 原理&故事, 工具类, 源码

　　关于文件跟踪，我们有很多的实际场景，比如查看某个系统日志的输出，当有变化时立即体现，以便进行问题排查；比如查看文件结尾的内容是啥，总之是刚需了。

1. 自己实现的文件跟踪

　　我们平时做功能开发时，也会遇到类似的需求，比如当有人传输文件到某个位置后，我们需要触发后续处理操作。

　　那么，我们自己实现的话，也就只能通过定时检查文件是否变化，比如检测最后修改时间，从而感知到变化。如果要想让文件传输完成之后，再进行动作，则一般需要用户上传一个空的done文件，以报备事务处理完成。

　　那么，如果是系统实现呢？如题，tail -f 的文件跟踪，是否也是这样实现呢？想想感觉应该不会这么简单，毕竟操作系统肯定会比自己厉害此的。

2. tail -f的源码位置

　　我们知道，每个linux系统安装之后，都会有很多的基础命令可用，比如cat/vi/sh/top/tail… 那么，是否这些命令就是内核提供的东西呢？实际上不是的，linux kernel 部分，并未提供相应的实现，即这些工具类的都不是在kernel中实现的，而是作为外部核心工具包组件实现。即 coreutils 。这也是我们想分析一些工具类实现时需要注意的，因为它可能在你找不到的地方。

　　源码访问路径: //git.savannah.gnu.org/cgit/coreutils.git/tree/src/tail.c

3. tail -f 实现

　　tail -f 作为核心工具，虽与操作系统一起出现，但毕竟是独立实现，所以还是需要考虑具体的操作系统环境，所以它的实现往往需要分情况进行处理。

　　即它有多种实现，一种是和我们一样，定时去检测文件化，然后输出到控制台；第二种则高级些，利用操作系统提供的文件通知功能，进行实时内容输出。具体如下：

int
main (int argc, char **argv)
{
  enum header_mode header_mode = multiple_files;
  bool ok = true;
  /* If from_start, the number of items to skip before printing; otherwise,
     the number of items at the end of the file to print.  Although the type
     is signed, the value is never negative.  */
  uintmax_t n_units = DEFAULT_N_LINES;
  size_t n_files;
  char **file;
  struct File_spec *F;
  size_t i;
  bool obsolete_option;

  /* The number of seconds to sleep between iterations.
     During one iteration, every file name or descriptor is checked to
     see if it has changed.  */
  double sleep_interval = 1.0;

  initialize_main (&argc, &argv);
  set_program_name (argv[0]);
  setlocale (LC_ALL, "");
  bindtextdomain (PACKAGE, LOCALEDIR);
  textdomain (PACKAGE);

  atexit (close_stdout);

  have_read_stdin = false;

  count_lines = true;
  forever = from_start = print_headers = false;
  line_end = '\n';
  obsolete_option = parse_obsolete_option (argc, argv, &n_units);
  argc -= obsolete_option;
  argv += obsolete_option;
  parse_options (argc, argv, &n_units, &header_mode, &sleep_interval);

  /* To start printing with item N_UNITS from the start of the file, skip
     N_UNITS - 1 items.  'tail -n +0' is actually meaningless, but for Unix
     compatibility it's treated the same as 'tail -n +1'.  */
  if (from_start)
    {
      if (n_units)
        --n_units;
    }

  if (optind < argc)
    {
      n_files = argc - optind;
      file = argv + optind;
    }
  else
    {
      static char *dummy_stdin = (char *) "-";
      n_files = 1;
      file = &dummy_stdin;
    }

  {
    bool found_hyphen = false;

    for (i = 0; i < n_files; i++)
      if (STREQ (file[i], "-"))
        found_hyphen = true;

    /* When following by name, there must be a name.  */
    if (found_hyphen && follow_mode == Follow_name)
      die (EXIT_FAILURE, 0, _("cannot follow %s by name"), quoteaf ("-"));

    /* When following forever, and not using simple blocking, warn if
       any file is '-' as the stats() used to check for input are ineffective.
       This is only a warning, since tail's output (before a failing seek,
       and that from any non-stdin files) might still be useful.  */
    if (forever && found_hyphen)
      {
        struct stat in_stat;
        bool blocking_stdin;
        blocking_stdin = (pid == 0 && follow_mode == Follow_descriptor
                          && n_files == 1 && ! fstat (STDIN_FILENO, &in_stat)
                          && ! S_ISREG (in_stat.st_mode));

        if (! blocking_stdin && isatty (STDIN_FILENO))
          error (0, 0, _("warning: following standard input"
                         " indefinitely is ineffective"));
      }
  }

  /* Don't read anything if we'll never output anything.  */
  if (! n_units && ! forever && ! from_start)
    return EXIT_SUCCESS;

  F = xnmalloc (n_files, sizeof *F);
  for (i = 0; i < n_files; i++)
    F[i].name = file[i];

  if (header_mode == always
      || (header_mode == multiple_files && n_files > 1))
    print_headers = true;

  xset_binary_mode (STDOUT_FILENO, O_BINARY);

  for (i = 0; i < n_files; i++)
    ok &= tail_file (&F[i], n_units);

  if (forever && ignore_fifo_and_pipe (F, n_files))
    {
      /* If stdout is a fifo or pipe, then monitor it
         so that we exit if the reader goes away.  */
      struct stat out_stat;
      if (fstat (STDOUT_FILENO, &out_stat) < 0)
        die (EXIT_FAILURE, errno, _("standard output"));
      monitor_output = (S_ISFIFO (out_stat.st_mode)
                        || (HAVE_FIFO_PIPES != 1 && isapipe (STDOUT_FILENO)));

#if HAVE_INOTIFY
      /* tailable_stdin() checks if the user specifies stdin via  "-",
         or implicitly by providing no arguments. If so, we won't use inotify.
         Technically, on systems with a working /dev/stdin, we *could*,
         but would it be worth it?  Verifying that it's a real device
         and hooked up to stdin is not trivial, while reverting to
         non-inotify-based tail_forever is easy and portable.

         any_remote_file() checks if the user has specified any
         files that reside on remote file systems.  inotify is not used
         in this case because it would miss any updates to the file
         that were not initiated from the local system.

         any_non_remote_file() checks if the user has specified any
         files that don't reside on remote file systems.  inotify is not used
         if there are no open files, as we can't determine if those file
         will be on a remote file system.

         any_symlinks() checks if the user has specified any symbolic links.
         inotify is not used in this case because it returns updated _targets_
         which would not match the specified names.  If we tried to always
         use the target names, then we would miss changes to the symlink itself.

         ok is false when one of the files specified could not be opened for
         reading.  In this case and when following by descriptor,
         tail_forever_inotify() cannot be used (in its current implementation).

         FIXME: inotify doesn't give any notification when a new
         (remote) file or directory is mounted on top a watched file.
         When follow_mode == Follow_name we would ideally like to detect that.
         Note if there is a change to the original file then we'll
         recheck it and follow the new file, or ignore it if the
         file has changed to being remote.

         FIXME-maybe: inotify has a watch descriptor per inode, and hence with
         our current hash implementation will only --follow data for one
         of the names when multiple hardlinked files are specified, or
         for one name when a name is specified multiple times.  */
      if (!disable_inotify && (tailable_stdin (F, n_files)
                               || any_remote_file (F, n_files)
                               || ! any_non_remote_file (F, n_files)
                               || any_symlinks (F, n_files)
                               || any_non_regular_fifo (F, n_files)
                               || (!ok && follow_mode == Follow_descriptor)))
        disable_inotify = true;

      if (!disable_inotify)
        {
          int wd = inotify_init ();
          if (0 <= wd)
            {
              /* Flush any output from tail_file, now, since
                 tail_forever_inotify flushes only after writing,
                 not before reading.  */
              if (fflush (stdout) != 0)
                die (EXIT_FAILURE, errno, _("write error"));

              Hash_table *ht;
              tail_forever_inotify (wd, F, n_files, sleep_interval, &ht);
              hash_free (ht);
              close (wd);
              errno = 0;
            }
          error (0, errno, _("inotify cannot be used, reverting to polling"));
        }
#endif
      disable_inotify = true;
      tail_forever (F, n_files, sleep_interval);
    }

  if (have_read_stdin && close (STDIN_FILENO) < 0)
    die (EXIT_FAILURE, errno, "-");
  main_exit (ok ? EXIT_SUCCESS : EXIT_FAILURE);
}

　　简单地说两种实现就是: tail_forever 轮询式跟踪; tail_forever_inotify 更新时通知; 下面我们就简单瞅瞅两个具体实现吧。

3.1. 定时扫描跟踪实现

　　这里的定时扫描，和我们自己处理的不太一样的地方，主要是它会跟踪多个文件，而如果是我们自己实现，则一般只会跟踪一个文件即可。

/* Tail N_FILES files forever, or until killed.
   The pertinent information for each file is stored in an entry of F.
   Loop over each of them, doing an fstat to see if they have changed size,
   and an occasional open/fstat to see if any dev/ino pair has changed.
   If none of them have changed size in one iteration, sleep for a
   while and try again.  Continue until the user interrupts us.  */

static void
tail_forever (struct File_spec *f, size_t n_files, double sleep_interval)
{
  /* Use blocking I/O as an optimization, when it's easy.  */
  bool blocking = (pid == 0 && follow_mode == Follow_descriptor
                   && n_files == 1 && f[0].fd != -1 && ! S_ISREG (f[0].mode));
  size_t last;
  bool writer_is_dead = false;

  last = n_files - 1;
  // 一直循环检测，直到用户主动终止进程
  while (true)
    {
      size_t i;
      bool any_input = false;

      for (i = 0; i < n_files; i++)
        {
          int fd;
          char const *name;
          mode_t mode;
          struct stat stats;
          uintmax_t bytes_read;

          if (f[i].ignore)
            continue;

          if (f[i].fd < 0)
            {
              recheck (&f[i], blocking);
              continue;
            }

          fd = f[i].fd;
          name = pretty_name (&f[i]);
          mode = f[i].mode;

          if (f[i].blocking != blocking)
            {
              int old_flags = fcntl (fd, F_GETFL);
              int new_flags = old_flags | (blocking ? 0 : O_NONBLOCK);
              if (old_flags < 0
                  || (new_flags != old_flags
                      && fcntl (fd, F_SETFL, new_flags) == -1))
                {
                  /* Don't update f[i].blocking if fcntl fails.  */
                  if (S_ISREG (f[i].mode) && errno == EPERM)
                    {
                      /* This happens when using tail -f on a file with
                         the append-only attribute.  */
                    }
                  else
                    die (EXIT_FAILURE, errno,
                         _("%s: cannot change nonblocking mode"),
                         quotef (name));
                }
              else
                f[i].blocking = blocking;
            }

          if (!f[i].blocking)
            {
              // 使用 fstat 进行文件变更检测，结果存入 stats 变量中
              if (fstat (fd, &stats) != 0)
                {
                  f[i].fd = -1;
                  f[i].errnum = errno;
                  error (0, errno, "%s", quotef (name));
                  close (fd); /* ignore failure */
                  continue;
                }
              // 通过比较 mtime 判断文件是否发生变化
              if (f[i].mode == stats.st_mode
                  && (! S_ISREG (stats.st_mode) || f[i].size == stats.st_size)
                  && timespec_cmp (f[i].mtime, get_stat_mtime (&stats)) == 0)
                {
                  if ((max_n_unchanged_stats_between_opens
                       <= f[i].n_unchanged_stats++)
                      && follow_mode == Follow_name)
                    {
                      recheck (&f[i], f[i].blocking);
                      f[i].n_unchanged_stats = 0;
                    }
                  continue;
                }

              /* This file has changed.  Print out what we can, and
                 then keep looping.  */
              // 记录最后一次变更情况
              f[i].mtime = get_stat_mtime (&stats);
              f[i].mode = stats.st_mode;

              /* reset counter */
              f[i].n_unchanged_stats = 0;

              /* XXX: This is only a heuristic, as the file may have also
                 been truncated and written to if st_size >= size
                 (in which case we ignore new data <= size).  */
              if (S_ISREG (mode) && stats.st_size < f[i].size)
                {
                  error (0, 0, _("%s: file truncated"), quotef (name));
                  /* Assume the file was truncated to 0,
                     and therefore output all "new" data.  */
                  xlseek (fd, 0, SEEK_SET, name);
                  f[i].size = 0;
                }

              if (i != last)
                {
                  if (print_headers)
                    write_header (name);
                  last = i;
                }
            }

          /* Don't read more than st_size on networked file systems
             because it was seen on glusterfs at least, that st_size
             may be smaller than the data read on a _subsequent_ stat call.  */
          uintmax_t bytes_to_read;
          if (f[i].blocking)
            bytes_to_read = COPY_A_BUFFER;
          else if (S_ISREG (mode) && f[i].remote)
            bytes_to_read = stats.st_size - f[i].size;
          else
            bytes_to_read = COPY_TO_EOF;
          // 输出变更内容到控制台
          bytes_read = dump_remainder (false, name, fd, bytes_to_read);

          any_input |= (bytes_read != 0);
          f[i].size += bytes_read;
        }

      if (! any_live_files (f, n_files))
        {
          error (0, 0, _("no files remaining"));
          break;
        }

      if ((!any_input || blocking) && fflush (stdout) != 0)
        die (EXIT_FAILURE, errno, _("write error"));

      check_output_alive ();

      /* If nothing was read, sleep and/or check for dead writers.  */
      if (!any_input)
        {
          if (writer_is_dead)
            break;

          /* Once the writer is dead, read the files once more to
             avoid a race condition.  */
          writer_is_dead = (pid != 0
                            && kill (pid, 0) != 0
                            /* Handle the case in which you cannot send a
                               signal to the writer, so kill fails and sets
                               errno to EPERM.  */
                            && errno != EPERM);
          // 等待下一次轮询
          if (!writer_is_dead && xnanosleep (sleep_interval))
            die (EXIT_FAILURE, errno, _("cannot read realtime clock"));

        }
    }
}

　　我们运行tail -f 命令时，就是控制台会一直停留在输出界面，等待跟踪结果，也就是说这时的tail进程，会一直在前台运行。这时这个进程交独占用户界面，如果用户不想跟踪了，那么就必须主动终止进程，即ctrl+c 或其他进程终止方式。所以，实现还是比较简单的，如表面意思，就是不停地检测文件，输出内容，如果其中一些文件失效，则跳过即可。

　　检测主要依赖于函数: fstat (fd, &stats) , 通过比较 mtime 进行文件是否变化判定。大致不出意料。

3.2. 基于异步通知的跟踪实现

　　上一个实现是基于轮询的方式实现的，这个实现是基于通知的文件跟踪。基于轮询的实现，要求有比较合适的轮询间隔，太长不容易发现变更，太短则容易导致系统压力大。而基于通知的实现，则优雅许多，它只会在文件发生了变化进进行一次通知，其他时间几乎不会占用系统资源（实际上还是有事件轮询的资源消耗）。我们来看一下。

　　它会先进行 inotify_init(); 然后再进行 tail_forever_inotify;

/* Attempt to tail N_FILES files forever, or until killed.
   Check modifications using the inotify events system.
   Exit if finished or on fatal error; return to revert to polling.  */
static void
tail_forever_inotify (int wd, struct File_spec *f, size_t n_files,
                      double sleep_interval, Hash_table **wd_to_namep)
{
# if TAIL_TEST_SLEEP
  /* Delay between open() and inotify_add_watch()
     to help trigger different cases.  */
  xnanosleep (1000000);
# endif
  unsigned int max_realloc = 3;

  /* Map an inotify watch descriptor to the name of the file it's watching.  */
  Hash_table *wd_to_name;

  bool found_watchable_file = false;
  bool tailed_but_unwatchable = false;
  bool found_unwatchable_dir = false;
  bool no_inotify_resources = false;
  bool writer_is_dead = false;
  struct File_spec *prev_fspec;
  size_t evlen = 0;
  char *evbuf;
  size_t evbuf_off = 0;
  size_t len = 0;

  wd_to_name = hash_initialize (n_files, NULL, wd_hasher, wd_comparator, NULL);
  if (! wd_to_name)
    xalloc_die ();
  *wd_to_namep = wd_to_name;

  /* The events mask used with inotify on files (not directories).  */
  uint32_t inotify_wd_mask = IN_MODIFY;
  /* TODO: Perhaps monitor these events in Follow_descriptor mode also,
     to tag reported file names with "deleted", "moved" etc.  */
  if (follow_mode == Follow_name)
    inotify_wd_mask |= (IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF);

  /* Add an inotify watch for each watched file.  If -F is specified then watch
     its parent directory too, in this way when they re-appear we can add them
     again to the watch list.  */
  size_t i;
  // 依次设置跟踪标识到文件的通知列表中
  for (i = 0; i < n_files; i++)
    {
      if (!f[i].ignore)
        {
          size_t fnlen = strlen (f[i].name);
          if (evlen < fnlen)
            evlen = fnlen;

          f[i].wd = -1;

          if (follow_mode == Follow_name)
            {
              size_t dirlen = dir_len (f[i].name);
              char prev = f[i].name[dirlen];
              f[i].basename_start = last_component (f[i].name) - f[i].name;

              f[i].name[dirlen] = '\0';

               /* It's fine to add the same directory more than once.
                  In that case the same watch descriptor is returned.  */
              f[i].parent_wd = inotify_add_watch (wd, dirlen ? f[i].name : ".",
                                                  (IN_CREATE | IN_DELETE
                                                   | IN_MOVED_TO | IN_ATTRIB
                                                   | IN_DELETE_SELF));

              f[i].name[dirlen] = prev;

              if (f[i].parent_wd < 0)
                {
                  if (errno != ENOSPC) /* suppress confusing error.  */
                    error (0, errno, _("cannot watch parent directory of %s"),
                           quoteaf (f[i].name));
                  else
                    error (0, 0, _("inotify resources exhausted"));
                  found_unwatchable_dir = true;
                  /* We revert to polling below.  Note invalid uses
                     of the inotify API will still be diagnosed.  */
                  break;
                }
            }
          // 注意回调到全局
          f[i].wd = inotify_add_watch (wd, f[i].name, inotify_wd_mask);

          if (f[i].wd < 0)
            {
              if (f[i].fd != -1)  /* already tailed.  */
                tailed_but_unwatchable = true;
              if (errno == ENOSPC || errno == ENOMEM)
                {
                  no_inotify_resources = true;
                  error (0, 0, _("inotify resources exhausted"));
                  break;
                }
              else if (errno != f[i].errnum)
                error (0, errno, _("cannot watch %s"), quoteaf (f[i].name));
              continue;
            }

          if (hash_insert (wd_to_name, &(f[i])) == NULL)
            xalloc_die ();
          // 只要有一个文件需要处理，就需要保持进程的跟踪状态
          found_watchable_file = true;
        }
    }

  /* Linux kernel 2.6.24 at least has a bug where eventually, ENOSPC is always
     returned by inotify_add_watch.  In any case we should revert to polling
     when there are no inotify resources.  Also a specified directory may not
     be currently present or accessible, so revert to polling.  Also an already
     tailed but unwatchable due rename/unlink race, should also revert.  */
  if (no_inotify_resources || found_unwatchable_dir
      || (follow_mode == Follow_descriptor && tailed_but_unwatchable))
    return;
  if (follow_mode == Follow_descriptor && !found_watchable_file)
    exit (EXIT_FAILURE);

  prev_fspec = &(f[n_files - 1]);

  /* Check files again.  New files or data can be available since last time we
     checked and before they are watched by inotify.  */
  for (i = 0; i < n_files; i++)
    {
      if (! f[i].ignore)
        {
          /* check for new files.  */
          if (follow_mode == Follow_name)
            recheck (&(f[i]), false);
          else if (f[i].fd != -1)
            {
              /* If the file was replaced in the small window since we tailed,
                 then assume the watch is on the wrong item (different to
                 that we've already produced output for), and so revert to
                 polling the original descriptor.  */
              struct stat stats;

              if (stat (f[i].name, &stats) == 0
                  && (f[i].dev != stats.st_dev || f[i].ino != stats.st_ino))
                {
                  error (0, errno, _("%s was replaced"),
                         quoteaf (pretty_name (&(f[i]))));
                  return;
                }
            }

          /* check for new data.  */
          check_fspec (&f[i], &prev_fspec);
        }
    }

  evlen += sizeof (struct inotify_event) + 1;
  evbuf = xmalloc (evlen);

  /* Wait for inotify events and handle them.  Events on directories
     ensure that watched files can be re-added when following by name.
     This loop blocks on the 'safe_read' call until a new event is notified.
     But when --pid=P is specified, tail usually waits via poll.  */
  while (true)
    {
      struct File_spec *fspec;
      struct inotify_event *ev;
      void *void_ev;

      /* When following by name without --retry, and the last file has
         been unlinked or renamed-away, diagnose it and return.  */
      if (follow_mode == Follow_name
          && ! reopen_inaccessible_files
          && hash_get_n_entries (wd_to_name) == 0)
        die (EXIT_FAILURE, 0, _("no files remaining"));

      if (len <= evbuf_off)
        {
          /* Poll for inotify events.  When watching a PID, ensure
             that a read from WD will not block indefinitely.
             If MONITOR_OUTPUT, also poll for a broken output pipe.  */

          int file_change;
          struct pollfd pfd[2];
          do
            {
              /* How many ms to wait for changes.  -1 means wait forever.  */
              int delay = -1;

              if (pid)
                {
                  if (writer_is_dead)
                    exit (EXIT_SUCCESS);

                  writer_is_dead = (kill (pid, 0) != 0 && errno != EPERM);

                  if (writer_is_dead || sleep_interval <= 0)
                    delay = 0;
                  else if (sleep_interval < INT_MAX / 1000 - 1)
                    {
                      /* delay = ceil (sleep_interval * 1000), sans libm.  */
                      double ddelay = sleep_interval * 1000;
                      delay = ddelay;
                      delay += delay < ddelay;
                    }
                }

              pfd[0].fd = wd;
              pfd[0].events = POLLIN;
              pfd[1].fd = STDOUT_FILENO;
              pfd[1].events = pfd[1].revents = 0;
              // 读取文件变更事件，当然还是会有超时处理，不然发生意外就不好了
              file_change = poll (pfd, monitor_output + 1, delay);
            }
          while (file_change == 0);

          if (file_change < 0)
            die (EXIT_FAILURE, errno,
                 _("error waiting for inotify and output events"));
          if (pfd[1].revents)
            die_pipe ();

          len = safe_read (wd, evbuf, evlen);
          evbuf_off = 0;

          /* For kernels prior to 2.6.21, read returns 0 when the buffer
             is too small.  */
          if ((len == 0 || (len == SAFE_READ_ERROR && errno == EINVAL))
              && max_realloc--)
            {
              len = 0;
              evlen *= 2;
              evbuf = xrealloc (evbuf, evlen);
              continue;
            }

          if (len == 0 || len == SAFE_READ_ERROR)
            die (EXIT_FAILURE, errno, _("error reading inotify event"));
        }

      void_ev = evbuf + evbuf_off;
      ev = void_ev;
      evbuf_off += sizeof (*ev) + ev->len;

      /* If a directory is deleted, IN_DELETE_SELF is emitted
         with ev->name of length 0.
         We need to catch it, otherwise it would wait forever,
         as wd for directory becomes inactive. Revert to polling now.   */
      if ((ev->mask & IN_DELETE_SELF) && ! ev->len)
        {
          for (i = 0; i < n_files; i++)
            {
              if (ev->wd == f[i].parent_wd)
                {
                  error (0, 0,
                      _("directory containing watched file was removed"));
                  return;
                }
            }
        }
      // 遍历找出变化的文件
      if (ev->len) /* event on ev->name in watched directory.  */
        {
          size_t j;
          for (j = 0; j < n_files; j++)
            {
              /* With N=hundreds of frequently-changing files, this O(N^2)
                 process might be a problem.  FIXME: use a hash table?  */
              if (f[j].parent_wd == ev->wd
                  && STREQ (ev->name, f[j].name + f[j].basename_start))
                break;
            }

          /* It is not a watched file.  */
          if (j == n_files)
            continue;

          fspec = &(f[j]);

          int new_wd = -1;
          bool deleting = !! (ev->mask & IN_DELETE);

          if (! deleting)
            {
              /* Adding the same inode again will look up any existing wd.  */
              new_wd = inotify_add_watch (wd, f[j].name, inotify_wd_mask);
            }

          if (! deleting && new_wd < 0)
            {
              if (errno == ENOSPC || errno == ENOMEM)
                {
                  error (0, 0, _("inotify resources exhausted"));
                  return; /* revert to polling.  */
                }
              else
                {
                  /* Can get ENOENT for a dangling symlink for example.  */
                  error (0, errno, _("cannot watch %s"), quoteaf (f[j].name));
                }
              /* We'll continue below after removing the existing watch.  */
            }

          /* This will be false if only attributes of file change.  */
          bool new_watch;
          new_watch = (! deleting) && (fspec->wd < 0 || new_wd != fspec->wd);

          if (new_watch)
            {
              if (0 <= fspec->wd)
                {
                  inotify_rm_watch (wd, fspec->wd);
                  hash_remove (wd_to_name, fspec);
                }

              fspec->wd = new_wd;

              if (new_wd == -1)
                continue;

              /* If the file was moved then inotify will use the source file wd
                for the destination file.  Make sure the key is not present in
                the table.  */
              struct File_spec *prev = hash_remove (wd_to_name, fspec);
              if (prev && prev != fspec)
                {
                  if (follow_mode == Follow_name)
                    recheck (prev, false);
                  prev->wd = -1;
                  close_fd (prev->fd, pretty_name (prev));
                }

              if (hash_insert (wd_to_name, fspec) == NULL)
                xalloc_die ();
            }

          if (follow_mode == Follow_name)
            recheck (fspec, false);
        }
      else
        {
          struct File_spec key;
          key.wd = ev->wd;
          fspec = hash_lookup (wd_to_name, &key);
        }

      if (! fspec)
        continue;

      if (ev->mask & (IN_ATTRIB | IN_DELETE | IN_DELETE_SELF | IN_MOVE_SELF))
        {
          /* Note for IN_MOVE_SELF (the file we're watching has
             been clobbered via a rename) we leave the watch
             in place since it may still be part of the set
             of watched names.  */
          if (ev->mask & IN_DELETE_SELF)
            {
              inotify_rm_watch (wd, fspec->wd);
              hash_remove (wd_to_name, fspec);
            }

          /* Note we get IN_ATTRIB for unlink() as st_nlink decrements.
             The usual path is a close() done in recheck() triggers
             an IN_DELETE_SELF event as the inode is removed.
             However sometimes open() will succeed as even though
             st_nlink is decremented, the dentry (cache) is not updated.
             Thus we depend on the IN_DELETE event on the directory
             to trigger processing for the removed file.  */

          recheck (fspec, false);

          continue;
        }
      // 输出变化的内容
      check_fspec (fspec, &prev_fspec);
    }
}

/* Output (new) data for FSPEC->fd.
   PREV_FSPEC records the last File_spec for which we output.  */
static void
check_fspec (struct File_spec *fspec, struct File_spec **prev_fspec)
{
  struct stat stats;
  char const *name;

  if (fspec->fd == -1)
    return;

  name = pretty_name (fspec);

  if (fstat (fspec->fd, &stats) != 0)
    {
      fspec->errnum = errno;
      close_fd (fspec->fd, name);
      fspec->fd = -1;
      return;
    }

  /* XXX: This is only a heuristic, as the file may have also
     been truncated and written to if st_size >= size
     (in which case we ignore new data <= size).
     Though in the inotify case it's more likely we'll get
     separate events for truncate() and write().  */
  if (S_ISREG (fspec->mode) && stats.st_size < fspec->size)
    {
      error (0, 0, _("%s: file truncated"), quotef (name));
      xlseek (fspec->fd, 0, SEEK_SET, name);
      fspec->size = 0;
    }
  else if (S_ISREG (fspec->mode) && stats.st_size == fspec->size
           && timespec_cmp (fspec->mtime, get_stat_mtime (&stats)) == 0)
    return;

  bool want_header = print_headers && (fspec != *prev_fspec);

  uintmax_t bytes_read = dump_remainder (want_header, name, fspec->fd,
                                         COPY_TO_EOF);
  fspec->size += bytes_read;

  if (bytes_read)
    {
      *prev_fspec = fspec;
      if (fflush (stdout) != 0)
        die (EXIT_FAILURE, errno, _("write error"));
    }
}

　　基于通知的方式实现文件跟踪，明显是复杂了许多，首先是注册事件，然后是轮询事件，然后是事件处理。但是这样的实现，针对大量的文件跟踪是很省资源的呢。总之，是一种好的实现方式，算是一劳永逸吧。

　　比如我们的io实现方式就有：阻塞io, select/poll io, 异步io, … 异步总是实现复杂，但是收益也是比较可观的一种方法。

Tags: C C语言 linux tail 协议类原理&故事工具类源码