Linux v6.6.1 - Documentation/trace/rv/runtime-verification.rst

====================
Runtime Verification
====================

Runtime Verification (RV) is a lightweight (yet rigorous) method that
complements classical exhaustive verification techniques (such as *model
checking* and *theorem proving*) with a more practical approach for complex
systems.

Instead of relying on a fine-grained model of a system (e.g., a
re-implementation a instruction level), RV works by analyzing the trace of the
system's actual execution, comparing it against a formal specification of
the system behavior.

The main advantage is that RV can give precise information on the runtime
behavior of the monitored system, without the pitfalls of developing models
that require a re-implementation of the entire system in a modeling language.
Moreover, given an efficient monitoring method, it is possible execute an
*online* verification of a system, enabling the *reaction* for unexpected
events, avoiding, for example, the propagation of a failure on safety-critical
systems.

Runtime Monitors and Reactors
=============================

A monitor is the central part of the runtime verification of a system. The
monitor stands in between the formal specification of the desired (or
undesired) behavior, and the trace of the actual system.

In Linux terms, the runtime verification monitors are encapsulated inside the
*RV monitor* abstraction. A *RV monitor* includes a reference model of the
system, a set of instances of the monitor (per-cpu monitor, per-task monitor,
and so on), and the helper functions that glue the monitor to the system via
trace, as depicted below::

 Linux   +---- RV Monitor ----------------------------------+ Formal
  Realm  |                                                  |  Realm
  +-------------------+     +----------------+     +-----------------+
  |   Linux kernel    |     |     Monitor    |     |     Reference   |
  |     Tracing       |  -> |   Instance(s)  | <-  |       Model     |
  | (instrumentation) |     | (verification) |     | (specification) |
  +-------------------+     +----------------+     +-----------------+
         |                          |                       |
         |                          V                       |
         |                     +----------+                 |
         |                     | Reaction |                 |
         |                     +--+--+--+-+                 |
         |                        |  |  |                   |
         |                        |  |  +-> trace output ?  |
         +------------------------|--|----------------------+
                                  |  +----> panic ?
                                  +-------> <user-specified>

In addition to the verification and monitoring of the system, a monitor can
react to an unexpected event. The forms of reaction can vary from logging the
event occurrence to the enforcement of the correct behavior to the extreme
action of taking a system down to avoid the propagation of a failure.

In Linux terms, a *reactor* is an reaction method available for *RV monitors*.
By default, all monitors should provide a trace output of their actions,
which is already a reaction. In addition, other reactions will be available
so the user can enable them as needed.

For further information about the principles of runtime verification and
RV applied to Linux:

  Bartocci, Ezio, et al. *Introduction to runtime verification.* In: Lectures on
  Runtime Verification. Springer, Cham, 2018. p. 1-33.

  Falcone, Ylies, et al. *A taxonomy for classifying runtime verification tools.*
  In: International Conference on Runtime Verification. Springer, Cham, 2018. p.
  241-262.

  De Oliveira, Daniel Bristot. *Automata-based formal analysis and
  verification of the real-time Linux kernel.* Ph.D. Thesis, 2020.

Online RV monitors
==================

Monitors can be classified as *offline* and *online* monitors. *Offline*
monitor process the traces generated by a system after the events, generally by
reading the trace execution from a permanent storage system. *Online* monitors
process the trace during the execution of the system. Online monitors are said
to be *synchronous* if the processing of an event is attached to the system
execution, blocking the system during the event monitoring. On the other hand,
an *asynchronous* monitor has its execution detached from the system. Each type
of monitor has a set of advantages. For example, *offline* monitors can be
executed on different machines but require operations to save the log to a
file. In contrast, *synchronous online* method can react at the exact moment
a violation occurs.

Another important aspect regarding monitors is the overhead associated with the
event analysis. If the system generates events at a frequency higher than the
monitor's ability to process them in the same system, only the *offline*
methods are viable. On the other hand, if the tracing of the events incurs
on higher overhead than the simple handling of an event by a monitor, then a
*synchronous online* monitors will incur on lower overhead.

Indeed, the research presented in:

  De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva.
  *Efficient formal verification for the Linux kernel.* In: International
  Conference on Software Engineering and Formal Methods. Springer, Cham, 2019.
  p. 315-332.

Shows that for Deterministic Automata models, the synchronous processing of
events in-kernel causes lower overhead than saving the same events to the trace
buffer, not even considering collecting the trace for user-space analysis.
This motivated the development of an in-kernel interface for online monitors.

For further information about modeling of Linux kernel behavior using automata,
see:

  De Oliveira, Daniel B.; De Oliveira, Romulo S.; Cucinotta, Tommaso. *A thread
  synchronization model for the PREEMPT_RT Linux kernel.* Journal of Systems
  Architecture, 2020, 107: 101729.

The user interface
==================

The user interface resembles the tracing interface (on purpose). It is
currently at "/sys/kernel/tracing/rv/".

The following files/folders are currently available:

**available_monitors**

- Reading list the available monitors, one per line

For example::

   # cat available_monitors
   wip
   wwnr

**available_reactors**

- Reading shows the available reactors, one per line.

For example::

   # cat available_reactors
   nop
   panic
   printk

**enabled_monitors**:

- Reading lists the enabled monitors, one per line
- Writing to it enables a given monitor
- Writing a monitor name with a '!' prefix disables it
- Truncating the file disables all enabled monitors

For example::

   # cat enabled_monitors
   # echo wip > enabled_monitors
   # echo wwnr >> enabled_monitors
   # cat enabled_monitors
   wip
   wwnr
   # echo '!wip' >> enabled_monitors
   # cat enabled_monitors
   wwnr
   # echo > enabled_monitors
   # cat enabled_monitors
   #

Note that it is possible to enable more than one monitor concurrently.

**monitoring_on**

This is an on/off general switcher for monitoring. It resembles the
"tracing_on" switcher in the trace interface.

- Writing "0" stops the monitoring
- Writing "1" continues the monitoring
- Reading returns the current status of the monitoring

Note that it does not disable enabled monitors but stop the per-entity
monitors monitoring the events received from the system.

**reacting_on**

- Writing "0" prevents reactions for happening
- Writing "1" enable reactions
- Reading returns the current status of the reaction

**monitors/**

Each monitor will have its own directory inside "monitors/". There the
monitor-specific files will be presented. The "monitors/" directory resembles
the "events" directory on tracefs.

For example::

   # cd monitors/wip/
   # ls
   desc  enable
   # cat desc
   wakeup in preemptive per-cpu testing monitor.
   # cat enable
   0

**monitors/MONITOR/desc**

- Reading shows a description of the monitor *MONITOR*

**monitors/MONITOR/enable**

- Writing "0" disables the *MONITOR*
- Writing "1" enables the *MONITOR*
- Reading return the current status of the *MONITOR*

**monitors/MONITOR/reactors**

- List available reactors, with the select reaction for the given *MONITOR*
  inside "[]". The default one is the nop (no operation) reactor.
- Writing the name of a reactor enables it to the given MONITOR.

For example::

   # cat monitors/wip/reactors
   [nop]
   panic
   printk
   # echo panic > monitors/wip/reactors
   # cat monitors/wip/reactors
   nop
   [panic]
   printk