Source location API

From Mailutils
Revision as of 10:17, 15 June 2017 by Gray (talk | contribs)

The source location API keeps track of locations in source files for diagnostic purposes. It is especially useful in parsers and lexical analyzers.

Data Structures

The basic data structure is mu_locus_point, which identifies a point (locus) in the source text by the source file name, line and column numbers:

  struct mu_locus_point
  {
    char const *mu_file;        /* Name of the source file */
    unsigned mu_line;           /* Line number */
    unsigned mu_col;            /* Column number */
  };

Both line and column numbers start at 1. A value of 0 means that the information is not available.

The mu_file member merits special attention. In practice, when writing a lexer, locations are assigned to each token extracted from the source, so that all tokens returned when processing a given file will have the same mu_file value. Obviously, it would be a waste of memory to keep a newly allocated copy of the same string in each mu_locus_point. Therefore a special interface is provided for storing such identifiers:

  int mu_locus_point_set_file (struct mu_locus_point *pt, const char *filename);

The filename argument is not stored directly in the structure and can be safely disposed of after a successful return. It is guaranteed that any two points that have been assigned filename arguments equal in the sense of strcmp will have the same pointer stored in their mu_file member. This means that, given two mu_locus_point structures a and b, the following check suffices to determine whether they refer to the same source file:

  if (a.mu_file == b.mu_file) ...
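
The pointer-equality guarantee is what string interning provides. The following is a minimal, self-contained sketch of that idea only; it is not the Mailutils implementation, and the fixed-size table and the name sketch_intern are assumptions made for the illustration:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size intern table; not part of Mailutils. */
#define SKETCH_MAX_NAMES 16

static char *sketch_names[SKETCH_MAX_NAMES];
static size_t sketch_count;

/* Return a shared copy of NAME: strings that compare equal with strcmp
   always yield the same pointer.  Returns NULL if the table is full or
   memory cannot be allocated. */
static char const *
sketch_intern (char const *name)
{
  size_t i;
  size_t len;
  char *copy;

  assert (name != NULL);
  for (i = 0; i < sketch_count; i++)
    if (strcmp (sketch_names[i], name) == 0)
      return sketch_names[i];   /* already stored: reuse the pointer */

  if (sketch_count == SKETCH_MAX_NAMES)
    return NULL;
  len = strlen (name) + 1;
  copy = malloc (len);
  if (copy == NULL)
    return NULL;
  memcpy (copy, name, len);     /* keep a private copy of the caller's string */
  sketch_names[sketch_count++] = copy;
  return copy;
}
```

Because the table keeps its own copy, the caller's buffer can be reused or freed right after the call, and comparing two stored names takes a single pointer comparison.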

A similar function

  int mu_locus_point_init (struct mu_locus_point *pt, const char *filename);

calls mu_locus_point_set_file and sets the mu_line and mu_col members to 0. It provides a way of initializing the structure dynamically. A point can also be initialized statically, like this:

  struct mu_locus_point pt = MU_LOCUS_POINT_INITIALIZER;

This initializes all members to 0, which all diagnostic functions treat as a lack of location information.

Both mu_locus_point_set_file and mu_locus_point_init return 0 on success, and a non-zero error code on error[1].

To deallocate a point when it is no longer needed, use

  void mu_locus_point_deinit (struct mu_locus_point *pt);

Internally, a reference counter is kept for each value of mu_file in use. It is guaranteed that the memory allocated for the identifier will be freed once mu_locus_point_deinit has been called exactly as many times as mu_locus_point_set_file was previously called with the same filename (including any implicit calls from mu_locus_point_init).
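
The counting rule can be illustrated with a small stand-alone sketch. It tracks a single name only, and the type sketch_ident and both functions are hypothetical names chosen for the example rather than anything from the Mailutils sources:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted identifier; not the Mailutils type. */
struct sketch_ident
{
  char *name;                   /* shared copy of the file name */
  size_t count;                 /* number of points currently using it */
};

/* Take a reference on ID, allocating the copy of NAME on first use.
   Returns 0 on success, -1 if memory cannot be allocated. */
static int
sketch_ident_ref (struct sketch_ident *id, char const *name)
{
  assert (id != NULL && name != NULL);
  if (id->count == 0)
    {
      size_t len = strlen (name) + 1;
      id->name = malloc (len);
      if (id->name == NULL)
        return -1;
      memcpy (id->name, name, len);
    }
  id->count++;
  return 0;
}

/* Drop one reference; the copy is freed when the last one goes away.
   Returns the number of references that remain. */
static size_t
sketch_ident_unref (struct sketch_ident *id)
{
  assert (id != NULL && id->count > 0);
  if (--id->count == 0)
    {
      free (id->name);
      id->name = NULL;
    }
  return id->count;
}
```

In this sketch, two calls to sketch_ident_ref must be balanced by two calls to sketch_ident_unref before the memory is released, which mirrors the pairing of mu_locus_point_set_file and mu_locus_point_deinit described above.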

To copy information between two points use the following function:

  int mu_locus_point_copy (struct mu_locus_point *dest, struct mu_locus_point const *src);

It takes care to call mu_locus_point_deinit on dest prior to copying the data. The return codes are the same as described above.

The following structure indicates a source range:

  struct mu_locus_range
  {
    struct mu_locus_point beg;
    struct mu_locus_point end;
  };

and is preferred for more exact diagnostics. The structure can be initialized statically as follows:

  struct mu_locus_range lr = MU_LOCUS_RANGE_INITIALIZER;

For dynamic initialization, mu_locus_point_init should be called on its beg and end members.

To copy one range to another, the following function is provided:

  int mu_locus_range_copy (struct mu_locus_range *dest, struct mu_locus_range const *src);

A range is deallocated with the following function:

  void mu_locus_range_deinit (struct mu_locus_range *lr);

Formatting

Several functions are provided for formatting locations for output.

void mu_stream_print_locus_point (mu_stream_t stream, struct mu_locus_point const *lpt);
Formats a point to the output stream. The formatting rules are as follows:
  1. If all three members are non-zero, the output format is FILE:LINE.COL, e.g.
    filter.siv:160.15
  2. If mu_col is 0, the format is FILE:LINE, as in:
    filter.siv:160
  3. If only mu_file is set, it is printed.
  4. Finally, if all members are 0, nothing is output.
Notice that in any case no delimiters are output after the formatted point.
void mu_stream_print_locus_range (mu_stream_t stream, struct mu_locus_range const *loc);
Prints a locus range. The rules for printing ranges are:
  1. The beg member is formatted as described above.
  2. If the end member is initialized to nulls, it is omitted.
  3. Otherwise, a dash is printed and the end member is formatted as follows:
    1. If its mu_file differs from that of beg, the point is printed following the same set of rules as above.
    2. Otherwise, if the mu_line differs, it is printed followed by the mu_col, unless it is zero.
    3. Otherwise, only the mu_col is printed (unless it is 0).
This gives the following possible outputs:
filter.sv:160.15-include.sv:2.10
Range spans two source files (e.g. an include statement)
filter.sv:160.15-162.10
Range occupies several lines in the same source file.
filter.sv:160.15-30
Range spans several columns on the same line.
filter.sv:160-180
Range spans several lines (no column information available).
filter.sv:160
Single line in file (no column information available).
filter.sv
Only source name information available.
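
The point and range rules above can be summarized in code. The sketch below formats into a character buffer instead of a mu_stream_t so that it stays self-contained. The sketch_ types and functions are hypothetical stand-ins, file names are compared with strcmp because no interning is done here, and two details are assumptions of the sketch rather than documented behavior: a point without a file name is treated as empty, and the dash is emitted only when an end component follows it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for mu_locus_point and mu_locus_range. */
struct sketch_point
{
  char const *file;
  unsigned line;
  unsigned col;
};

struct sketch_range
{
  struct sketch_point beg;
  struct sketch_point end;
};

/* Append printf-style text to the NUL-terminated string in BUF
   (capacity SIZE), truncating if it does not fit. */
static void
sketch_append (char *buf, size_t size, char const *fmt, ...)
{
  size_t used = strlen (buf);
  va_list ap;

  va_start (ap, fmt);
  vsnprintf (buf + used, size - used, fmt, ap);
  va_end (ap);
}

/* Append PT to BUF as FILE:LINE.COL, FILE:LINE, FILE, or nothing. */
static void
sketch_format_point (char *buf, size_t size, struct sketch_point const *pt)
{
  if (pt->file == NULL)
    return;                     /* assumption: no file means an empty point */
  sketch_append (buf, size, "%s", pt->file);
  if (pt->line != 0)
    {
      sketch_append (buf, size, ":%u", pt->line);
      if (pt->col != 0)
        sketch_append (buf, size, ".%u", pt->col);
    }
}

/* Store the textual form of LR in BUF, overwriting its contents. */
static void
sketch_format_range (char *buf, size_t size, struct sketch_range const *lr)
{
  struct sketch_point const *b = &lr->beg;
  struct sketch_point const *e = &lr->end;

  assert (size > 0);
  buf[0] = '\0';
  sketch_format_point (buf, size, b);
  if (e->file == NULL)
    return;                     /* end point not set: omit it */
  if (b->file == NULL || strcmp (b->file, e->file) != 0)
    {
      /* Different file: print the end point in full. */
      sketch_append (buf, size, "-");
      sketch_format_point (buf, size, e);
    }
  else if (b->line != e->line)
    {
      /* Same file, different line: LINE, then .COL if known. */
      sketch_append (buf, size, "-%u", e->line);
      if (e->col != 0)
        sketch_append (buf, size, ".%u", e->col);
    }
  else if (e->col != 0 && e->col != b->col)
    sketch_append (buf, size, "-%u", e->col);  /* same line: column only */
}
```

For a range that starts at filter.sv:160.15 and ends at line 162, column 10 of the same file, the sketch produces filter.sv:160.15-162.10.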

The following functions are available for formatting complex diagnostic messages:

void mu_stream_vlprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, va_list ap);
void mu_stream_lprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, ...);
These two functions output the locus range loc, followed by a colon and a space, then format their arguments according to the printf-style format fmt.
void mu_lrange_debug (struct mu_locus_range const *loc, char const *fmt, ...);
Does the same, using mu_strerr as the output stream.
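
The pairing of a va_list function with a variadic wrapper, and the layout of the resulting message (locus, colon, space, text), can be sketched with the standard C library alone. The names sketch_vlformat and sketch_lformat are hypothetical, the locus is passed as an already formatted string, and omitting the prefix for an empty locus is an assumption of this sketch:

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: store LOCUS, a colon and a space, followed by the
   printf-style message, in BUF (capacity SIZE).  Returns the length the
   full text needs, or a negative value on a formatting error. */
static int
sketch_vlformat (char *buf, size_t size, char const *locus,
                 char const *fmt, va_list ap)
{
  int n = 0;
  int m;

  assert (buf != NULL && size > 0 && fmt != NULL);
  buf[0] = '\0';
  if (locus != NULL && strlen (locus) > 0)
    {
      n = snprintf (buf, size, "%s: ", locus);
      if (n < 0 || (size_t) n >= size)
        return n;               /* error, or prefix alone was truncated */
    }
  m = vsnprintf (buf + n, size - (size_t) n, fmt, ap);
  return m < 0 ? m : n + m;
}

/* Variadic front end: collects the arguments and delegates. */
static int
sketch_lformat (char *buf, size_t size, char const *locus,
                char const *fmt, ...)
{
  va_list ap;
  int n;

  va_start (ap, fmt);
  n = sketch_vlformat (buf, size, locus, fmt, ap);
  va_end (ap);
  return n;
}
```

Calling sketch_lformat with the locus string "filter.siv:160.15" and the format "unexpected %s" applied to "token" stores filter.siv:160.15: unexpected token in the buffer.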

Log Stream Interface

A locus range can be associated with a log stream, in which case it is automatically prefixed to each message written to the stream. The associated locations are manipulated via the ioctl interface, using the MU_IOCTL_LOGSTREAM family.

To associate a locus with the stream, use the MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE opcode:

  struct mu_locus_range range;
  /* initialize the range here ... */
  /* associate it with the standard error stream */
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);

The opcode treats its argument as struct mu_locus_range const * and takes care to properly update identifier reference counts. If the range is no longer needed, it should be deinitialized after a successful set operation by calling mu_locus_range_deinit.

A locus associated with the stream can be retrieved using the MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE opcode:

  struct mu_locus_range range;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);

As an example, the following function increments the beginning line number of the range associated with the standard error stream:

int
increase_line (void)
{
  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);
  if (rc == 0)
    {
      range.beg.mu_line++;
      rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);
      mu_locus_range_deinit (&range);
    }
  
  return rc;
}

Notes

  1. Possible error codes are:
    ENOMEM
    Not enough memory to allocate the resource.
    MU_ERR_OUT_PTR_NULL
    refname argument is NULL.
    MU_ERR_BUFSPACE
    Symbol table is full.