Source location API

From Mailutils
Revision as of 18:40, 15 June 2017 by Gray (talk | contribs) (→‎Deprecated interface)

The source location API is designed to keep track of locations in source files for diagnostic purposes. It is especially useful in parsers and lexical analyzers.

The functions and data structures are defined in the header file <mailutils/locus.h>.

Data Structures

The basic data structure is mu_locus_point, which identifies a point (locus) in the source text by the source file name, line and column numbers:

  struct mu_locus_point
  {
    char const *mu_file;        /* Name of the source file */
    unsigned mu_line;           /* Line number */
    unsigned mu_col;            /* Column number */
  };

Before any of the other functions discussed in this article are used, the mu_locus_point structure must be initialized. There are two ways of doing so:

  /* Initialization by assignment */
  struct mu_locus_point pt = MU_LOCUS_POINT_INITIALIZER;

or

  /* Dynamic initialization */
  struct mu_locus_point pt;
  mu_locus_point_init (&pt);

Both ways initialize all members of the structure to 0, which is treated by all diagnostic functions as lack of location information. Failure to initialize the structure will cause coredumps when trying to work with it.

In general, both line and column numbers start at 1. The value of 0 means information not available. The mu_file member merits special attention. In practice, when writing a lexer, locations are assigned to each token extracted from the source, so that all tokens returned when processing a given file will have the same mu_file value. Obviously, it would be a waste of memory to keep a newly allocated copy of the same string in each mu_locus_point. Therefore a special interface is provided for storing such identifiers:

  int mu_locus_point_set_file (struct mu_locus_point *pt, const char *filename);

The filename argument is not stored directly in the structure and can be safely disposed of after a successful return. It is guaranteed that any two points that have been assigned filename arguments equal in the sense of strcmp will have the same pointer stored in their mu_file member. This means that, given two mu_locus_point structures a and b, the following check suffices to determine whether they both refer to the same source file:

  if (a.mu_file == b.mu_file) ...

The mu_locus_point_set_file function returns 0 on success, and a non-zero error code on error[1].

To deallocate a point when it is no longer needed, use

  void mu_locus_point_deinit (struct mu_locus_point *pt);

Internally, a reference counter is kept for each value of mu_file used. It is guaranteed that the memory allocated for the identifier will be freed when mu_locus_point_deinit has been called exactly as many times as mu_locus_point_set_file was previously called with the same filename.
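The sharing and reference counting described above can be illustrated with a minimal, self-contained sketch. It is not the Mailutils implementation: the ident_ref, ident_unref and ident_count functions and the linked-list storage are hypothetical names and choices made up for this illustration. It only shows why two points given equal file names end up holding the same pointer, and when the shared copy is released.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical intern table, NOT the Mailutils implementation. */
struct ident
{
  char *name;           /* Shared copy of the file name */
  unsigned refcount;    /* Number of references handed out */
  struct ident *next;
};

static struct ident *ident_list;

/* Return the shared copy of NAME, creating it on first use.
   Returns NULL if memory is exhausted. */
char const *
ident_ref (char const *name)
{
  struct ident *p;

  for (p = ident_list; p; p = p->next)
    if (strcmp (p->name, name) == 0)
      {
        p->refcount++;
        return p->name;
      }

  p = malloc (sizeof *p);
  if (!p)
    return NULL;
  p->name = malloc (strlen (name) + 1);
  if (!p->name)
    {
      free (p);
      return NULL;
    }
  strcpy (p->name, name);
  p->refcount = 1;
  p->next = ident_list;
  ident_list = p;
  return p->name;
}

/* Drop one reference to NAME (a pointer obtained from ident_ref).
   The entry is freed when its last reference is dropped. */
void
ident_unref (char const *name)
{
  struct ident **pp;

  for (pp = &ident_list; *pp; pp = &(*pp)->next)
    if ((*pp)->name == name)
      {
        struct ident *p = *pp;
        if (--p->refcount == 0)
          {
            *pp = p->next;
            free (p->name);
            free (p);
          }
        return;
      }
}

/* Number of distinct names currently stored. */
size_t
ident_count (void)
{
  size_t n = 0;
  struct ident *p;

  for (p = ident_list; p; p = p->next)
    n++;
  return n;
}
```

In this sketch, two calls to ident_ref with equal strings return the same pointer, so comparing pointers is enough to compare names, and the stored copy disappears only after the matching number of ident_unref calls.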

To copy information between two points use the following function:

  int mu_locus_point_copy (struct mu_locus_point *dest, struct mu_locus_point const *src);

It takes care to call mu_locus_point_deinit on dest prior to copying the data. The return codes are the same as for mu_locus_point_set_file.

The following structure indicates a source range:

  struct mu_locus_range
  {
    struct mu_locus_point beg;
    struct mu_locus_point end;
  };

and is preferred for more exact diagnostics. The structure can be initialized statically as follows:

  struct mu_locus_range lr = MU_LOCUS_RANGE_INITIALIZER;

or dynamically, as in

  struct mu_locus_range lr;
  mu_locus_range_init (&lr);

To copy one range to another, the following function is provided:

  int mu_locus_range_copy (struct mu_locus_range *dest, struct mu_locus_range const *src);

A range is deallocated with the following function:

  void mu_locus_range_deinit (struct mu_locus_range *lr);

Formatting

Several functions are provided for formatting locations for the output.

void mu_stream_print_locus_point (mu_stream_t stream, struct mu_locus_point const *lpt);
Formats a point to the output stream. The formatting rules are as follows:
  1. If all three members are not zero, the output format is FILE:LINE.COL, e.g.
    filter.siv:160.15
  2. If mu_col is 0, the format is FILE:LINE, as in:
    filter.siv:160
  3. If only mu_file is set, it is printed.
  4. Finally, if all members are 0, nothing is output.
Notice that in any case no delimiters are output after the formatted point.
void mu_stream_print_locus_range (mu_stream_t stream, struct mu_locus_range const *loc);
Prints a locus range. The rules for printing ranges are:
  1. The beg member is formatted as described above.
  2. If the end member is initialized to nulls, it is omitted.
  3. Otherwise, a dash is printed and the end member is formatted as follows:
    1. If its mu_file differs from that of beg, the point is printed following the same set of rules as above.
    2. Otherwise, if the mu_line differs, it is printed followed by the mu_col, unless it is zero.
    3. Otherwise, only the mu_col is printed (unless it is 0).
This gives the following possible outputs:
filter.sv:160.15-include.sv:2.10
Range spans two source files (e.g. an include statement)
filter.sv:160.15-162.10
Range occupies several lines in the same source file.
filter.sv:160.15-30
Range spans several columns on the same line.
filter.sv:160-180
Range spans several lines (no column information available).
filter.sv:160
Single line in file (no column information available).
filter.sv
Only source name information available.
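
The formatting rules listed above can be sketched in plain C. The code below is an illustration only, not the Mailutils source: struct point is a local stand-in for struct mu_locus_point, the function names are hypothetical, file names are compared with strcmp rather than by pointer, and an end point without a file name is assumed to mean "end not set".

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Local stand-in for struct mu_locus_point (illustration only). */
struct point
{
  char const *file;
  unsigned line;
  unsigned col;
};

/* Append printf-style text to the NUL-terminated string in BUF,
   never writing past SIZE bytes. */
static void
append (char *buf, size_t size, char const *fmt, ...)
{
  size_t len = strlen (buf);
  va_list ap;

  if (len + 1 >= size)
    return;
  va_start (ap, fmt);
  vsnprintf (buf + len, size - len, fmt, ap);
  va_end (ap);
}

/* Append PT as FILE:LINE.COL, FILE:LINE, FILE, or nothing. */
static void
append_point (char *buf, size_t size, struct point const *pt)
{
  if (!pt->file)
    return;
  append (buf, size, "%s", pt->file);
  if (pt->line)
    {
      append (buf, size, ":%u", pt->line);
      if (pt->col)
        append (buf, size, ".%u", pt->col);
    }
}

static int
same_file (struct point const *a, struct point const *b)
{
  return a->file && b->file && strcmp (a->file, b->file) == 0;
}

/* Format the range BEG..END into BUF according to the rules above. */
void
format_range (char *buf, size_t size,
              struct point const *beg, struct point const *end)
{
  if (size == 0)
    return;
  buf[0] = '\0';
  append_point (buf, size, beg);
  if (!end->file)
    return;                     /* Assumption: no file means no end point */
  if (!same_file (beg, end))
    {
      append (buf, size, "-");
      append_point (buf, size, end);
    }
  else if (end->line != beg->line)
    {
      append (buf, size, "-%u", end->line);
      if (end->col)
        append (buf, size, ".%u", end->col);
    }
  else if (end->col)
    append (buf, size, "-%u", end->col);
}
```

Writing into a caller-supplied buffer keeps the sketch free of any stream type; the real functions write to a mu_stream_t instead.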

The following functions are available for formatting complex diagnostic messages:

void mu_stream_vlprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, va_list ap);
void mu_stream_lprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, ...);
These two functions output the locus range loc, followed by a colon and a space, then format their arguments according to the printf-style format fmt.
void mu_lrange_debug (struct mu_locus_range const *loc, char const *fmt, ...);
Does the same using mu_strerr as output stream.
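
The pairing of a va_list variant with a variadic wrapper, and the "locus, colon, space, message" layout, can be sketched without Mailutils as follows. The locus_vsnprintf and locus_snprintf functions are hypothetical helpers written for this illustration. They take an already formatted locus string and write into a character buffer, whereas the real functions take a struct mu_locus_range and a stream.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: write "LOCUS: MESSAGE" into BUF, where MESSAGE
   is produced from FMT and AP.  Returns the length of the complete
   text, or a negative value on a formatting error. */
int
locus_vsnprintf (char *buf, size_t size, char const *locus,
                 char const *fmt, va_list ap)
{
  int n, m;

  n = snprintf (buf, size, "%s: ", locus);
  if (n < 0 || (size_t) n >= size)
    return n;                   /* Error, or prefix alone filled BUF */
  m = vsnprintf (buf + n, size - n, fmt, ap);
  return m < 0 ? m : n + m;
}

/* Variadic front end, built on top of the va_list variant. */
int
locus_snprintf (char *buf, size_t size, char const *locus,
                char const *fmt, ...)
{
  va_list ap;
  int rc;

  va_start (ap, fmt);
  rc = locus_vsnprintf (buf, size, locus, fmt, ap);
  va_end (ap);
  return rc;
}
```

The va_list variant exists so that other variadic functions (such as a custom error reporter in a parser) can forward their arguments without reformatting them first.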

Log Stream Interface

A locus range can be associated with the log stream. It will be automatically prefixed to each message written to the stream. Any manipulations with the associated locations are done via the ioctl interface, using the MU_IOCTL_LOGSTREAM family.

To associate a locus with the stream, use the MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE opcode:

  struct mu_locus_range range;
  /* initialize the range here ... */
  /* associate it with the standard error stream */
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);

The opcode treats its argument as struct mu_locus_range const * and takes care to properly update identifier reference counts. If the range is no longer needed, it should be deinitialized after the successful set operation using mu_locus_range_deinit call.

A locus associated with the stream can be retrieved using the MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE opcode:

  struct mu_locus_range range;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);

As an example, the following function increments the beginning line number of the range associated with the standard error stream[2]:

int
increase_line (void)
{
  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);
  if (rc == 0)
    {
      range.beg.mu_line++;
      rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);
      mu_locus_range_deinit (&range);
    }
  
  return rc;
}

Deprecated interface

Mailutils versions up to 3.2.91-46 [release-3.2-71-g719e64a] (2017-06-14) used the following structure to represent a source point location:

  struct mu_locus
  {
    char *mu_file;
    unsigned mu_line;
    unsigned mu_col;
  };

It is mostly equivalent to mu_locus_point, except that the mu_file member used to be allocated via malloc(3) or a similar function. Two opcodes were available in the MU_IOCTL_LOGSTREAM family for manipulating it: MU_IOCTL_LOGSTREAM_GET_LOCUS and MU_IOCTL_LOGSTREAM_SET_LOCUS.

This interface is considered deprecated. However, for backward compatibility, the mu_locus structure and the two opcodes are still retained. They will be phased out in version 3.4. Until then, each use of the above mentioned ioctl opcodes will trigger the following warning during compilation:

 warning: 'mu_ioctl_logstream_get_locus_deprecated' is deprecated

At runtime, the following warning will be printed to the mu_strerr upon the very first use of any of them:

the program uses MU_IOCTL_LOGSTREAM_GET_LOCUS, which is deprecated


Authors are strongly urged to change their code and use the new interface instead. The following guidelines should help them in doing so.

First, replace each use of struct mu_locus with struct mu_locus_range.

Then, replace each use of MU_IOCTL_LOGSTREAM_GET_LOCUS with MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE. Use the mu_locus_range_deinit function when the obtained value is no longer needed.

Old code:

  struct mu_locus loc;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS, &loc);
  if (rc == 0)
    {
      /* do something with it */
      free (loc.mu_file);
    }

New code:

  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);
  if (rc == 0)
    {
      /* do something with it */      
      mu_locus_range_deinit (&range);
    }

Similarly, use MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE where you used to have MU_IOCTL_LOGSTREAM_SET_LOCUS.

Old code:

  struct mu_locus loc;

  loc.mu_file = "file.in";
  loc.mu_line = 1;
  loc.mu_col = 10;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS, &loc);

New code:

  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;

  mu_locus_point_set_file (&range.beg, "file.in");
  range.beg.mu_line = 1;
  range.beg.mu_col = 10;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);
  mu_locus_range_deinit (&range);

Line Tracker

The line tracker is an auxiliary facility for updating the current location during the lexical analysis phase. It is represented by the mu_linetrack_t data type. The tracker is created using the mu_linetrack_create function:

int mu_linetrack_create (mu_linetrack_t *ret, char const *file_name, size_t max_lines);
Arguments:
ret
Storage where the pointer to the allocated object will be returned.
file_name
Name of the source file.
max_lines
Number of recent lines for which the tracker should keep information. Minimum is 2.

When the lexical analyzer receives a new token, it should call the mu_linetrack_advance function:

void mu_linetrack_advance (mu_linetrack_t trk, struct mu_locus_range *loc, char const *text, size_t leng);

where text is the obtained token, leng is its length. The function updates information in trk and fills loc with the location of the token.
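
To show the kind of bookkeeping such an advance operation performs, here is a minimal stand-alone sketch. It is not the Mailutils line tracker: struct pos, struct span and the advance function are hypothetical, no history of recent lines is kept, and the convention that a newline character belongs to the line it terminates is an assumption made for this example.

```c
#include <stddef.h>

/* Hypothetical types for illustration; not the Mailutils ones. */
struct pos
{
  unsigned line;                /* Line number, starting at 1 */
  unsigned col;                 /* Column number, starting at 1 */
};

struct span
{
  struct pos beg;               /* Position of the first character */
  struct pos end;               /* Position of the last character */
};

/* Advance CUR over the LENG bytes of TEXT.  On entry, CUR is the
   position where the next input character will land.  On return, SP
   holds the positions of the first and last characters of TEXT, and
   CUR points past the token.  For an empty token, SP->end equals
   SP->beg. */
void
advance (struct pos *cur, struct span *sp, char const *text, size_t leng)
{
  size_t i;

  sp->beg = *cur;
  sp->end = *cur;
  for (i = 0; i < leng; i++)
    {
      sp->end = *cur;
      if (text[i] == '\n')
        {
          cur->line++;          /* Next character starts a new line */
          cur->col = 1;
        }
      else
        cur->col++;
    }
}
```

A lexer would call such a function once per token, starting from the position line 1, column 1, and attach the resulting span to the token it returns.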

During the error recovery stage it may become necessary to move the tracker back by a certain number of input characters. This is done using the following function:

int mu_linetrack_retreat (mu_linetrack_t trk, size_t n);

where n is the number of recent characters to discard from the tracker. If there are not enough characters in the tracker history, the function will emit an error message to mu_strerr and return ERANGE.

The current position in the tracker can be obtained with the mu_linetrack_locus function, defined as follows:

int mu_linetrack_locus (struct mu_linetrack *trk, struct mu_locus_point *lp);

Notes

  1. Possible error codes are:
    ENOMEM
    Not enough memory to allocate the resource.
    MU_ERR_OUT_PTR_NULL
    refname argument is NULL.
    MU_ERR_BUFSPACE
    Symbol table is full.
  2. This operation can be done using the MU_IOCTL_LOGSTREAM_ADVANCE_LOCUS_LINE opcode