Difference between revisions of "Source location API"

From Mailutils
Latest revision as of 18:40, 15 June 2017

The source location API is designed to keep track of locations in source files for diagnostic purposes. It is especially useful in parsers and lexical analyzers.

The functions and data structures are defined in the header file <mailutils/locus.h>.

Data Structures

The basic data structure is mu_locus_point, which identifies a point (locus) in the source text by the source file name, line and column numbers:

  struct mu_locus_point
  {
    char const *mu_file;        /* Name of the source file */
    unsigned mu_line;           /* Line number */
    unsigned mu_col;            /* Column number */
  };

Before any of the other functions discussed in this article can be used, the mu_locus_point structure must be initialized. There are two ways of doing so:

  /* Initialization by assignment */
  struct mu_locus_point pt = MU_LOCUS_POINT_INITIALIZER;

or

  /* Dynamic initialization */
  struct mu_locus_point pt;
  mu_locus_point_init (&pt);

Both ways initialize all members of the structure to 0, which all diagnostic functions treat as lack of location information. Failure to initialize the structure will cause core dumps when trying to work with it.

In general, both line and column numbers start at 1. The value of 0 means information not available.

The mu_file member merits special attention. In practice, when writing a lexer, locations are assigned to each token extracted from the source, so that all tokens returned when processing a given file will have the same mu_file value. Obviously, it would be a waste of memory to keep a newly allocated copy of the same string in each mu_locus_point. Therefore a special interface is provided for storing such identifiers:

  int mu_locus_point_set_file (struct mu_locus_point *pt, const char *filename);

The filename argument is not directly stored in the structure and can be safely disposed of after successful return. It is guaranteed that any two points that have been assigned filename arguments equal in the sense of strcmp will have the same pointer stored in their mu_file members. This means that, given two mu_locus_point structures a and b, the following check suffices to ensure they both refer to the same source file:

  if (a.mu_file == b.mu_file) ...

The mu_locus_point_set_file function returns 0 on success, and a non-zero error code on error[1].

To deallocate a point when it is no longer needed, use

  void mu_locus_point_deinit (struct mu_locus_point *pt);

Internally, a reference counter is kept for each value of mu_file used. It is guaranteed that the memory allocated for the identifier will be freed once mu_locus_point_deinit has been called exactly as many times as mu_locus_point_set_file was previously called with the same filename.
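The sharing and reference counting described above can be illustrated with a small standalone sketch. It is not the Mailutils implementation: the names ident_ref, ident_deref, ident_refcount, sketch_point and its functions are hypothetical, and a plain linked list stands in for whatever table the library actually uses. It only shows why equal file names end up as equal pointers and when the shared copy is released.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical interned-name record; not the Mailutils data structure. */
struct ident
{
  struct ident *next;
  size_t refcount;
  char name[];                  /* private copy of the file name */
};

static struct ident *ident_list;

/* Return the reference count of NAME, or 0 if it is not interned. */
size_t
ident_refcount (const char *name)
{
  struct ident *id;
  for (id = ident_list; id; id = id->next)
    if (strcmp (id->name, name) == 0)
      return id->refcount;
  return 0;
}

/* Intern NAME and store the shared pointer in *RET. */
int
ident_ref (const char *name, const char **ret)
{
  struct ident *id;
  if (name == NULL || ret == NULL)
    return EINVAL;
  for (id = ident_list; id; id = id->next)
    if (strcmp (id->name, name) == 0)
      break;
  if (id == NULL)
    {
      size_t len = strlen (name) + 1;
      id = malloc (sizeof (*id) + len);
      if (id == NULL)
        return ENOMEM;
      memcpy (id->name, name, len);
      id->refcount = 0;
      id->next = ident_list;
      ident_list = id;
    }
  id->refcount++;
  *ret = id->name;
  return 0;
}

/* Drop one reference to the interned pointer NAME; free on the last one. */
void
ident_deref (const char *name)
{
  struct ident **pp;
  for (pp = &ident_list; *pp; pp = &(*pp)->next)
    if ((*pp)->name == name)    /* interned, so comparing pointers is enough */
      {
        struct ident *id = *pp;
        if (--id->refcount == 0)
          {
            *pp = id->next;
            free (id);
          }
        return;
      }
}

struct sketch_point { const char *file; unsigned line; unsigned col; };

int
sketch_point_set_file (struct sketch_point *pt, const char *filename)
{
  const char *ref;
  int rc;
  assert (pt != NULL);
  rc = ident_ref (filename, &ref);
  if (rc)
    return rc;
  ident_deref (pt->file);       /* release the previous name, if any */
  pt->file = ref;
  return 0;
}

void
sketch_point_deinit (struct sketch_point *pt)
{
  assert (pt != NULL);
  ident_deref (pt->file);
  pt->file = NULL;
  pt->line = pt->col = 0;
}
```

In this sketch, two zero-initialized points that are both given "file.in" share one stored copy, so their file members compare equal as pointers, and the copy is freed only when the second point is deinitialized.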

To copy information between two points use the following function:

  int mu_locus_point_copy (struct mu_locus_point *dest, struct mu_locus_point const *src);

It takes care to call mu_locus_point_deinit on dest prior to copying data. The return codes are the same as described above.

The following structure indicates a source range:

  struct mu_locus_range
  {
    struct mu_locus_point beg;
    struct mu_locus_point end;
  };

and is preferred for more exact diagnostics. The structure can be initialized statically as follows:

  struct mu_locus_range lr = MU_LOCUS_RANGE_INITIALIZER;

or dynamically, as in

  struct mu_locus_range lr;
  mu_locus_range_init (&lr);

To copy one range to another, the following function is provided:

  int mu_locus_range_copy (struct mu_locus_range *dest, struct mu_locus_range const *src);

A range is deallocated with the following function:

  void mu_locus_range_deinit (struct mu_locus_range *lr);

Formatting

Several functions are provided for formatting locations for output.

void mu_stream_print_locus_point (mu_stream_t stream, struct mu_locus_point const *lpt);
Formats a point to the output stream. The formatting rules are as follows:
  1. If all three members are not zero, the output format is FILE:LINE.COL, e.g.
    filter.siv:160.15
  2. If mu_col is 0, the format is FILE:LINE, as in:
    filter.siv:160
  3. If only mu_file is set, it is printed.
  4. Finally, if all members are 0, nothing is output.
Notice that in any case no delimiters are output after the formatted point.
void mu_stream_print_locus_range (mu_stream_t stream, struct mu_locus_range const *loc);
Prints a locus range. The rules for printing ranges are:
  1. The beg member is formatted as described above.
  2. If the end member is initialized to nulls, it is omitted.
  3. Otherwise, a dash is printed and the end member is formatted as follows:
    1. If its mu_file differs from that of beg, the point is printed following the same set of rules as above.
    2. Otherwise, if the mu_line differs, it is printed followed by the mu_col, unless it is zero.
    3. Otherwise, only the mu_col is printed (unless it is 0).
This gives the following possible outputs:
filter.sv:160.15-include.sv:2.10
Range spans two source files (e.g. an include statement)
filter.sv:160.15-162.10
Range occupies several lines in the same source file.
filter.sv:160.15-30
Range spans several columns on the same line.
filter.sv:160-180
Range spans several lines (no column information available).
filter.sv:160
Single line in file (no column information available).
filter.sv
Only source name information available.
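The formatting rules above can be restated as a short standalone sketch. It is an illustration, not the Mailutils code: the sketch_ names are hypothetical, output goes to a character buffer rather than a stream, file names are compared with strcmp because nothing is interned here, and an end point without a file name is treated as unset.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct sketch_point { const char *file; unsigned line; unsigned col; };
struct sketch_range { struct sketch_point beg; struct sketch_point end; };

/* Format PT as FILE:LINE.COL, FILE:LINE, FILE, or nothing.
   Returns the number of characters stored in BUF. */
size_t
sketch_format_point (char *buf, size_t size, const struct sketch_point *pt)
{
  int n;

  assert (buf != NULL && size > 0);
  buf[0] = '\0';
  if (pt->file == NULL)
    return 0;
  if (pt->line == 0)
    n = snprintf (buf, size, "%s", pt->file);
  else if (pt->col == 0)
    n = snprintf (buf, size, "%s:%u", pt->file, pt->line);
  else
    n = snprintf (buf, size, "%s:%u.%u", pt->file, pt->line, pt->col);
  if (n < 0)
    return 0;
  return (size_t) n < size ? (size_t) n : size - 1;
}

/* Format R as BEG, optionally followed by a dash and the part of END
   that differs from BEG. */
size_t
sketch_format_range (char *buf, size_t size, const struct sketch_range *r)
{
  const struct sketch_point *b = &r->beg, *e = &r->end;
  size_t n = sketch_format_point (buf, size, b);
  char *p = buf + n;
  size_t rest = size - n;
  int m = 0;

  /* No start, no end, or no room left: nothing more to print. */
  if (b->file == NULL || e->file == NULL || rest < 2)
    return n;
  if (strcmp (b->file, e->file) != 0)
    {
      /* Different file: print the end as a full point. */
      *p = '-';
      return n + 1 + sketch_format_point (p + 1, rest - 1, e);
    }
  if (e->line && e->line != b->line)
    m = e->col ? snprintf (p, rest, "-%u.%u", e->line, e->col)
               : snprintf (p, rest, "-%u", e->line);
  else if (e->col)
    m = snprintf (p, rest, "-%u", e->col);
  if (m < 0)
    m = 0;
  return n + ((size_t) m < rest ? (size_t) m : rest - 1);
}
```

With beg set to filter.sv, line 160, column 15 and end set to the same file, line 162, column 10, sketch_format_range stores filter.sv:160.15-162.10 in the buffer.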

The following functions are available for formatting complex diagnostic messages:

void mu_stream_vlprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, va_list ap);
void mu_stream_lprintf (mu_stream_t stream, struct mu_locus_range const *loc, char const *fmt, ...);
These two functions output the locus range loc, followed by a colon and a space, then format their arguments according to the printf-style format fmt.
void mu_lrange_debug (struct mu_locus_range const *loc, char const *fmt, ...);
Does the same using mu_strerr as output stream.
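The pairing of a va_list function with a variadic wrapper, and the "location, colon, space, message" layout, can be sketched in standalone C as follows. The sketch_ names are hypothetical, the location is passed as an already formatted string, and output goes to a buffer instead of a Mailutils stream. Omitting the prefix when the location string is empty is an assumption made for this sketch, not documented library behavior.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Write "LOCUS: " followed by the printf-style message into BUF.
   Returns the length the full text would have, or a negative value
   on a formatting error (same convention as vsnprintf). */
int
sketch_vlprintf (char *buf, size_t size, const char *locus,
                 const char *fmt, va_list ap)
{
  int n = 0, m;

  assert (buf != NULL && size > 0 && fmt != NULL);
  buf[0] = '\0';
  if (locus != NULL && locus[0] != '\0')
    {
      n = snprintf (buf, size, "%s: ", locus);
      if (n < 0 || (size_t) n >= size)
        return n;               /* error, or prefix alone was truncated */
    }
  m = vsnprintf (buf + n, size - (size_t) n, fmt, ap);
  return m < 0 ? m : n + m;
}

/* Variadic front end: collect the arguments and delegate. */
int
sketch_lprintf (char *buf, size_t size, const char *locus,
                const char *fmt, ...)
{
  va_list ap;
  int rc;

  va_start (ap, fmt);
  rc = sketch_vlprintf (buf, size, locus, fmt, ap);
  va_end (ap);
  return rc;
}
```

Calling sketch_lprintf with the location string "filter.sv:160.15" and the format "unexpected token %s" with argument "if" stores filter.sv:160.15: unexpected token if in the buffer.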

Log Stream Interface

A locus range can be associated with the log stream. It will be automatically prefixed to each message written to the stream. Any manipulations with the associated locations are done via the ioctl interface, using the MU_IOCTL_LOGSTREAM family.

To associate a locus with the stream, use the MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE opcode:

  struct mu_locus_range range;
  /* initialize the range here ... */
  /* associate it with the standard error stream */
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);

The opcode treats its argument as struct mu_locus_range const * and takes care to properly update identifier reference counts. If the range is no longer needed, it should be deinitialized after the successful set operation using mu_locus_range_deinit call.

A locus associated with the stream can be retrieved using the MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE opcode:

  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);

As an example, the following function increments the beginning line number of the range associated with the standard error stream[2]:

int
increase_line (void)
{
  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc;
  
  rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);
  if (rc == 0)
    {
      range.beg.mu_line++;
      rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);
      mu_locus_range_deinit (&range);
    }
  
  return rc;
}

Deprecated interface

Mailutils versions up to 3.2.91-46 [release-3.2-71-g719e64a] (2017-06-14) used the following structure to represent a source point location:

  struct mu_locus
  {
    char *mu_file;
    unsigned mu_line;
    unsigned mu_col;
  };

It is mostly equivalent to mu_locus_point, except that the mu_file member used to be allocated via malloc(3) or a similar function. Two opcodes were available in the MU_IOCTL_LOGSTREAM family for manipulating it: MU_IOCTL_LOGSTREAM_GET_LOCUS and MU_IOCTL_LOGSTREAM_SET_LOCUS.

This interface is considered deprecated. However, for backward compatibility, the mu_locus structure and the two opcodes are still retained. They will be phased out in version 3.4. Until then, each use of the above mentioned ioctl opcodes will trigger the following warning during compilation:

 warning: 'mu_ioctl_logstream_get_locus_deprecated' is deprecated

At runtime, the following warning will be printed to mu_strerr upon the very first use of any of them:

the program uses MU_IOCTL_LOGSTREAM_GET_LOCUS, which is deprecated


Authors are strongly urged to change their code and use the new interface instead. The following guidelines should help them in doing so.

First, replace each use of struct mu_locus with struct mu_locus_range.

Then, replace each use of MU_IOCTL_LOGSTREAM_GET_LOCUS with MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE. Use the mu_locus_range_deinit function when the obtained value is no longer needed.

Old code:

  struct mu_locus loc;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS, &loc);
  if (rc == 0)
    {
      /* do something with it */
      free (loc.mu_file);
    }

New code:

  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_GET_LOCUS_RANGE, &range);
  if (rc == 0)
    {
      /* do something with it */      
      mu_locus_range_deinit (&range);
    }

Similarly, use MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE where you used to have MU_IOCTL_LOGSTREAM_SET_LOCUS.

Old code:

  struct mu_locus loc;

  loc.mu_file = "file.in";
  loc.mu_line = 1;
  loc.mu_col = 10;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS, &loc);

New code:

  struct mu_locus_range range = MU_LOCUS_RANGE_INITIALIZER;

  mu_locus_point_set_file (&range.beg, "file.in");
  range.beg.mu_line = 1;
  range.beg.mu_col = 10;
  int rc = mu_stream_ioctl (mu_strerr, MU_IOCTL_LOGSTREAM, MU_IOCTL_LOGSTREAM_SET_LOCUS_RANGE, &range);
  mu_locus_range_deinit (&range);

Line Tracker

The line tracker is an auxiliary facility for updating the current location during the lexical analysis phase. It is represented by the mu_linetrack_t data type. The tracker is created using the mu_linetrack_create function:

int mu_linetrack_create (mu_linetrack_t *ret, char const *file_name, size_t max_lines);
Arguments:
ret
Storage where the pointer to the allocated object will be returned.
file_name
Name of the source file.
max_lines
Number of recent lines for which the tracker should keep information. Minimum is 2.

When the lexical analyzer receives a new token, it should call the mu_linetrack_advance function:

void mu_linetrack_advance (mu_linetrack_t trk, struct mu_locus_range *loc, char const *text, size_t leng);

where text is the obtained token, leng is its length. The function updates information in trk and fills loc with the location of the token.

During the error recovery stage it may become necessary to move the tracker back by a certain number of input characters. This is done using the following function:

int mu_linetrack_retreat (mu_linetrack_t trk, size_t n);

where n is the number of recent characters to discard from the tracker. If there are not enough characters in the tracker history, the function will emit an error message to mu_strerr and return ERANGE.

The current position in the tracker can be obtained with the mu_linetrack_locus function, defined as follows:

int mu_linetrack_locus (struct mu_linetrack *trk, struct mu_locus_point *lp);
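To make the advance, retreat and locus operations more concrete, here is a standalone sketch of a tracker with bounded line history. It is not the Mailutils implementation and does not use its types: the sketch_ names, the fixed history depth SKETCH_MAX_LINES and the convention that a token's end is the position of its last character are all assumptions made for illustration. Unlike the library function, the retreat operation here only returns ERANGE and prints nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_LINES 4      /* history depth, chosen arbitrarily */

struct sketch_point { unsigned line; unsigned col; };
struct sketch_range { struct sketch_point beg; struct sketch_point end; };

struct sketch_tracker
{
  unsigned first_line;             /* number of the oldest line in history */
  size_t nlines;                   /* number of lines in history (>= 1) */
  size_t cols[SKETCH_MAX_LINES];   /* characters consumed on each line */
};

void
sketch_tracker_init (struct sketch_tracker *trk)
{
  trk->first_line = 1;
  trk->nlines = 1;
  trk->cols[0] = 0;
}

/* Store in PT the position the next input character will occupy. */
void
sketch_tracker_locus (const struct sketch_tracker *trk, struct sketch_point *pt)
{
  pt->line = trk->first_line + (unsigned) (trk->nlines - 1);
  pt->col = (unsigned) trk->cols[trk->nlines - 1] + 1;
}

/* Consume LEN characters of TEXT and store the token's range in LOC. */
void
sketch_tracker_advance (struct sketch_tracker *trk, struct sketch_range *loc,
                        const char *text, size_t len)
{
  size_t i;

  sketch_tracker_locus (trk, &loc->beg);
  loc->end = loc->beg;
  for (i = 0; i < len; i++)
    {
      sketch_tracker_locus (trk, &loc->end);   /* position of this character */
      trk->cols[trk->nlines - 1]++;
      if (text[i] == '\n')
        {
          if (trk->nlines == SKETCH_MAX_LINES)
            {
              /* History is full: forget the oldest line. */
              memmove (trk->cols, trk->cols + 1,
                       (SKETCH_MAX_LINES - 1) * sizeof (trk->cols[0]));
              trk->first_line++;
            }
          else
            trk->nlines++;
          trk->cols[trk->nlines - 1] = 0;
        }
    }
}

/* Discard the last N consumed characters; ERANGE if history is too short. */
int
sketch_tracker_retreat (struct sketch_tracker *trk, size_t n)
{
  size_t i, total = 0;

  for (i = 0; i < trk->nlines; i++)
    total += trk->cols[i];
  if (n > total)
    return ERANGE;
  while (n > 0)
    {
      size_t *cur = &trk->cols[trk->nlines - 1];
      if (*cur == 0)
        trk->nlines--;          /* step back onto the previous line */
      else
        {
          size_t take = n < *cur ? n : *cur;
          *cur -= take;
          n -= take;
        }
    }
  return 0;
}
```

In this sketch, advancing a fresh tracker over the three-character token "foo" yields a range from line 1, column 1 to line 1, column 3. Keeping only per-line character counts is what bounds how far the tracker can retreat, which is one plausible reason for a limit such as max_lines.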

Notes

  1. Possible error codes are:
    ENOMEM
    Not enough memory to allocate the resource.
    MU_ERR_OUT_PTR_NULL
    refname argument is NULL.
    MU_ERR_BUFSPACE
    Symbol table is full.
  2. This operation can be done using the MU_IOCTL_LOGSTREAM_ADVANCE_LOCUS_LINE opcode