input.c cleanup
With rc-1.7.1 released, I'm starting to look at
wider issues. Top of my list is to improve the way rc interacts with line editing libraries.
Partly this is because I want to support the BSD editline library.
Partly I want to improve the support for readline, which I have a nagging feeling is
still buggy in places. Partly I want to clean up some of the
ugly code in input.c: there are only three
appearances of #if READLINE || EDITLINE, which
isn't too bad, but one of these surrounds part of an if /
else clause in a way that triggers my
yeuch!
detectors.
As a first step, and as part of my longer term goal of improving
the modularization of rc, I have created
input.h, the interface file for the functions in
input.c, now separate from rc.h.
Thus, these functions are only known in those other source files
which actually need them (which is a fairly disparate bunch).
Time permitting, I will eventually remove almost everything from
rc.h.
Now, as well as function interfaces, the new
input.h defines a couple of global variables. One
is rcrc, which is only used to suppress error
messages when builtins.c::b_dot() is called from
input.c::doit()to source the user's
.rcrc file. Although it might seem like good code
reuse to call b_dot() here, I'm unconvinced. It
means that we have to fake up an argv array in
doit(). And b_dot() does things like
pushing an exception to save $*, and making various
checks on whether we're interactive or not: all totally
pointless in this case.
So now, doit() simply opens the .rcrc
file directly (and calls itself to parse it). This, in my
opinion, tidies up both doit() and
b_dot() slightly, as well as removing the global
variable. As a bonus, I changed the code so that only
ENOENT is silently ignored. If, for example, the
user has no read permission on their .rcrc, they
now get an error message.
Anyone who can find a use for the newly available
$* in .rcrc will receive the dubious
distinction of it being detailed here!
On further reflection, this code doesn't belong in
input.c anyway. Unless there's something subtle
going on that I'm missing, it makes more sense in
main.c. This also eliminates another global
variable - dashell.
2003-09-00
rc-1.7.1 released2003-07-17
history bug fixesSuch a small program; so many bugs.
Callum Gibson sent a patch to fix a core dump in
his previous patch for stuttering
colons (the core dump was
triggered by specifying more colons than there are possible
substitutions).
While testing that patch, I came across the following
fork() bomb.
; > $history
; -
The code in history that skips prior invocations of the
history program itself works most of the time, but if all the
entries in the history file are history itself, it
returns the first one anyway.
While fixing that, I noticed the horrid code in
history.c::getcommand(), which would merrily read and
write the character before the allocated hist buffer.
The least disruptive change I could make was to add a sentinel
,
so these accesses are now valid. (It's a rather unusual sentinel: its
value is immaterial, just its location is used.)
2002-11-27
sigaction() tidy up
Jeremy Fitzhardinge pointed out a couple of
errors around sigaction() that were reported by the remarkable
valgrind.
sigaction() for
SIGKILL or SIGSTOP, which was happening in
signal.c::initsignal(). Although real Unices silently
ignore this, we're trying to be correct, right? Now, is anybody
compiling rc anywhere that lacks
SIGSTOP? Or even SIGKILL? I doubt it, but
I wrapped them in #ifdef anyway. Hmm... hmm... this is
pretty ugly. Maybe mksignal, which checks if these
signals exist anyway, should flag them as not being installable?
sigaction() code, I omitted to set the
sa_mask field of the handler to be installed. Should we
mask everything, or nothing? For now, I've taken what seems to be the
safer route of masking everything,
i.e. sigfillset(&new.sa_mask).
While looking at signal.c, and SUSv3, I noticed the curious use of
SA_INTERRUPT. Notwithstanding that I wrote this code, I
actually have no idea where SA_INTERRUPT comes from: both
SUSv3 and BSD have
SA_RESTART instead (with the inverted semantics), and Linux comments SA_INTERRUPT as a
historical no-op
. Memo to self: must smoke better drugs.
Since the configury checked for both sigaction() and
SA_INTERRUPT before enabling the sigaction()
code, we were using the old signal(), and the expensive
test for restartable syscalls, and the grody code to make syscalls not
restart, in many places where they were not necessary. For example,
on NetBSD/i386, the development version of rc runs its configure script in 3.3s as opposed to
9.3s for rc-1.7. In addition, the binary is 175 bytes
smaller. Oh, and the configure script itself is over 1k
smaller.
2002-08-15
I actually noticed this bug before the release of rc-1.7,
but then forgot about it again. Darn.
Certain glob patterns fail to match dangling symlinks. (A dangling symlink is one whose target doesn't exist, or is inaccessible.) More specifically, if the dangling symlink's name matches a fixed part of the pattern, it is incorrectly ignored. For example:
; mkdir qux
; ln -s /nonesuch qux/foo
; echo qux/foo*
qux/foo
; echo qux*/foo
qux*/foo
The bug was, needless to say, the use of stat() instead
of lstat() in glob.c::dmatch(), so this
really was a one-character fix! I also added a test to
trip.rc, and configury to #define away the
lstat() call into a plain ol' stat() on
systems that lack lstat(). (It's 14 years since I used a
system without lstat(), and that didn't have anything
resembling an ANSI C compiler, but I realise I've had a somewhat
sheltered upbringing.)
You might wonder about the other call to stat() in
glob.c::dmatch(). This is correct, because at this point
we want to know whether the name in question is a directory or not.
You might also wonder about the reason for the inconsistency shown
above (i.e. qux/foo* works, but qux*/foo
doesn't). It's fairly simple: globbing takes each component of the
path name in turn. If the component we are looking at contains a glob
metacharacter (e.g. foo*) then we readdir()
the relevant directory, and pull out any matches. This works, even
for dangling symlinks. On the other hand, if the component we are
looking at contains no metacharacters (e.g. foo), then we
generate the complete pathname and lstat() it.
2002-08-14
rc-1.7 released2002-06-18
version tidy upErik Quanstrom brought this up; Paul Haahr and Carlo Strozzi made useful comments.
I'd made the version variable far too magical. I'd added
it to var.c::varlookup(), next to apids and
status. It was also marked not for export
in
hash.c::var_exportable().
Initially I planned simply to give it a default value on startup (in
main.c::main()), but this means that any descendant rc processes inherit the value, which is not useful.
So, there's a new array of variable names and flags (struct
nameflag maybeexport[] in hash.c). It contains
entries only for version and prompt (which
seemed like it would benefit from the same treatment). The flag is
set to FALSE in main.c::assigndefault(), and
to TRUE in var.c::varassign().
While I was looking at this, I noticed that bqstatus and
status were exportable. I changed them to be not exportable.
2002-04-03
limit code tidy up
Thanks to Chris Siebenmann for pointing out the
flaws in the old code. The fundamental problem was that
builtins.c::parselimit() used negative values for error
returns, but on some systems RLIM_INFINITY is negative
(e.g. -1 on Linux; -3 on
Solaris.) The result is that on these systems,
any attempt to set a resource to
unlimited fails:
; limit stacksize unlimited
bad limit
The parselimit() code has other flaws. For example, an
attempt to parse futhark results in a return value of
-1024; parsing boring gives
-1073741824. (In each case, the final character is taken
as a multiplier.) All of these are considered bad
limits. And 1:fifteen parses as 59!
Clearly, overloading the return value from parselimit()
was a mistake. I followed the obvious route, changing
parselimit() into a predicate that returns
TRUE or FALSE according to whether it could
parse its argument or not, with the result of the parse returned in a
pointer argument. Along the way I noticed that the return value of
parselimit() is stuffed into an int
variable. This error doubtless goes back to my autoconfiscation of
rc, when I introduced the use of
rlim_t.
I also taught rc about some new limits.
RLIMIT_AS is the SUSv2 spelling of
RLIMIT_VMEM; in Solaris, they are
synonymous (despite being documented separately). Linux has both RLIMIT_AS and
RLIMIT_RSS, which apparently do have subtly different
meanings. Linux also adds
RLIMIT_NPROC, RLIMIT_MEMLOCK, and
RLIMIT_LOCKS.
There are no standard English names for the newer limits. Here's a
table which shows what I used, and compares it to several other
popular shells. N/A
means that the limit is not available in
this operating system; unimp
means that the limit is not
implemented in this shell.
| macro | rc | Linux csh | NetBSD csh | Solaris csh | bash |
|---|---|---|---|---|---|
| CPU | cputime | -t | |||
| FSIZE | filesize | -f | |||
| DATA | datasize | -d | |||
| STACK | stacksize | -s | |||
| CORE | coredumpsize | -c | |||
| NOFILE | descriptors | openfiles | descriptors | -n | |
| AS (or VMEM) | memoryuse | unimp | N/A | memorysize | -v |
| RSS | memoryrss | memoryuse | N/A | -m | |
| NPROC | maxproc | N/A | -u | ||
| MEMLOCK | memorylocked | N/A | -l | ||
| LOCKS | filelocks | unimp | N/A | N/A | unimp |
So, at the time of writing, rc has one feature that bash doesn't!
2001-11-23
Thanks to Scott Schwartz for pointing out the
need for this, and Chris Siebenmann for helping
me with the implementation. Seems to be trivial, using the
AC_SYS_LARGEFILE macro available in recent versions of
autoconf.
There was just one wrinkle: under Linux at least,
it is vital to define _FILE_OFFSET_BITS before including
(any header file which includes) features.h.
At one point, I had the moral equivalent of this, broken, code.
#include <assert.h>
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
2001-10-25
fn prompt tailspinAmazingly, I'm not aware of this having been reported as a bug before. I came across it while doing something completely different.
If there is a semantic error in fn prompt,
rc goes into a tailspin, and must be killed. For
example:
; fn prompt { echo (a b)^(c d e) }
bad concatenation
bad concatenation
... [ ad infinitum ]
A semantic error is anything that throws an eError
exception. Considerably more insidious cases can be devised: fn
prompt { echo $a^$b } may work fine till a and
b take on unfortunate values.
My initial fix was to push a new exception on the stack before calling
fn prompt, but on reflection I realised that a simple
flag variable achieves the same goal.
2001-10-05
Again, this isn't exactly a bug, more of a misfeature.
; fn x { echo (a b c d e) }
; whatis x
fn x {echo ((((a b) c) d) e)}
This was easy to fix. Simply keep a state variable inside
footobar.c::Tconv() so that we only print the parentheses
once.
You might be wondering why I chose a static variable
here, but introduced the alternate form
for the very similar
case of quoting within variable names. To be honest, I didn't realise
the similarity of the two cases at the time, and just coded them with
whatever came to hand. Ex post facto, it seems to me that
there is an important difference: in this case, only the
handling of nLappend is affected, and the scope of the
inlist variable can thus be made extremely tight. In the
other case, there is an interaction between nVar and
nWord processing, so a state variable would need to have
the function scope of dollar anyway.
2001-10-03
On output (Tconv()), rc would quote a
string if and only if it was quoted on input. Wolfgang Zekoll discovered that it's possible to
sneak a quote-worthy string in by other means: specifically, by
surrounding a variable name with parentheses. (I suspect there may be
other ways, but I haven't discovered any.)
; (a.b) = foo
; fn x { echo $(a.b) }
; x
foo
; whatis x
fn x {echo $a.b}
Note that the last line is wrong: on input this is parsed as
echo $a^.b. This means that the function x
doesn't work in descendant rc processes.
The fix is in two parts. First, rc now scans
strings for metacharacters to decide if they need quoting or not.
This is handled by quotep() (which I put in
lex.c, since it accesses nw[] and
dnw[] which, I claim, should be static to
that file—at present they're not, but maybe one day.) This is
made uglier by the different quoting rules needed for variable names
and other words, which is all down to free carets. I'm afraid I
introduced an alternate form
for Tconv()
(i.e. fmtprint(... "%#T" ...)) which turns on the
variable name quoting rules.
Now there is almost no need for the distinction between the
Node types nWord and nQword, so
I abolished it. (Note that in 7 out of 9 places, they were treated
identically. The eighth was in Tconv(), which was fixed
as described above. The ninth we shall come to shortly.)
Removing this distinction tidied up the lexer and parser slightly:
previously, the lexer returned a quoted string as 'foo,
which resulted in an unnatural do... while loop in the
lexer, and a mysterious +1 in the parser. Unfortunately
(here comes the ninth case) it also broke here documents:
<<'EOF' is a very different thing from
<<EOF. Darn.
So I added a q field to struct Word to hold
this information. The lexer sets q if the string was
quoted; the only place it is tested is in
heredoc.c::qdoc().
And that's it. On reflection, the bug could probably have been fixed
simply by resetting dollar when the lexer hits
(. However, I think the new code is cleaner (and the
compiler agrees: the binary is slightly smaller). It also means that
rc now uses minimal quoting
on output:
; fn x { echo 'foo' }
; whatis x
fn x {echo foo}
Earlier versions of rc would have maintained the
quotes on 'foo'.
Incidentally, the hilarious quoting bug
, mentioned in
trip.rc comes from here. In tree.c::mk(), I
added the new member to nWord, but omitted to update the
size of it. Hilarity ensues: on my development machine, the result
was that alternate members of a list were quoted:
; x=('#' '#' '#')
; whatis x
x=('#' # '#')
I don't suppose this particular bug will ever occur again, but if it does, the new test will catch it.
2001-10-02
(Actually it's a misfeature, not a bug.)
This is relatively straightforward. I stole Wconv() from
es to output a list with ^A and
^B quoting. The functions are short enough that I don't
think it would be worth trying to make Lconv() more
generic, to handle all possible cases of outputting lists.
I rewrote footobar.c::parse_var() completely, removing
the Hacky routine... [that] scans backwards
. The new code is
straightforward, and I don't believe it's any less efficient than the
old: both basically examine the string twice.
2001-09-27