Thursday, September 15, 2011

An alternative to strtok(3) in C

If you've ever tried to split strings in C, you know that strtok() is an abomination: it modifies the string passed to it, keeps hidden state between calls, and in general breaks most of the standard patterns for how memory is allocated and handled in reasonable C programs. The obvious solution is to use another programming language, but sometimes that isn't possible. Here's an alternative function, strwrd, that I give my students for a homework assignment involving command-line parsing. It follows the standard practice of returning strings in caller-allocated storage, and it leaves the input string untouched.
#include <string.h>

/* find the next word starting at 's', delimited by characters
 * in the string 'delim', and store up to 'len' bytes (including the
 * terminating NUL) into *buf; 'len' must be at least 1.
 * returns pointer to immediately after the word, or NULL once the
 * end of the string is reached (*buf may still hold a final word).
 */
char *strwrd(char *s, char *buf, size_t len, const char *delim)
{
    s += strspn(s, delim);          /* skip leading delimiters */
    size_t n = strcspn(s, delim);   /* count the span (spn) of bytes in */
                                    /* the complement (c) of *delim */
    if (n > len-1)                  /* truncate, leaving room for the NUL */
        n = len-1;
    memcpy(buf, s, n);
    buf[n] = 0;
    s += n;
    return (*s == 0) ? NULL : s;
}
which is used like this:
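char line[some_length];
char argv[10][20];
int argc = 0;
char *p = line;     /* an array can't be assigned to, so walk a pointer */
while (p != NULL && argc < 10) {
    p = strwrd(p, argv[argc], sizeof(argv[argc]), " \t");
    if (argv[argc][0] != 0)     /* count the word stored by this call, */
        argc++;                 /* unless it is empty (trailing delimiters) */
}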
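
For anyone who wants to try this out, here is a minimal sketch that wraps the loop above in a helper. The names split_words, MAX_WORDS and MAX_WORD_LEN are my own for illustration and not part of the original assignment. strwrd is repeated in a const-qualified form so the block compiles on its own, and '\n' is added to the delimiters on the assumption that the line came from something like fgets().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORDS    10   /* hypothetical limits, matching argv[10][20] above */
#define MAX_WORD_LEN 20

/* Const-qualified copy of strwrd, so the input can be a string literal. */
const char *strwrd(const char *s, char *buf, size_t len, const char *delim)
{
    assert(len > 0);                /* need room for at least the NUL */
    s += strspn(s, delim);          /* skip leading delimiters */
    size_t n = strcspn(s, delim);   /* length of the word itself */
    if (n > len - 1)
        n = len - 1;                /* truncate to fit the buffer */
    memcpy(buf, s, n);
    buf[n] = '\0';
    s += n;
    return (*s == '\0') ? NULL : s;
}

/* Hypothetical helper: split 'line' into at most MAX_WORDS words and
 * return how many non-empty words were stored. 'line' is not modified. */
int split_words(const char *line, char words[MAX_WORDS][MAX_WORD_LEN])
{
    int count = 0;
    const char *p = line;
    while (p != NULL && count < MAX_WORDS) {
        p = strwrd(p, words[count], MAX_WORD_LEN, " \t\n");
        if (words[count][0] != '\0')    /* ignore the empty word that */
            count++;                    /* trailing delimiters produce */
    }
    return count;
}
```

Under these assumptions, split_words("ls -l  /tmp\n", words) should fill words[0..2] with "ls", "-l" and "/tmp" and return 3. One limitation worth knowing: a word longer than MAX_WORD_LEN-1 bytes is not rejected but spills into the next slot, because strwrd resumes from where it stopped copying.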