Toybox: A simpler, cleaner way to write tools...

Ibidem
Posts: 549
Joined: Wed 26 May 2010, 03:31
Location: State of Jefferson

#1 Post by Ibidem »

There have been a few references to Rob Landley's toybox project, but no threads dedicated to it and no explanations of contributing or writing applets ("toys" in toybox terminology). So I thought I'd start one.
If you want a clearer explanation of what toybox is than "busybox, done right", please visit the homepage that I linked. This post is about the how.

Tree layout:
generated/: Files that the build system creates; all can be removed.
generated/help.h needs Python to build, so it's included in the tarball; if you add a new toy, you need to either disable help or have Python 2 installed.
lib/: all the toybox common code. This provides functions that are shared by a number of toys; do not add functions to it unless they are used by at least two toys.
kconfig/: Build system (toybox uses kconfig, like the Linux kernel and busybox)
scripts/: Build and convenience scripts, including test.sh.
scripts/test/: where test scripts live.
toys/: Individual commands. The available commands are picked up from the four subdirectories:
toys/posix/*.c: Implementations of the SUSv4/POSIX{2008,2013} command line tools.
toys/lsb/*.c: Implementations of LSB command line tools (md5sum, sha1sum, mknod, dmesg,...)
toys/other/*.c: Implementations of other command line tools. While there is no standard for these, they are widely used on Linux systems and may be needed for a regular system to boot or for many packages to build.
toys/pending/*.c: Not quite finished code. Generally speaking, this may be (a) placeholder skeletons, (b) toys that work but need cleanup, or (c) toys that do part of what they should do.

General features of a new toy:
The official example is http://landley.net/hg/toybox/file/tip/t ... er/hello.c; I'm going to try to explain that.
A toy must start with a multi-line comment in this general fashion:

Code: Select all

/* hello.c - A hello world program. (Template for new commands.)
 *
 * Copyright 2012 Rob Landley <rob@landley.net>
 *
 * See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/
 * See http://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/cmdbehav.html

// Accept many different kinds of command line argument:

USE_HELLO(NEWTOY(hello, "(walrus)(blubber):;(also):e@d*c#b:a", TOYFLAG_USR|TOYFLAG_BIN))

config HELLO
  bool "hello"
  default n
  help
    usage: hello [-a] [-b string] [-c number] [-d list] [-e count] [...]

    A hello world program.  You don't need this.

    Mostly used as an example/skeleton file for adding new commands,
    occasionally nice to test kernel booting via "init=/bin/hello".
*/

Things to note here:
The "See http://..." lines should link to the standard for the utility you are implementing.
If there is no standard in POSIX 2008/2013 nor in LSB 4.1, write "No standard." Links to a manpage or text description of a file format are sometimes provided after that.
The build system looks at everything from the first line not starting with " *" to the first line that begins with "*/".
Note that the usual comment close with a leading space (" */") won't work: the "*/" must be at the very start of the line.
(You will get truly spectacular build failures that way...)

The name in USE_... must match the name following "config", and the first argument to NEWTOY is the name of the command.

The second argument to NEWTOY() is the option string: it tells toybox which options/arguments to parse and how.
"(string)" means that the long option "--string" is supported.
"b:" means that -b requires a string:
-b blah
"c#" requires a number, in decimal, octal (leading 0), or hexadecimal (leading 0x):
-c1 or -c=07 or -c 0xDEADBEEF
"e@" means that the number of occurrences of -e is counted.
"d*" stores multiple arguments (as a linked list):
hello -d abc -d def
saves both "abc" and "def"
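
The three number formats accepted by "c#" are the same ones the C library recognises when strtoll() is called with base 0. Here is a minimal sketch of that prefix detection in plain C. parse_number() is a hypothetical helper written for this post, not a toybox function, and toybox's own number parser may differ in the details.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

// Hypothetical helper (not part of toybox): parse decimal, octal
// (leading 0), or hexadecimal (leading 0x) text.
// Returns 0 and stores the value in *out on success; returns -1 if the
// text is empty, has trailing junk, or is out of range.
int parse_number(const char *text, long long *out)
{
  char *end;

  assert(text && out);
  errno = 0;
  *out = strtoll(text, &end, 0);  // base 0 = pick the base from the prefix
  if (end == text || *end || errno == ERANGE) return -1;

  return 0;
}
```

With this, "1", "07", and "0xDEADBEEF" all parse, while something like "12abc" is rejected because of the trailing characters.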

Everything after "help" is used as both the kconfig help and the command help (toybox command --help / toybox help command)

Code: Select all

#define FOR_hello
#include "toys.h"

// Hello doesn't use these globals, they're here for example/skeleton purposes.

GLOBALS(
  char *b_string;
  long c_number;
  struct arg_list *d_list;
  long e_count;
  char *also_string;
  char *blubber_string;

  int more_globals;
)

GLOBALS() is a macro that expands to a struct specific to the applet.
However, you don't need to worry about the name of the struct:
it is #define'd as "TT", so here you can use "TT.blubber_string" to access the string passed with --blubber.
The count from "e@" is stored in TT.e_count, and so on.
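
If you are curious how a macro like this can work, here is a rough sketch in plain C. The names hello_data, global_union, and this are assumptions made for illustration; the real definitions come from toybox's headers and generated files, and may be arranged differently.

```c
#include <assert.h>

// Illustrative stand-in for GLOBALS(): wrap the member list in a
// per-command struct. All names here are assumptions.
#define GLOBALS(...) struct hello_data { __VA_ARGS__ };

GLOBALS(
  char *b_string;
  long c_number;
  long e_count;
)

// One union shared by every command, so only the running command's
// globals occupy the storage.
union global_union {
  struct hello_data hello;
} this;

// The short alias that the command's code uses.
#define TT this.hello

// Code reads and writes its globals through TT.
long bump_count(void)
{
  return ++TT.e_count;
}

// TT.c_number and this.hello.c_number are the same storage.
void alias_demo(void)
{
  TT.c_number = 42;
  assert(this.hello.c_number == 42);
}
```

The point of the alias is that every toy can say "TT.something" without knowing or caring what its struct is called.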

Now, perhaps you're wondering about the order.
Whenever you have a modifier that saves some information (:@#*), it gets saved in the GLOBALS().
The order in the "option string" is the reverse of the order in GLOBALS();
the first variable in GLOBALS() is always the last modifier in the option string.
":" always maps to char *; "@" and "#" are of type long; "*" maps to a pointer to a linked list (struct arg_list *).
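
To make the reverse ordering concrete, here is a small sketch that splits an option string and lists the value-storing options in the order GLOBALS() expects them. It only understands the subset of the syntax described in this post, and it assumes every "(name)" is a standalone long option, as in the hello example. split_optstring() and globals_order() are hypothetical helpers, not toybox code.

```c
#include <assert.h>
#include <string.h>

#define MAX_OPTS 32

struct opt {
  char name[32];  // "a" for -a, "also" for --also
  char type;      // ':', '#', '@', '*', or 0 if nothing is stored
};

// Split an option string into options, left to right.
// Returns the number of options found.
int split_optstring(const char *s, struct opt *opts)
{
  int n = 0;

  while (*s && n < MAX_OPTS) {
    struct opt *o = &opts[n++];
    size_t len = 1;

    if (*s == '(') {
      const char *close = strchr(s, ')');

      assert(close);  // sketch: an unterminated "(" is a caller bug
      len = close - (s + 1);
      if (len >= sizeof(o->name)) len = sizeof(o->name) - 1;
      memcpy(o->name, s + 1, len);
      s = close + 1;
    } else memcpy(o->name, s++, 1);
    o->name[len] = 0;

    // The first modifier decides the storage type; ';' stores nothing.
    o->type = 0;
    while (*s && strchr(":#@*;", *s)) {
      if (!o->type && *s != ';') o->type = *s;
      s++;
    }
  }

  return n;
}

// Fill slots with the options that store a value, in GLOBALS() order
// (right to left through the option string). Returns the slot count.
int globals_order(const char *optstr, struct opt *slots)
{
  struct opt opts[MAX_OPTS];
  int n = split_optstring(optstr, opts), count = 0;

  while (n--) if (opts[n].type) slots[count++] = opts[n];

  return count;
}
```

Fed the hello option string, this yields b, c, d, e, also, blubber, which matches the field order in the GLOBALS() block above. "walrus" and "a" take no argument, so they get no slot.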

After accounting for everything in the option string in GLOBALS(), you can add pointers or non-array types to the end of it. If you need global storage, this is what you are expected to use.

The rest of the arguments are broken up as follows:
toys.optargs: a char ** (basically, the part of argv that isn't flags, options, and similar).
toys.optc: the number of arguments in toys.optargs
toys.optflags: for every option or long option in the option string, there is a bit set in this.
You can check which options were passed by doing a bitwise AND of this and the automatically generated FLAG_ macros. In the hello example, these are the values:

Code: Select all

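FLAG_a = 1
FLAG_b = 2
FLAG_c = 4
FLAG_d = 8
FLAG_e = 16
FLAG_also = 32
FLAG_blubber = 64
FLAG_walrus = 128

Bits are assigned from right to left through the option string (the same direction as the GLOBALS() order), so the last option, "a", gets the lowest bit.
(toys.optflags & FLAG_e) will be either 0 (no -e) or 16 (at least one -e).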
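
Here is a minimal sketch of the flag test as code. The FLAG_ values below are stand-ins for a hypothetical toy with just -a and -b; in toybox the macros are generated by the build, so treat these definitions and both helper functions as assumptions for illustration.

```c
#include <assert.h>

// Stand-in values for a hypothetical toy with options -a and -b.
// In toybox these come from the build, not from your source file.
#define FLAG_b (1u << 0)
#define FLAG_a (1u << 1)

// Returns 1 if the single-bit flag is set in optflags, else 0.
int flag_is_set(unsigned optflags, unsigned flag)
{
  assert(flag && !(flag & (flag - 1)));  // exactly one bit expected

  return (optflags & flag) != 0;
}

// "Act as if -b was given when there are no options at all."
int want_b(unsigned optflags)
{
  return flag_is_set(optflags, FLAG_b) || !optflags;
}
```

Note that the raw AND gives you the bit's value rather than 1, which is fine inside an if () but matters if you store or compare the result.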

That's the argument parsing part of toybox. Ask questions if you find it unclear.
After this I'd like to start covering the convenience and wrapper functions.
Convenience functions fall into several categories; the main ones are loopfiles, dirtree, and llist/dlist. I'll start with dirtree (probably using lspci or acpi as an example?).

#2 Post by Ibidem »

As an example, here's an explanation of the acpi toy.

Code: Select all

/* acpi.c - show power state
 *
 * Written by Isaac Dunham, 2013
 *
 * No standard.

USE_ACPI(NEWTOY(acpi, "ab", TOYFLAG_USR|TOYFLAG_BIN))

config ACPI
  bool "acpi"
  default y
  help
    usage: acpi [-ab]
    
    Show status of power sources.

    -a	show power adapters
    -b	show batteries
*/

#define FOR_acpi
#include "toys.h"

GLOBALS(
  int ac;
  int bat;
)
This has been cleaned up already, so it's default "y".
The GLOBALS are for counting power sources.

Code: Select all

int read_int_at(int dirfd, char *name)
{
  int fd, ret=0;
  FILE *fil;

  if ((fd = openat(dirfd, name, O_RDONLY)) < 0) return -1;
  fscanf(fil = xfdopen(fd, "r"), "%d", &ret);
  fclose(fil);

  return ret;
}

This function is a simple way of getting an int from a sysfs file.
I use openat() rather than open() because it's rather awkward
to keep building a new path string for every file, but keeping track
of a directory file descriptor is simple.
Then, since fscanf() is the simplest way to read an int, I fdopen() the
file descriptor to get a FILE pointer...
OK, I used xfdopen(). What's the difference?
The x...() functions are "die on error" wrappers.
fdopen() returns a null pointer when it fails, for example if malloc() fails.
Recovering from a failed malloc() means freeing memory somewhere else, and
many programs won't be able to continue anyway. So we just bail here.
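
A "die on error" wrapper is only a few lines. This is a sketch of the idea, not toybox's xfdopen(): the name fdopen_or_die() is made up, and the real wrapper may report the error differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical wrapper in the spirit of the x...() functions:
// either return a usable FILE pointer or exit with a message.
FILE *fdopen_or_die(int fd, const char *mode)
{
  FILE *fp;

  assert(mode);
  fp = fdopen(fd, mode);
  if (!fp) {
    perror("fdopen");
    exit(1);
  }

  return fp;
}
```

The caller never has to check the result, which is why read_int_at() can pass it straight to fscanf().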

Before we continue, let me show you part of the dirtree-related header entries.

Code: Select all

// dirtree.c

// Values returnable from callback function (bitfield, or them together)
// Default with no callback is 0

// Add this node to the tree
#define DIRTREE_SAVE         1
// Recurse into children
#define DIRTREE_RECURSE      2
// Call again after handling all children of this directory
// (Ignored for non-directories, sets linklen = -1 before second call.)
#define DIRTREE_COMEAGAIN    4
// Follow symlinks to directories
#define DIRTREE_SYMFOLLOW    8
// Don't look at any more files in this directory.
#define DIRTREE_ABORT      256

#define DIRTREE_ABORTVAL ((struct dirtree *)1)

struct dirtree {
  struct dirtree *next, *parent, *child;
  long extra; // place for user to store their stuff (can be pointer)
  struct stat st;
  char *symlink;
  int data;  // dirfd for directory, linklen for symlink, -1 = comeagain
  char name[];
};

Now, we get to the interesting part.
This is the callback function.

Code: Select all

int acpi_callback(struct dirtree *tree)
{
  int dfd;

  errno = 0;

  if (tree->name[0]=='.') return 0;

  if (strlen(dirtree_path(tree, NULL)) < 26)
    return DIRTREE_RECURSE | DIRTREE_SYMFOLLOW;
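
The last test is a very crude way of checking whether we are in
the right directory: "/sys/class/power_supply" is 23 characters long, so
a path shorter than 26 characters is the top-level directory itself (or an
entry with a one-character name), not a device entry such as ".../BAT0".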
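
If the length check feels too crude, one alternative is to compare against the directory name explicitly. This is a sketch of that alternative, not what the acpi toy does; is_supply_entry() and SUPPLY_ROOT are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define SUPPLY_ROOT "/sys/class/power_supply"

// Returns 1 if path names an entry directly inside SUPPLY_ROOT
// (for example ".../BAT0"), 0 for the root itself or anything else.
int is_supply_entry(const char *path)
{
  size_t rootlen = strlen(SUPPLY_ROOT);
  const char *name;

  assert(path);
  if (strncmp(path, SUPPLY_ROOT, rootlen)) return 0;
  if (path[rootlen] != '/') return 0;
  name = path + rootlen + 1;

  // Need a non-empty name with no further '/' in it.
  return *name && !strchr(name, '/');
}
```

A callback could then recurse while this returns 0 for the top-level path and start reading files once it returns 1.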

The return value is a bitmask. If we return 0,
the caller goes on to the next entry.
For sysfs, the only hidden entries (that have a leading '.') are
./ and ../; we must return 0 on both of these,
since we don't want to end up in the parent directory or stuck in a loop.
dirtree_notdotdot(tree) should also work to detect this.

When we return DIRTREE_RECURSE, the caller tries to treat the
current entry as a directory.

Without DIRTREE_SYMFOLLOW, the caller treats symlinks as files, and
fails to follow them. Since sysfs uses lots of symlinks, I'm following
them.

Code: Select all


  if (0 <= (dfd = open(dirtree_path(tree, NULL), O_RDONLY))) {
    int fd, len;

    if ((fd = openat(dfd, "type", O_RDONLY)) < 0) goto done;
    len = readall(fd, toybuf, sizeof(toybuf));
    close(fd);
    if (len < 1) goto done;

    if (!strncmp(toybuf, "Battery", 7)) {
      if ((toys.optflags & FLAG_b) || (!toys.optflags)) {
        int cap = 0, curr = 0, max = 0;

        if ((cap = read_int_at(dfd, "capacity")) < 0) {
          if ((max = read_int_at(dfd, "charge_full")) > 0)
            curr = read_int_at(dfd, "charge_now");
          else if ((max = read_int_at(dfd, "energy_full")) > 0)
            curr = read_int_at(dfd, "energy_now");
          if (max > 0 && curr >= 0) cap = 100 * curr / max;
        }
        if (cap >= 0) printf("Battery %d: %d%%\n", TT.bat++, cap);
      }
    } else if (toys.optflags & FLAG_a) {
      int on;

      if ((on = read_int_at(dfd, "online")) >= 0)
        printf("Adapter %d: %s-line\n", TT.ac++, (on ? "on" : "off"));
    }
done:
    close(dfd);
  }
  return 0;
}

That goto found its way in during the cleanup.
Toybox follows the Linux kernel's policy on gotos, which is that they are
suitable for error handling when you need to do a little cleanup before
returning.
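
Stripped down to the pattern itself, goto-based cleanup looks like this. read_attr() is a hypothetical helper written for this post (it is not in toybox); it uses only POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

// Read up to len-1 bytes of dir/name into buf and NUL-terminate it.
// Returns the byte count, or -1 on any failure. Once the directory is
// open, every exit path goes through "done" so it is closed exactly once.
int read_attr(const char *dir, const char *name, char *buf, int len)
{
  int dfd, fd, got = -1;

  assert(dir && name && buf && len > 0);
  if ((dfd = open(dir, O_RDONLY)) < 0) return -1;

  if ((fd = openat(dfd, name, O_RDONLY)) < 0) goto done;
  got = (int)read(fd, buf, len - 1);
  close(fd);
  if (got < 0) goto done;
  buf[got] = 0;

done:
  close(dfd);

  return got;
}
```

Without the label you would need a close(dfd) before every early return, and it is easy to forget one when the function grows.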

Code: Select all


void acpi_main(void)
{
  dirtree_read("/sys/class/power_supply", acpi_callback);
}
And that is how to use dirtree_read(), which is the caller I was mentioning.
Call it with a pathname and callback, and
1) it calls the callback with a "struct dirtree *" of the original path.
2) If the return value indicates that it should recurse, dirtree_read
checks whether it is looking at a directory, symlink, or file.
If it's a directory, or DIRTREE_SYMFOLLOW is set and it's a symlink,
dirtree_read opens the path and repeats (1) for each entry in the directory.
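
The two steps above can be sketched with nothing but POSIX calls. This is not toybox's dirtree code: walk(), the WALK_ constants, and count_all() are hypothetical, the callback only receives a path instead of a struct dirtree, and unlike dirtree this sketch skips "." and ".." itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

// Hypothetical analogues of DIRTREE_RECURSE and DIRTREE_SYMFOLLOW.
#define WALK_RECURSE   2
#define WALK_SYMFOLLOW 8

typedef int (*walk_cb)(const char *path);

// Step 1: call cb(path). Step 2: if it asked to recurse and path is a
// directory (or a symlink to one, with WALK_SYMFOLLOW), repeat step 1
// for every entry inside it.
void walk(const char *path, walk_cb cb)
{
  struct stat st;
  struct dirent *entry;
  DIR *dir;
  int flags, rc;

  assert(path && cb);
  flags = cb(path);
  if (!(flags & WALK_RECURSE)) return;

  // lstat() reports a symlink as a symlink; stat() follows it.
  if (flags & WALK_SYMFOLLOW) rc = stat(path, &st);
  else rc = lstat(path, &st);
  if (rc || !S_ISDIR(st.st_mode) || !(dir = opendir(path))) return;

  while ((entry = readdir(dir))) {
    char child[4096];
    int len;

    if (!strcmp(entry->d_name, ".") || !strcmp(entry->d_name, "..")) continue;
    len = snprintf(child, sizeof(child), "%s/%s", path, entry->d_name);
    if (len < 0 || len >= (int)sizeof(child)) continue;  // path too long
    walk(child, cb);
  }
  closedir(dir);
}

// Example callback: count every path seen and always ask to recurse.
int seen;

int count_all(const char *path)
{
  (void)path;
  seen++;

  return WALK_RECURSE;
}
```

Calling walk("/some/dir", count_all) leaves the number of visited paths in seen, including the starting directory itself.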

And yes: acpi_main is the main function.
When you call a command, the multiplexer calls
void <command>_main(void);
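
For anyone who hasn't seen a multiplexer before, the core idea is a table that maps command names to functions. The sketch below is an assumption-laden illustration, not toybox's implementation: struct command, command_table, and find_command() are made-up names, and the two _main bodies are placeholders.

```c
#include <assert.h>
#include <string.h>

// Placeholder entry points; real toys do their work here.
static int last_run;
void acpi_main(void)  { last_run = 1; }
void hello_main(void) { last_run = 2; }

// Hypothetical command table mapping names to entry points.
struct command {
  const char *name;
  void (*run)(void);
};

static struct command command_table[] = {
  {"acpi", acpi_main},
  {"hello", hello_main},
};

// Look up a command by the last path component of the name the
// program was invoked as. Returns NULL if there is no such command.
struct command *find_command(const char *argv0)
{
  const char *base;
  size_t i;

  assert(argv0);
  base = strrchr(argv0, '/');
  base = base ? base + 1 : argv0;
  for (i = 0; i < sizeof(command_table) / sizeof(*command_table); i++)
    if (!strcmp(base, command_table[i].name)) return &command_table[i];

  return NULL;
}
```

A main() built on this would call find_command(argv[0]) and then run() on the result, which is why a symlink named "acpi" pointing at the multiplexer binary ends up in acpi_main().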
