unsorted C snippets for small/fast static apps

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#21 Post by technosaurus »

still working on syscalls. Here is a wacked out macro that redefines syscall() based on the number of args.

Code: Select all

#define NARGS(...) NARGS_(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0)(__VA_ARGS__)
#define NARGS_(dummy, n1, n2, n3, n4, n5, n6, n, ...) syscall##n
#define syscall(...) NARGS(__VA_ARGS__)
thus syscall(__NR_mycall) becomes syscall0(__NR_mycall)
while syscall(__NR_mycall,a,b,c,d,e,f) becomes syscall6(__NR_mycall,a,b,c,d,e,f)

... and you thought you needed c++ for this :)
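To see the counting trick in isolation, here is a minimal sketch that yields the count itself instead of pasting it onto a function name. `PICK_` and `COUNT_EXTRA` are made-up names, not part of libc.h, and the extra trailing 0 is only there so the variadic part is never empty.

```c
/* Hypothetical helper names; only the technique comes from the post above. */
#define PICK_(dummy, n1, n2, n3, n4, n5, n6, n, ...) n

/* Number of arguments after the first one (0 to 6). */
#define COUNT_EXTRA(...) PICK_(__VA_ARGS__, 6, 5, 4, 3, 2, 1, 0, 0)

/* Each extra argument pushes the 6..0 list one slot to the right,
 * so the value that lands in slot n is the number of extra arguments.
 * The argument names are discarded by the macro and never evaluated. */
enum {
	ONLY_NUMBER = COUNT_EXTRA(nr),
	THREE_ARGS  = COUNT_EXTRA(nr, a, b, c),
	SIX_ARGS    = COUNT_EXTRA(nr, a, b, c, d, e, f)
};
```

Pasting that number onto a prefix, as NARGS_ does with syscall##n, is what selects syscall0 through syscall6.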
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

#22 Post by technosaurus »

here is a good link to some try-catch macros
http://www.di.unipi.it/~nids/docs/longj ... catch.html

#23 Post by technosaurus »

I now have the beginnings of my nano-libc that can be used as a single header file and have added many mime types to the "file --mime-type alternative"

The libc should work with any gcc toolchain and handles all linux syscalls (stat is the only one with the structs predefined though) as well as a couple of string functions and a basic {f}printf - I recommend using musl libc to add any extra functions you may need.

A statically compiled program that uses printf is just over 1kb (other libc implementations are at least 7kb), and a basic write(1,"hello world\n",12); is ~600b without sstrip; running sstrip brings it under half a kb.

ftype.c has also been tested with it and runs 60-120 times faster than `file --mime-type`
Attachments
ftype.c.gz
(10.13 KiB) Downloaded 713 times
libc.h.gz
(5.14 KiB) Downloaded 777 times

goingnuts
Posts: 932
Joined: Sun 07 Dec 2008, 13:33
Contact:

#24 Post by goingnuts »

Cool!
Is it possible to use the libc.h directly as #include "libc.h" or is it better to copy the different functions into the main code?
I tried to substitute the codeblock in ftype.c within #ifdef STANDALONE with #include "libc.h" but got some errors compiling:

Code: Select all

libc.h:554: error: `__NR_write' undeclared
ftype.c:910: error: `__NR_stat' undeclared (first use in this function)
Compiling ftype.c unmodified

Code: Select all

 [diet]  gcc -nostdlib -nostdinc -fno-builtin -Os -fno-asynchronous-unwind-tables -fomit-frame-pointer -fdata-sections -ffunction-sections -Wl,--gc-sections,-s ftype.c -o ftype
with diet libc I get errors:

Code: Select all

ftype.c:(.text+0x0): multiple definition of `_start'
/usr/dietlibc/lib-i386/start.o:(.text+0x0): first defined here
/usr/dietlibc/lib-i386/libc.a(atexit.o): In function `exit':
atexit.c:(.text+0x24): multiple definition of `exit'
but with uclibc & glibc no problems.

#25 Post by technosaurus »

I leave off -nostdinc so that it picks up the __NR_* syscall defines from the Linux headers; see unistd.h (the only include in libc.h). They could be included directly, but I am leaving platform-specific stuff out as much as possible for future ARM support. My latest version of ftype uses only libc.h instead of the built-in standalone code. I think it's currently a good starting point to bootstrap whatever additional functions you may need for a single-purpose app. At least printf doesn't keep your app from running on the stack like most implementations do... at least for now, since only the most common bits are implemented (va_* was the hard part, but it can be used for other functions that need a variable number of args).

When you build with the parameters in the libc.h header it should work the same for any toolchain. Otherwise it pulls in includes from more than just the Linux headers, but I haven't found a good way to get one without the other.

#26 Post by technosaurus »

It's a bit disorganized, but I added a ton of stuff to test.
Edit: ok, really disorganized. I will do a bunch of cleanup and testing before the next version and try to complete the string functions; that will be in a new thread.

The primary purpose is to allow for smaller overheads in static binaries so that multicall binaries are not needed so much. To finish this out, I need to make a toolchain that symlinks all of the stdinc files to libc.h and wraps cc with -nostdlib, -nostdinc and -fno-builtin plus some optimizations.

If anyone uses it and runs into missing structs or defined constants, please post them (figuring out the basic types used in structs can take a lot of time). That also reminds me: I should define these to the basic types as I find them (rather than typedeffing them). Of course, that means you should do development against a "real" C library first for sanity checks (any noted differences will help). Currently only targeting x86, but I will consider basic ARM support (64-bit versions wouldn't make sense, but feel free to fork).

Note: Linux 3.10 is LTS, so it will be the basis; some syscalls may therefore not be available in older kernels. Most static binaries will work unless you try to use a newer syscall like finit_module on an older kernel.
Attachments
libc.h-0.1.gz
(12.4 KiB) Downloaded 737 times

#27 Post by technosaurus »

Here is another set of macros for packed/sparse boolean values useful for storing a large array of flags:

Code: Select all

//the element index is n / bits-per-element; the shift trick is only right for 1, 2 and 4 byte elements
#define getbit(x,n) ((x)[(n)>>((sizeof(*(x))>>1)+3)] &   (1u << ((n)&((sizeof(*(x))<<3)-1))))
#define setbit(x,n) ((x)[(n)>>((sizeof(*(x))>>1)+3)] |=  (1u << ((n)&((sizeof(*(x))<<3)-1))))
#define flpbit(x,n) ((x)[(n)>>((sizeof(*(x))>>1)+3)] ^=  (1u << ((n)&((sizeof(*(x))<<3)-1))))
#define clrbit(x,n) ((x)[(n)>>((sizeof(*(x))>>1)+3)] &= ~(1u << ((n)&((sizeof(*(x))<<3)-1))))

//note: << and >> operate on values, so they do not depend on endianness
// >>3 == /8 and <<3 == *8, but the shifts are faster with optimization off
to initialize a large array of booleans all you need to do is:
unsigned char cbits[]={0,0xF,0,0xFF};
// 00000000 00001111 00000000 11111111 (one group per byte, high bit first)
or for all zeroes
unsigned char cbits[4]={0};
// 00000000 00000000 00000000 00000000
unsigned int ibits[]={0xF0F0F0F0,~0u};
//11110000111100001111000011110000 11111111111111111111111111111111

then just use the macros to access the bitfields
If you will only be accessing 1 type of array like this it may be better to make the macros into proper functions like:

Code: Select all

int getbit(const unsigned char *x, unsigned n){
  return (x[n>>3] >> (n&7)) & 1; //0 or 1
}
Why do this? A boolean normally takes at least 8 bits because it has to be addressable in memory, and a flag stored in an int takes 32 bits on x86(_64).

If you have a large number of flags, in the range of thousands or more, it could make the difference in whether the data stays in cache (or whether the program runs out of memory on constrained systems).
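As a rough sketch of the function form for byte arrays, the full set of operations could look like this. The `bitset_*` names and the `BITSET_BYTES` sizing macro are made up for illustration; the bit layout is the same as in the macros (flag n lives in byte n/8, counted from the low bit).

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Hypothetical names; one bit per flag, packed into unsigned char. */
static inline int bitset_get(const unsigned char *bits, size_t n){
	return (bits[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}
static inline void bitset_set(unsigned char *bits, size_t n){
	bits[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
}
static inline void bitset_clear(unsigned char *bits, size_t n){
	bits[n / CHAR_BIT] &= (unsigned char)~(1u << (n % CHAR_BIT));
}
static inline void bitset_flip(unsigned char *bits, size_t n){
	bits[n / CHAR_BIT] ^= (unsigned char)(1u << (n % CHAR_BIT));
}

/* Bytes needed to hold nflags one-bit flags, rounded up. */
#define BITSET_BYTES(nflags) (((nflags) + CHAR_BIT - 1) / CHAR_BIT)
```

With 8-bit bytes, `unsigned char flags[BITSET_BYTES(5000)] = {0};` takes 625 bytes, where one byte per flag would take 5000.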

#28 Post by technosaurus »

Here is a demo of an array of function pointers that simplifies and expands what most demos will show. The F() macro included should work for most function types

Code: Select all

#include <stdio.h>

int f0(int a, int b){return a+b;} //ADD
int f1(int a, int b){return a-b;} //SUBTRACT
int f2(int a, int b){return a*b;} //MULTIPLY
int f3(int a, int b){return a/b;} //DIVIDE

//use these to access the function in the array
enum{ ADD, SUBTRACT, MULTIPLY, DIVIDE, NUM_FUNCS};

//assign the corresponding functions in same order as enum above 
int (*f[NUM_FUNCS])(int, int) = {f0,f1,f2,f3}; //prototyped so the compiler can check the calls

//this uses the first arg as the function pointer index and passes the remaining args to it
#define F(x,...) (*f[x]) (__VA_ARGS__)

int main(void){
  printf("%d\n", F(ADD, 50, 6) );
  printf("%d\n", F(SUBTRACT, 50, 6) );
  printf("%d\n", F(MULTIPLY, 50, 6) );
  printf("%d\n", F(DIVIDE, 50, 6) );
  return 0;
}
This is useful for iterating on data with different functions, conditionally calling arbitrary functions, implementing an object-based system, or even saving a small amount of space in a shared library (an enum can be 1 byte vs. <long_function_name>)
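The "iterating on data with different functions" case might look like the following sketch. The `binop` typedef, the `apply_all` helper and the `ops` table are hypothetical names chosen for this example.

```c
#include <stddef.h>

typedef int (*binop)(int, int);

static int add(int a, int b){ return a + b; }
static int sub(int a, int b){ return a - b; }
static int mul(int a, int b){ return a * b; }

/* Hypothetical helper: feeds the same operands to every entry of a
 * function table and stores each result in out[]. */
static void apply_all(const binop *table, size_t n, int a, int b, int *out){
	for (size_t i = 0; i < n; i++)
		out[i] = table[i](a, b);
}

static const binop ops[] = { add, sub, mul };
```

Calling `apply_all(ops, 3, 50, 6, out)` fills `out` with 56, 44 and 300; adding an operation only means adding one entry to the table.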

#29 Post by technosaurus »

This code will load a shared library and pass args to a function it contains. Currently only supports char* types, but can add a parser later

Code: Select all

//gcc -rdynamic -o foo foo.c -ldl 
//note: the asm below pushes args cdecl-style, so it only makes sense on 32-bit x86
#include <dlfcn.h> /*for dlopen,dlsym,dlclose*/

int main(int argc, char **argv){

	/* usage: foo library function [args...] */
	if (argc < 3) return 1;

	/* get a "handle" for a shared library*/
	void *handle = dlopen(argv[1], RTLD_LAZY);

	/* make sure we got a handle before continuing*/
	if (! handle) return 1;

	/*undefined, but workable solution : POSIX.1-2003 (Technical
       Corrigendum 1) */
	void* (*f)()=dlsym(handle, argv[2]);
	if (! f){ dlclose(handle); return 1; } /*symbol not found*/

	/*now call the function f(argv[3],argv[4],...,argv[argc-1],NULL); */
	//TODO convert args to unsigned char representations for other types
	while (argc > 2)  /*ugh, have to use asm to preserve stack*/
		asm("push %0"::"r"(argv[argc--])); /*from right to left, starting with the NULL in argv[argc]*/
	asm("call *%0"::"r"(f)); //TODO  "=a"(ret) where is uchar[XXX]

	/*remember that shared library we opened?*/
	dlclose(handle);
	return 0;
}
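Another helper for seeing if a string ends with an extension

Code: Select all

#include <string.h> /*for strlen,strcmp*/

int strhasext(const char *s, const char *ext){
	size_t ls=strlen(s), le=strlen(ext);
	return ls>=le && !strcmp(s+ls-le,ext); /*ls>=le keeps the pointer inside s*/
}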

#30 Post by technosaurus »

This is not the fastest way on _all_ systems, but pretty close to the fast inverse square method and more portable by using a simple binary split method

Code: Select all

//replaces x with the floor of its base-2 logarithm
#define TO_MSB(x) do{int i=(sizeof(x)*8),r=-!x;while(i>>=1)x>>i?x>>=i,r+=i:0;x=r;}while(0)
//same, but stores the result in r (x is still shifted down to 1 as a side effect)
#define MSB(x,r) do{int i=(sizeof(x)*8);r=-!x;while(i>>=1)x>>i?x>>=i,r+=i:0;}while(0)
Explanation:
1. floor of its base-2 logarithm is just a fancy way of saying the position of the most significant bit.

Further info here:
http://codegolf.stackexchange.com/a/35211/21288
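For readers who find the one-line macros hard to follow, here is the same binary split written out as a function. This is only a sketch for 32-bit values, and `floor_log2_u32` is a made-up name.

```c
#include <stdint.h>

/* Hypothetical function form of MSB() for 32-bit values.
 * Returns the index of the highest set bit, or -1 when x is 0. */
static int floor_log2_u32(uint32_t x){
	int r = -!x;                 /* -1 for zero, 0 otherwise */
	for (int i = 16; i; i >>= 1) /* try shifts of 16, 8, 4, 2, 1 */
		if (x >> i){
			x >>= i;             /* the top bit is in the upper part: keep it */
			r += i;              /* and remember how far we shifted */
		}
	return r;
}
```

Each step halves the range that can still contain the top bit, so a 32-bit value takes five tests instead of up to 32.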

Code: Select all

const char *nthstring(const char *s, unsigned n){
	while(n--)while(*s++); //skip n NUL-terminated strings
	return s;
}

//usage:
//static const char strings[]="hello\0world\0test";
//printf("%s\n",nthstring(strings,1));
//> world

stevenhoneyman1
Posts: 4
Joined: Sun 28 Sep 2014, 11:50

#31 Post by stevenhoneyman1 »

technosaurus wrote:Its a bit disorganized but I added a ton of stuff to test
Edit: ok, really disorganized, I will do a bunch of cleanup and testing before the next version and try to complete the string functions and will be in a new thread.

The primary purpose is to allow for smaller overheads in static binaries so that multicall binaries are not needed so much. To finish this out, I need to make a tool chain that symlinks all of the stdinc files to libc.h and wraps cc with nostdlib nostdinc and fnobuiltin and some optimizations.

If anyone uses it and runs into missing structs or defined constants, please post them, (figuring out the basic types used in structs can take a lot of time) which also reminds me, I should define these to the basic types as I find them (rather than typedeffing them) ... Of course that means you should do development against a "real" c library first for sanity checks (any noted differences will help). Currently only targeting x86, but will consider basic arm support (64 bit versions wouldnt make sense, but feel free to fork )

note, linux 3.10 is lts so it will be the basis, thus sycalls may not be available in older kernels... most static binaries will work unless you try to use a newer syscall like finit_module on an older kernel
@technosaurus - I really like these "fake libc" macros, but was wondering (before I spend any more time using/testing the "0.1" version), have you made any updates since this was posted? I had a search but this thread is all that seems to appear.

Thanks :)

#32 Post by technosaurus »

I am working on libc.h again. To be more exact, I am adding x86_64 and networking. My current approach to this is to replace the cumbersome gethostbyname, getaddrinfo and other bloated tools with a simple function that takes a host name, does a DNS query using a public DNS server and returns an IP address. Not only does it make it way smaller, but there is no need to set up a plethora of structs and buffers just to connect to a server.

I may patch the kernel to use it when dhcp is enabled if I can figure out where to wedge it in ... I recall seeing a function that parsed the x.x.x.x formatted strings to an IP address somewhere but can't find it now.

TIP: echo | gcc -E -dM |sort
will give you the gcc predefined macros for your platform (note that it changes when you add some compiler flags)

Here is my alternative method to get an IP address from a hostname:
It is meant for a public DNS server whose address is palindromic, such as Google's 8.8.8.8 (0x08080808) or the 4.2.2.4 (0x04020204) passed in the test below, so the address constant works independent of endianness (arm, mips, x86)

Code: Select all

#include <stdint.h>
#include <unistd.h>      //close
#include <sys/socket.h>  //socket, sendto, recvfrom
#include <netinet/in.h>

//dns is the server address in network byte order
uint32_t host2ip(const char *host, uint32_t dns){
	unsigned char buf[4096]={0}, *bufp=buf, *end;
	const unsigned char *hp=(const unsigned char *) //id, flags (recursion desired), 1 question
		"\0\0" "\x01\0" "\0\x01" "\0\0" "\0\0" "\0\0";
	struct sockaddr_in dest = { //CN 0x72727272 RU 0x3E4C4C3E US2 0x08080808
		.sin_family=AF_INET, .sin_port=htons(53), .sin_addr.s_addr=dns 
	};
	socklen_t destsz=sizeof(dest);
	uint32_t i, j, rdlen, ip=0;
	ssize_t n; //signed, so the error checks below can actually fire
	int	s=socket(AF_INET , SOCK_DGRAM , IPPROTO_UDP);
	if (s<0) return 0;
	for(i=0;i<12;i++) *bufp++=*hp++; //copy header
	i=j=0;
	do{ /* convert www.example.com to 3www7example3com */
		if(host[i]=='.' || !host[i]){ //could use strchrnul() here instead
			*bufp++ = i-j;
			for(;j<i;j++)
				*bufp++=host[j];
			++j;
		}
	}while(host[i++]);
	*bufp++='\0'; //root label ends the name (no padding is needed)
	*(bufp++)=0; *(bufp++)=1; *(bufp++)=0; *(bufp++)=1; //QTYPE=A, QCLASS=IN
	n=sendto(s, buf, bufp-buf, 0, (struct sockaddr*)&dest, destsz);
	if (n < 0) goto IPV4END;
	n=recvfrom(s,buf,512,0,(struct sockaddr*)&dest,&destsz); //plain UDP replies are at most 512 bytes
	if (n < 0) goto IPV4END;
	end=buf+n;
	//the reply echoes the question, so bufp already points at the first answer
	for(i=0;i<buf[7];i++){ //[7] holds num of answers([6] does too but >256?)
		while(bufp<end && *bufp && (*bufp&0xC0)!=0xC0) bufp+=*bufp+1; //skip name labels
		if (end-bufp < 12) break; //not enough left for a record header
		bufp += *bufp ? 2 : 1; //2-byte compression pointer or 1-byte root label
		rdlen = bufp[8]<<8 | bufp[9];
		if (rdlen > (uint32_t)(end-bufp-10)) break; //truncated record
		if(bufp[1] == 1 && rdlen == 4){ //[1] is the low byte of the type: 1 = ipv4 address
			unsigned char *ipp=(unsigned char *)&ip;
			for(j=0;j<4;j++) ipp[j]=bufp[10+j];
			break;
		}
		bufp += 10+rdlen; //skip other records (e.g. alias names)
	}
IPV4END:
	close(s);
	return ip;
}

#ifdef TEST
#include <stdio.h> //printf ... adds ~16k on static musl builds
int main( int argc ,char **argv){
	if (argc < 2) return 1;
	in_addr_t ip=host2ip(argv[1],0x04020204); //4.2.2.4
	if (!ip){
		fputs("host2ip: lookup failed\n", stderr); //errno is not always set here
		return 1;
	}
	unsigned char *b=(unsigned char *)&ip;
	printf("%d.%d.%d.%d\n",b[0],b[1],b[2],b[3]);
	return 0;
}
#endif
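The name-encoding step inside host2ip can be pulled out into its own function, which makes it easy to check without a network. This is a sketch only; `dns_encode_name` is a made-up name, and the extra checks (empty labels, the 63-byte label limit, output capacity) are additions that the original loop does not have.

```c
#include <stddef.h>

/* Hypothetical helper: writes host as DNS labels into out (capacity cap).
 * Returns the encoded length including the final zero byte, or 0 if the
 * name does not fit or a label is empty or longer than 63 bytes. */
static size_t dns_encode_name(const char *host, unsigned char *out, size_t cap){
	size_t len = 0, start = 0, i = 0;
	for (;; i++){
		if (host[i] == '.' || host[i] == '\0'){
			size_t n = i - start;           /* length of this label */
			if (n == 0 || n > 63 || len + n + 2 > cap) return 0;
			out[len++] = (unsigned char)n;  /* length prefix */
			while (start < i) out[len++] = (unsigned char)host[start++];
			start++;                        /* step over the dot */
			if (host[i] == '\0') break;
		}
	}
	out[len++] = 0;                         /* root label terminates the name */
	return len;
}
```

For "www.example.com" this produces the 17 bytes 3 w w w 7 e x a m p l e 3 c o m 0, which is the "3www7example3com" form mentioned in the comment above plus the terminating zero.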

#33 Post by technosaurus »

screw using repeated calls to strcat, strcpy, etc... or bloated sprintf and friends here is a macro to do it all in one go:

Code: Select all

#define strcpyall(buf, ...) do{ \
	char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
	while((s=*ss++)) while(*s)*bp++=*s++; \
	*bp=0; \
}while(0)

#include <stdio.h>

int main(int argc, char **argv){
	char buf[4096], *world=" world!\n";
	strcpyall(buf,"hello", world, "this", " ", "is", " ", "a", " ", "great", world);
	printf("%s", buf);
}
or a slightly slower but checked version that requires char[] instead of allowing char* or char[] (it relies on sizeof(buf) being the array size)

Code: Select all

#define strcpyall_checked(buf, ...) do{ size_t l=sizeof(buf); \
	char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
	while((s=*ss++)) while(*s && l>1){ *bp++=*s++; --l; } /* l>1 leaves room for the NUL */ \
	*bp=0; \
}while(0)
Edit: Cleaned up to remove warnings.

Code: Select all

#include <errno.h>
#define _PASTE(x,y) x##y
#define PASTE(x,y) _PASTE(x,y)

#ifdef __cplusplus
#define ASSERT_ARRAY(x) do { \
	const int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__)=((void*)&(x)==&(x)[0]); \
	typedef struct{ \
		int a :PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__); \
	}a; \
}while(0)
#else
#define ASSERT_ARRAY(x) do{ \
(void)sizeof(struct { \
	int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__) : ((void*)&(x) == &(x)[0]); \
}); \
}while(0)
#endif

#define strcpyALL_CHECKED(buf,offset, ...) do{ \
	ASSERT_ARRAY(buf); \
	char *bp=buf+(size_t)offset; /* make it unsigned to prevent underrun */ \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
	while((s=*ss++)) \
		while((*s)&&(++offset<sizeof(buf)))*bp++=*s++; \
	/* always terminate; offset>=sizeof(buf) afterwards means the output was cut short */ \
	buf[(size_t)offset<sizeof(buf) ? (size_t)offset : sizeof(buf)-1]=0; \
/*	or flag the truncation explicitly: */ \
/*		offset=-1; */ \
/*		errno=ERANGE;*/ \
}while(0)

#define strcpyALL_UNCHECKED(buf, ...) do{ \
	char *bp=buf; \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
	while((s=*ss++)) \
		while(*s) *bp++=*s++; /* copy without the NUL so the next string appends */ \
	*bp=0; \
}while(0)
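If the macro bodies get hard to audit, the same idea also works as a small function plus a thin macro that only builds the argument array. This is a sketch under my own assumptions: `concat_all` and `CONCAT` are made-up names, and returning the untruncated length (in the style of snprintf) is a design choice that is not in the macros above.

```c
#include <stddef.h>

/* Hypothetical function: copies each string of a NULL-terminated list into
 * dst, never writing more than cap bytes, and always NUL-terminates when
 * cap > 0. Returns the length the full result would have needed, so a
 * return value >= cap means the output was truncated. */
static size_t concat_all(char *dst, size_t cap, const char *const *parts){
	size_t need = 0;
	for (; *parts; parts++)
		for (const char *s = *parts; *s; s++, need++)
			if (need + 1 < cap) dst[need] = *s;
	if (cap) dst[need < cap ? need : cap - 1] = '\0';
	return need;
}

/* Same call shape as the macros above; buf must be a real array. */
#define CONCAT(buf, ...) \
	concat_all((buf), sizeof(buf), (const char *const[]){ __VA_ARGS__, NULL })
```

With `char small[8];`, `CONCAT(small, "hello", " ", "world")` returns 11 and leaves "hello w" in the buffer, so the caller can tell that 4 characters were dropped.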


#34 Post by technosaurus »

Here is a preview of some stuff that is cooking:

Code: Select all

//unfortunately gcc has no builtin for stack pointer, so we use assembly
#if defined __x86_64__
	#define STACK_POINTER "rsp"
#elif defined __i386__
	#define STACK_POINTER "esp"
#elif defined __aarch64__
	#define STACK_POINTER "sp"
#elif defined __arm__
	#define  STACK_POINTER "r13"
#endif
char **environ;
int main();
void _start(void){
	//this assumes the compiler has not moved the stack pointer yet (hence -fomit-frame-pointer)
	register long *sp __asm__( STACK_POINTER );
//if you don't use argc, argv or envp/environ,  you can just remove them
	long argc = *sp;
	char **argv = (char **)(sp + 1);
	environ = (char **)(sp + argc + 2); //skip argc, argv[0..argc-1] and the NULL after argv
	exit(main(argc, argv, environ) );
	__builtin_unreachable(); //or for(;;); to shut up gcc
}
I have a condensed format for adding new architectures with most of the details in a tabular format:

Code: Select all

#define ARCH_TEMPLATE stckptr,syscall,callnum,ret,arg1,arg2,arg3,arg4,arg5,arg6,arg7,"memory",...
#define ARCH_ALPHA sp,syscall,v0,v0,a0,a1,a2,a3,a4,a5,a6,"memory"
#define ARCH_ARM   r13,swi 0x0,r7,r0,r0,r1,r2,r3,r4,r5,r6,"memory"
#define ARCH_ARM64 sp,svc 0,x8,x0,x0,x1,x2,x3,x4,x5,0,"memory", \
	"x7","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18"
#define ARCH_AVR32 ???,scall,
#define ARCH_BFIN  SP,excpt 0x0,P0,R0,R0,R1,R2,R3,R4,R5,0,"memory" //more clobs?
#define ARCH_CRIS  ??,break 13,r9,r9??,r10,r11,r12,r13,mof,srp,0,"memory"
#define ARCH_HPPA  %usp,ble 0x100(%sr2,%r0),%r20,%r28,%r26,%r25,%r24,%r23,%r22,%r21,0, \
	"memory","r1","r2","r20","r29","r31"
#define ARCH_IA64  ???,break 0x100000,r15,r10/r8,out0,out1,out2,out3,out4,out5,0,"memory"
#define ARCH_M68K  %sp,trap &0,%d0,%d0,%d1,%d2,%d3,%d4,%d5,%a0,"memory","%d0","%d1","%a0"
#define ARCH_MBLAZ ???,brki r14, 0x8,r12,r12??,r5,r6,r7,r8,r9,r10,0,"memory","r4"
#define ARCH_MIPS  $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory", \
	"$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_MIPS64 $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory"
	// ,"$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_OR1K  ???,l.sys 1,r11,r11,r3,r4,r5,r6,r7,r8,0,"memory","r12","r13","r15","r17","r19","r21","r23","r25","r27","r29","r31"
#define ARCH_PPC   ???,sc,r0,r3,r3,r4,r5,r6,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_PPC64 ???,sc,r0,r3,r3,r4,r5,r6,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_S390  ???,svc 0,r1,r2,r2,r3,r4,r5,r6,r7,0,"memory"
#define ARCH_SH    ???,trapa #,r3,r3??,r4,r5,r6,r7,r0,r1,"memory" //
#define ARCH_SPARC32 ???,t 0x10,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_SPARC64 ???,t 0x6d,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_X8664 rsp,syscall,rax,rax,rdi,rsi,rdx,r10,r8,r9,0,"memory","rcx","r11"
#define ARCH_X86   esp,int $128,eax,eax,ebx,ecx,edx,esi,edi,ebp,0,"memory"
#define ARCH_XTNSA ???,syscall,a2,a2??,a6,a3,a4,a5,a8,a9,0,"memory"
This table is incomplete and some architectures may be completely wrong. It was started from info at http://man7.org/linux/man-pages/man2/syscall.2.html and various ABI descriptions. If you see something wrong, let me know.

At the moment I have stripped out all the internal C functions and am defining them to use builtins (including atomic and Cilk Plus) until someone finds a need for a separate function. So far I have only needed strstr and thus strncmp (but may be able to use __builtin_memcmp?)

Here is an example:

Code: Select all

#ifdef __clang__
	#define HAS(...) __has_builtin(__VA_ARGS__)
#elif defined __GNUC__ //assume gcc ... (where the list came from)
	#define HAS(...) 1
#else
	#define HAS(...) 0
#endif
#if HAS(__builtin_abort)
	#define abort __builtin_abort
#endif
//....
the example downloader compiles to <2kb (stripped) on x86_64 with
gcc -Os -ffreestanding -nostartfiles -nostdlib -fno-asynchronous-unwind-tables -fomit-frame-pointer -mno-accumulate-outgoing-args -finline-small-functions -finline-functions-called-once -o get get.c -s -Wl,--gc-sections,--sort-common,-s -Wall -Wextra

Note: I still need to implement socketcall for x86 (and others) and map the related calls to it if the syscall is not defined
Attachments
bqc.h.gz
added mmx and basic x86 intrinsics since initial upload
(31.41 KiB) Downloaded 425 times
get.c.gz
simple downloader. usage: get host path
get www.puppylinux.com /index.html #note the space after the host
(1.17 KiB) Downloaded 441 times

#35 Post by technosaurus »

compilers are generally bad at creating jump tables for switch() optimization if the case: contains any function.

this would probably get optimized:

Code: Select all

void putstring(unsigned long i){
const char *s;
switch(i){
  case 0: s="zero";break;
  case 1: s="one";break;
  case 2: s="two";break;
  case 3: s="three";break;
  case 4: s="four";break;
  case 5: s="five";break;
  case 6: s="six";break;
  case 7: s="seven";break;
  case 8: s="eight";break;
  case 9: s="nine";break;
  default: s="error";
}
puts(s);
}
But if you replace each s=...; with a puts(...); call, it may compile to the equivalent of a series of if-else statements instead of a jump table. Some compilers will still optimize it, but you can get the same effect in a slightly different way that is cheap on all compilers, just by indexing an array of const strings.

Here is a basic example:

Code: Select all

enum{ZERO,ONE,TWO,THREE,FOUR,FIVE,SIX,SEVEN,EIGHT,NINE,LASTNUM};
const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);
It's not too difficult to follow and can save a lot of code in the long run.

Moose On The Loose
Posts: 965
Joined: Thu 24 Feb 2011, 14:54

#36 Post by Moose On The Loose »

technosaurus wrote:

Code: Select all

const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);
Why the "size_t" for something that will be used as an array index?

BTW: If you need a huge number of strings, there is a way you can compress the strings at the cost of overhead in the display process and work at the compile time.

It is very common to have a bunch of messages with the same words in them. Messages also never contain characters with values of 128 and above. You can dribble the string out with putc(), checking each character as you go. If you see a value of 128 or above, you recurse with (ThisCharacter-128+DICTIONARY_START)
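The dictionary scheme described above could look roughly like this. Everything here is an assumption made for illustration: the `dict` contents, the `expand` name, and writing into a buffer instead of calling putc() (which makes the result easy to check).

```c
#include <stddef.h>

/* Made-up dictionary: byte value 128+k stands for dict[k]. */
static const char *const dict[] = { "file ", "not found", "cannot open " };
#define DICT_LEN (sizeof(dict) / sizeof(dict[0]))

/* Expands msg into out (capacity cap), starting at position pos.
 * Bytes below 128 are copied; bytes of 128 and above recurse into the
 * dictionary, so dictionary entries may themselves contain codes.
 * Returns the new position; out is NUL-terminated when cap > 0. */
static size_t expand(const unsigned char *msg, char *out, size_t cap, size_t pos){
	for (; *msg; msg++){
		if (*msg >= 128){
			if ((size_t)(*msg - 128) < DICT_LEN) /* ignore unknown codes */
				pos = expand((const unsigned char *)dict[*msg - 128], out, cap, pos);
		} else if (pos + 1 < cap){
			out[pos++] = (char)*msg;
		}
	}
	if (pos < cap) out[pos] = '\0';
	return pos;
}
```

The three-byte message "\x82\x80\x81" expands to "cannot open file not found". When writing such literals by hand, note that a hex escape swallows any hex digits that follow it, so a code followed by a letter like 'a' needs the literal split in two.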

#37 Post by technosaurus »

Yes, it seems odd, but if an index is not the equivalent of size_t, the compiler will add an extra MOV instruction to extend it.

Re string compression. I thought about using 0-X for run length encoding and 128-255 for dictionary entries.

#38 Post by technosaurus »

I wrote a macro that implements a buffered replacement for *printf() based on my strcpyALL code that allows (forces) you to do away with format strings altogether.

Code: Select all

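#include <stdio.h>  /* fileno, for FPRINTF below */
#include <unistd.h> /* write */

int write_chars(int fd, const char **a){
	char buf[4096]; /*alignas(PAGESIZE)?*/
	size_t offset=0;
	ssize_t n;
	int ret=0;
	const char *s;
	while((s=*a++)){
		while(*s){
			buf[offset++]=*s++;
			if (offset==sizeof(buf)){ /* buffer full: flush it */
				if ((n=write(fd,buf,offset))<0) return -1;
				ret += n;
				offset=0;
			}
		}
	}
	if (offset){ /* flush whatever is left */
		if ((n=write(fd,buf,offset))<0) return -1;
		ret += n;
	}
	return ret;
}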

#define FDPRINTF(fd,...) write_chars(fd,(const char *[]){__VA_ARGS__,NULL})
#define FPRINTF(fs,...) FDPRINTF(fileno(fs),__VA_ARGS__)
#define PRINTF(...) FDPRINTF(1,__VA_ARGS__)
#define EPRINTF(...) FDPRINTF(2,__VA_ARGS__)
So the format is significantly different from their lower case non-macro counterparts, but the same things can be accomplished.

Code: Select all

printf("start: %d,%d end\n", 0xFFCF, 999);
PRINTF("start : ", itoa(0xFFCF), ",", itoa(999), " : end\n");
So it is really formatted more like C++ cout
...maybe I should rename it accordingly.

I'm working on a sprintf/snprintf replacement next ... not sure if I will be able to combine them or not yet.
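Standard C has no itoa(), so the PRINTF example above needs some integer-to-string helper. Here is one possible sketch; `int_to_str` is a made-up name. It writes into a caller-supplied buffer because every argument of a PRINTF call is evaluated before write_chars runs, so a single static buffer would be overwritten when two numbers appear in one call.

```c
/* Hypothetical helper: formats v in decimal into the caller's buffer and
 * returns a pointer to the first character of the result inside buf.
 * 24 bytes is enough for a sign, the digits of a 64-bit int and the NUL. */
static const char *int_to_str(int v, char buf[24]){
	char *p = buf + 23;
	unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v; /* also safe for INT_MIN */
	*p = '\0';
	do { *--p = (char)('0' + u % 10); } while (u /= 10);  /* digits, last first */
	if (v < 0) *--p = '-';
	return p;
}
```

Usage with the macros above would then be along the lines of `char a[24], b[24]; PRINTF("start: ", int_to_str(0xFFCF, a), ",", int_to_str(999, b), " end\n");`.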

#39 Post by technosaurus »

I was working on PDMP3 and found that pow(x,4.0f/3.0f) was considerably faster when converted to cbrt((x*x)*(x*x)), but then I tried to optimize the cbrt part and combined them as follows:

Code: Select all

/* Description: returns x^(4/3)
 * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209)
 */
static inline float pow43opt2(float x) {
  if (x<2) return x;
  else x*=x,x*=x; //pow(x,4)
  float f3,x2=x+x;
  union {float f; unsigned i;} u = {x};
  u.i = u.i/3 + 0x2a517d3c; //~cbrt(x)
  int accuracy_iterations=2;  //reduce for speed, increase for precision
  while (accuracy_iterations--){ //Lancaster iterations
    f3=u.f*u.f*u.f;
    u.f *= (f3 + x2) / (f3 + f3 + x);
  }
  return u.f;
}
This is roughly 50% faster than using musl's similar cbrtf() function or even gcc's __builtin_cbrtf() ... maybe because it doesn't deal with negative values and over 200% faster if accuracy_iterations=0.

#40 Post by technosaurus »

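I took a look at some of the math functions and came up with a way to make some of them compile either fast or small from the same code, using Taylor series approximation. With this few terms they are only accurate for small |x| (and the atanf series only converges for |x| <= 1).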

Code: Select all

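float inverse_factorial_f[]={ //inverse_factorial_f[n] == 1/n!
  1.0, 1.000000e+00,  5.000000e-01, 1.666667e-01,  4.166667e-02, 8.333333e-03, 1.388889e-03, 1.984127e-04,
  2.480159e-05, 2.755732e-06, //1/8! and 1/9!, which expf reads with max=10
};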

float cosf(float x){
  float xx=-(x*x), term=1, res=1;
  int i, max=8;  //taylor series => 1-x^2/2!+x^4/4!-x^6/6!+x^8/8!...
  for (i=2;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float sinf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3!+x^5/5!-x^7/7!+x^9/9!-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float atanf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3+x^5/5-x^7/7+x^9/9-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)/i;
  return res;
}

float expf(float x){
  float term=x, res=1+x;
  int i, max=10; //taylor series => 1+x+x^2/2!+x^3/3!+x^4/4!+x^5/5!...
  for (i=2;i<max;++i)
    res+=(term*=x)*inverse_factorial_f[i];
  return res;
}
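Since a short Taylor series is only good near zero, one common companion is range reduction. The sketch below is an assumption about how that could be bolted on, not part of the code above: `sin_reduced` is a made-up name, and it simply folds x into roughly [-pi/2, pi/2] before running the same four terms.

```c
/* Hypothetical wrapper: reduces x to k with |k| <= pi/2 (approximately),
 * then evaluates k - k^3/3! + k^5/5! - k^7/7!. The dropped k^9/9! term is
 * below about 2e-4 on that interval. Precision degrades for very large |x|
 * because the reduction itself is done in float. */
static float sin_reduced(float x){
	const float pi = 3.14159265f;
	float k, xx, term, res;
	long n = (long)(x / pi + (x < 0 ? -0.5f : 0.5f)); /* nearest multiple of pi */
	k = x - (float)n * pi;
	if (n & 1) k = -k;                   /* sin(k + n*pi) = -sin(k) for odd n */
	xx = -(k * k); term = k; res = k;
	res += (term *= xx) * (1.0f / 6);    /* -k^3/3! */
	res += (term *= xx) * (1.0f / 120);  /* +k^5/5! */
	res += (term *= xx) * (1.0f / 5040); /* -k^7/7! */
	return res;
}
```

Without the reduction, the four-term series is already far off at x = 10; with it, the result stays within about 1e-3 of the true value for moderate inputs.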
