unsorted C snippets for small/fast static apps

stevenhoneyman1
Posts: 4
Joined: Sun 28 Sep 2014, 11:50

#31 Post by stevenhoneyman1 »

technosaurus wrote:It's a bit disorganized, but I added a ton of stuff to test
Edit: ok, really disorganized, I will do a bunch of cleanup and testing before the next version and try to complete the string functions and will be in a new thread.

The primary purpose is to allow for smaller overheads in static binaries so that multicall binaries are not needed so much. To finish this out, I need to make a toolchain that symlinks all of the standard include files to libc.h and wraps cc with -nostdlib, -nostdinc and -fno-builtin plus some optimizations.

If anyone uses it and runs into missing structs or defined constants, please post them (figuring out the basic types used in structs can take a lot of time), which also reminds me, I should define these to the basic types as I find them (rather than typedeffing them) ... Of course that means you should do development against a "real" C library first for sanity checks (any noted differences will help). Currently only targeting x86, but will consider basic ARM support (64-bit versions wouldn't make sense, but feel free to fork)

Note: Linux 3.10 is LTS so it will be the basis, thus some syscalls may not be available in older kernels... most static binaries will work unless you try to use a newer syscall like finit_module on an older kernel

@technosaurus - I really like these "fake libc" macros, but was wondering (before I spend any more time using/testing the "0.1" version), have you made any updates since this was posted? I searched, but this thread is all that seems to appear.

Thanks :)

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#32 Post by technosaurus »

I am working on libc.h again. To be more exact, I am adding x86_64 and networking. My current approach to this is to replace the cumbersome gethostbyname, getaddrinfo and other bloated tools with a simple function that takes a host name, does a DNS query using a public DNS server and returns an IP address. Not only does it make it way smaller, but there is no need to set up a plethora of structs and buffers just to connect to a server.

I may patch the kernel to use it when dhcp is enabled if I can figure out where to wedge it in ... I recall seeing a function that parsed the x.x.x.x formatted strings to an IP address somewhere but can't find it now.
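
For reference, a dotted-quad parser is only a few lines. This is a minimal sketch and not the kernel function mentioned above; the name parse_ipv4 is made up here. It stores the bytes in the order they appear in the string, so the result should be usable as an address in network byte order whatever the host's endianness.

```c
#include <stdint.h>

/* Hypothetical helper (not from libc.h or the kernel): parse "a.b.c.d"
 * into a 32-bit value whose bytes are a,b,c,d in memory order.
 * Returns 0 on malformed input, so "0.0.0.0" cannot be told apart
 * from an error. */
static uint32_t parse_ipv4(const char *s){
	uint32_t ip = 0;
	unsigned char *out = (unsigned char *)&ip;
	int field;
	for (field = 0; field < 4; field++){
		unsigned val = 0, digits = 0;
		while (*s >= '0' && *s <= '9'){
			val = val * 10 + (unsigned)(*s++ - '0');
			if (++digits > 3 || val > 255) return 0;
		}
		if (!digits) return 0;            /* empty field */
		/* fields 0..2 must end in '.', the last one in the terminator */
		if (*s++ != (field < 3 ? '.' : '\0')) return 0;
		out[field] = (unsigned char)val;
	}
	return ip;
}
```

Something like host2ip(name, parse_ipv4("8.8.8.8")) would then replace the hard-coded constant.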

TIP: echo | gcc -E -dM - | sort
will give you the gcc predefined macros for your platform (note that the list changes when you add some compiler flags)

Here is my alternative method to get an IP address from a hostname:
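It is meant for Google's public DNS at 8.8.8.8 (0x08080808), which is a palindromic IP address, so the constant reads the same regardless of endianness (arm, mips, x86). The test program at the end passes 0x04020204 (4.2.2.4), which is palindromic as well.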

Code: Select all

#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h> //close

uint32_t host2ip(char *host, uint32_t dns){
	unsigned char buf[4096]={0}, *bufp=buf,
		*hp=(unsigned char *)"\0\0" "\x01\0" "\0\x01" "\0\0" "\0\0" "\0\0";
	struct sockaddr_in dest = { //CN 0x72727272 RU 0x3E4C4C3E US2 0x08080808
		.sin_family=AF_INET, .sin_port=htons(53), .sin_addr.s_addr=dns 
	};
	uint32_t i, j, ans, ip=0, destsz=sizeof(struct sockaddr_in);
	int	s=socket(AF_INET , SOCK_DGRAM , IPPROTO_UDP);
	if (s<0) return 0; //nothing to close yet, and close(-1) would clobber errno
	for(i=0;i<12;i++) *bufp++=*hp++; //copy header
	i=j=0;
	do{ /* convert www.example.com to 3www7example3com */
		if(host[i]=='.' || !host[i]){ //could use strchrnul() here instead
			*bufp++ = i-j;
			for(;j<i;j++)
				*bufp++=host[j];
			++j;
		}
	}while(host[i++]);
	*bufp++='\0'; //root label ends the name (DNS questions are not padded, so no alignment byte)
	*(bufp++)=0; *(bufp++)=1; *(bufp++)=0; *(bufp++)=1; //extra Q fields
	//test the return values directly: i is unsigned, so (i < 0) could never be true
	if (sendto(s, buf, bufp-buf, 0, (struct sockaddr*)&dest, destsz) < 0)
		goto IPV4END;
	if (recvfrom(s,buf,sizeof(buf),0,(struct sockaddr*)&dest,(socklen_t*)&destsz) < 0)
		goto IPV4END;
	for(i=0;i<buf[7];i++){ //[7] holds num of answers([6] does too but >256?)
		while(*bufp) ++bufp; //skip names
		ans=bufp[1]; //[1] holds the answer type ([0] does too, but >256???)
		bufp += 10;
		if(ans == 1){ uint32_t j=4; // ipv4 address
			unsigned char *ipp=(unsigned char *)&ip;
			while(j--) *ipp++=*bufp++;
			goto IPV4END;
		}else while(*bufp) ++bufp; //skip (alias) names
	}
IPV4END:
	close(s);
	return ip;
}

#ifdef TEST
#include <stdio.h> //printf ... adds ~16k on static musl builds
int main( int argc ,char **argv){
	if (argc < 2) return 1;
	in_addr_t ip=host2ip(argv[1],0x04020204);
	if (!ip){
		perror("host2ip");
		return 1;
	}
	printf("%d.%d.%d.%d\n",((unsigned char*)&ip)[0],((unsigned char*)&ip)[1],((unsigned char*)&ip)[2],((unsigned char*)&ip)[3]);
	return 0;
}
#endif
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#33 Post by technosaurus »

Screw using repeated calls to strcat, strcpy, etc. or bloated sprintf and friends; here is a macro to do it all in one go:

Code: Select all

#define strcpyall(buf, ...) do{ \
	char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
	while((s=*ss++)) while(*s)*bp++=*s++; *bp=0; \
}while(0)

#include <stdio.h>

int main(int argc, char **argv){
	char buf[4096], *world=" world!\n";
	strcpyall(buf,"hello", world, "this", " ", "is", " ", "a", " ", "great", world);
	printf("%s", buf);
}
or a slightly slower but checked version that requires char[] instead of allowing char* or char[]

Code: Select all

#define strcpyall_checked(buf, ...) do{ int l=sizeof(buf); \
	char *bp=buf, *a[] = { __VA_ARGS__,NULL}, **ss=a, *s; \
	while((s=*ss++)) while(*s && l>1){*bp++=*s++; --l;} *bp=0; /* l stops at 1, leaving room for the terminator */ \
}while(0)
Edit: Cleaned up to remove warnings.

Code: Select all

#include <errno.h>
#define _PASTE(x,y) x##y
#define PASTE(x,y) _PASTE(x,y)

#ifdef __cplusplus
#define ASSERT_ARRAY(x) do { \
	const int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__)=((void*)&(x)==&(x)[0]); \
	typedef struct{ \
		int a :PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__); \
	}a; \
}while(0)
#else
#define ASSERT_ARRAY(x) do{ \
(void)sizeof(struct { \
	int PASTE(x##_must_be_array_not_a_pointer_on_line_,__LINE__) : ((void*)&(x) == &(x)[0]); \
}); \
}while(0)
#endif

#define strcpyALL_CHECKED(buf,offset, ...) do{ \
	ASSERT_ARRAY(buf); \
	char *bp=buf+(size_t)offset; /* make it unsigned to prevent underrun */ \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
	while((s=*ss++)) \
		while((*s)&&(++offset<sizeof(buf)))*bp++=*s++; \
	if (offset<sizeof(buf)) \
		*bp=0; \
/*	else { /* or just leave it alone and check the offset vs sizeof(buf)? */ \
/*		offset=-1; */ \
/*		errno=ERANGE;*/ \
/*	}*/ \
}while(0)

#define strcpyALL_UNCHECKED(buf, ...) do{ \
	char *bp=buf; \
    const char *s, *a[] = { __VA_ARGS__,NULL}, **ss=a; \
	while((s=*ss++)) \
		while((*bp=*s)) ++bp, ++s; /* stay on the terminator so the next string overwrites it */ \
}while(0)
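
A function form of the same idea, as a sketch only: concat_all and CONCAT_ALL are hypothetical names, not part of libc.h. Unlike the macros above, it takes the buffer size explicitly, so it works with pointers as well as arrays and needs no array assertion.

```c
#include <stddef.h>

/* Hypothetical function variant of the checked macro.
 * Copies each string of a NULL-terminated list into buf, never writing
 * more than size bytes, and always terminates when size > 0.
 * Returns the number of characters stored, excluding the terminator. */
static size_t concat_all(char *buf, size_t size, const char *const *parts){
	size_t n = 0;
	const char *s;
	if (!size) return 0;
	while ((s = *parts++))
		while (*s && n + 1 < size)   /* keep one byte for the terminator */
			buf[n++] = *s++;
	buf[n] = 0;
	return n;
}

/* Same call style as the macros: CONCAT_ALL(buf, sizeof buf, "a", "b") */
#define CONCAT_ALL(buf, size, ...) \
	concat_all((buf), (size), (const char *const[]){ __VA_ARGS__, NULL })
```

With an 8-byte buffer, CONCAT_ALL(b, sizeof b, "hello", " ", "world") stores "hello w" and returns 7, so truncation can be detected by comparing the result with the expected length.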

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#34 Post by technosaurus »

Here is a preview of some stuff that is cooking:

Code: Select all

//unfortunately gcc has no builtin for stack pointer, so we use assembly
#if defined __x86_64__
	#define STACK_POINTER "rsp"
#elif defined __i386__
	#define STACK_POINTER "esp"
#elif defined __aarch64__
	#define STACK_POINTER "sp"
#elif defined __arm__
	#define  STACK_POINTER "r13"
#endif
char **environ;
int main();
void _start(void){
	register long *sp __asm__( STACK_POINTER );
//if you don't use argc, argv or envp/environ,  you can just remove them
	long argc = *sp;
	char **argv = (char **)(sp + 1);
	environ = (char **)(sp + argc + 2); //skip argc, argv[0..argc-1] and the NULL that ends argv
	exit(main(argc, argv, environ) );
	__builtin_unreachable(); //or for(;;); to shut up gcc
}
I have a condensed format for adding new architectures with most of the details in a tabular format:

Code: Select all

#define ARCH_TEMPLATE stckptr,syscall,callnum,ret,arg1,arg2,arg3,arg4,arg5,arg6,arg7,"memory",...
#define ARCH_ALPHA sp,syscall,v0,v0,a0,a1,a2,a3,a4,a5,a6,"memory"
#define ARCH_ARM   r13,swi 0x0,r7,r0,r0,r1,r2,r3,r4,r5,r6,"memory"
#define ARCH_ARM64 sp,svc 0,x8,x0,x0,x1,x2,x3,x4,x5,0,"memory", \
	"x7","x9","x10","x11","x12","x13","x14","x15","x16","x17","x18"
#define ARCH_AVR32 ???,scall,
#define ARCH_BFIN  SP,excpt 0x0,P0,R0,R0,R1,R2,R3,R4,R5,0,"memory" //more clobs?
#define ARCH_CRIS  ??,break 13,r9,r9??,r10,r11,r12,r13,mof,srp,0,"memory"
#define ARCH_HPPA  %usp,ble 0x100(%sr2,%r0),%r20,%r28,%r26,%r25,%r24,%r23,%r22,%r21,0, \
	"memory","r1","r2","r20","r29","r31"
#define ARCH_IA64  ???,break 0x100000,r15,r10/r8,out0,out1,out2,out3,out4,out5,0,"memory"
#define ARCH_M68K  %sp,trap &0,%d0,%d0,%d1,%d2,%d3,%d4,%d5,%a0,"memory","%d0","%d1","%a0"
#define ARCH_MBLAZ ???,brki r14, 0x8,r12,r12??,r5,r6,r7,r8,r9,r10,0,"memory","r4"
#define ARCH_MIPS  $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory", \
	"$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_MIPS64 $sp,syscall,$v0,$v0,$a0,$a1,$a2,$a3,$a4,$a5,$a6,"memory"
	// ,"$at","$t0","$t1","$t2","$t3","$t4","$t5","$t6","$t7","$t8","$t9","$hi","$lo"
#define ARCH_OR1K  ???,l.sys 1,r11,r11,r3,r4,r5,r6,r7,r8,0,"memory","r12","r13","r15","r17","r19","r21","r23","r25","r27","r29","r31"
#define ARCH_PPC   ???,sc,r0,r0,r3,r4,r5,r6,,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_PPC64 ???,sc,r0,r0,r3,r4,r5,r6,,r7,r8,0,"memory","cr0","ctr","r8","r9","r10","r11","r12"
#define ARCH_S390  ???,svc 0,r1,r2,r2,r3,r4,r5,r6,r7,0,"memory"
#define ARCH_SH    ???,trapa #,r3,r3??,r4,r5,r6,r7,r0,r1,"memory" //
#define ARCH_SPARC32 ???,t 0x10,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_SPARC64 ???,t 0x6d,g1,o0,o0,o1,o2,o3,o4,o5,0,"memory"
#define ARCH_X8664 rsp,syscall,rax,rax,rdi,rsi,rdx,r10,r8,r9,0,"memory","rcx","r11"
#define ARCH_X86   esp,int $128,eax,eax,ebx,ecx,edx,esi,edi,ebp,0,"memory"
#define ARCH_XTNSA ???,syscall,a2,a2??,a6,a3,a4,a5,a8,a9,0,"memory"
This table is incomplete and some architectures may be completely wrong. It was started from info at http://man7.org/linux/man-pages/man2/syscall.2.html and various ABI descriptions. If you see something wrong, let me know.
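
To make the columns concrete, here is how the x86_64 row might be spelled out by hand as a three-argument wrapper. This is only a sketch: raw_syscall3 is a made-up name, the constraint letters are GCC's x86 ones, and any other target falls back to the C library's syscall() so that the snippet still builds there.

```c
#define _GNU_SOURCE          /* for syscall() in the fallback path */
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical wrapper mirroring the x86_64 row: call number in rax,
 * result in rax, arguments in rdi, rsi, rdx; the syscall instruction
 * clobbers rcx and r11, and "memory" keeps the compiler from caching
 * values across the call. */
static long raw_syscall3(long num, long a1, long a2, long a3){
#if defined(__x86_64__) && defined(__linux__)
	long ret;
	__asm__ volatile ("syscall"
		: "=a"(ret)
		: "a"(num), "D"(a1), "S"(a2), "d"(a3)
		: "rcx", "r11", "memory");
	return ret;   /* a negative errno value on failure; errno is not set */
#else
	return syscall(num, a1, a2, a3);
#endif
}
```

For example, raw_syscall3(SYS_write, 1, (long)"hi\n", 3) writes three bytes to stdout. A generated version would take the register names from the table row instead of hard-coding them.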

At the moment I have stripped out all the internal C functions and am defining them to use builtins (including atomic and Cilk Plus) until someone finds a need for a separate function. So far I have only needed strstr and thus strncmp (but may be able to use __builtin_memcmp?)

Here is an example:

Code: Select all

#ifdef __clang__
	#define HAS(...) __has_builtin(__VA_ARGS__)
#elif defined __GNUC__ //assume gcc ... (where the list came from)
	#define HAS(...) 1
#else
	#define HAS(...) 0
#endif
#if HAS(__builtin_abort)
	#define abort __builtin_abort
#endif
//....
the example downloader compiles to <2kb (stripped) on x86_64 with
gcc -Os -ffreestanding -nostartfiles -nostdlib -fno-asynchronous-unwind-tables -fomit-frame-pointer -mno-accumulate-outgoing-args -finline-small-functions -finline-functions-called-once -o get get.c -s -Wl,--gc-sections,--sort-common,-s -Wall -Wextra

Note: I still need to implement socketcall for x86 (and others) and map the related calls to it if the syscall is not defined
Attachments
bqc.h.gz
added mmx and basic x86 intrinsics since initial upload
(31.41 KiB) Downloaded 425 times
get.c.gz
simple downloader. usage: get host path
get www.puppylinux.com /index.html #note the space after the host
(1.17 KiB) Downloaded 441 times

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#35 Post by technosaurus »

Compilers are generally bad at creating jump tables for switch() when the cases contain function calls.

this would probably get optimized:

Code: Select all

void putstring(unsigned long i){
const char *s;
switch(i){
  case 0: s="zero";break;
  case 1: s="one";break;
  case 2: s="two";break;
  case 3: s="three";break;
  case 4: s="four";break;
  case 5: s="five";break;
  case 6: s="six";break;
  case 7: s="seven";break;
  case 8: s="eight";break;
  case 9: s="nine";break;
  default: s="error";
}
puts(s);
}
But if you were to replace the s=*; with puts(*); it will compile to the equivalent of a series of if-else statements instead of a jump table... some compilers will do this anyhow, but you can do the same thing a slightly different way that is optimized on all compilers just by using an array of const strings.

Here is a basic example:

Code: Select all

enum{ZERO,ONE,TWO,THREE,FOUR,FIVE,SIX,SEVEN,EIGHT,NINE,LASTNUM};
const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);
It's not too difficult to follow and can save a lot of code in the long run.

Moose On The Loose
Posts: 965
Joined: Thu 24 Feb 2011, 14:54

#36 Post by Moose On The Loose »

technosaurus wrote:

Code: Select all

const char *strings[]=
{"zero","one","two","three","four","five","six","seven","eight","nine","error"};
static inline void put_string(size_t x, size_t last){
   puts(strings[ (x < last) ? x : last]);
}
//put_string(ZERO, LASTNUM);
Why the "size_t" for something that will be used as an array index?

BTW: If you need a huge number of strings, there is a way you can compress the strings at the cost of overhead in the display process and work at the compile time.

It is very common to have a bunch of messages with the same words in them. Messages also never contain characters with codes 128 and above. You can dribble the string out with putc(), checking each character as you go. If you see a value of 128 or above, you recurse with (ThisCharacter-128+DICTIONARY_START).
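
A minimal sketch of that scheme in C, under a few assumptions of mine: the dictionary is an array of strings rather than an offset into one blob, entries may reference other entries, and the names dict and put_packed are made up.

```c
#include <stdio.h>

/* Hypothetical dictionary: entry n is referenced by byte value 128+n.
 * Entries may themselves contain references to other entries. */
static const char *const dict[] = {
	"file ",       /* \x80 */
	"not found",   /* \x81 */
	"\x80\x81",    /* \x82 expands to "file not found" */
};
#define DICT_LEN (sizeof dict / sizeof dict[0])

/* Write a packed message, expanding references recursively.
 * Returns the number of characters written; unknown references
 * are skipped. */
static int put_packed(const char *s, FILE *out){
	int n = 0;
	unsigned char c;
	while ((c = (unsigned char)*s++)){
		if (c < 128){ putc(c, out); n++; }
		else if ((size_t)(c - 128) < DICT_LEN)
			n += put_packed(dict[c - 128], out);
	}
	return n;
}
```

put_packed("error: \x82\n", stdout) prints "error: file not found". One catch when writing the packed strings by hand: a hex escape swallows any hex digits that follow it, so "\x80abc" has to be split as "\x80" "abc".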

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#37 Post by technosaurus »

Yes, it seems odd, but if an index is not the equivalent of size_t, the compiler will add an extra MOV instruction to extend it.

Re string compression. I thought about using 0-X for run length encoding and 128-255 for dictionary entries.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#38 Post by technosaurus »

I wrote a macro that implements a buffered replacement for *printf() based on my strcpyALL code that allows (forces) you to do away with format strings altogether.

Code: Select all

int write_chars(int fd, const char **a){
	char buf[4096]; /*alignas(PAGESIZE)?*/
	size_t offset=0;
	int ret=0;
	const char *s;
	while(s=*a++){
		while(*s){
			buf[offset++]=*s++;
			if (offset==sizeof(buf)){
				ret += write(fd,buf,offset);
				offset=0;
			}
		}
	}
	if (offset) ret+=write(fd,buf,offset);
	return ret;
}

#define FDPRINTF(fd,...) write_chars(fd,(const char *[]){__VA_ARGS__,NULL})
#define FPRINTF(fs,...) FDPRINTF(fileno(fs),__VA_ARGS__)
#define PRINTF(...) FDPRINTF(1,__VA_ARGS__)
#define EPRINTF(...) FDPRINTF(2,__VA_ARGS__)
So the format is significantly different from their lower case non-macro counterparts, but the same things can be accomplished.

Code: Select all

printf("start: %d,%d end\n", 0xFFCF, 999);
PRINTF("start : ", itoa(0xFFCF), ",", itoa(999), " : end\n");
So it is really formatted more like C++ cout
...maybe I should rename it accordingly.
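
The itoa() in that example is not part of standard C, and a version that returns a static buffer would break as soon as two conversions appear in one argument list. A possible helper, as a sketch with made-up names (int_to_str, ITOA):

```c
/* Hypothetical helper for the PRINTF example: writes the decimal form
 * of n into buf (12 bytes are enough for a 32-bit int) and returns buf. */
static char *int_to_str(char *buf, int n){
	char tmp[12], *t = tmp, *b = buf;
	unsigned u = n < 0 ? 0u - (unsigned)n : (unsigned)n; /* safe for INT_MIN */
	do *t++ = (char)('0' + u % 10); while (u /= 10);     /* digits, reversed */
	if (n < 0) *b++ = '-';
	while (t != tmp) *b++ = *--t;                        /* un-reverse */
	*b = 0;
	return buf;
}

/* Each use gets its own temporary buffer (a C99 compound literal), so
 * several conversions can share one argument list without clobbering
 * each other. The buffer lives until the enclosing block ends. */
#define ITOA(n) int_to_str((char[12]){0}, (n))
```

The call then reads PRINTF("start : ", ITOA(0xFFCF), ",", ITOA(999), " : end\n");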

I'm working on a sprintf/snprintf replacement next ... not sure if I will be able to combine them or not yet.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#39 Post by technosaurus »

I was working on PDMP3 and found that pow(x,4.0f/3.0f) was considerably faster when converted to cbrt((x*x)*(x*x)), but then I tried to optimize the cbrt part and combined them as follows:

Code: Select all

/* Description: returns x^(4/3)
 * same as cbrt((x*x)*(x*x)), but optimized for the limited cases we handle (integers 0-8209)
 */
static inline float pow43opt2(float x) {
  if (x<2) return x;
  else x*=x,x*=x; //pow(x,4)
  float f3,x2=x+x;
  union {float f; unsigned i;} u = {x};
  u.i = u.i/3 + 0x2a517d3c; //~cbrt(x)
  int accuracy_iterations=2;  //reduce for speed, increase for precision
  while (accuracy_iterations--){ //Lancaster iterations
    f3=u.f*u.f*u.f;
    u.f *= (f3 + x2) / (f3 + f3 + x);
  }
  return u.f;
}
This is roughly 50% faster than using musl's similar cbrtf() function or even gcc's __builtin_cbrtf(), maybe because it doesn't deal with negative values, and over 200% faster if accuracy_iterations=0.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#40 Post by technosaurus »

I took a look at some of the math functions and came up with a way to make some of them compile either fast or small from the same code, using Taylor series approximation.

Code: Select all

float inverse_factorial_f[]={ // 1/n! for n=0..9 (expf reads up to index 9)
  1.0, 1.000000e+00,  5.000000e-01, 1.666667e-01,  4.166667e-02, 8.333333e-03, 1.388889e-03, 1.984127e-04,
  2.480159e-05, 2.755732e-06,
};

float cosf(float x){
  float xx=-(x*x), term=1, res=1;
  int i, max=8;  //taylor series => 1-x^2/2!+x^4/4!-x^6/6!+x^8/8!...
  for (i=2;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float sinf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3!+x^5/5!-x^7/7!+x^9/9!-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)*inverse_factorial_f[i];
  return res;
}

float atanf(float x){
  float xx=-(x*x), term=x, res=x;
  int i, max=8; //taylor series => x-x^3/3+x^5/5-x^7/7+x^9/9-...
  for (i=3;i<max;i+=2)
    res+=(term*=xx)/i;
  return res;
}

float expf(float x){
  float term=x, res=1+x;
  int i, max=10; //taylor series => 1+x+x^2/2!+x^3/3!+x^4/4!+x^5/5!...
  for (i=2;i<max;++i)
    res+=(term*=x)*inverse_factorial_f[i];
  return res;
}

Moose On The Loose
Posts: 965
Joined: Thu 24 Feb 2011, 14:54

#41 Post by Moose On The Loose »

technosaurus wrote:I took a look at some of the math functions and came up with a way to make some of the functions compile fast or small with the same code using taylor series approximation.
On a Pentiuuuuum, it is often faster to do a multiply or divide than to do a table look up. This is because you can get a cache miss on the first access to a table. If the table straddles a page boundary, you can get two misses.

Way back on a Z80, when coding a game, I needed sin() and cos() very inaccurately. I observed that the first half cycle of sin() looks a lot like the shape of X(1-X) from 0 to 1, which worked well enough to look reasonable.
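
In C that trick might look like the sketch below. The factor of 4 (so the peak is 1 rather than 1/4), the name rough_sin and the handling of the second half cycle are my assumptions; the post only gives the shape.

```c
/* Very rough sine, with x measured in half cycles:
 * x in [0,1) covers the positive half, [1,2) the negative half.
 * Approximates sin(pi*x) with the parabola 4x(1-x). */
static float rough_sin(float x){
	float sign = 1.0f;
	if (x >= 1.0f){ x -= 1.0f; sign = -1.0f; }  /* mirror the second half */
	return sign * 4.0f * x * (1.0f - x);
}
```

It is exact at the zero crossings and the peaks; at x = 0.25 it gives 0.75 where the real value is about 0.707, which is plenty for moving sprites around.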

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#42 Post by technosaurus »

Moose On The Loose wrote:On a Pentiuuuuum, it is often faster to do a multiply or divide than to do a table look up. This is because you can get a cache miss on the first access to a table. If the table straddles a page boundary, you can get two misses.
With -O3 these small loops and the lookup tables are unrolled/inlined, so that isn't a problem; with -Os the code is quite a bit smaller... Unlike many implementations that have a ton of compile time options to control which hand optimized implementation to use, I prefer to let the user choose which is more important without much effort. Often this is accomplished using a simplified implementation that the compiler can optimize as desired.

Here is a memset that is more optimized for speed:

Code: Select all

//unlike memset, returns next address after ... useful in memset
__attribute__ ((optimize("3"))) static inline void *mempset64(void *dest, unsigned long long x,unsigned long len){
	unsigned long long *dp=dest;
  while(len--)*dp++=x;
  return dp;
}

__attribute__ ((optimize("3"))) static inline void *mymempset(void *dest,  int x,unsigned long len){ 
  unsigned char *dp = dest;
  while (len && ((unsigned long)dp&7UL)){
    *dp++=x; --len; //align to 8byte boundary without running past len
  }
  if (len>7)  dp = mempset64(dp,(unsigned char)x*0x0101010101010101ULL,len>>3); //set 8 byte chunks, starting at the aligned address
  len &= 7;
  while (len--)
    *dp++=x; //set remaining <8 bytes
  return dp;
}

__attribute__ ((optimize("3"))) static inline void *mymemset(void *dest,  int x,unsigned long len){ 
	(void)mymempset(dest,x,len);
	return dest;
}
this is for 64 bit arches: a memset32 call should probably use x|x<<8|x<<16|x<<24 instead of the magic multiply (except x should be unsigned)
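
A sketch of that 32-bit variant, with hypothetical names (splat32, fill32). The value is narrowed to unsigned char first, so a negative or oversized int cannot leak into the upper bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Replicate the low byte of x into all four bytes of a 32-bit word. */
static uint32_t splat32(int x){
	uint32_t b = (unsigned char)x;
	return b | b << 8 | b << 16 | b << 24;
}

/* Word-at-a-time fill in the style of mempset64: dp must be 4-byte
 * aligned; returns the address just past the last word written. */
static uint32_t *fill32(uint32_t *dp, int x, size_t words){
	uint32_t v = splat32(x);
	while (words--) *dp++ = v;
	return dp;
}
```

splat32(0xAB) yields 0xABABABAB, and splat32(-1) yields 0xFFFFFFFF.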

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#43 Post by technosaurus »

Next task, convert this proxy to use my get.c code.
http://www.murga-linux.com/puppy/viewtopic.php?p=671246

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#44 Post by technosaurus »

gcc and clang reduce the following rotate right/left macros to a single instruction

Code: Select all

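#include <limits.h> //CHAR_BIT
//arguments and result are parenthesized so the macros are safe inside larger expressions
//note: for types at least as wide as int, y==0 shifts by the full width, which is undefined
#define ROL(x,y) (((x)<<(y))|((x)>>((sizeof(x)*CHAR_BIT)-(y))))
#define ROR(x,y) (((x)>>(y))|((x)<<((sizeof(x)*CHAR_BIT)-(y))))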
here are some associated functions:

Code: Select all

static inline unsigned char rolb(const unsigned char x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned char rorb(const unsigned char x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned short rolw(const unsigned short x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned short rorw(const unsigned short x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned roll(const unsigned x,const unsigned char y){
  return ROL(x,y);
}

static inline unsigned rorl(const unsigned x,const unsigned char y){
  return ROR(x,y);
}

static inline unsigned long long rolll(const unsigned long long x,const unsigned char y){
  return ROL(x,y);
}
static inline unsigned long long rorll(const unsigned long long x,const unsigned char y){
  return ROR(x,y);
}
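
One caveat with rotates written as two shifts is a count of 0, which turns the second shift into a shift by the full width. A common way around it is to mask both counts. The sketch below uses made-up names (rotl32, rotr32) and a fixed 32-bit type; GCC and Clang generally still recognise this form as a rotate, but check the assembly for your target.

```c
#include <stdint.h>

/* Rotate with the counts masked to 0..31, so neither shift can reach
 * the full width of the type. -n is well defined for unsigned n. */
static inline uint32_t rotl32(uint32_t x, unsigned n){
	return (x << (n & 31)) | (x >> (-n & 31));
}
static inline uint32_t rotr32(uint32_t x, unsigned n){
	return (x >> (n & 31)) | (x << (-n & 31));
}
```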
I also rewrote most of the ctype functions (all except the wide char functions) to be branchless... it's better for compiler optimizations, especially for vectorizing code to SIMD when things like -mavx2 or other non-baseline x86_64 instruction sets are enabled

Code: Select all

static inline int isalnum(int c){
	return ((unsigned)c-'0' < 10)|(((unsigned)c|32)-'a' < 26);
}

static inline int isalpha(int c){
	return (((unsigned)c|32)-'a' < 26);
}

static inline int isascii(int c){
	return (unsigned)c<128;
}

static inline int isblank(int c){
	return (c==' ')|(c=='\t');
}

static inline int iscntrl(int c){
	return ((unsigned)c < 0x20) | (c == 0x7f);
}

static inline int isdigit(int c){
	return (unsigned)c-'0' < 10;
}

static inline int isgraph(int c){
	return (unsigned)c-0x21 < 0x5e;
}

static inline int islower(int c){
	return (unsigned)c-'a' < 26;
}

static inline int isprint(int c){
	return (unsigned)c-0x20 < 0x5f;
}

static inline int ispunct(int c){
	return ((unsigned)c-0x21 < 0x5e) & //isgraph
	!(((unsigned)c-'0' < 10)|(((unsigned)c|32)-'a' < 26)); //!isalnum

}

static inline int isspace(int c){
	return ((unsigned)c-'\t' < 5)|(c == ' ');
}

static inline int isupper(int c){
	return (unsigned)c-'A' < 26;
}

static inline int isxdigit(int c){
	return ((unsigned)c-'0' < 10) | (((unsigned)c|32)-'a' < 6);
}

static inline int tolower(int c){
	return c | ((isupper(c))<<5);
}

static inline int toupper(int c){
	return c ^ (((unsigned)c-'a' < 26)<<5); //flip bit 5 only when c is lowercase
}
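
Range checks like these are easy to get subtly wrong, and cheap to verify against the C library over all of ASCII. A sketch, with renamed copies (the my_ prefix and ctype_mismatches are made up) so they can coexist with <ctype.h>; my_toupper here flips bit 5 only for lowercase input.

```c
#include <ctype.h>

static inline int my_islower(int c){ return (unsigned)c - 'a' < 26; }
static inline int my_isupper(int c){ return (unsigned)c - 'A' < 26; }
static inline int my_tolower(int c){ return c | (my_isupper(c) << 5); }
static inline int my_toupper(int c){ return c ^ (my_islower(c) << 5); }

/* Counts the ASCII codes on which the branchless versions disagree with
 * the C library. The library's is*() functions may return any nonzero
 * value, so both sides are normalised with ! before comparing. */
static int ctype_mismatches(void){
	int c, bad = 0;
	for (c = 0; c < 128; c++){
		bad += !my_islower(c) != !islower(c);
		bad += !my_isupper(c) != !isupper(c);
		bad += my_tolower(c) != tolower(c);
		bad += my_toupper(c) != toupper(c);
	}
	return bad;
}
```

In the default "C" locale the count should be 0; the same loop extends to the other predicates.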
Sometimes you can't eliminate all of the branches, but you can minimize them. Take strncpy() for example, where all elements before the null terminator are copied and the rest up to "n" are '\0'.

Code: Select all

char *mystrncpy(char * restrict dest, const char * restrict src, size_t n){
  char * restrict dp=dest;
  if (n) do {
    *dp++=*src;
    src+=!!*src; //only increment src pointer till the '\0' is reached
  } while (--n);
  return dest;
}

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

Moved to github

#45 Post by technosaurus »

I am moving development of libc.h to github and renaming it to
Brad's Quixotic C

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#46 Post by technosaurus »

I recently removed most of the __builtin_*(...) wrappers because there is no standardized way to check for them (yet another thing to suggest to the C standards board ... Clang's __has_builtin() would be a good standard). I do plan on putting them back in, but I wanted to have a fallback for unsupported browsers as well as older versions of compilers like gcc-4.2.1.

In case anyone else wants to do something similar, this is how to grok 90% of them.
1. Create a wrapper around the __builtin_*(...)

Code: Select all

v4hi pmulhrw(v4hi a, v4hi b){return __builtin_ia32_pmulhrw(a,b);}
2. Compile it with -S to get the assembly output (or use gcc.godbolt.org)

Code: Select all

pmulhrw:
        movdq2q %xmm1, %mm0
        movdq2q %xmm0, %mm1
        pmulhrw %mm0, %mm1
        movq2dq %mm1, %xmm0
        ret
3. Translate the relevant line(s) of assembly into inline asm (it helps to know the platform's calling convention, so you can tell which lines just move the input parameters and return values)
For this case it is really just:

Code: Select all

         pmulhrw %mm0, %mm1
Which becomes this inline asm:

Code: Select all

v4hi __not_builtin_pmulhrw(v4hi a, v4hi b){__asm("pmulhrw %1, %0":"+y"(a):"y"(b));return a;}
Note that the registers get replaced with %0 and %1 (the operand numbers, in order), and that instead of using "r" for a general purpose register, I used "y" for an MMX register according to https://gcc.gnu.org/onlinedocs/gcc-5.3. ... aints.html

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#47 Post by technosaurus »

I have had a few projects where I needed to share code between C and javascript. Rather than having to update 2 separate files or run a build process to generate both, I came up with some hacks to allow the code to be valid in both:

http://stackoverflow.com/a/35012334/1162141

Code: Select all

/* C comment ends with the '/' on next line but js comment is open  *\
/ //BEGIN C Block
#define function int
/* This ends the original js comment, but we add an opening '/*' for C  */

/*Most compilers can build K&R style C with parameters like this:*/
function volume(x,y,z)/**\
/int x,y,z;/**/
{
  return x*y*z;
}

/**\
/
#undef function
#define var const char**
#define new (const char*[])
#define Array(...)  {__VA_ARGS__}
/**/

var cars = new Array("Ford", "Chevy", "Dodge");

/* Or a more readable version *\
/// BEGIN C Block
#undef var
#undef new
/* END C Block */
You can do something similar for Java by using the "??/" trigraph for the '\'
and setting up some macros and structs with function pointers as they did here.

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#48 Post by technosaurus »

musl libc uses some funky #include + macro hackery to map enums to strings for strerror, and although it is pretty clever, it's not quite obvious what it is doing since the data is in a separate file, so here is a simplified version:

Code: Select all

#define TAG_MAP { \
	_MAP(TAG_BODY,"body"), \
	_MAP(TAG_HEAD,"head"), \
	_MAP(TAG_HTML,"html"), \
	_MAP(TAG_UNKNOWN,"unknown"), \
}

#define _MAP(x,y) x
enum tags TAG_MAP;
#undef _MAP
#define _MAP(x,y) y
const char *tagstrings[] = TAG_MAP;
#undef _MAP
//usage: printf("%s\n",tagstrings[TAG_HTML]);
This could be extended for any amount of tabular data
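
As an illustration of a wider table, here is a three-column version. It uses the closely related "X-macro" form, where the row macro is passed in as a parameter instead of being redefined with #undef. The third column (whether the tag is a void element), TAG_BR and all the helper names are made up for the example.

```c
/* One row per tag: enum name, string, "is a void element" flag. */
#define TAG_TABLE(X) \
	X(TAG_BODY,    "body",    0) \
	X(TAG_BR,      "br",      1) \
	X(TAG_HTML,    "html",    0) \
	X(TAG_UNKNOWN, "unknown", 0)

/* Each selector keeps one column and drops the others. */
#define AS_ENUM(id, name, is_void) id,
#define AS_NAME(id, name, is_void) name,
#define AS_VOID(id, name, is_void) is_void,

enum tag { TAG_TABLE(AS_ENUM) TAG_COUNT };   /* TAG_COUNT = number of rows */
static const char *const   tag_name[] = { TAG_TABLE(AS_NAME) };
static const unsigned char tag_void[] = { TAG_TABLE(AS_VOID) };
```

Adding a tag is then a one-line change, and the enum and both arrays cannot drift out of sync.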

technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO

#49 Post by technosaurus »

... and some more macro hackery

this allows you to reduce multiple 3-line #ifdefs to a single line or even inline them in your functions

Code: Select all

#define PASTE_(x,y) x##y
#define PASTE(x,y) PASTE_(x,y)
#define ENABLED(...) __VA_ARGS__
#define DISABLED(...)
#define NOT_DISABLED ENABLED
#define NOT_ENABLED DISABLED
#define IF_ENABLED(x,...) x(__VA_ARGS__)
#define IF_NOT_ENABLED(x,...) PASTE(NOT_,x)(__VA_ARGS__)
example

Code: Select all

#define PNG_SUPPORT ENABLED
#define JPG_SUPPORT DISABLED
void init(void){
  IF_ENABLED(PNG_SUPPORT, init_png();)
  IF_ENABLED(JPG_SUPPORT, init_jpg();)
  return;
}

int main(void){
	puts("supported types:\n"
		IF_ENABLED(PNG_SUPPORT,     "\tpng supported\n")
		IF_ENABLED(JPG_SUPPORT,     "\tjpeg supported\n")
		IF_NOT_ENABLED(JPG_SUPPORT, IF_NOT_ENABLED(PNG_SUPPORT, "\tnone supported\n"))
	);
}
vs. the traditional way

Code: Select all

#define PNG_SUPPORT
#define JPG_SUPPORT
void init(void){
#ifdef PNG_SUPPORT
  init_png();
#endif
#ifdef JPG_SUPPORT
  init_jpg();
#endif
  return;
}

int main(void){
  puts("supported types:\n"
#ifdef PNG_SUPPORT
    "\tpng supported\n"
#endif
#ifdef JPG_SUPPORT
    "\tjpeg supported\n"
#endif
#if !defined(JPG_SUPPORT) && !defined(PNG_SUPPORT)
    "\tnone supported\n"
#endif	
  );
}
It works for multiple commands as well:

Code: Select all

IF_ENABLED(PNG_SUPPORT,
int *getRGBfromPNG(void *buf, void *return_data){
  //etc...
})

Ibidem
Posts: 549
Joined: Wed 26 May 2010, 03:31
Location: State of Jefferson

#50 Post by Ibidem »

Well, I've been poking at bqc.
So far, I've implemented _socketcall() (looking at musl src/internal/syscall.h to figure out how) and almost all the socketcall wrappers.
I've also discovered a small (*cough*) problem.
With GCC 5.3.x (stock for Alpine Linux) on i386 and the standard flags (-nostdlib -nostartfiles), apparently the argc/argv initialization doesn't work; for example, if I run ./get google.com /index.html it thinks argc is 0.
(I hacked a debug line in to check that.)
Hardcoding the host/url seems to result in a 'working' binary.

Attaching a patch (git format-patch) to fix what I can figure out.
Attachments
socketcall.patch.gz
gzipped _socketcall() implementation, along with socketcall-based networking functions
(2.08 KiB) Downloaded 261 times
