filesystem as a database using only shell

For discussions about programming, programming questions/advice, and projects that don't really have anything to do with Puppy.
User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

filesystem as a database using only shell

#1 Post by technosaurus »

I don't know why this has never been done before, but I can't seem to find anything similar via Google, so here is the template for a shell+filesystem based database (basically just started - still in the planning stages).

Code: Select all

#!/bin/ash

#+-{ALL_DBS}
#  +-{DB}
#    +-{table}
#      +-0 set -- COL1 COL2 ... #header
#      +-1 COL1="" COL2="" ... #more records follow
#  +-transactions #used to undo

ALL_DBS="$HOME/.SHBASE"
DB="${ALL_DBS}/${1:-default}"
[ -d "$TABLE" ] && mkdir -p "$TABLE"

create_table(){
	mkdir -p "$ALL_DBS/$DB"
}

drop_table(){
	echo will hide the associated table by renaming to ".$table_name"
	echo this transaction will be recorded for undoing
}

delete_table(){
	echo will actually delete the table
	echo this is not a recorded transaction since it cannot be undone
}

insert_into(){
	echo will record transaction and add the new record
}

resync_db(){
	echo rewrites all records to remove superseded transactions
	echo also reinitializes transaction table when complete
	echo this will speed up new transactions at the cost of lost history
	echo it can also reduce the size of the DB to possibly fit in inodes
}

select(){
	echo will return the fields that match
}

undo_transaction(){
	echo this will eventually revert transaction $1
}

undo_transactions_since(){
	echo this will eventually revert all transactions since transaction $1
}

update(){
	echo transaction, then fields
}

usage(){
	echo "$0 database action ..."
}

where(){
	echo will return all fields where all $@ matches
}
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#2 Post by jpeps »

ACID compliant?

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#3 Post by technosaurus »

That is the plan.
I've considered using a file for each record, with file 0 as the header, in a format like:
COL1 column_two COL3
so that the column order can be set using set -- $(cat DB/table/0)
Each column name must meet the requirements for a shell variable name.
To change the order, just echo COL3 COL1 column_two > DB/table/0

Then each record is a file, numbered starting at 1000000000000 (room for up to 9 trillion records).
I already planned to allow an internal history for each record:
COL1='hello' column_two='world' COL3='foo'
#transaction-1234
COL3="bar"

So to read the record, it is sourced, and you would get:
. $DB/$TABLE/$recordnum
echo $COL1 $column_two $COL3
hello world bar

Note: this is slightly simplified, as the actual variable names may come from the header file and require an eval echo type of statement.
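
A minimal sketch of what that looks like, assuming the layout above:

Code: Select all

. $DB/$TABLE/$recordnum              #load the record's variables
set -- $(cat $DB/$TABLE/0)           #column order comes from the header file
for COL in "$@"; do
	eval "printf '%s ' \"\$$COL\""   #expand each column name to its value
done; echo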

But to prevent issues like a badly behaved userspace program doing two transactions with a delay between them, allowing a third transaction to slip in between, numeric values can be set as relative deltas:
#transaction-1235
COL4=$((${COL4}-999))

which is what you would use to decrement a balance by 9.99, with amounts stored as integer hundredths.
Shell doesn't directly support float types, so it's integer only - awk or dc could handle float types (in a later version, maybe).
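
A minimal sketch of that fixed-point arithmetic (assuming amounts are kept in hundredths):

Code: Select all

BALANCE=150000                       #1500.00, stored in hundredths
BALANCE=$((BALANCE-999))             #the delta transaction above: minus 9.99
printf '%d.%02d\n' $((BALANCE/100)) $((BALANCE%100))   #prints 1490.01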
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#4 Post by jpeps »

technosaurus wrote:That is the plan
Shouldn't take too much more than a million hours of testing; at least that was the case for the sqlite folks. That's after the app is fully developed, of course.
I think Lobster was working on one a while back. Sqlite is already in base, so why not use it?

Shell doesn't even round; dc isn't much better.

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#5 Post by technosaurus »

I'm just trying to get something that will work on a busybox-only host.
I think that by using the filesystem it can grow indefinitely (limited only by the filesystem) without slowing down much.
I'm actually modelling it after sqlite, but even more simplified.

I may end up using awk for calculations since it can be built into busybox and has a lot of good math functions
calc(){ awk "BEGIN{print $1}"; }
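
For example:

Code: Select all

calc '2.5*3 + 0.14'     #prints 7.64
calc 'sqrt(2)'          #awk's math functions work too: 1.41421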

I have create, insert, and update working:

Code: Select all

#!/bin/ash

#+-{ALL_DBS}
#  +-{DB}
#    +-{table}
#      +-.0 set -- COL1 COL2 ... #header
#      +-.trans #used to undo
#      +-1000000000000 COL1="" COL2="" ... #more records follow


ALL_DBS="$HOME/.SHBASE"

create_table(){		#makes $TABLE with a column for each remaining arg
	mkdir -p "$TABLE"
	echo "COLS='$@'"	> "$TABLE/.0"
	echo 'LAST=999999999999' >> "$TABLE/.0"
}

databases(){		#lists all databases in $ALL_DBS
	for database in "$ALL_DBS/"*;do
		echo ${database##*/}
	done
}

drop_table(){		#hide $TABLE by renaming it to ".table_name"
	echo this transaction will be recorded for undoing
	mv "$TABLE" "$DB/.${TABLE##*/}"
}

delete_table(){		#delete $TABLE for good (not recorded - cannot be undone)
	rm -rf "$TABLE"
}

insert(){		#add a new record to $TABLE from 'COL="value"' args
	. "$TABLE/.0"
	echo "LAST=$(($LAST+1))" >>"$TABLE/.0"
	echo "$@" >> "$TABLE/$(($LAST+1))"
}

resync_db(){
	echo rewrites all records to remove superseded transactions
	echo also reinitializes transaction table when complete
	echo this will speed up new transactions at the cost of lost history
	echo it can also reduce the size of the DB to possibly fit in inodes
}

select(){		#$1=DB $2=table $@ is ???
[ ! "$1" ] && . "$TABLE/.0" && set -- $COLS
[ ! "$RECORDS" ] && RECORDS="*"
#not finished - still thinking for this and where()
}

table_info(){
	. "$TABLE/.0"
	echo $COLS
}

tables(){		#lists all tables in DB
	for TABLE in "$DB/"*;do
		echo ${TABLE##*/}
	done
}

undo_trans(){
	echo this will eventually revert transaction $1
}

undo_since(){
	echo this will eventually revert all transactions since transaction $1
}

update(){ 		#update records with 'COL_NAME="some value"' ...
	for RECORD in $RECORDS; do
		[ ${RECORD##*/} -lt 1000000000000 ] && RECORD="${RECORD%/*}/$((${RECORD##*/}+1000000000000))"
		echo "$@" >> "$RECORD"
	done
}

usage(){
	echo "$0 [database] [table] [record(s)] action ..."
	echo "$FUNCS"
	exit
}

where(){
	echo will return all record numbers where all $@ matches
}

FUNCS=`while read LINE; do case "$LINE" in *"()""{"*)echo "$LINE";;esac;done < "$0"`
ACTIONS=`echo "$FUNCS"|while read LINE; do printf "${LINE%%(*}|";done`"quit"

while [ "$1" ];do
	eval "case \"$1\" in $ACTIONS)break;;esac"
	[ "$DB" ] && [ "$TABLE" ] && [ "$RECORDS" ] && RECORDS="$RECORDS $TABLE/$1"
	[ "$DB" ] && [ "$TABLE" ] && [ ! "$RECORDS" ] && RECORDS="$TABLE/$1"
	[ "$DB" ] && [ ! "$TABLE" ] && TABLE="$DB/$1"
	[ ! "$DB" ] && DB="$ALL_DBS/$1"
	shift
done
case " $DB $TABLE $RECORDS " in *"./"*)echo nice try;exit;;esac
[ ! "$1" ] || [ "$1" == "--help" ] || [ "$1" == "-h" ] && usage
$@
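
Based on the dispatch loop at the bottom, an invocation would look something like this (a hypothetical session, assuming the script is saved as ./shbase):

Code: Select all

./shbase mydb people create_table NAME EMAIL
./shbase mydb people insert NAME='alice' EMAIL='alice@example.com'
./shbase mydb people 1000000000000 update EMAIL='alice@example.net'
./shbase mydb people table_info     #prints: NAME EMAIL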
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

User avatar
sunburnt
Posts: 5090
Joined: Wed 08 Jun 2005, 23:11
Location: Arizona, U.S.A.

#6 Post by sunburnt »

I seem to recall years ago someone did a text file database.

Different from what you're proposing, but maybe simpler?

seaside
Posts: 934
Joined: Thu 12 Apr 2007, 00:19

#7 Post by seaside »

technosaurus,

What? No awk.

I thought it was your favorite :D

Look Mum! No database!

From someone who's been from spreadsheet to database
and back....

Regards,
s

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#8 Post by technosaurus »

awk is perfect for a small single-file database where each table is a file and records have a separator (see the pet package "database" - it could really use some awking), but it is not as good at loading external files on the fly (or I haven't figured out a good way). I wrote one a while back, but that was the one to throw away.

I actually put quite a bit of thought into this structure.
By separating the records out into files and using the filesystem:
- the time to fetch a record is reduced significantly in large databases
- multiple records can be accessed by separate processes
- locking is handled by the filesystem's file locks
- data fields can be of any size without large write penalties
- it can run in a CGI script with busybox httpd
- it is easy to port (I typically prototype my C stuff in shell)
- no special tools are needed to browse (just cd and ls, or a file manager)
- it's easy to integrate with other applications
- it allows encryption if the filesystem is encrypted
- it allows compression if the filesystem supports it (BTRFS)
- it allows extensibility if the filesystem supports it (NFS, GFS, 9p)
- it allows simple backups and failover mechanisms (same as the filesystem)
- you get last access/modify times for free for each record
- file-based tools like inotifyd can act when records change (see the sketch below)
- binary data is just a file name
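
A minimal sketch of the inotifyd trigger idea (busybox applet; the handler name on_change is hypothetical, and the mask letters can vary between busybox versions):

Code: Select all

#on_change is invoked as: ./on_change <events> <watched dir> <subfile>
inotifyd ./on_change "$HOME/.SHBASE/default/mytable:nwd"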

The script itself is unchanged from the listing in post #5 above.
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

User avatar
Flash
Official Dog Handler
Posts: 13071
Joined: Wed 04 May 2005, 16:04
Location: Arizona USA

#9 Post by Flash »

Is this a Relational Database, or just a plain old Database? Is a database related to a Content-Addressable Memory?

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#10 Post by jpeps »

seaside wrote:technosaurus,

What? No awk.

I thought it was your favorite :D

Look Mum! No database!

From someone who's been from spreadsheet to database
and back....

Regards,
s
The point of using a database is the ability to perform complex joins from multiple tables without having to go through all the gymnastics noted in the link. Of course, someone would first have to learn how to use a database.

Examples:

http://zetcode.com/db/sqlite/joins/

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#11 Post by technosaurus »

Flash wrote:Is this a Relational Database, or just a plain old Database? Is a database related to a Content-Addressable Memory?
Well, it already has support for multiple databases, with multiple tables containing multiple records with multiple data fields. All that is required to make it relational is to add a function that sources another record, like:
. DB/Table/record
(pretty simple to do once I get the select function working)
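
A minimal sketch of that relational lookup (the column names are assumptions; ORDER_ID acts as the foreign key):

Code: Select all

. "$DB/customers/1000000000000"      #sets NAME and ORDER_ID
. "$DB/orders/$ORDER_ID"             #sets ITEM and PRICE
echo "$NAME ordered $ITEM for $PRICE"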
Content-Addressable Memory basically comes free if the filesystem supports it for the FAT - it gets you to the records (you still need to process them) - and it can even store the whole thing in RAM if it is mounted on a ramfs or tmpfs.

I mentioned inotifyd earlier, which could be used as an optional trigger daemon if the database is going to be used by external programs (otherwise triggers can be done internally, in a RESTful way, on a per-transaction basis).
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

User avatar
tallboy
Posts: 1760
Joined: Tue 21 Sep 2010, 21:56
Location: Drøbak, Norway

#12 Post by tallboy »

I don't know why this has never been done before, but I can't seem to find anything similar via Google
It has been done.
I have scrutinized several bookmark files going back to 2005, but without any luck. I know there was a database project that was based primarily on Linux builtins, plus some bash commands, but I cannot remember its name and cannot find any link. I wonder if it was a German or Dutch project, used in an educational setting to make students aware of the possibilities that lie under your fingertips, as opposed to a fancy GUI database.
The problem is mainly that the words you would use in a Google query to find such a database are so general, and give so many hits, that searching is almost impossible!

tallboy
True freedom is a live Puppy on a multisession CD/DVD.

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#13 Post by jpeps »

tallboy wrote:
I don't know why this has never been done before, but I can't seem to find anything similar via Google
It has been done.
I have scrutinized several bookmark files going back to 2005, but without any luck.
Fast forward to 2013: There are a zillion sophisticated GUI apps that utilize databases, and you don't have to read a book on SQL commands to use them.
Sqlite has already found its way onto a few billion devices (none of which use a command line). Developers need to understand the language, and the engines themselves are now extremely well tested and reliable (think BIG DATA).

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#14 Post by technosaurus »

Code: Select all

where(){
#stub, but this was the PITA part (todo: add other comparisons and iterate over $@ with shift)
eval val=\$${1%%=*}
[ "$val" = "${1#*=}" ] && echo match
}
usage: where COL="some value"
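
A sketch of the loop that the stub's todo describes - iterating over every COL="value" argument and matching only if all of them hold (my guess at the eventual shape, not the final implementation):

Code: Select all

where(){
	for COND in "$@"; do
		eval val=\$${COND%%=*}              #current value of the named column
		[ "$val" = "${COND#*=}" ] || return 1
	done
	echo match
}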

some other related discussions:
https://news.ycombinator.com/item?id=5229883
http://webdevrefinery.com/forums/topic/ ... ngle-file/

Note: we can use readahead to get in-memory benefits

Another advantage: symlinks give almost-free extra indices - just put them in hidden directories named .NAME.
Large binaries should also be stored in a hidden .BINARY directory.
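
A minimal sketch of such a symlink index (hypothetical record number and NAME value):

Code: Select all

mkdir -p "$TABLE/.NAME"
ln -s "$TABLE/1000000000000" "$TABLE/.NAME/hello"   #index the record by its NAME field
. "$TABLE/.NAME/hello"                              #look it up by name instead of record number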
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

amigo
Posts: 2629
Joined: Mon 02 Apr 2007, 06:52

#15 Post by amigo »

When you first started the thread I thought you had in mind to use no files at all -that entries would consist only of directories and/or links.

Using flat files to hold data becomes a space problem, since every file has a minimum size even if it only contains one character. The block size of the filesystem determines the minimum size requirement -usually at least 4k.

Since the filesystem is _itself_ a database, the structures are already there -in the metadata. But the problem becomes handling very long entries, which could exceed the length allowed for directory names; characters which are not allowed in dirnames; and the maximum number of directory entries allowed by the filesystem.

I know of an implementation of this idea, where lists of files and/or directories must be compared. Each item gets represented as a directory -no real data is ever written.

Since the size allotted for the metadata of each item is constant, the filesystem will always show as being virtually empty. Using 'ls -l' or 'stat' will tell you everything you need to know.
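
A minimal sketch of that comparison scheme (hypothetical set names; each item is an empty directory, so no data blocks are ever allocated):

Code: Select all

mkdir -p set_a/itemX set_b/itemX set_b/itemY
for d in set_a/*; do
	[ -d "set_b/${d##*/}" ] && echo "${d##*/} is in both lists"
done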

Not long ago I found a shell-based database and mentioned it in a database thread here -I can't find the name at the moment, though...

User avatar
tallboy
Posts: 1760
Joined: Tue 21 Sep 2010, 21:56
Location: Drøbak, Norway

#16 Post by tallboy »

technosaurus, no matter how you succeed with your database, it is always a pleasure to read your code!

tallboy
Last edited by tallboy on Wed 15 May 2013, 19:17, edited 1 time in total.
True freedom is a live Puppy on a multisession CD/DVD.

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#17 Post by jpeps »

amigo wrote:When you first started the thread I thought you had in mind to use no files at all -that entries would consist only of directories and/or links.

Using flat files to hold data becomes a space problem, since every file has a minimum size even if it only contains one character. The block size of the filesystem determines the minimum size requirement -usually at least 4k.
That's just the beginning... think about the complexity of memory allocation.

re: planning for ACID compliance. There's the small issue of database integrity. Databases use a complex series of locks, so that the master file maintains integrity while various users are simultaneously reading and writing to it. Issues like that are why they need millions of tests.

For a personal database, there are many ways to create a single file. I have a "MyLibrary" database that creates an XML file which is easily parsed for individual books and their related data.

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#18 Post by technosaurus »

That was the original plan, but then I read about a couple of filesystems that can store small files in the inode table - which is where the records of most well-planned databases should fit. Using files allows it to maintain a history, but if speed and storage are more important than keeping history, it can rewrite the files once they grow too large to fit in the inode (git also uses the filesystem and can do a similar operation, IIRC).
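
A minimal sketch of that rewrite step (my guess at how resync could compact a single record; COLS comes from the table's .0 header):

Code: Select all

compact_record(){                        #$1 = full path to a record file
	. "$1"                               #replay the record's full history
	. "${1%/*}/.0"                       #load COLS from the table header
	OUT=''
	for C in $COLS; do
		eval OUT="\"\$OUT $C='\$$C'\""   #append COL='current value'
	done
	echo "$OUT" > "$1"                   #one line again - history gone, size minimal
}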
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].

jpeps
Posts: 3179
Joined: Sat 31 May 2008, 19:00

#19 Post by jpeps »

technosaurus wrote:That was the original plan, but then I read about a couple of filesystems that can store small files in the inode table - which is where the records of most well-planned databases should fit.
Wouldn't that involve a separate inode for every process of every user of every file?

User avatar
technosaurus
Posts: 4853
Joined: Mon 19 May 2008, 01:24
Location: Blue Springs, MO
Contact:

#20 Post by technosaurus »

jpeps wrote:Wouldn't that involve a separate inode for every process of every user of every file?
Sorry for the delayed reply - this kind of dropped off my radar. But to answer the question: each file gets an inode in the filesystem, but a process would typically only have one file descriptor (or so) open at a time.

This would also be a good way to implement lightweight, parse-free configuration.

The C pseudocode would look like:

Code: Select all

#include <fcntl.h>
#include <unistd.h>

int get_value(const char *path, void *buf, size_t len){
  int fd = open(path, O_RDONLY);
  if (fd < 0) return fd;
  len = read(fd, buf, len);           /* read the whole value in one go */
  close(fd);
  return len;
}
int set_value(const char *path, void *buf, size_t len){
  int fd = open(path, O_CREAT|O_WRONLY|O_TRUNC, 0644);  /* O_CREAT needs a mode */
  if (fd < 0) return fd;
  len = write(fd, buf, len);          /* overwrite the value */
  close(fd);
  return len;
}
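
A usage sketch for those helpers (the config path and colour value are made up):

Code: Select all

#include <stdio.h>

int main(void){
  char color[16] = {0};
  set_value("config/BG_COLOR", (void *)"#336699", 7);       /* hypothetical entry */
  if (get_value("config/BG_COLOR", color, sizeof(color)-1) > 0)
    printf("background: %s\n", color);
  return 0;
}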
So rather than having one file with lots of VAR=value lines,
it would be a directory containing files named VAR, with the value as the contents.
This could be a simple RGB color for a background, a string for a tooltip, or even a struct for a desktop icon.

Now all you have to do is run an inotify watch on the directory in case entries are added/changed/removed, and the config program can be completely separated. The total size of the config data may be a bit larger, but you don't have to rewrite/parse an entire configuration file for every single change, and you don't need to keep global variables around to hold the values (just read 'em when you need 'em). If you are really clever, with a few code changes you can put the type of each variable in the first byte, so that it is simple to automatically build a config GUI (for example, the enum FILENAME would indicate type 1, and the GUI could read the data as a string for a file-selector widget).

Edit: here are some filesystems capable of storing small files inside the inode rather than allocating a whole ??kb block:
ext4, btrfs, zfs, reiserfs, gpfs
Check out my [url=https://github.com/technosaurus]github repositories[/url]. I may eventually get around to updating my [url=http://bashismal.blogspot.com]blogspot[/url].
