About single and double precision

mahaju
Posts: 487
Joined: Mon 11 Oct 2010, 07:11
Location: between the keyboard and the chair

About single and double precision

#1 Post by mahaju »

1. What is single and double precision exactly?
Does single precision mean exactly this format
http://en.wikipedia.org/wiki/Single_pre ... int_format
and nothing else, and is double precision simply a format with twice the number of bits?

2. Or can the terms be used in a more general way? For example, if I have a device that works on 10-bit numbers at a time, can I call it single precision, and if I modify it so that it can work on 20-bit numbers, can I then call it double precision?

Also, a single precision float is 32 bits (4 bytes), which is the size of a number that modern processors can handle at one time. So if I work with an 8086, would single precision imply 16 bits? I think this was the case before C was standardized, so if that is true, my point 2 above should be valid. Please share your thoughts.

Thank you
:D

ken geometrics
Posts: 76
Joined: Fri 23 Jan 2009, 14:59
Location: California

Re: About single and double precision

#2 Post by ken geometrics »

mahaju wrote:
1. What is single and double precision exactly?
Does single precision mean exactly this format
http://en.wikipedia.org/wiki/Single_pre ... int_format
and nothing else, and is double precision simply a format with twice the number of bits?

As a general rule, floating point numbers these days are stored in the IEEE 754 format. Single precision means a 4-byte (32-bit) float and double precision means an 8-byte (64-bit) float. Many machines also have what is called "long double". On x86-like machines this generally means a 10-byte (80-bit) floating point value.
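
If you want to see what your own compiler and machine actually use, a small sketch like the one below reports the sizes and significand widths. The function names are made up for illustration; only `<float.h>`, `sizeof` and `printf` come from standard C.

```c
#include <float.h>
#include <stdio.h>

/* Significand width of each type, in base-FLT_RADIX digits.
   With FLT_RADIX == 2 (the usual case) these are bit counts and
   include the implicit leading bit of the IEEE formats. */
int float_digits(void)       { return FLT_MANT_DIG; }
int double_digits(void)      { return DBL_MANT_DIG; }
int long_double_digits(void) { return LDBL_MANT_DIG; }

/* Print the storage size and significand width of the three types. */
void print_float_formats(void)
{
    printf("float:       %zu bytes, %d digits\n",
           sizeof(float), float_digits());
    printf("double:      %zu bytes, %d digits\n",
           sizeof(double), double_digits());
    printf("long double: %zu bytes, %d digits\n",
           sizeof(long double), long_double_digits());
}
```

Call `print_float_formats()` from `main`. On x86 compilers the reported size of `long double` is often 12 or 16 bytes rather than 10, because the 10-byte value is padded for alignment, and some compilers treat `long double` as a plain double.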

One bad thing about the definition of C is that floating point types are not specified in terms of accuracy and exponent range. This means that code is not really portable to and from machines whose floating point sizes are other than 4 and 8 bytes. It also means that there are some checks on your code that can't really be done at compile time.
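
A partial workaround, assuming a C11 compiler: `<float.h>` does report the precision and exponent range the implementation provides, so code can at least refuse to build on a machine that falls short of what it needs. The thresholds and the helper name below are only examples, not requirements from any standard.

```c
#include <float.h>

/* Compile-time requirements (C11 _Static_assert). The build fails
   with the given message if the target does not meet them. */
_Static_assert(FLT_RADIX == 2,
               "binary floating point required");
_Static_assert(DBL_MANT_DIG >= 53,
               "double must carry at least 53 significand bits");
_Static_assert(DBL_MAX_EXP >= 1024,
               "double must have at least the IEEE double exponent range");

/* Hypothetical helper: the same conditions as a run-time value,
   returning 1 when all of them hold and 0 otherwise. */
int double_meets_requirements(void)
{
    return FLT_RADIX == 2 && DBL_MANT_DIG >= 53 && DBL_MAX_EXP >= 1024;
}
```

This only checks what the types can hold. It does not let you ask the compiler for "a type with at least N digits", which is the part C leaves out.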

On many machines, there is a speed cost to using a 4-byte float in your code. The floating point section may do its work in doubles, so storing a float requires a conversion step. A double goes from the core out to the cache without this step. On x86 machines the 10-byte "long double" is the native format of the floating point section.
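
The conversion step can be made visible in C with a sketch like this one. The helper name `through_float` is made up, and the code only shows that a conversion (with rounding) happens when a value is stored as a float; it says nothing about how many cycles that costs on any particular machine.

```c
/* Hypothetical helper: store x as a 4-byte float, then read it
   back as a double. */
double through_float(double x)
{
    float stored = (float)x;  /* narrowing: rounds to float precision */
    return (double)stored;    /* widening: exact, no further rounding */
}
```

A value such as 0.5 survives the round trip unchanged, while 0.1 does not, because the nearest float to 0.1 differs from the nearest double.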

On modern processors with a floating point section, there is a speed advantage to using floating point values: the integer section can do the address calculation while the floating point section works out the value. The really slow part is loading and storing values, and it gets very slow if there is a cache miss. For this reason, it is best to compute a value rather than look it up in a table, unless the computation takes far too many cycles.
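
As a rough illustration of the two approaches, here is a sketch with a deliberately trivial function (squaring a small integer). All names are made up for the example, and which version is faster depends on the machine and on whether the table is already in the cache, so measure before deciding.

```c
#include <stddef.h>

#define TABLE_SIZE 256

static float square_table[TABLE_SIZE];

/* Fill the table once, before any lookups. */
void init_square_table(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        square_table[i] = (float)i * (float)i;
}

/* Table version: one load from memory, which may miss the cache. */
float square_lookup(unsigned char i)
{
    return square_table[i];
}

/* Computed version: one conversion and one multiply, no table load. */
float square_compute(unsigned char i)
{
    float x = (float)i;
    return x * x;
}
```

Both versions return the same values for every index from 0 to 255; the difference is only in where the time goes.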
