Central processing unit - Design and implementation


Main article: CPU design

Central processing unit - Integer precision

The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common decimal (base ten) numeral system to represent numbers internally. A few other computers have used more exotic numeral systems like ternary (base three). Nearly all modern CPUs represent numbers in binary form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" voltage. [7]
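
To make the idea of a numeral system concrete, here is a short sketch (the function name `to_digits` is hypothetical) that writes the same integer in binary, ternary and decimal. A binary CPU stores each of the base-2 digits as one two-valued physical quantity.

```python
def to_digits(value: int, base: int) -> list[int]:
    """Return the digits of a non-negative integer in `base`, most significant first."""
    if value < 0 or base < 2:
        raise ValueError("expects value >= 0 and base >= 2")
    if value == 0:
        return [0]
    digits = []
    while value > 0:
        value, remainder = divmod(value, base)  # peel off the least significant digit
        digits.append(remainder)
    return digits[::-1]


for base in (2, 3, 10):
    print(base, to_digits(42, base))
# 2 [1, 0, 1, 0, 1, 0]
# 3 [1, 1, 2, 0]
# 10 [4, 2]
```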

Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a bit refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called "word size," "bit width," "data path width," or "integer precision" when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an 8-bit CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2^8, or 256, discrete numbers. In effect, integer precision sets a hardware limit on the range of integers the software run by the CPU can utilize. [8]
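
The following sketch computes how many values fit in a given bit width and shows what the hardware limit means in practice: an addition that exceeds the width wraps around. The signed range assumes two's complement encoding, which the text does not specify; the function names are hypothetical.

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned integers that fit in `bits` binary digits."""
    return 0, 2**bits - 1


def signed_range(bits: int) -> tuple[int, int]:
    """Signed range, ASSUMING two's complement encoding (not stated in the text)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def add_fixed_width(a: int, b: int, bits: int) -> int:
    """Unsigned addition in a `bits`-wide register: the result wraps modulo 2**bits."""
    return (a + b) & (2**bits - 1)


for width in (8, 16, 32, 64):
    # width, number of distinct bit patterns, unsigned range, signed range
    print(width, 2**width, unsigned_range(width), signed_range(width))

print(add_fixed_width(255, 1, 8))  # 0: the carry out of bit 7 is lost
```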

Integer precision can also affect the number of locations in memory the CPU can address (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one octet (8 bits), the maximum quantity of memory that CPU can address is 2^32 octets, or 4 GiB. This is a very simple view of CPU address space, and many modern designs use much more complex addressing methods like paging in order to locate more memory with the same integer precision.
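
The arithmetic for this flat, byte-addressable case can be checked directly. This sketch (hypothetical function name) ignores paging and every other addressing scheme the paragraph mentions.

```python
GIB = 2**30  # one gibibyte, in bytes


def addressable_bytes(address_bits: int, bytes_per_address: int = 1) -> int:
    """Bytes reachable with a flat address of `address_bits` bits.

    Assumes each address names `bytes_per_address` octets (1 = byte-addressable)
    and that no paging or segmentation extends the range.
    """
    return (2**address_bits) * bytes_per_address


print(addressable_bytes(32) // GIB)  # 4
print(addressable_bytes(16))         # 65536
```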

Higher levels of integer precision require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and generally expense. It is therefore common to see 4- or 8-bit microcontrollers used in modern applications, even though CPUs with much higher precision (such as 16, 32, 64, or even 128 bits) are available. The simpler microcontrollers are usually cheaper and use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. In higher-end applications, however, the benefits afforded by the extra precision (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages of both lower and higher bit precisions, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM System/370 used a CPU that was primarily 32-bit, but it used 128-bit precision inside its floating point units to facilitate greater accuracy and range in floating point numbers (Amdahl et al. 1964). Many later CPU designs use a similar mixed bit width, especially when the processor is meant for general-purpose use where a reasonable balance of integer and floating point capability is required.

Central processing unit - Clock rate

Main article: Clock rate

Most CPUs, and indeed most sequential logic devices, are synchronous in nature. [9] That is, they are designed around, and operate on assumptions about, a synchronization signal. This signal, known as a clock signal, usually takes the form of a periodic square wave. By calculating the maximum time that electrical signals take to propagate through the various branches of a CPU's many circuits, the designers can select an appropriate period for the clock signal.

This period must be longer than the time it takes for a signal to move, or propagate, in the worst-case scenario. By setting the clock period to a value well above the worst-case propagation delay, designers can build the entire CPU, and the way it moves data, around the "edges" (the rising and falling transitions) of the clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and from a transistor-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).
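
A minimal sketch of the rule above: the clock period is set by the slowest path plus a safety margin, so every faster path simply waits. The path delays and the 20% margin are hypothetical numbers chosen for illustration; with them, the slowest 2.5 ns path yields a 3.0 ns period, or about 333 MHz.

```python
def min_clock_period_ns(path_delays_ns: list[float], margin: float = 0.2) -> float:
    """Shortest safe clock period: the worst-case path delay plus a fractional margin."""
    worst_case = max(path_delays_ns)
    return worst_case * (1.0 + margin)


def clock_rate_mhz(period_ns: float) -> float:
    """Convert a clock period in nanoseconds to a clock rate in megahertz."""
    return 1_000.0 / period_ns


# Hypothetical propagation delays (ns) through different branches of a CPU.
delays = [1.2, 0.4, 2.5, 0.9]
period = min_clock_period_ns(delays)
print(period, clock_rate_mhz(period))

# Every path gets the same period, so fast paths sit idle for the remainder.
print([round(period - d, 2) for d in delays])  # slack per path, in ns
```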

Architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs, though. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
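
The link between clock rate and heat is often estimated with the standard first-order CMOS switching-power model, P = a · C · V^2 · f, where a is the fraction of the circuit that switches each cycle, C the switched capacitance, V the supply voltage and f the clock rate. This model is not stated in the article; it is a common textbook approximation, and the chip parameters below are hypothetical. With them, the sketch gives roughly 14.4 W, 28.8 W and 43.2 W at 1, 2 and 3 GHz: power grows linearly with clock rate.

```python
def dynamic_power_watts(activity: float, capacitance_f: float,
                        voltage_v: float, freq_hz: float) -> float:
    """First-order CMOS switching power: P = activity * C * V**2 * f."""
    return activity * capacitance_f * voltage_v**2 * freq_hz


# Hypothetical chip: 100 nF switched capacitance, 1.2 V supply, 10% activity factor.
for ghz in (1.0, 2.0, 3.0):
    print(ghz, dynamic_power_watts(0.1, 100e-9, 1.2, ghz * 1e9))
```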

One method of dealing with the switching of unneeded components is a technique called clock gating, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common use outside of very low-power designs. [10] Another method of addressing some of the problems with a global clock signal is to remove the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. Although somewhat uncommon, entire CPUs have been built without a global clock signal. Two notable examples are the ARM-compliant AMULET and the MIPS R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, for example by using asynchronous ALUs in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for embedded computers (Garside et al. 1999).
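
To illustrate what clock gating saves, this sketch counts the clock edges delivered to a few functional units over a short trace, with and without gating. The unit names and busy/idle trace are hypothetical, and the model ignores the enable logic and timing concerns that make real gating hard to implement.

```python
def clock_toggles(busy: dict[str, list[bool]], gated: bool) -> int:
    """Count clock edges delivered to functional units over a per-cycle trace.

    `busy` maps a (hypothetical) unit name to a list saying, for each cycle,
    whether that unit has work. Without gating every unit is clocked every
    cycle; with gating, a unit's clock is switched off in cycles where it is idle.
    """
    total = 0
    for trace in busy.values():
        if gated:
            total += sum(trace)   # clocked only in busy cycles
        else:
            total += len(trace)   # clocked every cycle regardless of use
    return total


trace = {
    "alu": [True, True, False, True],
    "fpu": [False, False, False, True],
    "load_store": [True, False, False, False],
}
print(clock_toggles(trace, gated=False))  # 12
print(clock_toggles(trace, gated=True))   # 5
```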

Central processing unit - Parallelism

Main article: Parallel computing

The basic CPU operation described earlier in the article (see "Central processing unit - CPU operation") is the simplest form that a CPU can take. This type of CPU, usually referred to as subscalar, executes one instruction at a time, operating on one or two pieces of data.

This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result the subscalar CPU gets "hung up" on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach scalar performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).
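
The subscalar ceiling can be shown with simple arithmetic: if each instruction must finish before the next starts, total time is the sum of the instruction latencies, so instructions per cycle (IPC) can reach 1.0 only when every instruction takes a single cycle. The instruction mix below is hypothetical.

```python
def subscalar_ipc(latencies: list[int]) -> float:
    """IPC of a CPU that finishes each instruction before starting the next.

    Each entry is the number of cycles one instruction takes end to end,
    so the total run time is simply their sum.
    """
    return len(latencies) / sum(latencies)


# Hypothetical mix: mostly 1-cycle operations, one 3-cycle multiply, one 4-cycle load.
print(subscalar_ipc([1, 1, 3, 1, 4]))  # 0.5
print(subscalar_ipc([1, 1, 1]))        # 1.0, the scalar ceiling
```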

Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. Instruction level parallelism (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), and thread level parallelism (TLP) aims to increase the number of threads (effectively individual programs) that a CPU can execute simultaneously. The methodologies differ both in how they are implemented and in how effectively they increase the CPU's performance for a given application. [11]

Main articles: Instruction pipelining, Superscalar

One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the previous instruction has finished execution. This is the simplest form of a technique known as instruction pipelining, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.

Pipelining does, however, introduce the possibility for a situation where the result of the previous operation is needed to complete the next operation; a condition often termed data dependency conflict. To cope with this, additional care must be taken to check for these sorts of conditions and delay a portion of the instruction pipeline if this occurs. Naturally, accomplishing this requires additional circuitry, so pipelined processors are more complex than strictly scalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).
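
A rough cycle count for an in-order pipeline makes both points visible: without hazards, n instructions on a k-stage pipeline take k + (n - 1) cycles, and each data dependency adds stall cycles. This sketch assumes a 5-stage pipeline with a fixed 2-cycle stall per read-after-write dependency on the immediately preceding instruction, and no forwarding; these numbers and the `Instr` type are illustrative assumptions, not taken from any specific CPU.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str             # register written
    srcs: tuple[str, ...]  # registers read


def pipeline_cycles(program: list[Instr], stages: int = 5, stall_per_hazard: int = 2) -> int:
    """Cycles to run `program` on a simple in-order pipeline with `stages` stages.

    The first instruction takes `stages` cycles; each later one retires one cycle
    after its predecessor, plus `stall_per_hazard` cycles whenever it reads a
    register written by the instruction immediately before it (read-after-write).
    """
    if not program:
        return 0
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.dest in cur.srcs:
            stalls += stall_per_hazard
    return stages + (len(program) - 1) + stalls


prog = [
    Instr("r1", ("r2", "r3")),  # r1 = r2 + r3
    Instr("r4", ("r1", "r5")),  # needs r1 from the previous instruction: hazard
    Instr("r6", ("r7", "r8")),  # independent
]
print(pipeline_cycles(prog))  # 5 + 2 + 2 = 9
```

For comparison, a subscalar CPU taking 5 cycles per instruction would need 15 cycles for the same three instructions.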

Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be superscalar include a long instruction pipeline and multiple identical execution units. In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If they can, the CPU dispatches them to any available execution units, allowing several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.
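
A toy in-order dispatcher shows how dependencies limit what a superscalar CPU can issue together. This is a simplified sketch, not how any particular processor works: it assumes identical execution units, checks only read-after-write and write-after-write conflicts inside a group, and never reorders instructions. `Instr` and `dispatch_groups` are hypothetical names.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str
    srcs: tuple[str, ...]


def dispatch_groups(program: list[Instr], width: int = 2) -> list[list[Instr]]:
    """Split a program, in order, into groups issued in the same cycle.

    An instruction joins the current group only if a free execution unit remains
    and it neither reads nor overwrites a register written earlier in the group.
    """
    groups: list[list[Instr]] = []
    current: list[Instr] = []
    written: set[str] = set()
    for ins in program:
        conflicts = ins.dest in written or any(s in written for s in ins.srcs)
        if current and (len(current) == width or conflicts):
            groups.append(current)       # close the group: this cycle is full
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups


prog = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r1", "r5")),  # depends on r1
    Instr("r6", ("r7", "r8")),  # independent of r4, can pair with it
    Instr("r9", ("r4", "r6")),  # depends on both
]
print([[i.dest for i in g] for g in dispatch_groups(prog)])
# [['r1'], ['r4', 'r6'], ['r9']]
```

Four instructions issue in three cycles instead of four; widening `width` does not help here, because the remaining serialization comes from data dependencies.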

Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of CPU cache. It also makes hazard-avoiding techniques like branch prediction, speculative execution, and out-of-order execution crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may or may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies.
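
The article does not say which prediction scheme any CPU uses. As one classic textbook example, this sketch simulates a single 2-bit saturating counter: states 0 and 1 predict "not taken", states 2 and 3 predict "taken", and one surprise outcome does not immediately flip a strongly biased prediction. For a loop branch taken nine times and then not taken once, repeated three times, it predicts 27 of 30 outcomes correctly (0.9).

```python
def predict_accuracy(outcomes: list[bool], initial_state: int = 2) -> float:
    """Fraction of branch outcomes predicted correctly by one 2-bit saturating counter.

    States 0-1 predict not taken, 2-3 predict taken. Each actual outcome moves the
    counter one step toward taken (True) or not taken (False), saturating at 0 and 3.
    """
    if not outcomes:
        return 0.0
    state = initial_state
    correct = 0
    for taken in outcomes:
        prediction = state >= 2
        if prediction == taken:
            correct += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# A loop branch: taken 9 times, then falls through once; the loop runs 3 times.
loop_branch = ([True] * 9 + [False]) * 3
print(predict_accuracy(loop_branch))
```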

In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The original Intel Pentium (P5) had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the Pentium architecture, P6, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.

Both simple pipelining and superscalar design increase a CPU's ILP: pipelining lets a single processor approach a rate of one instruction per cycle (IPC), and superscalar design lets it surpass that rate. [12] Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The very long instruction word (VLIW) strategy makes some ILP explicit in the software itself, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.

Another strategy commonly used to increase the parallelism of CPUs is to include the ability to run multiple threads (programs) at the same time. In general, high-TLP CPUs have been in use much longer than high-ILP ones. Many of the designs pioneered by Cray during the late 1970s and 1980s concentrated on TLP as their primary method of enabling enormous (for the time) computing capability. In fact, TLP in the form of multiple thread execution improvements has been in use since as early as the 1950s (Smotherman 2005). In the context of single processor design, the two main methodologies used to accomplish TLP are Chip-level multiprocessing (CMP) and Simultaneous multithreading (SMT). On a higher level, it is very common to build computers with multiple totally independent CPUs in arrangements like Symmetric multiprocessing (SMP) and Non-uniform memory access (NUMA). [13] Although these are very different means, all of these techniques accomplish the same goal: increasing the number of threads that the CPU(s) can run in parallel.

The CMP and SMP methods of parallelism are similar to one another and the most straightforward. These involve little more conceptually than the use of two or more complete and independent CPUs. In the case of CMP, multiple processor "cores" are included in the same package, sometimes on the very same integrated circuit. [14] SMP, on the other hand, includes multiple independent packages. NUMA is somewhat similar to SMP but uses a nonuniform memory access model. This is important for computers with many CPUs because, under SMP's shared memory model, the shared memory is quickly saturated as processors are added, resulting in significant delays due to CPUs waiting for memory. NUMA is therefore considered a much more scalable model, allowing many more CPUs to be used in one computer than SMP can feasibly support. SMT differs somewhat from other TLP improvements in that it attempts to duplicate as few portions of the CPU as possible. While considered a TLP strategy, its implementation actually more closely resembles superscalar design, and indeed it is often used in superscalar microprocessors (such as IBM's POWER5). Rather than duplicating the entire CPU, SMT designs only duplicate the parts needed for instruction fetching, decoding, and dispatch, as well as things like general-purpose registers. This allows an SMT CPU to keep its execution units busy more often by providing them instructions from two different software threads. Again, this is very similar to the ILP superscalar method, but it simultaneously executes instructions from multiple threads rather than executing multiple instructions from the same thread concurrently.
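
The benefit of SMT can be sketched as issue-slot utilization. Each thread is described by a hypothetical trace of how many instructions it could issue per cycle (zero while stalled); an SMT core fills its shared slots from all threads at once. With the traces below, one thread alone fills 7 of 24 slots, while two threads together fill 16 of 24. The model ignores contention for caches and other shared resources.

```python
def slot_utilization(ready_per_cycle: list[list[int]], width: int = 4) -> float:
    """Fraction of issue slots filled when threads share one core's execution units.

    ready_per_cycle[t][c] is how many instructions thread t could issue in cycle c
    (a hypothetical trace). Each cycle, up to `width` ready instructions from all
    threads are issued together.
    """
    cycles = len(ready_per_cycle[0])
    if any(len(thread) != cycles for thread in ready_per_cycle):
        raise ValueError("all thread traces must cover the same number of cycles")
    filled = 0
    for c in range(cycles):
        ready = sum(thread[c] for thread in ready_per_cycle)
        filled += min(ready, width)
    return filled / (width * cycles)


thread_a = [2, 0, 1, 3, 0, 1]  # zeros: cycles stalled, e.g. waiting on memory
thread_b = [1, 2, 0, 1, 3, 2]
print(slot_utilization([thread_a]))            # one thread alone
print(slot_utilization([thread_a, thread_b]))  # two threads sharing the core
```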

Central processing unit - Vector processors and SIMD

Main articles: Vector processor, SIMD

A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with vectors. The processors discussed earlier are all referred to as some type of scalar device. [15] As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. These two schemes of dealing with data are generally referred to as SIMD (single instruction, multiple data) and SISD (single instruction, single data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a dot product) to be performed on a large set of data. Some classic examples of these types of tasks are multimedia applications (images, video, and sound), as well as many types of scientific and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.
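
A minimal model of the instruction-count difference: an element-wise add is performed in chunks, counting one "instruction" per chunk. `lanes=1` stands for a scalar (SISD) loop and `lanes=4` for a SIMD unit that applies one add to four elements at once (for instance four 32-bit floats in a 128-bit register). Loads, stores and loop overhead are deliberately left out, and the function name is hypothetical.

```python
def vector_add(a: list[float], b: list[float], lanes: int) -> tuple[list[float], int]:
    """Element-wise a + b, counting one instruction per chunk of `lanes` elements."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    result: list[float] = []
    instructions = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))  # one (SIMD) add
        instructions += 1
    return result, instructions


a = [float(i) for i in range(10)]
b = [1.0] * 10
print(vector_add(a, b, lanes=1)[1])  # 10 scalar adds
print(vector_add(a, b, lanes=4)[1])  # 3 SIMD adds (4 + 4 + 2 elements)
```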

Most early vector CPUs, such as the Cray-1, were associated almost exclusively with scientific research and cryptography applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after it became commonplace to include floating point execution units in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications, like Intel's MMX, were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with floating point numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's x86-associated SSE and its successors, SSE2 and SSE3; the PowerPC-related AltiVec (also known as VMX); and MIPS MDMX. [16]
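
Integer-only SIMD still suits some multimedia work, such as adjusting 8-bit pixel values. This sketch emulates the per-lane semantics of a packed, unsigned, saturating byte add, modeled on MMX's PADDUSB instruction: eight bytes in a 64-bit value are added independently, and a lane that would overflow clamps at 255 instead of wrapping or carrying into its neighbor. It is a behavioral illustration in Python, not a use of the real instruction, and the pixel values are hypothetical.

```python
def packed_add_unsigned_saturate(a: int, b: int, lanes: int = 8) -> int:
    """Add two 64-bit values as independent unsigned bytes, clamping each lane at 255."""
    result = 0
    for lane in range(lanes):
        shift = 8 * lane
        x = (a >> shift) & 0xFF
        y = (b >> shift) & 0xFF
        result |= min(x + y, 0xFF) << shift  # saturate instead of wrapping
    return result


pixels = 0x10_20_F0_FF_00_80_7F_01    # eight hypothetical 8-bit pixel values
brighten = 0x20_20_20_20_20_20_20_20  # add 0x20 to every pixel
print(hex(packed_add_unsigned_saturate(pixels, brighten)))  # 0x3040ffff20a09f21
```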




Adapted from the Wikipedia article "Design and implementation", under the GNU Free Documentation License. Please also see http://en.wikipedia.org/wiki
