Design and Implementation of a 2-input Arithmetic and Logic Unit using Quantum-dot Cellular Automata

Clinton Chakma, Fairuza Laila and Ismat Rahman
Department of Computer Science and Engineering, University of Dhaka
*Email: ismat@cse.du.ac.bd
Received on 25 August 2023, Accepted for Publication on 25 January 2024

ABSTRACT
Quantum-dot Cellular Automata (QCA), an emerging nanotechnology based on Coulomb repulsion, has the potential to replace conventional complementary metal-oxide-semiconductor (CMOS) technology. Its distinctive advantages are ultra-low power consumption, fast switching speed, and high-density structures. Building on an extensive literature review, this study proposes an Arithmetic and Logic Unit (ALU) based on QCA principles. The proposed ALU incorporates basic logic gates, an adder, and a subtractor built around a recently proposed three-input XOR gate. The design is simulated in QCA Designer 2.0.3 and compared with established ALUs in terms of cell count, delay, and area. The proposed ALU requires only 277 cells, occupies an area of 0.43 µm², and has a delay of 2.75 clock cycles. These results compare favorably with several existing designs and suggest that the approach can contribute to the advancement of QCA-based digital circuitry.

Keywords: Quantum-dot Cellular Automata, Arithmetic and Logic Unit, QCA Designer 2.0.3, Performance Analysis

1. Introduction
In recent years, Complementary Metal-Oxide-Semiconductor (CMOS) technology has been approaching its limits in the downscaling of transistor size. At the same time, packing an ever larger number of devices into a limited area leaves little room for heat dissipation. Problems such as current leakage and heat dissipation make it difficult for CMOS technology to deliver systems that are both fast and energy efficient [1]. The International Technology Roadmap for Semiconductors (ITRS) has identified Quantum-dot Cellular Automata (QCA) as one of the promising technologies that could replace CMOS, alongside the single-electron transistor (SET), Tunneling Phase Logic (TPL), the Resonant Tunneling Diode (RTD), and carbon nanotubes (CNT). The main advantage of QCA is that its circuits can operate at very high speed with ultra-low power consumption and can be implemented at the nanoscale [2]. Unlike CMOS, in which binary states are represented by voltage levels, QCA represents binary states by the configuration of electrons in quantum dots. Because this configuration is governed by Coulomb interactions between the electron pair, and no current flows from cell to cell, QCA can achieve very high switching speed together with ultra-low power consumption and extremely low heat dissipation.

2. Basics of QCA
2.1 QCA Cell
In 1993, Lent et al. introduced the concept of Quantum-dot Cellular Automata (QCA) [3]. QCA cells are the basic building blocks of digital circuits in QCA technology. A QCA cell consists of four quantum dots, two of which are occupied by two electrons. The electrons can tunnel between the dots within a cell but cannot move from one cell to another. In the ground state, the two electrons sit at diagonally opposite dots, which minimizes their mutual Coulomb repulsion. The two possible diagonal arrangements give the cell two polarizations. When the electrons occupy the upper-left and lower-right dots, the cell has polarization -1, as shown in Fig. 1(b); when they occupy the upper-right and lower-left dots, it has polarization +1, as shown in Fig. 1(a). Polarizations -1 and +1 are mapped to binary 0 and 1, respectively.
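The mapping from electron positions to polarization can be made concrete with the usual charge-based definition, in which polarization is the charge on one diagonal minus the charge on the other, normalized by the total charge. The sketch below is illustrative only: the corner labels `UL`, `UR`, `LR`, `LL` and the helper names are assumptions of this example, not part of any QCA tool.

```python
# Illustrative sketch: polarization of a four-dot QCA cell.
# Assumption: dots are labelled by corner ("UL", "UR", "LR", "LL") and each
# value is the electron occupancy of that dot (0 or 1 for a localized cell).

def polarization(occupancy: dict[str, float]) -> float:
    """Return P in [-1, +1]; +1 when the electrons sit on the UR/LL diagonal."""
    total = sum(occupancy.values())
    if total == 0:
        raise ValueError("cell contains no electrons")
    plus_diagonal = occupancy["UR"] + occupancy["LL"]
    minus_diagonal = occupancy["UL"] + occupancy["LR"]
    return (plus_diagonal - minus_diagonal) / total


def to_bit(p: float) -> int:
    """Map polarization +1 to logic 1 and -1 to logic 0 (P = 0 falls to 0 here)."""
    return 1 if p > 0 else 0


logic_one = {"UL": 0, "UR": 1, "LR": 0, "LL": 1}
logic_zero = {"UL": 1, "UR": 0, "LR": 1, "LL": 0}
print(polarization(logic_one), to_bit(polarization(logic_one)))    # 1.0 1
print(polarization(logic_zero), to_bit(polarization(logic_zero)))  # -1.0 0
```

A cell whose electrons are spread evenly over all four dots (the delocalized case discussed under clocking) yields P = 0, i.e. no definite logic value.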

Fig. 1. Polarization of QCA Cell

2.2 Gates
The NOT gate (inverter) and the three-input majority gate are the two most important gates in QCA. AND and OR gates can easily be constructed from the majority gate, and together these gates provide the foundation for implementing complex digital circuits.

2.2.1 NOT Gate
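In the QCA NOT gate, the output cell is placed diagonally with respect to the input cell. Because diagonally adjacent cells favor opposite polarizations, the electrons in the output cell settle into the arrangement opposite to that of the input cell, which inverts the signal.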
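An idealized way to see why the diagonal placement inverts the signal is to track polarizations along a chain of cells. This Python sketch assumes a simplified rule, not a physical simulation: a side-by-side neighbor copies the polarization of the previous cell and a diagonal neighbor negates it. The function name `propagate` and the coupling labels are hypothetical.

```python
# Idealized sketch of signal propagation along a chain of QCA cells.
# Assumption: side-by-side neighbours copy the previous polarization,
# while a diagonally placed neighbour takes the opposite polarization.

def propagate(p_in: float, couplings: list[str]) -> list[float]:
    """Return the polarization of every cell after the input cell."""
    cells = []
    p = p_in
    for kind in couplings:
        if kind == "side":
            pass        # a wire segment preserves the polarization
        elif kind == "diagonal":
            p = -p      # diagonal coupling inverts it
        else:
            raise ValueError(f"unknown coupling: {kind}")
        cells.append(p)
    return cells

# A short wire followed by a diagonally placed output cell acts as a NOT gate.
print(propagate(+1.0, ["side", "side", "diagonal"]))  # [1.0, 1.0, -1.0]
print(propagate(-1.0, ["side", "side", "diagonal"]))  # [-1.0, -1.0, 1.0]
```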

### 2.2.2 Majority Gate

The majority gate is the fundamental logic element of QCA. It consists of five cells: three input cells, one output cell, and a central device cell that performs the computation. The Coulomb repulsion exerted by the three input cells sets the polarization of the device cell, which in turn determines the output polarization [4]. Equation 1 gives the function of a majority gate with input cells A, B, C and output cell F. Fig. 2 shows two implementations of the majority gate; the second, first introduced by Roohi et al. [5], is adopted in the proposed ALU design because of its convenient input-output connectivity.

\[
F(A, B, C) = AB + BC + CA \quad (1)
\]

\[
F(A, B, 0) = AB \quad (2)
\]

\[
F(A, B, 1) = A + B \quad (3)
\]

![Fig. 2. Majority Gates](image)

Fixing one of the three inputs at polarization -1.00 (logic 0) or +1.00 (logic 1) turns the majority gate into a two-input AND gate or OR gate, respectively, as expressed in Equations 2 and 3.
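Equations 1 to 3 can be checked exhaustively. The following Python sketch (function names are illustrative) evaluates the majority function on bits, confirms that fixing the third input yields AND and OR, and also checks that, in the polarization domain, the majority vote equals the sign of the summed input polarizations, which is a simple idealized picture of the device cell responding to its three neighbors.

```python
from itertools import product

def majority(a: int, b: int, c: int) -> int:
    """Three-input majority function F(A, B, C) = AB + BC + CA (Equation 1)."""
    return (a & b) | (b & c) | (c & a)

def qca_and(a: int, b: int) -> int:
    return majority(a, b, 0)  # third input fixed at -1.00 (logic 0), Equation 2

def qca_or(a: int, b: int) -> int:
    return majority(a, b, 1)  # third input fixed at +1.00 (logic 1), Equation 3

def majority_polarization(pa: float, pb: float, pc: float) -> float:
    """Idealized vote on polarizations: output takes the sign of the sum."""
    return 1.0 if pa + pb + pc > 0 else -1.0

def to_pol(bit: int) -> float:
    return 2.0 * bit - 1.0  # logic 0 -> -1.0, logic 1 -> +1.0

for a, b in product((0, 1), repeat=2):
    assert qca_and(a, b) == (a & b)
    assert qca_or(a, b) == (a | b)
for a, b, c in product((0, 1), repeat=3):
    assert majority_polarization(to_pol(a), to_pol(b), to_pol(c)) == to_pol(majority(a, b, c))
print("Equations 1-3 hold for all input combinations")
```

Because three inputs of ±1 can never sum to zero, the polarization-domain vote is always defined.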

### 2.3 QCA Clocking

Every Quantum-dot Cellular Automata (QCA) circuit requires a clocking mechanism to synchronize and control the flow of information; the clock also supplies the energy for circuit operation. A QCA clocking zone is a group of QCA cells controlled by the same clock signal. QCA circuits use four clocking zones whose clock signals are successively shifted by a quarter period (90°), and each zone passes through four phases: Switch, Hold, Release, and Relax. This arrangement governs the controlled propagation of signals through the circuit.

Clocking in QCA is implemented by modulating the potential barriers between the quantum dots of a cell [6]. When the barriers are low, the electrons are delocalized and the cell has no definite polarization. As the barriers are raised, the electrons localize and the cell adopts a polarization determined by its neighboring cells. At maximum barrier height the cell is latched and can act as a virtual input for the next cell, which frees the actual input cell to accept a new value. This behavior enables pipelining in QCA circuits. The clocking phases and the four clocking zones are illustrated in Fig. 3(a) and Fig. 3(b), respectively.
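A sketch of the four-phase clock, assuming the common convention that each clock zone lags the previous one by a quarter cycle, so that a signal crossing one zone incurs a quarter clock cycle of delay. Under that assumption, a latency of 0.5 clock cycles corresponds to two clock zones and 2.75 clock cycles to eleven. The names `PHASES`, `phase`, and `latency_in_cycles` are hypothetical.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")

def phase(zone: int, quarter_step: int) -> str:
    """Phase of clock zone 0-3 at a given quarter-cycle time step.

    Assumes each zone lags the previous one by a quarter cycle (90 degrees).
    """
    return PHASES[(quarter_step - zone) % 4]

def latency_in_cycles(zones_traversed: int) -> float:
    """Each clock zone a signal passes through adds a quarter clock cycle."""
    return zones_traversed / 4

# While zone 0 is in Hold, zone 1 is in Switch, so its cells polarize
# according to the value held by zone 0.
print([phase(z, 1) for z in range(4)])  # ['Hold', 'Switch', 'Relax', 'Release']
print(latency_in_cycles(2), latency_in_cycles(11))  # 0.5 2.75
```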

![Fig. 3. QCA Clocking](image)

### 3. Related works

In 1993, Tougaw and Lent showed that QCA cells could be used to realize XOR gates and single-bit full adders [3]. A decade later, in 2003, Wang et al. improved and simplified this design using majority logic reduction techniques [7]. Building on these advances, Pudi and Sridharan proposed a QCA-based parallel prefix adder architecture in 2012 that requires 94 QCA cells [8]; notably, they used majority logic reduction to limit the delay to one clock cycle.

In 2021, Gassoumi et al. presented a full adder that requires only 19 cells, occupies an area of 0.01 \( \mu \text{m}^2 \), and has a delay of 0.5 clock cycles [23]. In 2022, Gao and Mohammed introduced a more recent full adder with 18 cells, an area of 0.02 \( \mu \text{m}^2 \), and a delay of 0.50 clock cycles [9].

In 2017, Zoka and Gholami introduced a full adder-subtractor unit composed of 83 cells, occupying an area of 0.09 \( \mu \text{m}^2 \), with a latency of 1.5 clock cycles [10]. Building on this line of research, Sadeghi et al. substantially reduced the complexity in 2021 with a full adder-subtractor circuit of 43 cells, a footprint of 0.01 \( \mu \text{m}^2 \), and a delay of 0.5 clock cycles [11].

In 2010, Ganesh introduced a pioneering one-bit ALU capable of concurrently executing 12 ALU operations [12]. The design comprises 494 cells, occupies an area of 0.92 \( \mu \text{m}^2 \), and has a delay of 3 clock cycles. In 2013, Ghosh et al. presented a multilayer ALU that performs 12 operations selected by input signals, with an area of 0.76 \( \mu \text{m}^2 \) and a delay of 5 clock cycles [13].

Babaie et al. proposed a five-layer ALU design in 2018 that incorporates new components such as a 4:1 multiplexer and a full adder [14]. This ALU consists of 319 cells, occupies an area of 0.245 $\mu m^2$, and has a delay of 2.25 clock cycles. Also in 2018, Anthony and Arvindan proposed two types of ALUs [15].

A subsequent 2020 design by Singh et al. focused on basic ALU operations integrated with two 2:1 multiplexers [16]. This ALU comprises 294 cells, occupies 0.61 $\mu m^2$, and has a delay of four clock cycles. In the same year, Ahmadpour et al. introduced two ALUs in two separate papers [17] [18].

In 2022, Abbasizadeh et al. proposed an ALU capable of 12 operations, comprising 248 cells, occupying an area of 0.15 $\mu m^2$, and with a latency of 2.25 clock cycles [19]. In the same year, Ravi and Veena introduced efficient ALU building blocks such as XOR, XNOR, a full adder, a half adder, a multiplier, and a 2:1 MUX [20]; however, these blocks were not integrated into a complete ALU. Tripathi and Wariya also presented a cost-efficient ALU design in 2022, featuring 391 cells, an area of 0.54 $\mu m^2$, a latency of 2 clock cycles, and a quantum cost of 1.081 [21]. Table 1 compares the cell count, area, and latency of several recent QCA-based ALUs.

3.1 Research Gap

Despite the considerable body of research on XOR gates and adders, limited attention has been given to designing a QCA-based Arithmetic and Logic Unit (ALU) with improved speed and area efficiency. A notable advance came in 2022, when Gassoumi et al. [22] introduced a three-input XOR gate that comprises 12 cells, has a latency of 0.5 clock cycles, and occupies an area of $\mu m^2$. Three-input XOR gates are important because they directly compute the Sum of a full adder and the Difference of a full subtractor, so their efficiency largely determines the efficiency of both circuits.

<table>
<thead>
<tr>
<th>Paper</th>
<th>Cell count</th>
<th>Area ($\mu m^2$)</th>
<th>Latency (clock cycles)</th>
</tr>
</thead>
<tbody>
<tr>
<td>[12]</td>
<td>494</td>
<td>0.92</td>
<td>3</td>
</tr>
<tr>
<td>[14]</td>
<td>319</td>
<td>0.245</td>
<td>2.25</td>
</tr>
<tr>
<td>[16]</td>
<td>294</td>
<td>0.61</td>
<td>4</td>
</tr>
<tr>
<td>[19]</td>
<td>248</td>
<td>0.15</td>
<td>2.25</td>
</tr>
<tr>
<td>[21]</td>
<td>391</td>
<td>0.54</td>
<td>2</td>
</tr>
</tbody>
</table>

Table 1: Performance Comparison of ALUs

The core focus of this work is the development of a partial ALU built from QCA gates. At the basic level, it incorporates the recently proposed three-input XOR gate of Gassoumi et al. [22] as the fundamental building block of our original adder and subtractor designs. The objectives of this research can be summarized as follows:

- Conduct simulation and comparative analysis to evaluate the performance of the proposed design against existing designs from the literature.

4. Proposed design

The primary goal of the proposed design approach is a modular structure in which each component can be designed independently before being integrated into the final assembly. To achieve this, the input lines are placed along the left side of the layout. In the middle section, separate substructures perform the arithmetic and logic operations: the arithmetic unit contains a full adder and a full subtractor, while the logic unit implements AND and OR. Finally, the outputs of these components are fed into a 4:1 multiplexer, whose output, the final ALU output, is located at the rightmost side of the layout. The logic circuit diagram of the proposed Arithmetic and Logic Unit is shown in Fig. 4.

Fig. 4. Logical Circuit Diagram of the proposed ALU
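The 4:1 multiplexer that selects the ALU output can itself be expressed with majority gates and inverters. The Python sketch below is a generic behavioral construction, assumed for illustration and not the authors' layout: a 2:1 multiplexer is realized as (NOT s AND i0) OR (s AND i1), i.e. three majority gates plus an inverter, and three such multiplexers form a 4:1 tree. All function names are hypothetical.

```python
from itertools import product

def maj(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (c & a)

def and2(a: int, b: int) -> int:
    return maj(a, b, 0)  # majority gate with one input fixed at logic 0

def or2(a: int, b: int) -> int:
    return maj(a, b, 1)  # majority gate with one input fixed at logic 1

def not1(a: int) -> int:
    return 1 - a         # QCA inverter

def mux2(i0: int, i1: int, s: int) -> int:
    """2:1 multiplexer: (NOT s AND i0) OR (s AND i1), three majority gates."""
    return or2(and2(not1(s), i0), and2(s, i1))

def mux4(i0: int, i1: int, i2: int, i3: int, s1: int, s0: int) -> int:
    """4:1 multiplexer as a tree of three 2:1 multiplexers; selects input 2*s1 + s0."""
    return mux2(mux2(i0, i1, s0), mux2(i2, i3, s0), s1)

# Exhaustive check over all data and select combinations.
for *inputs, s1, s0 in product((0, 1), repeat=6):
    assert mux4(*inputs, s1, s0) == inputs[2 * s1 + s0]
print("mux4 selects input 2*S1 + S0 in all 64 cases")
```

The text does not state which unit output is routed for each (S1, S0) value, so any assignment of CarryOut, BorrowOut, AND, and OR to `i0` through `i3` is an assumption when using this sketch.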

4.1 Full Adder-Subtractor Unit

A full adder, a fundamental component of digital circuits, has three inputs (A, B, C) and two outputs (Sum and Carry-Out). The Sum output is computed with a modified version of the three-input XOR gate of Gassoumi et al. [22]. The Carry-Out equation is identical to the majority function (Equation 1), so we use the rotated majority gate shown in Fig. 2(b), adapted to our ALU layout.

\[
\text{Sum} = A \oplus B \oplus C \quad (4)
\]

\[
\text{Carry} = AB + BC + CA \quad (5)
\]

A full subtractor likewise has two outputs: Difference and Borrow-Out. The Difference is the same function as the Sum (Equation 6 versus Equation 4), so the adder's XOR output is shared and no separate circuit is needed. The Borrow-Out is also a majority function, except that one input, the minuend A, must be inverted.

\[
\text{Difference} = A \oplus B \oplus C \quad (6)
\]

\[
\text{Borrow} = \overline{A}B + BC + C\overline{A} \quad (7)
\]
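A behavioral check of Equations 4 to 7, under the assumption that A is the minuend and C the incoming carry or borrow: the shared three-input XOR produces both Sum and Difference, the plain majority gives Carry-Out, and a majority vote with A inverted gives Borrow-Out, following the text's note that one input is inverted. This verifies the logic only, not the QCA layout, and the function names are illustrative.

```python
from itertools import product

def xor3(a: int, b: int, c: int) -> int:
    """Three-input XOR: Sum (Equation 4) and Difference (Equation 6)."""
    return a ^ b ^ c

def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (b & c) | (c & a)

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    return xor3(a, b, c), majority(a, b, c)      # (Sum, Carry-Out)

def full_subtractor(a: int, b: int, c: int) -> tuple[int, int]:
    # Borrow-Out is a majority vote with the minuend A inverted.
    return xor3(a, b, c), majority(1 - a, b, c)  # (Difference, Borrow-Out)

for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert a + b + c == 2 * carry + s             # A + B + C
    d, borrow = full_subtractor(a, b, c)
    assert a - b - c == d - 2 * borrow            # A - B - C
print("adder and subtractor match binary arithmetic for all 8 inputs")
```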

Integrating the adder and subtractor yields an adder-subtractor unit. The unit is compact, requiring 44 cells and an area of 0.06 \( \mu m^2 \), with a latency of 0.5 clock cycles. The adder-subtractor unit is shown in the upper-left part of Fig. 5.

4.2 Logic Unit

The logic unit contains an AND gate and an OR gate, each implemented as a majority gate with one input fixed, as described by Equations 2 and 3.

4.3 Comparison

Next, we compared the performance of the proposed ALU with previously reported ALU designs [12, 14, 16, 17, 18, 19, 21] in terms of cell count, occupied area, and latency.

The simulation results show a noticeable reduction in cell count and occupied area for the proposed partial QCA-based ALU. These comparisons are shown in Figure 7. The proposed ALU requires fewer cells than the designs in [12, 14, 16, 17, 18, 21] and occupies less area than most of them, although the five-layer design in [14] is more compact (0.245 \( \mu m^2 \) versus 0.43 \( \mu m^2 \)). In particular, the proposed design uses about 20% less area than the design by Tripathi and Wariya [21] (0.43 \( \mu m^2 \) versus 0.54 \( \mu m^2 \)). In terms of latency, however, the designs in [21], [14], and [19] (2, 2.25, and 2.25 clock cycles, respectively) are faster than the proposed ALU (2.75 clock cycles). The ALU of [19] also has a lower cell count and occupies less area than the proposed ALU, because it consists only of a full adder and a 4:1 multiplexer, whereas the proposed ALU includes a full subtractor in addition to a full adder and a 4:1 multiplexer.
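The relative differences behind this comparison can be recomputed from Table 1 and the proposed design's reported figures (277 cells, 0.43 µm², 2.75 clock cycles). A small Python sketch; positive values mean the proposed ALU is smaller or faster, negative values mean it is larger or slower. The names `designs`, `proposed`, and `reduction` are illustrative.

```python
# Cell count, area (um^2), latency (clock cycles) from Table 1.
designs = {
    "[12]": (494, 0.92, 3.0),
    "[14]": (319, 0.245, 2.25),
    "[16]": (294, 0.61, 4.0),
    "[19]": (248, 0.15, 2.25),
    "[21]": (391, 0.54, 2.0),
}
proposed = (277, 0.43, 2.75)

def reduction(theirs: float, ours: float) -> float:
    """Percentage by which `ours` is smaller than `theirs` (negative means larger)."""
    return 100.0 * (theirs - ours) / theirs

for ref, (cells, area, latency) in designs.items():
    print(f"{ref}: cells {reduction(cells, proposed[0]):+.1f}%, "
          f"area {reduction(area, proposed[1]):+.1f}%, "
          f"latency {reduction(latency, proposed[2]):+.1f}%")
```

For [21], for example, it reports about 29.2% fewer cells and 20.4% less area, together with a 37.5% higher latency.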

5. Simulation and Experimental Results

This section presents the simulation results of the proposed ALU design. Each operation was first simulated individually in QCA Designer 2.0.3 to verify its output. The units were then integrated with the 4:1 multiplexer to form the final ALU layout shown in Fig. 5. The combined adder-subtractor unit and logic unit produce four signals: CarryOut, BorrowOut, AND, and OR, each of which becomes available after one clock cycle. The select inputs S0 and S1 control the multiplexer, and the final output is labeled “Out”. The final output has a latency of 2.75 clock cycles, as shown in Fig. 6. Within the adder-subtractor unit, the full adder consists of 22 cells, occupies an area of 0.03 \( \mu m^2 \), and has a delay of 0.5 clock cycles. The full subtractor has the same area and delay, but a higher cell count. The average energy dissipation of the designed ALU is \( 7.45 \times 10^{-3} \) eV.

6. Conclusion

A systematic methodology was used to design a full adder-subtractor unit based on a recently proposed three-input XOR gate, together with a majority-gate-based logic unit. These components were then integrated into a partial Arithmetic and Logic Unit (ALU) that correctly performs addition and subtraction as well as the AND and OR logic operations. The proposed design was then compared with recent designs from the literature in terms of cell count, occupied area, and latency. The proposed model is built from a 4:1 multiplexer, a three-input XOR gate, and four majority gates.

The structure of the proposed ALU also allows additional operations, such as NOT and two-input XOR, to be added conveniently in the future. Because the circuit already implements both a full adder and a full subtractor, this partial ALU can be extended with further arithmetic operations such as multiplication and division. The framework also lays the foundation for a complete n-bit Arithmetic and Logic Unit with expanded functionality.

The growing demand for high-speed computation makes the adoption of technologies beyond CMOS increasingly likely in the near future. QCA is a promising alternative to CMOS technology, offering reduced power consumption, higher speed, and greater compactness. Full adders and full subtractors are foundational components of Arithmetic Logic Units (ALUs), and the proposed model improves on several existing designs in cell count, area, and latency. Such savings become particularly significant in large and complex designs for advanced Very Large Scale Integration (VLSI) applications.

References
