#### ISSN 0002-306X. Proc. of the RA NAS and NPUA Ser. of tech. sc. 2023. V. LXXVI, N3

UDC 621.382

MICROELECTRONICS

DOI: 10.53297/0002306X-2023.v76.3-325

#### **H.T. GRIGORYAN**

# THE DATA TO CLOCK ALIGNMENT METHOD IN QUARTER-RATE HIGH SPEED TRANSMITTERS

Modern hyperscale data centers increasingly use DSP-based transceivers to compensate for high channel loss. Transceivers operating at 56 *Gbps* or above prove to be a viable alternative to analog transceivers in terms of power and area requirements, while delivering better overall performance. The design of such transceivers is very challenging. The most sensitive parts are the transmitter and the receiver, which are the main building blocks of such systems. In addition, the clocking architecture and data-to-clock alignment are critical for achieving optimal performance. This article presents a data-to-clock alignment method for high-speed quarter-rate transmitters.

Keywords: SERDES, transmitter, current mode logic, clock, receiver, PAM4.

**Introduction.** In recent years, there has been a transition from 28 *Gbps* to 56 *Gbps* and 112 *Gbps* serial links, which enable switches and networking ASICs to increase throughput from 6.4 *Tbps* to 12.8 *Tbps* and 25.6 *Tbps*, respectively. To keep up with the exponential growth of data center servers, the throughput is forecast to reach 51.2 *Tbps* in the next few years. Higher SERDES speeds mean fewer server ports and cables and less beach-front congestion in the data center, which helps keep up with the throughput growth. However, the power efficiency of SERDES needs to improve or at least remain the same. Advances in package, connector, and PCB technology are also necessary, but they have not kept up with the increase in bandwidth and speed. Consequently, the burden falls on SERDES designers and circuit innovation. For example, at 112 *Gbps*, every channel is a long-reach channel, and a common WAN port switch channel has at least 20 *dB* of loss, compared to 14 *dB* at 56 *Gbps*. Most 112 *Gbps* channels have more than 35 *dB* of channel loss.

**The clocking architectures.** The clock frequency and the data serialization scheme are the most critical design decisions in the TX architecture. To achieve an output data rate of 112 *Gbps* using PAM-4 modulation, one practical approach is to use either a half-rate architecture with a 28 *GHz* differential clock or a quarter-rate architecture with 14 *GHz* quadrature clocks [1]. The half-rate architecture eliminates the need for quadrature clock spacing error calibration, but it has high power consumption in the clock distribution network due to the small fan-outs required to guarantee optimal jitter performance. Additionally, meeting timing requirements across process, voltage, and temperature (PVT) variations is challenging in the 1-UI-window data-serialization path. In contrast, the quarter-rate TX design mitigates clock generation and distribution challenges by using half the clock frequency. With multi-phase clocks used for the 4:1 data-serialization scheme, the serializer timing window is also relaxed to 3 UI. However, the quarter-rate architecture presents challenges in designing quadrature clock spacing error detection/correction circuits and a high-bandwidth 4:1 serializer.
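The clock frequencies quoted above follow directly from the data rate and the number of bits per symbol. The short sketch below reproduces that arithmetic; the function and field names are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TxClocking:
    baud_gbd: float          # symbol rate, GBaud
    ui_ps: float             # unit interval, ps
    half_rate_ghz: float     # clock frequency for a half-rate TX
    quarter_rate_ghz: float  # clock frequency for a quarter-rate TX


def tx_clocking(data_rate_gbps: float, bits_per_symbol: int) -> TxClocking:
    """Derive symbol rate, UI and clock frequencies from the line rate."""
    baud = data_rate_gbps / bits_per_symbol
    return TxClocking(
        baud_gbd=baud,
        ui_ps=1000.0 / baud,       # 1 / GBaud expressed in ps
        half_rate_ghz=baud / 2.0,  # one clock edge per symbol
        quarter_rate_ghz=baud / 4.0,
    )


# 112 Gbps with PAM-4 (2 bits per symbol), as in the text.
pam4 = tx_clocking(112.0, 2)
print(pam4)
```

For 112 *Gbps* PAM-4 this gives a 56 GBaud symbol rate, a 28 *GHz* half-rate clock and a 14 *GHz* quarter-rate clock, matching the figures in the paragraph above.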

Performing the 4:1 serialization [2] before the final driver stage, as shown in Fig. 1a, reduces the serializer power consumption but increases the number of full-rate signal nets and leads to extra power consumption in the pre-driving stage [3].



Fig. 1. A 4:1 Serialization

To extend the bandwidth at the internal nets, passive or active inductive compensation techniques can be used, but this comes at the cost of increased area or additional power consumption. Combining the serializer with the output driver and performing data serialization directly at the pad significantly reduces the overall area (Fig. 1b). This approach requires an inductor only at the pad, to compensate for the large capacitance of the driver and the ESD protection diodes. The main challenge with this architecture is seamlessly combining the 4:1 serializer with the driver to provide a sufficiently large output swing and bandwidth.

**The problem description.** Maintaining sufficient bandwidth to support the full-rate output is crucial for the final 4:1 serializer, which is one of the most essential components of a quarter-rate transmitter. Consider the current-mode driver with a 4:1 mux shown in Fig. 2.



Fig. 2. The current-mode driver with a 4:1 multiplexer

The data serialization process is as follows. The quarter-rate clock phases select the corresponding data signals: data0 is selected by clk0 and clk90, and data1 is selected by clk90 and clk180. Correct data-to-clock alignment is crucial in order to have setup and hold margins for data selection and to avoid data errors.
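The selection rule can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit of Fig. 2: the clocks are ideal 50% duty-cycle waveforms with a 4 UI period, each lane is selected while its two adjacent phases are both high, and the pairings for data2 (clk180, clk270) and data3 (clk270, clk0) are extrapolated from the two pairings given in the text. All names are hypothetical.

```python
from typing import Sequence


def clk_high(phase_ui: int, t_ui: float) -> bool:
    """Ideal 50% duty clock with a 4 UI period, rising at t = phase_ui."""
    return ((t_ui - phase_ui) % 4.0) < 2.0


def mux_4to1(lanes: Sequence[int], t_ui: float) -> int:
    """Return the lane whose two adjacent clock phases are both high."""
    for k in range(4):
        # Phase k and phase k+1 overlap for exactly 1 UI per clock period.
        if clk_high(k, t_ui) and clk_high((k + 1) % 4, t_ui):
            return lanes[k]
    raise AssertionError("ideal quadrature clocks always select one lane")


# Quarter-rate lane values (assumed stable around their selection window).
lanes = [1, 0, 0, 1]

# Sample in the middle of four consecutive 1 UI selection windows.
serial = [mux_4to1(lanes, 1.5 + n) for n in range(4)]
print(serial)
```

With ideal clocks, exactly one phase pair overlaps at any time, so the four lanes appear at the output in order, one per UI. The alignment problem discussed next arises when a lane changes value inside its 1 UI selection window.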

Fig. 3 shows that, because of PVT variation, there is a data-to-clock alignment issue for one of the three-input NAND gates that form the 4:1 mux. Since data selection takes place when the corresponding clock phases are both logic high, the most desirable data-to-clock alignment places the clock overlap section in the middle of the data.



Fig. 3. The Data-to-clock relation

**The proposed method.** The proposed method performs calibration before the actual data transmission starts. The serializer has dedicated data outputs that go to the calibration unit. data\_cal<0> and data\_cal<2> are shifted from each other by 90 degrees. The purpose of the algorithm is to align clk90 and data\_cal<0> so that there is a 1 UI margin on both sides of the data signal. Calibration starts with a defined calibration code, which is then swept downwards. As the code is swept, the phase mixer changes the phase of the serializer clock (Fig. 4).



Fig. 4. A block diagram for data-to-clock alignment

This results in a shift in the output serialized data. Decreasing the code shifts the data to the left. The out\_p and out\_m voltages are generated by the cell in Fig. 6, which is inside the align block. The units of the cell steer the current into the left and right branches. When both signals are logic high, current flows through the unit and out\_p/m start to decrease. Thus, the generated voltage depends on the time during which the clock and data signals overlap. The calibration process ends when the out\_p and out\_m signals are equal. This indicates that clk90 is in the middle of the data\_cal\_p<0> signal. Fig. 5 shows how the calibrated clock and data signals should look.



Fig. 5. The calibrated data-to-clock relation

The align cell is shown in Fig. 6. Eight identical units are used, and the input signals are chosen so that each unit operates during one clock period. Each unit has three serially connected NMOS devices. Consider the leftmost unit as an example. Its inputs are clk270, data<0> and data<2>. The last two already have a 90 degree phase shift from each other. All three transistors conduct when all three signals overlap and are logic high. This steers the current to the leftmost branch, and the out\_p voltage decreases. Similarly, the other units start to conduct one after another, each 1 UI after the previous one.



Fig. 6. The schematic view of the align cell

The left four units are responsible for generating the out\_p voltage and the right four for generating the out\_m voltage. If the clock is to the left of the center, the out\_p voltage is lower, as can be seen at the start of the calibration process (Fig. 7). The calibration logic then increases the code, which shifts the data to the right. This reduces the overlap of data and clock in the right four units and increases it in the left four units. This results in an increase in the filtered out\_p voltage and a decrease in the out\_m voltage, because in one period the right branch is open longer than the left branch. Fig. 7 shows that the voltages are equal for code 89. At the end, the code is increased and decreased several times to make sure that 89 is the optimal code.



Fig. 7. The calibration process
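The search described above can be sketched as a simple loop that moves the code until out\_p and out\_m balance and then checks the neighboring code. This is an illustrative sketch, not the paper's calibration logic: `toy_align_cell` is a hypothetical linear stand-in for the filtered align-cell outputs (it is not a circuit model), the 0 to 127 code range, the 0.5 V level and the slope are assumptions, and the balance point is set to 89 only to mirror Fig. 7. The sign convention (low out\_p means the code is increased) follows the text.

```python
from typing import Callable, Tuple

Measure = Callable[[int], Tuple[float, float]]


def toy_align_cell(code: int, balanced_code: int = 89,
                   slope: float = 0.004) -> Tuple[float, float]:
    """Hypothetical filtered (out_p, out_m) in volts for a given code."""
    delta = slope * (code - balanced_code)
    return 0.5 + delta, 0.5 - delta


def calibrate(measure: Measure, start_code: int,
              code_min: int = 0, code_max: int = 127) -> int:
    """Step the code until out_p and out_m are as close as possible."""
    def error(code: int) -> float:
        out_p, out_m = measure(code)
        return out_p - out_m

    code = start_code
    for _ in range(code_max - code_min + 1):
        err = error(code)
        if err == 0:
            break  # out_p equals out_m: the clock is centered
        # Low out_p -> increase the code; high out_p -> decrease it.
        step = 1 if err < 0 else -1
        candidate = min(max(code + step, code_min), code_max)
        # Stop when the neighboring code is no better (dither check).
        if candidate == code or abs(error(candidate)) >= abs(err):
            break
        code = candidate
    return code


final_code = calibrate(toy_align_cell, start_code=100)
print(final_code)
```

Starting above the balance point makes the loop sweep the code downwards, as in the described procedure. The neighbor comparison plays the role of the final up/down steps used to confirm the chosen code.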

**Simulation Results.** After the calibration process, data transmission is started with the obtained code. Eye diagrams before and after calibration are shown in Fig. 8. Both the vertical and the horizontal openings increase: the vertical opening grows from 401 *mV* to 865 *mV*, and the horizontal opening improves from 10.8 *ps* to 16.6 *ps*. The simulations were done with the HSPICE simulator [4] across various PVT corners. The eye diagrams in Fig. 8 are for the typical process corner at 25°C. Supply noise is also included in the simulation to capture the jitter caused by supply variation.

Besides the code value, the setup time between clk90 and data<0> is calculated. For the typical case, it is 18 *ps* after calibration. Considering that the ideal UI value is 17.6 *ps*, the obtained results are acceptable for proper operation at 112 *Gbps*. For the SS corner at 125°C the calibrated setup time is 18.8 *ps*, and for the FF corner it is 16.6 *ps*. Most of the parts used in the calibration process are already available in modern transceivers [5, 6]. The only added blocks are the align unit and the calibration logic. The calibration process is performed before data transmission and consumes 0.62 *mA*. The overall calibration process takes 60 *ns* to complete.



Fig. 8. Eye diagrams for transmitter NRZ mode
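To put the reported eye openings in relative terms, the sketch below computes the percentage improvement and expresses the horizontal opening as a fraction of the ideal UI. It uses only the numbers quoted above. Normalizing the NRZ eye width to the 17.6 *ps* UI is an assumption made here for illustration, and the names are hypothetical.

```python
def relative_gain_percent(before: float, after: float) -> float:
    """Percentage increase from `before` to `after`."""
    return 100.0 * (after - before) / before


UI_PS = 17.6  # ideal UI quoted in the text, ps

vertical_gain = relative_gain_percent(401.0, 865.0)   # mV
horizontal_gain = relative_gain_percent(10.8, 16.6)   # ps

# Horizontal opening as a fraction of the UI (assumed normalization).
eye_width_ui_before = 10.8 / UI_PS
eye_width_ui_after = 16.6 / UI_PS

print(f"vertical opening:   +{vertical_gain:.1f} %")
print(f"horizontal opening: +{horizontal_gain:.1f} %")
print(f"eye width: {eye_width_ui_before:.2f} UI -> {eye_width_ui_after:.2f} UI")
```

Under these assumptions the vertical opening roughly doubles, and the horizontal opening grows by about half.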

**Conclusion.** The proposed method performs data-to-clock alignment, which improves both the vertical and the horizontal eye openings. It can be used in high-speed SERDES transceivers. The simulations use the SAED 14 *nm* FinFET technology [7].

#### REFERENCES

1. A 112Gb/s 2.6pJ/b 8-Tap FFE PAM-4 SST TX in 14nm CMOS / C. Menolfi, et al // 2018 IEEE International Solid-State Circuits Conference (ISSCC). - San Francisco, CA, USA, 2018. - P. 104-106.
2. A 40-to-64 Gb/s NRZ Transmitter With Supply-Regulated Front-End in 16 nm FinFET / Y. Frans, et al // IEEE Journal of Solid-State Circuits. - Dec. 2016. - Vol. 51, no. 12. - P. 3167-3177, doi: 10.1109/JSSC.2016.2587689.
3. A 112 Gb/s PAM-4 56 Gb/s NRZ Reconfigurable Transmitter With Three-Tap FFE in 10-nm FinFET / J. Kim, et al // IEEE Journal of Solid-State Circuits. - Jan. 2019. - Vol. 54, no. 1. - P. 29-42, doi: 10.1109/JSSC.2018.2874040.
4. Hspice Reference Manual, Synopsys Inc. - 2017. - 846 p.
5. Roshan-Zamir A., Elhadidy O., Yang H.-W., and Palermo S. A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS // IEEE Journal of Solid-State Circuits. - Sept. 2017. - Vol. 52, no. 9. - P. 2430-2447, doi: 10.1109/JSSC.2017.2705070.
6. 3.5 A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS / J. Kim, et al // 2015 IEEE International Solid-State Circuits Conference (ISSCC), Digest of Technical Papers. - San Francisco, CA, USA, 2015. - P. 1-3, doi: 10.1109/ISSCC.2015.7062925.
7. Melikyan V., Martirosyan M., Piliposyan G. 14nm Educational Design Kit // Capabilities, Deployment and Future, Small Systems Simulation Symposium. - 2018. - P. 42-55.

National Polytechnic University of Armenia. The material is received on 29.08.2023.

### H.T. GRIGORYAN

# THE METHOD OF DATA AND CLOCK SIGNAL ALIGNMENT IN HIGH-SPEED TRANSMITTERS WITH FOUR-PHASE CLOCKING

Modern hyperscale data centers use transceivers based on digital signal processing to compensate for large channel losses. These transceivers, operating at speeds above 56 *Gbps*, are a viable alternative to analog versions in terms of power and occupied area requirements, and they also provide better overall performance. The design process of such systems is very complex. The most sensitive parts are the driver and the receiver, which are the main building blocks of such systems. In addition, the clocking architecture and its alignment with the data are very important for ensuring optimal performance. A method for aligning the data and the clock signal is presented for ultra-high-speed transmitters with four-phase clocking.

*Keywords:* SERDES, transmitter, current mode logic, clock signal, receiver, four-level signal modulation.

## H.T. GRIGORYAN

# A METHOD FOR ALIGNING DATA WITH THE CLOCK SIGNAL IN HIGH-SPEED QUARTER-RATE TRANSMITTERS

Modern hyperscale data centers increasingly use transceivers based on digital signal processing to compensate for large channel losses. These transceivers, operating at speeds of 56 and 112 *Gbps*, are a viable alternative to analog transceivers in terms of power and occupied area requirements, and they also provide better overall performance. The design process of such transceivers is very complex. The most sensitive parts include the driver and the receiver, which are the main building blocks of such systems. In addition, the clocking architecture and, more generally, the alignment of data with the clock are critical for achieving optimal performance. The article presents a method for aligning the data and the clock signal in high-speed quarter-rate transmitters.

*Keywords:* SERDES, transmitter, current mode logic, clock signal, receiver, PAM4.