# Near-Field Coupling Integration Technology

128-Die Stacking

Tadahiro Kuroda, Keio University, Japan, IEEE Fellow

ISSCC2010, pp.440-441

August 31, 2016

Tadahiro Kuroda

10th IEEE/ACM International Symposium on Networks-on-Chip (NOCS 2016)

## Challenge to "Tyranny of Numbers"



- The invention of the IC was driven by the "Tyranny of Numbers": the challenges posed by a large number of components and interconnects.
- We face the same challenge again with the end of Moore's Law and the rise of IoT and big data.

#### Proposal: Near-Field Coupling

- Replace mechanical connections (wires, solder, connectors) with electrical ones (wireless connections by near-field coupling).
- Near-field coupling provides invisible wires.



#### Near-Field Coupling Integration Technology

Proposed solution to "connections in very large systems"



- ThruChip Interface (TCI): 3D integration of chips for high performance
- Transmission Line Coupler (TLC): LEGO-type packaging of modules for high functionality



JST ACCEL Project (2015-2019):

- Data Centric Computer (ultra-low-power mobile computer in the era of IoT)
- Proof of Concept: 100GFLOPS/W (in 2019)
- Milestone: 512GB/s 8GB DRAM (in 2017)

# Outline

- Near-Field Coupling Integration Technology
- Transmission Line Coupler (TLC)
- ThruChip Interface (TCI)
- Challenges
  - Highly Doped Silicon Vias (HDSV)
  - TCI\_2.9D/2.5D/2.0D
- ACCEL
  - 100GFLOPS/W Computer and 512GB/s DRAM

# Transmission Line Coupler (TLC)






# Applications of TLC



| Application    | Benefits                                                                                | Reference             |
|----------------|-----------------------------------------------------------------------------------------|-----------------------|
| Memory card    | High speed: 50x (12Gb/s); Low power: 1/500; Waterproof (pad-less, sealed)               | ISSCC2013, pp.214-215 |
| Display        | High speed: 10x (6Gb/s); Low energy: 1/10 (16pJ/b); Thin (no mechanical structure)      | ISSCC2013, pp.200-201 |
| Smartphone     | High speed: 5x (6Gb/s); Low energy: 1/24 (6pJ/b); Modular design (electrical connection) | ISSCC2015, pp.176-177 |
| DIMM           | High speed: 5x (12.5Gb/s); Multi-drop bus (impedance controlled)                        | ISSCC2012, pp.52-53   |
| In-vehicle LAN | Light: 30%; Strong EMC immunity (wide band)                                             | ISSCC2014, pp.496-497 |
| Satellite      | Light: 60%; Vibration immunity (contactless connection)                                 | ISSCC2015, pp.434-435 |
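The energy-per-bit figures for the TLC applications translate directly into link power, since power is energy per bit times data rate. The sketch below does that arithmetic for the display and smartphone links, assuming (as an illustration only) that the quoted pJ/b applies at the quoted data rate; the function name `link_power_mw` is hypothetical.

```python
# Link power = energy per bit x data rate.
# Assumption: the quoted pJ/b figure applies at the quoted data rate.
def link_power_mw(energy_pj_per_bit: float, rate_gbps: float) -> float:
    """Return link power in mW (1 pJ/b x 1 Gb/s = 1e-12 J x 1e9 /s = 1 mW)."""
    return energy_pj_per_bit * rate_gbps


examples = {
    "Display (16 pJ/b at 6 Gb/s)": link_power_mw(16, 6),
    "Smartphone (6 pJ/b at 6 Gb/s)": link_power_mw(6, 6),
}
for name, power in examples.items():
    print(f"{name}: {power:.0f} mW")
```

Under that assumption the display link dissipates about 96 mW and the smartphone link about 36 mW.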

# Display/Camera Module

- High speed, low power, low profile



ISSCC2013, pp.200-201


#### Modular Design




#### Radiation Tolerance (EMC)




## Vibration Tolerance



#### **Other Possibilities**




# ThruChip Interface (TCI)



ISSCC2004, pp.142-143



- Inductive coupling data communication through chips
- Transceiver: digital CMOS circuits





- Coil: multi-layer standard wires
  - Logic interconnections can run across the coil
  - Coil can be placed anywhere (e.g., above SRAM)

Digital CMOS circuit solution

Eventually zero cost


# Performance of TCI

#### High Speed





- 11Gb/s/ch (0.18μm), ISSCC2008
- 30Gb/s/ch (65nm), A-SSCC2010
- 8Tb/s (1000ch in 2.5mm²), ISSCC2010

The aggregate data rate is raised by increasing the number of channels.
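Because TCI channels run in parallel, the aggregate rate is simply the channel count times the per-channel rate. The sketch below applies this to the ISSCC2010 figures (8 Tb/s from 1000 channels in 2.5 mm²) to recover the implied average per-channel rate and the areal bandwidth density; it assumes the channels are independent and equally loaded, and the function name is hypothetical.

```python
def aggregate_rate_tbps(channels: int, rate_per_channel_gbps: float) -> float:
    """Aggregate rate of independent parallel channels, in Tb/s."""
    return channels * rate_per_channel_gbps / 1000.0


# ISSCC2010 figures: 8 Tb/s from 1000 channels in 2.5 mm^2.
per_channel_gbps = 8000.0 / 1000  # implied average rate per channel
total_tbps = aggregate_rate_tbps(1000, per_channel_gbps)
density = total_tbps / 2.5  # Tb/s per mm^2
print(f"{per_channel_gbps:.0f} Gb/s/ch -> {total_tbps:.0f} Tb/s, {density:.1f} Tb/s/mm^2")
```

This gives 8 Gb/s per channel and 3.2 Tb/s/mm². Doubling the channel count at the same per-channel rate, `aggregate_rate_tbps(2000, 8)`, returns 16.0.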

#### Low Power



0.14pJ/b (90nm) ISSCC2007



0.01pJ/b (65nm) JSSC2011

The ESD protection device, which alone costs more than 0.5pJ/b, can be eliminated.
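To put the energy figures in perspective, the sketch below converts them into power at a fixed traffic level (1 pJ/b at 1 Tb/s is 1 W). The 1 Tb/s traffic level is a hypothetical value chosen for illustration; the point is that an ESD device at more than 0.5 pJ/b would cost over 50x the 0.01 pJ/b of the TCI link itself.

```python
# Power = energy per bit x aggregate data rate (1 pJ/b x 1 Tb/s = 1 W).
def power_w(energy_pj_per_bit: float, rate_tbps: float) -> float:
    return energy_pj_per_bit * rate_tbps


RATE_TBPS = 1.0  # hypothetical traffic level chosen for illustration

for label, e_pj in [("TCI, 90nm (ISSCC2007)", 0.14),
                    ("TCI, 65nm (JSSC2011)", 0.01),
                    ("ESD protection device alone (lower bound)", 0.5)]:
    print(f"{label}: {power_w(e_pj, RATE_TBPS) * 1000:.0f} mW at {RATE_TBPS} Tb/s")
```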

#### High Integration






# **TCI** Coil Design

Data rate goes up dramatically with smaller Z




# **TCI** Layout



- Similar to typical CMOS layout
- 100um coils are formed in M9 and M10 for TX and in M7 and M8 for RX, with power and signal lines crossing them
- Circuits can be placed under the coil
- Coils are overlapped and accessed by PDMA to avoid crosstalk
  - at phase 1
  - at phase 2
  - at phase 3
  - at phase 4
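The layout avoids crosstalk between overlapped coils by accessing them in four phases. One way such a four-phase assignment could work is sketched below, under hypothetical assumptions: coils sit on a square grid, each overlaps its eight neighbours, and only coils sharing a phase are active at the same time. The row/column-parity mapping (`phase_of`) and all function names are illustrative, not the layout used in the actual chip.

```python
from itertools import product


# Hypothetical layout: coils on a square grid, each overlapping its 8 neighbours.
def phase_of(row: int, col: int) -> int:
    """Assign one of four phases (1..4) from the row/column parity."""
    return 1 + 2 * (row % 2) + (col % 2)


def neighbours(row: int, col: int, rows: int, cols: int):
    """Yield the grid positions adjacent to (row, col), diagonals included."""
    for dr, dc in product((-1, 0, 1), repeat=2):
        r, c = row + dr, col + dc
        if (dr, dc) != (0, 0) and 0 <= r < rows and 0 <= c < cols:
            yield r, c


def schedule(rows: int, cols: int) -> dict[int, list[tuple[int, int]]]:
    """Group coil positions by the phase in which they may transmit."""
    slots: dict[int, list[tuple[int, int]]] = {p: [] for p in range(1, 5)}
    for r, c in product(range(rows), range(cols)):
        slots[phase_of(r, c)].append((r, c))
    return slots


def conflict_free(rows: int, cols: int) -> bool:
    """True if no two adjacent coils are active in the same phase."""
    return all(phase_of(r, c) != phase_of(nr, nc)
               for r, c in product(range(rows), range(cols))
               for nr, nc in neighbours(r, c, rows, cols))


for phase, coils in schedule(4, 4).items():
    print(f"phase {phase}: {coils}")
print("conflict free:", conflict_free(4, 4))
```

Any two adjacent coils differ in row parity, column parity, or both, so they never share a phase; each coil is active one quarter of the time.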

# TSV vs. TCI

|                    | TSV<br>Micro bump            | TCI<br>Magnetic field |
|--------------------|------------------------------|-----------------------|
| Solution           | Mechanical in package        | Electrical on wafer   |
| Wafer technology   | Additional steps needed      | Standard CMOS         |
| Package technology | OSAT\* involved              | Conventional          |
| Miniaturization    | Difficult                    | Easy                  |
| Yield              | Low, difficult to improve    | High (~100%)          |
| Eco-system         | New model needed             | Conventional model    |
| Additional cost    | > 40%                        | A few %               |
| Placement          | Dedicated area w/ KOZ\*\*    | Unconstrained         |
| Speed              | < 512 GB/s                   | > 512 GB/s            |
| ESD protection     | Needed                       | Not needed            |
| Power dissipation  | High                         | Low                   |

OSAT\*: Outsource Assembly and Test, KOZ\*\*: Keep Out Zone


## **3D Scaling Scenario**

Cost/performance will be improved by the 3D scaling scenario.



Suppose four 8mm-square chips are stacked. When each die is thinned from 50um to 10um, the number of on-chip coils increases from 700 to 17,500, yielding a 25x speed improvement.
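The 25x figure follows from a square-law argument. Assuming coil size scales in proportion to die thickness (the 7nm table keeps the coil at three times the chip thickness), thinning by 5x shrinks each coil's footprint by 25x, so 25x more coils fit in the same area. The sketch below checks the arithmetic; it also assumes the per-coil data rate stays the same, which is what makes the coil count a proxy for speed. The function name is hypothetical.

```python
def coils_after_thinning(n_coils: int, t_old_um: float, t_new_um: float) -> float:
    """Scale the coil count when dies are thinned.

    Assumption: coil diameter scales in proportion to die thickness, so the
    number of coils that fit in a fixed area grows as (t_old / t_new) ** 2.
    """
    return n_coils * (t_old_um / t_new_um) ** 2


n_new = coils_after_thinning(700, 50, 10)
print(f"coils: {n_new:.0f}, improvement: {n_new / 700:.0f}x")
```

This prints `coils: 17500, improvement: 25x`, matching the scenario above.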


# Performance of TCI in 7nm CMOS

| Chip thickness                                      | 50 μm                  | 25 μm                   |
|-----------------------------------------------------|------------------------|-------------------------|
| Coil size                                           | 150 μm                 | 75 μm                   |
| Data rate per coil                                  | 50 Gb/s/coil           | 64 Gb/s/coil            |
| Area efficiency                                     | 2 Tb/s/mm <sup>2</sup> | 11 Tb/s/mm <sup>2</sup> |
| Power efficiency                                    | 30 fJ/bit              | 25 fJ/bit               |
| Aggregate data rate<br>when using 8mm x 1mm Si area | 18 Tb/s                | 91 Tb/s                 |
| Power dissipation<br>when using 8mm x 1mm Si area   | 0.5 W                  | 2.2 W                   |

SPICE simulation performed with Predictive Technology Model (http://ptm.asu.edu/)
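The derived rows of the table follow from the coil size, per-coil data rate, and energy per bit. The sketch below recomputes them, assuming one coil per square cell whose side equals the coil size (consistent with the listed area efficiency) and the 8mm x 1mm = 8 mm² Si area. It is a cross-check of the arithmetic, not a substitute for the SPICE results; `tci_estimates` is a hypothetical helper.

```python
def tci_estimates(coil_um: float, rate_gbps: float, fj_per_bit: float,
                  area_mm2: float = 8.0) -> tuple[float, float, float]:
    """Return (Tb/s/mm^2, aggregate Tb/s, power in W) for one TCI configuration.

    Assumption: one coil occupies a square cell whose side equals the coil size.
    """
    cell_mm2 = (coil_um / 1000.0) ** 2
    density_tbps_per_mm2 = rate_gbps / cell_mm2 / 1000.0
    aggregate_tbps = density_tbps_per_mm2 * area_mm2
    power_w = aggregate_tbps * 1e12 * fj_per_bit * 1e-15  # b/s x J/b
    return density_tbps_per_mm2, aggregate_tbps, power_w


# (chip thickness um, coil size um, Gb/s per coil, fJ/bit) from the 7nm table
for thick, coil, rate, energy in [(50, 150, 50, 30), (25, 75, 64, 25)]:
    d, agg, p = tci_estimates(coil, rate, energy)
    print(f"{thick} um: {d:.1f} Tb/s/mm^2, {agg:.0f} Tb/s, {p:.2f} W")
```

The recomputed densities (2.2 and 11.4 Tb/s/mm²) and aggregates (18 and 91 Tb/s) match the table, and the power estimates (about 0.53 W and 2.28 W) land close to the tabulated 0.5 W and 2.2 W.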

#### 3-D NoC by TCI

- JSPS project led by Prof. Amano and Prof. Matsutani
  - A Study on Building-Block Computing Systems using TCI
  - Inter-chip wireless inductive coupling techniques, self-organized networks-on-chip, fault-tolerant architectures, optimized power control, and a flexible operating system with virtualization facilities are investigated
  - http://www.am.ics.keio.ac.jp/kaken\_s/
- 3-D NoC with TCI will be presented at IEEE A-SSCC2016
  - Collision detection scheme by sensing the magnetic field
  - 44-bit packet transceiver with PER < 10<sup>-9</sup>
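A packet error rate target can be translated into the bit error rate it implies. The sketch below does this for PER < 10<sup>-9</sup> on 44-bit packets, assuming independent bit errors, no error correction, and that any bit error corrupts the packet, so that PER = 1 - (1 - BER)<sup>44</sup>. These are simplifying assumptions for illustration, not properties reported for the transceiver.

```python
import math


def ber_upper_bound(per: float, packet_bits: int) -> float:
    """Largest bit error rate compatible with a packet error rate target.

    Assumes independent bit errors, no error correction, and that any bit
    error corrupts the packet: PER = 1 - (1 - BER) ** packet_bits.
    """
    # Solve for BER; log1p/expm1 keep precision for tiny probabilities.
    return -math.expm1(math.log1p(-per) / packet_bits)


ber = ber_upper_bound(1e-9, 44)
print(f"BER <= {ber:.2e}")
```

Under these assumptions the target corresponds to a BER of roughly 2.27 x 10<sup>-11</sup>, i.e. about PER/44.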


# **Remaining Challenges**

Proof with DRAM

- Influence of the magnetic field on DRAM
- Influence of DRAM (plate, cylinder, power mesh) on the magnetic field
- Power Supply
  - A new way of power delivery that creates synergy with TCI is needed.
  - Highly Doped Silicon Vias (HDSV) are proposed. The idea was well received at IEDM but still needs proof.

Heat Removal

- Heat limits die stacking.
- Inductive coupling for horizontal links (TCI\_2.5D/2D) is developed.

# Highly Doped Silicon Vias (HDSV)

#### IEDM2014, 18.6.



- A deeper and more highly doped well is used to make a low-resistance HDSV.
- The HDSV on one die and the electrodes on the next die are connected by pressure from a Room-Temperature Wafer Level Bonding machine to create larger stacks.
- TCAD indicates a resistance of < 3mΩ for substrate thickness < 5um, dose: 1x10<sup>16</sup> cm<sup>-2</sup>, implant: 200 keV, annealing: 50h at 1050°C.
- A net area of 0.7 mm<sup>2</sup> is required (it can be divided), so HDSV is suited only to power delivery.
- Low-cost process using implants

#### Memory Stacking with TCI and HDSV



Hot Chips 2014

#### 128GB/s HBM Case Study

- TCI reduced chip size by 13% compared with TSV.



#### TCI Can Use Whole Chip Area




# TCI\_2.9D, 2.5D, 2.0D for Heat Removal



Conventional 2.5D packaging by Si interposer with μbumps and TSVs (labels: chip, μbumps, TSVs, Si interposer, C4 bumps, package substrate).

(a) TCI\_2.9D packaging.

(b) TCI\_2.5D packaging with small Si interposer.

(c) TCI\_2.0D packaging.

TCI can release mechanical constraints such as stress.


# JST ACCEL Project (2015-2019)

#### Goal

- Mobile supercomputer with the world's best power efficiency of 100GFLOPS/W (2019)
- Milestone
  - 512GB/s 8GB 8-Stacked DRAM (2017)
- Technology
  - **3D** Integration using Near-Field Coupling Integration Technology
- Further Challenges
  - AI computer equipped with both a left brain and a right brain to explore a new paradigm of information processing
  - Left brain employing a stored-program system by 3D Integration
  - Right brain employing a virtual hard-wired logic system by 4D Integration (3D + DRP with DNN and DL; not covered today)
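The 100GFLOPS/W goal can be restated as an energy budget per floating-point operation, since 1 W spread over 10<sup>11</sup> FLOP/s is 10<sup>-11</sup> J per FLOP. The conversion below is plain unit arithmetic; the function name is hypothetical.

```python
def energy_per_flop_pj(gflops_per_watt: float) -> float:
    """Energy per floating-point operation in pJ.

    1 W / (1 GFLOP/s) = 1e-9 J per FLOP = 1000 pJ per FLOP.
    """
    return 1000.0 / gflops_per_watt


print(f"{energy_per_flop_pj(100):.0f} pJ/FLOP at 100 GFLOPS/W")
```

At 100GFLOPS/W, each operation, including its share of data movement, must fit within about 10 pJ.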









#### 512GB/s 8GB TCI DRAM

- The target of TCI DRAM is 3x faster than HBM and HMC.



#### 100GFLOPS/W Computer



# Summary

- Near-Field Coupling Integration Technology takes on the "Tyranny of Numbers" in the post-Moore era.
- Transmission Line Coupler (TLC) using electromagnetic coupling enables contactless connectors for modular design.
- ThruChip Interface (TCI) using inductive coupling enables die stacking for 3D integration.
- Challenges
  - Proof with DRAM
  - Highly Doped Silicon Vias (HDSV) for power supply
  - TCI\_2.9D/2.5D/2.0D for heat removal
- ACCEL aims for 512GB/s DRAM (in 2017) and 100GFLOPS/W computer (in 2019).



#### Questions


#### Key References

#### TCI

- [01] ISSCC 2004, pp. 142-143.
- [02] Symp. on VLSI Circuits 2004, pp. 246-249.
- [03] CICC 2004, pp. 99-102.
- [04] *ISSCC 2005*, pp. 264-265.
- [05] *ISSCC 2006*, pp. 424-425.
- [06] ESSCIRC 2006, pp. 3-6.
- [07] ISSCC 2007, pp. 264-265.
- [08] *A-SSCC 2007*, pp. 131-134.
- [09] *ISSCC 2008*, pp. 298-299.
- [10] ISSCC 2009, pp. 244-245.
- [11] ISSCC 2009, pp. 480-481.
- [12] Symp. on VLSI Circuits 2009, pp. 256-257.
- [13] Symp. on VLSI Circuits 2009, pp. 94-95.
- [14] Symp. on VLSI Circuits 2009, pp. 92-93.
- [15] CICC 2009, pp. 449-452.
- [16] A-SSCC 2009, pp. 305-308.
- [17] A-SSCC 2009, pp. 301-304.
- [18] *ISSCC 2010*, pp. 436-437.
- [19] *ISSCC 2010*, pp. 440-441.
- [20] ISSCC 2010, ES3.
- [21] Symp. on VLSI Circuits 2010, pp. 201-202.
- [22] A-SSCC 2010, pp. 81-84.
- [23] IEDM 2010, p. 17.1.1.
- [24] ISSCC 2011, pp. 490-491.
- [25] ISSCC 2013, pp. 258-259.
- [26] Hot Chips 2014.

#### TLC

- [01] ISSCC 2007, pp. 266-267.
- [02] CICC 2007, pp. 13-2007.
- [03] A-SSCC 2008, pp. 113-116.
- [04] ISSCC 2009, pp. 470-472.
- [05] Symp. on VLSI Circuits 2009, pp. 26-27.
- [06] *ISSCC 2010*, pp. 264-265.
- [07] ISSCC 2011, pp. 492-493.
- [08] *A-SSCC 2011*, pp. 145-148.
- [09] ISSCC 2012, pp. 52-53.
- [10] *CICC 2012*, pp. 7.9.1-7.9.4.
- [11] ISSCC 2013, pp. 214-215.
- [12] ISSCC 2013, pp. 200-201.
- [13] ISSCC 2014, pp. 496-497.
- [14] ISSCC 2015, pp. 176-177.
- [15] ISSCC 2015, pp. 434-435.
- [16] Symp. on VLSI Circuits 2015, pp. C128-129.
