Hiroki Matsutani's homepage
Matsutani Lab's homepage (in Japanese)
Dept. of ICS, Faculty of Science and Technology, Keio University

Machine Learning

On-Device Learning

OnDevice_Learning.png
  • As a neural-network-based on-device learning approach, unsupervised anomaly detection using online sequential learning and its low-cost FPGA implementation were presented in [Tsukada_HeteroPar18].
  • We are working on an on-device learning chip that uses online sequential learning for unsupervised anomaly detection. Its performance and cost evaluations are reported in [Itsubo_COOLChips19].
    • [Tsukada_HeteroPar18] Mineto Tsukada, Masaaki Kondo, Hiroki Matsutani, "OS-ELM-FPGA: An FPGA-Based Online Sequential Unsupervised Anomaly Detector", Proc. of the 24th International European Conference on Parallel and Distributed Computing (Euro-Par'18) Workshops, pp.506-517, Aug 2018. [PDF]
    • [Itsubo_COOLChips19] Tomoya Itsubo, Mineto Tsukada, Hiroki Matsutani, "Performance and Cost Evaluations of Online Sequential Learning and Unsupervised Anomaly Detection Core", Proc. of the 22nd IEEE Symposium on Low-Power and High-Speed Chips and Systems (COOL Chips 22), pp.xxx-xxx, Apr 2019.

Machine Learning Accelerators

Outlier_detection.png
  • Outlier detection based on Mahalanobis distance was implemented in 10GbE FPGA NIC and achieved almost 10Gbps throughput in [Hayashi_CAN15].
  • Outlier detection based on the k-nearest neighbor (k-NN) and local outlier factor (LOF) algorithms was implemented in 10GbE FPGA NIC in [Hayashi_PDP17].
  • Both change-point detection and outlier detection were accelerated in 10GbE FPGA NIC in [Iwata_HeteroPar18].
    • [Hayashi_CAN15] Ami Hayashi, Yuta Tokusashi, Hiroki Matsutani, "A Line Rate Outlier Filtering FPGA NIC using 10GbE Interface", ACM SIGARCH Computer Architecture News (CAN), Vol.43, No.4, pp.22-27, Sep 2015. [PDF]
    • [Hayashi_PDP17] Ami Hayashi, Hiroki Matsutani, "An FPGA-Based In-NIC Cache Approach for Lazy Learning Outlier Filtering", Proc. of the 25th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP'17), pp.15-22, Mar 2017. [PDF]
    • [Iwata_HeteroPar18] Takuma Iwata, Kohei Nakamura, Yuta Tokusashi, Hiroki Matsutani, "Accelerating Online Change-Point Detection Algorithm using 10 GbE FPGA NIC", Proc. of the 24th International European Conference on Parallel and Distributed Computing (Euro-Par'18) Workshops, pp.506-517, Aug 2018. [PDF]
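
The Mahalanobis-distance outlier test mentioned above can be illustrated in software. The following is only a minimal sketch, not the FPGA NIC design from [Hayashi_CAN15]: it assumes 2-D samples, and the function names (fit_2d, mahalanobis, is_outlier) and the threshold of 3.0 are hypothetical choices made for this example.

```python
import math


def fit_2d(samples):
    """Estimate the mean and inverse covariance of 2-D training samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    # Unbiased sample covariance (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of the 2x2 covariance matrix.
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis(point, mean, inv):
    """Return sqrt((p - mean)^T * inv * (p - mean)) for a 2-D point."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    d2 = (dx * (inv[0][0] * dx + inv[0][1] * dy)
          + dy * (inv[1][0] * dx + inv[1][1] * dy))
    return math.sqrt(d2)


def is_outlier(point, mean, inv, threshold=3.0):
    """Flag a point whose distance exceeds the (assumed) threshold."""
    return mahalanobis(point, mean, inv) > threshold
```

In this sketch the mean and inverse covariance are computed once from training samples, so each incoming sample needs only a few multiply-add operations before the threshold comparison.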

Big Data Processing

Polyglot Persistence Accelerators

PolyglotPersistence.png
  • Key-value store NoSQL (e.g., memcached, Redis) was accelerated by combining FPGA-based 10GbE NIC and in-kernel cache in [Tokusashi_HOTI16]. The former is called L1 NoSQL cache and the latter is called L2 NoSQL cache.
  • Column-oriented NoSQL (e.g., HBase) was accelerated with FPGA-based 10GbE NIC in [Hamada_ReConFig16] and in-kernel cache in [Tamura_FPGA4GPC16].
  • Document-oriented NoSQL (e.g., MongoDB) was accelerated with networked GPUs via a PCI-Express over 10GbE technology in [Morishima_HeteroPar16].
  • Graph database (e.g., Neo4j) was accelerated with GPU in [Morishima_CAN14].
    • [Tokusashi_HOTI16] Yuta Tokusashi, Hiroki Matsutani, "A Multilevel NOSQL Cache Design Combining In-NIC and In-Kernel Caches", Proc. of the 24th IEEE International Symposium on High Performance Interconnects (Hot Interconnects 24), pp.60-67, Aug 2016. [PDF]
    • [Hamada_ReConFig16] Akihiko Hamada, Hiroki Matsutani, "Design and Implementation of Hardware Cache Mechanism and NIC for Column-Oriented Databases", Proc. of the 11th International Conference on ReConFigurable Computing and FPGAs (ReConFig'16), pp.1-6, Nov 2016. [PDF]
    • [Tamura_FPGA4GPC16] Korechika Tamura, Hiroki Matsutani, "An In-Kernel NOSQL Cache for Range Queries Using FPGA NIC", Proc. of the 1st International Conference on FPGA Reconfiguration for General-Purpose Computing (FPGA4GPC'16), pp.13-18, May 2016. [PDF]
    • [Morishima_HeteroPar16] Shin Morishima, Hiroki Matsutani, "Distributed In-GPU Data Cache for Document-Oriented Data Store via PCIe over 10Gbit Ethernet", Proc. of the 22nd International European Conference on Parallel and Distributed Computing (Euro-Par'16) Workshops, pp.41-55, Aug 2016. [PDF]
    • [Morishima_CAN14] Shin Morishima, Hiroki Matsutani, "Performance Evaluations of Graph Database using CUDA and OpenMP-Compatible Libraries", ACM SIGARCH Computer Architecture News (CAN), Vol.42, No.4, pp.75-80, Sep 2014. [PDF]
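
The L1/L2 NoSQL cache hierarchy described above can be pictured as a two-level lookup in front of the data store. The sketch below is a software analogy only, not the design in [Tokusashi_HOTI16]: the class name MultilevelCache, the FIFO eviction, the capacities, and the promote-on-L2-hit behavior are all assumptions made for illustration.

```python
class MultilevelCache:
    """Two-level read cache in front of a key-value backend (illustrative)."""

    def __init__(self, backend, l1_capacity=2, l2_capacity=4):
        self.backend = backend  # dict standing in for the NoSQL server
        self.l1 = {}            # stands in for the in-NIC (L1) cache
        self.l2 = {}            # stands in for the in-kernel (L2) cache
        self.l1_capacity = l1_capacity
        self.l2_capacity = l2_capacity

    @staticmethod
    def _put(level, capacity, key, value):
        # Assumed FIFO eviction: dicts keep insertion order, so the first
        # key is the oldest entry.
        if key not in level and len(level) >= capacity:
            level.pop(next(iter(level)))
        level[key] = value

    def get(self, key):
        """Return (value, level) where level names the tier that served it."""
        if key in self.l1:
            return self.l1[key], "L1"
        if key in self.l2:
            value = self.l2[key]
            # Assumed policy: promote an L2 hit into L1.
            self._put(self.l1, self.l1_capacity, key, value)
            return value, "L2"
        value = self.backend[key]  # raises KeyError if the key is absent
        self._put(self.l2, self.l2_capacity, key, value)
        self._put(self.l1, self.l1_capacity, key, value)
        return value, "backend"
```

For example, with a backend of {"a": 1, "b": 2} and l1_capacity=1, the first get("a") is served by the backend, a repeated get("a") by L1, and a get("a") issued after get("b") by L2, because "b" has displaced "a" from the single-entry L1.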

Batch and Stream Processing Accelerators

DistributedGPUprocessing.png
  • Spark (batch processing) was accelerated with networked GPUs via a PCI-Express over 10GbE technology in [Ohno_ICPADS16].
  • Spark Streaming (stream processing) was accelerated with FPGA-based 10GbE NIC in [Nakamura_BigData16WS].
  • Message queuing system was accelerated by combining FPGA-based 10GbE NIC and in-kernel cache in [Mitsuzuka_FPT18].
    • [Ohno_ICPADS16] Yasuhiro Ohno, Shin Morishima, Hiroki Matsutani, "Accelerating Spark RDD Operations with Local and Remote GPU Devices", Proc. of the 22nd IEEE International Conference on Parallel and Distributed Systems (ICPADS'16), pp.791-799, Dec 2016. [PDF]
    • [Nakamura_BigData16WS] Kohei Nakamura, Ami Hayashi, Hiroki Matsutani, "An FPGA-Based Low-Latency Network Processing for Spark Streaming", Proc. of the 4th IEEE International Conference on Big Data (BigData'16) Workshops, pp.2410-2415, Dec 2016. [PDF]
    • [Mitsuzuka_FPT18] Koya Mitsuzuka, Yuta Tokusashi, Hiroki Matsutani, "MultiMQC: A Multilevel Message Queuing Cache Combining In-NIC and In-Kernel Memories", Proc. of the 17th IEEE International Conference on Field Programmable Technology (FPT'18), pp.xxx-xxx, Dec 2018. [PDF]

Blockchain Search Accelerators

Blockchain_FPGANICCache.png
  • For bitcoin and blockchain applications, to mitigate full node accesses from IoT devices (i.e., SPV nodes), FPGA-based blockchain transaction caching and processing were proposed in [Sakakibara_ISPA18].
  • Blockchain data was cached in GPU device memory and anomaly detection of the data was accelerated using the GPU in [Morishima_ISPA18].
    • [Sakakibara_ISPA18] Yuma Sakakibara, Yuta Tokusashi, Shin Morishima, Hiroki Matsutani, "Accelerating Blockchain Transfer System Using FPGA-Based NIC", Proc. of the 16th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA'18), pp.171-178, Dec 2018. [PDF]
    • [Morishima_ISPA18] Shin Morishima, Hiroki Matsutani, "Acceleration of Anomaly Detection in Blockchain Using In-GPU Cache", Proc. of the 16th IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA'18), pp.244-251, Dec 2018. [PDF]

Datacenter Networks

Random Datacenter Networks

FSO_datacenter.png
  • Random shortcut topologies were explored for HPC interconnects in [Koibuchi_ISCA12] and those that consider rack layout were proposed in [Koibuchi_HPCA13].
  • 40Gbps free-space optics (light beam) was exploited as shortcut links for HPC interconnects in [Fujiwara_HPCA15].
    • [Koibuchi_ISCA12] Michihiro Koibuchi, Hiroki Matsutani, Hideharu Amano, D. Frank Hsu, Henri Casanova, "A Case for Random Shortcut Topologies for HPC Interconnects", Proc. of the 39th ACM/IEEE International Symposium on Computer Architecture (ISCA'12), pp.177-188, Jun 2012. [PDF]
    • [Koibuchi_HPCA13] Michihiro Koibuchi, Ikki Fujiwara, Hiroki Matsutani, Henri Casanova, "Layout-conscious Random Topologies for HPC Off-chip Interconnects", Proc. of the 19th IEEE International Symposium on High-Performance Computer Architecture (HPCA'13), pp.484-495, Feb 2013. [PDF]
    • [Fujiwara_HPCA15] Ikki Fujiwara, Michihiro Koibuchi, Tomoya Ozaki, Hiroki Matsutani, Henri Casanova, "Augmenting Low-latency HPC Network with Free-space Optical Links", Proc. of the 21st IEEE International Symposium on High-Performance Computer Architecture (HPCA'15), pp.390-401, Feb 2015. [PDF]
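
The effect of random shortcut links on hop counts can be checked with a small experiment. The sketch below is an illustration under stated assumptions, not the evaluation method of [Koibuchi_ISCA12]: it assumes a ring as the base topology, and the helper names (ring, add_random_shortcuts, diameter_and_average), the node count, and the seed are hypothetical.

```python
import random
from collections import deque


def ring(n):
    """Adjacency sets for an n-node ring."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


def add_random_shortcuts(adj, num_shortcuts, seed=0):
    """Return a copy of adj with extra links between random node pairs."""
    n = len(adj)
    out = {v: set(neigh) for v, neigh in adj.items()}
    existing = sum(len(neigh) for neigh in out.values()) // 2
    if num_shortcuts > n * (n - 1) // 2 - existing:
        raise ValueError("not enough unconnected node pairs")
    rng = random.Random(seed)
    added = 0
    while added < num_shortcuts:
        u, v = rng.sample(range(n), 2)
        if v not in out[u]:
            out[u].add(v)
            out[v].add(u)
            added += 1
    return out


def hop_counts(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_average(adj):
    """Diameter and average shortest path length of a connected graph."""
    n = len(adj)
    longest = 0
    total = 0
    for src in adj:
        dist = hop_counts(adj, src)
        longest = max(longest, max(dist.values()))
        total += sum(dist.values())
    return longest, total / (n * (n - 1))
```

Calling diameter_and_average(ring(64)) and then diameter_and_average(add_random_shortcuts(ring(64), 16)) compares the plain ring with the shortcut-augmented one. Because links are only added, the diameter cannot grow and the average shortest path length strictly decreases.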

Approximate Networks

  • An approximate datacenter network that optimizes for low latency while preserving accuracy was explored in [Fujiki_HPCA17].
  • In-switch approximation methods (e.g., proxy computation and response) for delayed tasks or stragglers in MapReduce were proposed in [Mitsuzuka_FPL17].
    • [Fujiki_HPCA17] Daichi Fujiki, Kiyo Ishii, Ikki Fujiwara, Hiroki Matsutani, Hideharu Amano, Henri Casanova, Michihiro Koibuchi, "High-Bandwidth Low-Latency Approximate Interconnection Networks", Proc. of the 23rd IEEE International Symposium on High-Performance Computer Architecture (HPCA'17), pp.469-480, Feb 2017. [PDF]
    • [Mitsuzuka_FPL17] Koya Mitsuzuka, Ami Hayashi, Michihiro Koibuchi, Hideharu Amano, Hiroki Matsutani, "In-Switch Approximate Processing: Delayed Tasks Management for MapReduce Applications", Proc. of the 27th International Conference on Field-Programmable Logic and Applications (FPL'17), pp.1-4, Sep 2017. [PDF]

Networks-on-Chip

  • You can download a parameterized NoC Generator that generates Verilog HDL models of on-chip routers and networks.

Wireless 3D Networks-on-Chip

3DWiNoC.png
  • Inductive-coupling based wireless 3D Network-on-Chip in which each chip or component can be added, removed, and swapped (called "field-stackable") was proposed in [Matsutani_NOCS11]. Its routing and topologies were explored in [Matsutani_ASPDAC13] and [Matsutani_DATE14].
  • The "field-stackable" concept was demonstrated in [Miura_Micro13] with the Cube-1 system, in which the numbers of CPU chips and accelerator chips can be customized.
    • [Matsutani_NOCS11] Hiroki Matsutani, Yasuhiro Take, Daisuke Sasaki, Masayuki Kimura, Yuki Ono, Yukinori Nishiyama, Michihiro Koibuchi, Tadahiro Kuroda, Hideharu Amano, "A Vertical Bubble Flow Network using Inductive-Coupling for 3-D CMPs", Proc. of the 5th ACM/IEEE International Symposium on Networks-on-Chip (NOCS'11), pp.49-56, May 2011. [PDF]
    • [Miura_Micro13] Noriyuki Miura, Yusuke Koizumi, Eiichi Sasaki, Yasuhiro Take, Hiroki Matsutani, Tadahiro Kuroda, Hideharu Amano, Ryuichi Sakamoto, Mitaro Namiki, Kimiyoshi Usami, Masaaki Kondo, Hiroshi Nakamura, "A Scalable 3D Heterogeneous Multicore with an Inductive ThruChip Interface", IEEE Micro, Vol.33, No.6, pp.6-15, Dec 2013.
    • [Matsutani_ASPDAC13] Hiroki Matsutani, Paul Bogdan, Radu Marculescu, Yasuhiro Take, Daisuke Sasaki, Hao Zhang, Michihiro Koibuchi, Tadahiro Kuroda, Hideharu Amano, "A Case for Wireless 3D NoCs for CMPs", Proc. of the 18th Asia and South Pacific Design Automation Conference (ASP-DAC'13), pp.23-28, Jan 2013. (Best Paper Award) [PDF]
    • [Matsutani_DATE14] Hiroki Matsutani, Michihiro Koibuchi, Ikki Fujiwara, Takahiro Kagami, Yasuhiro Take, Tadahiro Kuroda, Paul Bogdan, Radu Marculescu, Hideharu Amano, "Low-Latency Wireless 3D NoCs via Randomized Shortcut Chips", Proc. of the 17th Design, Automation, and Test in Europe Conference (DATE'14), pp.1-6, Mar 2014. [PDF]

Low-Power On-Chip Routers

Lowpower_router_wave.png
  • Fine-grained power-gating router, where router components (e.g., input VC buffer, output buffer, VC mux, crossbar) are independently power-gated, was proposed in [Matsutani_ASPDAC08].
  • Ultrafine-grained power-gating router was implemented with a 65nm process in [Matsutani_TCAD11].
  • Variable-pipeline router that dynamically adjusts supply voltage and pipeline depth (rather than frequency, as DVFS does) was proposed in [Matsutani_ASPDAC12].
    • [Matsutani_ASPDAC08] Hiroki Matsutani, Michihiro Koibuchi, Daihan Wang, Hideharu Amano, "Run-Time Power Gating of On-Chip Routers Using Look-Ahead Routing", Proc. of the 13th Asia and South Pacific Design Automation Conference (ASP-DAC'08), pp.55-60, Jan 2008. [PDF]
    • [Matsutani_TCAD11] Hiroki Matsutani, Michihiro Koibuchi, Daisuke Ikebuchi, Kimiyoshi Usami, Hiroshi Nakamura, Hideharu Amano, "Performance, Area, and Power Evaluations of Ultrafine-Grained Run-Time Power-Gating Routers for CMPs", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), Vol.30, No.4, pp.520-533, Apr 2011.
    • [Matsutani_ASPDAC12] Hiroki Matsutani, Yuto Hirata, Michihiro Koibuchi, Kimiyoshi Usami, Hiroshi Nakamura, Hideharu Amano, "A Multi-Vdd Dynamic Variable-Pipeline On-Chip Router for CMPs", Proc. of the 17th Asia and South Pacific Design Automation Conference (ASP-DAC'12), pp.407-412, Jan 2012. (Best Paper Candidate) [PDF]

Low-Latency Routers

  • One-cycle low-latency on-chip router that forwards packets based on "path prediction" was proposed in [Matsutani_HPCA09].
    • [Matsutani_HPCA09] Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, Tsutomu Yoshinaga, "Prediction Router: Yet Another Low Latency On-Chip Router Architecture", Proc. of the 15th IEEE International Symposium on High-Performance Computer Architecture (HPCA'09), pp.367-378, Feb 2009. [PDF]

On-Chip Network Topologies

FatHTree.png
  • A novel topology, called Fat H-Tree, that forms a torus structure by combining two H-Tree topologies (called Red and Black trees) was proposed in [Matsutani_IPDPS07].
  • A class of 3D network topologies based on 3D crossbar was proposed in [Matsutani_ICPP07].
  • A fault-tolerant mesh-based Network-on-Chip that includes an additional Hamiltonian ring path, called Default Backup Path, was proposed in [Koibuchi_NOCS08].
    • [Matsutani_IPDPS07] Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, "Performance, Cost, and Energy Evaluation of Fat H-Tree: A Cost-Efficient Tree-Based On-Chip Network", Proc. of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS'07), 10 pages, Mar 2007. [PDF]
    • [Matsutani_ICPP07] Hiroki Matsutani, Michihiro Koibuchi, Hideharu Amano, "Tightly-Coupled Multi-Layer Topologies for 3-D NoCs", Proc. of the 36th International Conference on Parallel Processing (ICPP'07), 10 pages, Sep 2007. [PDF]
    • [Koibuchi_NOCS08] Michihiro Koibuchi, Hiroki Matsutani, Hideharu Amano, Timothy M. Pinkston, "A Lightweight Fault-tolerant Mechanism for Network-on-chip", Proc. of the 2nd ACM/IEEE International Symposium on Networks-on-Chip (NOCS'08), pp.13-22, Apr 2008. [PDF]

Last-modified: 2019-04-04 (Thu) 11:53:44