Welcome to Matsutani Lab @ Dept of ICS, Keio University, Japan


Our research broadly covers computing infrastructures of various types and scales, ranging from edge to cloud computing. Currently, we are working on on-device AI (Artificial Intelligence) and its implementation on resource-limited edge devices, in-network computing using network-attached FPGAs (Field-Programmable Gate Arrays) and GPUs (Graphics Processing Units), and highly efficient accelerators for distributed machine learning and data processing. Below are some selected research topics.

A publication list is here, and a summary of selected papers is here.

On-device learning for field-trainable anomaly detection (2017-present)

Toward on-device learning, we are working on a neural-network-based online sequential learning and field-trainable anomaly detection (OSL-UAD) algorithm and its related technologies. In real environments, noise patterns (e.g., vibration) fluctuate and the status of products and tools varies with time. OSL-UAD learns the normal patterns, including noise, on the spot in the environment where it is deployed and then detects unusual ones, so no prior training is needed. It can train neural networks on $4 controllers.

In the following, OSL-UAD is demonstrated with some selected applications.

  • Mineto Tsukada, et al., “A Neural Network Based On-device Learning Anomaly Detector for Edge Devices”, IEEE Transactions on Computers (TC), Jul 2020. (Featured Paper in July 2020) [Open Access]
  • Rei Ito, et al., “An On-Device Federated Learning Approach for Cooperative Model Update between Edge Devices”, IEEE Access, Jun 2021. [Open Access]
  • Hirohisa Watanabe, et al., “An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning”, The 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS’21) Workshops (RAW’21), May 2021. [Paper]
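
As a rough illustration of online sequential learning for anomaly detection, the sketch below trains a single-hidden-layer autoencoder one sample at a time with a recursive least-squares update and uses the reconstruction error as the anomaly score. It is a minimal sketch under stated assumptions, not the implementation from the papers above: the class name, the ridge-regularized initialization, the sigmoid activation, and all sizes are hypothetical choices made for the example.

```python
import math
import random


class OnlineSequentialAutoencoder:
    """Single-hidden-layer autoencoder trained one sample at a time."""

    def __init__(self, n_inputs, n_hidden, ridge=1e-2, seed=0):
        rng = random.Random(seed)
        # Random input weights and biases stay fixed; only beta is learned.
        self.w = [[rng.uniform(-1, 1) for _ in range(n_hidden)]
                  for _ in range(n_inputs)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        # Assumption: P starts as (ridge * I)^-1, so no offline batch is needed.
        self.p = [[1.0 / ridge if i == j else 0.0 for j in range(n_hidden)]
                  for i in range(n_hidden)]
        self.beta = [[0.0] * n_inputs for _ in range(n_hidden)]

    def _hidden(self, x):
        acts = []
        for j in range(len(self.b)):
            a = self.b[j] + sum(xi * self.w[i][j] for i, xi in enumerate(x))
            acts.append(1.0 / (1.0 + math.exp(-a)))
        return acts

    def _output(self, h):
        n_out = len(self.beta[0])
        return [sum(h[j] * self.beta[j][k] for j in range(len(h)))
                for k in range(n_out)]

    def score(self, x):
        """Anomaly score: mean squared reconstruction error."""
        y = self._output(self._hidden(x))
        return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

    def train(self, x):
        """Recursive least-squares update with the input as its own target."""
        h = self._hidden(x)
        n = len(h)
        ph = [sum(self.p[i][j] * h[j] for j in range(n)) for i in range(n)]
        denom = 1.0 + sum(h[i] * ph[i] for i in range(n))
        err = [xk - yk for xk, yk in zip(x, self._output(h))]
        for i in range(n):
            gain = ph[i] / denom  # i-th entry of P_new h^T
            for j in range(n):
                # P <- P - (P h^T h P) / (1 + h P h^T)
                self.p[i][j] -= ph[i] * ph[j] / denom
            for k in range(len(x)):
                # beta <- beta + P_new h^T (x - h beta_old)
                self.beta[i][k] += gain * err[k]


rng = random.Random(1)
model = OnlineSequentialAutoencoder(n_inputs=4, n_hidden=8)
for _ in range(200):
    # "Normal" samples: a constant pattern plus small noise.
    model.train([0.5 + rng.gauss(0, 0.01) for _ in range(4)])
normal_score = model.score([0.5, 0.5, 0.5, 0.5])
unusual_score = model.score([0.9, 0.1, 0.9, 0.1])
print(normal_score < unusual_score)
```

Each update touches only the hidden-by-hidden matrix P and the output weights, which is why this style of training needs no stored dataset and no backpropagation.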

Highly-efficient FPGA-based SLAM accelerators (2019-present)

We are working on highly efficient LiDAR (Light Detection And Ranging) SLAM (Simultaneous Localization and Mapping) accelerators for mobile robots, such as robot cleaners, UAVs, and wheelchairs.

In the following, two well-known 2D LiDAR SLAM methods, a particle-filter-based SLAM and a graph-based SLAM, are accelerated on a low-cost PYNQ FPGA board.

  • Keisuke Sugiura, et al., “A Unified Accelerator Design for LiDAR SLAM Algorithms for Low-end FPGAs”, The 20th IEEE International Conference on Field Programmable Technology (ICFPT’21), Dec 2021. [Paper]
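
For readers unfamiliar with particle-filter-based SLAM, the sketch below shows one generic building block, systematic (low-variance) resampling, in which particles are redrawn in proportion to their weights. It is an illustrative software sketch only and says nothing about how the accelerator above is organized; the function name and the toy particles are hypothetical.

```python
import random


def systematic_resample(particles, weights, rng=random):
    """Redraw len(particles) particles in proportion to their weights."""
    n = len(particles)
    step = sum(weights) / n
    # One random offset, then n evenly spaced pointers over the weight sum.
    pointer = rng.uniform(0.0, step)
    resampled = []
    i = 0
    cumulative = weights[0]
    for _ in range(n):
        # Advance to the particle whose cumulative weight covers the pointer.
        while pointer > cumulative and i < n - 1:
            i += 1
            cumulative += weights[i]
        resampled.append(particles[i])
        pointer += step
    return resampled


# Toy example: only pose "c" matches the scan, so it takes over the population.
poses = ["a", "b", "c", "d"]
survivors = systematic_resample(poses, [0.0, 0.0, 1.0, 0.0], random.Random(0))
```

In a full particle filter, each particle would carry a robot pose (and, in SLAM, a map hypothesis), and the weights would come from how well the LiDAR scan matches that hypothesis.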

Distributed machine learning for the Beyond 5G era (2021-present)

We are working on networked AI systems for the Beyond 5G era, such as those for distributed deep learning, distributed deep reinforcement learning, and federated learning. Below is one such work, which proposes network optimizations for distributed deep reinforcement learning.

  • Masaki Furukawa, et al., “Accelerating Distributed Deep Reinforcement Learning by In-Network Experience Sampling”, The 30th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP’22), Mar 2022.
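
"Experience sampling" refers to drawing stored transitions from a replay buffer to build training batches. The sketch below is a generic host-side replay buffer with uniform sampling, given only to make the term concrete. It is not the in-network design proposed in the paper above, and the class name, capacity, and transition format are hypothetical.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # The oldest transitions are dropped once the capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Return up to batch_size distinct transitions chosen uniformly."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)


replay = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    # Hypothetical transition: (state, action, reward, next_state).
    replay.add((step, 0, 1.0, step + 1))
batch = replay.sample(2)
```

In a distributed setting, many actors write transitions and learners read batches, so where the buffer lives determines how much of this traffic crosses the network.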

Highly-efficient FPGA-based CNN accelerators using Neural ODE (2020-present)

We are working on highly efficient Convolutional Neural Network (CNN) inference accelerators for edge devices. Specifically, ordinary differential equation (ODE) based neural networks (Neural ODEs) are implemented on low-cost FPGA devices to reduce the parameter size while maintaining accuracy.

  • Hirohisa Watanabe, et al., “Accelerating ODE-Based Neural Networks on Low-Cost FPGAs”, The 35th IEEE International Parallel and Distributed Processing Symposium (IPDPS’21) Workshops (RAW’21), May 2021. [Paper]
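
The parameter saving can be seen in a small sketch: a Neural ODE block applies the same layer repeatedly as the right-hand side of dz/dt = f(z, t), so the number of parameters does not grow with the number of integration steps. The code below uses a fixed-step Euler solver, which is an assumption for illustration (the source does not say which solver is used), and a hypothetical tiny dense layer in place of a convolutional layer.

```python
import math


def euler_ode_block(z0, f, steps, t0=0.0, t1=1.0):
    """Integrate dz/dt = f(z, t) from t0 to t1 with fixed-step Euler."""
    h = (t1 - t0) / steps
    z, t = list(z0), t0
    for _ in range(steps):
        dz = f(z, t)  # the same function (same parameters) at every step
        z = [zi + h * di for zi, di in zip(z, dz)]
        t += h
    return z


def make_layer(weights, bias):
    """Hypothetical tiny dense layer with tanh, standing in for a conv layer."""
    def f(z, t):
        return [math.tanh(b + sum(w * zi for w, zi in zip(row, z)))
                for row, b in zip(weights, bias)]
    return f


# Ten Euler steps reuse one set of weights instead of ten separate layers.
layer = make_layer([[0.0, -1.0], [1.0, 0.0]], [0.0, 0.0])
features = euler_ode_block([1.0, 0.0], layer, steps=10)

# Sanity check on an ODE with a known solution: dz/dt = -z gives z(1) = z(0) / e.
decay = euler_ode_block([1.0], lambda z, t: [-zi for zi in z], steps=1000)
```

Each Euler step has the same form as a residual block, z + h * f(z), which is why a stack of residual blocks can be replaced by one block iterated several times.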

High-performance in-network machine learning (2014-2021)

We worked on machine learning for high-bandwidth network traffic using FPGA-based high-speed network interface cards (network-attached FPGAs) for outlier detection (k-nearest neighbor, local outlier factor), change-point detection (SDAR), and anomalous behavior detection (online HMM). We also proposed in-network acceleration of optimization algorithms (SGD, AdaGrad, Adam, and SMORMS3) for distributed deep learning using a 10Gbps FPGA-based network switch.

  • Hiroki Matsutani, “Accelerating Anomaly Detection Algorithms on FPGA-Based High-Speed NICs”, The 18th International Forum on MPSoC for Software-defined Hardware (MPSoC’18), Invited Talk, Aug 2018. [Slide]
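
To make concrete what an optimization algorithm computes per parameter, the sketch below is a plain software reference for one Adam update step with the commonly used default hyperparameters. It illustrates the arithmetic only and is not the in-switch implementation; the function name and the in-place list interface are hypothetical.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place; t is the 1-based step count."""
    for i, g in enumerate(grads):
        # Exponential moving averages of the gradient and its square.
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        # Bias correction for the zero-initialized averages.
        m_hat = m[i] / (1 - beta1 ** t)
        v_hat = v[i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


params, m, v = [1.0], [0.0], [0.0]
adam_step(params, [2.0], m, v, t=1)
```

The update is element-wise and needs only two extra state values per parameter, so it can be applied to gradients as they arrive rather than after they are all collected.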

Accelerating data processing frameworks (2014-2019)

A big data processing system typically consists of various software components, such as message queuing, RPC, stream processing, batch processing, machine learning frameworks, and data stores. We proposed methods to accelerate these components using network-attached FPGAs and network-attached GPUs.

In the following, Apache Spark is accelerated by network-attached GPUs via 10Gbit Ethernet. RDDs are cached in the device memory of these remote GPUs.

  • Hiroki Matsutani, “Accelerator Design for Big Data Processing Frameworks”, The 17th International Forum on MPSoC for Software-defined Hardware (MPSoC’17), Invited Talk, Jul 2017. [Slide]

We also worked on rack-scale architecture using the network-attached FPGAs and GPUs. In the following, a remote GPU connected via 10Gbit Ethernet is pooled and used for virtual reality applications on demand.

Accelerating NoSQL data stores (2013-2019)

We proposed methods to accelerate various NoSQL data stores, including key-value, column-oriented, document-oriented, and graph-oriented stores, using network-attached FPGAs and network-attached GPUs. We also worked on accelerating bitcoin/blockchain search.

In the following, a key-value store is accelerated by a network-attached FPGA via 10Gbit Ethernet.

  • Hiroki Matsutani, “Accelerator Design for Various NOSQL Databases”, The 16th International Forum on MPSoC for Software-defined Hardware (MPSoC’16), Invited Talk, Jul 2016. [Slide]

Data center network with light beam (2012-2018)

In the following, a 40Gbps free-space optical link (light beam) is established between two computers and then virtual machine (VM) migration is performed using this “VM highway”.

  • Ikki Fujiwara, et al., “Augmenting Low-latency HPC Network with Free-space Optical Links”, The 21st IEEE International Symposium on High-Performance Computer Architecture (HPCA’15), Feb 2015. [Slide]

Wireless 3D Network-on-Chips for building-block 3D systems (2009-2019)

We proposed inductive-coupling based wireless 3D Network-on-Chips for building-block 3D systems in which each chip or component can be added, removed, and swapped. Our “field-stackable” concept was demonstrated in Cube-0, Cube-1, and Cube-2 systems in which the numbers of CPU chips and accelerator chips can be customized.

  • Hiroki Matsutani, “A Building Block 3D System with Inductive-Coupling Through Chip Interfaces”, The 36th IEEE VLSI Test Symposium (VTS’18), Special Session, Apr 2018. [Slide]

Our NoC (Network-on-Chip) generator, called nocgen, generates a Verilog HDL model of an NoC consisting of on-chip routers and is available on GitHub.


Department of Information and Computer Science, Keio University
3-14-1 Hiyoshi, Kouhoku-ku, Yokohama, JAPAN 223-8522


Rooms 26-207 and 26-210A at Yagami Campus


Yagami Campus Guide



Ph.D. Course Students

  • Keisuke Sugiura (JSPS Research Fellow DC1)
  • Mineto Tsukada (JSPS Research Fellow DC1)

2nd-Year Master Course Students

  • Hibiki Ito
  • Yuto Ozeki
  • Hiroki Kawakami
  • Takeya Yamada

1st-Year Master Course Students

  • Ryuto Kojima
  • Kenji Nemoto
  • Yuto Hoshino
  • Yujiro Yahata

4th-Year Bachelor Course Students

  • Ikumi Okubo
  • Daiki Oda
  • Naoki Shibahara
  • Kazuki Sunaga
  • Mizuki Yasuda