Self-Introduction: Hirokatsu Kataoka

Dec 15, 2020

This is my first post on Medium. I would like to introduce myself both as a researcher and as a principal investigator. Until now, I have mostly reported our academic knowledge and achievements in Japanese, sharing research papers, code, datasets, and slides on my webpage and social media. From now on, I would like to share my ideas with the world!

As a researcher

I mainly work in the field of computer vision and pattern recognition. Specifically, I’m interested in future-oriented computer vision technology, in terms of both methods and datasets. I’ve released baselines, analyses, and datasets on topics such as Video Recognition (e.g., action prediction, human action recognition without human), Fashion Analysis, and Formula-driven Supervised Learning. I describe these topics below.

Video Recognition. Our video recognition method, 3D-ResNet (CVPR 2018), is my most cited paper (600+ citations as of Dec. 2020). The GitHub code has also gathered 2.5K+ stars (Dec. 2020). My colleague and I examined whether 3D CNNs and video datasets improve spatiotemporal recognition.

Spatiotemporal 3D-ResNet for video recognition.

Formula-driven image dataset. We proposed ‘Pre-training without Natural Images’ based on fractals, which follow a natural formula found in the real world. We automatically generate a large-scale labeled image dataset based on an iterated function system (IFS), and the pre-training framework uses this fractal geometry for feature representation learning. In this way, we can enhance natural image recognition by pre-training without natural images. Our paper compares the accuracy transition among ImageNet-1k pre-training, FractalDB-1k pre-training, and training from scratch. If we can improve the concept further, the de facto standard ImageNet pre-trained model may be replaced so as to protect fairness, preserve privacy, and decrease annotation labor. Please see also our code [Link].
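To give a feeling for how a labeled image dataset can come out of a formula, here is a minimal sketch of the IFS idea. It is not our released FractalDB code. The function names (`sample_ifs`, `render`, `make_dataset`), the parameter ranges, the image size, and the rule that one randomly sampled set of affine maps equals one category are all assumptions made for illustration.

```python
import random


def sample_ifs(rng, n_maps=3):
    """Sample one hypothetical category: a list of affine maps (a, b, c, d, e, f)."""
    maps = []
    for _ in range(n_maps):
        # Assumption: coefficients are kept small enough that every map is a contraction.
        a, b, c, d = (rng.uniform(-0.45, 0.45) for _ in range(4))
        e, f = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        maps.append((a, b, c, d, e, f))
    return maps


def render(maps, rng, n_points=5000, size=32):
    """Chaos game: apply a randomly chosen map again and again, then rasterize the orbit."""
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points):
        a, b, c, d, e, f = rng.choice(maps)
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= 20:  # skip the first iterations so the orbit settles near the attractor
            points.append((x, y))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid division by zero for degenerate orbits
    span_y = (max(ys) - min_y) or 1.0
    grid = [[0] * size for _ in range(size)]
    for px, py in points:
        col = min(size - 1, int((px - min_x) / span_x * size))
        row = min(size - 1, int((py - min_y) / span_y * size))
        grid[row][col] = 1
    return grid


def make_dataset(n_categories=3, per_category=2, seed=0):
    """Return (image, label) pairs; the label is simply the index of the sampled IFS."""
    rng = random.Random(seed)
    dataset = []
    for label in range(n_categories):
        maps = sample_ifs(rng)
        for _ in range(per_category):
            # Simplification: instances of a category differ only by the random orbit.
            dataset.append((render(maps, rng), label))
    return dataset


if __name__ == "__main__":
    image, label = make_dataset()[0]
    print("label:", label)
    for row in image:
        print("".join("#" if v else "." for v in row))
```

The point of the sketch is that the labels come for free: each image is tagged with the formula that generated it, so no human annotation and no natural images are needed.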

The concept of ‘Pre-training without Natural Images’. We won the ACCV 2020 Best Paper Honorable Mention Award!

Fashion Analysis. Fashion Culture DataBase (FCDB) [paper] [GitHub] was the largest dataset in this context. Using the data, we have explored the world’s fashion trends. More recently (as of 2019), we have been employing FCDB as a large-scale pre-training dataset for improving the feature representation of person detection. The proposed model outperformed the baseline (an ImageNet pre-trained detector) with a +13% improvement [paper].

Fashion Culture DataBase (FCDB): the database has been collected in order to analyze world-wide fashion trends

I will describe each paper in detail in the following posts.

As a principal investigator

I am leading cvpaper.challenge, which consists of academic paper surveys and collaborative research projects in computer vision, pattern recognition, and related fields. As of 2020, over 500 people (researchers, engineers, and graduate students) have joined this large project. cvpaper.challenge is a group mainly composed of members from AIST, University of Tsukuba, Keio University, Waseda University, Tokyo Denki University, Tokyo Institute of Technology, the University of Tokyo and so on. We aim to systematically summarize papers on computer vision, pattern recognition, and related fields. In particular, we have focused on comprehensively reading all of the papers at an international conference. So far, we have attempted this for CVPR 2015, 2018, 2019, and 2020, ECCV 2018 and 2020, and ICCV 2019, the premier computer vision events.

Reading international conference papers clearly provides advantages beyond understanding where your own research currently stands, such as acquiring ideas and methods used by researchers around the world. In reality, however, although this input of knowledge is important, researchers and engineers are too busy to find time for it. For undergraduate and graduate students (particularly master's course students) who lack research experience, the process takes a great amount of time and effort and means sacrificing time for classes and research. Assigning this work to non-experts who are not familiar with the field of computer vision, however, means that a great amount of time is needed to interpret the papers. As a way to address this problem, we believe that advanced technologies become relatively easier to grasp if we share and systematize knowledge in Japanese. We therefore undertook to read papers extensively, summarize them, and share them with others working in the same field.

CVPR is also known to comprehensively cover papers across the different fields of computer vision and pattern recognition. A number of prominent international researchers and research groups choose their research themes after gaining a comprehensive grasp of almost all papers presented at premier conferences and an understanding of research trends. We believe that research themes can be chosen more accurately by constantly staying updated on cutting-edge technologies and discussing these new technology trends within the research group as part of its regular activities. Further, a survey of papers presented at premier conferences is also an essential way to gather the tools needed for research. We therefore believe that gaining an understanding of papers presented at premier conferences is the best method for authors to comprehend the latest trends in computer vision, pattern recognition, and related fields. (Partially cited from our arXiv paper)

We are now looking for new members for our research community. Join us! Email: cvpaper[dot]challenge[at]gmail[dot]com


Written by Hirokatsu Kataoka
