Ernie Chu
Research Assistant
Research Center for Information Technology Innovation
Academia Sinica
Email: shchu [at] citi.sinica.edu.tw
Quick links: CV / GitHub / Notes

I am a Research Assistant in CITI at Academia Sinica studying computer vision, generative models, and AI, under the supervision of Professor Jun-Cheng Chen. I received my undergraduate degree in Computer Science and Engineering from National Sun Yat-sen University, where I got my start in research working with Professor Chia-Ping Chen.

My primary interest lies in machine learning and generative models. I'm currently working on video generation using image Diffusion Models. (Last updated on June 1, 2023)

Before joining CITI, I worked part-time at the Office of International Affairs (OIA), NSYSU, as a full-stack web developer. I developed web applications for exchange programs, mostly using PHP, Express, and Vue, and maintained all websites across the OIA office.

During my undergraduate studies, I was also involved in subjects such as computer graphics, socket programming, attribute-based encryption, data visualization, compiler design, and Chrome extension development.


Selected Publications

MeDM: Mediating Image Diffusion Models for Video-to-Video Translation with Temporal Correspondence Guidance

Ernie Chu, Tzuhsuan Huang, Shuo-Yen Lin, Jun-Cheng Chen

AAAI Conference 2024

This study introduces an efficient and effective method, MeDM, that utilizes pre-trained image Diffusion Models for video-to-video translation with consistent temporal flow. The proposed framework can render videos from scene position information, such as a normal G-buffer, or perform text-guided editing on videos captured in real-world scenarios. We employ explicit optical flows to construct a practical coding that enforces physical constraints on generated frames and mediates independent frame-wise scores. By leveraging this coding, maintaining temporal consistency in the generated videos can be framed as an optimization problem with a closed-form solution. To ensure compatibility with Stable Diffusion, we also suggest a workaround for modifying observed-space scores in latent-space Diffusion Models. Notably, MeDM does not require fine-tuning or test-time optimization of the Diffusion Models. Through extensive qualitative, quantitative, and subjective experiments on various benchmarks, the study demonstrates the effectiveness and superiority of the proposed approach.


Multi-Task Self-Blended Images for Face Forgery Detection

Po-Han Huang, Yue-Hua Han, Ernie Chu, Jun-Cheng Chen, Kai-Lung Hua

ACM Multimedia Asia 2023

Deepfake detection has attracted extensive attention due to widespread forged images on social media. Recently, self-supervised learning (SSL)-based Deepfake detection approaches have outperformed supervised methods in terms of model generalization. However, we notice that most SSL-based methods do not take the manipulation strength levels of synthesized forgery samples into consideration according to different synthesis parameters and result in suboptimal detection performances. To address this issue, we introduce several auxiliary losses to the state-of-the-art SSL-based method based on different synthesis sub-tasks during data generation by inferring their synthesis parameters where the ground-truth labels are obtained from the synthesis pipeline for free. With comprehensive evaluations on various benchmarks, our approach has achieved noticeable performance improvement. Specifically, for the cross-dataset evaluation, the proposed approach outperforms the state-of-the-art method in terms of AUC on various datasets with improvements of 3.4%, 1.47%, 1.56%, and 1.3% on the CDF, DFDC, DFDCP, and FFIW datasets and achieves competitive performance on the DFD dataset. This further demonstrates the effectiveness of the proposed approach in generalization.


Please refer to my CV and Google Scholar profile for a full list.


Selected Projects

LoLViZ

  • A League of Legends visualizer that helps pro players in the Challenger league explore their matches.
  • Use Vue.js along with D3.js to create flexible and informative visualizations.
  • Utilize the Riot API and Google Firebase to retrieve the latest player and match information.

TSM-Net: Audio Time-Scale Modification with Temporal Compressing Networks

  • Use an autoencoder to compress the audio into a latent representation 1024 times smaller for time-scale modification.
  • Use a multi-scale discriminator to train the model for the best audio quality across all frequency bands.

NSYSU Captcha Solver

  • Use convolutional neural networks to autofill the CAPTCHA verification in the course explorer.
  • Use TensorFlow.js in a Chrome extension to implement on-device inference.

Secure On-line Chatting Service

  • On-line chatting service with a console-based UI using ncurses.
  • Socket programming and multi-threading using C++17.
  • Use attribute-based encryption to provide a secure channel for messages.

License Plate Verification and Warning System

  • Use ML-based object detection technology to track license plates.
  • A barrier gate is no longer needed, as the system can detect moving targets.
  • All inference can be done on an end device, for example a Jetson Nano.