Agrusa Competition 2022

CSE Departmental Advisory Board Co-Chair Russ Agrusa flashes the thumbs-up sign with Agrusa Competition 2021 award winners.

In 2022, the Agrusa Competition awarded a total of $8,000 in prize money.

This year's competitors included 14 teams made up of 37 students who were advised by 12 faculty members.  


Award Winners

Place Title Award
1st A Campus Prototype of Interactive Digital Twin in Cyber Manufacturing (Matthew Rubino and Michelle Weng) $4,000
2nd High-throughput Metagenomic Classification on Mobile Devices (Andrew Mikalsen) $2,500
3rd Improved Detection of Recyclable Plastics Using Multi-modal Sensing (Vaishali Vitha Maheshkar) $1,000
Honorable Mention Dorsal Hand Vein (DHV) Biometrics (Sougato Bagchi) $250
Honorable Mention PRIMAL (Naresh Kumar Devulapally)


This is the complete list of student projects that competed for the prize money.

A Campus Prototype of Interactive Digital Twin in Cyber Manufacturing

The goal of this project was to develop a digital representation (digital twin) of a physical 3D printer. The system we created allows for the real-time monitoring and control of a 3D printer located in the University at Buffalo’s Bell Hall. The printer is managed through a web interface, which is accessible from anywhere in the world. This interface displays sensor and video data associated with the printer in real time. The data is collected by microcontrollers attached to the printer, sent to a WebSocket server hosted on AWS, and relayed to the appropriate connections. The interface also includes animated 2D and 3D twin views. These views are synchronized with the state of the printer, mirroring the physical process. Finally, the interface has a control panel and G-code terminal, which are used to send commands to the machine remotely.

A smart home system for more fine-grained physical environment control

The project is designed to provide a better understanding of smart devices’ physical influences in the home environment. It tries to identify device-to-device physical interactions and prevent unwanted ones, e.g., a heater raising the temperature and triggering windows to open. The system also implements adaptive learning methods to learn users’ normal habits and devices’ normal specifications, e.g., the location and power of devices. With this better understanding, the system can also provide more accurate and efficient control of temporal physical attributes. For example, if users specify that the temperature should not exceed 78°F, the system knows when to turn off a heater before the target temperature is reached, and makes sure the remaining heat cannot raise the temperature to that level.
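The early shutoff in the heater example can be illustrated with a small sketch. It assumes a deliberately simplified model in which a heater that has just been switched off still adds a fixed amount of residual heat. The function name, the residual-heat value, and the readings are hypothetical and are not taken from the project's implementation.

```python
def should_turn_off(current_temp_f, limit_f, residual_rise_f):
    """Return True when the heater should be switched off now.

    Assumption (hypothetical model): after shutoff the room still warms
    by residual_rise_f degrees, so the heater has to stop early enough
    that the residual heat cannot bring the room to the limit.
    """
    return current_temp_f + residual_rise_f >= limit_f


# Hypothetical readings with a user limit of 78 F and 1.5 F of residual heating.
readings = [74.0, 75.5, 76.2, 76.5, 77.0]
decisions = [should_turn_off(t, 78.0, 1.5) for t in readings]
print(decisions)
```

In a real deployment the residual rise would presumably be learned per device and per room rather than fixed, which is where the adaptive learning described above would come in.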

AI Huddle

  1. The objective of this project is to develop an effective system that relies on coaches, smartphones and cloud, speech-to-text (STT), and natural language processing (NLP) technologies to gather statistics, which would then help the coaches understand potential areas of improvement and tactics.
  2. The application allows coaches to record comments during the game using their smartphones.
  3. This audio is then sent to our speech-to-text model which processes it and returns a transcription of the audio in focus.
  4. After each play, the transcription is sent to another API to extract important data from the text, which will be further used to generate reports about the game.

Audio Deepfake

State-of-the-art AI media synthesis methods can now create highly realistic still images and videos that can challenge the viewer’s ability to distinguish them from real media. While AI-synthesized still images and videos are currently in the spotlight of public attention, synthetic human voices have also undergone considerable development and are reaching unprecedented perceptual quality and generation efficiency. So far, we have accomplished the following:

  • We are the first to identify neural vocoders as a source of features to expose synthetic human voices.
  • We provide LibriVoC as a dataset of self-vocoding samples created with six state-of-the-art vocoders to highlight and exploit the vocoder artifacts.
  • We propose a new approach to detecting synthetic human voices based on exposing signal artifacts left by neural vocoders and trained with self-supervised representational learning.
  • We have modified and improved the RawNet2 baseline by adding multi-loss, lowering the error rate from 2.02% to 1.40%.

Convolution for Encrypted Images in FHE

With cloud services and deep learning applications on the rise, along with growing concern for privacy, deep learning models that operate on encrypted data offer a way to address this challenge. Fully Homomorphic Encryption (FHE) is an encryption method that allows computation over encrypted data, but it comes with a set of challenges and trade-offs. Any deep learning computer vision model can be broken down into the following components: convolution, subsampling, activation, and a fully connected artificial neural network. Of these four components, the convolution layer is not only the most time-intensive but also computationally intensive.
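To give a sense of why the convolution layer dominates the cost, here is a minimal plaintext sketch of a 2D convolution written using only additions and multiplications, the two operations that FHE schemes typically support on encrypted values. No encryption is performed here. In an FHE setting each pixel would be a ciphertext and every multiplication counted below would be a far more expensive homomorphic operation. The function name and sample values are assumptions for illustration.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D convolution (cross-correlation form, as used in CNN layers).

    Uses only + and *; returns the output map and the number of
    multiplications performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    mults = 0
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc = acc + image[r + i][c + j] * kernel[i][j]
                    mults += 1
            row.append(acc)
        out.append(row)
    return out, mults


# Hypothetical 3x3 "image" and 2x2 kernel.
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]
result, mults = conv2d_valid(image, kernel)
print(result, mults)
```

The multiplication count grows with the product of the output size and the kernel size, which is why this layer is a natural target for optimization.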

Dorsal Hand Vein (DHV) Biometrics

Biometrics deals with recognizing humans from their physiological or behavioral characteristics.  In our project, we have implemented dorsal hand vein patterns as biometrics. We designed an end-to-end system that captures images of the dorsal side of our hand as well as an authentication system using neural networks. For capturing images, we have specifically used Near-Infrared (NIR-850nm) as the external source of light because hemoglobin is known to absorb NIR. This will result in the venous network of our hand appearing darker than other regions. Images are captured using a smartphone camera.


In this project, we built a novel framework, HateGuard, that practically addresses the issue of evolving online hate. We designed a centroid similarity sampling (CSS) and entailment-based few-shot learning approach that can effectively use a limited number of new data samples to detect evolving hate speech on social media. We used the COVID-19 pandemic as a case study and collected a related hate speech dataset from Twitter. Compared to state-of-the-art tools, our framework achieves a 37.5% - 130% improvement in reducing online hate violations for the evolving online hate recently witnessed.

High-throughput Metagenomic Classification on Mobile Devices

I designed and implemented Coriolis, a metagenomic classifier (DNA analysis software) for mobile devices. Metagenomic classification maps each DNA molecule in a sample to the species it’s from. To implement Coriolis, I introduced two novel, practical data structures for searching unstructured textual data stored on disk. First is the Patricia array, a space efficient way to determine a string’s rank among a set of strings stored on disk with only a single disk access. Second is the compact string B-tree (CSBT), which searches for all occurrences of a string in a collection of texts stored on disk while being both I/O optimal and space efficient. Implementing a metagenomic classifier such as Coriolis is very complex. To abstract these complexities, I introduced SMARTEn, a programming model and framework for expressing DNA analysis algorithms. SMARTEn is equipped with a runtime environment that automatically parallelizes and manages resources used by DNA analysis algorithms.
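As a point of reference for what "rank" means here, the following sketch computes a string's rank among a sorted set of strings using an ordinary in-memory binary search. This is not the Patricia array, which is designed for strings stored on disk. The DNA fragments and the function name are hypothetical.

```python
from bisect import bisect_left


def rank(sorted_strings, query):
    """Number of strings in sorted_strings that are lexicographically smaller than query."""
    return bisect_left(sorted_strings, query)


# Hypothetical short DNA fragments, kept in sorted order.
fragments = sorted(["ACGT", "ACTA", "CGTA", "GATC", "TTAG"])
print(rank(fragments, "CGTA"))
print(rank(fragments, "CCCC"))
```

An in-memory binary search like this one touches several strings per query. The abstract's claim is that the Patricia array answers the same question with a single disk access.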

Improved Detection of Recyclable Plastics Using Multi-modal Sensing

The idea came about through the NSF-funded project titled “Valorization of plastic waste via advanced separation and processing”. Previously, we collected a large database of 16,000 color images of recyclable plastic items and built a neural network-based pipeline that classifies the items by type of plastic with 95% accuracy. Separately, we collected mid-IR spectra of real-world plastics, along with algorithms for accurate recognition (98%) of the type of plastic from the spectrum. However, these two datasets consist of ideal images and ideal spectra, respectively, collected in the lab.

This project is developing an RNN-based multi-modal recognition system that combines image-based recognition with mid-IR spectrum sensing for better recognition in the wild. We believe that the multi-modal solution will perform better in the real world, where the plastics have impurities and deformations that make the individual modalities less accurate. The solution we propose in this project is the multi-modal fusion of plastic images and spectra files, using deep learning attention mechanisms and recurrent neural networks to capture the significant features of the plastics and feed them to a Recurrent Neural Network that identifies the type of plastic. A CNN encoder captures the features from the images, and an RNN encoder is used on the spectra files. The features from both networks are fed to a mixed attention sensor fusion block, which consists of two stages: the first stage is self attention, which captures the individual features and combines them using an attention mechanism, and the second stage is the cross attention module, which deals with inter-sensor modalities. The features from the mixed sensor fusion block are then fed to an RNN to model the long-term dynamics and predict the type of plastic.
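The cross attention stage can be sketched with generic scaled dot-product attention, in which features from one modality (the queries) attend over features from the other (the keys and values). This is a textbook formulation, not the project's fusion block. The feature vectors, their dimensions, and the function names are assumptions for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys.

    Returns (outputs, weights); weights[q] sums to 1 across the keys.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical 2-dimensional features from the image and spectrum encoders.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
spectrum_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, attn = cross_attention(image_feats, spectrum_feats, spectrum_feats)
print(attn[0])
```

Each image feature ends up as a weighted mix of spectrum features, with larger weights on the spectrum features it aligns with most closely.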


To learn new skills or gain knowledge about a subject, learners (e.g., students, researchers, or workers) often spend a good amount of time searching the Web for a set of related articles and papers to read. However, with the vast number of scholarly articles available online, learners often struggle to find the most relevant and authentic content to learn effectively. Wouldn’t it be more effective if we could build an online ecosystem that connects learners directly with subject matter experts, so that the experts can share their recommendations of relevant articles in a clearly defined organization?

Northstar is a novel, AI-driven, data-rich platform that addresses this problem. It provides a bridge between aspiring and proficient researchers, giving them access to the right resources along with a well-defined pathway to study, learn, and research new concepts anytime, anywhere, with just a few clicks.

It is an easy-to-use, cross-platform application built for iOS, Android, and the Web, where the users can find learning roadmaps shared by domain experts from the community.


The project aims to provide a platform where people who use the application can easily view and make appointments. Businesses can register themselves and advertise their open appointment slots on the web app. This will make the process of finding and booking appointments easy and hassle-free. Our application can support small, medium, or large-scale businesses and organizations of any kind (healthcare, real estate, etc.). We have developed and deployed an initial release of the project as a web application. The main idea is to have one application for finding and managing all appointments, and this app intends to be a one-stop solution for users who are looking to find an appointment.


Human emotion recognition plays a pivotal role in building an intelligent conversational agent for providing real-time automated support service in various problem settings. For a comprehensive understanding of the content and context of conversations from a video clip, we propose a PRIvacy-preserving Multi-modal Attentive Learning framework (PRIMAL). It derives person-independent, normalized, facial action-unit based features to estimate the participants’ expressions, and keeps track of their spatio-temporal states and conversation dynamics in the context of their surrounding environment to evaluate the speaker's emotion. By designing a novel contrastive loss-based optimization framework to capture the self- and cross-modal correlation within a learned descriptor, PRIMAL exhibits promise in accurately identifying the emotional state of the individual speaker. For interpretability, it identifies the top-k words in the conversation, the facial action-units, and the keyframe regions that influence the system's decision. Its consistently superior performance over other state-of-the-art works on large-scale public datasets demonstrates the feasibility of our approach.

Privacy-preserving federated learning with ensemble attention distillation

Modern deep learning algorithms rely on massive annotated datasets for many practical applications. In most cases, however, this data is physically located across multiple disparate locations and regulated by different entities. This makes it challenging to centralize the physically dispersed data, with data privacy and network bandwidth being the primary concerns. Consequently, federated learning (FL) has emerged as an important topic, in which a single centralized model is trained in a distributed, decentralized fashion using model fusion/distillation techniques. This is particularly relevant for clinical applications, since patient data are usually not allowed to be transferred out of medical facilities, leading to the need for FL.

To address these problems, we propose a one-way offline distillation-based FL framework that can preserve privacy by design while consuming substantially fewer network communication resources compared to the existing FL methods. Our framework engages in inter-node communication using only publicly available and approved datasets, thereby giving explicit privacy control to the user. We demonstrate competitive performance on image classification, segmentation, and reconstruction tasks.
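The idea of communicating only through a public dataset can be sketched as follows: each node reports its class probabilities on the same public samples, and the central model is trained against the combined soft labels. In this sketch plain averaging stands in for the framework's ensemble attention mechanism, whose details are not given here. The function name and the probabilities are hypothetical.

```python
def ensemble_soft_labels(node_predictions):
    """Average per-class probabilities reported by several nodes.

    node_predictions[n][s][k] is the probability that node n assigns to
    class k for public sample s. No private data leaves any node; only
    predictions on the shared public samples are exchanged.
    """
    n_nodes = len(node_predictions)
    n_samples = len(node_predictions[0])
    n_classes = len(node_predictions[0][0])
    return [
        [sum(node_predictions[n][s][k] for n in range(n_nodes)) / n_nodes
         for k in range(n_classes)]
        for s in range(n_samples)
    ]


# Hypothetical predictions from two nodes on two public samples, two classes.
preds = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]
targets = ensemble_soft_labels(preds)
print(targets)
```

Because each node sends its predictions once, rather than exchanging model weights over many rounds, the communication cost stays small.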

Understanding and Detecting Remote Infection on Linux-based IoT Devices

We conduct an empirical study on a large-scale dataset covering 403,464 samples collected from VirusShare and a large group of IoT honeypots to gain a deep insight into the characteristics of IoT malware remote infection.

In addition, we investigate the current state of fingerprinting methods for the shell commands used in these infections and offer a taxonomy of shell commands by introducing the notion of infection capability.

Moreover, we develop an approach to detect ongoing remote infection activities based on infection capabilities. After deploying honeypots equipped with our infection detector worldwide, we found that our detection approach achieves a 99.22% detection rate for remote infections in the wild while introducing only a small performance overhead.