When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang
BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs
Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen
Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Yuanchao Zhang, Hongning Wang, Minlie Huang
Trust-Region Adaptive Policy Optimization
Mingyu Su, Jian Guan, Yuxian Gu, Minlie Huang, Hongning Wang
Data Efficient RLVR via Off-Policy Influence Guidance
Erle Zhu, Dazhi Jiang, Yuan Wang, Xujun Li, Jiale Cheng, Yuxian Gu, Yilin Niu, Aohan Zeng, Jie Tang, Minlie Huang, Hongning Wang
Glyph: Scaling Context Windows via Text Visualization
Jiale Cheng, Yusen Liu, Xinyu Zhang, Yulin Fei, Wenyi Hong, Ruiliang Lyu, Weihan Wang, Zhe Su, Xiaotao Gu, Xiao Liu, Yushi Bai, Jie Tang, Hongning Wang, Minlie Huang
The Side Effects of Being Smart: Safety Risks in MLLMs’ Multi-Image Reasoning
Renmiao Chen, Yida Lu, Shiyao Cui, Xuan Ouyang, Victor Shea-Jay Huang, Shumin Zhang, Chengwei Pan, Han Qiu, Minlie Huang
LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
Junxiao Yang, Haoran Liu, Jinzhe Tu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Jiaqi Weng, Jialing Tao, Hui Xue, Hongning Wang, Han Qiu, Minlie Huang
S^4: Operationalizing Speech Act Theory for Strategic Semi-Structured Psychiatric Interview
Guanqun Bi, Zhoufu Liu, Zhuang Chen, Dazhen Wan, Xiyao Xiao, Minlie Huang
Holistic Evaluation for LLM’s Capability in Human-level Writing using Tree of Writing
Andrew Zhuoer Feng, Cunxiang Wang, Yu Luo, Lin Fan, Irene Zhou, Zikang Wang, Xiaotao Gu, Jie Tang, Hongning Wang, Minlie Huang
How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study
Zhexin Zhang, Xian Qi Loye, Victor Shea-Jay Huang, Junxiao Yang, Qi Zhu, Shiyao Cui, Fei Mi, Lifeng Shang, Yingkang Wang, Hongning Wang, Minlie Huang
New Terms, New Toxicity: Consensus-based Chinese Neologism Toxicity Detection via Search-Augmented LLMs
Shiyao Cui, QingLin Zhang, Di Wang, Yida Lu, Zhexin Zhang, Jinhua Gao, Jinglin Yang, Min He, Han Qiu, Minlie Huang
IF-RewardBench: Benchmarking Judge Models for Instruction-Following Evaluation
Bosi Wen, Yilin Niu, Cunxiang Wang, Xiaoying Ling, Ying Zhang, Pei Ke, Hongning Wang, Minlie Huang
IF-CRITIC: Towards a Fine-Grained LLM Critic for Instruction-Following Evaluation
Bosi Wen, Yilin Niu, Cunxiang Wang, Pei Ke, Xiaoying Ling, Ying Zhang, Aohan Zeng, Hongning Wang, Minlie Huang