Publications

An up-to-date list is available on Google Scholar.

2025

  1. ICLR
    ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
    Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano, and Pulkit Agrawal
    In The Thirteenth International Conference on Learning Representations, 2025
  2. arXiv
    Humanity’s Last Exam
    Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Chen Bo Calvin Zhang, Mohamed Shaaban, John Ling, and 2 more authors
    arXiv preprint arXiv:2501.14249, 2025
  3. IEEE
    Vehicular Communication Security: Multi-Channel and Multi-Factor Authentication
    Marco De Vincenzi, Shuyang Sun, Chen Bo Calvin Zhang, Manuel Garcia, Shaozu Ding, Chiara Bodei, Ilaria Matteucci, Sanjay E Sarma, and Dajiang Suo
    IEEE Transactions on Vehicular Technology, 2025
  4. arXiv
    SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents
    Jonathan Kutasov, Yuqi Sun, Paul Colognese, Teun Weij, Linda Petrini, Chen Bo Calvin Zhang, John Hughes, Xiang Deng, Henry Sleight, and 3 more authors
    arXiv preprint arXiv:2506.15740, 2025
  5. arXiv
    Reliable Weak-to-Strong Monitoring of LLM Agents
    Neil Kale, Chen Bo Calvin Zhang, Kevin Zhu, Ankit Aich, Paula Rodriguez, Scale Red Team, Christina Q Knight, and Zifan Wang
    arXiv preprint arXiv:2508.19461, 2025
  6. arXiv
    TutorBench: A Benchmark To Assess Tutoring Capabilities Of Large Language Models
    Rakshith S Srinivasa, Zora Che, Chen Bo Calvin Zhang, Diego Mares, Ernesto Hernandez, Jayeon Park, Dean Lee, Guillermo Mangialardi, Charmaine Ng, and 2 more authors
    arXiv preprint arXiv:2510.02663, 2025
  7. arXiv
    Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
    Xingang Guo, Utkarsh Tyagi, Advait Gosai, Paula Vergara, Jayeon Park, Ernesto Gabriel Hernandez Montoya, Chen Bo Calvin Zhang, Bin Hu, Yunzhong He, and 2 more authors
    arXiv preprint arXiv:2510.12712, 2025
  8. arXiv
    MoReBench: Evaluating Procedural and Pluralistic Moral Reasoning in Language Models, More Than Outcomes
    Yu Ying Chiu, Michael S Lee, Rachel Calcott, Brandon Handoko, Paul Font-Reaulx, Paula Rodriguez, Chen Bo Calvin Zhang, Ziwen Han, Udari Madhushani Sehwag, and 2 more authors
    arXiv preprint arXiv:2510.16380, 2025
  9. arXiv
    ResearchRubrics: A Benchmark of Prompts and Rubrics for Evaluating Deep Research Agents
    Manasi Sharma, Chen Bo Calvin Zhang, Chaithanya Bandi, Clinton Wang, Ankit Aich, Huy Nghiem, Tahseen Rabbani, Ye Htet, Brian Jang, and 2 more authors
    arXiv preprint arXiv:2511.07685, 2025
  10. arXiv
    PRBench: Large-Scale Expert Rubrics for Evaluating High-Stakes Professional Reasoning
    Afra Feyza Akyürek, Advait Gosai, Chen Bo Calvin Zhang, Vipul Gupta, Jaehwan Jeong, Anisha Gunjal, Tahseen Rabbani, Maria Mazzone, David Randolph, and 2 more authors
    arXiv preprint arXiv:2511.11562, 2025

2023

  1. ICML
    HIP-RL: Hallucinated Inputs for Preference-based Reinforcement Learning in Continuous Domains
    Chen Bo Calvin Zhang and Giorgia Ramponi
    In ICML 2023 Workshop: The Many Facets of Preference-Based Learning, 2023
  2. arXiv
    Zero-Shot Transfer in Imitation Learning
    Alvaro Cauderan, Gauthier Boeshertz, Florian Schwarb, and Chen Bo Calvin Zhang
    arXiv preprint arXiv:2310.06710, 2023