Preliminary Work Toward a Fine-Detail-Preserving Medical VLM
Jongsu Youn
Graduate School of Advanced Imaging Sciences, CAU
In medical artificial intelligence, model performance and reliability are critical, and the data are more complex than in general domains. Because coarse-grained predictions offer little clinical utility, real-world applications demand fine-grained, highly complex tasks. Although advances in artificial intelligence have improved both achievable task complexity and model performance, vision-language models (VLMs) capable of handling fine-grained tasks specific to the medical domain remain scarce. This paper presents preliminary work toward fine-detail-preserving vision-language pretraining tailored for medical applications.
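For context on the vision-language pretraining setup the abstract references, the following is a minimal sketch of a standard CLIP-style contrastive image-text objective. It is an illustrative assumption only, not the fine-detail-preserving method this preliminary work develops; the function name, embedding dimensions, and temperature value are all hypothetical.

# Minimal sketch of a CLIP-style contrastive pretraining objective.
# Illustrative only; the paper's fine-detail-preserving variant is not
# specified here, and all names/values below are placeholder assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb: torch.Tensor,
                     text_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # Normalize so that dot products become cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarity logits, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature
    # Matched image-text pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)    # text -> image
    return (loss_i2t + loss_t2i) / 2

if __name__ == "__main__":
    # Usage with random stand-in embeddings (batch of 8, dimension 512).
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    print(contrastive_loss(img, txt).item())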
Jongsu Youn is a Ph.D. candidate in the Department of Image Science at the Graduate School of Advanced Imaging, Chung-Ang University. His primary interest is multimodal learning: during his master's program he studied multimodal learning with audio–video and audio–image combinations, and in his doctoral program, after publishing journal research on audio–image multimodal learning, he has focused on multimodal learning for medical artificial intelligence. He is particularly interested in how multimodal learning differs between general and specialized domains, and his research aims to develop multimodal models that provide practical benefits in specialized domains.