Annie Lee’s Post

Assistant Professor at OntarioTechU

Congrats Bo-Han Lu on the fantastic presentation of our work at LREC-COLING 2024! A great collaboration with Richard Tzong-Han Tsai. We translated Hokkien-Mandarin/English using LLaMA 2-7B by standardizing the Han writing script and evaluating with GPT-4.

Enhancing Hokkien Dual Translation by Exploring and Standardizing of Four Writing Systems
https://lnkd.in/gjWPw7fT
Bo-Han Lu, Yi-Hsuan Lin, En-Shiun Annie Lee, Richard Tzong-Han Tsai

Machine translation focuses mainly on high-resource languages (HRLs), while low-resource languages (LRLs) like Hokkien are relatively under-explored. The study aims to address this gap by developing a dual translation model between Hokkien and both Traditional Mandarin Chinese and English. We employ a pre-trained LLaMA 2-7B model specialized in Traditional Mandarin Chinese to leverage the orthographic similarities between Hokkien Han and Traditional Mandarin Chinese. Our comprehensive experiments involve translation tasks across various writing systems of Hokkien as well as between Hokkien and other HRLs. We find that the use of a limited monolingual corpus still further improves the model's Hokkien capabilities. We then utilize our translation model to standardize all Hokkien writing systems into Hokkien Han, resulting in further performance improvements. Additionally, we introduce an evaluation method incorporating back-translation and GPT-4 to ensure reliable translation quality assessment even for LRLs. The study contributes to narrowing the resource gap for Hokkien and empirically investigates the advantages and limitations of pre-training and fine-tuning based on LLaMA 2.
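
The back-translation idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's evaluation pipeline: the translators below are hypothetical lookup-table stand-ins with placeholder tokens, and a character-level similarity from Python's standard library stands in for the GPT-4-based judgment the abstract describes. The names `round_trip_score`, `char_similarity`, and `make_lookup` are assumptions made for illustration only.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

Translator = Callable[[str], str]


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1].

    Assumption: a cheap stand-in for a stronger judge such as an LLM.
    """
    return SequenceMatcher(None, a, b).ratio()


def round_trip_score(
    source: str,
    forward: Translator,
    backward: Translator,
    similarity: Callable[[str, str], float] = char_similarity,
) -> float:
    """Translate source -> target -> source, then compare with the original."""
    target = forward(source)
    restored = backward(target)
    return similarity(source, restored)


def make_lookup(table: Dict[str, str]) -> Translator:
    """Build a toy translator from a lookup table (hypothetical stand-in)."""
    return lambda text: table.get(text, text)


# Placeholder tokens, not real Hokkien data.
forward = make_lookup({"hello": "X1", "thanks": "X2"})
good_backward = make_lookup({"X1": "hello", "X2": "thanks"})
lossy_backward = make_lookup({"X1": "hello", "X2": "hello"})

perfect = round_trip_score("thanks", forward, good_backward)
degraded = round_trip_score("thanks", forward, lossy_backward)
print(perfect, degraded)
```

A round trip that restores the source scores 1.0, while a lossy one scores lower. In a real setting the two translators would be the translation model's two directions, and the similarity function could be swapped for a model-based judge.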

San Lee

CAIO at SpassMed

1mo

I found that Llama 3 is amazing and performed better than expected. Cheers, and I look forward to hearing more from you, Annie Lee 🙏
