Index of Urdu Journals, International Islamic University - Islamabad

Investigating the Linguistic Fingerprint of GPT-4o in Arabic-to-English Translation Using Stylometry

Article Detail

  • Article type:
  • Keywords: Stylometric analysis; Machine-generated text; Natural Language Processing; GPT-4o; Authorship attribution
  • Subject: Language in Education
  • Language(s): English
  • Volume: Vol. 5 No. 3
  • Issue:
  • Pages: 65-83
  • Published: 30 Sep 2024

Author(s):

  • Banat, Maysaa

Abstract

This study explores the linguistic and stylistic characteristics of machine-generated texts, focusing on the output of GPT-4o. Using various natural language processing (NLP) techniques, including word frequency and stopword count analysis, readability and sentence structure metrics, lexical diversity measures, syntactic frequency analysis, and named entity recognition (NER), the research aims to uncover the stylometric fingerprints present in machine-generated content. The results reveal that GPT-4o-generated texts exhibit moderate lexical diversity and syntactic complexity, with certain chapters reflecting higher readability and more varied sentence structures, while others lean toward simpler linguistic patterns. The findings also highlight thematic variation across chapters, as observed in the distribution of named entities, which contributes to understanding the model’s handling of different contextual content. The research suggests that while GPT-4o maintains a consistent style in its generated text, there are distinguishable characteristics that may serve as indicators of machine authorship. This provides valuable insights for stylometric analysis, authorship attribution, and the identification of machine-generated texts in various contexts. Future research could extend this work by exploring deeper stylometric features, conducting cross-model comparisons, and developing advanced authorship detection algorithms tailored for AI-generated content. Moreover, the ethical implications of stylometric analysis in the context of AI-generated texts warrant further investigation, particularly as machine-generated content becomes increasingly prevalent across different domains.
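
Some of the surface measures named in the abstract (word frequency, stopword count, lexical diversity, sentence length) can be illustrated with a minimal sketch. This is not the study's code: the function name `stylometric_profile`, the tiny stopword list, the regex tokenizer, and the use of the type-token ratio as the diversity measure are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; not the list used in the study).
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "it", "that"}


def stylometric_profile(text):
    """Return a few simple stylometric measures for one text (hypothetical helper)."""
    # Naive sentence split: runs of ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    # Lowercased alphabetic tokens; an apostrophe is kept inside a word.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(tokens)
    n_tokens = len(tokens)
    return {
        "tokens": n_tokens,
        "types": len(counts),
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(counts) / n_tokens if n_tokens else 0.0,
        "stopword_count": sum(c for w, c in counts.items() if w in STOPWORDS),
        # Mean number of word tokens per sentence.
        "avg_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        "top_words": counts.most_common(3),
    }


sample = "The cat sat. The cat ran to the door."
profile = stylometric_profile(sample)
print(profile)
```

Running the same function over each chapter of a translation would give per-chapter values that can be compared. The type-token ratio is sensitive to text length, so comparisons are more meaningful between samples of similar size.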

Journal Information

  • Journal: Journal of Translation and Language Studies
  • ISSN (online): 2709-5681
  • Institute:
  • Publisher: Saba Publishing
  • Start year: 2020
  • Country: Kuwait
  • Review type: Double blind peer review
  • Date added: 14 Jan 2025
  • Last index: 14 Jan 2025