Spotlight on stylometric text analysis - The Avon Novels

From MandrakeWiki
Revision as of 20:05, 28 November 2020 by The Clay Camel (talk | contribs) (Created page with "===The Story of the Phantom=== The Story of the Phantom is a series of 15 novels, published by Avon Publications in the U.S. from 1972 to 1975, based on Lee Falk's Phantom sto...")

The Story of the Phantom

The Story of the Phantom is a series of 15 novels, published by Avon Publications in the U.S. from 1972 to 1975, based on Lee Falk's Phantom stories. When the books were released, the adaptors of issues 2 and 10 were not credited, and issue 15 was wrongly credited to Carson Bingham. Lee Falk later corrected this in an "Author's note" in the books.

Adapted by        Issues                 Note
Basil Copper      2, 3                   #2: the adaptor is not credited
Carson Bingham    14
Frank S. Shawn    4, 5, 7, 8, 10, 11     #10: the adaptor is not credited
Lee Falk          1, 6, 9, 12, 15        #15 is wrongly credited as Carson Bingham
Warren Shanahan   13

Analysis

Stylometric analysis was used to see whether Lee Falk's correction in the "Author's note" can be confirmed. The terms used are:

  • MFW - most frequent words
  • MFC - most frequent characters
  • n-grams - sequences of n adjacent characters or words. Character 2-grams of "The Phantom said": th, he, "e ", " p", ph, ha, an, nt, etc. Word 2-grams of "Hello, the Phantom said": hello the, the phantom, phantom said, etc.
  • Corpus - a collection of texts; here, the 15 novels, each from chapter 1 to the end of the novel.
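
The n-gram definitions above can be made concrete with a short Python sketch. It is only an illustration: the function names are hypothetical, and lowercasing and stripping punctuation are assumptions chosen to reproduce the samples in the list, not a description of how JGAAP or R tokenise text.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def most_frequent(items, k):
    """Return the k most frequent items with their counts (MFW or MFC style)."""
    return Counter(items).most_common(k)


# The two samples from the list above.
print(char_ngrams("The Phantom said", 2)[:8])
print(word_ngrams("Hello, the Phantom said", 2))
```

Passing the n-grams of a whole novel to most_frequent gives the kind of "most frequent" feature list that the analyses below work with.
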
Using JGAAP

The novels were prepared by assigning each one to its known author, leaving issues 2, 10 and 15 as documents of unknown authorship. Issues 2, 10 and 15 were then compared to the known authors using character 4-grams with the nearest neighbor driver and the Cosine Distance metric.
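
The idea behind this comparison can be sketched in Python. This is not JGAAP's code, only a minimal illustration of nearest-neighbor attribution with cosine distance over character 4-gram counts; the function names are hypothetical and the toy sentences are invented stand-ins for the novels.

```python
import math
from collections import Counter


def char_ngram_profile(text, n=4):
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def nearest_author(unknown_text, known_texts, n=4):
    """Return (author, distance) for the known text closest to the unknown one."""
    unknown = char_ngram_profile(unknown_text, n)
    best = None
    for author, text in known_texts:
        d = cosine_distance(unknown, char_ngram_profile(text, n))
        if best is None or d < best[1]:
            best = (author, d)
    return best


# Toy, invented sentences standing in for full novels.
known = [
    ("Author A", "the ghost who walks moved silently through the jungle"),
    ("Author B", "stock prices rallied sharply after the quarterly report"),
]
author, distance = nearest_author(
    "silently the ghost walks through the deep jungle", known
)
print(author, round(distance, 3))
```

With real data, each entry in known would hold the text of one credited novel, and the unknown text would be issue 2, 10 or 15.
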

The result was that the most likely author of #2 is Basil Copper, of #10 is Frank S. Shawn and of #15 is Lee Falk.

Using R

The novels were put into one corpus folder. Two analyses were done: the first with 0-902 MFW 2-grams and the second with 0-902 MFC 3-grams, both using the Bootstrap Consensus Tree.
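
The consensus idea can be illustrated with a simplified Python sketch. It is not the Bootstrap Consensus Tree procedure itself and builds no tree: it only repeats a nearest-neighbor comparison over several most-frequent-feature cutoffs and keeps the pairs that are linked in at least half of the runs. The function names, the Manhattan distance, the cutoffs, the 50% threshold and the toy texts are all assumptions made for the example.

```python
import re
from collections import Counter


def word_bigrams(text):
    """Return overlapping word 2-grams, ignoring case and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


def manhattan(u, v):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


def consensus_links(texts, cutoffs, threshold=0.5):
    """Return {pair: share of runs} for pairs linked in >= threshold of runs."""
    grams = {name: word_bigrams(text) for name, text in texts.items()}
    overall = Counter()
    for g in grams.values():
        overall.update(g)
    # Word 2-grams ranked by frequency over the whole corpus.
    ranked = [gram for gram, _ in overall.most_common()]

    runs = Counter()
    for cutoff in cutoffs:
        features = ranked[:cutoff]
        # Relative frequency of each selected feature in each text.
        table = {}
        for name, g in grams.items():
            counts = Counter(g)
            total = len(g) or 1
            table[name] = [counts[f] / total for f in features]
        # Link every text to its nearest neighbor in this run.
        linked = set()
        for name in texts:
            others = [o for o in texts if o != name]
            nearest = min(others, key=lambda o: manhattan(table[name], table[o]))
            linked.add(tuple(sorted((name, nearest))))
        runs.update(linked)

    return {pair: n / len(cutoffs) for pair, n in runs.items()
            if n / len(cutoffs) >= threshold}


# Toy, invented texts: two "styles" with two samples each.
toy = {
    "A-1": "the ghost who walks rides into the deep woods the ghost who walks never dies",
    "A-2": "the ghost who walks waits in the deep woods and the ghost who walks returns",
    "B-1": "stock prices fell sharply today and stock prices fell again this week",
    "B-2": "stock prices fell sharply this week and stock prices fell further today",
}
links = consensus_links(toy, cutoffs=(5, 10, 20))
print(links)
```

Pairs that stay linked across most cutoffs are the stable groupings; a full consensus tree applies the same principle to whole clustering trees rather than to single nearest-neighbor links.
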

The results group the novels according to the table above, confirming Lee Falk's correction in the "Author's note".

Interestingly, the analysis also groups issues 13 and 14 together, indicating a statistically similar style.