I'm looking for an R solution to the problem of parsing a text file of quotations (as below), giving a data.frame with one observation per quote and the variables text and source, as described below.
DIAGRAMS are of great utility for illustrating certain questions of vital statistics by
conveying ideas on the subject through the eye, which cannot be so readily grasped when
contained in figures.
--- Florence Nightingale, Mortality of the British Army, 1857

To give insight to statistical information it occurred to me, that making an
appeal to the eye when proportion and magnitude are concerned, is the best and
readiest method of conveying a distinct idea.
--- William Playfair, The Statistical Breviary (1801), p. 2

Regarding numbers and proportions, the best way to catch the imagination is to speak to the eyes.
--- William Playfair, Elemens de statistique, Paris, 1802, p. XX.

The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.
--- Charles Joseph Minard
Here, each quotation is a paragraph, separated from the next by "\n\n". Within the paragraph, all lines up to the one beginning --- comprise the text, and what follows --- is the source.
I imagine I could solve this if I could first split the text lines into paragraphs (separated by '\\n\\n+', i.e. two or more consecutive newlines), but I'm having trouble doing that.
Assuming your text file is quote.txt in the working directory, a base R solution is to split the text twice, (1) by \n\n and (2) by ---, and then combine the pieces into a data frame.
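A minimal sketch of that two-step split in base R, not necessarily the answerer's original code. The sample string `raw` is a stand-in for the contents of quote.txt, and the names `raw`, `paras`, `parts` and `quotes` are illustrative assumptions.

```r
# Stand-in for the file contents (two quotes from the question).
# With a real file: raw <- paste(readLines("quote.txt"), collapse = "\n")
raw <- paste(
  "To give insight to statistical information it occurred to me, that making an",
  "appeal to the eye when proportion and magnitude are concerned, is the best and",
  "readiest method of conveying a distinct idea.",
  "--- William Playfair, The Statistical Breviary (1801), p. 2",
  "",
  "The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.",
  "--- Charles Joseph Minard",
  sep = "\n"
)

# (1) Split into paragraphs on two or more consecutive newlines.
paras <- strsplit(raw, "\n\n+")[[1]]

# (2) Split each paragraph at the line that begins with "---".
parts <- strsplit(paras, "\n--- *")

# Combine: first piece is the text (newlines become spaces), second the source.
# A paragraph without a "---" line gets NA as its source.
quotes <- data.frame(
  text   = vapply(parts, function(p) gsub("\n", " ", p[1], fixed = TRUE), character(1)),
  source = vapply(parts, function(p) p[2], character(1)),
  stringsAsFactors = FALSE
)
```

Calling `str(quotes)` afterwards should show a data frame with one row per quotation and two character columns.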
This should do the bulk of what you need to achieve. I assume you already have the file in a length-1 character vector called txt.
If you then tidy up the text by replacing the newlines with spaces, you get the final result.
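A hedged sketch of these steps, assuming `txt` is a length-1 character vector holding the whole file. The sample value of `txt` and the helper names `paras` and `pos` are illustrative and not taken from the original answer. Here the marker is located with a fixed-string search rather than a regular expression.

```r
# Stand-in for the file contents held in a single string.
txt <- paste(
  "DIAGRAMS are of great utility for illustrating certain questions of vital statistics by",
  "conveying ideas on the subject through the eye, which cannot be so readily grasped when",
  "contained in figures.",
  "--- Florence Nightingale, Mortality of the British Army, 1857",
  "",
  "Regarding numbers and proportions, the best way to catch the imagination is to speak to the eyes.",
  "--- William Playfair, Elemens de statistique, Paris, 1802, p. XX.",
  sep = "\n"
)

# One element per quotation: split on two or more consecutive newlines.
paras <- unlist(strsplit(txt, "\n\n+"))

# Position of the "\n---" marker in each paragraph (-1 if absent).
pos <- regexpr("\n---", paras, fixed = TRUE)

quotes <- data.frame(
  # Everything before the marker is the quotation text.
  text   = ifelse(pos > 0, substr(paras, 1, pos - 1), paras),
  # Everything after the 4-character marker is the source.
  source = ifelse(pos > 0, trimws(substring(paras, pos + 4)), NA_character_),
  stringsAsFactors = FALSE
)

# Tidy up: replace the remaining line breaks inside each quote with spaces.
quotes$text <- gsub("\n", " ", quotes$text, fixed = TRUE)
```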
This assumes you have the initial text loaded in a variable named rawText.
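One possible line-based sketch starting from `rawText`. This is an illustration and not the answerer's original code. It assumes every quotation ends with exactly one source line beginning with ---. The sample value of `rawText` and the names `lines`, `is_source` and `id` are assumptions.

```r
# Stand-in for the loaded text.
rawText <- paste(
  "DIAGRAMS are of great utility for illustrating certain questions of vital statistics by",
  "conveying ideas on the subject through the eye, which cannot be so readily grasped when",
  "contained in figures.",
  "--- Florence Nightingale, Mortality of the British Army, 1857",
  "",
  "The aim of my carte figurative is to convey promptly to the eye the relation not given quickly by numbers requiring mental calculation.",
  "--- Charles Joseph Minard",
  sep = "\n"
)

# Work line by line and drop the blank separator lines.
lines <- strsplit(rawText, "\n", fixed = TRUE)[[1]]
lines <- lines[nzchar(trimws(lines))]

# A source line closes its quotation, so the quote index goes up
# by one on the line after each source line.
is_source <- startsWith(lines, "---")
id <- cumsum(c(0, head(is_source, -1))) + 1

quotes <- data.frame(
  # Join the text lines of each quotation with single spaces.
  text   = as.vector(tapply(lines[!is_source], id[!is_source], paste, collapse = " ")),
  # Strip the leading "---" marker from each source line.
  source = sub("^--- *", "", lines[is_source]),
  stringsAsFactors = FALSE
)
```

If some quotations lack a source line, the two columns will differ in length and `data.frame()` will fail, so this variant suits input that is known to be regular.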