我有一个病人操作/程序的数据(如下图所示的示例),其中一行描述了病人的操作程序。有2级信息,
- the first being the operation details, i.e.
op_start_dt
,priority_operation
andasa_status
- the second being the procedure details, i.e.
proc_desc
andproc_table
An operation can have more than 1 procedures. In the example below, patient A
has 2 operations (defined by distinct op_start_dt
). In his first operation, he had 1 procedure (defined by distinct proc_desc
) and in his second, he had 2 procedures.
I would like to convert the data into a wide format, where a patient only has one row, and his information will be arranged operation by operation and within each operation, it will be arrange procedure by procedure, as shown below. So, proc_descxy
refers to the proc_desc
on xth operation and yth procedure.
数据:
df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT",
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective",
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99",
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")
所需的输出:
df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT",
no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective",
asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY",
proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct",
"POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective",
asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC",
proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION",
proc_table22 = "7C"), row.names = 1L, class = "data.frame")
My attempt:
I tried to work this out, but it gets confusing along the way, with pivot_wider
, pivot_longer
then pivot_wider
again.
df %>%
group_by(patient) %>%
mutate(no_op = n_distinct(op_start_dt)) %>% # Count the no. of operation for every patient
group_by(patient, op_start_dt) %>%
mutate(no_proc = n_distinct(proc_desc)) %>% # Count the no. of procedures within each operation for every patient
ungroup() %>%
mutate_all(as.character) %>% # Easier when converting to long format
pivot_longer(-c(patient, department, no_op, op_start_dt)) %>%
group_by_at(vars(patient:name)) %>%
mutate(proc_nth = row_number()) %>%
ungroup() %>%
unite("name", name, proc_nth, sep = "") %>%
pivot_wider(names_from = name, values_from = value) %>%
group_by(patient) %>%
mutate(op_nth = row_number()) %>%
pivot_longer(-c(patient, department, no_op, op_nth)) %>%
mutate(proc_nth = str_extract(name, "\\d$"),
name = str_extract(name, "[a-z_]+")) %>%
unite("name", name, op_nth, proc_nth, sep = "", na.rm = TRUE) %>%
pivot_wider(names_from = name, values_from = value)