如何将具有不同信息级别的数据转换为宽格式?

我有一个病人操作/程序的数据(如下图所示的示例),其中一行描述了病人的操作程序。有2级信息,

  1. the first being the operation details, i.e. op_start_dt, priority_operation and asa_status
  2. the second being the procedure details, i.e. proc_desc and proc_table

An operation can have more than 1 procedures. In the example below, patient A has 2 operations (defined by distinct op_start_dt). In his first operation, he had 1 procedure (defined by distinct proc_desc) and in his second, he had 2 procedures.

enter image description here

I would like to convert the data into a wide format, where a patient only has one row, and his information will be arranged operation by operation and within each operation, it will be arrange procedure by procedure, as shown below. So, proc_descxy refers to the proc_desc on xth operation and yth procedure.

enter image description here

数据:

df <- structure(list(patient = c("A", "A", "A"), department = c("GYNAECOLOGY /OBSTETRICS DEPT", 
"GYNAECOLOGY /OBSTETRICS DEPT", "GYNAECOLOGY /OBSTETRICS DEPT"
), op_start_dt = structure(c(1424853000, 1424870700, 1424870700
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), priority_operation = c("Elective", 
"Elective", "Elective"), asa_status = c(2, 3, 3), proc_desc = c("UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
"KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
"HEART, VARIOUS LESIONS, HEART TRANSPLANTATION"), proc_table = c("99", 
"6A", "7C")), row.names = c(NA, 3L), class = "data.frame")

所需的输出:

df <- structure(list(patient = "A", department = "GYNAECOLOGY /OBSTETRICS DEPT", 
    no_op = 2, op_start_dt1 = structure(1424853000, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_proc1 = 1, priority_operation1 = "Elective", 
    asa_status1 = 2, proc_desc11 = "UTERUS, MALIGNANT CONDITION, EXTENDED HYSTERECTOMY WITH/WITHOUT LYMPHADENECTOMY", 
    proc_table11 = "99", op_start_dt2 = structure(1424870700, class = c("POSIXct", 
    "POSIXt"), tzone = "UTC"), no_of_proc2 = 2, priority_operation2 = "Elective", 
    asa_status2 = 3, proc_desc21 = "KIDNEY AND URETER, VARIOUS LESIONS, NEPHROURETERECTOMY, LAPAROSCOPIC", 
    proc_table21 = "6A", proc_desc22 = "HEART, VARIOUS LESIONS, HEART TRANSPLANTATION", 
    proc_table22 = "7C"), row.names = 1L, class = "data.frame")

My attempt: I tried to work this out, but it gets confusing along the way, with pivot_wider, pivot_longer then pivot_wideragain.

df %>% 
  group_by(patient) %>% 
  mutate(no_op = n_distinct(op_start_dt)) %>% # Count the no. of operation for every patient
  group_by(patient, op_start_dt) %>% 
  mutate(no_proc = n_distinct(proc_desc)) %>% # Count the no. of procedures within each operation for every patient
  ungroup() %>% 

  mutate_all(as.character) %>% # Easier when converting to long format

  pivot_longer(-c(patient, department, no_op, op_start_dt)) %>% 

  group_by_at(vars(patient:name)) %>% 
  mutate(proc_nth = row_number()) %>% 
  ungroup() %>%  
  unite("name", name, proc_nth, sep = "") %>% 

  pivot_wider(names_from = name, values_from = value) %>%

  group_by(patient) %>% 
  mutate(op_nth = row_number()) %>%
  pivot_longer(-c(patient, department, no_op, op_nth)) %>% 

  mutate(proc_nth = str_extract(name, "\\d$"),
         name = str_extract(name, "[a-z_]+")) %>%

  unite("name", name, op_nth, proc_nth, sep = "", na.rm = TRUE) %>% 

  pivot_wider(names_from = name, values_from = value)
评论