import pandas as pd

data1 = pd.read_csv('file1.csv', sep=';')  # change sep to the separator your files use
data2 = pd.read_csv('file2.csv', sep=';')
variable_want_to_compare = 'xmin'
tolerance = 5  # maximum allowed difference between two values
matched_rows = []  # DataFrame.append was removed in pandas 2.0, so collect rows in a list

for i in range(len(data1)):
    a = data1[variable_want_to_compare].iloc[i]
    for j in range(len(data2)):
        b = data2[variable_want_to_compare].iloc[j]
        if abs(a - b) < tolerance:
            matched_rows.append(data2.iloc[j])

# build the output frame once, after the loops have finished
df_output = pd.DataFrame(matched_rows).reset_index(drop=True)
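To simplify the code and speed up the computation, you can take the following approach. First, convert the values in each column into a list (pandas can do this).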
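The conversion step can be sketched as follows. The two small frames stand in for the CSV files and their values are made up for illustration; x_min_1 and x_min_2 are the list names used in this answer.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (values are illustrative only).
data1 = pd.DataFrame({"xmin": [10, 20, 30], "xmax": [15, 25, 35]})
data2 = pd.DataFrame({"xmin": [20, 25, 30, 35], "xmax": [26, 31, 36, 41]})

# Series.tolist() turns a column into a plain Python list.
x_min_1 = data1["xmin"].tolist()
x_min_2 = data2["xmin"].tolist()

print(x_min_1)
print(x_min_2)
```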
Now the columns you want to compare have been converted into lists. Depending on what you want to do, there are many ways to remove duplicates from the target list, for example a list comprehension. If you want x_min_2 to keep ONLY the values that are already in x_min_1, a list comprehension does that.

You can of course repeat this process for any column you want to compare against any other column in any dataset. Finally, replace the target dataset's column with the new, edited list.
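A minimal sketch of the filter-and-replace step, assuming illustrative values and a hypothetical in-memory data2 frame in place of the CSV file. Note that this keeps exact matches only, with no tolerance. Because the filtered list is usually shorter than the original column, the sketch keeps the matching rows rather than assigning the shorter list to the column directly, which pandas would reject with a length mismatch.

```python
import pandas as pd

# Illustrative lists, as if already extracted from the two datasets.
x_min_1 = [10, 20, 30]
x_min_2 = [20, 25, 30, 35]

# Keep only the values of x_min_2 that also occur in x_min_1.
# A set makes each membership test cheap.
allowed = set(x_min_1)
x_min_2_filtered = [x for x in x_min_2 if x in allowed]

# Hypothetical target dataset whose "xmin" column produced x_min_2.
data2 = pd.DataFrame({"xmin": x_min_2, "label": ["a", "b", "c", "d"]})

# Drop the rows whose value was filtered out, so every column stays aligned.
data2_filtered = data2[data2["xmin"].isin(allowed)].reset_index(drop=True)

print(x_min_2_filtered)
print(data2_filtered)
```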
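In this case, just use a loop. I think you only need to compare one column of your data (for example the min_x column), because in your sample every column seems to differ from the other table by a similar amount. So a nested loop over that one column, like the one in the code above, is enough.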
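I don't compare every column, because that would make sorting the data more complicated. If you want to handle every column, just add more loops. But I don't think you need that, because the value differences look similar in every column (judging from the sample).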
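If several columns do need to be checked, one alternative to writing more nested loops is NumPy broadcasting. This is only a sketch under assumptions: match_within_tolerance is a hypothetical helper, not part of pandas, and the sample frames and the ymin column are invented. It compares every row pair at once, so memory grows with len(df1) * len(df2).

```python
import numpy as np
import pandas as pd

def match_within_tolerance(df1, df2, columns, tolerance):
    """Return rows of df2 that lie within `tolerance` of a row of df1
    in every listed column. A df2 row appears once per matching df1 row,
    in the same order the nested loops would produce."""
    a = df1[columns].to_numpy(dtype=float)   # shape (n1, k)
    b = df2[columns].to_numpy(dtype=float)   # shape (n2, k)
    # Broadcast to shape (n1, n2, k): difference of every row pair.
    diff = np.abs(a[:, None, :] - b[None, :, :])
    mask = (diff < tolerance).all(axis=2)    # shape (n1, n2)
    _, j_idx = np.nonzero(mask)              # row-major order: i first, then j
    return df2.iloc[j_idx].reset_index(drop=True)

# Invented sample data for illustration.
df1 = pd.DataFrame({"xmin": [10, 50], "ymin": [100, 200]})
df2 = pd.DataFrame({"xmin": [12, 48, 90], "ymin": [101, 230, 200]})

one_column = match_within_tolerance(df1, df2, ["xmin"], 5)
two_columns = match_within_tolerance(df1, df2, ["xmin", "ymin"], 5)
print(one_column)
print(two_columns)
```

With a single column this selects the same rows as the nested loop; adding a column to the list tightens the match instead of adding another loop.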