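I'm working on a migration project to upgrade a tier of web servers from Python 2.7.8 to Python 3.6.3, and I've hit a roadblock in one particular case. When a request is received from a client, the payload is transported locally using pyzmq, which in Python 3 now works with 'bytes' rather than 'str' (as it did in Python 2).
The payload I receive is encoded with ISO-8859-1 (latin-1). I can easily convert it to a string with payload.decode('latin-1') and pass it on to the next service (svc-save-entity), which expects a string argument.
However, the downstream service 'svc-save-entity' expects any latin-1 characters that are present to be represented as ASCII character references (such as &#233; for é) rather than as hex escapes (such as \xe9 for é).
I'm struggling to find an efficient way to do this conversion. Can any Python experts guide me here? Essentially, I need the definition of a function, say decode_tostring():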
payload = b'Banco Santander (M\xe9xico)'  # payload is in bytes
payload_str = decode_tostring(payload)  # function to convert into string
payload_str == 'Banco Santander (M&#233;xico)'  # payload_str is a string in ASCII Character Reference
Please define decode_tostring. :)
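The encode() and decode() methods accept a parameter called errors, which allows you to specify how characters that are not representable in the specified encoding should be handled. The one you're looking for is XML character reference replacement ('xmlcharrefreplace'), which is fortunately one of the standard handlers provided in the codecs module.
Now, doing the replacement the way you want is a little complicated, because replacing non-ASCII characters with their corresponding XML character references is something that happens during encoding, not decoding. After all, encoding is the process that takes in characters and emits bytes, so it's only during encoding that you can tell whether you have a character that isn't part of ASCII. The cleanest way I can think of at the moment is to decode, re-encode, and re-decode, applying the XML character reference replacement in the encoding step.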
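The decode, re-encode, re-decode sequence can be sketched roughly as follows. This is a minimal version that assumes the payload is always valid latin-1 and that the downstream service only needs non-ASCII characters replaced; the intermediate variable names are illustrative.

```python
def decode_tostring(payload: bytes) -> str:
    # Step 1: bytes -> str. Every byte value is valid latin-1, so this cannot fail.
    text = payload.decode('latin-1')
    # Step 2: str -> ASCII bytes. Characters outside ASCII are replaced with
    # numeric XML character references such as &#233; instead of raising.
    ascii_bytes = text.encode('ascii', errors='xmlcharrefreplace')
    # Step 3: ASCII bytes -> str, which is what the next service expects.
    return ascii_bytes.decode('ascii')


payload = b'Banco Santander (M\xe9xico)'
print(decode_tostring(payload))  # Banco Santander (M&#233;xico)
```

Payloads that are already pure ASCII pass through unchanged, since the error handler is only invoked for characters the ASCII codec cannot represent.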
I wouldn't be surprised if there is a method somewhere out there that will replace all non-ASCII characters in a string with their XML entity refs and give you back a string, and if so, you could use it to replace the encoding and the second decoding. But I don't know of one. The closest I found at the moment was xml.sax.saxutils.escape(), but that only acts on certain specific characters.
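For what it's worth, a string-to-string version is short enough to write by hand. The sketch below uses a hypothetical helper name, xmlcharref_escape (not a standard library function), and also shows that xml.sax.saxutils.escape() leaves é untouched while escaping characters such as & and <.

```python
from xml.sax.saxutils import escape


def xmlcharref_escape(text: str) -> str:
    # Hypothetical helper: keep ASCII characters as they are and replace
    # everything else with a decimal character reference like &#233;.
    return ''.join(
        ch if ord(ch) < 128 else '&#{};'.format(ord(ch))
        for ch in text
    )


text = b'Banco Santander (M\xe9xico)'.decode('latin-1')
print(xmlcharref_escape(text))  # Banco Santander (M&#233;xico)

# escape() only handles &, < and > by default, so é passes through as-is.
print(escape('é & <'))  # é &amp; &lt;
```

For the same input, this gives the same result as encoding with 'xmlcharrefreplace' and decoding again, so the choice between the two is mostly a matter of taste.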