Regex named groups in Golang

I need help integrating a regex with Go. I want to parse log files and created a regex that looks fine on https://regex101.com/r/p4mbiS/1/

A log line looks like this:

57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"

The regex looks like this:

(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"

The named groups should produce the following results:

ip: 57.157.87.86

localtime: 06/Feb/2020:00:11:04 +0100

request: parammore=1&customer_id=1&...HTTP/1.1

ref: https://www.somewebsite.com/more/andheresomemore/

agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0)...

regex101.com generates Go code that does not work for me. I tried to improve it, but without success.

The Go code returns only the whole matched string, not the groups.

package main

import (
    "regexp"
    "fmt"
)

func main() {
    var re = regexp.MustCompile(`(?P<ip>([^\s]+)).+?\[(?P<localtime>(.*?))\].+?GET\s\/\?(?P<request>.+?)\".+?\"(?P<ref>.+?)\".\"(?P<agent>.+?)\"`)
    var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`

    if len(re.FindStringIndex(str)) > 0 {
        fmt.Println(re.FindString(str),"found at index",re.FindStringIndex(str)[0])
    }
}

Find the fiddle here: https://play.golang.org/p/e0_8PM-Nv6i

Comments
  • 菊花开

    Since you defined capturing groups and need to extract their values, you need to use FindStringSubmatch instead of FindString:

    package main
    
    import (
        "regexp"
        "fmt"
    )
    
    func main() {
        var re = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)
        var str = `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
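        // FindStringSubmatch returns the whole match at index 0, then one entry
        // per capturing group; it returns nil if the pattern does not match.
        match := re.FindStringSubmatch(str)
        if match == nil {
            fmt.Println("no match")
            return
        }
        fmt.Printf("IP: %s\nLocal Time: %s\nRequest: %s\nRef: %s\nAgent: %s", match[1], match[2], match[3], match[4], match[5])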
    }
    

    Output:

    IP: 57.157.87.86
    Local Time: 06/Feb/2020:00:11:04 +0100
    Request: parammore=1&customer_id=1&version=1.56&param=meaningful&customer_name=somewebsite.de&some_id=4&cachebuster=1580944263903 HTTP/1.1
    Ref: https://www.somewebsite.com/more/andheresomemore/
    Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
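
If you would rather look the values up by group name than by position, one possible approach is to pair the result of FindStringSubmatch with re.SubexpNames(). The sketch below assumes the named-group pattern from this answer; namedGroups and logRe are hypothetical names chosen for illustration, and the sample line is a shortened version of the one in the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// logRe is the named-group pattern from the answer above.
var logRe = regexp.MustCompile(`(?P<ip>\S+).+?\[(?P<localtime>.*?)\].+?GET\s/\?(?P<request>.+?)".+?"(?P<ref>.+?)"\s*"(?P<agent>.+?)"`)

// namedGroups is a hypothetical helper (not part of the regexp package).
// It returns a map from group name to captured text, or nil if s does not match.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	match := re.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	groups := make(map[string]string)
	// SubexpNames()[i] is the name of the i-th group. Index 0 (the whole
	// match) and unnamed groups have an empty name and are skipped.
	for i, name := range re.SubexpNames() {
		if name != "" {
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	// Shortened sample line (assumption: same layout as the log line in the question).
	line := `57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`
	groups := namedGroups(logRe, line)
	for _, name := range []string{"ip", "localtime", "request", "ref", "agent"} {
		fmt.Printf("%s: %s\n", name, groups[name])
	}
}
```

On Go 1.15 or later, re.SubexpIndex("ip") returns the index of a single named group, which may be enough if you only need one or two fields.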
    

    Note that you then do not need named capturing groups; numbered capturing groups are enough:

    ^(\S+)[\s-]+\[([^][]*)]\s+"GET\s+/\?([^"]+)"[^"]+"([^"]+)"\s+"([^"]+)"$
    

    See this regex demo. Using .+? so often in a pattern is not a good idea because it decreases performance, so I replaced those dot patterns with negated character classes and tried to make the pattern a bit more verbose.
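
For completeness, here is a minimal sketch of how the numbered-group pattern above could be used in Go. parseLine and anchoredRe are hypothetical names, and the sample line is shortened from the one in the question. Because the pattern is anchored with ^ and $, each log line has to be passed in on its own, without a trailing newline.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchoredRe is the numbered-group pattern from the answer above.
var anchoredRe = regexp.MustCompile(`^(\S+)[\s-]+\[([^][]*)]\s+"GET\s+/\?([^"]+)"[^"]+"([^"]+)"\s+"([^"]+)"$`)

// parseLine is a hypothetical helper. It returns the five captured fields
// (ip, local time, request, referrer, agent) and whether the line matched.
func parseLine(line string) ([]string, bool) {
	match := anchoredRe.FindStringSubmatch(line)
	if match == nil {
		return nil, false
	}
	// match[0] is the whole line; the groups start at index 1.
	return match[1:], true
}

func main() {
	lines := []string{
		// Shortened sample line (assumption: same layout as in the question).
		`57.157.87.86 - - [06/Feb/2020:00:11:04 +0100] "GET /?parammore=1&customer_id=1 HTTP/1.1" 204 0 "https://www.somewebsite.com/more/andheresomemore/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"`,
		`this is not a log line`,
	}
	labels := []string{"IP", "Local Time", "Request", "Ref", "Agent"}
	for _, line := range lines {
		fields, ok := parseLine(line)
		if !ok {
			fmt.Println("skipping line that does not match")
			continue
		}
		for i, field := range fields {
			fmt.Printf("%s: %s\n", labels[i], field)
		}
	}
}
```

Returning a boolean alongside the fields lets the caller skip malformed lines instead of panicking on an out-of-range index.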