Goroutine not running as expected

I'm still learning Go and was working through the web crawler exercise linked here. The main part I implemented is as follows (the other parts are unchanged and can be found in the link).

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher) {
    // TODO: Fetch URLs in parallel.
    // This implementation doesn't fetch in parallel yet:
    if depth <= 0 {
        return
    }
    body, urls, err := fetcher.Fetch(url)
    cache.Set(url)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Printf("found: %s %q\n", url, body)

    for _, u := range urls {
        if cache.Get(u) == false {
            fmt.Println("Next:", u)
            Crawl(u, depth-1, fetcher) // I want to parallelize this
        }
    }
    return
}

func main() {
    Crawl("https://golang.org/", 4, fetcher)
}

type SafeCache struct {
    v   map[string]bool
    mux sync.Mutex
}

func (c *SafeCache) Set(key string) {
    c.mux.Lock()
    c.v[key] = true
    c.mux.Unlock()
}

func (c *SafeCache) Get(key string) bool {
    return c.v[key]
}

var cache SafeCache = SafeCache{v: make(map[string]bool)}

When I run the code above, the results are as expected:

found: https://golang.org/ "The Go Programming Language"
Next: https://golang.org/pkg/
found: https://golang.org/pkg/ "Packages"
Next: https://golang.org/cmd/
not found: https://golang.org/cmd/
Next: https://golang.org/pkg/fmt/
found: https://golang.org/pkg/fmt/ "Package fmt"
Next: https://golang.org/pkg/os/
found: https://golang.org/pkg/os/ "Package os"

However, when I tried to parallelize the crawler (on the commented line in the program above) by changing Crawl(u, depth-1, fetcher) to go Crawl(u, depth-1, fetcher), the results were not what I expected:

found: https://golang.org/ "The Go Programming Language"
Next: https://golang.org/pkg/
Next: https://golang.org/cmd/
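
Specifically, the only change was adding the go keyword in front of the recursive call inside the loop:

for _, u := range urls {
    if cache.Get(u) == false {
        fmt.Println("Next:", u)
        go Crawl(u, depth-1, fetcher) // previously: Crawl(u, depth-1, fetcher)
    }
}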

I thought adding the go keyword would be as straightforward as it seems, but I'm not sure what went wrong and am confused about how best to approach this problem. Any advice would be appreciated. Thank you in advance!

Answer from 初吻给了烟:
Your program is most likely exiting before the crawlers finish their work. One approach is to give Crawl a WaitGroup on which it waits for all of its sub-crawlers to finish. For example:

import "sync"

// Crawl uses fetcher to recursively crawl
// pages starting with url, to a maximum of depth.
func Crawl(url string, depth int, fetcher Fetcher, wg *sync.WaitGroup) {
    defer func() {
        // If the crawler was given a wait group, signal that it's finished
        if wg != nil {
            wg.Done()
        }
    }()

    if depth <= 0 {
        return
    }

    body, urls, err := fetcher.Fetch(url)
    cache.Set(url)
    if err != nil {
        fmt.Println(err)
        return
    }

    fmt.Printf("found: %s %q\n", url, body)

    var crawlers sync.WaitGroup
    for _, u := range urls {
        if cache.Get(u) == false {
            fmt.Println("Next:", u)
            crawlers.Add(1)
            go Crawl(u, depth-1, fetcher, &crawlers)
        }
    }
    crawlers.Wait() // Waits for its sub-crawlers to finish

    return 
}

func main() {
    // The root crawler does not need a WaitGroup
    Crawl("https://golang.org/", 4, fetcher, nil)
}
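
As a side note, once several crawlers run concurrently, SafeCache.Get should probably take the lock as well, so that reads of the map don't race with writes from other goroutines; something like:

func (c *SafeCache) Get(key string) bool {
    c.mux.Lock()
    defer c.mux.Unlock() // release the lock after reading the map
    return c.v[key]
}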