我是編程新手,我需要幫助。試圖在golang上編寫gitlab scraper。 當我試圖在多線程模式下獲取有關項目的信息時,出現了一些問題。gitlab通過golang抓取的問題
下面是代碼:
func (g *Gitlab) getAPIResponce(url string, structure interface{}) error {
responce, responce_error := http.Get(url)
if responce_error != nil {
return responce_error
}
ret, _ := ioutil.ReadAll(responce.Body)
if string(ret) != "[]" {
err := json.Unmarshal(ret, structure)
return err
}
return errors.New(error_emptypage)
}
...
func (g *Gitlab) GetProjects() {
projects_chan := make(chan Project, g.LatestProjectID)
var waitGroup sync.WaitGroup
queue := make(chan struct{}, 50)
for i := g.LatestProjectID; i > 0; i-- {
url := g.BaseURL + projects_url + "/" + strconv.Itoa(i) + g.Token
waitGroup.Add(1)
go func(url string, channel chan Project) {
queue <- struct{}{}
defer waitGroup.Done()
var oneProject Project
err := g.getAPIResponce(url, &oneProject)
if err != nil {
fmt.Println(err.Error())
}
fmt.Printf(".")
channel <- oneProject
<-queue
}(url, projects_chan)
}
go func() {
waitGroup.Wait()
close(projects_chan)
}()
for project := range projects_chan {
if project.ID != 0 {
g.Projects = append(g.Projects, project)
}
}
}
這裏是輸出:
┌[ nb-kondrashov: ~/Gitlab/gitlab-auditor ]
└[ skondrashov ]-> ./gitlab-auditor
latest project = 1532
Gathering projects...
.......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................Get https://gitlab.example.com/api/v4/projects/563&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/558&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/531&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/571&private_token=SeCrEt_ToKeN: unexpected EOF
.Get https://gitlab.example.com/api/v4/projects/570&private_token=SeCrEt_ToKeN: unexpected EOF
..Get https://gitlab.example.com/api/v4/projects/467&private_token=SeCrEt_ToKeN: unexpected EOF
Get https://gitlab.example.com/api/v4/projects/573&private_token=SeCrEt_ToKeN: unexpected EOF
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
每當它的不同的項目,但它的id是大約550
當我試圖從輸出中扼制鏈接,我得到正常的JSON。當我試圖用queue := make(chan struct{}, 1)
(單線程)運行這段代碼時 - 一切都很好。
它可能是什麼?
嘗試限制同時連接的數量。 –
我想在這裏做:'queue:= make(chan struct {},N)'(N =連接數),並且它有部分幫助,但是我失去了性能。這是gitlab,golang或我的電腦的問題嗎? – sergkondr
我想50是太多了。可能存在某種限制連接數量的DDoS保護。嘗試減少,例如5或10. –