python
  • parsing
  • beautifulsoup
  • 2017-02-20

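    Getting specific values from a BeautifulSoup parse

    I recently started learning more about Python and how to parse websites with BeautifulSoup.

    The problem is that I now seem to be stuck.

    HTML (after making the soup):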

    <div class="mod-3-piece-app__visual-container__chart"> 
        <div class="mod-ui-chart--dynamic" data-chart-config='{"chartData":{"periods":[{"year":2013,"period":null,"periodicity":"A","icon":null},{"year":2014,"period":null,"periodicity":"A","icon":null},{"year":2015,"period":null,"periodicity":"A","icon":null},{"year":2016,"period":null,"periodicity":"A","icon":null},{"year":2017,"period":null,"periodicity":"A","icon":null},{"year":2018,"period":null,"periodicity":"A","icon":null}],"forecastRange":{"from":3.5,"to":5.5},"actualValues":[5.6785,6.45,9.22,8.31,null,null],"consensusData":[{"y":5.6307,"toolTipData":{"low":5.5742,"high":5.7142,"analysts":34,"restatement":null}},{"y":6.3434,"toolTipData":{"low":6.25,"high":6.5714,"analysts":35,"restatement":null}},{"y":9.1265,"toolTipData":{"low":9.02,"high":9.28,"analysts":40,"restatement":null}},{"y":8.2734,"toolTipData":{"low":8.17,"high":8.335,"analysts":40,"restatement":null}},{"y":8.9304,"toolTipData":{"low":8.53,"high":9.63,"analysts":41,"restatement":null}},{"y":10.1252,"toolTipData":{"low":8.63,"high":11.61,"analysts":42,"restatement":null}}]}}'> 
         <noscript> 
          <div class="mod-ui-chart--static"> 
           <div class="mod-ui-chart--sprited" style="width:410px; height:135px; background:url('/data/Charts/EquityForecast?issueID=36276&amp;height=135&amp;width=410') 0px -270px no-repeat;"> 
           </div> 
          </div> 
         </noscript> 
        </div> 
    </div> 
    

    My code:

    from bs4 import BeautifulSoup 
    import urllib.request 
    
    
    data = [] 
    List = ['AAPL'] 
    
    # Iterates Through List 
    for i in List : 
        # The webpage which we wish to Parse 
        soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml') 
    
        # Gathering the data 
        Values = soup.find_all("div", {"class":"mod-3-piece-app__visual-container__chart"})[4] 
        print(Values) 
    
        # Getting desired values from data 
    

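    What I want to get are the values after {"y": ..., i.e. the numbers 5.6307, 6.3434, 9.1265, 8.2734, 8.9304 and 10.1252, but I can't for the life of me figure out how. I tried Values.get_text as well as Values.text, but both come back blank (probably because all of the data sits inside a list or something similar).

    If I could get the data after "toolTipData", that would be fine too.

    Would anyone mind helping me?

    If I have missed anything, please give me feedback so that I can ask better questions in the future.

    Thanks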

    Answers

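    In short, you want to get some information that sits inside an attribute of a tag, not in the tag's text.

    Here is what I would do:

    1. Open the page source to see where your information is located
    2. Use find_all to find the elements with the right class attribute, mod-ui-chart--dynamic
    3. For each element returned by find_all, read its attribute content with .get()
    4. Search the attribute string for the term 'actualValues'
    5. If 'actualValues' is found, load the string as JSON and browse its values.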
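The steps above can be tried offline on a small inline snippet. This is only a minimal sketch under a few assumptions: `SAMPLE_HTML` is a shortened stand-in modelled on the HTML in the question, `extract_actual_values` is a hypothetical helper name, and the built-in `html.parser` is used instead of `lxml` so that nothing beyond BeautifulSoup needs to be installed.

```python
import json

from bs4 import BeautifulSoup

# Shortened stand-in for the real page (assumption: same shape as the
# HTML in the question, with only three values kept).
SAMPLE_HTML = """
<div class="mod-ui-chart--dynamic"
     data-chart-config='{"chartData": {"actualValues": [5.6785, 6.45, null]}}'>
</div>
<div class="mod-ui-chart--dynamic"></div>
"""


def extract_actual_values(html):
    """Return one list of actualValues per chart div that carries them."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for elem in soup.find_all("div", class_="mod-ui-chart--dynamic"):
        # .get() returns None when the attribute is missing
        config = elem.get("data-chart-config")
        if config is None or "actualValues" not in config:
            continue
        chart = json.loads(config)  # JSON null becomes Python None
        results.append(chart["chartData"]["actualValues"])
    return results


print(extract_actual_values(SAMPLE_HTML))
```

The second `div` has no `data-chart-config` attribute, so it is skipped rather than raising an error.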

    Try the following piece of code. I have commented it, so you should be able to follow what it is doing.

    Code:

    from bs4 import BeautifulSoup 
    import urllib.request 
    import json 
    
    data = [] 
    List = ['AAPL'] 
    
    # Iterates Through List 
    for i in List: 
        # The webpage which we wish to Parse 
        soup = BeautifulSoup(urllib.request.urlopen('https://markets.ft.com/data/equities/tearsheet/forecasts?s=AAPL:NSQ').read(), 'lxml') 
    
        # Gathering the data 
        elemList = soup.find_all('div', {'class':'mod-ui-chart--dynamic'}) 
    
        # Read the `data-chart-config` attribute of each `div` with class `mod-ui-chart--dynamic`
        for elem in elemList:

            elemID = elem.get('class')
            elemName = elem.get('data-chart-config')

            # If the div has no `data-chart-config` attribute, skip it
            if elemName is None:
                pass

            # If the term 'actualValues' appears in the attribute string
            elif 'actualValues' in elemName:
                #print('Extracting actualValues from:\n')
                #print("Attribute id = %s" % elemID)
                #print()
                #print("Attribute name = %s" % elemName)
                #print()

                # Parse the `data-chart-config` attribute string as JSON
                data = json.loads(elemName)

                #print(json.dumps(data, indent=4, sort_keys=True))
                #print(data['chartData']['actualValues'])

                # Fetch the desired values
                val1 = data['chartData']['actualValues'][0]
                val2 = data['chartData']['actualValues'][1]
                val3 = data['chartData']['actualValues'][2]
                val4 = data['chartData']['actualValues'][3]

                # Print the desired values
                print(val1, val2, val3, val4)

                print('-'*15)
    

    Output:

    1.9 1.42 1.67 3.36 
    --------------- 
    5.6785 6.45 9.22 8.31 
    --------------- 
    50557000000 42358000000 46852000000 78351000000 
    --------------- 
    170910000000 182795000000 233715000000 215639000000 
    --------------- 
    

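    p.s.1: If you like, you can uncomment the print() calls inside the elif branch to see what the program is doing.

    p.s.2: If you like, you can change 'actualValues' to 'consensusData' in val1 = data['chartData']['actualValues'][0]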
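Reading 'consensusData' takes one more lookup than 'actualValues', because each entry in the question's HTML is an object holding a "y" number and a "toolTipData" object. A minimal sketch, assuming a `data-chart-config` string shaped like the one in the question; the `CONFIG` literal is shortened to its first two entries, and `consensus_points` is a hypothetical helper name.

```python
import json

# First two consensusData entries from the question's HTML (shortened).
CONFIG = """{"chartData": {"consensusData": [
  {"y": 5.6307, "toolTipData": {"low": 5.5742, "high": 5.7142, "analysts": 34, "restatement": null}},
  {"y": 6.3434, "toolTipData": {"low": 6.25, "high": 6.5714, "analysts": 35, "restatement": null}}
]}}"""


def consensus_points(config_text):
    """Return the "y" values and the toolTipData dicts, in page order."""
    chart = json.loads(config_text)["chartData"]
    ys = [point["y"] for point in chart["consensusData"]]
    tips = [point["toolTipData"] for point in chart["consensusData"]]
    return ys, tips


ys, tips = consensus_points(CONFIG)
print(ys)
print([(tip["low"], tip["high"]) for tip in tips])
```

The same function returns the "toolTipData" entries, which the question mentioned as an acceptable alternative.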

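    Thanks. Changing 'actualValues' works perfectly in the single-asset case (AAPL only), but when I try to add other assets (IBM, for example), val1 to val4 get overwritten. I will do my best to find a way to split this dictionary into a list and then append to it on each run.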
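One way to keep the values from being overwritten on each pass is to collect them in a dictionary keyed by ticker. This is a sketch under stated assumptions, not the approach the commenter ended up with: `configs_by_ticker` stands in for the `data-chart-config` strings that would have been scraped for each symbol, the IBM numbers are invented purely for illustration, and `collect_actual_values` is a hypothetical helper name.

```python
import json

# Stand-in for scraped attribute strings (assumption). The AAPL values come
# from the question; the IBM values are invented for illustration only.
configs_by_ticker = {
    "AAPL": '{"chartData": {"actualValues": [5.6785, 6.45, 9.22, 8.31, null, null]}}',
    "IBM": '{"chartData": {"actualValues": [1.0, 2.0, null]}}',
}


def collect_actual_values(configs):
    """Map each ticker to its reported actualValues, skipping empty periods."""
    results = {}
    for ticker, config_text in configs.items():
        values = json.loads(config_text)["chartData"]["actualValues"]
        # Append to the ticker's own list instead of reusing val1..val4
        results.setdefault(ticker, []).extend(
            value for value in values if value is not None
        )
    return results


results = collect_actual_values(configs_by_ticker)
for ticker, values in results.items():
    print(ticker, values)
```

Because each ticker owns its list, running the loop over more symbols adds entries instead of replacing the previous ones.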
