2013-04-16 54 views
0

I want to use feedparser in Python to parse an RSS feed from a URL, but the feed cannot be parsed.

>>> import feedparser 
>>> d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801') 
>>> d 
{'feed': {'summary': u'<span><h1>Server Error in \'/mobile\' Application.<hr color="silver" size="1" width="100%" /></h1>\n\n    
<h2> <i>Attempted to divide by zero.</i> </h2></span>\n\n   <font face="Arial, Helvetica, Geneva, SunSans-Regular, sans-serif ">\n\n   <b> Description: </b>An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.\n\n   <br /><br />\n\n   <b> Exception Details: </b>System.DivideByZeroException: Attempted to divide by zero.<br /><br />\n\n    
<b>Source Error:</b> <br /><br />\n\n   <table bgcolor="#ffffcc" width="100%">\n    <tr>\n     <td>\n      <code>\n\nAn unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.</code>\n\n     </td>\n    </tr>\n   </table>\n\n   <br />\n\n   <b>Stack Trace:</b> <br /><br />\n\n   <table bgcolor="#ffffcc" width="100%">\n    <tr>\n     <td>\n      <code><pre>\n\n[DivideByZeroException: Attempted to divide by zero.]\n System.Decimal.FCallDivide(Decimal&amp; d1, Decimal&amp; d2) +0\n System.Decimal.Divide(Decimal d1, Decimal d2) +17\n Martjack.CMS.PageControlsModelComp.GetPluginDataEnt(PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, PageControlModel&amp; objPageControlModel, ProductEnt_RE ProductEnt, String MobileVersion) +2324\n 
Martjack.CMS.PageControlsModelComp.GetPageControlOutputData(PageModel pagemodel, PageControlEnt objPageControlEnt, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, ProductEnt_RE ProductEnt, String siteurl) +694\n Martjack.CMS.PageControlsModelComp.GetPageControlModels(PageModel Pagemodel, MerchantENT MerchantEnt, String seocid, String combiType, String MobileVersion, DNDPageControlViewCollection objDNDPageControlViewCollection, Boolean isdndrequest, Int64 pgcontrolid, String siteurl) +919\n Martjack.CMS.PageModelComp.GetPageModel(MerchantENT MerchantEnt, Int32 predefinedPageId, Boolean isPredefined, ChannelType channel, String seocid, String Bid, String combiType, String MobileVersion, Boolean isDndRequest, 
DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +1717\n MartJack.Facade.CMSFacade.GetPageModel(MerchantENT MerchantEnt, Int32 PageId, Boolean isPredefined, ChannelType channel, String seocid, String bid, String combitype, String mobileversion, Boolean isDndRequest, DNDPageControlViewCollection ObjDNDPageControlViewCollection, Boolean ControlsInfo, Int64 pgcontrolid) +119\n MobileECommerce.MobileECommerce.ProductsController.GetPageModelByRequest(String seoid, String bid) +227\n MobileECommerce.MobileECommerce.ProductsController.Index(String id, String seobrand, String category, String categoryparent) +54\n lambda_method(Closure , ControllerBase , Object[]) +272\n 
System.Web.Mvc.ActionMethodDispatcher.Execute(ControllerBase controller, Object[] parameters) +17\n System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +212\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +239\n System.Web.Mvc.&lt;&gt;c__DisplayClass15.&lt;InvokeActionMethodWithFilters&gt;b__12() +56\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodFilter(IActionFilter filter, ActionExecutingContext preContext, Func`1 continuation) +282\n System.Web.Mvc.&lt;&gt;c__DisplayClass17.&lt;InvokeActionMethodWithFilters&gt;b__14() +20\n System.Web.Mvc.ControllerActionInvoker.InvokeActionMethodWithFilters(ControllerContext controllerContext, IList`1 filters, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +201\n System.Web.Mvc.ControllerActionInvoker.InvokeAction(ControllerContext controllerContext, String actionName) +351\n System.Web.Mvc.Controller.ExecuteCore() +99\n System.Web.Mvc.ControllerBase.Execute(RequestContext requestContext) +94\n System.Web.Mvc.ControllerBase.System.Web.Mvc.IController.Execute(RequestContext requestContext) +10\n 
System.Web.Mvc.&lt;&gt;c__DisplayClassb.&lt;BeginProcessRequest&gt;b__5() +43\n System.Web.Mvc.Async.&lt;&gt;c__DisplayClass1.&lt;MakeVoidDelegate&gt;b__0() +21\n System.Web.Mvc.Async.&lt;&gt;c__DisplayClass8`1.&lt;BeginSynchronous&gt;b__7(IAsyncResult _) +12\n System.Web.Mvc.Async.WrappedAsyncResult`1.End() +53\n System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +28\n System.Web.Mvc.Async.AsyncResultWrapper.End(IAsyncResult asyncResult, Object tag) +15\n System.Web.Mvc.&lt;&gt;c__DisplayClasse.&lt;EndProcessRequest&gt;b__d() +34\n System.Web.Mvc.SecurityUtil.&lt;GetCallInAppTrustThunk&gt;b__0(Action f) +7\n System.Web.Mvc.SecurityUtil.ProcessInApplicationTrust(Action action) +23\n System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +68\n 
System.Web.Mvc.MvcHandler.System.Web.IHttpAsyncHandler.EndProcessRequest(IAsyncResult result) +9\n System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +714\n System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean&amp; completedSynchronously) +240\n</pre></code>\n\n     </td>\n    </tr>\n   </table>\n\n   <br />\n\n    
<hr color="silver" size="1" width="100%" />\n\n   <b>Version Information:</b>\xa0Microsoft .NET Framework Version:4.0.30319; ASP.NET Version:4.0.30319.272\n\n   </font>'}, 'status': 302, 'version': u'', 'encoding': u'utf-8', 'bozo': 1, 'headers': {'content-length': '11348', 'x-powered-by': 'ASP.NET', 'set-cookie': 'SERVERID=HAS14; path=/', 'originserver': 'HAS14', 'server': 'Microsoft-IIS/7.5', 'connection': 'close', 'cache-control': 'private', 'date': 'Tue, 16 Apr 2013 08:03:59 GMT', 'content-type': 'text/html; charset=utf-8', 'x-aspnet-version': '4.0.30319'}, 'href': 
u'http://www.shop.inonit.in/mobile/Products//NA/NA/0', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('not well-formed (invalid token)',)} 

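I get nothing useful in the output, whereas if you visit the link (http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801) in a browser it shows a whole lot of content! Perhaps I am being redirected to some other page that does not exist (I tried to crawl the individual pages of this site with scrapy and could not, because I was redirected to URLs that do not exist).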

Any help with this would be great. Thanks!

+0

What do you mean by "nothing in the output"? len(d['feed']['summary']) gives 5601, and there is a nice "divide by zero" message in there.

+1

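Ah, sorry, I meant nothing relevant, such as the elements (title, price, etc.). Clearly it cannot read the feed, but if you open the link you will see all the data.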
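The parse result in the question already carries several clues: 'status' is 302, 'href' points at /mobile/Products//NA/NA/0 rather than the requested URL, the 'content-type' header is text/html, and 'bozo' is 1 with no entries. A minimal sketch of checking those fields programmatically might look like the following. The helper name diagnose and the wording of its hints are made up for illustration; it only assumes the result behaves like a dict with the keys visible in the output above.

```python
REDIRECT_CODES = (301, 302, 303, 307, 308)


def diagnose(result):
    """Return a list of hints explaining why a parse result looks empty.

    `result` is assumed to be dict-like, with the keys that appear in the
    output quoted in the question ('bozo', 'status', 'href', 'headers',
    'entries'). Missing keys are tolerated.
    """
    hints = []

    # A truthy 'bozo' means the parser hit malformed input.
    if result.get('bozo'):
        hints.append('bozo is set: %r' % (result.get('bozo_exception'),))

    # A redirect status means the content came from 'href', not the URL requested.
    status = result.get('status')
    if status in REDIRECT_CODES:
        hints.append('redirected (status %s) to %s' % (status, result.get('href')))

    # An HTML content type suggests an error page or web page, not a feed.
    headers = result.get('headers') or {}
    content_type = headers.get('content-type', '')
    if 'html' in content_type:
        hints.append('server returned %s rather than a feed' % content_type)

    if not result.get('entries'):
        hints.append('no entries were parsed')

    return hints


# Values copied from the output in the question (exception shortened to a string).
sample = {
    'status': 302,
    'bozo': 1,
    'bozo_exception': 'not well-formed (invalid token)',
    'href': 'http://www.shop.inonit.in/mobile/Products//NA/NA/0',
    'headers': {'content-type': 'text/html; charset=utf-8'},
    'entries': [],
}

for hint in diagnose(sample):
    print(hint)
```

Run against the real result (diagnose(d)), this would point at the redirect and the HTML error page rather than at feedparser itself.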

Answers

1

Are you using a proxy server? If you are, do it this way:

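import urllib2
import feedparser
# Replace "proxy:port" with the host and port of your own proxy.
proxy = urllib2.ProxyHandler({"http": "proxy:port"})
# The extra handlers are used by feedparser when it fetches the URL.
d = feedparser.parse('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', handlers=[proxy])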
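
If it is unclear whether a proxy or a redirect is responsible, one option is to fetch the document yourself and look at the final URL and content type before parsing. The sketch below uses urllib.request, the Python 3 counterpart of the urllib2 module used above. The function names make_opener and fetch and the proxy address are hypothetical, chosen only for illustration.

```python
import urllib.request


def make_opener(proxy=None):
    """Build a urllib opener, optionally routed through an HTTP proxy.

    `proxy` is a URL such as 'http://proxy.example:8080' (a placeholder,
    not a real server). With proxy=None the default handlers are used.
    """
    handlers = []
    if proxy:
        handlers.append(urllib.request.ProxyHandler({'http': proxy}))
    return urllib.request.build_opener(*handlers)


def fetch(url, opener, timeout=30):
    """Return (final_url, content_type, body) for `url`.

    final_url differs from `url` when the server redirected the request.
    """
    response = opener.open(url, timeout=timeout)
    try:
        final_url = response.geturl()
        content_type = response.headers.get('Content-Type', '')
        body = response.read()
    finally:
        response.close()
    return final_url, content_type, body
```

A call such as fetch('http://www.shop.inonit.in/RSSFeedDetails.aspx?PID=801', make_opener()) would show where the request ends up and what kind of document comes back. If the body turns out to be a feed, it can be handed to feedparser.parse(body), which accepts a string as well as a URL.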