Help with a Linux shell script using wget and sed

Hi, can someone help me set up a shell script that does the following?

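1. wget http://site.com/xap/wp7?p=1
2. Look through the HTML and extract every product name that sits between title="Free Shipping and "> ... For example, from title="Free Shipping HD7-Case001">, HD7-Case001 is what gets extracted.
3. Output to products.txt
4. Then loop, repeating the process from step 1. In the URL http://site.com/xap/wp7?p=1 the "1" is the page number, which goes up to 50: http://..wp7?p=1, http://..wp7?p=2, http://..wp7?p=3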

I've done some research of my own and have written this much code myself... it definitely needs a lot of work

#! /bin/sh 
... 

while read page; do 
wget -q -O- "http://site.com/xap/wp7?p=$page" | 
sed ... 

done < "products.txt"

Source

2011-01-28 acctman

http://xmlstar.sourceforge.net/ – 2011-01-28 07:47:31

Is there a particular reason you need to use wget and sed to solve this? – 2011-01-28 07:55:06

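#!/bin/bash

# Fetch pages 1..50 and collect every product name in products.txt.
# The redirect sits after "done" so each page adds to the file
# instead of overwriting what the previous page wrote.
for page in {1..50}
do
    wget -q "http://site.com/xap/wp7?p=$page" -O - |
        tr '"' '\n' | grep "^Free Shipping " | cut -d ' ' -f 3
done > products.txt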

The tr turns every double quote into a newline, so the output of tr will look something like this:

<html> 
... 
... <tag title= 
Free Shipping [Product] 
> ...

Basically, it's a way of putting each product on a line of its own.
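To see the splitting step in isolation, here is a minimal sketch run against a single made-up line of HTML. The sample markup and the helper name split_on_quotes are assumptions for illustration only; the real page may be laid out differently.

```shell
# Hypothetical one-line sample; the real page markup may differ.
sample='<a href="/p/1" title="Free Shipping HD7-Case001">case</a>'

# Turn every double quote into a newline so that each attribute
# value ends up on a line of its own.
split_on_quotes() {
    tr '"' '\n'
}

printf '%s\n' "$sample" | split_on_quotes
```

With this sample, the fourth line of output is the attribute value Free Shipping HD7-Case001, which is the line the later steps work on.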

Next, the grep throws away every line except the ones that start with Free Shipping, so its output should look like this:

Free Shipping [Product1] 
Free Shipping [Product2] 
...
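The filtering step can be tried on its own too. This sketch feeds grep a few hand-written lines of the kind tr would produce; the input lines and the helper name keep_free_shipping are made up for the example.

```shell
# Keep only the lines that begin with "Free Shipping ".
# The ^ anchor matters: a line that merely mentions the phrase
# somewhere in the middle is dropped.
keep_free_shipping() {
    grep '^Free Shipping '
}

# Hypothetical input, one fragment per line as tr would emit it.
printf '%s\n' '<a href=' '/p/1' ' title=' \
    'Free Shipping HD7-Case001' '>Now with Free Shipping </a>' |
    keep_free_shipping
```

Only the Free Shipping HD7-Case001 line survives; the last fragment contains the phrase but does not start with it.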

Next, the cut extracts the third "column" (delimited by spaces), so the output should be:

[Product1] 
[Product2] 
...
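Since the question asked about sed, the same extraction could probably be done with it as well. The following is only a sketch under assumptions: it presumes the attribute is written exactly as title="Free Shipping ..." with double quotes, and the function name extract_names and the sample line are invented for the example. Unlike cut -f 3, it keeps product names that contain spaces.

```shell
# Read HTML on stdin and print one product name per line.
extract_names() {
    # Break the stream at every '>' so each tag sits on its own line,
    # then print only the text captured after "Free Shipping ".
    tr '>' '\n' |
        sed -n 's/.*title="Free Shipping \([^"]*\)".*/\1/p'
}

# Hypothetical sample with two products on a single line.
printf '%s\n' '<a title="Free Shipping HD7-Case001">x</a><a title="Free Shipping HD7 Case 002">y</a>' |
    extract_names
```

In the loop, this function would take the place of the tr | grep | cut stages, reading from wget -q -O - in the same way.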

Source

2011-01-28 08:59:00

You can combine it with PHP for the XML parsing

The wget bash script

#!/bin/bash

# Save each page to /tmp, then let xml.php append its titles to products.txt
for page in {1..50}
do
    wget -q -O "/tmp/$page.xml" "http://site.com/xap/wp7?p=$page"
    php -q xml.php "$page" >> products.txt
done

xml.php

<?php
$file = '/tmp/'.$argv[1].'.xml';
// assuming the following format
//<Products><Product title="Free Shipping ProductName"/></Products>

$xml = simplexml_load_file($file);
// print the title attribute of every Product node, one per line
foreach ($xml->Product as $product) {
    echo $product['title'], "\n";
}
/* you can make any replacement you like; just parse/obtain the correct node attribute */
?>

It's not a great idea, but PHP's SimpleXML does offer some simple ways to parse XML.
Hopefully this gives you a few ideas to get started.

Source

2011-01-28 08:41:16 ajreal
