PowerShell: remove similar lines from a file


Consider the file tbl.txt (1.5 million lines), structured like this:

Num1 ; Num2 ; 'Value' ; 'Attribute'

So tbl.txt looks like this:

 
    63 ; 193 ; 'Green' ; 'Color'
    152 ; 162 ; 'Tall' ; 'Size'
    230 ; 164 ; '130lbs' ; 'Weight'
    249 ; 175 ; 'Green' ; 'Color'  *duplicate on 'Value' and 'Attribute'*
    420 ; 178 ; '8'  ; 'Shoesize'
    438 ; 172 ; 'Tall' ; 'Size'  *duplicate on 'Value' and 'Attribute'*

How can I keep the first row for each unique combination of 'Value' and 'Attribute', and remove the later rows that duplicate it on 'Value' and 'Attribute'?

The result should look like this:

 
    63 ; 193 ; 'Green' ; 'Color'
    152 ; 162 ; 'Tall' ; 'Size'
    230 ; 164 ; '130lbs' ; 'Weight'
    420 ; 178 ; '8'  ; 'Shoesize'

Any help is greatly appreciated.

Source

2017-10-17 SamNorton

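What have you tried, and how did it fail? Ideally, you should provide a [MCVE] of what you tried, including the specific error messages and/or incorrect output. SO is not a code-writing service; the best questions provide useful information so that those who answer can guide you toward devising your own correct answer. See [ask].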

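Use the search box above and look at some of the existing questions about unique values; they should point you in the right direction. For example: [Powershell - filter unique values](//stackoverflow.com/q/9825060)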

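Initially I was tinkering with 'cat tbl.txt | Get-Unique', but did not find a solution. @James I don't yet see how I could turn 'Foreach-Object {$_.Substring(0,2)} | Select-Object -unique' into a suitable solution, since the lines vary in length. – SamNorton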

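Iterate over the text file with Get-Content, isolate the 'Value' ; 'Attribute' columns by string manipulation, and then use a hash map to check whether a line with the same key has already been processed; if not, output the line once. In code: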

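    $map = @{}
    Get-Content tbl.txt | ForEach-Object {
        # Key = everything after the second ';' (the 'Value' ; 'Attribute' part)
        $key = $_.Substring($_.IndexOf(';', $_.IndexOf(';') + 1) + 1)
        # Emit the line only the first time its key is seen
        if (-not $map.ContainsKey($key)) { $_; $map[$key] = 1 }
    }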
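To make the key expression above easier to follow, here is the same computation unrolled for a single line of the sample data. The variable names `$line`, `$first`, `$second` and `$key` are only for illustration.

```powershell
# One line from the sample data in the question
$line = "249 ; 175 ; 'Green' ; 'Color'"

$first  = $line.IndexOf(';')               # index of the first ';'  (4)
$second = $line.IndexOf(';', $first + 1)   # index of the second ';' (10)
$key    = $line.Substring($second + 1)     # everything after the second ';'

$key   # " 'Green' ; 'Color'" (note the leading space)
```

Because the key is the raw remainder of the line, two rows only match if their whitespace after the second ';' is identical as well.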

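Alternatively, as mentioned in the comments, you can use group (Group-Object) with the same substring as the grouping criterion, and then take the first element of each group: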

    Get-Content tbl.txt |
        Group-Object { $_.Substring($_.IndexOf(';', $_.IndexOf(';') + 1) + 1) } |
        ForEach-Object { $_.Group[0] }
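Group-Object has to collect all of its input before it emits any group, which may matter for a 1.5-million-line file. A streaming filter that remembers only the keys is one possible alternative. The sketch below is not from the original answer: the function name `Select-FirstByValueAttribute` is hypothetical, and it assumes that every line has at least four ';'-separated fields and that no field contains a ';' itself.

```powershell
# Hypothetical helper (name chosen for illustration): passes through the first
# line seen for each Value/Attribute pair and drops later duplicates.
function Select-FirstByValueAttribute {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [string] $Line
    )
    begin {
        # HashSet.Add returns $false when the key is already present.
        # Note: HashSet[string] compares case-sensitively, unlike @{}.
        $seen = New-Object 'System.Collections.Generic.HashSet[string]'
    }
    process {
        $fields = $Line.Split(';')
        if ($fields.Count -lt 4) { return }   # assumption: skip malformed lines
        # Trim so that differing whitespace around ';' does not defeat the match
        $key = $fields[2].Trim() + ';' + $fields[3].Trim()
        if ($seen.Add($key)) { $Line }
    }
}

# The sample rows from the question, without the duplicate annotations
$sample = @(
    "63 ; 193 ; 'Green' ; 'Color'",
    "152 ; 162 ; 'Tall' ; 'Size'",
    "230 ; 164 ; '130lbs' ; 'Weight'",
    "249 ; 175 ; 'Green' ; 'Color'",
    "420 ; 178 ; '8'  ; 'Shoesize'",
    "438 ; 172 ; 'Tall' ; 'Size'"
)
$result = @($sample | Select-FirstByValueAttribute)
$result
```

Against the real file this would be used as `Get-Content tbl.txt | Select-FirstByValueAttribute | Set-Content out.txt`, where out.txt is a placeholder name.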

Source

2017-10-17 17:17:23 davidhigh

Assuming your data has no header row:

    Import-Csv "C:\folder\data.txt" -Delimiter ";" -Header Num1,Num2,Value,Attribute | Sort-Object -Property Value,Attribute -Unique

This yields the desired rows, though sorted by Value rather than in the original file order:

    Num1 Num2 Value    Attribute
    ---- ---- -----    ---------
    230  164  '130lbs' 'Weight'
    420  178  '8'      'Shoesize'
    63   193  'Green'  'Color'
    152  162  'Tall'   'Size'

You can use Export-Csv to export the result:

    Import-Csv "C:\folder\data.txt" -Delimiter ";" -Header Num1,Num2,Value,Attribute | Sort-Object -Property Value,Attribute -Unique | Export-Csv "C:\folder\data2.txt" -Delimiter ";" -NoTypeInformation
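If the original row order has to be preserved, the Import-Csv approach can be combined with a set of already-seen keys instead of Sort-Object. This is only a sketch under assumptions: the temporary file stands in for tbl.txt, `$seen` and `$rows` are illustrative names, and the fields are trimmed defensively because the columns in the sample are separated by ' ; ' rather than a bare ';'.

```powershell
# Write the sample data to a temporary file so the sketch is self-contained
$path = [System.IO.Path]::GetTempFileName()
@(
    "63 ; 193 ; 'Green' ; 'Color'",
    "152 ; 162 ; 'Tall' ; 'Size'",
    "230 ; 164 ; '130lbs' ; 'Weight'",
    "249 ; 175 ; 'Green' ; 'Color'",
    "420 ; 178 ; '8'  ; 'Shoesize'",
    "438 ; 172 ; 'Tall' ; 'Size'"
) | Set-Content -Path $path

$seen = New-Object 'System.Collections.Generic.HashSet[string]'
$rows = @(
    Import-Csv -Path $path -Delimiter ';' -Header Num1, Num2, Value, Attribute |
        Where-Object {
            # Add returns $true only the first time a Value/Attribute pair is seen
            $seen.Add($_.Value.Trim() + ';' + $_.Attribute.Trim())
        }
)

# Show which rows survived, in file order
$rows | ForEach-Object { $_.Num1.Trim() }

Remove-Item -Path $path
```

The filtered `$rows` can then be piped to Export-Csv in the same way as in the answer above.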

Source

2017-10-18 07:53:26
