How to create nested columns from a JSON file using Apache Spark in Java

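I have multiple JSON files that I have to parse with Apache Spark. They contain nested keys, and I have to print all columns together with the nested keys. How can I create columns for the nested keys of a JSON file using Apache Spark in Java?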

The files also contain nested keys. I want to get all column names, including the nested column names. How can I get them?

This is what I tried:

String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json"; 

String[] jsonFiles = jsonFilePath.split(","); 

Dataset<Row> people = sparkSession.read().json(jsonFiles);

JSON structure:

{ 
    "Name":"Vipin Suman", 
    "Email":"[email protected]", 
    "Designation":"Programmer", 
    "Age":22 , 
    "location": 
      { 
      "City":"Ahmedabad", 
      "State":"Gujarat" 
      } 
}

The result I get:

people.show(50, false); 

Age | Designation | Email   | Name  | Location 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman|[Ahmedabad,Gujarat]

I want the data to look like this:

Age | Designation | Email   | Name  | City  | State 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad |Gujarat

Or like this:

Age | Designation | Email   | Name  | Location 
--------------------------------------------------------------- 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad,Gujarat

If the schema looks like this:

root 
|-- Age: long (nullable = true) 
|-- Company: struct (nullable = true) 
| |-- Company Name: string (nullable = true) 
| |-- Domain: string (nullable = true) 
|-- Designation: string (nullable = true) 
|-- Email: string (nullable = true) 
|-- Name: string (nullable = true) 
|-- Test: array (nullable = true) 
| |-- element: string (containsNull = true) 
|-- location: struct (nullable = true) 
| |-- City: struct (nullable = true) 
| | |-- City Name: string (nullable = true) 
| | |-- Pin: long (nullable = true) 
| |-- State: string (nullable = true)

and the JSON structure is:

{
    "Name":"Vipin Suman",
    "Email":"[email protected]",
    "Designation":"Trainee Programmer",
    "Age":22,
    "location":
      {
      "City":
        {
        "Pin":324009,
        "City Name":"Ahmedabad"
        },
      "State":"Gujarat"
      },
    "Company":
      {
      "Company Name":"Elegant",
      "Domain":"Java"
      },
    "Test":["Test1","Test2"]
}

then how can I find the nested keys and display them in a properly formatted table?

Source

2017-04-24 Vpn_talent

Please provide: a sample of the input data, what you have tried, and what the problem is. –

To display the data in the expected format shown above, you can use the following code:

people.select("*", "location.*").drop("location").show();

It gives the following output:

+---+-----------+-----------------+----------+---------+-------+
|Age|Designation|            Email|      Name|     City|  State|
+---+-----------+-----------------+----------+---------+-------+
| 22| Programmer|[email protected]|VipinSuman|Ahmedabad|Gujarat|
+---+-----------+-----------------+----------+---------+-------+
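For the second layout in the question (a single Location column holding "Ahmedabad,Gujarat"), one option is `concat_ws` from `org.apache.spark.sql.functions`. The following is a minimal sketch, assuming the `people` Dataset and the `location.City` / `location.State` fields from the first JSON sample; the class and method names (`LocationJoiner`, `joinLocation`) are made up for illustration.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Hypothetical helper class, not part of Spark.
public class LocationJoiner {

    // Replaces the "location" struct column with a single string column
    // that joins City and State with a comma.
    public static Dataset<Row> joinLocation(Dataset<Row> people) {
        return people.withColumn(
            "location",
            concat_ws(",", col("location.City"), col("location.State")));
    }
}
```

Calling `LocationJoiner.joinLocation(people).show(50, false);` should then print the struct as one comma-separated value. Note that `concat_ws` skips null inputs, so a row with a missing City would show only the State.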

Source

2017-04-24 13:09:50 himanshuIIITian

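Thank you very much for the reply, @himanshuIIITian. May I ask one more question? If I don't know which keys are nested, how can I find them? And if there are multiple nested columns, how can I find them and handle that case? –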

@Vpn_talent That is not possible, because if we don't know the schema of the DataFrame, we can't know whether it is nested. – himanshuIIITian

@Vpn_talent Did this answer solve your problem? – himanshuIIITian
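On the follow-up question about unknown nesting: Spark infers a schema when it reads JSON, and `Dataset.schema()` exposes it at runtime as a `StructType`, so one possible approach is to walk that schema recursively and collect the dotted path of every leaf field. The sketch below assumes only struct nesting needs flattening (arrays such as `Test` stay as a single column, and field names are assumed not to contain backticks); `SchemaPaths`, `leafPaths`, and `flatten` are hypothetical names, not Spark APIs.

```java
import static org.apache.spark.sql.functions.col;

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

// Hypothetical helper class, not part of Spark.
public class SchemaPaths {

    // Returns the backtick-quoted dotted path of every leaf field,
    // descending into nested structs at any depth.
    public static List<String> leafPaths(StructType schema) {
        List<String> paths = new ArrayList<>();
        collect(schema, "", paths);
        return paths;
    }

    private static void collect(StructType struct, String prefix, List<String> out) {
        for (StructField field : struct.fields()) {
            // Backticks keep names with spaces (e.g. "City Name") usable in col(...).
            String path = prefix + "`" + field.name() + "`";
            DataType type = field.dataType();
            if (type instanceof StructType) {
                collect((StructType) type, path + ".", out);
            } else {
                out.add(path);
            }
        }
    }

    // Selects every leaf as a top-level column, naming it after its path
    // with dots replaced by underscores (e.g. location_City_Pin).
    public static Dataset<Row> flatten(Dataset<Row> df) {
        Column[] cols = leafPaths(df.schema()).stream()
            .map(p -> col(p).alias(p.replace("`", "").replace('.', '_')))
            .toArray(Column[]::new);
        return df.select(cols);
    }
}
```

With the `people` Dataset from the question, `SchemaPaths.leafPaths(people.schema())` lists the nested column names, and `SchemaPaths.flatten(people).show(50, false);` displays them as ordinary columns. Prefixing each alias with its parent path is a design choice to avoid clashes when two structs share a field name.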
