
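I have multiple JSON files that I need to parse with Apache Spark. They contain nested keys, and I need to print all columns, including the nested keys. How can I create nested columns from JSON files using Apache Spark in Java?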

These files have nested keys. I want to get all column names, including the nested column names. How can I get them?

Here is what I tried:

String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-01.json,/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-02.json"; 

String[] jsonFiles = jsonFilePath.split(","); 

Dataset<Row> people = sparkSession.read().json(jsonFiles); 

JSON structure:

{ 
    "Name":"Vipin Suman", 
    "Email":"[email protected]", 
    "Designation":"Programmer", 
    "Age":22 , 
    "location": 
      { 
      "City":"Ahmedabad", 
      "State":"Gujarat" 
      } 
} 

The result I get:

people.show(50, false); 

Age | Designation | Email   | Name  | Location 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman|[Ahmedabad,Gujarat] 

I want the data like this:

Age | Designation | Email   | Name  | City  | State 
------------------------------------------------------------ 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad |Gujarat 

Or like this:

Age | Designation | Email   | Name  | Location 
--------------------------------------------------------------- 
22 |Programmer |[email protected] | Vipin Suman| Ahmedabad,Gujarat 

And what if the schema looks like this:

root 
|-- Age: long (nullable = true) 
|-- Company: struct (nullable = true) 
| |-- Company Name: string (nullable = true) 
| |-- Domain: string (nullable = true) 
|-- Designation: string (nullable = true) 
|-- Email: string (nullable = true) 
|-- Name: string (nullable = true) 
|-- Test: array (nullable = true) 
| |-- element: string (containsNull = true) 
|-- location: struct (nullable = true) 
| |-- City: struct (nullable = true) 
| | |-- City Name: string (nullable = true) 
| | |-- Pin: long (nullable = true) 
| |-- State: string (nullable = true) 

with this JSON structure:

{
    "Name":"Vipin Suman",
    "Email":"[email protected]",
    "Designation":"Trainee Programmer",
    "Age":22,
    "location":
      {
      "City":
        {
        "Pin":324009,
        "City Name":"Ahmedabad"
        },
      "State":"Gujarat"
      },
    "Company":
      {
      "Company Name":"Elegant",
      "Domain":"Java"
      },
    "Test":["Test1","Test2"]
}

In that case, how can I find the nested keys and display them as a proper table?
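Spark infers the schema while reading JSON, so nested keys can be discovered at runtime. `people.schema()` returns a `StructType`, and its `fields()` are `StructField`s. A field whose `dataType()` is itself a `StructType` is a nested object.

The sketch below shows that recursive walk in plain Java so it runs without Spark. As an assumption for illustration, it models the schema above as nested `LinkedHashMap`s instead of Spark's `StructType`: a map value stands for a struct, and anything else stands for a leaf type. Arrays such as `Test` are treated as leaves and kept as a single column.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NestedKeys {

    // Recursively collects the dotted path of every leaf field.
    // A Map value stands in for a struct; any other value (a type name) is a leaf.
    @SuppressWarnings("unchecked")
    static void collect(Map<String, Object> struct, String prefix, List<String> out) {
        for (Map.Entry<String, Object> e : struct.entrySet()) {
            String path = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                collect((Map<String, Object>) e.getValue(), path, out);
            } else {
                out.add(path);
            }
        }
    }

    static List<String> leafPaths(Map<String, Object> schema) {
        List<String> out = new ArrayList<>();
        collect(schema, "", out);
        return out;
    }

    // Hypothetical stand-in for the StructType printed above, as nested maps.
    static Map<String, Object> sampleSchema() {
        Map<String, Object> city = new LinkedHashMap<>();
        city.put("City Name", "string");
        city.put("Pin", "long");
        Map<String, Object> location = new LinkedHashMap<>();
        location.put("City", city);
        location.put("State", "string");
        Map<String, Object> company = new LinkedHashMap<>();
        company.put("Company Name", "string");
        company.put("Domain", "string");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("Age", "long");
        root.put("Company", company);
        root.put("Designation", "string");
        root.put("Email", "string");
        root.put("Name", "string");
        root.put("Test", "array<string>");
        root.put("location", location);
        return root;
    }

    public static void main(String[] args) {
        for (String p : leafPaths(sampleSchema())) {
            System.out.println(p);
        }
    }
}
```

In Spark code, the same recursion would iterate over `schema.fields()` and test `field.dataType() instanceof StructType` instead of `instanceof Map`.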


Please provide a sample of the input data, what you did, and what the problem is. –

Answer


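To display the data in the format you expect above, you can use the following code (Scala syntax; in Java the same call is people.select("*", "location.*").drop("location").show();):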

people.select("*", "location.*").drop("location").show 

It gives the following output:

+---+-----------+-----------------+----------+---------+-------+ 
|Age|Designation|            Email|      Name|     City|  State| 
+---+-----------+-----------------+----------+---------+-------+ 
| 22| Programmer|[email protected]|VipinSuman|Ahmedabad|Gujarat| 
+---+-----------+-----------------+----------+---------+-------+ 
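`location.*` expands only one level. With the second schema in the question, it would produce `City` as a struct column rather than `City Name` and `Pin`. To flatten every level, one option is to build one expression per leaf path and pass them all to `Dataset.selectExpr`.

The hypothetical helper below does that in plain Java with no Spark dependency:

- It turns dotted paths into Spark SQL expressions.
- It quotes each segment with backticks, because names such as `City Name` and `Company Name` contain spaces.
- It aliases each result with underscores.

It assumes no key itself contains a dot.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlattenExprs {

    // Wraps one name segment in backticks, doubling any backtick inside it,
    // so a name with spaces such as "City Name" parses as a single identifier.
    static String quote(String segment) {
        return "`" + segment.replace("`", "``") + "`";
    }

    // Turns a dotted leaf path like "location.City.City Name" into
    // "`location`.`City`.`City Name` AS `location_City_City Name`".
    // Assumes individual key names contain no dots.
    static String toExpr(String dottedPath) {
        String[] parts = dottedPath.split("\\.");
        List<String> quoted = new ArrayList<>();
        for (String part : parts) {
            quoted.add(quote(part));
        }
        String alias = String.join("_", parts);
        return String.join(".", quoted) + " AS " + quote(alias);
    }

    static List<String> toExprs(List<String> leafPaths) {
        List<String> exprs = new ArrayList<>();
        for (String p : leafPaths) {
            exprs.add(toExpr(p));
        }
        return exprs;
    }

    public static void main(String[] args) {
        List<String> paths = Arrays.asList("Age", "location.City.City Name", "location.State");
        for (String e : toExprs(paths)) {
            System.out.println(e);
        }
    }
}
```

With Spark on the classpath, the expressions could be applied as `people.selectExpr(FlattenExprs.toExprs(paths).toArray(new String[0]))`, where `paths` comes from a recursive walk of `people.schema()`.

For the other layout in the question, a single `Location` column holding `Ahmedabad,Gujarat`, Spark SQL's `concat_ws` should work. For the first schema, where `City` and `State` are strings, that would be `people.selectExpr("Age", "Designation", "Email", "Name", "concat_ws(',', location.City, location.State) AS Location")`.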

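Thanks a lot for the reply, @himanshuIIITian. May I ask one more question? If I don't know which keys are nested, how can I find them? And if I have multiple nested columns, how can I find and handle them? –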


@Vpn_talent That's not possible, because if we don't know the schema of the DataFrame, we can't know whether it is nested. – himanshuIIITian


@Vpn_talent Did this answer solve your problem? – himanshuIIITian
