Convert JSON lines to JSON array using jq

JayBay2279 picture JayBay2279 · Aug 2, 2017 · Viewed 9.1k times · Source

Firstly, I'm new to jq, like 1 day new, I'm also new to JSON, I'm an SQL guy so I'm learning fast but can't get my head around this ... so please bear with me.

I'm running Windows, using jq v1.5 on PowerShell.

I have multiple JSON files downloaded and they look like this:

{"Header":{"AssetClass":"Commodities","InstrumentType":"Forward","UseCase":"Forward","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"EUR","ExpiryDate":"2018-01-01","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","BaseProduct":"AGRI","TransactionType":"FUTR","FinalPriceType":"ARGM","ReferenceRate":"10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN","SubProduct":"GROS","AdditionalSubProduct":"FWHT"},"ISIN":{"ISIN":"EZX27M86B860","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Agriculture","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Forward AGRI GROS FWHT EUR 20180101","ShortName":"NA/Fwd AGRI FWHT EUR 20180101","ClassificationType":"JTAXCC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Basis_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2208-12-10","ReturnorPayoutTrigger":"Total Return","DeliveryType":"OPTL","TransactionType":"ORIT","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-MONTHLY INDEX S. TEXAS (TETCO)-GAS DAILY PRICE GUIDE","OtherReferenceRate":"NATURAL GAS-MONTHLY INDEX W. LOUISIANA (TETCO)-GAS DAILY PRICE GUIDE","BaseProduct":"OTHR","OtherBaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":"","OtherSubProduct":"","OtherAdditionalSubProduct":""},"ISIN":{"ISIN":"EZBBH1XR9GV6","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Multi Commodity","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Swap Basis_Swap OTHR   OTHR   SOS 22081210","ShortName":"NA/Swap OTHR   SOS 22081210","ClassificationType":"STQTXE"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Swap","UseCase":"Multi_Exotic_Swap","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"LRD","ExpiryDate":"2200-01-31","ReturnorPayoutTrigger":"Contract for Difference (CFD)","DeliveryType":"CASH","TransactionType":"TAPO","FinalPriceType":"EXOF","UnderlyingInstrumentIndex":["BCOMF6","BCOMNG3"]},"ISIN":{"ISIN":"EZ286HJVY4Q2","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Swap MCEX LRD 22000131","ShortName":"NA/Swap MCEX LRD 22000131","ClassificationType":"STQCXC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"TND","ExpiryDate":"2209-10-18","OptionType":"OPTL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Asian","DeliveryType":"CASH","TransactionType":"OTHR","FinalPriceType":"IHSM","ReferenceRate":"NATURAL GAS-NGPL (NICOR, NIPSCO, PGLC CITYGATE), NBPL-NICOR-ICE/10X MONTHLY","BaseProduct":"OTHR","SubProduct":"","AdditionalSubProduct":""},"ISIN":{"ISIN":"EZ2TK5CWL9Y4","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","UnderlyingAssetType":"Other","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"FullName":"Commodities Option OTHR   TND 22091018","ShortName":"NA/O OTHR  OPTL TND 22091018","ClassificationType":"HTMHAC"}}
{"Header":{"AssetClass":"Commodities","InstrumentType":"Option","UseCase":"Multi_Exotic_Option","Level":"InstRefDataReporting"},"Attributes":{"NotionalCurrency":"SOS","ExpiryDate":"2209-10-18","UnderlyingInstrumentIndex":["BCOMSI2","BCOMPR3T"],"OptionType":"CALL","OptionExerciseStyle":"AMER","ValuationMethodorTrigger":"Other Path Dependent","DeliveryType":"CASH","TransactionType":"ORIT","FinalPriceType":"BLTC"},"ISIN":{"ISIN":"EZ82L36B6225","Status":"New"},"TemplateVersion":1,"Derived":{"CommodityDerivativeIndicator":"TRUE","IssuerorOperatoroftheTradingVenueIdentifier":"NA","PriceMultiplier":1,"UnderlyingAssetType":"Multi Commodity","BaseProduct":"MCEX","SubProduct":"","AdditionalSubProduct":"","FullName":"Commodities Multi_Exotic_Option MCEX SOS 22091018","ShortName":"NA/O MCEX Call SOS 22091018","ClassificationType":"HTQBPC"}}

The files can be upwards of 1 GB in size.

For me to use these effectively, I need to turn the JSON lines into JSON Arrays, wrapping the file, prepending "[" and appending "]" and for each line separate with a comma (,).

Making the file look like this (in theory):

[
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
  }
},
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
}
}
]

So I've found jq and from my understanding i can run this

jq --slurp 'map(select(. >= 2))' Inputfile.json > OutputFile.json

This works but when running files larger than 200 MB i get "system out out of memory" error (if i use ISE) and if i use standard Powershell or CMD it takes ages. minutes (5+)

If i take --slurp out of the command, it works, it's quicker but the result looks like this:

[
{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
  }
}]
[{
"Header": {
  "AssetClass": "Commodities",
  "InstrumentType": "Swap",
  "UseCase": "Basis_Swap",
  "Level": "InstRefDataReporting"
},
"Attributes": {
  "NotionalCurrency": "EUR",
  "ExpiryDate": "2017-08-31",
  "ReturnorPayoutTrigger": "Contract for Difference (CFD)",
  "DeliveryType": "CASH",
  "BaseProduct": "AGRI",
  "OtherBaseProduct": "AGRI",
  "TransactionType": "FUTR",
  "FinalPriceType": "ARGM",
  "ReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "OtherReferenceRate": "10PPM ULTRA LOW SULPHUR DIESEL-CARGOES CIF NWE/BASIS ARA-PLATTS EUROPEAN",
  "SubProduct": "GROS",
  "AdditionalSubProduct": "FWHT",
  "OtherSubProduct": "GROS",
  "OtherAdditionalSubProduct": "FWHT"
},
"ISIN": {
  "ISIN": "EZ68CZDRFYY7",
  "Status": "New"
},
"TemplateVersion": 1,
"Derived": {
  "CommodityDerivativeIndicator": "TRUE",
  "UnderlyingAssetType": "Multi Commodity",
  "IssuerorOperatoroftheTradingVenueIdentifier": "NA",
  "PriceMultiplier": 1,
  "FullName": "Commodities Swap Basis_Swap AGRI GROS FWHT AGRI GROS FWHT EUR 20170831",
  "ShortName": "NA/Swap AGRI FWHT FWHT EUR 20170831",
  "ClassificationType": "STQCXC"
}
}
]

It creates an array for each line but the array is not comma separated, and thats not what I wanted.

So, how can I process large files, of multiple JSON lines, and create one file for the input file that is generated as a single array, comma separated, without using slurp?

I've read about inputs but not sure if that is relevant to what I need to do?

Answer

Shahbaz Ali picture Shahbaz Ali · May 18, 2020

Maybe, I am late but this is what you are looking for!

jq -s '.' in.json > out.json