Portfolio Holdings Data: Content Retrieval
- Claude Paugh
- Apr 13
- 3 min read
Updated: Jun 22
As I mentioned in an earlier post, I choose Couchbase to warehouse the documents due to its SQL-like access, ability to have multiple nodes for horizontal scalability, and built in analytics features as well. I had intended to load company specific filings (10-Q, 10-K, etc.) and try the full-text search capability, but I have not been able to do that yet.
If you're a software engineer, there are various SDK's and connectors available. On the other hand if you just want to look at document content, either the built-in "Query" section on the Couchbase console, or a third-party tool that has a driver to connect. You can also buy drivers from a partner of Couchbase, and use those with any tool that supports Couchbase.
I use DataGrid from Jetbrains (bought a license), and they provide their own drivers:


As you can see in the result set above, the "columns" are keys from the underlying JSON document, and the values correspond to the data in the grid. References are the same when using DataGrid or Couchbase UI Query.
The "columns" are the keys, if you have nesting in the document, then it's the path within the JSON structure, e.g. first.second.third is the "column". The "table" in the from clause consists of bucket-name.scope.collection. You must use ticks( ` ) to surround the bucket-name, and can use them for the scope and collection, like I have above. There "where" clause and some functions like count, avg, min, max are the same.
Example SQL++ to get Funds
select distinct gi.regFileNumber as SECFileNumber,
gi.regStateConditional.regState as State,
gi.regStateConditional.regCountry as County,
gi.IsFinalFiling as IsFinalFiling,
gi.seriesName as SeriesName,
gi.seriesLei as SeriesLEI,
gi.seriesId as SeriesId,
gi.regName as FundCompany,
gi.PeriodEndDate as PeriodEndDate,
gi.ReportAuthorizedDate as ReportAuthorizedDate,
gi.ParentLEI as ParentLEI,
gi.regCik as CIK,
gi.regCity as City,
gi.regLei as LEI,
gi.regPhone as Phone,
gi.regZipOrPostalCode as PostCode,
round(tonumber(fp.cshNotRptdInCorD), 2) as CashNotReported,
round(tonumber(fp.assetsInvested), 2) as AssetsInvested,
round(tonumber(fp.assetsAttrMiscSec), 2) as Assets_Attributed_MiscSecurities,
round(tonumber(fp.totAssets), 2) as TotalAssets,
round(tonumber(fp.totLiabs), 2) as TotalLiabilities,
round(tonumber(fp.netAssets), 2) as NetAssets
FROM `us-fund-filings`.`Funds`.GeneralInformation gi JOIN `us-fund-filings`.`AssetManagers`.FundParents fp ON (fp.CIK = gi.regCik and gi.PeriodEndDate = fp.PeriodEndDate) WHERE gi.PeriodEndDate = '2023-02-28;'
Index Creation
If the number of documents being stored is large, index creation is a must. Since I have approx. 5+ years worth of portfolio filings, I created a number of them. Since Couchbase does not have user controlled partitions, I created indexes with WHERE clauses for date range control. The Couchbase Query optimizer does attempt to partition data, but it's upon execution, and the indexes are much more effective in implementing the concept of partitioning.
CREATE INDEX `adv_issuerCat_PeriodEndDate_assetCat2024` ON `us-fund-filings`.`Portfolios`.`Investments`(`PeriodEndDate`,`assetCat`) WHERE ("2024-01-01" <= `PeriodEndDate`);
CREATE INDEX `adv_PeriodEndDate_derivativeInfo_optionSwaptionWarrantDeriv_derivCat_2024_1` ON `us-fund-filings`.`Portfolios`.`Investments`(`PeriodEndDate`,((`derivativeInfo`.`optionSwaptionWarrantDeriv`).`derivCat`)) WHERE (("2024-01-01" <= `PeriodEndDate`) and (`PeriodEndDate` <= "2024-03-31"));Why did I choose to store this data?
I thought the consolidation of this data could serve a few different use cases:
Trending Across asset managers or within individual asset managers
Pattern analysis on what positions changed quarter over quarter, and mapping to news events could detect habits or patterns which could be an advantage
Accuracy of filings: I wanted to know if the filings matched the investor reports that the asset managers product quarterly. The answer should be yes, but if its no, why?
Currency exposure over periods of time, and which asset mangers have more or less exposure
Monthly fund flows within an asset manager complex, or across the industry in the U.S. This could be analyzed across asset types to determine which characteristics are popular at the moment.
