
AI Investing Machine: Part I

Several years ago, I started with the idea of creating a knowledge graph around mutual funds, their holdings, and which legal entities were backing those holdings. So I started to build a graph in Neo4J based on what I was storing for SEC filings in my Couchbase warehouse.


I had amassed several years of filings for Mutual Fund and ETF holdings, but their quality improved significantly starting in 2021. Around 2023, I had enough filings to build a knowledge graph, but my use case was not exactly knowledge-graph friendly.


Why? I had historical data that I wanted to maintain for ML cases in addition to my initial use case. I decided to go forward and store it all in Neo4J anyway, to follow KISS for my initial setup. Since Neo4J tries to keep much of what it works with in memory, large node counts can be a challenge.


View from Neo4J Bloom

When I initially scoped loading the filings from the SEC, mostly N-PORT forms, I wanted to restructure them from XML to JSON to reduce memory and storage consumption in Couchbase. This meant building an inventory of keys that I would use to search the JSON, which also flowed into the node structures in Neo4J. The filings have the date the filing occurred, the backing CIK (Central Index Key from the SEC), and your typical security/instrument identifiers: ISIN, CUSIP, SEDOL, and internal identifiers from the trades. But the SEC added the most important new key, which was the LEI, or Legal Entity Identifier.
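As a rough illustration of the XML-to-JSON restructuring with a key inventory, here is a minimal sketch. The XML fragment, its tag names, the JSON key names, and the `holding_to_dict` helper are all hypothetical and heavily simplified; real N-PORT XML is namespaced and carries many more fields, and this is not the actual loader.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, simplified holding record with placeholder identifiers.
SAMPLE_XML = """
<holding>
  <name>Example Corp Bond</name>
  <lei>EXAMPLELEI0000000000</lei>
  <cusip>000000000</cusip>
  <isin>US0000000000</isin>
  <currency>EUR</currency>
  <valueUsd>1250000.50</valueUsd>
</holding>
"""

# Inventory of keys to keep: XML tag -> JSON key (both illustrative).
KEY_INVENTORY = {
    "name": "name",
    "lei": "lei",
    "cusip": "cusip",
    "isin": "isin",
    "currency": "currency",
    "valueUsd": "value_usd",
}


def holding_to_dict(xml_text, filing_date, cik):
    """Flatten one holding element into a JSON-ready dict."""
    root = ET.fromstring(xml_text)
    doc = {"filing_date": filing_date, "cik": cik}
    for tag, key in KEY_INVENTORY.items():
        node = root.find(tag)
        # Skip tags that are absent or empty instead of storing nulls.
        if node is not None and node.text:
            doc[key] = node.text.strip()
    if "value_usd" in doc:
        doc["value_usd"] = float(doc["value_usd"])
    return doc


doc = holding_to_dict(SAMPLE_XML, "2023-06-30", "0000000000")
print(json.dumps(doc, indent=2))
```

Keeping only inventoried keys is what shrinks the stored document, and the same key names can then be reused as node properties.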


There was now a source to resolve relationships between different held instruments and the legal structures that were backing them. So I decided to ingest the LEI information from the only real public source: https://www.gleif.org/en.
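Before joining LEIs from filings against reference data, it can help to reject malformed identifiers. An LEI (ISO 17442) is 20 alphanumeric characters whose last two are check digits computed with ISO 7064 MOD 97-10. The sketch below shows that check; the function names are my own, and the 18-character prefix is a made-up placeholder, not a real entity.

```python
import string

ALLOWED = set(string.digits + string.ascii_uppercase)


def _to_digits(text):
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    return "".join(str(int(ch, 36)) for ch in text)


def lei_check_digits(prefix18):
    """Compute the two check digits for an 18-character LEI prefix."""
    remainder = int(_to_digits(prefix18 + "00")) % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei):
    """Return True if the string has LEI shape and a valid checksum."""
    lei = lei.strip().upper()
    if len(lei) != 20 or not set(lei) <= ALLOWED:
        return False
    if not lei[-2:].isdigit():
        return False
    return int(_to_digits(lei)) % 97 == 1


# Made-up prefix used only to exercise the functions.
prefix = "EXAMPLE00000000000"
candidate = prefix + lei_check_digits(prefix)
print(candidate, is_valid_lei(candidate))
```

A checksum pass only says the identifier is well formed; whether the entity exists still has to be confirmed against the reference data.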


Images displaying the nodes, relationships, and attributes of Knowledge Graph


The knowledge graph was constructed with the following node types and relationships to answer specific questions. This is not a complete list of nodes or relationships; it is meant to give an idea of scope.


Nodes

Common Equity

Floating Rate Debt

Futures

Funds

Preferred Equity

Collateralized Debt

SWAPs

Fund Parent Company

Corporate Debt

Asset Backed Debt

SWAPTIONS

Fund Borrowers

Mortgage Backed Debt

Warrants

Country

Legal Entity

Municipal Debt

Options

Currency

Legal Entity LEI ISIN Reference

Convertible Debt

Forwards

Region

Legal Entity Relationships

Relationships

Asset Type -> Fund Holdings

Asset Type -> Legal Entity

Asset Type -> Legal Jurisdiction

Fund Series -> Fund Series Classes

Asset Type -> Country

Fund -> Legal Entity

Asset Type -> Headquarters Country

Fund -> Fund Flows

Asset Type -> Currency

Fund Parent -> Legal Entity

Asset Type -> Location Country

Fund -> Fund Performance

Country -> Region

Fund -> SubFunds

Asset Type -> Index Member

Common Equity -> Company Profile

There are also other explicitly defined relationships that resolve many-to-many links between Legal Entities.
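To make the node and relationship lists above more concrete, here is a small in-memory sketch of a labeled property graph. It is not how Neo4J stores data, and the `Graph` class, the relationship type names (`HOLDS`, `HAS_LEGAL_ENTITY`, and so on), and the sample nodes are all assumptions for illustration.

```python
from collections import defaultdict


class Graph:
    """Tiny labeled graph: nodes carry a label, edges carry a type."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., props}
        self.edges = defaultdict(list)  # source id -> [(rel_type, target id)]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = {"label": label, **props}

    def add_rel(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges[source].append((rel_type, target))

    def neighbors(self, source, rel_type=None):
        return [t for r, t in self.edges.get(source, [])
                if rel_type is None or r == rel_type]


g = Graph()
g.add_node("fund-1", "Fund", name="Example Fund")
g.add_node("repo-1", "Repurchase Agreement")
g.add_node("le-1", "Legal Entity", name="Example Counterparty")
g.add_node("EUR", "Currency")
g.add_node("DE", "Country")
g.add_node("EU", "Region")

g.add_rel("fund-1", "HOLDS", "repo-1")           # Asset Type -> Fund Holdings
g.add_rel("repo-1", "HAS_LEGAL_ENTITY", "le-1")  # Asset Type -> Legal Entity
g.add_rel("repo-1", "IN_CURRENCY", "EUR")        # Asset Type -> Currency
g.add_rel("repo-1", "IN_COUNTRY", "DE")          # Asset Type -> Country
g.add_rel("DE", "IN_REGION", "EU")               # Country -> Region

print(g.neighbors("repo-1"))
print(g.neighbors("repo-1", "IN_CURRENCY"))
```

Each asset type becomes its own label, while shared nodes such as Currency, Country, and Legal Entity are what let separate holdings connect to each other.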


The initial use cases for combining this data were:

  • risk measurement of currency exposures

  • position risk to institutions and markets

  • legal entity tracing for multi-national corporations

  • impact chain tracing for asset purchases


To demonstrate a portion of this, I started by expanding a fund in Neo4J Bloom. The fund has Loans and SWAPtions as asset holdings:




Next, the Legal Entity (LE) of the underlying SWAP asset wrapped by the SWAPtion, which was also a Counter Party for the SWAP, traced to a company called Citigroup Global Markets:



That company itself sat under another Citigroup LE, called Citigroup Financial Products:




Then the Citigroup company above was expanded, which revealed it as the Legal Entity and Not Centrally Cleared (NCC) Counter Party for several Repurchase agreements, and the Legal Entity for Floating Rate Debt:



Looking into the Repurchase agreements a little further, they eventually link to the asset holders, which in this specific case are two mutual funds (the light green objects with bank icons). We also see the Repurchase agreement currency/collateral currency, which is the Euro.


This means that the Citigroup-linked company has exposure to the Euro and could be impacted by large changes in exchange rates on these assets. It's the same situation with the two mutual funds regarding Euro currency exposure. Many mutual funds and ETFs offset their currency risks by purchasing forward currency contracts. Had I expanded the funds in this case and they were based in the United States, we would probably see those 'forwards' as assets held by the fund(s).
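The exposure reasoning above boils down to summing position values by currency for each entity reached in the graph. Here is a minimal sketch of that aggregation; the entity names, amounts, record layout, and the `exposure_by_currency` helper are invented for illustration and do not come from the filings.

```python
from collections import defaultdict

# Invented positions: each row links an entity to an asset and its currency.
positions = [
    {"entity": "Fund A", "asset": "Repo 1", "currency": "EUR", "value": 5_000_000.0},
    {"entity": "Fund A", "asset": "Bond 1", "currency": "USD", "value": 2_000_000.0},
    {"entity": "Fund B", "asset": "Repo 2", "currency": "EUR", "value": 3_000_000.0},
    {"entity": "Counterparty X", "asset": "Repo 1", "currency": "EUR", "value": 5_000_000.0},
    {"entity": "Counterparty X", "asset": "Repo 2", "currency": "EUR", "value": 3_000_000.0},
]


def exposure_by_currency(rows, base_currency="USD"):
    """Sum position values per entity for every non-base currency."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in rows:
        if row["currency"] != base_currency:
            totals[row["entity"]][row["currency"]] += row["value"]
    return {entity: dict(by_ccy) for entity, by_ccy in totals.items()}


exposure = exposure_by_currency(positions)
for entity, by_ccy in exposure.items():
    print(entity, by_ccy)
```

This is gross exposure only. Hedges such as currency forwards would need to be netted against these totals to see the remaining risk.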



Following another path from the Citigroup Financial Products Legal Entity leads us to additional firms that are operating in other countries, within another holding company. The first orange circle is a fund company with various other funds connected to it (the four orange circles on the right).



Finally, we trace the Citigroup Financial Products LE, which is responsible for the above assets and funds, back to the Citigroup Inc. LE at the top of the hierarchy.
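Tracing an entity back to the top of its hierarchy is a walk over child-to-parent links until no parent remains. The sketch below assumes a simple dictionary of direct-parent links; the links are condensed from the walk-through above, so the real chain may contain intermediate entities, and `trace_to_top` is a hypothetical helper rather than part of any library.

```python
def trace_to_top(entity, parent_of):
    """Follow child -> parent links and return the chain, starting entity first."""
    chain = [entity]
    seen = {entity}
    while chain[-1] in parent_of:
        parent = parent_of[chain[-1]]
        if parent in seen:
            # Guard against bad reference data that loops back on itself.
            raise ValueError(f"cycle detected at {parent!r}")
        chain.append(parent)
        seen.add(parent)
    return chain


# Simplified direct-parent links, keyed by entity name for readability.
# In practice the keys would be LEIs.
parent_of = {
    "Citigroup Global Markets": "Citigroup Financial Products",
    "Citigroup Financial Products": "Citigroup Inc.",
}

chain = trace_to_top("Citigroup Global Markets", parent_of)
print(" -> ".join(chain))
```

The last element of the chain is the ultimate parent, which is the natural level at which to roll up exposures found lower in the structure.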



As you can see, the ability to map Legal Entity Identifiers (LEI) between different organizations and their underlying structures can have a significant impact on our ability to navigate, assess, trace, and understand risks that are not easily discovered. I chose this as the foundation for building my AI application, since it offers powerful capabilities to quantify impacts across organizations and financial systems. Next, I will dive into the architecture of the application and broaden its scope to reflect more of the features I was looking for.
