Process for Estimating Tax
A peek into tax estimation process for redeeming units that are outside STCG zone
In the previous chapter, we've seen how to compute total number of LTCG taxation eligible units, if equity transaction history has been provided.
Picking up from where we’d left off in in part 1, we can start with planning for potential tax computations
At this point, we’ve computed units outside of STCG zone (purchased more than a year ago, that’s still in investor’s portfolio, eligible for LTCG taxation if redeemed). STCG stands for Short Term Capital Gain.
We need to compute these:
- tax liability, if investor were to sell all these units.
- total number of units present, where net tax liability is equal to or below 1L, if investors were to sell those
Tax is computed on realized gain, or booked profits. Realized gain, can be written as
SUM(selling price of units - purchase price of units).
Say, an investor purchased
U1units, on date
t1, at price
U2units, on date
t2, at price
U3units, on date
t3, at price
Assume selling price (NAV on day of selling) is
This investor has
(U1 + U2 + U3)available for selling. Realized gain is:
[(U1 * (P - P1) + U2 * (P - P2) + U3 * (P - P3)]
Which can be written as:
(U1 + U2 + U3) * P - (U1 * P1 + U2 * P2 + U3 * P3)
In previous section, we’ve computed effective value of
(U1 + U2 + U3), for any number of transactions.
P3; for all those transactions.
Only missing information is selling price,
What we need, is selling price as on today (the given day when investor is attempting the sell transaction).
In other words, we need a way to get access to latest NAV of these funds.
Because this NAV would change every day (equity mutual funds update NAVs at the end of the day, every business day); it’d be better to programmatically access it, like we used
GOOGLEFINANCE()for historic data.
GOOGLEFINANCE()can be one option here. But as we’ve learned in the previous section, it has issues (remember seeing
#NAall over the place?)
Instead, we’d go to the source!
Every AMC has a regulatory requirement to update their NAV daily with AMFI (Association of Mutual Funds in India). And AMFI makes this data available for parsing, in a CSV format, on their website.
Remember when you were thinking learning
TRANSPOSE()would’ve been of no use? Well, as you’re about to see, not all CSV data come in text files.
Google Sheets has an
IMPORTDATA()function, that’d do just fine. We provide it a URL for the AMFI NAV endpoint, and it’d fetch that latest raw NAV data.
Then we can split that raw NAV feed into cells, as we see fit.
This AMFI URL has ~20k entries. To be able to find the funds as in the CSV transaction dump, our excel matchers would be fine.
But the problem is this: the name of fund can appear differently in that list, as opposed to how it appears in the CSV.
Finding fund by matching strings would lead to misses.
A better option would be to find fund the way it’s usually done - by a unique identifier.
This unique identifier is already present in the AMFI NAV feed URL:
ISIN (International Securities Identification Numbers) is unique across global securities market. Not only can there be no two funds with same ISIN in India; there can be no two securities with duplicate ISINs in regulated markets, across the world!
We have to find ISIN for each of the funds in CSV transaction, manually, and add a column in the second sheet, next to each fund, to add its ISIN.
List of ISIN is also present in AMFI NAV feed.
This is actually a straight-forward, one-time exercise. It should also make it clear to the reader why having too many funds to portfolio can be cumbersome.
Let’s start by adding the ISIN column first
In the second sheet (not the one named
data), add a column next to fund’s name. You can use Insert Column.
Insert ISINs as follows (copy-paste from below):
Final results, should look like this
After ISIN has been added - Dark Mode
After ISIN has been added - Light Mode
Follow these steps
- Create a new sheet, name it AMFI NAV (Raw)
- In a cell, enter the AMFI URL:
IMPORTDATA()function, and pass the cell ID of the URL above, as argument
- Given it few seconds, it’d fetch latest NAV from AMFI for all mutual funds. It should add some 18k+ rows.This data is in CSV format, separated by semicolon, per line.
- Create another new sheet, name it AMFI NAV (Processed)
- In this sheet, add somewhere this formula
=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";"), where
<cell ID>points to a row of semicolon-separated data, in the sheet AMFI NAV (Raw).
ISBLANK(<cell ID>)checks if the cell is empty. As you can see from the data, there can be plenty of empty cells. Invoking
SPLIT()on these, would result in an error, it’d show up as a
#VALUEerror on spread sheet.
IF(condition, arg1, arg2)checks if
conditionis true for a cell (in this case, if the cell is empty). If so, the first argument
arg1in inserted in the result cell (not the cell on which condition is being evaluated). Otherwise,
=IF(ISBLANK(<cell ID>), , SPLIT(<cell ID>, ";")basically means if
<cell Id>is empty, don’t put anything, else split the content of the cell by semicolon, and put the results in cells.
- Use autofill drag to fill out all 18k+ rows in new sheet.
Technically, you won't need to fill out all 18k+ rows. As long as you fill out enough number of rows, such that all funds in that other sheet appear in those rows, it's good enough.
Refer to the following video(s) for reference
Your sheets would now have latest available NAV from all mutual funds, in a nice table format.
We've introduced a few new Google Sheets functions in this section:
In this section, we’d compute net redemption value of units older than 1 year.
This is total available units older than one year (already computed earlier), multiplied by latest NAV.
We can look up latest NAV using
VLOOKUPwith ISIN as search query.
VLOOKUP()stands for vertical lookup or row-wise lookup, across columns. It’s a way of searching for matching data.
LOOKUP(); but for our use case,
VLOOKUP()would work fine for now.
Follow these steps:
- Come back to the main sheet, which has information on available units and sold units.
- Add a column at the end, titled Latest Value
VLOOKUP()for the top-most row as follows:VLOOKUP(<cell ID that has ISIN for the fund>,<range of data from processed NAV sheet>,<column index, that has NAV for the fund>false)1st argument,
<cell ID that has ISIN for the fund>is ISIN, but we’re doing it this way, to be able to drag and auto-fill.2nd argument,
<range of data from processed NAV sheet>is a bit interesting.
VLOOKUP()does not work if the column we are searching for is not the first column of a range. In our processed NAV sheet, the ISIN is the second column.We have to pick a range, leaving out the first column.This range is a combination of rows and cells. For instance, if your processed NAV sheet starts at
Scheme Codetext in it; then, our range would be something like
C9:F18123(leaving out the last, date column).Since we’d be using drag & auto-fill, we can lock these cells:
<column index, that has NAV for the fund>, is column index of the column for the result.We’re looking for NAV when ISIN matches, which is the 4th column in our range (ISIN is the first column in this range). Hence, we can put this value as 4.Final argument, is
false. It tells
VLOOKUP()that the dataset in the range is not sorted. Our data set is not sorted on the basis of ISIN (sorted alphabetically on AMC name, actually), hence
falseis the right value to pass here.
Refer to this video to follow along steps:
Final outcome in the newly added column, should be similar to these screenshots:
Latest Valuation of Available Units - Dark Mode
Latest Valuation of Available Units - Light Mode
On the surface, this sounds easier than previous steps.
All one has to do, is go and sum on units and NAV column, where date is much before the date 1 year ago.
In fact, investment amount is provided in the CSV transaction dump, so multiplication isn’t even needed (invested amount = unit x NAV on that date).
Except, it’s not that simple.
We have realized that even if units were acquired more than one year ago, doesn’t mean some of those weren’t sold in last 1 year, or before.
Remember that available units is NOT sum of all units purchased more than a year ago. We had to subtract the sold units (sold at any point in time, not just more than a year ago) to arrive at LTCG-eligible units.
In other words, we need to find unit purchase price for only these units. Not that of all units.
Another way to look at it: keep adding purchase price, until total number of units added is same as LTCG-eligible units.
An example to illustrate this idea:
Say, an investor is buying 10 units every month, from 2018 January, to 2020 December. That's a total 36 month period.
At the end of December 2020,
units purchased between January 2018 and December 2019, are eligible for LTCG.
Now assume that in 2020, at some point, this investor has also sold 54 units.
Then, at that point, they’ve
units available for LTCG-eligible redemption.
What were the purchase price(s) for these 186 units?
More importantly, when were these 186 units acquired?
We know that investor purchased 10 units every month, from Jan 2018 to Dec 2020. Investor has also sold 54 units some time in 2020.
Redemption works in a FIFO (First In First Out) manner in folio. Meaning, earliest purchased units are to be redeemed first.
We first need to account for 54 units that were sold. These units were acquired as follows -
Once we’ve depleted the earliest-purchased-earliest-sold scheme for already redeemed units, we can start to account for remaining 186 units.
These 186 LTCG-eligible units were acquired as follows:
When computing purchase price for these 186 units, we need to use purchase price from these months in the table; and not purchase price from January 2018.
In other words, this is going to be the most complex part of the computation.
To recap, we compute purchase cost / acquisition cost as follows:
- Account for existing sell transactions, depleting units, starting from initial purchase transaction; until total units add up to all units sold so far.
- Start adding up units from this date, till it adds up to total available LTCG-eligible units.
Luckily, we have a special scenario on our hands.
In our case, if you notice the units sold and LTCG-eligible units in the column, investor has either never sold any units from their current holdings in a specific fund; or they’ve completely sold off all their units in that specific fund.
Which means, for funds where there are LTCG eligible units; we do NOT need to account for previous sell transactions.
In other words, one can start adding up the purchase transaction costs from the very first purchase.
This won’t be the case for a typical CSV transaction dumps for most investors.
We’ve so far carefully avoided discussing the grandfathering clause in equity LTCG computation.
Union budget 2018, defined purchase price of equity, for LTCG computation as higher of price as on 31st January 2018, or actual purchase price - if units were bought before this date.
To factor this in, we’d need information on NAV price as on 31st January, for each of the funds present in the CSV transaction dump, which has LTCG-eligible units.
The AMFI URL for latest NAV won’t do.
Instead, we can use this URL for fetching historic data:
This URL returns data in CSV format as before. However, it’s specific to each AMC, identified by an AMC code.
An AMC code is a number, between 1 and 100. However, not all numbers between 1 and 100 correspond to a valid AMC code. You’d have to try out each number, and see which ones result in valid URLs that return data.
For example, Nippon AMC is identified by AMC code 21. This URL would return NAV history of all Nippon AMC funds, as on 31st January 2018:
You can follow similar steps as earlier to incorporate data from this URL into your calculations.
We won’t be covering the last bit of calculation here, as it’d be quite complex.
It’s left as an exercise to the reader; as it won’t be of much value to do these with Excel / Spreadsheet, for most investors.
To achieve such conditional summing-up, one would most likely need to use
QUERY()function, writing something akin to SQL queries.
It’d also be much harder to update or understand. for another person going through same excel sheet or spreadsheet; which would have such esoteric constructs.
At that point, using a for-loop equivalent in commonly used programming languages (Python / Node / Java / Ruby / Go etc.) would be much easier to reason about.
Phew! We covered a lot of ground in these three chapters.
We can only hope power of spreadsheets are becoming more and more apparent to you. Simultaneously, you've also started seeing the rough edges.
Excel / spreadsheets with complex formulas can be hard to maintain - it's tough to understand and update if needed.
We'd gradually address these in upcoming chapters and series.