Big Data and AI-driven Approaches for SDG Nowcasting
Richard V. Rothenberg
Executive Director, Global A.I. Corporation; Research Affiliate, Lawrence Berkeley National Laboratory

Classification of Alternative Data Sources

Alternative Data Features
Based on Doug Laney, 2001

Classification of Machine Learning Techniques

Natural Language Processing Analysis
Data Input:
- News and Social Media
- Geo-tagged Data
- NGO Reports
- Websites
- Crowdsourced data
- Company reports, SEC
- Government & Multilateral Data
Processing levels:
- Level 1: Data Preparation
- Level 2: Modelling
- Level 3: Insight (Output)
Techniques and outputs shown: Tokenization (words & sentences), TF-IDF, Sentiment Index, Word Vectors, Word Lemmatization, N-gram Models, LDA Topic Models, Keyword Network, Edgar Word Cloud, Part of Speech Tagging, Deep Learning, BERT, Text Summarization
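The data preparation and modelling steps named on this slide (tokenization, n-grams, TF-IDF) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the pipeline used in the deck; the function names, the regular expression and the sample headlines are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Compute one {term: weight} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up headlines, for illustration only.
docs = [
    "Company reports lower carbon emissions",
    "Chemical spill pollutes river, company fined",
    "Company hires new analysts",
]
weights = tf_idf(docs)
bigrams = ngrams(tokenize(docs[1]), 2)
```

A term that appears in every document (here "company") receives a weight of zero, which is why TF-IDF is useful for surfacing the terms that distinguish one news item from another.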

Non-Financial Data: From ESG to SDGs
About ESG
- Environmental, Social and Governance (ESG) criteria are used to screen investments.
- Because ESG relies largely on self-reported data, the lack of ESG standards and metrics results in significant 'green-washing' and data biases.
- ESG metrics are updated infrequently, typically on an annual basis.
- Due to the lack of agreed ESG standards, major discrepancies exist across company ESG ratings.
- This has led to significant noise and a lack of useful ESG data for investment purposes.
Figure: Comparison of ESG Scores from FTSE vs MSCI

Non-Financial Data: From ESG to SDGs
About the SDGs
The SDGs are emerging as the new standard to measure the sustainability footprint of companies, countries, and other investable assets.
According to the UN Principles for Responsible Investment, the SDGs’ relevance to responsible investors can be summarized in 5 categories:
1. The SDGs are a critical part of investors’ fiduciary duty.
2. Macro risks: the SDGs are an unavoidable consideration for “universal owners”.
3. Macro opportunities: the SDGs will drive global economic growth.
4. Micro risks: the SDGs as a risk framework.
5. Micro opportunities: the SDGs as a capital allocation guide.

UNCTAD-ISAR Global Core Indicators
- UNCTAD-ISAR is the Intergovernmental Working Group of Experts on International Standards of Accounting and Reporting (ISAR), the United Nations focal point on accounting and corporate governance matters, as well as sustainability standards for companies.
- Our UNCTAD Global Core Indicators (GCI) Ratings are based on the UNCTAD-ISAR set of indicators, which resulted from several years of multi-stakeholder discussions among governments, leading regulators and standard-setting agencies such as GRI and FRSB, among others. These indicators incorporate the indicators of the G20 FSB Task Force on Climate-related Financial Disclosures (TCFD), among others.
- The GCI make comparable indicators available at the company level on the rational use of resources such as water, energy and land; on emissions and waste reduction; and on good governance, human resource development and gender equality.
- The GCI are consistent with financial reporting requirements and aligned with the SDG macro indicators on the use of financial, natural and human resources at the national level.
- Core SDG indicators for companies are instrumental for measuring the only SDG indicator in which the private business sector is mentioned: 12.6.1, “number of companies publishing sustainability reports”, which UNCTAD is developing jointly with UN Environment as co-custodians of this indicator.

UNCTAD-ISAR Global Core Indicators (two slides; content not recovered)

Risk-Return-Impact Framework
Source: UN PRI, SDG Investment Case

SDG Footprint: Entity Network Mapping
Entities shown: Environment, Public Sector, Civil Society, Parent Company, Subsidiaries, Supply Chain

Net SDG Footprint: Use of NLP and Alternative Data
The charts show the differences in ESG 2.0/SDG footprint between a company's self-reported data and alternative data. The score indicates the degree of positivity and negativity in relation to each SDG. For example, for SDG #13 (climate action), a company would get a more negative score after a chemical spill that pollutes an entire ecosystem than a company that increases its carbon emissions by 5%. The scores are adjusted by sector.
Chart panels: SDG Scores, Self-Reported Data; SDG Scores, Alternative Data
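The comparison between self-reported and alternative-data scores can be illustrated with a small sketch. The function names, the flagging threshold and every number below are assumptions made for the example; the deck does not specify how the difference is computed.

```python
def footprint_gap(self_reported, alternative):
    """Per-SDG gap: alternative-data score minus self-reported score.

    Both inputs map an SDG number to a score; only SDGs present in
    both sources are compared.
    """
    shared = self_reported.keys() & alternative.keys()
    return {sdg: alternative[sdg] - self_reported[sdg] for sdg in sorted(shared)}


def flag_discrepancies(gap, threshold=1.0):
    """SDGs where alternative data is at least `threshold` below the company's own reporting."""
    return [sdg for sdg, diff in gap.items() if diff <= -threshold]


# Hypothetical scores for one company (illustrative values only).
self_reported = {5: 0.5, 8: 0.25, 13: 0.75}
alternative = {5: 0.25, 8: 0.5, 13: -0.5}

gap = footprint_gap(self_reported, alternative)
flagged = flag_discrepancies(gap)
```

A large negative gap, as for SDG 13 in this made-up case, is the kind of discrepancy that would prompt a closer look at what the company has not reported.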

SDG Footprint: Materiality Analysis

SDG Footprint: Portfolio vs. Benchmark
The graph on this slide shows the SDG footprint of a portfolio versus the benchmark. It includes both positive and negative SDG scores and enables the assessment of the net SDG footprint of the portfolio.
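One simple way to obtain a portfolio-level footprint is to weight each holding's SDG scores by its portfolio weight and compare the result with the same calculation for the benchmark. The sketch below assumes this weight-averaging approach, which the slide does not confirm; company names, weights and scores are made up.

```python
def weighted_footprint(weights, scores):
    """Weight-averaged score per SDG for a set of holdings.

    weights: {company: portfolio weight}
    scores:  {company: {sdg number: score}}
    """
    total = sum(weights.values())
    footprint = {}
    for company, weight in weights.items():
        for sdg, score in scores[company].items():
            footprint[sdg] = footprint.get(sdg, 0.0) + (weight / total) * score
    return footprint


def active_footprint(portfolio, benchmark, scores):
    """Portfolio footprint minus benchmark footprint, per SDG."""
    p = weighted_footprint(portfolio, scores)
    b = weighted_footprint(benchmark, scores)
    return {sdg: p.get(sdg, 0.0) - b.get(sdg, 0.0) for sdg in sorted(p.keys() | b.keys())}


# Hypothetical holdings and scores (illustrative values only).
scores = {"A": {8: 1.0, 13: -1.0}, "B": {8: 0.0, 13: 1.0}}
portfolio = {"A": 0.5, "B": 0.5}
benchmark = {"A": 0.25, "B": 0.75}

active = active_footprint(portfolio, benchmark, scores)
```

In this example the portfolio is ahead of the benchmark on SDG 8 and behind it on SDG 13, because it overweights company A, which scores well on the former and poorly on the latter.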

SDG Footprint: SDG Scores & Company Valuation
There is a statistically significant relationship between our SDG scores and companies' valuations and fundamental ratios. For example, the graph on this slide shows that industry sectors with high SDG scores tend to have higher valuations and a lower cost of capital.
Chart axis: Earnings Yield

Global SDG Private Sector Footprint: Regional Trends
This is the output of a global analysis of the SDG footprint of the private sector. The results incorporate data from 19,819 companies across Africa, Asia, Europe, Oceania and the Americas. The chart on this slide illustrates the results for SDG #8 (Decent Work and Economic Growth).

Country SDG Footprint: Non-Financial Country Risk
Identify trends for each SDG at the country level, which serve as a proxy for non-financial country risk. These scores can be linked to asset price returns using liquid securities.

WEF Global Risks & Inter-linkages
Source: World Economic Forum, Global Risks Report, 2017

SDG Country Scores: Time Series Analysis
Panels: Hunger (Germany); Gender (Singapore); Economy (Switzerland)

SDG Inter-linkages Analysis: Bayesian Network Model
A Bayesian Network (BN) approach can be used to model the interlinkages between the SDGs at the country level. This approach can be helpful to analyze the interdependencies of the SDGs.
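To make the Bayesian Network idea concrete, the sketch below builds a toy two-node network in which progress on SDG 1 influences progress on SDG 2, and answers queries by exhaustive enumeration. The structure, node names and all probabilities are invented for illustration and are not taken from the model on this slide.

```python
from itertools import product

# Each node maps to (parents, CPT). The CPT maps a tuple of parent states
# to P(node is True | parents). All probabilities are made-up illustrations.
NETWORK = {
    "sdg1_improves": ((), {(): 0.5}),
    "sdg2_improves": (("sdg1_improves",), {(True,): 0.75, (False,): 0.25}),
}


def joint_probability(assignment):
    """P(full assignment) as the product of each node's conditional probability."""
    prob = 1.0
    for node, (parents, cpt) in NETWORK.items():
        p_true = cpt[tuple(assignment[parent] for parent in parents)]
        prob *= p_true if assignment[node] else 1.0 - p_true
    return prob


def query(target, evidence):
    """P(target is True | evidence), summing the joint over all assignments."""
    nodes = list(NETWORK)
    numerator = 0.0
    denominator = 0.0
    for states in product([True, False], repeat=len(nodes)):
        assignment = dict(zip(nodes, states))
        if any(assignment[name] != value for name, value in evidence.items()):
            continue  # skip assignments that contradict the evidence
        p = joint_probability(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator


p_sdg2 = query("sdg2_improves", {})
p_sdg1_given_sdg2 = query("sdg1_improves", {"sdg2_improves": True})
```

Enumeration is only practical for a handful of nodes; a network covering all 17 SDGs would normally rely on a dedicated inference library, with the structure and probabilities learned from country-level data.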

Richard V. Rothenberg
Executive Director, Global A.I. Corporation; Research Affiliate, Lawrence Berkeley National Laboratory
[email protected] 1(917)399-7840


SDG Company Footprint: Introduction
BACKGROUND

The lack of ESG standards, together with metrics based primarily on data self-reported by companies, has led to significant 'green-washing' and data biases. A new alternative based on the intersection of Artificial Intelligence and the United Nations Sustainable Development Goals (SDGs) can help overcome existing shortcomings of ESG in measuring the sustainability footprint of companies, including through a more standardized taxonomy and the use of large-scale unstructured data that can provide more comprehensive and timely insights.

There are significant practical challenges in quantifying the SDG footprint of companies. One major issue is that, in contrast to ESG approaches, which have only 3 categories, the SDGs comprise 17 goals with 169 targets and more than 200 indicators. Furthermore, the vast majority of data available on US and international companies is unstructured and highly fragmented. In addition, most of the data is not available in English but in local languages, particularly in Europe, Asia and Latin America, which makes it enormously challenging for institutional investors to extract, analyze and quantify the complex, fragmented data associated with the SDGs.

Another issue with existing ESG metrics is that they are updated infrequently. This makes them largely irrelevant for investors who need to react quickly to emerging negative sustainability issues. The capacity of new technologies to quantify and track thousands of SDG factors and events globally in a more timely manner can make SDG indicators more relevant for investors and provide more up-to-date signals that can be used for both ESG and mainstream strategies, including tactical and strategic asset allocation, bottom-up equity selection based on SDG scores, long-short equity and other investment strategies.

For this purpose, it is important to leverage Big Data and Artificial Intelligence technologies to extract, process and analyze large-scale structured and unstructured data on SDG-related factors, which can then enable the integration of SDG factors into the decision-making of global investors.

Typically, companies report voluntarily on their sustainability performance in order to assure their shareholders and investors of their compliance with regulations. However, as more companies grow wary of the adverse impact of negative sustainability performance on investor decisions, they may fail to disclose negative information. With regard to environmental issues, greenwashing, where companies use deceptive marketing to appear more eco-friendly, has been on the rise. Big Data enhances reported data with “alternative data”, using artificial intelligence, machine learning and natural language processing (NLP) to cull through tens of thousands of news items, social media posts and reports in dozens of languages, providing up-to-date information that goes beyond what is present in unaudited, self-reported annual firm reports or firms’ marketing efforts.

Moreover, Big Data can make this information available on a daily basis for investors, governments and all stakeholders, not just annually when a firm publishes an unaudited sustainability report. Thus, a Big Data approach significantly reduces self-reporting bias and ‘greenwashing’ and can show which firms are effectively having a positive SDG footprint. Of course, there are scenarios in which the technology can go wrong or provide imperfect information; relying on publicly available information such as newspaper articles may lead to false or biased scores, for example. Other issues include fake news, articles that commemorate negative events from the past, and major discrepancies between reported and third-party data, among others. For these reasons, it is necessary to perform extensive manual verification of data to evaluate whether the analysis corresponds to reality and to implement preventive measures.

The SDG footprint can show how companies can have either a positive or a negative net impact on the SDGs and can potentially reveal hidden risks. This creates incentives for corporations to quantify and increase their net SDG contributions and SDG ratings in order to become more attractive to investors concerned with sustainable investments, who control trillions in assets under management. It can also provide increased transparency for investor engagement strategies. Finally, for investors and companies alike, such measured SDG footprints can help quantify how investing in the SDGs contributes to long-term investment performance. Building on an institutional investment framework that incorporates and measures the net SDG impact of public and private entities, and prices their long-term effects as externalities, can then incentivize public corporations and investors to mobilize capital towards the SDGs at the scale needed, and ultimately contribute to long-term economic growth.

SDG Company Footprint: Background
ABOUT GLOBAL AI DATA

Global AI technology uses state-of-the-art Big Data and Artificial Intelligence techniques to access massive amounts of structured and unstructured data from more than 100,000 sources across more than 150 countries and 60 languages, replacing the dated, slow and expensive manual processes used for sustainability and materiality analysis.

These new technologies can be used to mitigate ‘SDG washing’ by not restricting data to self-reported documents from companies and instead extracting data from tens of thousands of sources from around the world. Publicly available sources include news, social media, regulatory filings, government reports, blogs, Twitter, industry-specific publications, sustainability reports and NGOs, among others.

For this report, we use Global AI’s firm-specific SDG scores and ratings. The company provides raw scores, a short-term rating and a long-term rating. While we use the short-term ratings, the information is averaged over a year, so the result represents a relatively long-term measurement of SDG footprint. As background to the Global AI scores, the company extracts, filters and cleans massive amounts of both structured and unstructured data, including self-reported company data, news articles, blogs, NGO reports, social media, etc. Specialized algorithms map the raw data to specific companies and associated entities such as subsidiaries, using different combinations of company names, abbreviations, tickers, ISINs and subsidiaries. Proprietary technology then ranks and filters content by relevance using domain-specific taxonomies based on the UN Sustainable Development Goals. Examples of SDG taxonomies include the Global Core Indicators (GCI), which resulted from extensive multi-stakeholder consultations led by the UNCTAD Intergovernmental Working Group of Experts on International Standards of Accounting and Reporting (ISAR); other examples include the United Nations Global Compact 10 Principles, which are divided into major categories such as human rights, labour rights, environment and anti-corruption.

The algorithms subsequently analyze the filtered content at a daily level, recording the number of relevant news items and assigning a sentiment score to each news item, which thus reflects both positive and negative SDG-related issues; they also track the volume and dispersion of sentiment across news items. This information is then aggregated into daily company-specific scores, which are further aggregated into 7-day and 180-day ratings. The raw scores represent the aggregate sentiment of the SDG data. The mapping from scores to ratings aggregates 7 days of information, uses statistics on the precision of the scores and the volume of the news sources, accommodates sparsity in the data and depends most heavily on recent information. Scores and ratings are available for each of the 17 SDGs, and the system also provides an overall score measuring the overall SDG footprint of a company. The ratings can be interpreted roughly as “z-scores”, varying mostly between -1 and 1, with a standard deviation of roughly 1.

The higher the score, the more positive the text is in relation to each SDG, and vice versa. Thus, the sign indicates whether the text is positive or negative, and the magnitude indicates the degree of positivity or negativity. For example, for SDG #5 (gender equality) the system would give a better score to a company that doubles the share of women on its board of directors from 20% to 40% than to a large company that announces the hire of two female analysts. For SDG #13 (climate action), a company would get a more negative score after a chemical spill that pollutes an entire ecosystem than a company that increases its carbon emissions by 5%. The scores are adjusted by sector.

Furthermore, the combination of positive and negative SDG scores can be used to better assess non-financial risks and calculate a 'net' SDG footprint that accounts for the netting effect of positive and negative externalities at both long- and short-term frequencies. This enables the algorithm to better identify both positive and negative trends in companies.

Thus, an AI-driven approach can help uncover hidden material risks, substantially reduce positive biases and uncover negative scores resulting from an adverse SDG footprint. This can improve the investment process and enhance asset owners’ engagement strategies by helping investors identify negative issues that might not have been reported by the company in a transparent manner.
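The aggregation described above (daily scores rolled up into 7-day and 180-day ratings that lean on recent information and read roughly as z-scores) can be sketched as follows. The actual mapping is proprietary and not given in the text, so the exponential decay weighting, the cross-sectional standardization, the function names and all numbers here are assumptions made for illustration.

```python
import statistics


def recency_weighted_mean(daily_scores, window, decay=0.8):
    """Weighted mean of the last `window` daily scores.

    The most recent day gets weight 1 and each earlier day is multiplied
    by `decay`, so recent information counts most (an assumed scheme).
    """
    recent = daily_scores[-window:]
    weights = [decay ** age for age in range(len(recent) - 1, -1, -1)]
    return sum(w * s for w, s in zip(weights, recent)) / sum(weights)


def z_scores(values):
    """Standardize a cross-section of company values to mean 0 and standard deviation 1.

    Assumes the values are not all identical (otherwise the deviation is zero).
    """
    data = list(values.values())
    mean = statistics.fmean(data)
    std = statistics.pstdev(data)
    return {name: (value - mean) / std for name, value in values.items()}


# Hypothetical daily sentiment scores for two companies, oldest first.
daily = {
    "A": [0.2, 0.1, 0.3, 0.4, 0.2, 0.5, 0.6],
    "B": [0.1, 0.0, -0.2, -0.4, -0.3, -0.6, -0.8],
}
short_term = {name: recency_weighted_mean(s, window=7) for name, s in daily.items()}
ratings = z_scores(short_term)
```

The same function with `window=180` would give the long-term counterpart; handling sparse days and the precision statistics mentioned in the text would require additional steps that are not sketched here.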

NLP & Sentiment Analysis
- Financial markets are affected by sentiment; bearish sentiment can make a down market worse and lessen the impact of positive news.
- Firms that take advantage of sentiment information quickly can gain an edge.
- Sentiment can be discovered in news articles, social media, blogs and other sources across multiple languages and regions.
- Computers analyzing sentiment can work at millisecond speeds and process more information than human analysts.
- NLP-driven approaches can be applied to both companies and countries.
- The use of taxonomies and deep learning enables the decomposition of sentiment analysis into multiple risk factors, which can be tracked separately.
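The last point, decomposing sentiment into separately tracked factors by means of a taxonomy, can be illustrated with a deliberately simple keyword approach. The mini-taxonomy, the sentiment lexicon and the function name are hypothetical; a production system would use far richer taxonomies and learned models rather than word lists.

```python
import re

# Hypothetical mini-taxonomy: factor name -> keywords that signal relevance.
TAXONOMY = {
    "SDG 5 (gender equality)": {"women", "gender", "board"},
    "SDG 13 (climate action)": {"carbon", "emissions", "spill", "climate"},
}

# Hypothetical sentiment lexicon: word -> polarity.
LEXICON = {"spill": -1, "pollutes": -1, "fined": -1, "award": 1, "improves": 1}


def score_by_factor(text):
    """Return {factor: sentiment} for every factor whose keywords appear in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    sentiment = sum(LEXICON.get(token, 0) for token in tokens)
    return {
        factor: sentiment
        for factor, keywords in TAXONOMY.items()
        if keywords.intersection(tokens)
    }


climate_item = score_by_factor("Chemical spill pollutes river")
gender_item = score_by_factor("Company wins award for women on its board")
```

Because each news item is scored only against the factors it mentions, the resulting time series can be tracked per factor instead of as a single blended sentiment number.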

SDG Footprint: Relevance of Multi-language Sources
- The majority of the news available worldwide is not in English, particularly in Latin America, Europe, Africa and Asia.
- In many cases, there is a time lag between the time the news is reported in the local language and when it is published in mainstream English-language media.
- The example on this slide shows a negative event in El Salvador which was not available in English on the day it was released.

NLP for SDG / Event Monitoring
Event: A US federal judge dismisses some but not all criminal charges against FedEx Corp. in a case alleging it knowingly shipped illegal prescription drugs.
Panels: Sample News During Event Period; Sentiment Scores: Negative Outliers; Word Cloud: Negative Keywords

Country Risk Monitoring: WEF Taxonomy
- WEF-based risk scores can be used as proxies to identify emerging risks and trends at the country level, in order to better assess country risks.
- The WEF taxonomy is used to generate country-specific NLP-based risk signals across 5 major categories (Economic, Environmental, Geopolitical, Technological, and Societal risks) and 30 sub-categories, based on geotagged data from 100,000 sources in more than 60 languages.
- Risk scores are produced for each category at the country level.
