Building The Case For The B2B Commerce Cloud – Business Intelligence Info

Dan McCaffrey has an ambitious goal: solving the world’s looming food shortage.

As vice president of data and analytics at The Climate Corporation (Climate), which is a subsidiary of Monsanto, McCaffrey leads a team of data scientists and engineers who are building an information platform that collects massive amounts of agricultural data and applies machine-learning techniques to discover new patterns. These analyses are then used to help farmers optimize their planting.

“By 2050, the world is going to have too many people at the current rate of growth. And with shrinking amounts of farmland, we must find more efficient ways to feed them. So science is needed to help solve these things,” McCaffrey explains. “That’s what excites me.”

“The deeper we can go into providing recommendations on farming practices, the more value we can offer the farmer,” McCaffrey adds.

But to deliver that insight, Climate needs data, and lots of it. That means using remote sensing and other techniques to map every field in the United States and then combining that information with climate data, soil observations, and weather data. Climate’s analysts can then produce a massive data store that they can query for insights.

Meanwhile, precision tractors stream data into Climate’s digital agriculture platform, which farmers can then access from iPads through easy data flow and visualizations. They gain insights that help them optimize their seeding rates, soil health, and fertility applications. The overall goal is to increase crop yields, which in turn boosts a farmer’s margins.

Climate is at the forefront of a push toward deriving valuable business insight from Big Data that isn’t just big, but vast. Companies of all types, from agriculture through transportation and financial services to retail, are tapping into massive repositories of data known as data lakes. They hope to discover correlations that they can exploit to expand product offerings, enhance efficiency, drive profitability, and discover new business models they never knew existed.

The internet democratized access to data and information for billions of people around the world. Ironically, however, access to data within businesses has traditionally been limited to a chosen few, until now.

Today’s advances in memory, storage, and data tools make it possible for companies both large and small to cost-effectively gather and retain a huge amount of data, both structured (such as data in fields in a spreadsheet or database) and unstructured (such as e-mails or social media posts). They can then allow anyone in the business to access this massive data lake and rapidly gather insights. It’s not that companies couldn’t do this before; they just couldn’t do it cost-effectively and without a lengthy development effort by the IT department. With today’s massive data stores, line-of-business executives can generate queries themselves and quickly churn out results, and they are increasingly doing so in real time.

Data lakes have democratized both the access to data and its role in business strategy. Indeed, data lakes move data from being a tactical tool for implementing a business strategy to being a foundation for developing that strategy through a scientific-style model of experimental thinking, queries, and correlations.

In the past, companies’ curiosity was limited by the expense of storing data for the long term. Now companies can keep data for as long as it’s needed. And that means companies can continue to ask important questions as they arise, enabling them to future-proof their strategies.

Prescriptive Farming

Climate’s McCaffrey has many questions to answer on behalf of farmers. Climate provides several types of analytics to farmers including descriptive services, which are metrics about the farm and its operations, and predictive services related to weather and soil fertility. But eventually the company hopes to provide prescriptive services, helping farmers address all the many decisions they make each year to achieve the best outcome at the end of the season. Data lakes will provide the answers that enable Climate to follow through on its strategy.

Behind the scenes at Climate is a deep-science data lake that provides insights, such as predicting the fertility of a plot of land by combining many data sets to create accurate models. These models allow Climate to give farmers customized recommendations based on how their farm is performing.

“Machine learning really starts to work when you have the breadth of data sets from tillage to soil to weather, planting, harvest, and pesticide spray,” McCaffrey says. “The more data sets we can bring in, the better machine learning works.”

The deep-science infrastructure already has terabytes of data but is poised for significant growth as it handles a flood of measurements from field-based sensors. “That’s really scaling up now, and that’s what’s also giving us an advantage in our ability to really personalize our advice to farmers at a deeper level because of the information we’re getting from sensor data,” McCaffrey says. “As we roll that out, our scale is going to increase by several magnitudes.”

Also on the horizon is more real-time data analytics. Currently, Climate receives real-time data from its application that streams data from the tractor’s cab, but most of its analytics applications are run nightly or even seasonally.

In August 2016, Climate expanded its platform to third-party developers so other innovators can also contribute data, such as drone-captured data or imagery, to the deep-science lake.

“That helps us in a lot of ways, in that we can get more data to help the grower,” McCaffrey says. “It’s the machine learning that allows us to find the insights in all of the data. Machine learning allows us to take mathematical shortcuts as long as you’ve got enough data and enough breadth of data.”

Predictive Maintenance

Growth is essential for U.S. railroads, which reinvest a significant portion of their revenues in maintenance and improvements to their track systems, locomotives, rail cars, terminals, and technology. With an eye on growing its business while also keeping its costs down, CSX, a transportation company based in Jacksonville, Florida, is adopting a strategy to make its freight trains more reliable.

In the past, CSX maintained its fleet of locomotives through regularly scheduled maintenance activities, which prevent failures in most locomotives as they transport freight from shipper to receiver. To achieve even higher reliability, CSX is tapping into a data lake to power predictive analytics applications that will improve maintenance activities and prevent more failures from occurring.

Beyond improving customer satisfaction and raising revenue, CSX’s new strategy also has major cost implications. Trains are expensive assets, and it’s critical for railroads to drive up utilization, limit unplanned downtime, and prevent catastrophic failures to keep the costs of those assets down.

That’s why CSX is putting all the data related to the performance and maintenance of its locomotives into a massive data store.

“We are then applying predictive analytics (or, more specifically, machine-learning algorithms) on top of that information that we are collecting to look for failure signatures that can be used to predict failures and prescribe maintenance activities,” says Michael Hendrix, technical director for analytics at CSX. “We’re really looking to better manage our fleet and the maintenance activities that go into that so we can run a more efficient network and utilize our assets more effectively.”

“In the past we would have to buy a special storage device to store large quantities of data, and we’d have to determine cost benefits to see if it was worth it,” says Donna Crutchfield, assistant vice president of information architecture and strategy at CSX. “So we were either letting the data die naturally, or we were only storing the data that was determined to be the most important at the time. But today, with the new technologies like data lakes, we’re able to store and utilize more of this data.”

CSX can now combine many different data types, such as sensor data from across the rail network and other systems that measure movement of its cars, and it can look for correlations across information that wasn’t previously analyzed together.

One of the larger data sets that CSX is capturing comprises the findings of its “wheel health detectors” across the network. These devices capture different signals about the bearings in the wheels, as well as the health of the wheels in terms of impact, sound, and heat.

“That volume of data is pretty significant, and what we would typically do is just look for signals that told us whether the wheel was bad and if we needed to set the car aside for repair. We would only keep the raw data for 10 days because of the volume and then purge everything but the alerts,” Hendrix says.

With its data lake, CSX can keep the wheel data for as long as it likes.
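
The retention rule Hendrix describes, and the change the data lake brings, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WheelReading` record, its field names, the sample dates, and the `apply_retention` function are hypothetical and do not reflect CSX’s actual systems. Only the 10-day window and the rule of keeping the alerts come from the quote above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class WheelReading:
    """Hypothetical record for one wheel-detector measurement."""
    car_id: str
    recorded_on: date
    is_alert: bool


def apply_retention(readings: List[WheelReading], today: date,
                    raw_days: Optional[int] = 10) -> List[WheelReading]:
    """Return the readings that survive the retention policy.

    With raw_days set, alerts are always kept and raw readings are kept
    only if they are at most raw_days old (the older approach).
    With raw_days=None nothing is purged (the data lake approach).
    """
    if raw_days is None:
        return list(readings)
    cutoff = today - timedelta(days=raw_days)
    return [r for r in readings if r.is_alert or r.recorded_on >= cutoff]


# Made-up sample data for illustration only.
readings = [
    WheelReading("car-1", date(2017, 1, 1), is_alert=False),
    WheelReading("car-2", date(2017, 1, 1), is_alert=True),
    WheelReading("car-3", date(2017, 1, 20), is_alert=False),
]

kept_old_policy = apply_retention(readings, today=date(2017, 1, 25))
kept_data_lake = apply_retention(readings, today=date(2017, 1, 25), raw_days=None)
```

Under the 10-day rule the old, non-alert reading for `car-1` is discarded, which is exactly the history a predictive model would later want; with `raw_days=None` all three readings remain available.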

“Now we’re starting to capture that data on a daily basis so we can start applying more machine-learning algorithms and predictive models across a larger history,” Hendrix says. “By having the full data set, we can better look for trends and patterns that will tell us if something is going to fail.”

Another key ingredient in CSX’s data set is locomotive oil. By analyzing oil samples, CSX is developing better predictions of locomotive failure. “We’ve been able to determine when a locomotive would fail and predict it far enough in advance so we could send it down for maintenance and prevent it from failing while in use,” Crutchfield says.

“Between the locomotives, the tracks, and the freight cars, we will be looking at various ways to predict those failures and prevent them so we can improve our asset allocation. Then we won’t need as many assets,” she explains.

“It’s like an airport. If a plane has a failure and it’s due to connect at another airport, all the passengers have to be reassigned. A failure affects the system like dominoes. It’s a similar case with a railroad. Any failure along the road affects our operations. Fewer failures mean more asset utilization. The more optimized the network is, the better we can service the customer.”

Detecting Fraud Through Correlations

Traditionally, business strategy has been a very conscious practice, presumed to emanate mainly from the minds of experienced executives, daring entrepreneurs, or high-priced consultants. But data lakes take strategy out of that rarefied realm and put it in the environment where just about everything in business seems to be going these days: math, specifically the correlations that emerge from applying a mathematical algorithm to huge masses of data.

The Financial Industry Regulatory Authority (FINRA), a nonprofit group that regulates broker behavior in the United States, used to rely on the experience of its employees to come up with strategies for combating fraud and insider trading. It still does that, but now FINRA has added a data lake to find patterns that a human might never see.

Overall, FINRA processes over five petabytes of transaction data from multiple sources every day. By switching from traditional database and storage technology to a data lake, FINRA was able to set up a self-service process that allows analysts to query data themselves without involving the IT department; search times dropped from several hours to 90 seconds.

While traditional databases were good at defining relationships with data, such as tracking all the transactions from a particular customer, the new data lake configurations help users identify relationships that they didn’t know existed.

Leveraging its data lake, FINRA creates an environment for curiosity, empowering its data experts to search for suspicious patterns of fraud, marketing manipulation, and compliance. As a result, FINRA was able to hand out 373 fines totaling US$134.4 million in 2016, a new record for the agency, according to Law360.

Data Lakes Don’t End Complexity for IT

Though data lakes make access to data and analysis easier for the business, they don’t necessarily make the CIO’s life a bed of roses. Implementations can be complex, and companies rarely want to walk away from investments they’ve already made in data analysis technologies, such as data warehouses.

“There have been so many millions of dollars going to data warehousing over the last two decades. The idea that you’re just going to move it all into a data lake isn’t going to happen,” says Mike Ferguson, managing director of Intelligent Business Strategies, a UK analyst firm. “It’s just not compelling enough of a business case.” But Ferguson does see data lake efficiencies freeing up the capacity of data warehouses to enable more query, reporting, and analysis.

Data lakes also don’t free companies from the need to clean up and manage data as part of the process required to gain these useful insights. “The data comes in very raw, and it needs to be treated,” says James Curtis, senior analyst for data platforms and analytics at 451 Research. “It has to be prepped and cleaned and ready.”

Companies must have strong data governance processes, as well. Customers are increasingly concerned about privacy, and rules for data usage and compliance have become stricter in some areas of the globe, such as the European Union. Companies must create data usage policies, then, that clearly define who can access, distribute, change, delete, or otherwise manipulate all that data. Companies must also make sure that the data they collect comes from a legitimate source.
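
A data usage policy of the kind described here can be thought of as a table that maps each role to the actions it may perform. The Python sketch below is a minimal illustration of that idea, not a description of any vendor’s governance tooling. The role names, the `POLICY` table, and the `is_allowed` function are all hypothetical; only the four actions (access, distribute, change, delete) are taken from the text.

```python
from enum import Enum


class Action(Enum):
    """The four actions named in the article."""
    ACCESS = "access"
    DISTRIBUTE = "distribute"
    CHANGE = "change"
    DELETE = "delete"


# Hypothetical roles and permissions, invented for illustration.
POLICY = {
    "analyst": {Action.ACCESS},
    "data_steward": {Action.ACCESS, Action.DISTRIBUTE, Action.CHANGE},
    "chief_data_officer": set(Action),  # every action
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True only if the policy explicitly grants the action.

    Unknown roles get an empty permission set, so the default is deny.
    """
    return action in POLICY.get(role, set())
```

The one design choice worth noting is the deny-by-default lookup: a role that is missing from the table can do nothing, so widening access always requires an explicit policy change.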

Many companies are responding by hiring chief data officers (CDOs) to ensure that as more employees gain access to data, they use it effectively and responsibly. Indeed, research company Gartner predicts that 90% of large companies will have a CDO by 2019.

Data lakes can be configured in a variety of ways: centralized or distributed, with storage on premise or in the cloud or both. Some companies have more than one data lake implementation.

“A lot of my clients try their best to go centralized for obvious reasons. It’s much simpler to manage and to gather your data in one place,” says Ferguson. “But they’re often plagued somewhere down the line with much more added complexity and realize that in many cases the data lake has to be distributed to manage data across multiple data stores.”

Meanwhile, the massive capacities of data lakes mean that data that once flowed through a manageable spigot is now blasting at companies through a fire hose.

“We’re now dealing with data coming out at extreme velocity or in very large volumes,” Ferguson says. “The idea that people can manually keep pace with the number of data sources that are coming into the enterprise, it’s just not realistic anymore. We have to find ways to take complexity away, and that tends to mean that we should automate. The expectation is that the information management software, like an information catalog for example, can help a company accelerate the onboarding of data and automatically classify it, profile it, organize it, and make it easy to find.”
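
What it might mean to automatically classify and profile incoming data can be shown at the smallest possible scale. The Python sketch below is a toy illustration, not how any particular information catalog product works. The `profile_column` function and the keys of the dictionary it returns are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Union


def _is_number(value: str) -> bool:
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def profile_column(values: List[Optional[str]]) -> Dict[str, Union[str, int]]:
    """Classify a column of raw string values and compute basic statistics.

    None and empty strings count as missing. A column is "numeric" when
    every non-missing value parses as a number, "text" when at least one
    does not, and "empty" when nothing is present.
    """
    present = [v for v in values if v is not None and v != ""]
    if not present:
        kind = "empty"
    elif all(_is_number(v) for v in present):
        kind = "numeric"
    else:
        kind = "text"
    return {
        "kind": kind,
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }


# Made-up sample column for illustration only.
summary = profile_column(["1.5", "2", None, "2"])
```

A real catalog would go much further (detecting sensitive fields, tracking lineage, indexing for search), but even a summary this simple is the kind of metadata that lets people find and trust a data set without inspecting it by hand.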