Saturday 14 October 2017

Moving Average Type Token Ratio


Traditional wood siding for your home will always be the preferred choice in any siding installation, but you need the right siding contractor to help you choose the best wood siding with the least possible maintenance. There are many different types of wood siding made from sub-par wood materials. Don't let your next… Read more

Professional roofing and roof repair at affordable prices. If you have roof leak problems, don't hesitate to contact a roofing contractor. A long-standing roof leak seems harmless at first but can quickly turn into serious problems. A persistent roof leak can cause mold growth and damage to the roof and furniture. Call us today if you're in need of… Read more

When you are trying to reduce your home's heating and cooling bill, there is no better way to save money than having new vinyl or wood replacement windows installed. Above all, with the new federal tax rebate of up to $1,500, there's no reason not to.

Having a beautiful deck will always give a positive return on your property's value, and adding a deck is an inexpensive way to expand your living space. The average cost of building a deck is about $7,000, and it returns about $15,000 when you sell your home - not too bad, right? So please have a… Read more

Choosing the right siding contractor is crucial for any siding installation, whether you are installing vinyl siding over existing siding or completely removing your existing siding for new siding. Having a professional siding contractor who can give you the best solution for the smoothest siding installation will save you a lot of headaches, time and… Read more

What our customers say: Very satisfied. I just wanted to express how pleased we were with our new roof and seamless gutters. Mike and his workers are very pleasant and well mannered to be around. I could not believe how quickly they finished roofing our home and garage. They left the site cleaner than when they started and protected our bushes and plants as promised. We were so pleased with the roof installation that we will have them back for window replacement. Thank you, Mike. See their home: roofing Melrose MA. Robert and Patricia Quinn, Melrose, MA. MBM Construction is rated 5/5 based on 3 reviews.

Finding the right contractor should not be painful. The right home improvement or remodeling project can add real value to any type of home if it is done correctly and efficiently by a licensed and insured professional. Using high-quality materials that are energy efficient, appealing and, most importantly, reliable, such as low-maintenance replacement windows, shingle roofs and custom decks, will add real value. In most cases you can expect an immediate return on the investment after making these home improvements. Choose a contractor who will get the job done and walk you through every step of the project from start to finish without any hidden extras. Our home improvement services have given us an edge over other home remodeling companies: being one of the area's top providers of siding, roofing, replacement windows and room additions gives us great purchasing power with our suppliers, and in return we can pass the savings along to you. So why choose us as your home improvement contractor? We listen to your needs. We do not use high-pressure sales or try to sell you something you do not need or want. Communication is the key to any remodeling project, and we want the project to have the best possible outcome.
You get a detailed project estimate and completion time, without the headaches. You will also get one of the best warranties in the home improvement industry, should you ever need to use it. What kind of home improvement services are you looking for? Choose a roofing contractor who will work in your best interest, not according to how much profit he can make by cutting corners. As roofers we believe in using the best shingles and underlayment to give our customers peace of mind. For more info about roofing services, visit: Commercial Flat Roofing or Residential Roofing. Not all vinyl and wood siding is the same; choose a professional siding contractor who will help you understand which kinds of vinyl siding to avoid and what will give you the best bang for your buck in the long run. Whether it is a standard pressure-treated deck, a mahogany deck or a composite deck - we have got you covered. Read more about Decks and Porches / Deck builders.

Crowdsourcing is a very popular means of acquiring the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to reduce the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We formulate this process mathematically and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and to refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when crowdsourcing is used to train machine learning algorithms.

There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work we study a flexible model for pairwise comparisons, in which the probabilities of outcomes are only required to satisfy a natural form of stochastic transitivity. This class includes parametric models, including the BTL and Thurstone models, as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models give a poor fit. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.
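As a concrete illustration of the singular value thresholding idea mentioned in the abstract above, here is a minimal numpy sketch that forms an empirical win-rate matrix from simulated comparisons and shrinks its small singular values. The BTL-style ground truth, the number of comparisons and the threshold choice are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 200               # items, comparisons per pair

# Ground-truth stochastically transitive probability matrix from latent scores (BTL special case)
scores = np.sort(rng.normal(size=n))[::-1]
P_true = 1.0 / (1.0 + np.exp(-(scores[:, None] - scores[None, :])))

# Empirical win rates from k comparisons per pair
wins = rng.binomial(k, P_true)
P_hat = wins / k
P_hat = (P_hat + (1 - P_hat.T)) / 2            # enforce P[i,j] + P[j,i] = 1

# Singular value thresholding: keep singular values above a noise-level threshold
U, s, Vt = np.linalg.svd(P_hat - 0.5)          # center so the constant 1/2 does not dominate
tau = 1.5 * np.sqrt(n / k)                     # heuristic threshold (assumption)
s_thr = np.where(s > tau, s, 0.0)
P_svt = np.clip(0.5 + U @ np.diag(s_thr) @ Vt, 0.0, 1.0)

print("raw MSE:", np.mean((P_hat - P_true) ** 2))
print("SVT MSE:", np.mean((P_svt - P_true) ** 2))
```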
We show how any binary pairwise model may be uprooted to a fully symmetric model, wherein the original singleton potentials are transformed into potentials on edges to an added variable, and then rerooted to a new model on the original number of variables. The new model is essentially equivalent to the original model, with the same partition function, and allows recovery of the original marginals or a MAP configuration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, can be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope.

We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is a Restricted Boltzmann Machine (RBM) with a single hidden node. Consequently, under this model, the posterior probabilities of the true labels can instead be estimated via a trained RBM. Next, to handle the more general case where the classifiers strongly violate the conditional independence assumption, we propose to use an RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets show that our proposed DNN approach outperforms other state-of-the-art methods, especially when the data violates the conditional independence assumption.
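A toy numpy illustration of the single-hidden-node observation above: for conditionally independent binary annotators with known accuracies, the posterior over the true label is a logistic function of a log-odds-weighted vote, which is exactly the activation of one hidden RBM unit whose visible units are the votes. The worker accuracies and simulated data are assumptions for the demo, and in practice the accuracies would themselves have to be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items, n_workers = 500, 7

true_labels = rng.integers(0, 2, size=n_items)            # hidden binary variable
accuracy = rng.uniform(0.55, 0.9, size=n_workers)          # symmetric worker accuracies (assumption)

# Conditionally independent noisy votes (Dawid-Skene with symmetric confusion matrices)
flip = rng.random((n_items, n_workers)) > accuracy
votes = np.where(flip, 1 - true_labels[:, None], true_labels[:, None])

# Posterior log-odds of label 1: a weighted vote with log-likelihood-ratio weights.
# This has the same functional form as a single hidden unit in an RBM whose visible
# units are the votes and whose weights encode the worker accuracies.
w = np.log(accuracy / (1 - accuracy))
logit = (2 * votes - 1) @ w                                 # +w for a vote of 1, -w for a vote of 0
posterior = 1 / (1 + np.exp(-logit))
pred = (posterior > 0.5).astype(int)

print("majority vote accuracy:", np.mean((votes.mean(1) > 0.5).astype(int) == true_labels))
print("weighted vote accuracy:", np.mean(pred == true_labels))
```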
Revisiting Semi-Supervised Learning with Graph Embeddings Zhilin Yang Carnegie Mellon University. William Cohen CMU. Ruslan Salakhutdinov U. of Toronto Paper Abstract: We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of the method, the class labels are determined by both the learned embeddings and the input feature vectors, while in the inductive variant the embeddings are defined as a parametric function of the feature vectors, so that predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction and entity classification, we show improved performance over many of the existing models.

Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and that encodes the correct task is challenging in practice. We investigate how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.

When learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and to shrink model size without sacrificing modeling power. Various studies have been done to "diversify" an LVM, aiming to learn a diverse set of latent components. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper we investigate how to "diversify" LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors, which assign larger density to components with larger mutual angles, based on Bayesian networks and the von Mises-Fisher distribution, and to use these priors to affect the posterior via Bayes' rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. These two methods are applied to a Bayesian mixture of experts model to encourage the "experts" to be diverse, and experimental results demonstrate the effectiveness and efficiency of our methods.

High-dimensional nonparametric regression is an inherently difficult problem with known lower bounds that depend exponentially on dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of first order, which model the regression function as a sum of independent functions on each dimension. Though useful for controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models, which often have large variance, and first-order additive models, which have large bias, there has been little work exploiting the trade-off in the middle via additive models of intermediate order. In this work we propose SALSA, which bridges this gap by allowing interactions between variables while controlling model capacity by limiting the order of interactions. SALSA minimizes the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives.
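Since SALSA reduces algorithmically to kernel ridge regression with an additive kernel, a small numpy sketch of the first-order version may help make that concrete. The bandwidth, regularization strength and synthetic data below are assumptions, and the higher-order interaction kernels that SALSA adds are omitted.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)   # additive ground truth

def additive_rbf_kernel(A, B, bw=0.3):
    """Sum of one-dimensional RBF kernels, one per coordinate (a first-order additive kernel)."""
    K = np.zeros((A.shape[0], B.shape[0]))
    for j in range(A.shape[1]):
        diff = A[:, j:j+1] - B[:, j:j+1].T
        K += np.exp(-diff ** 2 / (2 * bw ** 2))
    return K

lam = 1e-2
K = additive_rbf_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)        # closed-form kernel ridge solution

X_test = rng.uniform(-1, 1, size=(100, d))
y_test = np.sin(3 * X_test[:, 0]) + X_test[:, 1] ** 2
pred = additive_rbf_kernel(X_test, X) @ alpha
print("test MSE:", np.mean((pred - y_test) ** 2))
```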
We propose an extension of Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination of Gibbs and Metropolis-Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.

Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking: individual preferences are broken into pairwise comparisons and then efficient algorithms tailored for independent pairwise comparisons are applied. Due to the ignored dependencies, however, naive rank-breaking approaches can result in inconsistent estimates. The key idea for producing unbiased and accurate estimates is to treat the paired outcomes unequally, depending on the topology of the collected data. In this paper we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in some canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of a corresponding comparison graph.

Dropout distillation Samuel Rota Bulò FBK. Lorenzo Porzi FBK. Peter Kontschieder Microsoft Research Cambridge Paper Abstract: Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e., zeroing) units from the network during training. This randomization process implicitly allows training an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule, called 'standard dropout', is efficient but may degrade prediction accuracy. In this work we introduce a novel approach, coined 'dropout distillation', that allows us to train a predictor that better approximates the intractable, but preferable, averaging process, while keeping its computational efficiency under control. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.
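To make the averaging gap concrete, here is a small numpy comparison of the 'standard dropout' test-time rescaling rule against the Monte Carlo average over dropout masks that it approximates, for a single random ReLU layer. The network weights, input and keep probability are arbitrary assumptions, so this only illustrates the mismatch that dropout distillation targets, not the distillation procedure itself.

```python
import numpy as np

rng = np.random.default_rng(3)
d_in, d_hid, p_keep = 20, 50, 0.5

W1 = rng.normal(size=(d_in, d_hid))
w2 = rng.normal(size=d_hid)
x = rng.normal(size=d_in)

relu = lambda z: np.maximum(z, 0.0)

# Monte Carlo average over dropout masks on the input layer
# (the "ensemble of exponentially many networks" the abstract refers to)
mc = np.mean([
    relu((x * (rng.random(d_in) < p_keep)) @ W1) @ w2
    for _ in range(20000)
])

# Standard dropout at test time: keep all units and rescale the dropped layer by p_keep
std = relu((x * p_keep) @ W1) @ w2

print("Monte Carlo average:", mc)
print("standard dropout   :", std)
```

Because the ReLU is nonlinear, the two numbers generally differ; that difference is exactly what a distilled predictor tries to close.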
Metadata-Aware Anonymous Messaging Giulia Fanti UIUC. Peter Kairouz UIUC. Sewoong Oh UIUC. Kannan Ramchandran UC Berkeley. Pramod Viswanath UIUC Paper Abstract: Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g., a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have shown that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph so as to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al., 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and substantially outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks.

Teaching Dimension of Linear Learners Ji Liu University of Rochester. Xiaojin Zhu University of Wisconsin. Hrag Ohannessian University of Wisconsin-Madison Paper Abstract: Teaching dimension is a learning-theoretic quantity that specifies the minimum training set size needed to teach a target model to a learner. Previous studies on the teaching dimension focused on version-space learners, which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimensions for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.

Truthful Univariate Estimators Ioannis Caragiannis University of Patras. Ariel Procaccia Carnegie Mellon University. Nisarg Shah Carnegie Mellon University Paper Abstract: We revisit the classic problem of estimating the population mean of an unknown one-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of worst-case optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support.
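A tiny simulation of the incentive issue described in the abstract above: a single strategic agent can drag the sample mean far from the truth by exaggerating, while the sample median barely moves. The distribution of honest values and the misreported value are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
honest = rng.normal(loc=10.0, scale=2.0, size=99)      # honest agents' reported values

# One strategic agent whose true value is 15 exaggerates to pull the estimate upward
reports = np.append(honest, 1000.0)

print("mean with honest report  :", np.mean(np.append(honest, 15.0)))
print("mean with exaggeration   :", np.mean(reports))          # moves a lot
print("median with honest report:", np.median(np.append(honest, 15.0)))
print("median with exaggeration :", np.median(reports))        # essentially unchanged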
Why Regularized Auto-Encoders Learn Sparse Representation Devansh Arpit SUNY Buffalo. Yingbo Zhou SUNY Buffalo. Hung Ngo SUNY Buffalo. Venu Govindaraju SUNY Buffalo Paper Abstract: Sparse distributed representation is the key to learning useful features in deep learning algorithms, because it is not only an efficient mode of data representation but also, even more importantly, captures the generation process of most real-world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others do not, there has been little formal analysis of what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both the regularization and the activation functions that encourage sparsity. We show that several popular models (de-noising and contractive auto-encoders, for example) and activations (rectified linear and sigmoid, for example) satisfy these conditions, so our conditions help explain sparsity in their learned representations. Our theoretical and empirical analyses thus together shed light on the properties of regularization and activation that are conducive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework.

k-variates++: more pluses in the k-means++ Richard Nock Nicta & ANU. Raphael Canyasse Ecole Polytechnique and The Technion. Roksana Boreli Data61. Frank Nielsen Ecole Polytechnique and Sony CS Labs Inc. Paper Abstract: k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalization of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, and a generalization of the well-known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of an approximation bound with respect to the global optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential, actually approaching the statistical lower bound. We show that k-variates++ reduces to efficient (biased seeding) clustering algorithms tailored to specific frameworks, including distributed, streaming and on-line clustering, with approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here or for the differential privacy setting, there is little or no prior work on the direct application of k-means++ and its approximation bounds; state-of-the-art contenders appear to be significantly more complex and to display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is no closed-form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performance against the state of the art.
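For reference, the classical k-means++ seeding step that k-variates++ generalizes is just D²-weighted sampling of new centers. A compact numpy version is sketched below on synthetic blobs; the data and the number of clusters are assumptions, and this is the textbook rule, not the paper's extension to general densities.

```python
import numpy as np

def kmeans_pp_seed(X, k, rng):
    """Classical k-means++ seeding: each new center is sampled with probability
    proportional to its squared distance to the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

rng = np.random.default_rng(5)
# Three well-separated Gaussian blobs
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])])
print(np.round(kmeans_pp_seed(X, k=3, rng=rng), 2))
```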
Multi-Player Bandits - a Musical Chairs Approach Jonathan Rosenski Weizmann Institute of Science. Ohad Shamir Weizmann Institute of Science. Liran Szlak Weizmann Institute of Science Paper Abstract: We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting is motivated by problems arising in cognitive radio networks, and is particularly challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) that attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, neither algorithm requires prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with such formal guarantees.

The Information Sieve Greg Ver Steeg Information Sciences Institute. Aram Galstyan Information Sciences Institute Paper Abstract: We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and residual information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning, including independent component analysis, lossy and lossless compression, and predicting missing values in data.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin Dario Amodei. Rishita Anubhai. Eric Battenberg. Carl Case. Jared Casper. Bryan Catanzaro. JingDong Chen. Mike Chrzanowski Baidu USA, Inc. Adam Coates. Greg Diamos Baidu USA, Inc. Erich Elsen Baidu USA, Inc. Jesse Engel. Linxi Fan. Christopher Fougner. Awni Hannun Baidu USA, Inc. Billy Jun. Tony Han. Patrick LeGresley. Xiangang Li Baidu. Libby Lin. Sharan Narang. Andrew Ng. Sherjil Ozair. Ryan Prenger. Sheng Qian Baidu. Jonathan Raiman. Sanjeev Satheesh Baidu SVAIL. David Seetapun. Shubho Sengupta. Chong Wang. Yi Wang. Zhiqian Wang. Bo Xiao. Yan Xie Baidu. Dani Yogatama. Jun Zhan. Zhenyao Zhu Paper Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech, including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

An important question in feature selection is whether a selection strategy recovers the "true" set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario where the model is misspecified, so that the learned model is linear while the underlying true target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso can still recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds.
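A small scikit-learn experiment in the spirit of the Lasso abstract above: the response is a nonlinear function of a few coordinates, a linear Lasso is fit anyway, and we inspect whether the relevant coordinates still receive nonzero weights. The data-generating process and regularization strength are assumptions, not the conditions analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(6)
n, d = 500, 20
X = rng.normal(size=(n, d))

# Nonlinear target that depends only on features 0 and 1 (so the linear model is misspecified)
y = np.tanh(2 * X[:, 0]) + X[:, 1] ** 3 + 0.1 * rng.normal(size=n)

model = Lasso(alpha=0.05).fit(X, y)
support = np.flatnonzero(model.coef_)
print("selected features:", support)        # ideally a small set containing {0, 1}
```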
We propose Minimum Regret Search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similarly in most cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem and for a simulated multi-task robotic control problem.

CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy Ran Gilad-Bachrach Microsoft Research. Nathan Dowlin Princeton. Kim Laine Microsoft Research. Kristin Lauter Microsoft Research. Michael Naehrig Microsoft Research. John Wernsing Microsoft Research Paper Abstract: Applying machine learning to a problem which involves medical, financial, or other types of sensitive data requires not only accurate predictions, but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we present a method to convert learned neural networks into CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential, since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and of returning them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key, who can decrypt them. Therefore, the cloud service gains no information about the raw data or about the prediction it made. We demonstrate CryptoNets on the MNIST optical character recognition task. CryptoNets achieve 99% accuracy and can make around 59,000 predictions per hour on a single PC. They therefore allow high-throughput, accurate, and private predictions.

Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, yet achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and in the quality of the resulting solution.
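A numpy sketch of the basic Nystrom extrapolation step that the abstract above starts from: solve a small eigenproblem on landmark points and extend the eigenvectors to the full set with the out-of-sample formula. The affinity kernel, data, and number of landmarks are assumptions, and the paper's constrained variant is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 800, 60                      # data points, landmark points
X = rng.normal(size=(n, 2))

def rbf(A, B, bw=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

idx = rng.choice(n, size=m, replace=False)    # landmark indices
C = rbf(X, X[idx])                             # n x m cross-affinities
W = C[idx]                                     # m x m landmark affinity block

# Small eigenproblem on the landmarks, then out-of-sample (Nystrom) extrapolation
evals, U = np.linalg.eigh(W)
evals, U = evals[::-1], U[:, ::-1]             # sort descending
k = 5
U_ny = C @ U[:, :k] / evals[:k]                # extrapolated leading eigenvectors

# Compare the implied low-rank affinity with the exact one
K = rbf(X, X)
K_ny = (U_ny * evals[:k]) @ U_ny.T
print("relative Frobenius error:", np.linalg.norm(K - K_ny) / np.linalg.norm(K))
```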
As a widely used nonlinear activation, the Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias. However, we argue that the classification of noise and signal depends not only on the magnitude of the responses, but also on the context of how the features will be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitudes in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters. In this paper, we propose a multi-bias nonlinear activation (MBA) layer to explore the information hidden in the magnitudes of the responses. It is placed after the convolution layer and decouples the responses to a convolution kernel into multiple maps by multi-thresholding the magnitudes, thus generating more patterns in the feature space at a low computational cost. This provides great flexibility for selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. Such a simple yet effective scheme achieves state-of-the-art performance on several benchmarks.

We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of the individual task losses, which we refer to as asymmetric multi-task learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph, which enforces each task parameter to be reconstructed as a sparse combination of other tasks selected based on the task-wise loss. We present two different algorithms for solving this joint learning of the task predictors and the regularization graph. The first algorithm solves the original learning objective using alternating optimization, and the second algorithm solves an approximation of it using a curriculum learning strategy that learns one task at a time. We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over single-task learning and symmetric multi-task learning baselines.

This paper presents a novel approach to the estimation of the generalization error of decision trees. We set out to study decision tree errors in the context of consistency analysis theory, which showed that the Bayes error can be achieved only when the number of data samples falling into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small-sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well-known K-fold cross-validation methods in terms of robustness and accuracy. Moreover, it is orders of magnitude more efficient than cross-validation methods.

We study the convergence properties of the VR-PCA algorithm recently introduced for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what the convexity and non-convexity properties of the underlying optimization problem are.
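As a point of reference for the VR-PCA abstract above, here is a minimal numpy power-iteration baseline for the leading eigenvector of a covariance matrix, the kind of exact iteration mentioned there for pre-initialization. The data matrix is a random assumption, and this is not the VR-PCA algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(8)
n, d = 1000, 50
X = rng.normal(size=(n, d)) @ np.diag(np.linspace(1.0, 5.0, d))   # anisotropic data

A = X.T @ X / n                       # covariance whose top eigenvector we want
v = rng.normal(size=d)
v /= np.linalg.norm(v)

for _ in range(100):                  # plain power iterations
    v = A @ v
    v /= np.linalg.norm(v)

true_v = np.linalg.eigh(A)[1][:, -1]
print("alignment with top eigenvector:", abs(v @ true_v))
```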
We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in earlier work. Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.

Dealbreaker: A Nonlinear Latent Variable Model for Educational Data Andrew Lan Rice University. Tom Goldstein University of Maryland. Richard Baraniuk Rice University. Christoph Studer Cornell University Paper Abstract: Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students' knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student's success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance than affine models on real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable: they provide key insights into which concepts are critical (i.e., the dealbreakers) for answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.

We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity and reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high-dimensional distributions, even those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.
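A one-dimensional numpy illustration of the Stein identity that the discrepancy above builds on: for the standard normal, E[f'(x) - x f(x)] = 0 for smooth bounded f, while samples from a mismatched distribution leave a visible residual. The test function and the shifted alternative are assumptions; this is not the paper's kernelized test.

```python
import numpy as np

rng = np.random.default_rng(9)

# Stein's identity for the standard normal: E[f'(x) - x * f(x)] = 0 for smooth, bounded f.
f = np.tanh
f_prime = lambda x: 1.0 - np.tanh(x) ** 2

x_model = rng.normal(size=200000)               # samples that do follow N(0, 1)
x_other = rng.normal(loc=0.5, size=200000)      # samples from a shifted distribution

stein_stat = lambda x: np.mean(f_prime(x) - x * f(x))
print("N(0,1) samples :", stein_stat(x_model))   # close to zero
print("shifted samples:", stein_stat(x_other))   # clearly nonzero
```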
Variable Elimination in the Fourier Domain Yexiang Xue Cornell University. Stefano Ermon. Ronan Le Bras Cornell University. Carla Gomes. Bart Selman Paper Abstract: The ability to represent complex high-dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.

Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, incomplete and noisy, introduces challenges to algorithm stability: small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in the recommendation task.

Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP) and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (e.g., logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest.

We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD), but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.
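To fix ideas, here is a compact numpy sketch of the SVRG update rule on a simple least-squares finite sum. The step size, epoch length and data are assumptions, and this convex toy does not reproduce the nonconvex analysis that is the abstract's actual contribution.

```python
import numpy as np

rng = np.random.default_rng(10)
n, d = 500, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

grad_i = lambda w, i: (A[i] @ w - b[i]) * A[i]          # gradient of the i-th summand
grad_full = lambda w: A.T @ (A @ w - b) / n

w = np.zeros(d)
eta, epochs, m = 0.01, 30, n
for _ in range(epochs):
    w_snap = w.copy()
    mu = grad_full(w_snap)                              # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        # variance-reduced stochastic gradient
        w = w - eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)

print("final loss:", 0.5 * np.mean((A @ w - b) ** 2))
```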
Hierarchical Variational Models Rajesh Ranganath. Dustin Tran Columbia University. David Blei Columbia Paper Abstract: Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: how to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintain higher fidelity to the posterior.

The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF.

Binary embeddings with structured hashed projections Anna Choromanska Courant Institute, NYU. Krzysztof Choromanski Google Research NYC. Mariusz Bojarski NVIDIA. Tony Jebara Columbia. Sanjiv Kumar. Yann Paper Abstract: We consider the hashing mechanism for constructing binary embeddings, which involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudorandom projection is described by a matrix where not all entries are independent random variables; instead, a fixed budget of randomness is distributed across the matrix. Such matrices can be efficiently stored in sub-quadratic or even linear space, provide a reduction in randomness usage (i.e., the number of required random values), and very often lead to computational speedups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these results are the first to give theoretical grounding for the use of general structured matrices in the nonlinear setting. In particular, they generalize previous extensions of the Johnson-Lindenstrauss lemma and prove the plausibility of the approach that was so far only heuristically confirmed for some special structured matrices. Consequently, we show that many structured matrices can be used as an efficient information compression mechanism. Our findings build a better understanding of certain deep architectures, which contain randomly weighted and untrained layers and yet achieve high performance on different learning tasks. We empirically verify our theoretical findings and show the dependence of learning via structured hashed projections on the performance of neural networks as well as nearest neighbor classifiers.
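The sketch below illustrates the flavor of result described in the binary-embedding abstract above: a structured projection (here, one common choice, a circulant Gaussian matrix with random sign flips, applied via the FFT) followed by the sign function, with the normalized Hamming distance compared against the true angle divided by pi. The particular structured family, dimensions and test vectors are assumptions and may differ from the constructions analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(11)
d = 1024

# Two vectors with a known angle between them
x = rng.normal(size=d)
y = x + 0.7 * rng.normal(size=d)
true_angle = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

# Structured projection: random sign flips followed by a circulant Gaussian matrix,
# applied in O(d log d) time via the FFT instead of storing a dense d x d matrix.
g = rng.normal(size=d)                      # defines the circulant matrix
signs = rng.choice([-1.0, 1.0], size=d)     # diagonal sign-flip ("randomness recycling") step

def structured_project(v):
    return np.real(np.fft.ifft(np.fft.fft(g) * np.fft.fft(signs * v)))

hx, hy = np.sign(structured_project(x)), np.sign(structured_project(y))
hamming = np.mean(hx != hy)

print("true angle / pi   :", true_angle / np.pi)
print("normalized Hamming:", hamming)       # should be close if the embedding preserves angles
```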
A Variational Analysis of Stochastic Gradient Algorithms Stephan Mandt Columbia University. Matthew Hoffman Adobe Research. David Blei Columbia Paper Abstract: Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD so as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.
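A tiny numpy check of the stationary-distribution picture described above: constant-step-size SGD on a one-dimensional quadratic with Gaussian gradient noise is an AR(1) process (a discretized Ornstein-Uhlenbeck process), and its empirical stationary variance can be compared with the exact formula. The curvature, noise scale and step size are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(12)
a, sigma, eps = 2.0, 1.0, 0.05        # curvature, gradient-noise scale, constant step size

x, xs = 0.0, []
for t in range(200000):
    grad = a * x + sigma * rng.normal()        # noisy gradient of f(x) = a * x**2 / 2
    x -= eps * grad
    if t > 1000:                               # discard burn-in
        xs.append(x)

empirical_var = np.var(xs)
# For this linear-Gaussian case the iterates form an AR(1) process with exact
# stationary variance eps^2 * sigma^2 / (1 - (1 - eps*a)^2).
theory_var = eps**2 * sigma**2 / (1 - (1 - eps * a) ** 2)
print("empirical:", empirical_var, " theory:", theory_var)
```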
This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side information associated with the instances (e.g., class labels) to improve convergence. Previous methods have relied either on sampling from a distribution defined over training instances or on a static distribution that is fixed before training. This results in two problems: a) any distribution that is set a priori is independent of how the optimization progresses, and b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that make a larger contribution to the gradient. Our experiments on highly multiclass datasets show that our proposal converges significantly faster than existing techniques.

Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on or suffer from memory bottlenecks. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.

This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.

Online Stochastic Linear Optimization under One-bit Feedback Lijun Zhang Nanjing University. Tianbao Yang University of Iowa. Rong Jin Alibaba Group. Yichi Xiao Nanjing University. Zhi-hua Zhou Paper Abstract: In this paper, we study a special bandit setting of online stochastic linear optimization, where only one bit of information is revealed to the learner at each round. This problem has found many applications, including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandits can be applied to our problem, the high computational cost makes it impractical for real-world applications. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of O(d√T), which matches the optimal result for stochastic linear bandits.

We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds. For some user-defined trade-off parameter β in (0, 1), the proposed algorithm achieves cumulative regret bounds of O(T^max{β, 1−β}) and O(T^(1−β/2)), respectively, for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints, and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively O(T^(1/2)) and O(T^(3/4)) for general convex domains, and respectively O(T^(2/3)) and O(T^(2/3)) when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice.
Motivated by an application of eliciting users' preferences, we investigate the problem of learning hemimetrics, i.e., pairwise distances among a set of n items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item for another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting the structural constraints of the hemimetric polytope, learning the distances between each pair of items requires Θ(n²) queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably optimal sample complexity for various instances of the task. For example, when the items are embedded into K tight clusters, the sample complexity of our algorithm reduces to O(nK). Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis.

We present an approach for learning simple algorithms such as copying, multi-digit addition and single-digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements, and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning.

Learning Physical Intuition of Block Towers by Example Adam Lerer Facebook AI Research. Sam Gross Facebook AI Research. Rob Fergus Facebook AI Research Paper Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimate the trajectories of the blocks. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g., towers with an additional block, and (ii) to images of real wooden blocks, where they obtain performance comparable to human subjects.

Structure Learning of Partitioned Markov Networks Song Liu The Institute of Statistical Mathematics. Taiji Suzuki. Masashi Sugiyama University of Tokyo. Kenji Fukumizu The Institute of Statistical Mathematics Paper Abstract: We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between the two groups directly may be a preferable option. We introduce a novel concept called the partitioned ratio, whose factorization directly associates with the Markovian properties of random variables across the two groups. A simple one-shot convex optimization procedure is proposed for learning the sparse factorizations of the partitioned ratio, and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with state-of-the-art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in the US congress and on pairwise DNA/time-series alignments are also reported.
This work focuses on the dynamic regret of online convex optimization, which compares the performance of online learning to that of a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds on the dynamic regret under true and noisy gradient feedback, which are in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant's minimizers, to which we refer as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches what is achieved with full information.

Beyond CCA: Moment Matching for Multi-View Models Anastasia Podosinnikova INRIA - ENS. Francis Bach Inria. Simon Lacoste-Julien INRIA Paper Abstract: We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate the performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets.

We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.
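A numpy toy for one ingredient of the rank-estimation pipeline above, the stochastic trace estimator: the rank of a projector equals its trace, which can be estimated from a handful of random probe vectors. Here the projector is formed explicitly from an SVD purely for illustration, whereas the techniques described above obtain it implicitly through polynomial filters.

```python
import numpy as np

rng = np.random.default_rng(13)
n, r = 300, 25

# Noisy low-rank matrix and the exact projector onto its leading invariant subspace
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, n)) + 0.01 * rng.normal(size=(n, n))
U, s, _ = np.linalg.svd(A)
P = U[:, :r] @ U[:, :r].T          # explicit projector, only for this illustration

# Hutchinson estimator: trace(P) = E[z^T P z] for Rademacher probe vectors z
num_probes = 50
z = rng.choice([-1.0, 1.0], size=(n, num_probes))
rank_estimate = np.mean(np.sum(z * (P @ z), axis=0))

print("true rank of the signal subspace:", r)
print("stochastic trace estimate       :", rank_estimate)
```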
In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods. Dimensionality reduction is a popular approach for dealing with high-dimensional data that leads to substantial computational savings. Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). Empirical risk minimization (ERM) is a fundamental technique in statistical machine learning that forms the basis for various learning algorithms. Starting from the results of Chaudhuri et al. (NIPS 2009, JMLR 2011), there is a long line of work in designing differentially private algorithms for empirical risk minimization problems that operate in the original data space. We ask: is it possible to design differentially private algorithms with small excess risk given access to only the projected data? In this paper, we answer this question in the affirmative, by showing that for the class of generalized linear functions, we can obtain excess risk bounds of O(w(Theta) n ) under eps-differential privacy, and O((w(Theta)n) ) under (eps, delta)-differential privacy, given only the projected data and the projection matrix. Here n is the sample size and w(Theta) is the Gaussian width of the parameter space that we optimize over. Our strategy is based on adding noise for privacy in the projected subspace and then lifting the solution to the original space by using high-dimensional estimation techniques. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e., with access to the original data), under eps-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014). We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g. when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we find that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets. Large-Margin Softmax Loss for Convolutional Neural Networks Weiyang Liu Peking University . Yandong Wen South China University of Technology . Zhiding Yu Carnegie Mellon University .
Meng Yang Shenzhen University Paper Abstract: Cross-entropy loss together with softmax is arguably one of the most commonly used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-Softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks. A Random Matrix Approach to Echo-State Neural Networks Romain Couillet CentraleSupelec . Gilles Wainrib ENS Ulm, Paris, France . Hafiz Tiomoko Ali CentraleSupelec, Gif-sur-Yvette, France . Harry Sevi ENS Lyon, Lyon, France Paper Abstract: Recurrent neural networks, especially in their linear version, have provided many qualitative insights on their performance under different configurations. This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding of the core mechanisms at play for both training and testing. One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of 'text region embedding + pooling'. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets. Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even non-paid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks.
In the general setting, when more than two tasks are assigned to each worker, we establish a dominance result for BP: it outperforms other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performance. Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees, which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state space region where the closed-loop system is provably stable. In the second case, it is well known that infinite horizon stability guarantees cannot exist. Instead, our tool analyzes finite time stability. Empirical evaluations on simulated benchmark problems support our theoretical results. Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party's private data? We propose to transfer the knowledge of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then training a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(epsilon M ). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection. Network Morphism Tao Wei University at Buffalo . Changhu Wang Microsoft Research . Yong Rui Microsoft Research . Chang Wen Chen Paper Abstract: We present a systematic study on how to morph a well-trained neural network into a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also to have the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons.
Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme. Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting. Budget-constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high-dimensional machine learning and statistics. In this work, we study experimental design for the setting where the underlying regression model is characterized by an l1-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem; the results also hold for a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future. Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs Anton Osokin . Jean-Baptiste Alayrac ENS . Isabella Lukasewitz INRIA . Puneet Dokania INRIA and Ecole Centrale Paris . Simon Lacoste-Julien INRIA Paper Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013), recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps.
Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets. Exact Exponent in Optimal Rates for Crowdsourcing Chao Gao Yale University . Yu Lu Yale University . Dengyong Zhou Microsoft Research Paper Abstract: Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(pi), where m is the number of workers and I(pi) is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m >= (1/I(pi)) log(1/epsilon) in order to achieve an epsilon misclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters. Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed "what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin. Low-rank representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d << n). […] >20 reduction in the model size without any loss in accuracy on the CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error rate on the CIFAR-10 benchmark. Provable Algorithms for Inference in Topic Models Sanjeev Arora Princeton University . Rong Ge . Frederic Koehler Princeton University . Tengyu Ma Princeton University .
Ankur Moitra Paper Abstract: Recently, there has been considerable progress on designing algorithms with provable guarantees, typically using linear algebraic methods, for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We show lower bounds demonstrating that for shorter documents it can be information-theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling. This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically, we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order of magnitude speedups over existing general-purpose optimization solvers. We study the fixed design segmented regression problem: given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample-efficient, near-linear time algorithms for the problem that, while not being minimax optimal, achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude. Energetic Natural Gradient Descent Philip Thomas CMU . Bruno Castro da Silva . Christoph Dann Carnegie Mellon University . Emma Paper Abstract: We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient. Partition Functions from Rao-Blackwellized Tempered Sampling David Carlson Columbia University .
Patrick Stinson Columbia University . Ari Pakman Columbia University . Liam Paper Abstract: Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBMs); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost. In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any k >= 2, the mixture of k Plackett-Luce models for no more than 2k-1 alternatives is non-identifiable, and this bound is tight for k = 2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is generically identifiable if k <= floor((m-2)/2)!. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency. The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments. Power of Ordered Hypothesis Testing Lihua Lei . William Fithian UC Berkeley, Department of Statistics Paper Abstract: Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candes (2015), is much less sensitive to the quality of the ordering.
We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey's improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by Li & Barber (2015) and find that Adaptive SeqStep has favorable performance for both good and bad prior orderings. PHOG: Probabilistic Model for Code Pavol Bielik ETH Zurich . Veselin Raychev ETH Zurich . Martin Vechev ETH Zurich Paper Abstract: We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code. We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors. Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror descent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems. Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions. The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only about 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding compared to similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models. Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form.
Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics, demonstrating improved performance in comparison to related Trajectory Optimization algorithms that linearize the dynamics. Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered as the ideal solutions. It is however well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset nor the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results. Horizontally Scalable Submodular Maximization Mario Lucic ETH Zurich . Olivier Bachem ETH Zurich . Morteza Zadimoghaddam Google Research . Andreas Krause Paper Abstract: A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: the capacity (the number of instances that can fit in memory) must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution. Group Equivariant Convolutional Networks Taco Cohen University of Amsterdam . Max Welling University of Amsterdam / CIFAR Paper Abstract: We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state-of-the-art results on CIFAR-10 and rotated MNIST. The partition function is fundamental for probabilistic graphical models: it is required for inference, parameter estimation, and model selection. Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near minimax optimal polynomial approximation to the potential function and a Clenshaw-Curtis style quadrature.
Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices. Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error. Correcting Forecasts with Multifactor Neural Attention Matthew Riemer IBM . Aditya Vempaty IBM . Flavio Calmon IBM . Fenno Heath IBM . Richard Hull IBM . Elham Khabiri IBM Paper Abstract: Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America's largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model. Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, "Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art. Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data. We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods.
We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings. Slice Sampling on Hamiltonian Trajectories Benjamin Bloem-Reddy Columbia University . John Cunningham Columbia University Paper Abstract: Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models. Noisy Activation Functions Caglar Gulcehre . Marcin Moczulski . Misha Denil . Yoshua Bengio U. of Montreal Paper Abstract: Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that the gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more. By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate (saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing, when the amount of noise is annealed down, making it easier to optimize hard objective functions. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and tasks, especially when training seems to be the most difficult, e.g. when curriculum learning is necessary to obtain good results. PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification Ian En-Hsu Yen University of Texas at Austin . Xiangru Huang UT Austin . Pradeep Ravikumar UT Austin . Kai Zhong ICES department, University of Texas at Austin . Inderjit Paper Abstract: We consider Multiclass and Multilabel classification with an extremely large number of classes, of which only a few are labeled for each instance.
In such a setting, standard methods that have training and prediction costs linear in the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels under the assumption that the similarity between labels can be captured by structures such as a low-rank matrix or a balanced tree. However, as the diversity of labels increases in the feature space, the structural assumption can be easily violated, which leads to degraded testing performance. In this work, we show that a margin-maximizing loss with l1 penalty, in the case of Extreme Classification, yields an extremely sparse solution both in the primal and in the dual without sacrificing the expressive power of the predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear in the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significantly higher accuracy than existing approaches of Extreme Classification with very competitive training and prediction time.

As more people become interested in Lean ideas and their application to knowledge work and project management, it's helpful to find ways that make it easier to get started or learn a few basic concepts that can lead to deeper insights later. For those that are curious about kanban in an office context, it's not unusual to find people who are either currently using Scrum, or have some understanding of Scrum as representative of Agile thinking. One way or another, Scrum users are an important constituent of the Kanban audience. Since Scrum can be described as a statement in the language we use to describe kanban systems, it is also fairly easy to elaborate on that case in order to describe Scrum/Kanban hybrids. This can be useful for existing Scrum teams who are looking to improve their scale or capability. It can also be useful for more cautious new users who find comfort in an "established" method [1].

The idea of using a simple task board with index cards or sticky notes is as old as Agile itself. A simple variation of this is a task board with a simple Pending - In Process - Complete workflow. The cards represent work items that are in the current scope of work. Names can be associated with the cards to indicate who's working on what. Agile teams have been using this sort of method for a long time, and a few people pointed out early on that this had some resemblance to the notion of kanban in lean systems. Of course, a variety of electronic tools exist that perform these functions, but the simple task board represents a couple of lean principles that I find very valuable: simple technology and visual control. The utility of such a simple method of workflow management is that it is easy to manage, and more importantly, it is easy to change. Huddling around a computer monitor, even a very large one, is in no way a substitute for the tactile and social interactivity that accompanies manipulating a large task board. Maybe someday it will be. Not today. What electronic tools are good for is managing lists of things, like backlogs and bugs, and producing reports. Simple tools can be a difficult concept to explain to technology fanatics, but then, so can value.
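To make the board mechanics concrete before we start constraining them, here is a minimal sketch of the simple three-column task board just described. It is illustrative only (the class and field names are mine, not the post's), and it deliberately shows that nothing in the basic board limits how much work lands in process.

```python
# A minimal sketch of the basic Pending / In Process / Complete task board
# described above. Illustrative names only; not code from the original post.

from dataclasses import dataclass, field

COLUMNS = ["Pending", "In Process", "Complete"]

@dataclass
class Card:
    title: str
    owner: str | None = None   # name attached to show who's working on what

@dataclass
class TaskBoard:
    columns: dict = field(default_factory=lambda: {c: [] for c in COLUMNS})

    def add(self, card: Card) -> None:
        self.columns["Pending"].append(card)

    def move(self, card: Card, src: str, dst: str) -> None:
        # Nothing here prevents "In Process" from piling up; that is exactly
        # the gap the kanban limit discussed next is meant to close.
        self.columns[src].remove(card)
        self.columns[dst].append(card)

board = TaskBoard()
login = Card("Implement login form")
board.add(login)
login.owner = "David"
board.move(login, "Pending", "In Process")
```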
A problem with the basic index-card task board is that there is nothing to prevent you from accumulating a big pile of work in process. Time-boxing, by its nature, sets a bound on how much WIP that can be, but it can still allow much more than would be desirable. If a kanban is a token that represents a work request, and our task board can still get out of control, then what is the problem here? The problem is that a kanban is more than just a work request on a card, and putting sticky notes on a whiteboard is not enough to implement a pull system.

A kanban is more than an index card

In a modern economy, the production and distribution of scarce goods and services are regulated by a system of money and prices. Money can be represented by currency notes, which have little intrinsic value, but which, by agreement, can be exchanged for real goods and services. The existence of a neutral medium of exchange makes possible a system of economic calculation of the relative scarcity of the supply of goods in an economy. Such a system of prices is a market. Markets communicate the value of economic production and distribution to their participants. If a currency note can be exchanged for an object of real value, then there must be some way to enforce the scarcity of the notes in a way that corresponds to the scarcity of real value in the economy. In practice, some kind of institution must enforce this scarcity. The health of a market economy depends greatly on the ability of its monetary institution to coordinate the supply of money with the supply of goods and services. In an unhealthy economy, unstable prices make economic calculation difficult and disrupt the communication between producers and consumers needed for efficient production and distribution. A kanban represents a portion of the productive capacity of some closed internal economy. It is a medium of exchange for the goods and services provided by the operations of a system of productive resources. The supply of kanban in circulation is controlled by some regulatory function that enforces its value. That is, a kanban is a kind of private currency, and the shop floor manager is the bank that issues it, for the purpose of economic calculation. If you carry the currency analogy further, then you might say that kanban is not about the cards at all. Just like money is not about the bills. Kanban is all about the limits, the quantity in circulation. How that is represented in a transaction is mostly incidental. A simple rule for understanding all of this might be: a task card without a limit is not a kanban in the same way that a photocopy of a dollar bill is not money. If you use a durable token like a plastic card, then this is easy to manage: control the number of cards in circulation. If all of the available cards are already in circulation, then the next person who comes looking for one is just going to have to wait until one returns. This is the very purpose of the kanban system. However, if you use a more disposable medium like index cards or sticky notes, then you need another mechanism to regulate the "money supply." In our case, we simply write the quantity of kanban in circulation on the task board, and allocate new cards according to that limit. This means that a kanban serves two functions: it is a request to do something in particular, but it is also permission to do something in general. That second notion of permission is where people who are new to lean thinking tend to struggle.
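As a rough sketch of that "money supply" rule (again illustrative, with names of my own choosing, and assuming a single shared limit written on the board): starting a new card is only allowed while the count of cards in circulation is below the limit; otherwise you wait for one to come back.

```python
# Illustrative sketch of the circulation rule: the kanban is the limit,
# not the card. Names are mine, not the post's.

class KanbanSupply:
    def __init__(self, limit: int):
        self.limit = limit      # quantity of kanban "in circulation" written on the board
        self.in_use = 0

    def try_start(self) -> bool:
        """Return True if a new card may enter work-in-process, else the requester waits."""
        if self.in_use < self.limit:
            self.in_use += 1
            return True
        return False            # all kanban are out: the next request has to wait

    def finish(self) -> None:
        self.in_use -= 1        # a completed item returns its kanban to the pool

wip = KanbanSupply(limit=3)
assert all(wip.try_start() for _ in range(3))
assert not wip.try_start()      # fourth request waits until a kanban returns
wip.finish()
assert wip.try_start()
```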
But that permission to start work is precisely how we can "optimize the whole" or "subordinate to the constraint."

Crunchy on the outside, chewy on the inside

Just as an unregulated index card on a cork board is not a kanban, time-boxed iteration planning is not pull. No reasonable interpretation of Lean involves building to a one-month forecast unless the cycle time for each work order is also a month. One month's worth of stuff in process is certainly a much smaller batch size than 3 months or 18 months, but if your iteration backlog contains 20 work items, then that's still about 19 more than it needs in order to be a pull system. Nonetheless, it is not difficult to augment Scrum with a few simple practices that move us towards a more recognizably lean workflow. The most obvious is the reduction of iteration length, although this is not without problems [2]. As we'll see, it's possible to incrementally enhance Scrum with more and more pull-like features until all that remains of the original process is vestigial scaffolding. The simple approach is to start with Scrum-like iterations and an iteration planning process, and begin to add pull features to the team's internal process.

One simple technique that brings us much closer to our kanban definition is to set a multitasking limit for individuals. You might have a simple principle like: prefer completing work to starting new work, or you might express that as a rule that says: try to work on only one item at a time, but if you are blocked, then you can work on a second item, but no more. In our example, that rule gives us an effective WIP limit of 6. Another common technique is the late binding of tasks to owners. Some teams will pre-assign all of the known tasks during iteration planning. That's generally not a good idea because it artificially creates a critical path. Waiting until the "last responsible moment" to assign tasks to people maximizes knowledge and brings you closer to pull. Just because anybody can have more than one item in process doesn't mean that everybody should have more than one item in process. A problem with our multitasking rule is that it locally optimizes with no consideration of the whole. An implicit total WIP limit of 6 is still more WIP than we should probably tolerate for our three workers. A limit of 4 or 5 total items in process at one time still allows for some multitasking exceptions, but disallows the obviously dysfunctional behavior of everybody carrying two items. At this step, we have moved beyond a rule about individuals and have made a rule about the task cards themselves. That is, we have made our cards into kanban.

Another enhancement we can make to our previous board is to add a ready queue between the backlog and work-in-process. The ready queue contains items that are pending from the backlog, but have high priority. We still haven't bound any individual to these tasks, but as soon as somebody becomes available, they should take one of these tasks instead of picking something out of the general backlog. This enables us to decouple the process of assigning work from the process of prioritizing work, and it simplifies assignment. The ready queue also has a kanban limit, and it should be a small limit, since its only purpose is to indicate which work item should be started next. Now we can begin to see some of the mechanics of pull and flow: 1. David completes a task and moves it into the "done" column. 2. David pulls a new kanban from the ready queue and begins working. 3.
The team responds to the pull event and selects the next priority item to go into the ready queue. At this point, we are now operating a simple kanban pull system. We still have our time-boxed iteration and planning cycle, so perhaps we might call such a thing a Scrumban system?

Now that we have a sense of capacity and pull, it's natural to think about flow. Breaking up our nebulous "in process" state into better defined states can give everybody more visibility into the strengths, weaknesses, and overall health of the team. Even Agile workflows like Extreme Programming have relatively well-defined roles and states, and a smooth flow of work between those states is just as important as a smooth flow of work through the process overall. Here we've broken down in-process into two states: specify and execute. Specify is about defining whatever criteria are necessary to determine when the work item can be considered complete. Execute is about doing the work necessary to bring that work item into a state which satisfies those criteria. We have split our previous WIP limit of 5 across these two states. Specify is considered to take less time in this case, so it is given a limit of 2. Execute consumes the remaining limit of 3. We might change this ratio as time goes on and our performance changes. Since we are now thinking more about flow, the additional workflow detail strongly suggests using a Cumulative Flow Diagram to track the work and measure our performance. A simple burndown tells you something about whether or not you are delivering value, but not very much about why. The CFD communicates a lot of additional information about lead times and inventories that can diagnose problems, or even prevent them.

By defining our workflow a little better, we can also account for some functional specialization. In this case, it might be a soft specialization, where some of us prefer doing one type of work more than another, even though we're capable of doing it all. It's important to understand that this kind of pull workflow system allows specialization but does not enforce specialization. The team owns the work and the workflow, and it's up to the team to figure out how to get it done efficiently. If we let the person who's best at performing the "specify" function handle more of that work, then we may also need to coordinate handoffs between ourselves. Adding the specify-complete column communicates to the team that a work item which was previously in the specify state is now ready to be pulled by anyone who wants to move it to the execute state. Work that is still in the specify state is not eligible to be pulled yet. If the owner of a ticket in the specify state wants to hand it off, he can put it in the complete buffer. If he doesn't want to hand it off, he can move it directly into the execute state as long as capacity is available. It might be that the execute state is full, and the only eligible work is to pull another ticket from the ready queue into specify. Since we have added a new column for our handoff buffer, we are also increasing the WIP limit by a small amount. The tradeoff is that the increase in lead time due to the new inventory should be offset by the decrease in lead time due to the advantage of specialization. We also mitigate the impact of that new inventory by pooling the WIP limit across the preceding state. This has the very beneficial consequence of making the specify-complete buffer a variable throttle for the preceding station.
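A small sketch of that pooling rule may help (illustrative only; the limits and names are assumptions, not the post's exact numbers): specify and specify-complete draw on one shared limit, so a full handoff buffer automatically chokes off new specification work until execute pulls something.

```python
# Illustrative sketch of pooling the WIP limit across "specify" and its
# handoff buffer. Limits and names are assumptions, not from the post.

class SpecifyExecuteBoard:
    def __init__(self, specify_pool_limit: int, execute_limit: int):
        self.specify = []            # items being specified
        self.specify_complete = []   # handoff buffer, ready for execution
        self.execute = []            # items being executed
        self.specify_pool_limit = specify_pool_limit   # shared by the first two states
        self.execute_limit = execute_limit

    def can_start_specify(self) -> bool:
        # The fuller the handoff buffer, the less room remains to start specifying.
        return len(self.specify) + len(self.specify_complete) < self.specify_pool_limit

    def can_start_execute(self) -> bool:
        return len(self.execute) < self.execute_limit

board = SpecifyExecuteBoard(specify_pool_limit=3, execute_limit=3)
board.specify_complete = ["ticket A", "ticket B", "ticket C"]   # buffer is full...
assert not board.can_start_specify()   # ...so specify is throttled until execute pulls
```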
The more work that piles up in the specify-complete buffer, the less work can be in process in the specify state, until specify is shut down entirely. But we see it coming; it doesn't "just happen." If we're going to allow workflow specialization and the handoffs that result, then we will also need some agreement about what results to expect at each handoff. We can do that by defining some simple work standards or standard procedures for each state. These do not have to be complicated or exhaustive. Here, they are simple bullets or checklists drawn directly on the task board. They only need to be sufficient to avoid misunderstanding between producers and consumers. These standards are themselves made and owned by the team, and they can change them as necessary according to the practice of kaizen. Putting them in a soft medium like a whiteboard or a wiki reinforces the notion of team ownership.

Level 2 Scrumban

In the basic version of Scrumban described so far, the iteration review and planning cycle happens just as it does in ordinary Scrum. But as our production process has matured, we have also given ourselves some tools to make the planning process more efficient, more responsive, and better integrated with the business that it serves. With the pull system in place, our flow will become smoother as our process capability improves. We can use our inter-process buffers and flow diagrams to show us our process weaknesses and opportunities for kaizen. As we get closer to level production, we will start to become less concerned with burndown and more concerned with cycle time, as one is the effect and the other is the cause. Average lead time and cycle time will become the primary focus of performance. If cycle time is under control and the team capacity is balanced against demand, then lead time will also be under control. If cycle time is under control, then burndowns are predictable and uninteresting. If burndowns are uninteresting, then goal-setting and risky heroic efforts are unnecessary. If burndowns are uninteresting, then iteration backlogs are just inventory for the purpose of planning regularity and feeding the pull system. As such, they should be the smallest inventories possible that optimize planning cost. Since the team now pulls work into a small ready queue before pulling it into WIP, then from the team's perspective, the utility of the iteration backlog is that it always contains something that is worth doing next. Therefore, we should use the least wasteful mechanism that will satisfy that simple condition. A simple mechanism that fits the bill is a size limit for the iteration backlog. Rather than go through the trouble of estimating a scope of work for every iteration, just make the backlog a fixed size that will occasionally run to zero before the planning interval ends. That's a simple calculation. It's just the average number of things released per iteration, which in turn is just a multiple of average cycle time. So if you have 5 things in process, on average, and it takes 5 days to complete something, on average, then you'll finish 1 thing per day, on average. If your iteration interval is two work weeks, or 10 work days, then the iteration backlog should be 10. You can add one or two for padding if you worry about running out. This might be a point that's been lost on the Scrum community: it's never necessary to estimate the particular sizes of things in the backlog.
It's only necessary to estimate the average size of things in the backlog. Most of the effort spent estimating in Scrum is waste. In our final incarnation of Scrumban, iteration planning still happens at a regular interval, synchronized with review and retrospective, but the goal of planning is to fill the slots available, not fill all of the slots, and certainly not determine the number of slots. This greatly reduces the overhead and ceremony of iteration planning. Time spent batch processing for iteration planning estimation can be replaced with a quality control inspection at the time that work is promoted to the ready queue. If a work item is ill-formed, then it gets bounced and repeat offenders get a root cause analysis.

Off with the training wheels

If you have made it this far in your evolution, you will probably realize that the original mechanisms of Scrum are no longer doing much for you. Scrum can be a useful scaffold to hold a team together while you erect a more optimized solution in place. At some point you can slough off the cocoon and allow the pull system to spread its wings and take flight. The first step beyond Scrum is to decouple the planning and release periods. There may be a convenient interval to batch up features to release, and there may be a convenient interval to get people together to plan. If we have a leaner, more pull-driven planning method, there's really no reason why those two intervals should be the same. Your operations team might like to release once a month, and your product managers might like to establish a weekly prioritization routine. No reason not to accommodate them. Once you've broken up the timebox, you can start to get leaner about the construction of the backlog. Agility implies an ability to respond to demand. The backlog should reflect the current understanding of business circumstances as often as possible. Which is to say, the backlog should be event-driven. Timeboxed backlog planning is just that, where the event is a timer, but once we see it that way, we can imagine other sorts of events that allow us to respond more quickly to emerging priorities. Since our system already demonstrates pull and flow, that increased responsiveness should come at no cost to our current efficiency. The problem we are trying to solve is: the ideal work planning process should always provide the development team with the best thing to work on next, no more and no less. Further planning beyond this does not add value and is therefore waste. Scrum-style timeboxed planning usually provides a much bigger backlog than what is strictly necessary to pick the next work item, and as such, it is unnecessary inventory and therefore unnecessary waste. The next event we might consider for scheduling planning activities is the concept of an order point. An order point is an inventory level that triggers a process to order new materials. As we pull items from the backlog into the process, the backlog will diminish until the number of items remaining drops below the order point. When this happens, a notice goes out to the responsible parties to organize the next planning meeting. If our current backlog is 10, our throughput is 1/day, and we set an order point at 5, then this planning will happen about once a week. Once a week might be reasonable if people are hard to schedule or need some lead time in order to prioritize. However, if they are more available than that, then we can set the order point lower.
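Putting the post's own numbers in one place, the backlog-size and order-point arithmetic looks like this (a worked sketch, not a prescribed formula; the variable names are mine):

```python
# Worked version of the arithmetic above, using the post's illustrative numbers.

avg_wip = 5                # items in process, on average
avg_cycle_time_days = 5    # average days to complete one item
iteration_days = 10        # two work weeks

throughput = avg_wip / avg_cycle_time_days            # 1.0 item finished per day
backlog_size = round(throughput * iteration_days)     # 10 items per planning interval
order_point = 5                                        # replenish when the backlog hits 5

days_between_planning = (backlog_size - order_point) / throughput
print(throughput, backlog_size, days_between_planning)  # 1.0 10 5.0 -> about once a week
```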
Once a week might be reasonable if people are hard to schedule or need some lead time in order to prioritize. However, if they are more available than that, then we can set the order point lower. If the planners can respond within a day, then perhaps we can set the order point at 2. If the order point is 2, then there may be no need to keep a backlog of 10. Perhaps we can reduce the backlog to 4… and reduce our lead time by 6 days in the process.

The end state of this evolution is pull, or prioritization-on-demand. If the planners can make a good decision quickly enough, and there is no economy of scale in batching priority decisions together, then the size of the backlog only needs to be 1. At the moment the item is pulled by the development team, the planning team is signaled to begin selecting the next item. If the planning team is fast enough in their response, then the development team will never stall. If there is some variation or delay in response, then a backlog of 2 might be necessary to prevent stalls. But 2 is still a lot smaller and leaner than 10. Or 20. Or 50, which is something I've seen more often than I would like.

The same kind of logic can be applied to the release interval. There is an optimum batch size for releases, and we should first try to find it, and then try to improve it. The result of our efforts will ultimately be features-on-demand.

Even at this level, we still have a fairly basic kanban system. From here we can add work item decomposition (swimlanes) or structural dependency flow for additional scale. Along with an enlightened division of labor, this is how we believe that Lean shows the way to scale Agile up to the enterprise.

1. in spite of the fact that the kanban idea is at least 40 years older
2. which I'll probably write about in another post sometime

An excellent article and some food for thought. Thanks.

Your post is really interesting, and I've done exactly what you've described above with one of my clients, who found it massively useful… despite being really hesitant at first. They were also doing Scrum (loosely, and in inverted commas) and were struggling, so I got them back to Scrum proper in the fall of last year and then introduced the kanban ideas almost the way you described above over a month or so, with great results. A couple of things I wanted to get your take on: One thing that I didn't get round to changing with them was how they sized their stories – they're using story points, but I wondered what your take on that would be. Also, how does this pull-based work fit into a release plan? Scrum-based teams have their velocity based on completed story points and can then do some basic release management work (e.g. it'll be roughly 4 iterations (or sprints) before all these stories are completed, and you can draw the burndown from this information). If you're not doing development iterations anymore, what happens to velocity and to the way Scrum teams traditionally use it to track progress? I mention this last question as it is one that will come up a lot with Scrum teams and may be worth amending the article to cover.

While I 99.44% agree with this post, I think there is still some detail that needs to be filled in. For example: "It's only necessary to estimate the average size of things in the backlog. Most of the effort spent estimating in Scrum is waste." I would say this is true if there is managed variation in the size of the backlog. For example, all PBIs could be constrained to be small (e.g. everything is "3 ideal days or less") and all PBIs bigger than that broken into smaller PBIs (I think this helps support single piece flow).
At some point the product owner needs to judge PBIs based on value (e.g. ROI). How can they do that if they are just basing each PBI's value on an average?

First, it's really great to hear about your results. We love hearing kanban stories from the field. I think story point limits are a legitimate variation on kanban limits. It adds a little complexity to do it that way, but I wouldn't object if somebody felt that was best for them.

I knew somebody would call me on the release planning question. I will be writing more about that in the near future, and we'll be discussing it at the APLN conference in Seattle next week. Throughput is continuously calculated as work items complete. We're managing throughput directly by managing work-in-process and cycle time. Kanban is all about fixing work-in-process, and that leaves us with cycle time, which we manage with value stream and theory of constraints methods and such. Release planning consumes the historical throughput metric and can apply methods like Minimum Marketable Features, Staged Delivery, and Rolling Wave Planning. That's what we recommend: a rolling wave planning event on a regular cadence. "Toyota naturally makes production schedules… Just because we produce just-in-time in response to market needs does not mean we can operate without planning. First, the Toyota Motor Company has an annual plan. This means the rough number of cars to be produced and sold during the current year. Next there is the monthly production schedule… Based on these plans, the daily production schedule is established in detail and includes production leveling."

Limiting the size of work items for downstream scheduling is one of the things we recommend. A kanban workflow can extend the value stream before and after the typical boundaries of a Scrum team, so that some of the work that the Product Owner does is pulled into the workflow and managed accordingly. Size and effort estimates can be produced as a natural consequence of analysis, since that analysis has to be done anyway. Heaven knows I don't expect anybody to agree with me 100%. You rightly point out the significance of work item sizing to single piece flow, and that is spot on. Mostly, I want to help facilitate a conversation about the implications of pull and flow for software development. That doesn't mean that pull and flow are "right" in some absolute sense, although I personally find them to be extremely compelling. But if we decide that pull and flow are the right answer, then this blog is mostly about figuring out what that really means, and I very much appreciate the feedback that I get here.
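To make the bookkeeping in the reply above concrete, here is a rough sketch of throughput recomputed continuously as work items complete, related to WIP and cycle time via Little's Law. The function names, the two-week window, and the sample dates are my own assumptions, not from the original post:

```python
# Rough sketch: throughput recomputed as work items complete, plus Little's Law
# (average WIP = throughput x average cycle time) to relate the three quantities.
from datetime import date

completions = []  # completion dates, appended as work items finish

def record_completion(finished_on: date):
    completions.append(finished_on)

def throughput_per_day(window_days: int = 14) -> float:
    """Items completed per day over a trailing window ending at the latest completion."""
    if not completions:
        return 0.0
    cutoff = max(completions).toordinal() - window_days
    return sum(1 for d in completions if d.toordinal() > cutoff) / window_days

def average_cycle_time(avg_wip: float, window_days: int = 14) -> float:
    """Little's Law: cycle time = WIP / throughput."""
    tp = throughput_per_day(window_days)
    return avg_wip / tp if tp else float("inf")

record_completion(date(2009, 6, 1))
record_completion(date(2009, 6, 3))
print(throughput_per_day(), average_cycle_time(avg_wip=5))
```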
Scrum works flawlessly for all of my teams. Why overly complicate things by adding kanban constraints? From my KISS perspective, Scrum is "leaner" than kanban will ever be.

Lean processes are built for competition. If "good enough" process is good enough for you, then perhaps textbook Scrum is adequate for your purposes. Comfort is a luxury, so I hope you enjoy it. Maybe you are not in a competitive situation. Maybe your competitors are inept. If, however, you are under any pressure for systematic performance improvement, then the suggestions in this article address inefficiencies that are built into Scrum.

Thanks for your post. Through years of using Scrum and tweaking to make it more lean, my current team is using techniques very similar to the ones you describe. You've given me some more ideas for future kaizen meetings to further tweak the process.

It looks like it's getting closer to a production-like system. Do you believe that creating software is a production-like activity?

No, I believe that creating software is often a workflow-like activity.

In a pull system, where does one schedule retrospectives? Does the team decide how often they should occur? It seems like everything else doesn't have a particular schedule.

Firstly, a pull system creates what I'd call an "event-rich environment," which means there is a great deal of context and opportunity for introspection and process improvement. The pull system is giving you permission to not wait until your next "official" retrospective to change something. "Pink tickets" or "Andon lights" ought to trigger a process that can lead to a root-cause analysis and process improvement. Secondly, there should often be some kind of planning process or rhythm above the level of individual work item scheduling. This particular article showed how you could use the Scrum framework for that purpose. You could also schedule retrospectives around lower-frequency events like the release of an MMF. If your MMFs are sufficiently "M", then you might use their natural rhythm to trigger planning events. Or you could schedule a regular event that may or may not coincide with some other planning or release event. One suggestion is a 2-week integration/release cadence, with a 6-week (or semi-quarterly) rolling wave planning event.

Corey, I'm not familiar with using "pink tickets" or "Andon lights", but I understand the purpose. Heh, what is an MMF? More jargon for me to learn, heh.

Great article! I'm applying this to game development, where there is a transition from exploring the game mechanics (fun) using Scrum to the production phases, where we develop 8-12 hours of assets using a Lean-Kanban approach. Your description of Scrumban is perfect for transitioning the team. The main thing that would prevent us from trying Scrumban across the whole project is the concern of losing a major benefit of the iteration. Preproduction (Scrum) iterations are ideal for a "unifying audacious goal" for the team. We don't know all of our tasks (often only 50%), so we leave room in the schedule for exploration. Leveling development, decoupling iteration planning from review… this seems to deprecate audacious iteration goals.
Am I overlooking something? Thanks, Clint.

Firstly, thank you very much! Part of the thinking about the Scrumban approach is that it allows you to keep old practices that have value to you, add new practices that have value to you, and drop ones that don't. I told one story here about an evolution of a process, but there are other stories that could also be told. You could do a lot of the things that are in the article without giving up iterations. And the point was meant to be that you would only give them up if/when you recognized that they no longer have value for you. If that never happens, then you wouldn't give them up. One thing I left out of this article was project or product planning, which can provide additional context and motivation. It seems like I only write about this in the comments… but I will have an article soon about the relationship between Minimum Marketable Features, Rolling Wave Planning, Real Options, and Kanban.

I still don't totally get specify and execute. Can you give a specific example?

Specify/execute is only an example. The boundary was meant to approximate what-to-build vs. how-to-build-it. Specify is meant to be the "operational definition of the problem statement". That could be things like requirements specifications, test cases, and wireframes. In turn, these things could be represented by things like user stories and automated acceptance tests. Execute could be schematics, design verification tests, source code, integration, acceptance testing, and other V&V.

Corey, I found your article full of insights. There are many more concepts and flaws in the industry that need to be cleared up to make it leaner and give value to the customer.

Thanks for this article, Corey. I am leading a Production Support/Minor Enhancement team in a development shop where all the feature teams are using Scrum, and I've been looking for a way to implement an Agile/Lean methodology that would tolerate the constant interruptions inherent to application support and maintenance. Your approach is the most promising I've found so far. Now if you'll excuse me, I'm off to fight our other Scrum teams for some wall space.

Corey, do you have any productivity data to compare the improvements of a project using Scrum-ban vs. textbook Scrum?

It's still pretty early, so much of the evidence is still anecdotal or speculative. David Anderson has real data on pull system performance in general. Clinton Keith has specific data on Kanban vs. Scrum. There may be some others, possibly Dave Laribee or Karl Scotland.
A good place to ask would be on the Yahoo kanbandev group.

Thanks for this excellent article. We're running with Scrum but need to be that little bit more agile, so I am looking into Kanban. This article is good at helping map out a transition approach. Cheers.

We're implementing kanban for our IT Operations; we are still in the prototyping phase, but it's helped so far to visualize our bottlenecks. I've posted a few pictures on my blog with little explanation and hope to expound in the future. Just a quick note: on page 55 of your book, I found the passing reference to Axiomatic Design to require a bit more explanation. It comes a bit out of nowhere, and I'd love to have more details.

I agree about some of the references in the Scrumban book. I'm sure I drop a couple of random TRIZ references as well. I do have a couple of related articles here, and more will be coming.
If you use the post-its with the sticky side on the bottom edge (instead of the top edge), then it's easy to see the writing at the top of each note, even if crowded.

Cory – I would like to watch your video on Scrumban (at seplk2009core. - evolution.) but noticed it never displays for me… Is anyone else seeing the same thing?

Hi Brian. It doesn't work for me either.

Hi Corey, a wonderful article and a great way to simplify Kanban learning.
I don't get this line: "This might be a point that's been lost on the Scrum community: it's never necessary to estimate the particular sizes of things in the backlog. It's only necessary to estimate the average size of things in the backlog." What is meant by estimating the average size of things? All things are not the same. Let us say I have 5 MMFs in process (Specify), and my initial task is to find the lead time so that I can identify the limits for my backlog. I find that the lead time for 5 MMFs is 20 days. This means 1 MMF takes 4 days. If my sprint is 10 business days, I could complete approximately 3 MMFs. This would mean my backlog limit could be 3. But these 5 MMFs have varying sizes. If I have to identify MMFs of similar sizes to fit in my backlog (3), I would have to spend considerable time planning, breaking stories, etc. In the process, if I don't find MMFs that fit the 3-item backlog limit, then what happens?

Great post! I also really like the Scrumban book… I used similar principles myself in the role of a scrum master (after studying Theory of Constraints and Lean Software Development theories), and it greatly helped to make a good team even better. I am glad you provided this nice introduction and motivation article.

This is a great mechanism for working. This is not at all to say this can not work, but one contextual item where a pull system may NOT work is if, during Sprint Planning, you estimate when you need a particular user to help with refining the requirements (story). You can't just pull, because you "scheduled" that person at a particular time, most likely for a reason (perhaps they weren't available). At the very least, it could become a problem, as you would be giving them last-minute notice. Again, this is not to say this technique can't work, but just to provide a case where it may be more difficult, as a consideration for those thinking of using it.
Unless the user (and perhaps it is the product owner, but often he or she may not be) can be totally dedicated to the team, or at least to that Sprint, this could be problematic.

Great post, and I'll be looking for where I can apply these concepts.

Once you get work showing on your Kanban board, you will see where work is piling up. Excess work in process raises a number of challenges. It increases the time a new item will take to travel through the system. It indicates the likelihood of an overburden on the current performers or the next performers. For example, in Scrum iterations, when there is a lot of work in development that all moves to test at the end of the iteration, this is undesirable. You can limit WIP in a number of ways. In Scrum, we limit WIP at the iteration boundary, and some teams limit WIP by limiting the number of work items that can be active at any one time. The Kanban board calls for explicit WIP limits and also recommends buffer columns to mitigate the impact of variation to keep work flowing through the system. You can certainly explicitly limit WIP and include a buffer or buffers on a normal Agile board. Here is an essay from Lean Software Engineering showing ScrumBan.
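The explicit WIP limits and buffer columns described in the excerpt above can be sketched in a few lines. The column names and limits here are illustrative (taken from the specify/complete/execute states used earlier in the article), not prescribed by it:

```python
# Illustrative sketch of per-column WIP limits with a buffer column.

class Board:
    def __init__(self, limits):
        self.limits = limits                         # e.g. {"specify": 3, "complete": 2, "execute": 3}
        self.columns = {name: [] for name in limits}

    def pull(self, item, into):
        """Accept an item into a column only if its WIP limit allows it."""
        if len(self.columns[into]) >= self.limits[into]:
            return False                             # blocked: finish downstream work first
        self.columns[into].append(item)
        return True

    def move(self, item, source, target):
        """Move an item between columns, respecting the target column's limit."""
        if item in self.columns[source] and self.pull(item, target):
            self.columns[source].remove(item)
            return True
        return False

# "complete" acts as the buffer between specify and execute, as on the task board above.
board = Board({"specify": 3, "complete": 2, "execute": 3})
```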
Great article. We are doing Kanban for administration teams and maintenance teams, and soon for software architecture teams also. It's really successful and we like it. XING AG, Susanne.

I work with a team that does both new feature development as well as production support. The "pull" concept seems to break down for us when a support issue needs to be addressed immediately, which requires reallocation of individuals from their current WIP task to a support task. This type of scenario "pushes" a high-priority item into the flow and could take us beyond the WIP limits that we are trying to adhere to. Maybe this is just an acceptable situation where an item is pushed into the flow instead of pulled by a team member who is available for work. I would be interested to hear from anyone with similar experiences and how they may have adapted their process to create pull out of a push situation.
Your article is a must-read and a must-try. I have translated it into French: fabrice-aimetti.fr. Thank you, you're great!

Corey, I have a hard time accepting your premise that "the ideal work planning process should always provide the development team with the best thing to work on next, no more and no less." In a manufacturing environment, you don't need to know anything other than "How fast are we putting out widgets?" But in software development, the 20 items that you deployed to production last month may be completely unlike the 20 items you deploy this month. If you have vendors or customers who need to integrate with your application, users who need training, salespeople who need updated presentation materials, etc., you need to be able to tell people what features will be available when. I'm not saying there has to be a year-long unalterable roadmap, but there are valid reasons to want to know more than "What are the next five things we're doing?" How do you square your statement about planning with all these competing needs for more planning?

Although this post was published several years ago, I only had the chance to read it today. A very interesting concept. I wonder what happened to the concept of Scrumban now. It doesn't seem that it actually worked, since no one is using it or speaking about it… PM Hut

You are incorrect.
I and many others are doing versions of Cory's idea. For example, it seems the majority of those using AgileZen are using it – see the discussion boards.

Your article is great. Is there a way of buying the Scrumban book in a DRM-free version for Kindle? (I assume that the one sold on Amazon has DRM.)

I adopted Scrumban for a maintenance team which handles new features as well as bugs. Our Scrumban board comprises To Do, Development, Testing, Deploying, and Done columns. We often have bugs escalated by the support team which need to be handled urgently. The escalations could either involve development and testing together or just testing (for verification) alone. Escalations cause the current work to either stay idle or be moved back to the To Do column (if the columns have reached the WIP limit), which then increases the lead time. I am guessing this is something normal and acceptable. I would like to hear from anyone with similar experience and how they tackle this situation.

Super article about Scrumban. It seems to be a good approach for (software) product development.
